Article

Study on the Correlation Between Major Medicinal Constituents of Codonopsis pilosula During Its Growth Cycle and Ecological Factors, and Determination of Optimal Ecological Factor Ranges

1
College of Information Science and Engineering, Shanxi Agricultural University, Jinzhong 030801, China
2
College of Life Sciences, Shanxi Agricultural University, Jinzhong 030801, China
3
College of Food Science and Engineering, Shanxi Agricultural University, Jinzhong 030801, China
4
College of Agricultural Engineering, Shanxi Agricultural University, Jinzhong 030801, China
*
Author to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1057; https://doi.org/10.3390/agronomy15051057
Submission received: 10 March 2025 / Revised: 23 April 2025 / Accepted: 23 April 2025 / Published: 27 April 2025
(This article belongs to the Section Horticultural and Floricultural Crops)

Abstract

The quality of medicinal plants is closely related to the ecological factors of their growing environment, because their efficacy depends on the content of key medicinal components. This study measured the daily variations in major constituents, including lobetyolin, polysaccharides, and total flavonoids, in Codonopsis pilosula (Franch.) Nannf., known as Lu Tangshen in the Changzhi and Jincheng regions of Shanxi Province, China, throughout its growth cycle. Additionally, the study explored the effects of 11 ecological factors (both climatic and soil variables) on the primary medicinal components of C. pilosula. Block experiments and comparisons of model predictions with subsequent measurements confirmed the consistency of the block experimental data and the reliability of the model. Principal component analysis (PCA), stepwise multiple linear regression analysis, and nonlinear polynomial modeling were employed to investigate the relationships between ecological factors and quality-related constituents (polysaccharides, total flavonoids, and lobetyolin). The results showed that linear models effectively explained polysaccharide content (DT), with an adjusted R² exceeding 0.8, but their fit for total flavonoids and lobetyolin was suboptimal because the underlying relationships are nonlinear. The introduction of nonlinear polynomial models (second-, fourth-, and fifth-order) significantly improved the model fit, indicating the existence of complex nonlinear relationships between ecological factors and medicinal components. For polysaccharides, the fourth-order model demonstrated the best performance, while fifth-order models were required to adequately describe the relationships for total flavonoids and lobetyolin. 
Based on the best models, the optimal ranges for key ecological factors were identified. For polysaccharides, the optimal ranges were atmospheric pressure (AP) between 9.1 and 9.3 kPa, air relative humidity (ARH) between 30% and 60%, 40 cm soil mean annual temperature (40cmMAT) between 27.5 °C and 28.5 °C, soil pH between 9.68 and 9.72, and soil nitrogen (N) content between 7 and 9 mg/kg. For total flavonoids, narrow optimal ranges were observed for temperature, humidity, and pH (MAT between 10 °C and 15 °C, 40cmMAT between 27.5 °C and 28.5 °C, and pH between 9.68 and 9.72). Lobetyolin showed optimal conditions at AP of 9.1 to 9.3 kPa, 40cmMAT of 28.0 °C to 28.5 °C, ARH of 65% to 75%, pH near 9.70, and days after planting (DAP) between 10 and 50. The adoption of higher-order polynomial models clarified critical nonlinear inflection points and optimal ecological ranges, providing a refined reference for enhancing the content of medicinal components. These findings offer valuable insights for precision cultivation strategies aimed at improving the quality of C. pilosula.

1. Introduction

Codonopsis pilosula is a significant ingredient in traditional Chinese herbal medicine. Belonging to the Campanulaceae, it is a biennial herbaceous plant primarily distributed in northern China, including Shanxi, Hebei, and Shaanxi provinces. Within these regions, the C. pilosula produced in Shanxi Province is particularly renowned.
Existing studies have shown that the main medicinal components of C. pilosula include polysaccharides, total flavonoids, and lobetyolin. In 2007, Chen Kekek and Wang Zhezhi reviewed the chemical composition, content determination, and pharmacological effects of C. pilosula polysaccharides, highlighting their immune-enhancing and antioxidant activities [1]. In 2024, Yang Di et al. employed high-performance liquid chromatography to determine the content of four flavonoid components—rutin, quercetin, baicalin, and quercetin—in C. pilosula, providing accurate and reliable results [2]. In 2019, Gao Shuxian et al. isolated and identified 12 compounds from C. pilosula, including lobetyolin, laying the foundation for future research [3]. The rhizome is the main medicinal part of the plant [4], rich in various bioactive compounds and nutrients, including lobetyolin, polysaccharides, and flavonoids, all of which are vital to its medicinal properties. Polysaccharides, for instance, are known for their immune-enhancing and antioxidant properties, while flavonoids are recognized for their anti-inflammatory and antioxidant effects. Lobetyolin, a major saponin, has been shown to exhibit significant cardiovascular and anti-fatigue activities in traditional medicine [5,6,7]. To protect the wild resources of C. pilosula and meet market demands, both traditional and non-traditional cultivation areas are involved in its cultivation. This includes the development of artificial bionic cultivation systems and potted cultivation techniques [8]. The growth and development of C. pilosula require strict environmental conditions, making the creation of suitable artificial environments critical to the success of greenhouse cultivation [9]. This study integrates artificial bionic cultivation models, taking into account the growth habits of medicinal plants and their ecological requirements [10]. 
Using correlation analysis models, our study identifies key ecological factors significantly influencing the primary medicinal components of C. pilosula and determines their optimal impact ranges. These findings, combined with artificial bionic cultivation technology, enable precise control of ecological factors to regulate the quality level of C. pilosula during its growth process [11].
The relationship between ecological factors and the medicinal components of plants is a central issue in the cultivation of authentic medicinal materials [12]. Early studies revealed that a plant’s growth environment significantly affects the synthesis of its secondary metabolites, including medicinal components. In 2025, Zheng Xin et al. proposed that complex interactions exist between environmental factors and medicinal components [13]. Subsequently, Zhang Tian confirmed that factors such as temperature, humidity, light, and soil nutrients significantly influence the accumulation of active ingredients in Chinese medicinal herbs [14]. With the advancement of research, especially in the 21st century, domestic and international researchers began to focus on optimizing cultivation environments to enhance the content of medicinal components. For instance, Xu Zishu, through an analysis of various environmental factors and their relationship with flavonoid accumulation in Chinese medicinal plants, identified that temperature, light, and soil fertility play critical roles in the accumulation of flavonoids [15]. More recently, with the development of data science, Zhou Xinhua et al. introduced the use of multiple regression analysis combined with environmental data to predict the content of active ingredients in medicinal plants, achieving notable results [16]. Pant, P. et al. also studied the variation of multiple bioactive components in medicinal plants in response to changes in environmental factors [17]. They proposed an “environmental optimization model” to simulate relationships between plant growth and the accumulation of active ingredients under various environmental conditions, providing theoretical support for intensive cultivation. Building on the work of these scholars, our study samples field data to investigate the relationships between ecological factors and medicinal components in C. pilosula through multiple linear regression and nonlinear polynomial modeling. Similar to the regression analysis method employed by Zhou Xinhua et al., we incorporate specific range analysis of ecological factors. By adopting the K-means three-way clustering method, the ecological factor range selection is refined, providing more precise references for practical cultivation [18,19,20,21]. The innovation of this approach lies in its ability to more accurately identify the most critical environmental factors influencing medicinal components and further optimize cultivation conditions [22]. This study not only deepens the understanding of the complex interactions between ecological factors and medicinal components but also offers actionable insights for improving the cultivation of C. pilosula.

2. Materials and Methods

2.1. Overview of the Experimental Area, Experimental Materials, and Plant Sampling

The field experiment was conducted in Shijiapo Village, Lingchuan County, Jincheng City, Shanxi Province, China (113°19′ E, 35°40′ N) at the Codonopsis pilosula Ecological Cultivation Demonstration Base (Shanxi Agricultural University). The site lies within a major production area for C. pilosula, and its natural conditions are well suited to the growth of the species, with an average annual temperature of 6–8 °C, a frost-free period of approximately 120 days, and annual precipitation of about 500 mm. The experiment commenced in March 2020 with soil preparation; C. pilosula was transplanted the next day, with rows spaced 30 cm apart and plants within rows spaced 10 cm apart. At the time of transplanting, the basic physicochemical properties of the 0–40 cm cultivated soil layer in the experimental field were as follows: pH 9, available nitrogen 6 g·kg⁻¹, available phosphorus 37 g·kg⁻¹, and available potassium 44 g·kg⁻¹. The experiment spanned four years in total.
The two-year-old seedlings used in the study were supplied by the Yongjun Medicinal Plant Cultivation Cooperative. The seedlings were authenticated as C. pilosula by Associate Professor Song Yanbo (College of Life Sciences, Shanxi Agricultural University).

2.1.1. Experimental Area Partitioning and Randomization

Based on the soil types, irrigation conditions, and other relevant factors at the experimental site, the entire area was divided into four blocks (A, B, C, and D). Within each block, a randomized complete block design was further implemented by establishing one independent sampling plot (experimental unit), each covering an area of approximately 20 m². In each plot, one representative C. pilosula plant was randomly selected for sampling to ensure both the randomness and representativeness of the data.
At each sampling event—conducted at 10-day intervals—samples were collected from all four blocks. Each block possessed the same area and exhibited similar macro-environmental conditions (e.g., topography, light exposure), although slight differences in micro-environmental conditions (e.g., soil microstructure, localized humidity) were present. This experimental design was intended to validate the repeatability of the experiment and assess the robustness of the model with respect to fine-scale environmental variability. At each sampling point, one quadrat was selected per event. Each quadrat measured 1 m × 1 m, and the distance between quadrats sampled in successive events was at least 30 cm.

2.1.2. Data Collection and Standardization

Uniform standard operating procedures and instrumentation were employed across all blocks. Ecological factor data for 11 indices, as well as assay data for the primary medicinal constituents (lobetyolin, polysaccharides, and total flavonoids), were collected at fixed 10-day intervals throughout the growing season as described below. Raw data were pre-processed through completeness checks, outlier removal, and normalization to ensure comparability across the different blocks.

2.1.3. Statistical Analysis for Consistency Verification of Plot Data

To verify data consistency within the block design and in the context of correlating ecological factors with the major medicinal components, a one-way analysis of variance (ANOVA) was performed to compare the medicinal component data across the blocks. The results revealed no significant differences among the blocks (p = 0.78). Furthermore, intraclass correlation coefficient (ICC) values were as high as 0.98 for the blocks, thereby confirming a high degree of data consistency and providing a robust foundation for subsequent model development. Additionally, the block design can be integrated with covariance analysis by including the different blocks as fixed effects in the model, further mitigating the potential influence of local environmental variation on the results.
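As a rough illustration of the between-block comparison described above, the one-way ANOVA F statistic can be computed directly from its definition. This is a sketch in Python rather than the R environment used in the study; the function name `one_way_anova_f` is hypothetical, and the block values are made-up placeholders, not measurements from this experiment.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (lists of floats)."""
    k = len(groups)                          # number of blocks
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-block sum of squares: spread of block means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-block sum of squares: spread of observations around their block mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical constituent readings for blocks A-D (illustrative only).
blocks = [
    [10.1, 11.4, 12.0, 12.9],
    [10.3, 11.2, 12.1, 12.7],
    [10.0, 11.5, 11.9, 13.0],
    [10.2, 11.3, 12.2, 12.8],
]
f_stat = one_way_anova_f(blocks)
print(f"F = {f_stat:.4f}")
```

An F value well below 1 means the block means differ less than the within-block scatter would suggest, which corresponds to a large p-value. Obtaining the p-value itself requires the F distribution with (k − 1, N − k) degrees of freedom, which this sketch does not implement.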

2.2. Measurement of Ecological Factors

To accurately obtain ecological data from the experimental field, this study employed a compact weather station for data collection. Soil-related parameters were measured using soil sensors integrated into the weather station. A real-time excavation and measurement approach was adopted, with measurements taken every 10 days. Sensors were inserted into the soil at a depth of 40 cm, corresponding to the typical depth of C. pilosula roots, ensuring that data were collected under purely natural conditions, free from external interference. The soil data included pH, available nitrogen, available phosphorus, available potassium, soil temperature, and soil moisture. Climate-related data were collected using the weather station and included the following parameters: daily mean air temperature (MAT), daily mean relative humidity (ARH), daily mean atmospheric pressure (AP), daily net solar radiation intensity (GHI), and daily precipitation (DAP). Figure 1 displays an overview of the climatic and edaphic factors measured.

2.3. Determination of Major Medicinal Components

At each sampling time point, manual excavation was performed with care to avoid breakage of the primary roots. After harvesting, soil was gently removed by shaking; the roots were not washed with water, in order to improve drying efficiency and enhance the stability of key medicinal compounds. The samples were then pre-dried in a well-ventilated, shaded environment for 1–2 days, avoiding direct sunlight, which could degrade major bioactive constituents. Subsequently, samples were dried in a low-temperature drying oven [23] set at 40–50 °C for 24–48 h. After drying, the chemical composition of the roots was analyzed according to the standards of Shanxi Province, China [24,25,26]. High-performance liquid chromatography (HPLC) was performed on a Thermo Scientific UltiMate 3000 system (Thermo Fisher Scientific, Waltham, MA, USA).
Patterns of variation in the content of major medicinal components of C. pilosula during sampling in 2020, 2021 and 2022 are illustrated in Figure 2, Figure 3 and Figure 4.

2.3.1. Determination of Lobetyolin Content in C. pilosula

One gram of C. pilosula sample was ground into a fine powder, and 0.1 g of the sample was accurately weighed and extracted in 10 mL of acetonitrile through ultrasonic extraction for 20 min. The extract was filtered through a 0.22 μm microporous membrane to remove suspended solids and impurities. A lobetyolin standard solution (purity ≥ 98%, purchased from Chengdu Must Bio-Tech, China) with a concentration of 0.05 mg/mL was prepared and injected into the system to construct the standard curve. The chromatographic conditions included a C18 column, with mobile phase A being acetonitrile and mobile phase B being water. Gradient elution was set with an initial A:B ratio of 20:80, gradually changing to 40:60 over time, with a flow rate of 1.0 mL/min. A 10 μL volume of the extract was injected into the HPLC instrument. The absorbance of lobetyolin was monitored at 270 nm on a UV-Vis detector. The lobetyolin content in the samples was calculated based on the standard curve, with all tests performed in triplicate. Results were reported in milligrams and then converted to concentrations.

2.3.2. Determination of Polysaccharide Content in C. pilosula

One gram of C. pilosula sample was ground into a fine powder, and 0.1 g of the sample was accurately weighed and extracted in 10 mL of 70% ethanol through ultrasonic extraction for 20 min. The extract was filtered through a 0.22 μm microporous membrane to remove suspended solids and impurities. A glucose standard solution with a concentration of 0.25 mg/mL was prepared and injected into the system to construct the standard curve. The chromatographic conditions included a C18 column, with mobile phase A being acetonitrile and mobile phase B being water. Gradient elution was set with an initial A:B ratio of 10:90, gradually changing to 30:70 over time, with a flow rate of 1.0 mL/min. An aliquot of 10 μL of the extract was injected into the HPLC system for analysis. The absorbance of polysaccharides was monitored at 490 nm on a UV-Vis detector. The polysaccharide content in the samples was calculated based on the standard curve, with all tests performed in triplicate. Results were reported in milligrams and then converted to concentrations.

2.3.3. Determination of Total Flavonoid Content in C. pilosula

One gram of C. pilosula sample was ground into a fine powder, and 0.1 g of the powdered sample was accurately weighed and extracted in 10 mL of acetonitrile through ultrasonic extraction for 30 min. The extract was filtered through a 0.22 μm microporous membrane to remove suspended solids and impurities. A high-purity quercetin standard (≥98%) was purchased from Bio-Tech, and 10 mg of the standard was accurately weighed and dissolved in acetonitrile to prepare a stock solution at a concentration of 0.1 mg/mL. This solution was then serially diluted to obtain a series of working standard solutions for calibration curve construction. HPLC analysis was performed using a Thermo Scientific UltiMate 3000 system (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a UV-Vis detector and a C18 analytical column. Mobile phase A consisted of acetonitrile, and mobile phase B was water containing 0.1% formic acid. Gradient elution was carried out with an initial A:B ratio of 15:85, gradually changing to 40:60 over time, at a flow rate of 1.0 mL/min. The detection wavelength was set to 360 nm. The total flavonoid content in the samples was calculated based on the standard calibration curve and expressed as milligrams of quercetin equivalents per gram of dry weight (mg QE/g DW). All tests were conducted in triplicate.
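The conversion from a standard calibration curve to a content per gram of dry weight can be sketched as follows. This Python example is only an illustration of the arithmetic: the working-standard concentrations and peak areas are hypothetical, the helper names `fit_line` and `content_mg_per_g` are invented for this sketch, and only the 10 mL extract volume and 0.1 g sample mass are taken from the procedure above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def content_mg_per_g(peak_area, slope, intercept, extract_ml=10.0, sample_g=0.1):
    """Convert a peak area to mg of analyte per g of dry sample."""
    conc_mg_per_ml = (peak_area - intercept) / slope   # invert the calibration line
    return conc_mg_per_ml * extract_ml / sample_g      # scale by extract volume and mass


# Hypothetical working standards (mg/mL) and peak areas (arbitrary units).
std_conc = [0.0125, 0.025, 0.05, 0.1]
std_area = [25.0, 50.0, 100.0, 200.0]
slope, intercept = fit_line(std_conc, std_area)

# Triplicate injections of one hypothetical sample extract.
replicate_areas = [40.0, 42.0, 38.0]
contents = [content_mg_per_g(a, slope, intercept) for a in replicate_areas]
mean_content = sum(contents) / len(contents)
print(f"slope = {slope:.1f}, mean content = {mean_content:.3f} mg QE/g DW")
```

The same two steps (fit the line on the standards, then invert it for each replicate and average) would apply to the lobetyolin and polysaccharide assays, with their own standards.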

2.4. Statistical Analysis Methods for Correlating Ecological Factors with Key Medicinal Compounds

Three statistical methods were employed to assess the influence of ecological factors on C. pilosula polysaccharides, lobetyolin, and total flavonoids. (The models described below are designed for a refined correlation analysis between ecological factors and key medicinal compounds, distinct from the Spearman rank correlation coefficients obtained from plot data consistency validation, which served solely that purpose.)

2.4.1. Principal Component Analysis

Considering the authenticity of C. pilosula as a genuine medicinal material, all measured ecological factors were assumed to influence its primary medicinal components. Principal component analysis (PCA) was conducted using RStudio 2023.03.1+446 (R language) to reduce the dimensionality of the 11 ecological factors. The analysis identified the dominant ecological factors with the greatest influence as references for subsequent correlation analysis and model evaluation. Dimensionality reduction laid the foundation for further correlation analyses and reduced analytical complexity. The calculation process was as follows:
The data were represented as an n × p matrix $X$. Before performing PCA, each variable was mean-centered. The centered data were expressed as Equation (1):
$$\tilde{X} = X - \bar{X} \tag{1}$$
where $\bar{X}$ is the mean vector composed of column means. Next, the covariance matrix or correlation matrix was calculated as shown in Equation (2):
$$S = \frac{1}{n-1}\tilde{X}^{T}\tilde{X} \tag{2}$$
where $S$ is a p × p symmetric positive semi-definite matrix, and $S_{ij}$ represents the covariance between the i-th and j-th variables. Subsequently, eigenvalue decomposition was performed, as described in Equation (3):
$$S v_k = \lambda_k v_k \tag{3}$$
where $\lambda_k$ is the k-th eigenvalue, and $v_k$ is the corresponding eigenvector (normalized to unit length). The principal components were then determined using Equation (4):
$$PC_k = \tilde{X} v_k \tag{4}$$
Finally, the reduced data were expressed as shown in Equation (5):
$$Z = \tilde{X} V_m \tag{5}$$
where $V_m$ is the p × m matrix whose columns are the first m eigenvectors, and $Z$ is an n × m matrix, representing the reduction of the original p variables to m principal components.
The PCA results indicated that PC1 had the largest standard deviation (1.8951), and it explained the largest proportion of experimental variation, at 32.65% of the total variance, with subsequent components showing decreasing contributions. Regarding cumulative variance, common criteria in PCA include retaining components that account for 95% or 99% of the total variance. To avoid overfitting, we retained the first six principal components, which accounted for 95% of the total variance. Although PCA identified the key ecological factors contributing most significantly among the 11 variables, multiple methods were required to comprehensively verify and confirm the results. As relationships between ecological factors and primary medicinal components might be linear or nonlinear, various models were tested and iteratively refined to ensure accurate determination.
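The calculation in Equations (1)–(5), together with the 95% cumulative-variance rule, can be sketched in a few lines. This is a NumPy illustration, not the R code used in the study; the function name `pca`, the `scale` option, and the synthetic input matrix are assumptions made for the example.

```python
import numpy as np


def pca(X, var_threshold=0.95, scale=False):
    """PCA via eigendecomposition; returns (scores Z, eigenvalues, components kept m)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    X_tilde = X - X.mean(axis=0)                     # Equation (1): mean-centering
    if scale:                                        # optional: unit variance, so S is a correlation matrix
        X_tilde = X_tilde / X_tilde.std(axis=0, ddof=1)
    S = X_tilde.T @ X_tilde / (n - 1)                # Equation (2)
    eigvals, eigvecs = np.linalg.eigh(S)             # Equation (3); eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # reorder to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative proportion of variance
    m = min(int(np.searchsorted(cum_ratio, var_threshold)) + 1, p)
    Z = X_tilde @ eigvecs[:, :m]                     # Equations (4) and (5)
    return Z, eigvals, m


# Synthetic stand-in for an n x 11 matrix of ecological factors (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 11))
Z, eigvals, m = pca(X, scale=True)
print(m, Z.shape)
```

As a consistency check on the reported figures, the variance of PC1 is the square of its standard deviation, 1.8951² ≈ 3.59, and 3.59/11 ≈ 0.3265. This matches the stated 32.65% only if the total variance equals 11, which suggests (as an inference, not something the text states) that the 11 variables were scaled to unit variance before the decomposition.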

2.4.2. Multiple Linear Regression

Using the dimensionally reduced data as a reference, multiple linear regression models were established in RStudio (R language) to analyze and refine the correlation between ecological factors and the primary medicinal components of C. pilosula. Evaluation metrics were derived through the following steps: First, parameter estimation was conducted using the least squares method, with the residual sum of squares (RSS) calculated as shown in Equation (6):
$$RSS = (Y - X\beta)^{T}(Y - X\beta) \tag{6}$$
By differentiating $RSS$ with respect to $\beta$ and setting the derivative to zero, a closed-form solution was obtained, as shown in Equation (7):
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{7}$$
Next, the predicted values and residuals were calculated. Given $\hat{\beta}$, the predicted values $\hat{Y}$ can be expressed as Equation (8):
$$\hat{Y} = X\hat{\beta} = X(X^{T}X)^{-1}X^{T}Y \tag{8}$$
The residual vector (error estimate) is given by Equation (9):
$$\varepsilon = Y - \hat{Y} \tag{9}$$
Finally, the coefficient of determination ($R^2$) was calculated to evaluate the model, as shown in Equation (10):
$$R^{2} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^{2}}{\sum_{i}(y_i - \bar{y})^{2}} \tag{10}$$
When analyzing the polysaccharide content (DT) using multiple linear regression, the model’s coefficient of determination (R²) was approximately 0.8343, with an adjusted R² of 0.8159 and a residual standard error (RSE) of 0.57. These results indicate that under the assumption of linearity, the selected ecological factors (e.g., precipitation, air humidity, soil nutrients) explained a high proportion of the variance in DT content. Among these, precipitation (AP) had a regression coefficient of approximately −0.05765 (p < 0.001), air relative humidity (ARH) approximately −0.01239 (p < 0.01), and available nitrogen (N) approximately −0.2569 (p < 0.05). These factors showed significant or near-significant linear correlations and directional influences on DT content.
For total flavonoids (HT), linear regression analysis showed an R² of only 0.1085, an adjusted R² of 0.009, and an RSE of 0.1272. This indicates a low explanatory power of the selected ecological factors for HT content under the assumption of linearity. Most ecological factors did not exhibit significant linear correlations in the model. Although precipitation (AP) was close to statistical significance (coefficient approximately +0.005383, p ≈ 0.053), the overall linear fit was poor. This suggests that the influence of ecological factors on HT content cannot be explained with simple linear relationships.
For lobetyolin (QG), the model’s R² was approximately 0.5013, with an adjusted R² of 0.4459 and an RSE of 0.1989. These results indicate a moderate-to-low explanatory power of the selected ecological factors for QG content under linear assumptions. Although the fit was slightly better than for HT, it was still insufficient to fully explain variation in QG content through simple linear relationships. Regression coefficient analysis showed that precipitation (AP) had a coefficient of approximately +0.01493 (p < 0.001), 40 cm soil temperature (40cmMAT) approximately +0.03779 (p < 0.0001), and available phosphorus (DAP) approximately −0.002028 (p < 0.05). These factors influenced QG content to some extent, but the strength of the linear correlations remained limited.
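The metrics quoted in this subsection (R², adjusted R², and RSE) all follow from the least-squares fit in Equations (6)–(10). The sketch below shows one way to compute them with NumPy; it is not the authors' R code, the function name `ols_fit` is hypothetical, and the data are synthetic. The adjusted R² and RSE formulas used here are the standard ones for a model with p predictors plus an intercept, which the text does not spell out.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit with an intercept; returns (beta, R^2, adjusted R^2, RSE)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # prepend the intercept column
    # lstsq minimizes the same RSS as Equation (7) without forming an explicit inverse
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta                      # Equation (9)
    rss = float(residuals @ residuals)                 # Equation (6)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss                               # Equation (10)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalizes the number of predictors
    rse = (rss / (n - p - 1)) ** 0.5                   # residual standard error
    return beta, r2, adj_r2, rse


# Synthetic data: three made-up predictors and a known linear response plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=80)
beta, r2, adj_r2, rse = ols_fit(X, y)
print(np.round(beta, 3), round(r2, 4), round(adj_r2, 4), round(rse, 4))
```

Because adjusted R² subtracts a penalty that grows with the number of predictors, a model such as the HT fit above can show a small positive R² (0.1085) while its adjusted R² falls close to zero (0.009).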

2.4.3. Nonlinear Polynomial Regression

Nonlinear polynomial regression models were established using RStudio (R language) to analyze and describe relationships between ecological factors and the primary medicinal components of C. pilosula. The models incorporated polynomial terms, including second-, fourth-, and fifth-order terms of each variable, as well as interaction terms. The general model equation is expressed as Equation (11):
$$y = \beta_0 + \sum_{i=1}^{p}\beta_i x_i + \sum_{i=1}^{p}\sum_{j=i}^{p}\beta_{ij}x_i x_j + \varepsilon \tag{11}$$
Parameter estimation was performed using the least squares method on the expanded design matrix $\Phi(X)$ of polynomial and interaction terms, as shown in Equation (12):
$$\hat{\beta} = \left(\Phi(X)^{T}\Phi(X)\right)^{-1}\Phi(X)^{T}Y \tag{12}$$
The model’s predictions and evaluations were subsequently conducted, with the predicted values calculated as Equation (13):
$$\hat{Y} = \Phi(X)\hat{\beta} \tag{13}$$
Residuals, R², and other evaluation metrics (adjusted coefficient of determination (adjusted R²), residual standard error (RSE), estimated regression coefficients (β), and statistical significance levels (p-values)) were calculated to assess model performance. Comparative analyses of these metrics were performed to determine the suitability of linear and nonlinear models in explaining the relationships between ecological factors and the secondary metabolites of C. pilosula. The ecological factors with the greatest impact on the major medicinal components were identified based on the evaluation metrics of the most suitable models.
When analyzing polysaccharides (DT) using a second-order nonlinear polynomial model, the model’s coefficient of determination (multiple R²) was approximately 0.8718, with an adjusted R² of 0.8691 and a residual standard error (RSE) of approximately 0.4631. In this model, after introducing squared terms, the quadratic term of MAT, with a coefficient of approximately +0.0009043 (p < 0.01), and the quadratic term of AP, with a coefficient of approximately −0.001372 (p < 0.001), showed significant effects on DT content. For total flavonoids (HT), the coefficient of determination (R²) of the second-order model was approximately 0.4854, with an adjusted R² of 0.4747 and an RSE of approximately 0.09218. Although the explanatory power improved over that of the linear model, most quadratic terms of the variables did not exhibit strong influences at significant levels, except for the quadratic terms of MAT and 40cmMAT. For lobetyolin (QG), the coefficient of determination (R²) of the second-order model was approximately 0.4799, with an adjusted R² of 0.469 and an RSE of approximately 0.1964. Compared to the linear model, the adjusted R² improved slightly (0.469 versus 0.4459). Significant effects of quadratic terms of AP, GHI, and PH were observed, with that of PH having an estimated value of approximately +1.268 (p < 0.001), indicating a strong nonlinear sensitivity of QG to soil pH. Based on the results of the second-order nonlinear polynomial model, we decided to further optimize the models to explore more suitable approaches to describe relationships between ecological factors and major medicinal components. Thus, we proceeded to test fourth-order nonlinear polynomial models.
After elevating the models to a fourth-order polynomial, the coefficient of determination (R²) for polysaccharides (DT) was approximately 0.9441, with an adjusted R² of 0.9068 and a residual standard error (RSE) of approximately 0.4056, showing significant improvement over the second-order model. In this model, the high-order polynomial terms revealed more pronounced effects of precipitation, air humidity, and soil nutrients (e.g., N and PH), indicating that the nonlinear influence of these factors on DT content was more fully captured under the high-order terms. For total flavonoids (HT), the fourth-order model yielded an R² of approximately 0.7018, with an adjusted R² of 0.5031 and an RSE of approximately 0.09011. This suggested that the introduction of high-order terms began to capture the complex relationships between HT and factors such as GHI, ARH, and 40cmMAT. However, the overall explanatory power still had room for improvement. For lobetyolin (QG), the fourth-order model produced an R² of approximately 0.778, with an adjusted R² of 0.63 and an RSE of approximately 0.1625, representing significant improvement over the second-order model. In this model, high-order terms of variables such as AP, 40cmMAT, 40cmARH, DAP, and PH had more pronounced effects on QG, demonstrating clearer nonlinear responses to ecological factors in the fourth-order polynomial models.
Given the significant improvements observed with the fourth-order nonlinear polynomial models over those of the second-order models, it was clear that relationships between ecological factors and the major medicinal components were predominantly nonlinear. Therefore, models were further refined by testing fifth-order nonlinear polynomials.
Analysis of our fifth-order polynomial models showed that the coefficient of determination (R²) for total flavonoids (HT) was approximately 0.7889, with an adjusted R² of 0.5777 and a residual standard error (RSE) of approximately 0.08306, representing an improvement over the fourth-order model. In the fifth-order model, high-order terms such as MAT, 40cmARH, and PH reached significant levels. For instance, the third-order term of PH had an estimated coefficient of approximately +0.8921 (p < 0.001), exerting a strong positive impact on HT. This indicated that the complex nonlinear relationships between HT and temperature, humidity, and soil pH were more fully captured under the fifth-order model. For lobetyolin (QG), the fifth-order model yielded an R² of approximately 0.8339, with an adjusted R² of 0.6678 and an RSE of approximately 0.154, showing further optimization compared to the fourth-order model. In this model, high-order terms such as 40cmARH and PH were significant. For example, the third-order term of PH (poly(PH,5)3) had an estimated coefficient of approximately +0.8921 (p < 0.001), indicating a strong positive impact on QG. This suggested that the nonlinear responses of QG to deep soil moisture and soil pH were more precisely described under the high-order model. For polysaccharides (DT), the fourth-order model was already relatively well-fitted, and any improvement under the fifth-order model was limited. Thus, the primary enhancements observed with the fifth-order models were more evident for HT and QG.
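The order-by-order comparison above can be illustrated with a single predictor. The sketch below fits polynomials of order 1, 2, 4, and 5 by least squares and reports R² and adjusted R² for each. It is a NumPy illustration under stated assumptions, not the study's R models: the data are synthetic, the function name `poly_fit_metrics` is hypothetical, interaction terms are omitted, and a Legendre basis on a rescaled predictor stands in for the orthogonal polynomials that R's `poly()` produces (the term label `poly(PH,5)3` suggests `poly()` was used, though the text does not say so directly).

```python
import numpy as np
from numpy.polynomial import legendre


def poly_fit_metrics(x, y, degree):
    """Fit a single-variable polynomial of the given degree; return (R^2, adjusted R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    # Rescale x to [-1, 1] so the Legendre basis stays well conditioned
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    Phi = legendre.legvander(x_scaled, degree)         # design matrix, includes constant column
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # Equation (12)
    y_hat = Phi @ beta                                 # Equation (13)
    rss = float(((y - y_hat) ** 2).sum())
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - degree - 1)
    return r2, adj_r2


# Synthetic example: a made-up predictor range and a deliberately nonlinear response.
rng = np.random.default_rng(1)
x = np.linspace(9.60, 9.80, 40)
y = np.sin(2.0 * np.pi * (x - 9.60) / 0.20) + rng.normal(scale=0.1, size=x.size)

results = {d: poly_fit_metrics(x, y, d) for d in (1, 2, 4, 5)}
for d, (r2, adj_r2) in results.items():
    print(f"order {d}: R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

For nested polynomial models, R² cannot decrease as the order rises, so adjusted R² is the more informative figure when comparing orders. A wide gap between the two, as in the fifth-order HT and QG results, indicates that the number of fitted terms is large relative to the number of observations.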

2.4.4. Optimal Selection via K-Means Three-Branch Clustering

After model selection, ecological factors were processed using an improved K-means three-branch clustering (IKMTBC) [27] algorithm to classify the most influential variables into two categories: those promoting and those inhibiting the accumulation of key medicinal components. This method clusters ecological factors according to their enhancing or suppressive effects on the phenotypic development of C. pilosula, thereby identifying optimal ecological ranges for different stages within the same growth cycle. The clustering process is expressed in Equation (14):
$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{n} \left(x_{ik} - c_{jk}\right)^2} \tag{14}$$
where $x_i$ represents the feature vector of the i-th data point, $c_j$ represents the center (centroid) of the j-th cluster, $n$ is the number of features, and $d(x_i, c_j)$ is the Euclidean distance between data point $x_i$ and cluster center $c_j$.
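The distance in Equation (14) and the resulting assignment of a data point to its nearest centroid can be sketched as follows. This covers only the distance and assignment step shared by K-means-type methods, not the improved three-branch algorithm itself; the function names and sample values are hypothetical.

```python
import math


def euclidean_distance(x, c):
    """Euclidean distance between feature vector x and centroid c (Equation (14))."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same number of features")
    return math.sqrt(sum((xk - ck) ** 2 for xk, ck in zip(x, c)))


def nearest_centroid(x, centroids):
    """Return the index of the centroid closest to x."""
    distances = [euclidean_distance(x, c) for c in centroids]
    return distances.index(min(distances))


# Hypothetical two-feature example (illustrative values only).
centroids = [[0.0, 0.0], [3.0, 4.0]]
point = [2.5, 3.5]
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(nearest_centroid(point, centroids))          # 1
```

Because the ecological factors are measured in different units, the features would normally be standardized before distances are computed; that preprocessing is assumed here rather than shown.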
In the application of this model to artificial cultivation and production, incoming ecological factors were continuously analyzed in real time. The clustering results for the corresponding time period were used to define the range of ecological factors. When the ecological factors during a given time period deviated from the range that promotes the accumulation of major medicinal components, it was determined that the components were negatively affected by environmental factors. At this stage, manual cultivation practices can be applied to intervene and regulate ecological parameters, adjusting them toward the optimal ranges that promote the accumulation of key medicinal components.

2.5. Model Validation for the Correlation Analysis Between Ecological Factors and Key Medicinal Compounds

Given that C. pilosula is a biennial plant, the data from 2020–2021 were treated as representing the first growth cycle, while the 2022–2023 data constituted the second growth cycle. The 2020–2021 dataset was used to construct and train an optimized nonlinear polynomial regression model aimed at predicting the effects of ecological factors on the key medicinal components. Subsequently, the model (developed using the 2020–2021 data) was applied to predict the 2023 data. The predicted values were then quantitatively compared with actual 2023 measurements based on mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
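The three error metrics named above can be computed as in the sketch below. The function names and sample values are hypothetical, and MAPE is expressed as a percentage under the assumption that no measured value is zero.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and predicted contents (illustrative values only).
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
print(round(mse(actual, predicted), 4))   # 1.6667
print(round(mae(actual, predicted), 4))   # 1.0
print(round(mape(actual, predicted), 4))  # 6.6667
```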
For prediction, the XGBoost algorithm (version 3.0.0) was employed. XGBoost possesses strong nonlinear modeling capabilities, allowing it to automatically capture complex nonlinear relationships and interaction effects among variables. This makes it particularly suitable for modeling relationships between ecological factors (e.g., temperature, humidity, soil pH, nutrients) and medicinal components. Moreover, it includes built-in regularization parameters to prevent overfitting and supports parameter tuning through cross-validation, which is crucial when dealing with limited data. Its ensemble learning approach generally yields superior robustness and predictive accuracy compared to traditional polynomial regression, especially in multidimensional and highly heterogeneous datasets [28,29,30].
The prediction procedure is outlined as follows:
Objective Function Optimization:
In each iteration, the objective function—incorporating both the prediction loss and the model complexity—is minimized to determine an optimal tree that improves overall prediction, as expressed in Equation (15):
$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{15}$$
Regularization of Tree Complexity:
To prevent overfitting, the complexity of each tree is penalized. This penalty takes into account both the number of leaf nodes $T$ and the L2 regularization of each leaf node’s prediction weight $w_j$, as shown in Equation (16):
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{16}$$
Loss Function Approximation:
A second-order Taylor expansion is applied to approximate the loss function, thereby accelerating the model optimization process. This approximation is given in Equation (17):
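$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{17}$$
where $g_i$ and $h_i$ are the first and second derivatives of the loss $L$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$.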
Optimal Leaf Weight Calculation:
The optimal prediction weight for the samples within leaf j is computed to minimize the objective function, as described by Equation (18):
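$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \tag{18}$$
where $I_j$ is the set of samples assigned to leaf $j$.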
Evaluation of Split Gain:
The contribution of a potential split point to the decrease in the objective function is evaluated to guide the splitting decision. A split is executed if the gain is positive, as demonstrated in Equation (19):
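$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{19}$$
where $G_L$ and $H_L$ ($G_R$ and $H_R$) are the sums of $g_i$ and $h_i$ over the samples falling in the left (right) child node.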
Following these steps, the prediction of the major medicinal components for 2023 was completed. The comparison between the predicted and actual values revealed that the prediction error for all indicators was below 10%, demonstrating both high accuracy and robustness of the model. These results confirm the applicability of the predictive model in this experimental phase and substantiate the viability of using nonlinear correlation models to explore the relationships between ecological factors and key medicinal compounds.
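The optimal leaf weight and split gain used in the procedure above can be checked numerically with a small standalone sketch. It does not call the XGBoost library. It assumes a squared-error loss L = ½(y − ŷ)², for which the gradient is ŷ − y and the Hessian is 1, and all sample values, function names, and the values of λ and γ are hypothetical.

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf weight: -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction in the objective obtained by splitting a node into two children."""
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Hypothetical targets and current predictions (illustrative values only).
y = [1.0, 1.2, 3.0, 3.4]
y_hat = [2.0, 2.0, 2.0, 2.0]
g = [p - a for a, p in zip(y, y_hat)]  # gradients for the squared-error loss
h = [1.0] * len(y)                     # Hessians for the squared-error loss
lam, gamma = 1.0, 0.1

# Candidate split: first two samples go left, last two go right.
G_L, H_L = sum(g[:2]), sum(h[:2])
G_R, H_R = sum(g[2:]), sum(h[2:])

gain = split_gain(G_L, H_L, G_R, H_R, lam, gamma)
print(round(leaf_weight(G_L, H_L, lam), 3))  # -0.6
print(round(leaf_weight(G_R, H_R, lam), 3))  # 0.8
print(round(gain, 3))                        # 1.364
```

The gain is positive here, so under the rule stated above this split would be executed. The left leaf lowers the prediction and the right leaf raises it, moving both groups toward their targets.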

3. Results and Analysis

In this study, we identified significant correlations between several ecological factors and the key medicinal components of C. pilosula. Key findings include that temperature, light intensity, and soil fertility have a substantial impact on the accumulation of lobetyolin and polysaccharides. Additionally, we found that these factors influence the quality and quantity of medicinal components in a nonlinear manner, with optimal ranges identified for each factor that could improve cultivation outcomes. These results provide essential insights for predicting the accumulation of medicinal components in response to ecological changes, especially in the context of climate variations. Please refer to the following text for specific results and analyses.

3.1. Optimal Model Selection, Correlation Analysis, and Ecological Factor Prioritization

3.1.1. Model Comparison

We employed a stepwise modeling, comparison, and analysis approach for the three major medicinal components of C. pilosula using multiple linear regression and multiple nonlinear polynomial regressions as described in Section 2.4.2 and Section 2.4.3. Here, we report results for the fourth- and fifth-order regressions. The residual standard error (RSE) in the model analysis results measures the error size, with smaller values indicating better model performance. The Multiple R-squared and Adjusted R-squared values assessed the model’s explanatory power for the dependent variables (lobetyolin, polysaccharides, and flavonoids), with values closer to 1 indicating a better explanation of variation in the dependent variables.
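For reference, the three fit statistics can be computed from observed and fitted values as sketched below. The sketch assumes the usual least-squares definitions with an intercept (the convention followed, for example, by R's summary of a linear model), where p is the number of predictor terms; the function name and sample values are hypothetical.

```python
import math


def fit_metrics(y, y_hat, n_predictors):
    """Return (R2, adjusted R2, residual standard error) for a fitted model."""
    n = len(y)
    dof = n - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    mean_y = sum(y) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    rse = math.sqrt(ss_res / dof)
    return r2, adj_r2, rse


# Hypothetical observed and fitted values (illustrative only).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

r2, adj_one, rse_one = fit_metrics(y, y_hat, n_predictors=1)
_, adj_three, _ = fit_metrics(y, y_hat, n_predictors=3)
print(round(r2, 4), round(adj_one, 4), round(rse_one, 4))  # 0.9931 0.9914 0.1732
print(round(adj_three, 4))                                 # 0.9829
```

With identical residuals, charging the model for three predictor terms instead of one lowers the adjusted R2, which is the penalty that makes it the more informative criterion when polynomial order increases.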
First, modeling, comparison, and analysis were conducted for polysaccharides. The data for the models are presented in Table 1 and Table 2.
The R2 values increased progressively from the linear model to the second-, fourth-, and fifth-order polynomial models. The linear regression model yielded an R2 of approximately 0.83, the second-order model increased it to around 0.87, the fourth-order model further improved it to 0.94, and the fifth-order model achieved approximately 0.95. However, the Adjusted R2 is more critical as it accounts for the impact of the number of independent variables on model complexity. From the perspective of Adjusted R2, the fourth-order model reached the highest value (0.9068). Although the fifth-order model achieved a higher overall R2 (0.9533), its Adjusted R2 slightly declined (0.9066). This suggests that the addition of more high-order terms in the fifth-order model provided limited improvement in explanatory power and may have introduced slight overfitting. In terms of the Residual Standard Error (RSE), the fourth-order model (0.4056) and the fifth-order model (0.406) were nearly identical, with the fourth-order model being marginally better or showing negligible difference. As the model order increases, the interpretability of the model decreases due to the growing complexity of explanatory parameters. When selecting a model, it is essential to balance achieving a high explanatory power (high Adjusted R2) while avoiding overfitting. Given the highest Adjusted R2 and lowest error observed with the fourth-order polynomial model, it is a more reasonable choice as the final model for analyzing the correlation between ecological factors and the medicinal components of C. pilosula.
Next, modeling, comparison, and analysis were conducted for flavonoids. The data for the models are presented in Table 3 and Table 4.
The multiple linear regression model exhibited very low explanatory power for flavonoids (Adjusted R2 ≈ 0.0095). The second-order model showed a significant improvement (Adjusted R2 ≈ 0.4747), indicating that the inclusion of second-order terms substantially enhanced the fit. From the second-order model (Adjusted R2 = 0.4747) to the fourth-order model (Adjusted R2 = 0.5031), the Adjusted R2 improved, though the increment was not substantial. Simultaneously, the RSE decreased slightly (0.09218 → 0.09011). The fifth-order polynomial model further improved the Adjusted R2 from 0.5031 to 0.5777, a more pronounced increase than that from the second- to the fourth-order model. Additionally, the RSE decreased from 0.09011 to 0.08306. This indicates that the fifth-order polynomial model provided a notable improvement in fit over the fourth-order model. Among all the models tested for flavonoids, the multiple fifth-order polynomial model achieved the highest Adjusted R2 (0.5777) and the lowest RSE (0.08306), outperforming the others on both metrics. Although the complexity of the model increased, it resulted in higher explanatory power and a better fit. Therefore, the multiple nonlinear fifth-order polynomial model is the most suitable for analyzing relationships between ecological factors and flavonoids in C. pilosula.
Subsequently, modeling, comparison, and analysis were conducted for lobetyolin. The data for the models are presented in Table 5 and Table 6.
Compared to the linear model, the second-order model showed a slight decrease in Multiple R2 but a slight improvement in Adjusted R2, along with a small reduction in RSE. Overall, the second-order model demonstrated a marginal improvement in fit compared to the linear model. When compared to the second-order model, the fourth-order model significantly increased the Adjusted R2 from 0.469 to 0.63 and markedly reduced the RSE from 0.1964 to 0.1625. This indicates that the inclusion of high-order terms substantially improved the model’s fit. Compared to the fourth-order model, the fifth-order model further enhanced the Adjusted R2 (from 0.63 to 0.6678) and reduced the RSE (from 0.1625 to 0.154). This demonstrates that the fifth-order model provided additional optimization for lobetyolin and better captured the complex relationships between lobetyolin and ecological factors, without exhibiting any clear signs of overfitting.
In conclusion, the multiple nonlinear fifth-order polynomial regression model performed best among the four models, making it the most suitable for modeling relationships between ecological factors and lobetyolin in C. pilosula.
In summary, based on the comparison of model parameters, a multiple nonlinear fourth-order polynomial model was determined to be the most effective for polysaccharides, while multiple nonlinear fifth-order polynomial models were most effective for flavonoids and lobetyolin.
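The selection rule applied above (choose the polynomial order with the highest adjusted R2) can be reproduced with the adjusted R2 values quoted in this section. The sketch below includes only values stated in the text, so some orders are missing for some components; the dictionary layout and function name are illustrative.

```python
# Adjusted R2 values as quoted in the text of Section 3.1.1.
reported_adj_r2 = {
    "polysaccharides": {"fourth": 0.9068, "fifth": 0.9066},
    "total flavonoids": {
        "linear": 0.0095,
        "second": 0.4747,
        "fourth": 0.5031,
        "fifth": 0.5777,
    },
    "lobetyolin": {"second": 0.469, "fourth": 0.63, "fifth": 0.6678},
}


def best_order(scores):
    """Return the model order with the highest adjusted R2."""
    return max(scores, key=scores.get)


selected = {name: best_order(scores) for name, scores in reported_adj_r2.items()}
for name, order in selected.items():
    print(f"{name}: {order}-order model")
```

This yields the fourth-order model for polysaccharides and the fifth-order model for total flavonoids and lobetyolin, in agreement with the summary above. For polysaccharides the margin is only 0.0002, so the preference for the simpler model rests as much on parsimony as on the metric itself.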

3.1.2. Correlation Analysis Results

1. Results of Principal Component Analysis
Based on a comprehensive consideration of standard deviation, variance explained ratio, and cumulative variance explained ratio, PC1-PC4, corresponding to MAT, AP, GHI, ARH, and 40cmMAT, were identified as the primary ecological factors after dimensionality reduction. The results of the principal component analysis are presented in Figure 5 and Table 7.
2. Multiple Nonlinear Fourth-Order Polynomial
As shown in Figure 6b, under the fourth-order polynomial model, the R2 for polysaccharides was approximately 0.9441, the adjusted R2 was approximately 0.9068, and the residual standard error (RSE) was approximately 0.4056, indicating a very high level of fit. This suggests that higher-order nonlinear terms can more comprehensively describe the complex response patterns of polysaccharides to ecological factors. In the fourth-order model, the higher-order terms for precipitation (AP), air relative humidity (ARH), soil nutrients (e.g., N, P, K), and pH all achieved statistical significance. Specifically, while linear analysis previously indicated a negative linear relationship between AP and DT, the inclusion of fourth-order terms revealed that this influence may weaken or even reverse in certain ranges of precipitation, suggesting an optimal precipitation interval that is most beneficial for polysaccharide accumulation. Similarly, ARH and N, which showed negative effects in the linear model, may exhibit positive impacts at moderate levels in the fourth-order model, with excessively high or low values being less favorable. The presence of fourth-order terms allows the model to capture more turning points, helping to identify the sensitivity thresholds and optimal growth conditions for polysaccharides in response to ecological factors. These findings highlight the utility of higher-order polynomial models in revealing the intricate, nonlinear interactions between polysaccharides and their environmental determinants in C. pilosula.
3. Multiple Nonlinear Fifth-Order Polynomial
Under the fourth-order polynomial model, the explanatory power for polysaccharides had already reached a high level, suggesting that the nonlinear response of polysaccharides to ecological factors can be effectively captured with lower-order terms (fourth-order). The influence of factors such as precipitation, humidity, and soil nitrogen on polysaccharides exhibits turning-point effects, indicating the presence of optimal ecological intervals.
Under the fifth-order polynomial model (Figure 7c), the R2 for total flavonoids (HT) was approximately 0.7889, the adjusted R2 was approximately 0.5777, and the residual standard error (RSE) was approximately 0.08306. Compared to the fourth-order model, the explanatory power improved further, indicating that the ecological response of total flavonoids is highly complex and requires higher-order polynomials for accurate modeling. In the fifth-order model, higher-order terms of pH (e.g., third- and fifth-order terms) exhibited significant and strongly positive or negative effects, suggesting the existence of a specific pH range most conducive to flavonoid accumulation. Additionally, the higher-order terms of MAT (air temperature) and 40cmARH (40 cm soil relative humidity) revealed complex turning-point effects. Total flavonoid content reached its highest levels within optimal temperature and humidity intervals, while deviations from these optimal conditions resulted in changes in both the direction and intensity of the impact. These findings underscore the need for fifth-order polynomial models to capture the intricate nonlinear relationships between total flavonoids and ecological factors in C. pilosula, providing deeper insights into the optimal environmental conditions for their accumulation.
Under the fifth-order polynomial model (Figure 7a), the R2 for lobetyolin (QG) was approximately 0.8339, the adjusted R2 was approximately 0.6678, and the residual standard error (RSE) was approximately 0.154, showing further improvement over the fourth-order model. Factors such as pH, 40cmARH (40 cm soil relative humidity), and DAP (available phosphorus) demonstrated higher significance in the fifth-order model, with some high-order terms exhibiting complex changes in sign. This indicates that lobetyolin may have multiple equilibrium points or optimal intervals for these factors. For instance, the third-order term of pH was positive, suggesting that lobetyolin may reach a peak under slightly alkaline conditions. However, further increases or decreases in pH beyond this range could lead to a decline in lobetyolin content. Similarly, the high-order terms of 40cmARH imply that deep soil humidity positively influences lobetyolin accumulation within a specific range, but excessively high or low humidity levels are detrimental. These findings highlight the intricate nonlinear interactions between lobetyolin and ecological factors, emphasizing the value of fifth-order polynomial models in identifying the sensitive thresholds and optimal conditions for maximizing lobetyolin content in C. pilosula.

3.1.3. Selection of Key and Secondary Ecological Factors Influencing the Major Medicinal Components of Codonopsis pilosula

Working from the multiple nonlinear fourth-order polynomial model for polysaccharides, and from the multiple nonlinear fifth-order polynomial models for total flavonoids and lobetyolin, we then excluded ecological factors with negligible correlations. The data were restructured, and an improved K-Means three-branch clustering method was used to classify factors into two categories: those that promote the accumulation of major medicinal components and those that reduce it. The results are presented in Table 8. For precipitation (AP), deviation from optimal levels in either direction reduced the accumulation of polysaccharides. For air relative humidity (ARH), a moderate to low humidity range (approximately 30–60%) was more favorable for polysaccharide accumulation, whereas higher humidity levels (>70%) were detrimental. For 40 cm soil temperature (40cmMAT), a narrow range around 28 °C was identified as optimal. Deviations exceeding 0.5 °C above or below this range decreased polysaccharide accumulation. For pH, an optimal point near 9.70 was identified. Minor deviations from this value were still acceptable, but substantial deviations significantly reduced polysaccharide accumulation. For available nitrogen (N), a moderately low nitrogen level (between 7 and 9) was more conducive to polysaccharide accumulation, but higher levels (>10) were detrimental. These findings provide practical insights into the sensitive intervals for each ecological factor, guiding optimization strategies for cultivating C. pilosula to enhance polysaccharides.
For total flavonoids, MAT (mean air temperature) within a moderate range (10–15 °C) was found to be favorable for flavonoid accumulation. Extremely low (<0 °C) or high temperatures (>20 °C) had adverse effects. For deep soil humidity (40cmARH), a moderately high range (e.g., 70–80%) promoted total flavonoid accumulation, whereas lower (<60%) or higher (>85%) soil humidity levels were detrimental. Regarding pH, the optimal range for total flavonoid accumulation resembled that for polysaccharides, within a narrow interval around 9.70 (9.68–9.72). Significant deviations from this range (e.g., <9.65 or >9.75) negatively affected flavonoid content. For air relative humidity (ARH), excessively high or low levels both negatively impacted flavonoid accumulation. Under the high-order model, a moderate relative humidity range (40–60%) was most conducive to flavonoid accumulation, while extreme conditions, such as very high humidity (>80%) or severe dryness (<30%), reduced the accumulation of flavonoids. These findings highlight the sensitive ecological thresholds that optimize total flavonoid content in C. pilosula and provide actionable insights for its cultivation.
For lobetyolin, the optimal point for precipitation (AP) occurs slightly below or above the mean value. For example, approximately 9.20 represents the optimal value; deviations in either direction, whether an increase (>9.3) or a decrease (<9.1), reduce lobetyolin accumulation. The ideal range for deep soil temperature (40cmMAT) for lobetyolin may be slightly higher than that for polysaccharides and flavonoids. Lobetyolin accumulates optimally between approximately 28.0 and 28.5 °C. Temperatures below 27.5 °C or above 29 °C have adverse effects on lobetyolin accumulation. For deep soil humidity (40cmARH), a moderately high range (65–75%) was most favorable for lobetyolin accumulation. Significant deviations from this range were detrimental. Excessive humidity (>80%) or insufficient moisture (<60%) inhibited lobetyolin accumulation. The optimal pH for lobetyolin, similar to that for total flavonoids and polysaccharides, is narrowly centered around 9.70, with the optimal range being within ±0.02–0.05 of this value. For available phosphorus (DAP), low to moderate levels were most conducive to lobetyolin accumulation. Extreme values of DAP, whether excessively high (>100) or near zero, were unfavorable. A moderate DAP range (10–50) was hypothesized to be optimal for lobetyolin accumulation. These findings emphasize the specific ecological thresholds required to maximize lobetyolin content in C. pilosula, providing precise guidelines for its cultivation.
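As an illustration of how such intervals might be used in cultivation monitoring, the sketch below flags factors that fall outside the lobetyolin ranges reported in the preceding paragraph. The pH interval is taken as 9.70 ± 0.05, the wider of the two tolerances mentioned. The function name and the sample reading are hypothetical, and units follow those used in the text.

```python
# Favorable ranges for lobetyolin as reported in the text (units as in the text).
LOBETYOLIN_RANGES = {
    "AP": (9.1, 9.3),
    "40cmMAT": (28.0, 28.5),
    "40cmARH": (65.0, 75.0),
    "PH": (9.65, 9.75),  # 9.70 +/- 0.05
    "DAP": (10.0, 50.0),
}


def out_of_range(reading, ranges):
    """Return {factor: 'below' or 'above'} for factors outside their range."""
    flagged = {}
    for factor, (low, high) in ranges.items():
        if factor not in reading:
            continue  # factor was not measured in this reading
        value = reading[factor]
        if value < low:
            flagged[factor] = "below"
        elif value > high:
            flagged[factor] = "above"
    return flagged


# Hypothetical sensor reading (illustrative values only).
reading = {"AP": 9.2, "40cmMAT": 29.2, "40cmARH": 58.0, "PH": 9.70, "DAP": 30.0}
print(out_of_range(reading, LOBETYOLIN_RANGES))
# {'40cmMAT': 'above', '40cmARH': 'below'}
```

In this hypothetical reading, deep soil temperature is too high and deep soil humidity too low, which would point to cooling and irrigation as the parameters to adjust.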

3.2. Experimental Validation and Model Credibility Demonstration

3.2.1. Reproducibility Validation Using the Block Experimental Method

Within the C. pilosula cultivation site, a stratified experimental design was employed by dividing the entire experimental area into four sub-plots: A, B, C, and D. Data in each sub-plot were collected following a uniform set of standard operating procedures. Statistical analyses were conducted to evaluate the correlations between 11 key ecological factors (including mean air temperature [MAT], air relative humidity [ARH], soil pH, available soil nitrogen [N], available phosphorus [P], global horizontal irradiance [GHI], among others) and the principal medicinal components of C. pilosula (using lobetyolin, polysaccharides, and total flavonoids as indicators). To avoid potential overfitting resulting from a single modeling approach, and to maintain a lightweight analytical method, Spearman’s rank correlation coefficient [23]—an appropriate metric for assessing nonlinear relationships—was utilized to validate data consistency across the sub-plots (Table 9).
The results demonstrated that Spearman’s rank correlation coefficients between MAT and the contents of lobetyolin, polysaccharides, and total flavonoids did not vary significantly among the four sub-plots. To further validate the reproducibility of the data across the sub-plots, a one-way ANOVA was conducted to compare the medicinal component data among them. The results (Table 10) indicated no statistically significant differences (p = 0.78), demonstrating good reproducibility of the medicinal component data under uniform management conditions. Additionally, the intraclass correlation coefficient (ICC) for data within each sub-plot reached as high as 0.98, confirming the high consistency of the experimental data and providing a robust data foundation for subsequent model training.
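Spearman's rank correlation coefficient is the Pearson correlation computed on ranks, which is why it responds to any monotonic relationship rather than only to linear ones. A minimal standard-library sketch follows; the function names and sample values are hypothetical, and ties receive average ranks.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical temperature readings and a monotonic but nonlinear response.
mat = [12.1, 14.3, 15.0, 17.8, 19.2]
content = [t ** 3 for t in mat]
rho_monotonic = spearman(mat, content)                         # approximately 1.0
rho_mixed = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])         # approximately 0.8
print(round(rho_monotonic, 6), round(rho_mixed, 6))
```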

3.2.2. Model Reliability Verification Based on Predictive Modelling

Given that C. pilosula is a biennial plant, the data from 2020–2021 were utilized as the first growth cycle for model construction and parameter training, while the 2022–2023 data served as the second growth cycle. Data for 2022 and 2023 were obtained through real sampling, but for 2023, an XGBoost-based model was employed to generate a parallel "predictive" data set for the purpose of validation (Table 11).
For lobetyolin, the overall cycle mean measured in 2023 was 0.80 mg/g (standard error: 0.05 mg/g), while the model predicted an overall cycle mean of 0.82 mg/g (standard error: 0.04 mg/g). The calculated error metrics were as follows: mean squared error (MSE) of 0.0004 mg²/g², mean absolute error (MAE) of 0.02 mg/g, and mean absolute percentage error (MAPE) of approximately 2.5%.
For polysaccharides, the overall cycle mean measured in 2023 was 32.5 mg/g (standard error: 1.2 mg/g), compared to a model-predicted mean of 32.8 mg/g (standard error: 1.1 mg/g); the corresponding error metrics were MSE of approximately 0.09 mg²/g², MAE of 0.3 mg/g, and MAPE of approximately 0.92%.
For total flavonoids, the overall cycle mean measured in 2023 was 9.60 mg/g (standard error: 0.80 mg/g), versus a predicted mean of 9.65 mg/g (standard error: 0.75 mg/g); the error metrics were MSE of approximately 0.04 mg²/g², MAE of 0.2 mg/g, and MAPE of approximately 2.1%.
Based on the above comparisons between predicted and actual values, the following error metrics were obtained for lobetyolin:
Mean Squared Error (MSE) = 0.0004 mg²/g²
Mean Absolute Error (MAE) = 0.02 mg/g
Mean Absolute Percentage Error (MAPE) = 2.5%
For the predictions of other key indicators such as polysaccharides and total flavonoids, the error metrics were maintained below 3%. Furthermore, paired t-tests comparing the 2023 actual data with the 2023 predicted data yielded p-values greater than 0.05, indicating no statistically significant differences between the two datasets. Consistency of the XGBoost model parameters across sub-regions was extremely high, with ICC values consistently reaching 0.98, thereby confirming the model’s robustness against subtle environmental differences.
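As a rough consistency check on the reported figures, the MAE of each component can be divided by its measured 2023 cycle mean. This ratio is only an approximation of MAPE, which is defined per sample rather than from the mean, so the sketch below is an assumption-laden sanity check rather than a recomputation; the dictionary layout is illustrative.

```python
# Values as reported in Section 3.2.2 (means and MAE in mg/g, MAPE in percent).
reported = {
    "lobetyolin": {"mean": 0.80, "mae": 0.02, "mape": 2.5},
    "polysaccharides": {"mean": 32.5, "mae": 0.3, "mape": 0.92},
    "total flavonoids": {"mean": 9.60, "mae": 0.2, "mape": 2.1},
}

relative_error = {
    name: 100.0 * values["mae"] / values["mean"] for name, values in reported.items()
}

for name, percent in relative_error.items():
    print(f"{name}: MAE / mean = {percent:.2f}% (reported MAPE {reported[name]['mape']}%)")
```

The ratios come out at about 2.50%, 0.92%, and 2.08%, all close to the reported MAPE values and all below the 3% bound stated above.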

4. Discussion

1. Relationship Between Ecological Factors and Medicinal Components
The influence of ecological factors on plant growth and the accumulation of medicinal components has been a long-standing focus in botanical and pharmaceutical research. Yanqun Li et al. demonstrated through multiple studies that environmental factors play a critical role in the accumulation of secondary metabolites in medicinal plants [31]. Research has shown that factors such as light intensity, temperature, humidity, and soil nutrients significantly impact the accumulation of active components in plants. Notably, the regulation of these factors during cultivation management can effectively enhance the yield of medicinal components. Similarly, Alami emphasized in his study that the rational regulation of environmental factors is crucial for the growth of traditional Chinese medicinal plants and the enhancement of their active medicinal components [32,33,34]. Ecological factors have been widely discussed in the existing literature concerning their influence on medicinal plants. For example, Camina et al. [35] investigated how ecological interactions, such as pollination and herbivory, affect plant bioactivity, particularly the chemical composition of essential oils. They found that the quality of pollination (cross-pollination versus self-pollination) and simulated herbivory significantly enhanced the bioactivity of the plant essential oils. Although our research primarily examines temperature, light intensity, and soil fertility, their variation likely influences medicinal components through similar mechanisms.
Our study, through the collection of long-term ecological factor data and analysis of the primary medicinal components of C. pilosula (e.g., polysaccharides, total flavonoids, and lobetyolin), identified significant effects of multiple ecological factors on these medicinal components [36]. Notably, variations in factors such as temperature, light intensity, and soil fertility exhibited clear correlations with the content of medicinal components. Some of these findings align with the conclusions of Qingyou Zhang regarding the accumulation of active components in traditional Chinese medicinal materials [37]. Through data analysis and modeling, we further validated the impact of these ecological factors on the accumulation of active components in C. pilosula, providing theoretical and practical support for future cultivation practices [38].
2. Correlation Analysis and Model Selection
In the model selection for correlation analysis, this study first employed principal component analysis (PCA) to perform dimensionality reduction on the data, simplifying the subsequent modeling process. PCA, as a data preprocessing method, has been widely applied in botanical and ecological research to effectively extract the most representative factors [39]. Compared to the multi-factor analysis combined with machine learning strategy proposed by Liu Hang [40], PCA provided this study with a straightforward method for dimensionality reduction, enhancing the efficiency and precision of the subsequent regression analyses [41].
The correlation analysis methods proposed by Pearson and Spearman provided the theoretical foundation for this study [42]. Considering the relationships among variables and the data volume in the actual experiments, multivariate linear regression and nonlinear polynomial regression models were employed for data fitting and analysis. Although multivariate linear regression has been widely applied in numerous studies [43,44,45], the potential for complex nonlinear relationships between ecological factors and medicinal components led us to use nonlinear polynomial regression models, which offered more precise fitting results. Our resulting analyses confirmed that nonlinear models outperformed linear regression models in terms of both goodness-of-fit and explanatory power, further corroborating the nonlinear relationships between plant ecological factors and medicinal components in C. pilosula.
3. Innovation and Application of the K-Means Three-Branch Clustering Method
Clustering analysis is an unsupervised learning method widely applied in plant ecology research [27]. By categorizing sample data, it can reveal similarities and differences between different groups. In this study, we employed an improved K-means three-branch clustering algorithm to further refine the selection of ecological factor intervals. This approach was particularly effective for analyzing strongly correlated ecological factors, such as temperature, light intensity, and soil moisture, yielding highly favorable results.
In this study, the three-branch clustering algorithm was integrated with ecological factor correlation analysis and successfully applied to optimize the cultivation of C. pilosula. This represents the first application of this method to identify the optimal levels of ecological factors in medicinal plant cultivation.
The advantage of this method lies in its ability to identify the optimal cultivation conditions influencing medicinal components under various environmental conditions through cluster analysis of ecological factors. Compared to traditional clustering methods, the improved K-means three-branch method provides more accurate optimal intervals for ecological factors, demonstrating high practical applicability for real-world cultivation practices. On this basis, the results of this study indicate that ecological factors have a significant impact on the bioactive components of medicinal plants. Furthermore, climate change may further influence the accumulation of medicinal components by presenting plants with novel combinations of climatic factors that could alter the physiological mechanisms through which these ecological factors operate [46]. Therefore, regulating the optimal range of these ecological factors is crucial for improving both the yield and quality of C. pilosula in the context of climate change.

5. Conclusions

  • Based on three years of observational data on pertinent ecological factors influencing C. pilosula and laboratory analyses of its major medicinal components (polysaccharides, total flavonoids, and lobetyolin), this study systematically conducted correlation analysis and model development and testing. First, 11 climatic and edaphic factors and the corresponding content of major medicinal components in C. pilosula were compiled and organized. Principal component analysis (PCA) was used as a reference to preliminarily screen and reduce the dimensions of the 11 ecological factors. Subsequently, multivariate linear regression and multivariate nonlinear polynomial regression models (including 2nd, 4th, and 5th-order models) were applied to fit and compare the relationships between major medicinal components and ecological factors. The results showed that while a multivariate linear regression model partially explained variation in polysaccharide content, it was insufficient for capturing relationships for total flavonoids or lobetyolin. When nonlinear polynomial models (particularly 4th and 5th-order) were introduced, the fitting accuracy for total flavonoids and lobetyolin improved significantly. These models effectively captured the complex nonlinear effects and critical turning points of ecological factors on the accumulation of these medicinal components.
  • Through model optimization, we identified the ecological factors most sensitive to the accumulation of polysaccharides, total flavonoids, and lobetyolin, as well as their optimal ranges. For polysaccharides, accumulation is more favorable under conditions of precipitation (AP) between 9.1 and 9.3, air humidity (ARH) between 30% and 60%, 40 cm soil temperature (40cmMAT) between 27.5 and 28.5 °C, pH between 9.68 and 9.72, and soil available nitrogen (N) levels between 7 and 9. Lobetyolin exhibited optimal performance under precipitation (AP) conditions of approximately 9.1 to 9.3, 40 cm soil temperature between 28.0 and 28.5 °C, deep soil humidity (40cmARH) between 65% and 75%, pH around 9.70, and moderate levels of available phosphorus (DAP) between 10 and 50. Similarly, total flavonoids were optimized within narrow ranges of specific temperature, humidity, and pH conditions.
  • After identifying the influencing factors and their optimal intervals, we further applied an improved three-way K-means clustering method to group the most strongly correlated ecological factors and delineate their ranges, which allowed the intervals around inflection points and sensitive ranges to be refined. This step provides a basis for optimizing the growth environment of C. pilosula and enhancing the content of its key medicinal components, laying a data-driven foundation for industrial-scale production and the formulation of precision cultivation strategies.
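The interval-delineation idea in the last point can be illustrated with a minimal sketch. It uses plain one-dimensional K-means (Lloyd's algorithm), not the improved three-way variant used in the study, and the readings in `soil_temp` and `polysaccharide` are hypothetical values invented for illustration. The function names `kmeans_1d` and `best_interval` are likewise assumptions, not part of the original work.

```python
def kmeans_1d(values, k=3, iters=100):
    """Plain Lloyd's k-means on scalar values; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centers over the sorted data.
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
        new_centers = []
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return labels


def best_interval(factor, content, k=3):
    """Return (low, high) of the factor cluster with the highest mean content."""
    labels = kmeans_1d(factor, k)
    best = None
    for i in set(labels):
        xs = [x for x, lab in zip(factor, labels) if lab == i]
        ys = [y for y, lab in zip(content, labels) if lab == i]
        mean_y = sum(ys) / len(ys)
        if best is None or mean_y > best[0]:
            best = (mean_y, min(xs), max(xs))
    return best[1], best[2]


# Hypothetical 40 cm soil temperatures (°C) and polysaccharide contents.
soil_temp = [24.1, 24.5, 24.8, 27.6, 27.9, 28.2, 28.4, 31.0, 31.5, 31.9]
polysaccharide = [9.8, 9.9, 10.0, 10.9, 11.0, 11.1, 11.0, 10.1, 10.0, 9.9]

low, high = best_interval(soil_temp, polysaccharide)
print(f"Interval with the highest mean content: {low} to {high} °C")
```

A three-way variant would additionally separate each cluster into a core region and a fringe region; that refinement is omitted here to keep the sketch short.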
Finally, the experiment’s reliability was analyzed using a combination of stratified validation and model prediction methods. The high consistency of the data across sub-plots (ICC = 0.98) and the absence of significant differences among them (p = 0.78) verified the reliability of the experimental data, which was also confirmed by low values for mean squared error, mean absolute error, and mean absolute percentage error. These results collectively demonstrate the strong reliability of the experiment.
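As a reference for the three error measures named above, the sketch below computes mean squared error, mean absolute error, and mean absolute percentage error for paired measured and predicted values. The two short lists are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not data from this study.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and predicted contents.
measured = [10.0, 20.0]
predicted = [11.0, 18.0]

print(mse(measured, predicted), mae(measured, predicted), mape(measured, predicted))
```

Low values of all three measures indicate that predictions track the measurements closely; MAPE is the only one that is scale-free, which makes it convenient when components are reported in different units.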
In conclusion, this study systematically collected and analyzed ecological factor data and the primary medicinal component contents of C. pilosula. By comparing and optimizing multivariate linear and high-order nonlinear models, supplemented with clustering methods, we identified the key ecological factors influencing the production of this species’ major medicinal components, together with the optimal ranges of these factors. Combined with weather forecasts, these results can support protective measures against extreme weather events that affect the growth of C. pilosula, and they allow the accumulation of its key medicinal components to be predicted from changes in meteorological data. The models can therefore provide cultivation guidance for growers, ultimately leading to higher-quality finished products.

Author Contributions

Conceptualization, Z.L.; methodology, H.L. (Haoming Li); software, H.L. (Haoming Li); validation, H.L. (Haoming Li); formal analysis, H.L. (Haoming Li), X.S., B.M., Y.Y., H.L. (Haopu Li) and L.J.; investigation, H.L. (Haoming Li) and H.L. (Haopu Li); resources, Y.S.; data curation, H.L. (Haopu Li); writing—original draft preparation, H.L. (Haoming Li); writing—review and editing, H.L. (Haoming Li); visualization, H.L. (Haoming Li); supervision, Z.L.; project administration, Y.S.; funding acquisition, Z.L. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Special” and “Excellent” Agricultural High Quality Development Science and Technology Support Engineering Project, under the project “Development of C. pilosula products and demonstration of blockchain quality monitoring” (authorization number TYGC23-59). The APC was sponsored by Shanxi Agricultural University.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We sincerely thank all participants for their work on this study, and our funder for supporting the experimental site, data sampling, and other aspects of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, K.; Huang, G.; Li, Y.; Zhang, X.; Lei, Y.; Li, Y.; Xiong, J.; Sun, Y. Illumina MiSeq Sequencing Reveals Correlations among Fruit Ingredients, Environmental Factors, and AMF Communities in Three Lycium barbarum Producing Regions of China. Microbiol. Spectr. 2022, 10, e02293-21.
  2. Yang, D.; Liu, X.; Xu, X.; Niu, T.; Ma, X.; Fu, G.; Song, C.; Hou, X. Effect of Soil and Community Factors on the Yield and Medicinal Quality of Artemisia argyi Growth at Different Altitudes of the Funiu Mountain. Front. Plant Sci. 2024, 15, 1430758.
  3. Gao, S.; Liu, J.; Wang, M.; Liu, Y.; Meng, X.; Zhang, T.; Qi, Y.; Zhang, B.; Liu, H.; Sun, X.; et al. Exploring on the bioactive markers of Codonopsis Radix by correlation analysis between chemical constituents and pharmacological effects. J. Ethnopharmacol. 2019, 236, 31–41.
  4. Liu, H.; Chen, J.; Dy, J.; Fu, Y. Transforming Complex Problems into K-Means Solutions. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9149–9168.
  5. Liu, J.Y.; Liu, A.L.; Mao, F.Y.; Zhao, Y.S.; Cao, Z.; Cen, N.N.; Li, S.Q.; Li, L.; Ma, X.; Sui, H. Determination of the Active Ingredients and Biopotency in Polygala tenuifolia Willd. and the Ecological Factors That Influence Them. Ind. Crops Prod. 2019, 134, 113–123.
  6. Wei, L.; Liu, J.; Yin, D.; Zhao, X. Influence of Ecological Factors on the Production of Active Substances in the Anti-Cancer Plant Sinopodophyllum hexandrum (Royle) Ts Ying. PLoS ONE 2015, 10, e0122981.
  7. Hai, S.; Jin, Q.; Wang, Q.; Shao, C.; Zhang, L.; Guan, Y.; Tian, H.; Li, M.; Zhang, Y. Effects of Soil Quality on Effective Ingredients of Astragalus mongholicus from the Main Cultivation Regions in China. Ecol. Indic. 2020, 114, 106296.
  8. de Visser, C.C.; Chu, Q.P.; Mulder, J.A. A New Approach to Linear Regression with Multivariate Splines. Automatica 2009, 45, 2903–2909.
  9. Du, Z.; Tu, J.; Wang, Z.; Zhang, X.; Jia, Z.; Guo, Y.; Yang, Z.; Li, X. Coal Ash Sintering Temperature Prediction Method Based on Principal Element Regression Involves Establishing Main Element Regression Model by Using Obtained Optimal Main Component Number, and Substituting Coal Quality Test Data into Model to Obtain Predicted Value. Patent No CN114036735-A, 11 February 2022.
  10. Fu, X.; Han, C.; Ma, L.; Li, H. Inner Mongolia Egg Price Prediction Method Based on Principal Component Analysis Butterfly Optimization Algorithm-Support Vector Regression Model, Involves Determining Influence Factor for Egg Price Prediction and Reducing Dimension of Original Data. Patent No CN118246946-A, 29 October 2024.
  11. Gao, W.; Hao, X.; Wang, M.; Yang, C.; He, D. Method for Regulating Factors Affecting Fruit Growth, Involves Obtaining Influencing Factor When First Principal Component is Maximum under Regression Model as Target Value of Each Influencing in Growth of Fruit Vegetables is Regulated. Patent No CN108564212-A, 4 May 2021.
  12. Hu, Y.; Wang, S. Logistic Regression Based Principal Component Analysis Target Classifying Method, Involves Determining Target Color Characteristics of Sample Set, and Establishing Regression Model to Determine Principal Component Characteristic Parameters. Patent No CN109740692-A, 10 May 2019.
  13. Zheng, X.; Chen, W.; Li, X.; Shi, W.; Sun, X.; Ge, Q.; He, C.; He, X. Effects of environmental factors and genotype on performance, soil physicochemical properties and endophytic fungi of Salvia miltiorrhiza. Rhizosphere 2025, 33, 101031.
  14. Zhang, T.; Cheng, L.; Yang, L.; Han, M.; Li, J.; Yang, L. Relationship Between Quality Formation of Scutellaria baicalensis in Spring and Expression of Ecological Factors and Key Enzyme Genes. J. Jilin Agric. Univ. 2022, 44, 557–566.
  15. Xu, Z.; Liu, H.; Ullah, N.; Tung, S.A.; Ali, B.; Li, X.; Chen, S.; Xu, L. Insights into accumulation of active ingredients and rhizosphere microorganisms between Salvia miltiorrhiza and S. castanea. FEMS Microbiol. Lett. 2023, 370, fnad102.
  16. Zhou, X.; Li, J.; Wen, C.; Wu, X.; Yang, W.; Zeng, P.; Wu, X. Correlations of polysaccharide and flavonoid contents in Prunella vulgaris L. with main environmental factors and high-quality provenance screen. J. South China Agric. Univ. 2021, 42, 96–101.
  17. Pant, P.; Pandey, S.; Dall’Acqua, S. The Influence of Environmental Conditions on Secondary Metabolites in Medicinal Plants: A Literature Review. Chem. Biodivers. 2021, 18, e2100345.
  18. Lakshmi, R.; Baskar, S. Dic-Doc-K-Means: Dissimilarity-Based Initial Centroid Selection for Document Clustering Using K-Means for Improving the Effectiveness of Text Document Clustering. J. Inf. Sci. 2019, 45, 818–832.
  19. Li, F. Designing Process Antivirus Mask Based on Principal Component Analysis Comprises Obtaining Big Data from Hospitals and Other Places for Patients Infected, and Using Principal Component Analysis and Linear Regression Methods to Design Masks. Patent No CN111400669-A, 10 July 2020.
  20. Mao, W.Q.; Wu, Y.; Li, Q.H.; Xiang, Y.Y.; Tang, W.T.; Hu, H.Y.; Ji, H.Y.; Ji, X.L.; Li, H.Y. Seed endophytes and rhizosphere microbiome of Imperata cylindrica, a pioneer plant of abandoned mine lands. Front. Microbiol. 2024, 15, 1415329.
  21. Li, W.; Yao, J.; He, C.; Ren, Y.; Zhao, L.; He, X. The Synergy of Dark Septate Endophytes and Organic Residue on Isatis indigotica Growth and Active Ingredients Accumulation under Drought Stress. Ind. Crops Prod. 2023, 203, 117147.
  22. Liang, H.; Kong, Y.; Chen, W.; Wang, X.; Jia, Z.; Dai, Y.; Yang, X. The Quality of Wild Salvia miltiorrhiza from Dao Di Area in China and Its Correlation with Soil Parameters and Climate Factors. Phytochem. Anal. 2021, 32, 318–325.
  23. Yue, Y.; Zhang, Q.; Wan, F.; Ma, G.; Zang, Z.; Xu, Y.; Jiang, C.; Huang, X. Effects of Different Drying Methods on the Drying Characteristics and Quality of Codonopsis pilosulae Slices. Foods 2023, 12, 1323.
  24. DB14/T 2379-2021; Technical Guidelines for the Extraction of Lobetyolin from C. pilosula. The Shanxi Provincial Drug Quality Management Standardization Technical Committee: Taiyuan, China, 2021.
  25. DB14/T 2380-2021; Technical Guidelines for the Extraction of Polysaccharides from C. pilosula. The Shanxi Provincial Drug Quality Management Standardization Technical Committee: Taiyuan, China, 2021.
  26. NY/T 3950-2021; Determination of 10 Flavonoid Compounds in Plant-Based Foods by High-Performance Liquid Chromatography-Tandem Mass Spectrometry. The Department of Agricultural Product Quality and Safety Supervision/Ministry of Agriculture and Rural Affairs’ Expert Committee on Agricultural Product Nutrition Standards: Beijing, China, 2021.
  27. Li, H.; Li, H.; Li, B.; Shao, J.; Song, Y.; Liu, Z. Smart Temperature and Humidity Control in Pig House by Improved Three-Way K-Means. Agriculture 2023, 13, 2020.
  28. Qiu, Y.; Zhou, J.; Khandelwal, M.; Yang, H.; Yang, P.; Li, C. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput. 2022, 38, 4145–4162.
  29. Sutou, A.; Wang, J. Influence-balanced XGBoost: Improving XGBoost for imbalanced data using influence functions. IEEE Access 2024, 12, 193473–193486.
  30. Xiao, Y.; Cui, H.; Khurma, R.A.; Castillo, P.A. Artificial lemming algorithm: A novel bionic meta-heuristic technique for solving real-world engineering optimization problems. Artif. Intell. Rev. 2025, 58, 84.
  31. Li, Y.; Kong, D.; Fu, Y.; Sussman, M.R.; Wu, H. The effect of developmental and environmental factors on secondary metabolites in medicinal plants. Plant Physiol. Biochem. 2020, 148, 80–89.
  32. Alami, M.M.; Guo, S.; Mei, Z.; Yang, G.; Wang, X. Environmental factors on secondary metabolism in medicinal plants: Exploring accelerating factors. Med. Plant Biol. 2024, 3, e016.
  33. Akilli, A.; Gorgulu, O. Comparative Assessments of Multivariate Nonlinear Fuzzy Regression Techniques for Egg Production Curve. Trop. Anim. Health Prod. 2020, 52, 2119–2127.
  34. Junming, C.; Yang, C.; Zhou, C.; Li, Y.; Zhu, H.; Gui, W. Multivariate Regression Model for Industrial Process Measurement Based on Double Locally Weighted Partial Least Squares. IEEE Trans. Instrum. Meas. 2020, 69, 3962–3971.
  35. Camina, J.L.; Usseglio, V.; Marquez, V.; Merlo, C.; Dambolena, J.S.; Zygadlo, J.A.; Ashworth, L. Ecological interactions affect the bioactivity of medicinal plants. Sci. Rep. 2023, 27, 12165.
  36. Angelos, M.; D’Enza, A.I.; van de Velden, M. Beyond Tandem Analysis: Joint Dimension Reduction and Clustering in R. J. Stat. Softw. 2019, 91, 1–24.
  37. Zhang, Q.; Cai, Y.; Zhang, L.; Lu, M.; Yang, L.; Wang, D.; Jia, Q. The accumulation of active ingredients of Polygonatum cyrtonema Hua is associated with soil characteristics and bacterial community. Front. Microbiol. 2024, 15, 1347204.
  38. Xu, N.; Tan, G.; Wang, H.; Gai, X. Effect of biochar additions to soil on nitrogen leaching, microbial biomass and bacterial community structure. Eur. J. Soil Biol. 2016, 74, 1–8.
  39. Pham, D.T.; Dimov, S.S.; Nguyen, C.D. Selection of K in K-Means Clustering. Proc. Inst. Mech. Eng. Part C-J. Mech. Eng. Sci. 2005, 219, 103–119.
  40. Qin, S.; Zhou, H.; Meng, W. Method for Controlling Multi-Collinearity of Rock Body in Geotechnical Project Using Principal Component Regression Mode in Ground Stress Field Inversion, Involves Comparing and Evaluating Structural Equation with Traditional Multi-Element Linear Regression Equation. Patent No CN118277740-A, 2 July 2024.
  41. Bang, S. A Study on Nonlinear Multivariate Regression Using Fully Connected Deep Neural Network. J. Korean Data Inf. Sci. Society 2022, 33, 785–799.
  42. Ulewicz-Magulska, B.; Wesolowski, M. Antioxidant Activity of Medicinal Herbs and Spices from Plants of the Lamiaceae, Apiaceae and Asteraceae Families: Chemometric Interpretation of the Data. Antioxidants 2023, 12, 2039.
  43. Vasconcellos, K.L.P.; Cordeiro, G.M. Bias Corrected Estimates in Multivariate Student T Regression Models. Commun. Stat. Theory Methods 2000, 29, 797–822.
  44. Wang, Z.; Zhao, G.; Jin, P.; Bu, X.; Wang, L.; Wang, F.; Yang, J. Multi-Linear Regression Model and Principal Component Analysis Model Based Power Selling Price Analyzing Method, Involves Collecting Target Field from Original Data, and Analyzing Selling Price Model Based on Target Field. Patent No CN109558994-A, 2 April 2019.
  45. Xie, Q. Bagging Partial Least Squares and Principal Components Analysis Based Dimension Reduction Method, Involves Determining Regression Coefficient Vector Matrix, Generating Positive Definite Matrix, and Determining Transpose of Vector. Patent No CN109858535-A, 7 June 2019.
  46. Zuo, J.; Tang, X.; Zhang, H.; Zu, M.; Zhang, X.; Yuan, Y. Analysis of Niche Shift and Potential Suitable Distributions of Dendrobium under the Impact of Global Climate Change. Environ. Sci. Pollut. Res. 2023, 30, 11978–11993.
Figure 1. Ecological factor data display. (MAT is surface temperature, AP is surface pressure, GHI is light intensity, ARH is relative humidity, 40cmMAT is temperature at 40 cm underground, 40cmARH is humidity at 40 cm underground, DAP is precipitation, PH is soil pH value, N is soil available nitrogen content, P is soil available phosphorus content, and K is soil available potassium content. Note: The data units in the figure are automatically assigned by the instrument and are not uniform).
Figure 2. Changes in lobetyolin, polysaccharides, and total flavonoids concentrations of C. pilosula in 2020. (a) Lobetyolin; (b) Polysaccharides; (c) Total flavonoids.
Figure 3. Changes in lobetyolin, polysaccharides, and total flavonoids concentrations of C. pilosula in 2021. (a) Lobetyolin; (b) Polysaccharides; (c) Total flavonoids.
Figure 4. Changes in lobetyolin, polysaccharides, and total flavonoids concentrations of C. pilosula in 2022. (a) Lobetyolin; (b) Polysaccharides; (c) Total flavonoids.
Figure 5. Principal Component Analysis Results Chart.
Figure 6. Graphical analyses of the nonlinear fourth-degree polynomial models for the main medicinal substances. (a) Lobetyolin (b) Polysaccharides (c) Total Flavonoids.
Figure 7. Graphical analyses of the nonlinear fifth-degree polynomial model for the main medicinal substances. (a) Lobetyolin (b) Polysaccharides (c) Total Flavonoids.
Table 1. Results from the polysaccharide multivariate nonlinear fourth-order polynomial regression model.
| No. | Term | Estimate | Std. Error | t Value | p-value | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | 10.5206 | 0.0384 | 273.308 | <2 × 10^−16 | *** |
| 1 | poly(MAT,4)1 | −3.4867 | 2.9121 | −1.197 | 0.2354 | |
| 2 | poly(MAT,4)2 | −0.0033 | 0.9294 | −0.004 | 0.9971 | |
| 3 | poly(MAT,4)3 | 0.0976 | 0.8702 | 0.112 | 0.9110 | |
| 4 | poly(MAT,4)4 | 0.9319 | 0.7747 | 1.203 | 0.2333 | |
| 5 | poly(AP,4)1 | −2.5239 | 1.0616 | −2.377 | 0.0203 | * |
| 6 | poly(AP,4)2 | −1.2851 | 0.5639 | −2.279 | 0.0259 | * |
| 7 | poly(AP,4)3 | −0.6942 | 0.5713 | −1.215 | 0.2287 | |
| 8 | poly(AP,4)4 | −1.2694 | 0.4819 | −2.634 | 0.0105 | * |
| 9 | poly(GHI,4)1 | −1.9470 | 1.1405 | −1.707 | 0.0925 | |
| 10 | poly(GHI,4)2 | 0.1839 | 0.7773 | 0.237 | 0.8136 | |
| 11 | poly(GHI,4)3 | 0.3216 | 0.6643 | 0.484 | 0.6299 | |
| 12 | poly(GHI,4)4 | 0.7533 | 0.5890 | 1.279 | 0.2053 | |
| 13 | poly(ARH,4)1 | −1.7093 | 0.8499 | −2.011 | 0.0483 | * |
| 14 | poly(ARH,4)2 | −0.8834 | 0.7173 | −1.231 | 0.2225 | |
| 15 | poly(ARH,4)3 | −0.1306 | 0.5960 | −0.219 | 0.8271 | |
| 16 | poly(ARH,4)4 | 0.6631 | 0.5927 | 1.119 | 0.2673 | |
| 17 | poly(X40cmMAT,4)1 | −3.5334 | 2.5686 | −1.376 | 0.1735 | |
| 18 | poly(X40cmMAT,4)2 | 3.3360 | 0.9479 | 3.519 | 0.0007 | *** |
| 19 | poly(X40cmMAT,4)3 | −0.5892 | 0.6863 | −0.858 | 0.3937 | |
| 20 | poly(X40cmMAT,4)4 | −2.0472 | 0.5746 | −3.563 | 0.0006 | *** |
| 21 | poly(X40cmARH,4)1 | −0.6851 | 0.6275 | −1.092 | 0.2788 | |
| 22 | poly(X40cmARH,4)2 | 0.9250 | 0.6329 | 1.462 | 0.1485 | |
| 23 | poly(X40cmARH,4)3 | −0.9647 | 0.6240 | −1.546 | 0.1269 | |
| 24 | poly(X40cmARH,4)4 | −0.5031 | 0.5469 | −0.920 | 0.3609 | |
| 25 | poly(DAP,4)1 | −1.1175 | 0.6966 | −1.604 | 0.1134 | |
| 26 | poly(DAP,4)2 | 0.4958 | 0.6014 | 0.824 | 0.4126 | |
| 27 | poly(DAP,4)3 | 0.3484 | 0.6605 | 0.527 | 0.5996 | |
| 28 | poly(DAP,4)4 | −0.6212 | 0.5306 | −1.171 | 0.2458 | |
| 29 | poly(PH,4)1 | 0.5757 | 0.5508 | 1.045 | 0.2997 | |
| 30 | poly(PH,4)2 | 1.5312 | 0.5743 | 2.666 | 0.0096 | ** |
| 31 | poly(PH,4)3 | −0.0538 | 0.5356 | −0.101 | 0.9202 | |
| 32 | poly(PH,4)4 | −1.1922 | 0.5698 | −2.092 | 0.0402 | * |
| 33 | poly(N,4)1 | −12.3566 | 5.8057 | −2.128 | 0.0370 | * |
| 34 | poly(N,4)2 | −11.4680 | 7.0947 | −1.616 | 0.1107 | |
| 35 | poly(N,4)3 | −1.6889 | 3.7597 | −0.449 | 0.6547 | |
| 36 | poly(N,4)4 | −2.9137 | 1.3293 | −2.192 | 0.0319 | * |
| 37 | poly(P,4)1 | 6.1059 | 13.6414 | 0.448 | 0.6559 | |
| 38 | poly(P,4)2 | 3.7886 | 13.1722 | 0.288 | 0.7745 | |
| 39 | poly(P,4)3 | 6.7519 | 10.3074 | 0.655 | 0.5147 | |
| 40 | poly(P,4)4 | 3.8881 | 6.2095 | 0.626 | 0.5333 | |
| 41 | poly(K,4)1 | 12.6398 | 12.7544 | 0.991 | 0.3252 | |
| 42 | poly(K,4)2 | 6.7922 | 13.0330 | 0.521 | 0.6040 | |
| 43 | poly(K,4)3 | −0.7436 | 10.8276 | −0.069 | 0.9454 | |
| 44 | poly(K,4)4 | −2.7803 | 5.9298 | −0.469 | 0.6407 | |
Residual standard error: 0.4056 on 66 degrees of freedom
Multiple R-squared: 0.9441
F-statistic: 25.33 on 44 and 66 DF, p-value: <2.2 × 10^−16
* indicates a significant correlation; the more asterisks, the higher the significance. “.” indicates a correlation that is close to significant.
Table 2. Results from the polysaccharide multivariate nonlinear fifth-degree polynomial regression model.
| No. | Term | Estimate | Std. Error | t Value | p-value | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | 1.659 × 10^1 | 1.690 × 10^1 | 0.982 | 0.3304 | |
| 1 | MAT | −2.209 × 10^−2 | 3.182 × 10^−2 | −0.694 | 0.4903 | |
| 2 | AP | −1.843 × 10^−2 | 1.461 × 10^−2 | −1.262 | 0.2124 | |
| 3 | GHI | −4.233 × 10^−8 | 2.291 × 10^−8 | −1.847 | 0.0700 | |
| 4 | ARH | −1.092 × 10^−2 | 4.652 × 10^−3 | −2.347 | 0.0225 | * |
| 5 | X40cmMAT | −4.458 × 10^−2 | 3.437 × 10^−2 | −1.297 | 0.2000 | |
| 6 | X40cmARH | −9.519 × 10^−3 | 5.888 × 10^−3 | −1.617 | 0.1116 | |
| 7 | DAP | −7.416 × 10^−3 | 3.269 × 10^−3 | −2.269 | 0.0272 | * |
| 8 | PH | 2.010 × 10^0 | 1.298 × 10^0 | 1.549 | 0.1272 | |
| 9 | N | −2.137 × 10^−1 | 5.785 × 10^−1 | −0.369 | 0.7132 | |
| 10 | P | 1.268 × 10^−1 | 2.310 × 10^−1 | 0.549 | 0.5852 | |
| 11 | K | 4.022 × 10^−2 | 2.252 × 10^−1 | 0.179 | 0.8588 | |
| 12 | poly(MAT,5)1 | NA | NA | NA | NA | |
| 13 | poly(MAT,5)2 | −1.041 × 10^0 | 1.195 × 10^0 | −0.871 | 0.3875 | |
| 14 | poly(MAT,5)3 | −2.543 × 10^−1 | 1.020 × 10^0 | −0.249 | 0.8040 | |
| 15 | poly(MAT,5)4 | 1.711 × 10^0 | 9.029 × 10^−1 | 1.895 | 0.0633 | |
| 16 | poly(MAT,5)5 | −6.870 × 10^−4 | 8.056 × 10^−1 | −0.001 | 0.9993 | |
| 17 | poly(AP,5)1 | NA | NA | NA | NA | |
| 18 | poly(AP,5)2 | −1.495 × 10^0 | 5.915 × 10^−1 | −2.528 | 0.0143 | * |
| 19 | poly(AP,5)3 | −5.978 × 10^−1 | 6.207 × 10^−1 | −0.963 | 0.3396 | |
| 20 | poly(AP,5)4 | −1.500 × 10^0 | 5.071 × 10^−1 | −2.958 | 0.0045 | ** |
| 21 | poly(AP,5)5 | −8.947 × 10^−1 | 5.709 × 10^−1 | −1.567 | 0.1228 | |
| 22 | poly(GHI,5)1 | NA | NA | NA | NA | |
| 23 | poly(GHI,5)2 | 1.002 × 10^0 | 9.239 × 10^−1 | 1.084 | 0.2829 | |
| 24 | poly(GHI,5)3 | −1.619 × 10^−1 | 7.160 × 10^−1 | −0.226 | 0.8219 | |
| 25 | poly(GHI,5)4 | 1.093 × 10^0 | 7.140 × 10^−1 | 1.531 | 0.1314 | |
| 26 | poly(GHI,5)5 | −1.240 × 10^0 | 7.223 × 10^−1 | −1.717 | 0.0916 | . |
| 27 | poly(ARH,5)1 | NA | NA | NA | NA | |
| 28 | poly(ARH,5)2 | −1.177 × 10^0 | 7.699 × 10^−1 | −1.528 | 0.1321 | |
| 29 | poly(ARH,5)3 | −5.011 × 10^−1 | 7.349 × 10^−1 | −0.682 | 0.4981 | |
| 30 | poly(ARH,5)4 | 1.727 × 10^−1 | 7.142 × 10^−1 | 0.242 | 0.8098 | |
| 31 | poly(ARH,5)5 | 3.378 × 10^−1 | 5.954 × 10^−1 | 0.567 | 0.5727 | |
| 32 | poly(X40cmMAT,5)1 | NA | NA | NA | NA | |
| 33 | poly(X40cmMAT,5)2 | 4.935 × 10^0 | 1.205 × 10^0 | 4.097 | 0.0001 | *** |
| 34 | poly(X40cmMAT,5)3 | −8.041 × 10^−2 | 7.747 × 10^−1 | −0.104 | 0.9177 | |
| 35 | poly(X40cmMAT,5)4 | −2.332 × 10^0 | 6.041 × 10^−1 | −3.860 | 0.0003 | *** |
| 36 | poly(X40cmMAT,5)5 | −4.815 × 10^−1 | 5.387 × 10^−1 | −0.894 | 0.3754 | |
| 37 | poly(X40cmARH,5)1 | NA | NA | NA | NA | |
| 38 | poly(X40cmARH,5)2 | 7.938 × 10^−1 | 6.722 × 10^−1 | 1.181 | 0.2427 | |
| 39 | poly(X40cmARH,5)3 | −1.110 × 10^0 | 7.663 × 10^−1 | −1.449 | 0.1530 | |
| 40 | poly(X40cmARH,5)4 | −9.164 × 10^−1 | 6.526 × 10^−1 | −1.404 | 0.1658 | |
| 41 | poly(X40cmARH,5)5 | 1.251 × 10^−1 | 5.766 × 10^−1 | 0.217 | 0.8290 | |
| 42 | poly(DAP,5)1 | NA | NA | NA | NA | |
| 43 | poly(DAP,5)2 | 9.680 × 10^−1 | 7.094 × 10^−1 | 1.364 | 0.1779 | |
| 44 | poly(DAP,5)3 | 1.160 × 10^0 | 9.033 × 10^−1 | 1.284 | 0.2045 | |
| 45 | poly(DAP,5)4 | −4.262 × 10^−1 | 5.833 × 10^−1 | −0.731 | 0.4681 | |
| 46 | poly(DAP,5)5 | −4.041 × 10^−1 | 6.523 × 10^−1 | −0.620 | 0.5381 | |
| 47 | poly(PH,5)1 | NA | NA | NA | NA | |
| 48 | poly(PH,5)2 | 1.739 × 10^0 | 6.494 × 10^−1 | 2.677 | 0.0097 | ** |
| 49 | poly(PH,5)3 | 3.586 × 10^−1 | 6.272 × 10^−1 | 0.572 | 0.5698 | |
| 50 | poly(PH,5)4 | −9.751 × 10^−1 | 6.104 × 10^−1 | −1.598 | 0.1158 | |
| 51 | poly(PH,5)5 | −6.274 × 10^−1 | 6.718 × 10^−1 | −0.934 | 0.3544 | |
| 52 | poly(N,5)1 | NA | NA | NA | NA | |
| 53 | poly(N,5)2 | 5.255 × 10^0 | 1.955 × 10^1 | 0.269 | 0.7890 | |
| 54 | poly(N,5)3 | 7.983 × 10^0 | 1.166 × 10^1 | 0.684 | 0.4965 | |
| 55 | poly(N,5)4 | −7.385 × 10^−1 | 4.337 × 10^0 | −0.170 | 0.8654 | |
| 56 | poly(N,5)5 | 1.098 × 10^0 | 1.491 × 10^0 | 0.737 | 0.4644 | |
| 57 | poly(P,5)1 | NA | NA | NA | NA | |
| 58 | poly(P,5)2 | 6.619 × 10^0 | 2.970 × 10^1 | 0.223 | 0.8244 | |
| 59 | poly(P,5)3 | 1.085 × 10^1 | 2.528 × 10^1 | 0.429 | 0.6696 | |
| 60 | poly(P,5)4 | 7.442 × 10^0 | 1.431 × 10^1 | 0.520 | 0.6051 | |
| 61 | poly(P,5)5 | 2.372 × 10^0 | 6.882 × 10^0 | 0.345 | 0.7316 | |
| 62 | poly(K,5)1 | NA | NA | NA | NA | |
| 63 | poly(K,5)2 | −1.006 × 10^1 | 3.274 × 10^1 | −0.307 | 0.7597 | |
| 64 | poly(K,5)3 | −1.840 × 10^1 | 2.952 × 10^1 | −0.623 | 0.5356 | |
| 65 | poly(K,5)4 | −1.273 × 10^1 | 1.747 × 10^1 | −0.729 | 0.4691 | |
| 66 | poly(K,5)5 | −3.787 × 10^0 | 7.432 × 10^0 | −0.510 | 0.6123 | |
Residual standard error: 0.406 on 55 degrees of freedom
Multiple R-squared: 0.9533, Adjusted R-squared: 0.9066
F-statistic: 20.42 on 55 and 55 DF, p-value: <2.2 × 10^−16
NA indicates severe collinearity between higher-order terms.
* indicates a significant correlation; the more asterisks, the higher the significance. “.” indicates a correlation that is close to significant.
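The adjusted R-squared reported for Table 2 can be cross-checked against the multiple R-squared and the degrees of freedom of the F-statistic. The sketch below assumes the usual definition, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), and infers the sample size as n = 55 + 55 + 1 = 111 from the F-statistic; the function name `adjusted_r2` is illustrative. The value computed for Table 1 is not reported in the source and is shown only as what the same formula would give under the same assumed n.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared from multiple R-squared, sample size, and predictor count."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Table 2: 55 model DF and 55 residual DF, so n = 55 + 55 + 1 = 111 (inferred).
table2 = adjusted_r2(0.9533, 111, 55)

# Table 1: 44 model DF and 66 residual DF give the same inferred n = 111.
# This value is computed here, not quoted from the source.
table1 = adjusted_r2(0.9441, 111, 44)

print(round(table2, 4), round(table1, 4))
```

Because the penalty grows with the number of predictors, a higher-order model can raise the multiple R-squared without raising the adjusted value, which is why the two should be compared together when choosing the polynomial order.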
Table 3. Results from the flavonoid multivariate nonlinear fourth-order polynomial regression model.
| No. | Term | Estimate | Std. Error | t Value | p-value | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | 1.6406 | 0.0085 | 191.836 | <2 × 10^−16 | *** |
| 1 | poly(MAT,4)1 | −0.2321 | 0.6470 | −0.359 | 0.7209 | |
| 2 | poly(MAT,4)2 | −0.8305 | 0.2064 | −4.022 | 0.0001 | *** |
| 3 | poly(MAT,4)3 | 0.0301 | 0.1933 | 0.156 | 0.8767 | |
| 4 | poly(MAT,4)4 | 0.4531 | 0.1721 | 2.633 | 0.0105 | * |
| 5 | poly(AP,4)1 | −0.0984 | 0.2358 | −0.417 | 0.6778 | |
| 6 | poly(AP,4)2 | 0.0824 | 0.1252 | 0.658 | 0.5125 | |
| 7 | poly(AP,4)3 | −0.1447 | 0.1269 | −1.140 | 0.2583 | |
| 8 | poly(AP,4)4 | −0.0955 | 0.1070 | −0.893 | 0.3753 | |
| 9 | poly(GHI,4)1 | 0.0301 | 0.2534 | 0.119 | 0.9057 | |
| 10 | poly(GHI,4)2 | 0.0843 | 0.1726 | 0.489 | 0.6267 | |
| 11 | poly(GHI,4)3 | −0.3057 | 0.1476 | −2.071 | 0.0422 | * |
| 12 | poly(GHI,4)4 | 0.2487 | 0.1308 | 1.901 | 0.0616 | |
| 13 | poly(ARH,4)1 | −0.1572 | 0.1888 | −0.833 | 0.4079 | |
| 14 | poly(ARH,4)2 | −0.2163 | 0.1593 | −1.357 | 0.1793 | |
| 15 | poly(ARH,4)3 | −0.0139 | 0.1324 | −0.106 | 0.9162 | |
| 16 | poly(ARH,4)4 | −0.2983 | 0.1316 | −2.265 | 0.0267 | * |
| 17 | poly(X40cmMAT,4)1 | 0.0086 | 0.5706 | 0.015 | 0.9879 | |
| 18 | poly(X40cmMAT,4)2 | −0.2082 | 0.2106 | −0.989 | 0.3263 | |
| 19 | poly(X40cmMAT,4)3 | −0.3125 | 0.1524 | −2.050 | 0.0443 | * |
| 20 | poly(X40cmMAT,4)4 | −0.0857 | 0.1276 | −0.672 | 0.5041 | |
| 21 | poly(X40cmARH,4)1 | 0.0053 | 0.1394 | 0.038 | 0.9694 | |
| 22 | poly(X40cmARH,4)2 | 0.0966 | 0.1406 | 0.687 | 0.4944 | |
| 23 | poly(X40cmARH,4)3 | −0.0110 | 0.1386 | −0.080 | 0.9367 | |
| 24 | poly(X40cmARH,4)4 | −0.1890 | 0.1215 | −1.556 | 0.1245 | |
| 25 | poly(DAP,4)1 | −0.1227 | 0.1547 | −0.793 | 0.4306 | |
| 26 | poly(DAP,4)2 | 0.1846 | 0.1336 | 1.382 | 0.1717 | |
| 27 | poly(DAP,4)3 | 0.2932 | 0.1467 | 1.998 | 0.0498 | * |
| 28 | poly(DAP,4)4 | −0.2256 | 0.1178 | −1.914 | 0.0599 | |
| 29 | poly(PH,4)1 | 0.1169 | 0.1223 | 0.956 | 0.3427 | |
| 30 | poly(PH,4)2 | 0.2600 | 0.1276 | 2.038 | 0.0456 | * |
| 31 | poly(PH,4)3 | 0.0370 | 0.1190 | 0.311 | 0.7565 | |
| 32 | poly(PH,4)4 | 0.0765 | 0.1265 | 0.605 | 0.5472 | |
| 33 | poly(N,4)1 | 0.1639 | 1.2898 | 0.127 | 0.8992 | |
| 34 | poly(N,4)2 | 1.5187 | 1.5762 | 0.963 | 0.3388 | |
| 35 | poly(N,4)3 | 0.2705 | 0.8353 | 0.324 | 0.7470 | |
| 36 | poly(N,4)4 | 0.3274 | 0.2953 | 1.109 | 0.2715 | |
| 37 | poly(P,4)1 | −2.7054 | 3.0307 | −0.893 | 0.3752 | |
| 38 | poly(P,4)2 | −1.0895 | 2.9265 | −0.372 | 0.7108 | |
| 39 | poly(P,4)3 | −1.5151 | 2.2900 | −0.662 | 0.5105 | |
| 40 | poly(P,4)4 | −1.0043 | 1.3796 | −0.728 | 0.4692 | |
| 41 | poly(K,4)1 | 2.6412 | 2.8337 | 0.932 | 0.3546 | |
| 42 | poly(K,4)2 | −0.3390 | 2.8956 | −0.117 | 0.9071 | |
| 43 | poly(K,4)3 | 0.6073 | 2.4056 | 0.252 | 0.8014 | |
| 44 | poly(K,4)4 | 0.9122 | 1.3174 | 0.692 | 0.4910 | |
Residual standard error: 0.09011 on 66 degrees of freedom
Multiple R-squared: 0.7018
F-statistic: 3.531 on 44 and 66 DF, p-value: 1.873 × 10^−6
* indicates a significant correlation; the more asterisks, the higher the significance. “.” indicates a correlation that is close to significant.
Table 4. Results from the flavonoid multivariate nonlinear fifth-degree polynomial regression model.
| No. | Term | Estimate | Std. Error | t Value | p-value | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | 1.508 | 3.458 | 0.436 | 0.6643 | |
| 1 | MAT | −5.416 × 10^−4 | 6.509 × 10^−3 | −0.083 | 0.9339 | |
| 2 | AP | −6.308 × 10^−4 | 2.988 × 10^−3 | −0.211 | 0.8336 | |
| 3 | GHI | 2.037 × 10^−9 | 4.688 × 10^−9 | 0.435 | 0.6656 | |
| 4 | ARH | −1.289 × 10^−4 | 9.518 × 10^−4 | −0.135 | 0.8928 | |
| 5 | X40cmMAT | −3.510 × 10^−3 | 7.033 × 10^−3 | −0.499 | 0.6196 | |
| 6 | X40cmARH | −6.061 × 10^−4 | 1.205 × 10^−3 | −0.503 | 0.6168 | |
| 7 | DAP | 1.591 × 10^−4 | 6.688 × 10^−4 | 0.238 | 0.8128 | |
| 8 | PH | 1.588 × 10^−1 | 2.656 × 10^−1 | 0.598 | 0.5523 | |
| 9 | N | −1.687 × 10^−1 | 1.183 × 10^−1 | −1.425 | 0.1597 | |
| 10 | P | −3.340 × 10^−2 | 4.725 × 10^−2 | −0.707 | 0.4826 | |
| 11 | K | 5.758 × 10^−2 | 4.607 × 10^−2 | 1.250 | 0.2166 | |
| 12 | poly(MAT,5)1 | NA | NA | NA | NA | |
| 13 | poly(MAT,5)2 | −9.392 × 10^−1 | 2.444 × 10^−1 | −3.842 | 0.0003 | *** |
| 14 | poly(MAT,5)3 | 1.613 × 10^−2 | 2.086 × 10^−1 | 0.077 | 0.9386 | |
| 15 | poly(MAT,5)4 | 2.564 × 10^−1 | 1.847 × 10^−1 | 1.388 | 0.1707 | |
| 16 | poly(MAT,5)5 | −8.789 × 10^−2 | 1.648 × 10^−1 | −0.533 | 0.5959 | |
| 17 | poly(AP,5)1 | NA | NA | NA | NA | |
| 18 | poly(AP,5)2 | 3.627 × 10^−2 | 1.210 × 10^−1 | 0.300 | 0.7655 | |
| 19 | poly(AP,5)3 | −1.467 × 10^−1 | 1.270 × 10^−1 | −1.155 | 0.2529 | |
| 20 | poly(AP,5)4 | −6.704 × 10^−2 | 1.037 × 10^−1 | −0.646 | 0.5208 | |
| 21 | poly(AP,5)5 | 7.245 × 10^−2 | 1.168 × 10^−1 | 0.620 | 0.5376 | |
| 22 | poly(GHI,5)1 | NA | NA | NA | NA | |
| 23 | poly(GHI,5)2 | 1.123 × 10^−1 | 1.890 × 10^−1 | 0.594 | 0.5550 | |
| 24 | poly(GHI,5)3 | −3.114 × 10^−1 | 1.465 × 10^−1 | −2.126 | 0.0380 | * |
| 25 | poly(GHI,5)4 | 2.601 × 10^−1 | 1.461 × 10^−1 | 1.781 | 0.0804 | |
| 26 | poly(GHI,5)5 | 1.980 × 10^−2 | 1.478 × 10^−1 | 0.134 | 0.8939 | |
| 27 | poly(ARH,5)1 | NA | NA | NA | NA | |
| 28 | poly(ARH,5)2 | −2.216 × 10^−1 | 1.575 × 10^−1 | −1.407 | 0.1651 | |
| 29 | poly(ARH,5)3 | −2.821 × 10^−2 | 1.503 × 10^−1 | −0.188 | 0.8518 | |
| 30 | poly(ARH,5)4 | −1.303 × 10^−1 | 1.461 × 10^−1 | −0.892 | 0.3764 | |
| 31 | poly(ARH,5)5 | −8.536 × 10^−2 | 1.218 × 10^−1 | −0.701 | 0.4863 | |
| 32 | poly(X40cmMAT,5)1 | NA | NA | NA | NA | |
| 33 | poly(X40cmMAT,5)2 | −2.008 × 10^−1 | 2.465 × 10^−1 | −0.815 | 0.4188 | |
| 34 | poly(X40cmMAT,5)3 | −3.739 × 10^−1 | 1.585 × 10^−1 | −2.359 | 0.0219 | * |
| 35 | poly(X40cmMAT,5)4 | −1.712 × 10^−2 | 1.236 × 10^−1 | −0.138 | 0.8903 | |
| 36 | poly(X40cmMAT,5)5 | 4.107 × 10^−1 | 1.102 × 10^−1 | 3.726 | 0.0004 | *** |
| 37 | poly(X40cmARH,5)1 | NA | NA | NA | NA | |
| 38 | poly(X40cmARH,5)2 | −3.126 × 10^−2 | 1.375 × 10^−1 | −0.227 | 0.8210 | |
| 39 | poly(X40cmARH,5)3 | −1.000 × 10^−1 | 1.568 × 10^−1 | −0.638 | 0.5260 | |
| 40 | poly(X40cmARH,5)4 | −1.763 × 10^−1 | 1.335 × 10^−1 | −1.321 | 0.1920 | |
| 41 | poly(X40cmARH,5)5 | −9.026 × 10^−3 | 1.180 × 10^−1 | −0.077 | 0.9392 | |
| 42 | poly(DAP,5)1 | NA | NA | NA | NA | |
| 43 | poly(DAP,5)2 | 1.190 × 10^−1 | 1.451 × 10^−1 | 0.820 | 0.4158 | |
| 44 | poly(DAP,5)3 | 3.511 × 10^−1 | 1.848 × 10^−1 | 1.900 | 0.0626 | |
| 45 | poly(DAP,5)4 | −1.463 × 10^−1 | 1.193 × 10^−1 | −1.226 | 0.2253 | |
| 46 | poly(DAP,5)5 | −1.193 × 10^−1 | 1.335 × 10^−1 | −0.894 | 0.3752 | |
| 47 | poly(PH,5)1 | NA | NA | NA | NA | |
| 48 | poly(PH,5)2 | 2.142 × 10^−1 | 1.329 × 10^−1 | 1.612 | 0.1126 | |
| 49 | poly(PH,5)3 | −3.445 × 10^−2 | 1.283 × 10^−1 | −0.268 | 0.7893 | |
| 50 | poly(PH,5)4 | 9.566 × 10^−2 | 1.249 × 10^−1 | 0.766 | 0.4469 | |
| 51 | poly(PH,5)5 | −1.743 × 10^−1 | 1.375 × 10^−1 | −1.268 | 0.2100 | |
| 52 | poly(N,5)1 | NA | NA | NA | NA | |
| 53 | poly(N,5)2 | −6.223 | 3.999 | −1.556 | 0.1253 | |
| 54 | poly(N,5)3 | −4.310 | 2.386 | −1.806 | 0.0763 | |
| 55 | poly(N,5)4 | −1.519 | 8.874 × 10^−1 | −1.711 | 0.0926 | |
| 56 | poly(N,5)5 | −6.651 × 10^−1 | 3.050 × 10^−1 | −2.180 | 0.0335 | * |
| 57 | poly(P,5)1 | NA | NA | NA | NA | |
| 58 | poly(P,5)2 | 3.682 | 6.077 | 0.606 | 0.5470 | |
| 59 | poly(P,5)3 | 3.035 | 5.172 | 0.587 | 0.5597 | |
| 60 | poly(P,5)4 | 9.160 × 10^−1 | 2.928 | 0.313 | 0.7556 | |
| 61 | poly(P,5)5 | 1.418 | 1.408 | 1.007 | 0.3183 | |
| 62 | poly(K,5)1 | NA | NA | NA | NA | |
| 63 | poly(K,5)2 | 1.990 | 6.698 | 0.297 | 0.7675 | |
| 64 | poly(K,5)3 | 2.237 | 6.039 | 0.370 | 0.7125 | |
| 65 | poly(K,5)4 | 1.915 | 3.574 | 0.536 | 0.5942 | |
| 66 | poly(K,5)5 | −4.138 × 10^−1 | 1.520 | −0.272 | 0.7865 | |
Residual standard error: 0.08306 on 55 degrees of freedom
Multiple R-squared: 0.7889
F-statistic: 3.736 on 55 and 55 DF, p-value: 1.266 × 10^−6
NA indicates severe collinearity between higher-order terms.
* indicates a significant correlation; the more asterisks, the higher the significance. “.” indicates a correlation that is close to significant.
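One plausible reading of the NA rows, offered here as an assumption rather than as the authors' diagnosis, is that each first-degree orthogonal polynomial term is an exact linear function of the corresponding raw variable. If both the raw variable and its orthogonal polynomial are entered in the same model, the degree-1 column carries no new information and cannot be estimated. The sketch below builds that column the way orthogonal-polynomial routines such as R's poly() typically do (centered, then scaled to unit length) and checks the linear relationship on hypothetical temperature values; `first_orthogonal_column` is an illustrative name.

```python
import math


def first_orthogonal_column(x):
    """Degree-1 orthogonal polynomial column: x centered and scaled to unit length."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered], mean, norm


# Hypothetical surface temperatures (°C).
x = [12.0, 15.5, 18.0, 21.5, 24.0, 27.5]
p1, mean, norm = first_orthogonal_column(x)

# The column equals a + b * x exactly, so it is collinear with the
# intercept and the raw variable once both are already in the model.
a, b = -mean / norm, 1.0 / norm
max_gap = max(abs(p - (a + b * v)) for p, v in zip(p1, x))
print(f"largest deviation from a + b*x: {max_gap:.2e}")
```

Under this reading, the NA entries reflect redundancy between the raw linear terms and the degree-1 polynomial terms; the higher-degree terms remain estimable, as the tables show.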
Table 5. Results from the multivariate nonlinear fourth-order polynomial regression model for alkynyl glycosides.
| No. | Term | Estimate | Std. Error | t Value | p-value | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | 0.3650 | 0.0154 | 23.665 | <2 × 10^−16 | *** |
| 1 | poly(MAT,4)1 | −1.9903 | 1.1670 | −1.705 | 0.0928 | |
| 2 | poly(MAT,4)2 | −0.0541 | 0.3724 | −0.145 | 0.8849 | |
| 3 | poly(MAT,4)3 | 0.5604 | 0.3487 | 1.607 | 0.1128 | |
| 4 | poly(MAT,4)4 | −0.1279 | 0.3105 | −0.412 | 0.6815 | |
| 5 | poly(AP,4)1 | 0.9999 | 0.4254 | 2.350 | 0.0217 | * |
| 6 | poly(AP,4)2 | 0.0630 | 0.2259 | 0.279 | 0.7811 | |
| 7 | poly(AP,4)3 | −0.0816 | 0.2289 | −0.357 | 0.7225 | |
| 8 | poly(AP,4)4 | −0.3408 | 0.1931 | −1.765 | 0.0822 | |
| 9 | poly(GHI,4)1 | −0.6870 | 0.4570 | −1.503 | 0.1375 | |
| 10 | poly(GHI,4)2 | −0.1597 | 0.3115 | −0.513 | 0.6098 | |
| 11 | poly(GHI,4)3 | −0.3189 | 0.2662 | −1.198 | 0.2352 | |
| 12 | poly(GHI,4)4 | 0.4161 | 0.2360 | 1.763 | 0.0825 | |
| 13 | poly(ARH,4)1 | −0.5048 | 0.3406 | −1.482 | 0.1430 | |
| 14 | poly(ARH,4)2 | 0.0123 | 0.2874 | 0.043 | 0.9657 | |
| 15 | poly(ARH,4)3 | 0.1388 | 0.2388 | 0.581 | 0.5629 | |
| 16 | poly(ARH,4)4 | 0.2811 | 0.2375 | 1.184 | 0.2408 | |
| 17 | poly(X40cmMAT,4)1 | 4.2275 | 1.0293 | 4.107 | 0.0001 | *** |
| 18 | poly(X40cmMAT,4)2 | 0.3146 | 0.3798 | 0.828 | 0.4104 | |
| 19 | poly(X40cmMAT,4)3 | −0.4915 | 0.2750 | −1.787 | 0.0784 | |
| 20 | poly(X40cmMAT,4)4 | 0.0610 | 0.2302 | 0.265 | 0.7916 | |
| 21 | poly(X40cmARH,4)1 | −0.1253 | 0.2514 | −0.498 | 0.6199 | |
| 22 | poly(X40cmARH,4)2 | 0.4659 | 0.2536 | 1.837 | 0.0706 | |
| 23 | poly(X40cmARH,4)3 | −0.5834 | 0.2501 | −2.333 | 0.0227 | * |
| 24 | poly(X40cmARH,4)4 | −0.5161 | 0.2191 | −2.355 | 0.0215 | * |
| 25 | poly(DAP,4)1 | −0.6450 | 0.2791 | −2.311 | 0.0239 | * |
| 26 | poly(DAP,4)2 | 0.8489 | 0.2410 | 3.522 | 0.0007 | *** |
| 27 | poly(DAP,4)3 | 0.1762 | 0.2647 | 0.666 | 0.5079 | |
| 28 | poly(DAP,4)4 | −0.1283 | 0.2126 | −0.604 | 0.5481 | |
| 29 | poly(PH,4)1 | 0.4393 | 0.2207 | 1.990 | 0.0507 | |
| 30 | poly(PH,4)2 | 0.0604 | 0.2301 | 0.263 | 0.7935 | |
| 31 | poly(PH,4)3 | 0.5607 | 0.2146 | 2.612 | 0.0111 | * |
| 32 | poly(PH,4)4 | −0.0087 | 0.2283 | −0.038 | 0.9696 | |
| 33 | poly(N,4)1 | −2.2912 | 2.3266 | −0.985 | 0.3283 | |
| 34 | poly(N,4)2 | −1.5949 | 2.8432 | −0.561 | 0.5767 | |
| 35 | poly(N,4)3 | −0.9400 | 1.5067 | −0.624 | 0.5348 | |
| 36 | poly(N,4)4 | −0.3421 | 0.5327 | −0.642 | 0.5229 | |
| 37 | poly(P,4)1 | −3.3011 | 5.4669 | −0.604 | 0.5480 | |
| 38 | poly(P,4)2 | −0.0058 | 5.2788 | −0.001 | 0.9991 | |
| 39 | poly(P,4)3 | 0.5212 | 4.1307 | 0.126 | 0.8999 | |
| 40 | poly(P,4)4 | 2.4650 | 2.4885 | 0.991 | 0.3255 | |
| 41 | poly(K,4)1 | 5.5610 | 5.1114 | 1.088 | 0.2805 | |
| 42 | poly(K,4)2 | 1.7113 | 5.2230 | 0.328 | 0.7442 | |
| 43 | poly(K,4)3 | 1.1167 | 4.3392 | 0.257 | 0.7977 | |
| 44 | poly(K,4)4 | −2.1280 | 2.3764 | −0.895 | 0.3737 | |
Residual standard error: 0.1964 on 1057 degrees of freedom
Multiple R-squared: 0.4799
F-statistic: 44.33 on 22 and 1057 DF, p-value: <2.2 × 10^−16
* indicates a significant correlation; the more asterisks, the higher the significance. “.” indicates a correlation that is close to significant.
Table 6. Results from the multivariate nonlinear fifth-degree polynomial regression model for alkynyl glycosides.

| No. | Term | Estimate | Std. Error | t Value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|---|
| 0 | (Intercept) | −3.883 × 10¹ | 6.411 | −6.057 | 1.3 × 10⁻⁷ | *** |
| 1 | MAT | −2.134 × 10⁻² | 1.207 × 10⁻² | −1.768 | 0.0825 | |
| 2 | AP | 1.215 × 10⁻² | 5.541 × 10⁻³ | 2.193 | 0.0325 | * |
| 3 | GHI | −6.863 × 10⁻⁹ | 8.692 × 10⁻⁹ | −0.790 | 0.4331 | |
| 4 | ARH | −1.715 × 10⁻³ | 1.765 × 10⁻³ | −0.972 | 0.3354 | |
| 5 | X40cmMAT | 5.415 × 10⁻² | 1.304 × 10⁻² | 4.153 | 0.0001 | *** |
| 6 | X40cmARH | −2.174 × 10⁻³ | 2.233 × 10⁻³ | −0.973 | 0.3346 | |
| 7 | DAP | −2.653 × 10⁻³ | 1.240 × 10⁻³ | −2.139 | 0.0368 | * |
| 8 | PH | 1.357 × 10⁰ | 4.924 × 10⁻¹ | 2.756 | 0.0079 | ** |
| 9 | N | 2.734 × 10⁻¹ | 2.194 × 10⁻¹ | 1.246 | 0.2180 | |
| 10 | P | 4.969 × 10⁻³ | 8.761 × 10⁻² | 0.057 | 0.9549 | |
| 11 | K | −4.566 × 10⁻² | 8.541 × 10⁻² | −0.535 | 0.5950 | |
| 12 | poly(MAT,5)1 | NA | NA | NA | NA | |
| 13 | poly(MAT,5)2 | −2.546 × 10⁻¹ | 4.532 × 10⁻¹ | −0.562 | 0.5765 | |
| 14 | poly(MAT,5)3 | 5.802 × 10⁻¹ | 3.868 × 10⁻¹ | 1.500 | 0.1393 | |
| 15 | poly(MAT,5)4 | 2.753 × 10⁻² | 3.425 × 10⁻¹ | 0.080 | 0.9362 | |
| 16 | poly(MAT,5)5 | −3.263 × 10⁻¹ | 3.056 × 10⁻¹ | −1.068 | 0.2903 | |
| 17 | poly(AP,5)1 | NA | NA | NA | NA | |
| 18 | poly(AP,5)2 | 5.971 × 10⁻² | 2.244 × 10⁻¹ | 0.266 | 0.7911 | |
| 19 | poly(AP,5)3 | 1.228 × 10⁻¹ | 2.354 × 10⁻¹ | 0.522 | 0.6039 | |
| 20 | poly(AP,5)4 | −3.880 × 10⁻¹ | 1.923 × 10⁻¹ | −2.017 | 0.0485 | * |
| 21 | poly(AP,5)5 | −9.635 × 10⁻² | 2.166 × 10⁻¹ | −0.445 | 0.6581 | |
| 22 | poly(GHI,5)1 | NA | NA | NA | NA | |
| 23 | poly(GHI,5)2 | −2.316 × 10⁻¹ | 3.505 × 10⁻¹ | −0.661 | 0.5115 | |
| 24 | poly(GHI,5)3 | −1.965 × 10⁻¹ | 2.716 × 10⁻¹ | −0.724 | 0.4724 | |
| 25 | poly(GHI,5)4 | 9.431 × 10⁻² | 2.708 × 10⁻¹ | 0.348 | 0.7290 | |
| 26 | poly(GHI,5)5 | 1.793 × 10⁻¹ | 2.740 × 10⁻¹ | 0.654 | 0.5156 | |
| 27 | poly(ARH,5)1 | NA | NA | NA | NA | |
| 28 | poly(ARH,5)2 | 9.584 × 10⁻² | 2.921 × 10⁻¹ | 0.328 | 0.7440 | |
| 29 | poly(ARH,5)3 | 1.488 × 10⁻¹ | 2.788 × 10⁻¹ | 0.534 | 0.5957 | |
| 30 | poly(ARH,5)4 | 1.966 × 10⁻¹ | 2.709 × 10⁻¹ | 0.726 | 0.4710 | |
| 31 | poly(ARH,5)5 | 2.146 × 10⁻¹ | 2.258 × 10⁻¹ | 0.950 | 0.3462 | |
| 32 | poly(X40cmMAT,5)1 | NA | NA | NA | NA | |
| 33 | poly(X40cmMAT,5)2 | 7.286 × 10⁻¹ | 4.569 × 10⁻¹ | 1.594 | 0.1165 | |
| 34 | poly(X40cmMAT,5)3 | −5.453 × 10⁻¹ | 2.938 × 10⁻¹ | −1.856 | 0.0688 | |
| 35 | poly(X40cmMAT,5)4 | −4.938 × 10⁻² | 2.292 × 10⁻¹ | −0.215 | 0.8301 | |
| 36 | poly(X40cmMAT,5)5 | −1.090 × 10⁻¹ | 2.044 × 10⁻¹ | −0.533 | 0.5959 | |
| 37 | poly(X40cmARH,5)1 | NA | NA | NA | NA | |
| 38 | poly(X40cmARH,5)2 | 5.505 × 10⁻¹ | 2.550 × 10⁻¹ | 2.159 | 0.0352 | * |
| 39 | poly(X40cmARH,5)3 | −3.350 × 10⁻¹ | 2.907 × 10⁻¹ | −1.153 | 0.2540 | |
| 40 | poly(X40cmARH,5)4 | −4.306 × 10⁻¹ | 2.476 × 10⁻¹ | −1.739 | 0.0875 | |
| 41 | poly(X40cmARH,5)5 | 3.624 × 10⁻¹ | 2.187 × 10⁻¹ | 1.657 | 0.1032 | |
| 42 | poly(DAP,5)1 | NA | NA | NA | NA | |
| 43 | poly(DAP,5)2 | 7.184 × 10⁻¹ | 2.691 × 10⁻¹ | 2.670 | 0.0099 | ** |
| 44 | poly(DAP,5)3 | −1.897 × 10⁻¹ | 3.426 × 10⁻¹ | −0.554 | 0.5820 | |
| 45 | poly(DAP,5)4 | 3.769 × 10⁻³ | 2.213 × 10⁻¹ | 0.017 | 0.9864 | |
| 46 | poly(DAP,5)5 | 4.321 × 10⁻² | 2.474 × 10⁻¹ | 0.175 | 0.8620 | |
| 47 | poly(PH,5)1 | NA | NA | NA | NA | |
| 48 | poly(PH,5)2 | 2.787 × 10⁻¹ | 2.464 × 10⁻¹ | 1.131 | 0.2629 | |
| 49 | poly(PH,5)3 | 8.921 × 10⁻¹ | 2.379 × 10⁻¹ | 3.750 | 0.0004 | *** |
| 50 | poly(PH,5)4 | 1.877 × 10⁻¹ | 2.315 × 10⁻¹ | 0.811 | 0.4211 | |
| 51 | poly(PH,5)5 | −5.229 × 10⁻¹ | 2.548 × 10⁻¹ | −2.052 | 0.0449 | * |
| 52 | poly(N,5)1 | NA | NA | NA | NA | |
| 53 | poly(N,5)2 | 1.364 × 10¹ | 7.414 | 1.839 | 0.0712 | |
| 54 | poly(N,5)3 | 8.213 | 4.424 | 1.856 | 0.0687 | |
| 55 | poly(N,5)4 | 2.409 | 1.645 | 1.464 | 0.1487 | |
| 56 | poly(N,5)5 | 1.019 | 5.656 × 10⁻¹ | 1.802 | 0.0770 | |
| 57 | poly(P,5)1 | NA | NA | NA | NA | |
| 58 | poly(P,5)2 | −1.706 | 1.127 × 10¹ | −0.151 | 0.8802 | |
| 59 | poly(P,5)3 | −3.005 × 10⁻¹ | 9.590 | −0.031 | 0.9751 | |
| 60 | poly(P,5)4 | 3.354 | 5.429 | 0.618 | 0.5392 | |
| 61 | poly(P,5)5 | 4.129 × 10⁻¹ | 2.611 | 0.158 | 0.8749 | |
| 62 | poly(K,5)1 | NA | NA | NA | NA | |
| 63 | poly(K,5)2 | −9.898 | 1.242 × 10¹ | −0.797 | 0.4288 | |
| 64 | poly(K,5)3 | −1.003 × 10¹ | 1.120 × 10¹ | −0.895 | 0.3744 | |
| 65 | poly(K,5)4 | −8.930 | 6.626 | −1.348 | 0.1832 | |
| 66 | poly(K,5)5 | −2.114 | 2.819 | −0.750 | 0.4564 | |

Residual standard error: 0.154 on 55 degrees of freedom
Multiple R-squared: 0.8339
F-statistic: 5.021 on 55 and 55 DF, p-value: 6.948 × 10⁻⁹
NA indicates severe collinearity between higher-order terms.
* The correlation is significant, and the more * there are, the higher the significance. Representative correlation is close to significant.

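The overall F-statistic printed under each regression table follows from the multiple R-squared and the two degrees of freedom, F = (R²/df1) / ((1 − R²)/df2). As a minimal Python sketch (the helper name `f_statistic` is hypothetical and not taken from the paper), the footer values of Tables 5 and 6 can be cross-checked as follows:

```python
def f_statistic(r_squared: float, df_model: int, df_resid: int) -> float:
    """Overall F-statistic of a linear model from its multiple R-squared."""
    explained = r_squared / df_model            # explained variance per model df
    unexplained = (1.0 - r_squared) / df_resid  # residual variance per residual df
    return explained / unexplained


# R-squared and degrees of freedom exactly as printed under Tables 5 and 6.
f_fourth = f_statistic(0.4799, 22, 1057)
f_fifth = f_statistic(0.8339, 55, 55)
print(round(f_fourth, 2), round(f_fifth, 2))
```

Because the printed R-squared values are rounded to four decimals, agreement with the reported F-statistics should only be expected to about two decimal places.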
Table 7. Summary of Principal Component Contributions.

| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard deviation | 1.8951 | 1.5639 | 1.4344 | 0.99262 | 0.87669 | 0.76111 | 0.55017 | 0.43825 | 0.22153 | 0.16329 | 0.03769 |
| Proportion of Variance | 0.3265 | 0.2223 | 0.1870 | 0.08957 | 0.06987 | 0.05266 | 0.02752 | 0.01746 | 0.00446 | 0.00242 | 0.00013 |
| Cumulative Proportion | 0.3265 | 0.5488 | 0.7359 | 0.82547 | 0.89535 | 0.94801 | 0.97552 | 0.99299 | 0.99745 | 0.99987 | 1.00000 |

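The three rows of Table 7 are linked by simple arithmetic: each component's proportion of variance is its squared standard deviation divided by the sum of all squared standard deviations, and the cumulative proportion is the running sum. The following is a minimal Python sketch using the tabulated standard deviations; the 80% cut-off used to count components is an illustrative assumption, not a threshold stated in this section.

```python
# Standard deviations of PC1..PC11 as listed in Table 7.
sdev = [1.8951, 1.5639, 1.4344, 0.99262, 0.87669, 0.76111,
        0.55017, 0.43825, 0.22153, 0.16329, 0.03769]

variances = [s ** 2 for s in sdev]       # eigenvalues of the correlation matrix
total = sum(variances)                   # close to 11 for 11 standardized factors
proportion = [v / total for v in variances]

cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

# Assumed threshold (hypothetical): smallest number of components explaining >= 80%.
n_components_80 = next(i + 1 for i, c in enumerate(cumulative) if c >= 0.80)

for name, p, c in zip(range(1, 12), proportion, cumulative):
    print(f"PC{name}: proportion={p:.4f}, cumulative={c:.4f}")
```

Under that assumed threshold, the first four components would be retained, which matches the point where the tabulated cumulative proportion first exceeds 0.8.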
Table 8. Optimal Intervals for Key Ecological Factors Affecting Major Medicinal Components Identified through K-Means Three-Branch Clustering.

| Component | Factor | Optimal Interval and Trend |
|---|---|---|
| QG | AP (hPa) | 9.20 (narrow range of 9.1–9.3) |
| QG | 40cmMAT (°C) | 28.0–28.5 |
| QG | 40cmARH (%) | 65–75 |
| QG | pH | Extremely narrow range around 9.70 |
| QG | DAP (mm) | 10–50, avoiding excessively high or low values |
| DT | AP (hPa) | 9.20 (narrow range of 9.1–9.3) |
| DT | ARH (%) | 30–60% is favorable; >70% is unfavorable |
| DT | 40cmMAT (°C) | 28 (narrow range of 27.5–28.5) |
| DT | pH | Extremely narrow range around 9.70 (9.68–9.72) |
| DT | N (mg/kg) | 7–9 is optimal; >10 or <6 is unfavorable |

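One way to read Table 8 is that samples are grouped into three clusters by component content, and the range of an ecological factor within the highest-content cluster is reported as its favorable interval. The sketch below illustrates that reading with a plain one-dimensional k-means (Lloyd's algorithm) in Python. The procedure, the helper `kmeans_1d`, and all data values are assumptions made for illustration; they are not the authors' implementation or measurements.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar data; returns (centroids, labels)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical paired observations: soil temperature at 40 cm (°C) and content (mg/g).
soil_temp = [24.1, 25.0, 26.2, 27.6, 27.9, 28.1, 28.4, 29.5, 30.2]
content = [0.31, 0.35, 0.52, 0.78, 0.81, 0.84, 0.80, 0.55, 0.33]

centroids, labels = kmeans_1d(content, k=3)
best = max(range(3), key=lambda c: centroids[c])          # highest-content cluster
in_best = [t for t, lab in zip(soil_temp, labels) if lab == best]
optimal_interval = (min(in_best), max(in_best))
print(optimal_interval)
```

Clustering on content alone keeps the example small; a fuller analysis might cluster on the factor and the content jointly, which this sketch does not attempt.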
Table 9. Spearman’s Rank Correlation Coefficients between MAT and Medicinal Components in Each Sub-plot (Plot Data Consistency Validation).

| Sub-Plot | Lobetyolin | Polysaccharides | Total Flavonoids |
|---|---|---|---|
| A | 0.63 | 0.61 | 0.65 |
| B | 0.65 | 0.63 | 0.66 |
| C | 0.62 | 0.60 | 0.64 |
| D | 0.64 | 0.62 | 0.65 |

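Spearman's coefficient, as reported per sub-plot in Table 9, is the Pearson correlation computed on ranks, with tied values sharing their average rank. A minimal pure-Python sketch follows; the function names and the sample series are hypothetical and serve only to show the calculation, not to reproduce the tabulated coefficients.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sub-plot series: mean air temperature (°C) and lobetyolin (mg/g).
mat = [12.1, 14.3, 15.0, 17.8, 19.2, 21.5]
lobetyolin = [0.41, 0.39, 0.52, 0.50, 0.63, 0.61]
rho = spearman_rho(mat, lobetyolin)
print(round(rho, 4))
```

Running the same function on each sub-plot's series and comparing the resulting coefficients is the kind of consistency check the table summarizes.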
Table 10. Analysis of Variance (ANOVA) and Consistency Metrics for Medicinal Component Data across Sub-plots.

| Indicator | ANOVA p-Value | ICC |
|---|---|---|
| Lobetyolin | 0.78 | 0.98 |
| Polysaccharides | 0.78 | 0.98 |
| Total Flavonoids | 0.78 | 0.98 |

Table 11. Comparison between 2023 Actual and Predicted Data (Medicinal Component Content).

| Indicator | 2023 Actual Value (mg/g) | Standard Error (mg/g) | 2023 Predicted Value (mg/g) | Standard Error (mg/g) | MSE (mg²/g²) | MAE (mg/g) | MAPE (%) |
|---|---|---|---|---|---|---|---|
| Lobetyolin | 0.80 | 0.05 | 0.82 | 0.04 | 0.0004 | 0.02 | 2.5 |
| Polysaccharides | 32.5 | 1.20 | 32.8 | 1.10 | 0.09 | 0.30 | 0.92 |
| Total Flavonoids | 9.60 | 0.80 | 9.65 | 0.7 | 0.04 | 0.20 | 2.1 |

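The error metrics in Table 11 have standard definitions: MSE is the mean squared difference between predicted and actual values, MAE the mean absolute difference, and MAPE the mean absolute difference relative to the actual value, in percent. A minimal Python sketch, assuming each row is treated as a single actual/predicted pair (the helper name `error_metrics` is hypothetical), applied to the lobetyolin and polysaccharide rows:

```python
def error_metrics(actual, predicted):
    """Return (MSE, MAE, MAPE in percent) for paired actual/predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    return mse, mae, mape


# Actual and predicted values as listed in Table 11 (mg/g).
lob_mse, lob_mae, lob_mape = error_metrics([0.80], [0.82])
poly_mse, poly_mae, poly_mape = error_metrics([32.5], [32.8])
print(round(lob_mse, 4), round(lob_mae, 2), round(lob_mape, 1))
print(round(poly_mse, 2), round(poly_mae, 2), round(poly_mape, 2))
```

The function accepts lists of any equal length, so the same call would apply to per-sample actual and predicted series if those were available.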
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Li, H.; Song, Y.; Shi, X.; Ma, B.; Yao, Y.; Li, H.; Jia, L.; Liu, Z. Study on the Correlation Between Major Medicinal Constituents of Codonopsis pilosula During Its Growth Cycle and Ecological Factors, and Determination of Optimal Ecological Factor Ranges. Agronomy 2025, 15, 1057. https://doi.org/10.3390/agronomy15051057
