Application of Digital Image Analysis to the Prediction of Chlorophyll Content in Astragalus Seeds

Chlorophyll fluorescence (CF) has been applied to measure the chlorophyll content of seeds in order to determine seed maturity, but the high price of the equipment limits its wider application. We used Astragalus seeds to explore whether digital image analysis can predict seed chlorophyll content and thereby provide a low-cost alternative method. Our research comprised scanning Astragalus seeds and extracting their characteristic features, determining the chlorophyll content, and establishing predictive models of chlorophyll content based on those features. The results showed that the R² of the multiple linear regression (MLR) prediction models established with multiple features was ≥0.947, and the R² of the multi-layer perceptron (MLP) models was ≥0.943. When seeds were sorted by a single feature, the R value or the G value, the R² reached 0.969 and 0.965, respectively. A germination test showed that the lower the chlorophyll content, the higher the quality of the seeds. We therefore conclude that digital image analysis can effectively predict the chlorophyll content of Astragalus seeds and can provide a reference for the selection of mature and viable Astragalus seeds.


Introduction
Astragalus is a large genus of plants in the botanical family Fabaceae, several species of which have been historically used in traditional Chinese medicine. As used here, "Astragalus" mainly refers to the species Astragalus mongholicus and A. membranaceus. Astragalus is one of the most popular medicinal herbs in China, and it has a long history of use as a medicinal material. Astragalus is one of 40 commonly used bulk medicinal materials, and the demand for it is increasing daily [1,2]. However, the quality of traditional Chinese medicinal plant seeds is far behind that of crop seeds, and there are many problems, such as mixed provenance, low purity, inconsistent maturity, low germination rate, and slow and irregular emergence. Seeds have become the weakest link in the production of traditional Chinese medicinal plants and are the "bottleneck" that restricts the standardized production and development of traditional Chinese medicine.
Seed maturity is closely related to seed vigor [3,4]. In seeds with poor maturity, indicators such as germination rate, germination potential, soil-arching ability, seedling rate, and seedling uniformity all decrease, which seriously affects the quality of the seeds and the entire production system. Therefore, it is necessary to improve seed maturity. Chlorophyll content in the seeds of many species can be used as an indicator of seed maturity and is a major contributor to seed quality [5][6][7][8].
Using chlorophyll fluorescence (CF) to determine seed maturity is a relatively new technology. CF seed detectors are used to detect the chlorophyll fluorescence (blue) generated by pulsed light. Fluorescence emissions at wavelengths of 600 nm and above from all fluorescent substances will be detected. The pulsed light emitted by the detector can separate the fluorescence from the surrounding light. Therefore, the detector can calculate the rough chlorophyll concentration by measuring chlorophyll fluorescence. Kenanoglu et al. [9] classified the different stages of maturity in four varieties of pepper seeds based on CF. After classification, the germination percentage and vigor of the seeds with low CF were shown to be the highest. Jalink et al. [10,11] used the magnitude of the CF to judge and grade seed maturity in cabbage. These authors found that the smaller the signal value, the higher the seed germination rate. Although CF is widely used and is highly portable, it still has the problem of high cost.
Digital image analysis technology is a low-cost and non-destructive technology. Images can be obtained by classical reflection technologies, such as optical and micrography methods, or by more modern technologies, such as thermal imaging, fluorescence and magnetic resonance [12][13][14][15][16]. This technology can obtain seed image information, extract the shape, color, and texture information of seeds, and is widely used in seed color feature determination [17,18], the improvement of seed vigor and quality monitoring [19][20][21][22], crop seed quality classification [23][24][25], variety identification [26,27], purity determination [28], quantifying intraspecific genetic diversity [29] and evaluating crop and seed maturity [30][31][32]. Digital image analysis technology is a relatively mature technology, and its application to the prediction of chlorophyll content in Astragalus seeds is highly feasible with excellent development prospects.
Therefore, it is important to develop a fast, non-destructive, low-cost and efficient method to measure the chlorophyll content of Astragalus seeds. In this study, we determined the chlorophyll content of Astragalus seeds and extracted the characteristic features of the seeds. Prediction models were then established by multiple linear regression (MLR) and multi-layer perceptron (MLP) methods, with the characteristic features as input factors and the chlorophyll content as the output parameter. A single-feature prediction model using the R or G value was also attempted, to provide a new idea for the prediction of chlorophyll content in Astragalus seeds.

Materials
Seeds of A. mongholicus were collected from the Anguo market in Hebei Province in 2020.

Seed Scanning and Extraction of Phenotypic Features
The Astragalus seeds were divided into two groups based on color: black-brown and yellow. The seeds of the two groups were then mixed in certain proportions to give a total of 50 groups, with 5 g of seeds in each group. A Scanmaker i360/i460 scanner was used for group scanning at a resolution of 300 dpi. The images were saved in lossless TIFF format. The Phenoseed automatic extraction system, which was jointly developed by the Seed Science and Technology Research Center of China Agricultural University (Beijing, China) and Nanjing Zhinong Yunxin Big Data Technology Co., Ltd. (Nanjing, China), was used to extract seed phenotypic features. The color features included R, G, and B (the red, green, and blue primary colors of light), L (luminosity), a (range from red to green), b (range from blue to yellow), hue, saturation, value, gray, and standard deviation, for a total of 20 color features. The average seed color feature of each group was taken.
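As a rough illustration of what averaging color features over a seed region involves, the following Python sketch computes a few of the listed features from a list of RGB pixels. It is not the Phenoseed implementation, which is not described here; the function name, the dictionary keys other than R_mean, G_mean, B_mean and Gray_mean, and the BT.601 gray weights are assumptions made for the example.

```python
import colorsys
from statistics import mean


def mean_color_features(pixels):
    """Average color features for one seed from (R, G, B) pixels in 0-255.

    Hypothetical helper: the real extraction system may define these
    features differently.
    """
    r = mean(p[0] for p in pixels)
    g = mean(p[1] for p in pixels)
    b = mean(p[2] for p in pixels)
    # Hue, saturation and value of the mean color (colorsys expects 0-1 inputs).
    hue, saturation, value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Gray from the ITU-R BT.601 luma weights; this weighting is an assumption.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return {
        "R_mean": r,
        "G_mean": g,
        "B_mean": b,
        "hue": hue,
        "saturation": saturation,
        "value": value,
        "Gray_mean": gray,
    }


# Two made-up pixels standing in for a segmented seed region.
features = mean_color_features([(80, 60, 40), (100, 80, 60)])
print(features["R_mean"], features["G_mean"], features["B_mean"])
```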
In order to explore the relationship between the R and G values and chlorophyll content, 5000 Astragalus seeds were selected for single-seed scanning and seed phenotype feature extraction. The frequency histogram of the R and G values of the 5000 Astragalus seeds is shown in Figure 1. We divided 2500 seeds into five groups based on their R values: R ≤ 60, 60 < R ≤ 70, 70 < R ≤ 80, 80 < R ≤ 90, R > 90. The remaining 2500 seeds were divided into five groups based on their G values: G ≤ 55, 55 < G ≤ 60, 60 < G ≤ 65, 65 < G ≤ 70, G > 70. Each group of seeds weighed 2 g.
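The grouping rule above can be sketched in a few lines of Python. The boundary values are those given in the text; the function and constant names are illustrative only.

```python
from bisect import bisect_left

# Group boundaries taken from the text.
R_EDGES = [60, 70, 80, 90]
G_EDGES = [55, 60, 65, 70]


def group_index(value, edges):
    """Return the group number 0..len(edges) for a color value.

    Group 0 is value <= edges[0]; the last group is value > edges[-1].
    Each upper boundary belongs to the lower group, matching intervals
    such as 60 < R <= 70.
    """
    return bisect_left(edges, value)


# Example: assign a few hypothetical per-seed R values to groups.
for r_value in (58.2, 60, 75.4, 90, 93.1):
    print(r_value, group_index(r_value, R_EDGES))
```

Using `bisect_left` rather than `bisect_right` is what places a value equal to a boundary (for example R = 70) in the lower group.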

Determination of Chlorophyll Content
The chlorophyll content of A. mongholicus seeds in each group was determined by a FluoMini Pro optical chlorophyll fluorescence (CF) detector provided by Beijing Youkede Agricultural Technology Co., Ltd. (Beijing, China), six times for each group, and the average value was taken.

Standard Germination Test
The two sets of five seed groups, sorted by R value and by G value, were subjected to a standard germination test. The seeds were germinated on paper at 25 °C with 24 h illumination. There were 100 seeds per replicate and three replicates per group. The seeds were observed for seven days, and moldy or rotten seeds were removed. On the 7th day of germination, the germination percentage, mildew percentage, number of abnormal seeds, percentage of dead seeds, percentage of hard seeds, and percentage of high-quality seeds were recorded. Many studies have shown that the quality of hard seeds is better than that of non-hard seeds [33][34][35]. Therefore, in this study we used the following formula: High-quality seed percentage (%) = Germination percentage (%) + Hard seed percentage (%)
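As a small worked example of the formula, the sketch below computes the high-quality seed percentage for one replicate of 100 seeds. The seed counts are invented for illustration and are not data from this study.

```python
def high_quality_percentage(germinated, hard, total=100):
    """High-quality seed % = germination % + hard seed %."""
    germination_pct = 100 * germinated / total
    hard_pct = 100 * hard / total
    return germination_pct + hard_pct


# Hypothetical replicate: 60 germinated seeds and 35 hard seeds out of 100.
print(high_quality_percentage(germinated=60, hard=35))  # 95.0
```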

Data Analysis
All data were processed and analyzed using Microsoft Office Excel 2019 and IBM SPSS statistics 23.
All color features, principal component factors extracted from all color features, all color average features, principal component factors extracted from all color average features, R_mean, G_mean, B_mean, and Gray_mean were independent variables, chlorophyll content was the dependent variable, and the MLR and MLP models were established.
A correlation analysis between each of the color features and chlorophyll content was performed. Two features, R_mean and G_mean, were used to predict the chlorophyll content of Astragalus seeds.

Principal Component Analysis (PCA)
PCA is a linear dimension-reduction method that condenses multiple indicators into a small number of composite indicators, so that as few variables as possible retain as much of the original information as possible [36].
In this study, we used varimax (maximum variance) rotation and the regression method for computing the factor scores. PCA was performed on all color features and on all color average features to simplify the model.

Multilayer Perceptron Network (MLP)
MLP is one of the most common neural network topologies. Its structure resembles a set of cascaded perceptrons that maps a group of input vectors to a group of output vectors. The input and output are connected through multiple weighted layers, which gives the network strong self-learning, adaptive, associative-memory, and parallel-processing capabilities [37,38]. Figure 2 shows the MLP topology used in this study. An MLP network with two hidden layers was selected. The hidden layers used the hyperbolic tangent activation function in SPSS, and the output layer used the softmax activation function in SPSS. The ratio of training set to test set was 7:3; that is, there were 35 samples in the training set and 15 samples in the test set.
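The 7:3 partition of the 50 samples can be illustrated with a short Python sketch. SPSS performs its own random assignment, so the shuffling scheme and the fixed seed below are assumptions made only to keep the example reproducible.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and test lists."""
    rng = random.Random(seed)  # fixed seed for a repeatable illustration
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 50 sample indices split 7:3 gives 35 training and 15 test samples.
train, test = split_train_test(range(50))
print(len(train), len(test))  # 35 15
```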

Multiple Linear Regression (MLR)
In regression analysis, multiple regression involves two or more independent variables. In practice, a single phenomenon is often associated with multiple factors, and it is more effective and practical to predict or estimate the dependent variable from the optimal combination of several independent variables than from only one. In this study, the stepwise method in SPSS was used to establish the MLR chlorophyll content prediction model. The setting of the model sample set was the same as in Section 2.5.2.
The specific experimental process used in this study is shown in Figure 3.
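To make the regression step concrete, here is a minimal ordinary least squares sketch in plain Python. It fits all supplied predictors at once and does not reproduce the stepwise variable selection performed by SPSS; the function names and the toy data are assumptions for illustration.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # add intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefficients, x):
    """Predicted response for one feature vector x."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], x))


# Toy data generated from y = 1 + 2*x1 - 3*x2 (not data from this study).
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a - 3 * b for a, b in X]
print(fit_mlr(X, y))
```

For real feature sets with many correlated color features, a numerically more robust solver (for example a QR-based routine from a numerical library) would be preferable to the normal equations.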

Principal Component Analysis (PCA)
All color features and all color average features were analyzed by principal component analysis, as shown in Table 1. Three principal component factors were extracted from all color features, and the cumulative variance contribution rate was 95.898%, indicating that the extracted factors can reflect ≥95% of the original information for all variables. Two principal component factors were finally extracted from all the color average features, and the cumulative variance contribution rate was 96.442%, which indicated that the extracted factors could reflect ≥96% of the original information for all variables.
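The cumulative variance contribution rate is the running share of total variance explained by the leading components. The sketch below shows the arithmetic, assuming the component eigenvalues are already available; the eigenvalues and the threshold used here are invented and are not the values behind Table 1.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative variance contribution (%) of components, largest first."""
    total = sum(eigenvalues)
    running = 0.0
    result = []
    for eigenvalue in sorted(eigenvalues, reverse=True):
        running += eigenvalue
        result.append(100 * running / total)
    return result


def components_needed(eigenvalues, threshold_pct):
    """Smallest number of components whose cumulative contribution reaches the threshold."""
    for count, pct in enumerate(cumulative_contribution(eigenvalues), start=1):
        if pct >= threshold_pct:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for three components.
print(cumulative_contribution([6.0, 3.0, 1.0]))  # [60.0, 90.0, 100.0]
print(components_needed([6.0, 3.0, 1.0], 85.0))  # 2
```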

Establishment and Verification of the MLR Model
We used all color features (a), principal component factors extracted from all color features (b), all color average features (c), principal component factors extracted from all color average features (d), R_mean, G_mean, B_mean, and Gray_mean (e) as input factors, and chlorophyll content as the output parameter. Five MLR models of chlorophyll content in Astragalus seeds were established by stepwise regression.
The residual analysis results are shown in Figure 4. The histograms of standardized residuals show that the standardized residuals of the five models approximately follow a normal distribution with a mean of 0 and a standard deviation of 1. The normal probability plots (P-P plots) show the points lying close to the diagonal of the first quadrant, which also indicates that the residuals are approximately normally distributed. In the scatter plots of standardized residuals against standardized predicted values, the spread of the residuals remained stable and did not change with the predicted values, so the assumption of homogeneity of variance was met.
Generally, the adjusted R² represents the degree to which the linear equation reflects the real data. The Durbin-Watson statistic is usually used to determine whether there is autocorrelation in the data. The variance inflation factor (VIF) is usually used to judge whether there is collinearity between independent variables. If the VIF is ≤10, collinearity between the independent variables is considered negligible and the model is more accurate.
The prediction results of the models are shown in Table 2. There were no significant differences among the five models, and the adjusted R² was about 0.98, which indicates that the linear equations fit the real data well. The Durbin-Watson statistics were around 1.8, close to 2, and the p-values were >0.05, which indicates that there was no autocorrelation in the data. Table 3 shows the prediction results of the five MLR models. The significance levels of the independent variables in each model were ≤0.05, indicating that these independent variables had a significant effect on the dependent variable, chlorophyll content. In addition, the VIF values indicate that there was no collinearity between the independent variables. Fifteen samples of Astragalus seeds (the test set) were then used to verify the five models, and the results are shown in Figure 5. The predicted chlorophyll contents were very close to the true values, and the R² was above 0.947 in all cases.
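For readers who want to see how the three diagnostics are computed, the following sketch gives their standard textbook definitions in plain Python. The statistics in Tables 2 and 3 were produced by SPSS; the input values below are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n samples and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def vif_from_r2(r2_j):
    """VIF of predictor j, where r2_j is the R² from regressing it on the other predictors."""
    return 1 / (1 - r2_j)


# Hypothetical inputs: R² = 0.98 with 35 training samples and 2 predictors.
print(adjusted_r2(0.98, n=35, p=2))
print(durbin_watson([0.3, -0.1, 0.2, -0.4, 0.1]))
print(vif_from_r2(0.5))  # 2.0
```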

Establishment and Verification of MLP Model
Similarly, using all color features (a), principal component factors extracted from all color features (b), all color average features (c), principal component factors extracted from all color average features (d), and R_mean, G_mean, B_mean, and Gray_mean (e) as input factors and chlorophyll content as the output parameter, five MLP models of chlorophyll content in Astragalus seeds were established. The R² of the model predictions was ≥0.968. The test set of 15 samples of Astragalus seeds was used to verify the five models, and the results are shown in Figure 6. The predicted chlorophyll contents were close to the true values, and the R² values were ≥0.943.

Prediction of Chlorophyll Content Using the R Value and G Value
Section 3.1.2 shows that when the MLR models were established by stepwise regression for the different feature sets, R_mean was included in the model. The Pearson correlation coefficient between R_mean and chlorophyll content was 0.980. We therefore explored the relationship between the R value and chlorophyll content further, with the aim of predicting chlorophyll content from R_mean alone. Because the G value might be expected to be more closely related to chlorophyll content, we also examined the relationship between G_mean and chlorophyll content and studied the effect of the G value on chlorophyll content prediction.

R and G Value Distribution
The frequency histogram of the R and G values of the Astragalus seed batch showed that the R values of most seeds were between 60 and 90 and the G values of most seeds were between 55 and 70, although the R and G values of some seeds fell outside these ranges (Figure 1). This showed that the R and G values within the seed batch were not uniform, which may be due to differences in chlorophyll content between seeds.
Based on the distribution of R and G values, the seed batch was divided into five groups. The boundaries of the five subsamples were set at R values of 60, 70, 80 and 90, and at G values of 55, 60, 65 and 70. The color difference between the seeds of the five subsamples was visible to the naked eye (Figure 7). The higher the R and G values, the lighter the seed coat color. The color difference suggests that the chlorophyll may be located on the outside of the seed coat.

Prediction of Chlorophyll Content Using the R Value and G Value
The results showed a linear relationship between the R value and the chlorophyll content of Astragalus seeds, with an R² of 0.969, so the R value could be used to predict chlorophyll content (Figure 8a). There was also a linear relationship between the G value and chlorophyll content, with an R² of 0.965, so the prediction was again satisfactory (Figure 8b).
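A single-feature linear fit of this kind, together with its R², can be sketched as follows. The data points are invented to show the calculation and are not the measurements plotted in Figure 8; the function name is illustrative.

```python
from statistics import mean


def fit_line(x, y):
    """Simple linear regression y = intercept + slope * x, returning (slope, intercept, R²)."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical group-mean R values and chlorophyll readings (arbitrary units).
r_values = [55, 65, 75, 85, 95]
chlorophyll = [110, 190, 320, 400, 510]
slope, intercept, r2 = fit_line(r_values, chlorophyll)
print(slope, intercept, r2)
```

For a one-predictor fit, R² equals the square of the Pearson correlation coefficient between the feature and the response, which is why a high correlation for R_mean translates directly into a high single-feature R².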

Figure 8. The linear relationships between the R value (a) and G value (b) and chlorophyll content.

Relationship between Seed Vigor and Chlorophyll Content of Astragalus Seeds
The chlorophyll contents of different Astragalus seeds could be predicted based on the R and G values. To further demonstrate the relationship between the vitality of Astragalus seeds and their chlorophyll content, we conducted germination tests on Astragalus seeds with known chlorophyll content, grouped by R value and by G value. The results are given in Table 4. The proportion of high-quality seeds among seeds with R > 80 was significantly lower than that among seeds with R ≤ 80. The germination percentage, hard-seed percentage, high-quality seed percentage, abnormal seedling percentage and mildew percentage of seeds with R > 90 were significantly lower than those of seeds with R ≤ 90; the germination percentage was only about 8.7% and the high-quality seed percentage only about 37.7%. The G values showed a similar pattern. The germination percentage and the high-quality seed percentage of seeds with G ≤ 55 were significantly higher than those of seeds with G > 55, and the high-quality seed percentage reached about 98.3%. Among the seeds with G > 55, those with G > 70 were significantly lower than the other seeds in all aspects, and their high-quality seed percentage was about 81.0%. Figure 9 is a box plot of the R values (a) and G values (b) of the different groups against their corresponding high-quality seed percentages. As can be seen from Figure 9, different high-quality seed percentages correspond to different ranges of R and G values.

Figure 9. … 3% high-quality seed percentage; (a) C: a seed population with 91.7 ± 2.5% high-quality seed percentage; (a) D: a seed population with 73.7 ± 0.6% high-quality seed percentage; (a) E: a seed population with 37.7 ± 15.0% high-quality seed percentage; (b) A: a seed population with 98.3 ± 2.1% high-quality seed percentage; (b) B: a seed population with 94.7 ± 0.6% high-quality seed percentage; (b) C: a seed population with 94.7 ± 1.2% high-quality seed percentage; (b) D: a seed population with 92.7 ± 1.2% high-quality seed percentage; (b) E: a seed population with 81.0 ± 2.6% high-quality seed percentage.
The lower the R and G values, the higher the percentage of high-quality seeds; that is, the lower the chlorophyll content, the higher the percentage of high-quality seeds (Figure 10). At the same time, this study also provided a reference for the selection of mature, viable Astragalus seeds.

Discussion
Seed maturity plays an important role in the production, processing, and marketing of Astragalus seeds. Chlorophyll content is not only related to photosynthesis, but also has an effect on crop yield. In seeds, the higher the chlorophyll content, the lower the maturity. Stated another way, seed maturity is relatively high when the chlorophyll content is low [39].
The results of our study show that digital image analysis technology can be applied to the rapid and non-destructive prediction of chlorophyll content in Astragalus seeds. In this study, MLR and MLP models were established for the rapid prediction of chlorophyll content, and we found that both models were very effective in predicting the amount of chlorophyll present in Astragalus seeds. The R² of the MLR model test set was ≥0.947, and the R² of the MLP model test set was ≥0.943. There were no significant differences between the five types of model or between the two modeling methods. Therefore, in practical application, the model can be established using only R_mean, G_mean, B_mean, and Gray_mean, which makes sample imaging more convenient and faster.
We then used another batch of mature seeds of Astragalus to verify the established model. The predicted R² of the training set was ~0.7, and the effect was not ideal. The high-maturity batches of Astragalus seeds were used to establish the model, and the training set R² was ~0.7. Two batches of Astragalus seeds were mixed and modeled, and the two batches were verified independently. The results showed that the predicted R² of
The lower the R and G values, the higher the percentage of high-quality seeds; that is, the lower the chlorophyll content, the higher the percentage of high-quality seeds (Figure 10). At the same time, this study also provided a reference for the selection of mature, viable Astragalus seeds.

Discussion
Seed maturity plays an important role in the production, processing, and marketing of Astragalus seeds. Chlorophyll content is not only related to photosynthesis but also affects crop yield. In seeds, the higher the chlorophyll content, the lower the maturity; in other words, seed maturity is relatively high when the chlorophyll content is low [39].
The results of our study show that digital image analysis technology can be applied to the rapid and non-destructive prediction of chlorophyll content in Astragalus seeds. MLR and MLP models were established for this purpose, and both were very effective in predicting the amount of chlorophyll present in Astragalus seeds. The R² of the MLR model test set was ≥0.947, and the R² of the MLP model test set was ≥0.943. There were no significant differences among the five types of model or between the two modeling methods. Therefore, in practical application, only R_mean, G_mean, B_mean, and Gray_mean need to be used to establish the model, which makes sample imaging more convenient and faster.
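As an illustration of the MLR approach described above, the following is a minimal sketch of fitting a linear model on the four mean color features and scoring it by R² on a held-out test set. It is not the implementation used in this study: the data are synthetic, the coefficients and noise level are invented, and Gray_mean is assumed to be a weighted sum of the R, G, and B means. Only the MLR case is sketched; the MLP model is not covered.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_mlr(features, target):
    """Ordinary least squares with an intercept column."""
    design = np.column_stack([np.ones(len(features)), features])
    # lstsq returns the minimum-norm solution, so it also copes with
    # collinear columns (here Gray_mean is a combination of R, G, B).
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_mlr(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# Synthetic stand-in data (assumption, not measurements from the study).
rng = np.random.default_rng(0)
n = 200
rgb = rng.uniform(60, 200, size=(n, 3))            # R_mean, G_mean, B_mean
gray = rgb @ np.array([0.299, 0.587, 0.114])       # assumed Gray_mean definition
features = np.column_stack([rgb, gray])
chlorophyll = 0.02 * rgb[:, 0] + 0.01 * rgb[:, 1] + rng.normal(0, 0.1, n)

train, test = slice(0, 150), slice(150, n)
coef = fit_mlr(features[train], chlorophyll[train])
r2_test = r_squared(chlorophyll[test], predict_mlr(coef, features[test]))
print(f"test-set R^2 on synthetic data: {r2_test:.3f}")
```

Scoring on a held-out set mirrors the test-set R² values reported above; the numerical value printed here reflects only the synthetic data.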
We then used another batch of mature Astragalus seeds to verify the established model. The predicted R² of the training set was ~0.7, which was not ideal. When the highly mature batches of Astragalus seeds were used to establish the model, the training set R² was also ~0.7. The two batches of Astragalus seeds were then mixed and modeled together, and each batch was verified independently. The predicted R² of immature Astragalus seeds was ~0.9, whereas that of highly mature Astragalus seeds was only ~0.6. This indicates that the model established in this study is suitable for predicting chlorophyll content in immature Astragalus seeds, but its predictions for highly mature Astragalus seeds are not ideal, which merits further investigation.
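The per-batch verification described above can be sketched as follows: a single model is fitted on the pooled batches, and R² is then computed separately for each batch. This is a hedged illustration on synthetic data, not the study's procedure or data; the R-value ranges, slope, and noise level are invented, and a single-feature linear fit stands in for the full model.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)

def make_batch(r_low, r_high, n):
    """Synthetic batch: chlorophyll rises linearly with R_mean plus noise."""
    r_mean = rng.uniform(r_low, r_high, n)
    chlorophyll = 0.02 * r_mean + rng.normal(0, 0.1, n)
    return r_mean, chlorophyll

# Assumed ranges: the mature batch has lower, more concentrated R values.
r_immature, chl_immature = make_batch(80, 200, 150)
r_mature, chl_mature = make_batch(60, 80, 150)

# Fit one line on the pooled data.
r_all = np.concatenate([r_immature, r_mature])
chl_all = np.concatenate([chl_immature, chl_mature])
slope, intercept = np.polyfit(r_all, chl_all, deg=1)

# Verify each batch independently with the pooled model.
r2_immature = r_squared(chl_immature, slope * r_immature + intercept)
r2_mature = r_squared(chl_mature, slope * r_mature + intercept)
print(f"immature batch R^2: {r2_immature:.2f}, mature batch R^2: {r2_mature:.2f}")
```

In this synthetic setup the batch with the narrower value range obtains the lower R² even though the noise level is identical in both batches, because R² is measured relative to the variance within each batch. Whether this contributes to the lower R² observed for highly mature seeds was not examined in this study.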
We also found that when the MLR model was established by stepwise regression on the different characteristic features, R_mean was included in the model. There was a linear relationship between the R value and chlorophyll content (R² = 0.969), so the R value could be used to predict chlorophyll content. We also explored the relationship between the G value and chlorophyll content in Astragalus seeds and again found a linear relationship (R² = 0.965), although the prediction was slightly worse than that obtained with the R value. The reason for this may be that the chlorophyll content of the seeds is mainly reflected in differences in the G values, but the G values within the seed groups were very similar, which leads to a concentrated G value distribution and densely spaced sorting intervals. The R value therefore performed better, and in practical application, using the R value to predict chlorophyll content should give better results. In addition, another batch of mature Astragalus seeds was scanned to extract the R value of each seed. The R values of mature Astragalus seeds were lower and more concentrated, which is consistent with the previous results.
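The stepwise selection mentioned above can be illustrated with a simple forward-selection loop. This is a sketch under stated assumptions rather than the procedure used in this study: the entry criterion (a minimum gain in adjusted R², set by the hypothetical parameter `min_gain`) is an assumption, since the selection criterion is not specified here, and the data are synthetic with chlorophyll depending on R_mean only.

```python
import numpy as np

def fit_r2(x, y):
    """R^2 of an ordinary least-squares fit of y on the columns of x."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_select(features, y, names, min_gain=1e-3):
    """Greedily add the feature that most improves adjusted R^2."""
    selected, remaining = [], list(range(features.shape[1]))
    best_score = -np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            r2 = fit_r2(features[:, cols], y)
            scores.append((adjusted_r2(r2, len(y), len(cols)), j))
        score, j = max(scores)
        if score - best_score < min_gain:   # no worthwhile improvement
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return [names[j] for j in selected]

# Synthetic stand-in data (assumption): chlorophyll driven by R_mean.
rng = np.random.default_rng(2)
n = 200
rgb = rng.uniform(60, 200, size=(n, 3))
gray = rgb @ np.array([0.299, 0.587, 0.114])   # assumed Gray_mean definition
features = np.column_stack([rgb, gray])
names = ["R_mean", "G_mean", "B_mean", "Gray_mean"]
chlorophyll = 0.02 * rgb[:, 0] + rng.normal(0, 0.1, n)

selected = forward_select(features, chlorophyll, names)
print("selected features:", selected)
```

Statistical packages usually base stepwise entry and removal on F-tests or information criteria; the adjusted R² threshold is used here only to keep the sketch short.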
To further verify the relationship between seed vigor and chlorophyll content in Astragalus, we performed germination tests on Astragalus seeds with known chlorophyll contents that were grouped according to their R and G values. The lower the R and G values, the lower the chlorophyll content and the higher the percentage of high-quality seeds. Therefore, our results also provide a reference for the selection of mature Astragalus seeds.
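The grouping step described above can be sketched as sorting seeds into ordered groups by a single color value and computing the percentage of high-quality seeds per group. The sketch below is illustrative only: the group boundaries actually used are not given here, so equal-sized quantile groups are assumed, and both the R values and the quality outcomes are invented placeholders.

```python
import numpy as np

def group_by_value(values, n_groups=5):
    """Assign each seed to one of n_groups ordered groups (0 = lowest values).

    Group boundaries are quantiles of the values (an assumption of this sketch).
    """
    inner_quantiles = np.linspace(0, 1, n_groups + 1)[1:-1]
    edges = np.quantile(values, inner_quantiles)
    return np.digitize(values, edges)

def percent_high_quality(groups, is_high_quality, n_groups=5):
    """Percentage of high-quality seeds within each group."""
    return np.array([
        100.0 * is_high_quality[groups == g].mean() for g in range(n_groups)
    ])

# Placeholder data: ten seeds with increasing R values and invented outcomes.
r_values = np.arange(10.0)
is_high_quality = np.array(
    [True, True, True, False, True, False, False, False, False, False]
)

demo_groups = group_by_value(r_values)
demo_percent = percent_high_quality(demo_groups, is_high_quality)
print(demo_groups.tolist())
print(demo_percent.tolist())
```

With real data, `is_high_quality` would come from the germination test of each seed, and the same two functions could be applied to G values in place of R values.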
In this study, multi-feature modeling at the population level and two single features, the R and G values, were used to predict chlorophyll content in Astragalus seeds. The multi-feature models were suitable for predicting the chlorophyll content of seed populations in scenarios such as market surveys and government management. The single-feature models were suitable for predicting the chlorophyll content of single Astragalus seeds. Using the single-seed R values, we can predict the chlorophyll content of individual seeds, which is of great significance for improving the quality of seeds in seed batches.
To summarize, digital image analysis technology can be used to determine the relevant features and specific parameters of Astragalus seeds. MLR, MLP, and other modeling algorithms, combined with single-feature predictions based on the R and G values, can promote the automation and accuracy of chlorophyll content prediction in Astragalus seeds, which is of great significance for increasing the economic benefits of Astragalus seeds.

Patents
The research group has applied for a patent entitled "The construction method and application of a chlorophyll content identification model of Astragalus seeds".