Modeling the Essential Oil and Trans -Anethole Yield of Fennel ( Foeniculum vulgare Mill. var. vulgare ) by Application Artiﬁcial Neural Network and Multiple Linear Regression Methods

: Foeniculum vulgare Mill. (commonly known as fennel) is used in the pharmaceutical, cosmetic, and food industries. Fennel widely used as a digestive, carminative, galactagogue and diuretic and in treating gastrointestinal and respiratory disorders. Improving low heritability traits such as essential oil yield (EOY%) and trans -anethole yield (TAY%) of fennel by direct selection does not result in rapid gains of EOY% and TAY%. Identiﬁcation of high-heritable traits and using efﬁcient modeling methods can be a beneﬁcial approach to overcome this limitation and help breeders select the most advantageous traits in medicinal plant breeding programs. The present study aims to compare the performance of the artiﬁcial neural network (ANN) and multilinear regression (MLR) to predict the EOY% and TAY% of fennel populations. Stepwise regression (SWR) was used to assess the effect of various input variables. Based on SWR, nine traits—number of days to 50% ﬂowering (NDF50%), number of days to maturity (NDM), ﬁnal plant height (FPH), number of internodes (NI), number of umbels (NU), seed yield per square meter (SY/m 2 ), number of seeds per plant (NS/P), number of seeds per umbel (NS/U) and 1000-seed weight (TSW)—were chosen as input variables. The network with Sigmoid Axon transfer function and two hidden layers was selected as the ﬁnal ANN model for the prediction of EOY%, and the TanhAxon function with one hidden layer was used for the prediction of TAY%. The results revealed that the ANN method could predict the EOY% and TAY% with more accuracy and efﬁciency (R 2 of EOY% = 0.929, R 2 of TAY% = 0.777, RMSE of EOY% = 0.544, RMSE of TAY% = 0.264, MAE of EOY% = 0.385 and MAE of TAY% = 0.352) compared with the MLR model (R 2 of EOY% = 0.553, R 2 of TAY% = 0.467, RMSE of EOY% = 0.819, RMSE of TAY% = 0.448, MAE of EOY% = 0.624 and MAE of TAY% = 0.452). Based on the sensitivity analysis, SY/m 2 , NDF50% and NS/P were the most important traits to predict EOY% as well as SY/m 2 , NS/U and NDM to predict of TAY%. The results demonstrate the potential of ANNs as a promising tool to predict the EOY% and TAY% of fennel, and they can be used in future fennel breeding programs.


Introduction
Fennel (Foeniculum vulgare Mill. var. vulgare), which belongs to the Apiaceae family, is an open-pollinated plant, originating from the Mediterranean regions where it is possible to observe high genetic diversity. Fennel essential oil is widely used in pharmaceutical, food, and cosmetic industries. Trans-anethole is the main compound of the fennel essential oil, and the highest value of this compound is existing in seeds of fennel [1]. Trans-anethole is known as a flavoring agent in the food industry and in the production of perfume, as well as an anti-bloating compound in traditional medicine. It is an effective substance in the taste and smell of fennel [2][3][4].
Medicinal plants have a special place in traditional medicine folk and are used as treatment for many diseases [5,6]. Essential oils are naturally occurring in medicinal and aromatic plants which are rich in valuable biochemical compounds with high bioactivities such as antibacterial, antioxidant and phytotoxic activity [7][8][9][10].
Classical breeding methods, including the selection and release of elite cultivars, as well as the recognition of the agronomic traits as the suitable criteria to use in breeding programs, are being remembered as fast, easy and reliable methods to introduce superior fennel cultivars. Screening the best indicators to use in essential oil yield (EOY%) improvement plans and finding the correlation between EOY% and its components is the first priority [11][12][13]. Piccaglia and Marotti [14] reported that biomass weight and the number of umbels per plant are positively related to the essential oil content (R = 0.651 and 0.569, respectively) [14]. Cosge et al. [15] have elucidated that EOY% positively correlated with the 1000-grain weight trait [15]. The phenotypic expression process of quantitative and polygenic traits such as EOY% is non-linear, intricate, complex and time-variant. This intricacy is due to the high diversity within and between populations, also due to environmental impacts [16][17][18]. Therefore, because of the low heritability of EOY%, direct selection to improve this characteristic may lead to low genetic gain [19], whereas the indirect selection of EOY% through high-heritable and related characters with EOY% directly or indirectly affects the EOY% via positive or negative effects of other traits [20,21]. So far, some modeling studies have been used for EOY% prediction and researchers have studied the EOY% using physiological, phenological, morphological and phytochemical properties of plant by application of parametric analysis such as path analysis (PA), stepwise regression (SWR) and other techniques [11,14,15,22]. Previous studies have used linear procedures such as correlation analysis and multiple linear regressions (MLR), in which a linear correlation among variables is presumed. Nevertheless, linear methods were inadequate and could not really explain the interactions between variables and EOY% [23][24][25]. These complex relationships require non-linear methods such as adaptive neuro-fuzzy inference system (ANFIS), Bayesian classification (BC), artificial neural networks (ANNs) and genetic expression programming (GEP) to overcome the drawbacks of linear methods and find an accurate relationship among the studied traits [26][27][28][29][30]. Nonlinear nonparametric machine learning algorithms, such as ANN, have great potential in yield component analysis and indirect selection of highly complex quantitative traits of plants, which are strongly affected by several genes, the environment and their interaction (G × E) [31].
The ANN topology is used to solve complex systems which it tries to imitate into numerical models [23]. ANN models are classified according to their structure, neurons type, etc. Furthermore, according to the training convergence in an ANN model, different algorithms can be used [34]. Multi-layer perceptron (MLP) is one of the most commonly used ANNs in biological studies [25,[35][36][37][38][39]. An MLP is a feed-forward ANN model that contains an input layer, one or more hidden layers and an output layer. In each MLP, multiple layers of nodes in a directed graph are fully connected to the next one and each node (except for the input nodes) is a neuron with a nonlinear activation function [40].
According to the literature review, there is no report on forecasting fennel EOY% and trans-anethole yield (TAY%) using ANN methods, and predicting these traits based on the characteristics and parameters affecting them using an effective ANN method seems necessary to facilitate the process of breeding such complex traits. Therefore, the aims of the current project were to (1) model and predict the most important ingredient in the essential oil of fennel using an artificial neural network, (2) compare the predicted results with the results of conventional regression-based method and (3) determine the most important selection criteria for EOY% and TAY% in different fennel populations using sensitivity analysis. The developed model could be helpful to improve the TAY% and EOY% of different fennel subspecies/varieties.

Plant Material Source and Recorded Traits
In the present study, the seeds of 16 populations of fennel (Foeniculum vulgare Mill. var. vulgare) from Iran, 2 populations from Turkey and 2 populations from Germany (Table 1) were collected and cultured in the experimental field section at the Faculty of Agriculture, University of Tabriz, Iran (46 • 17 N, 37 • 5 E), with an altitude of 710 m during 2017-2018. Plant authentication was performed by using the voucher specimens (Table 1) available at the herbarium of the University of Tabriz, Tabriz, Iran. The soil texture was loamy clay with a pH value of 7.5 and less than 1% organic matter. The field was not under cultivation for any plant during the past year. The seeds of each population were sown in a plot (4 m × 0.5 m) as a randomized complete block design (RCBD) with three replications. The fertilizers used in this study were 70 kg N, 40 kg P and 25 kg K per hectare. Essential oil yield (EOY%), trans-anethole yield (TAY%), and different agro-morphological traits including number of days to germination (NDG), number of days to 50% flowering (NDF50%), number of days to 100% flowering (NDF100%), number of days to maturity (NDM), initial plant height (IPH) (plant height at the time of the first inflorescence emergence), final plant height (FPH) (plant height at harvest time), number of stems (NS), stem diameter (SD), number of internodes (NI), length of the first internode (LFI), length of the longest internode (LLOI), length of the last internode (LLAI), length of the peduncle (LP), number of umbels (NU), biomass per square meter (B/m 2 ), 1000-seed weight (TSW), seed yield per plant (SY/P), seed yield per square meter (SY/m 2 ), number of seeds per plant (NS/P), number of seeds per umbel (NS/U) and harvest index (HI) were randomly recorded from 15 plants per plot. An analysis of variance (ANOVA) was conducted to assess the significant statistical differences among evaluated fennel populations for the studied characteristics. A normality test was conducted with SAS software before the analysis of variance. The means of significant differences of traits (average of two years) were used for the statistical analysis (Table 2).

Isolation of Essential Oils and GC/MS Analysis
For the isolation of essential oils, 100 g of mature seeds of each population were subjected to hydro-distillation using a Clevenger-type apparatus for 3 h and the collected essential oil was dried over anhydrous sodium sulfate and kept at 4 • C until analysis. Some physical characteristics of mature seed including moisture content and average length, thickness and density of selected seeds were equal to 8%, 5.5 mm, 1.6 mm and 410 Kg/m 3 , respectively. Chemical compositions of essential oils were analyzed by an Agilent 7890A Network GC system pooled with Agilent 5975C Network with Triple-Axis mass detector. The GC analysis was carried out on the Agilent 7890A Network GC system equipped with a splitless model injector (with 1.0 µm volume and 250 • C temperature). The carrier gas was helium with a flow rate of 1.1 mL/min and the capillary column used was HP 5-MS (30 m × 0.25 mm, film thickness 0.25 µm). The column pressure was fixed to 56,054.38 Pa. The oven temperature was initially kept at 50 • C for 2 min after injection and then increased to 250 • C with a rate of 6 • C/min heating ramp and kept constant at 250 • C for 4 min. The ionization voltage and mass range were 70 eV and 34-500 m/z, respectively. The temperatures 280 • C and 250 • C were used as anion source and interface temperatures, respectively. Constituents of the essential oils were recognized based on their retention time and mass spectra pattern with related available data or with the Wiley library and literature. Percentages of each compound were calculated from the given GC peak area and these data were used for quantification purposes.

. Input Variables Selection
To ensure a realistic model, only a portion of independent variables was carefully chosen. At the beginning of this process, the relationship between the dependent and independent variables was found using Pearson's correlation. The important input variables were selected based on the stepwise regression (SWR) analysis results. Pearson correlation and SWR analyses were conducted using SAS ® software.

Multiple Linear Regression
Multiple linear regression (MLR) (stepwise method) was used and for each dependent variable (EOY% and TAY%), the desired models were fixed. All values for the independent variables X ( Table 2) are related to the dependent value of the variable Y. The general equation is as follows (Equation (1)): where y i is EOY% or TAY%, β 0 + β n are coefficients of regression, x 1 − x n are input variables and ε is an error associated with the ith observation. Stepwise regression was applied to estimate the MLR coefficients. The MLR analysis was carried out using SAS ® software.

Artificial Neural Network
The computation of ANN was conducted using Neuro-Solutions software version 5.07 software package from NeuroDimension Inc. (http://www.Neurosolutions.com, accessed on 22 November 2021). The variables including EOY% and TAY% were used as the dependent and the other traits were defined as independent variables. Two different ANN models were established for each EOY% and TAY%. The training and testing of ANN and MLR were carried out based on the recorded traits from 15 samples of each of the 20 fennel populations over two years. Therefore, the field experiment dataset was based on the 15 samples of 20 fennel populations. All the data were randomly divided into three subsets: 65% for training, 20% for network test and 15% for validation. This classification was based on (i) the results of previous studies with the same number of data, and (ii) trial and error and comparing the modeling results with different ratios. Descriptive statistical analysis of the measured traits in two years is shown in Table 2.
For the efficient ANN analysis and to avoid bias estimation due to differences in units of input variables, all data were normalized and transferred into values between −1 and +1 for hyperbolic tangent and 0 and 1 for sigmoid transfer functions [25] using Equation (2).
where x i is the original data, x norm is the normalized input or output values and x max and x min are the maximum and minimum values of the resultant variable, respectively. The three main input, hidden and output layers are essential for building the topology of a neural network system. In this study, the output of the network is assumed by Equation (3).
where y t is the network output (essential oil), n and m are the number of hidden nodes and number of input nodes, respectively, and f shows the transfer function. β ij {i = 1, 2, . . . , m; j = 0, 1, . . . , n} are the weights from the input to hidden nodes, α j {j = 0, 1, . . . , n} are the vectors of weights from the hidden to the output nodes and α 0 and β 0 j denote the weights of arcs leading from the bias terms, which always are equal to 1. The feed-forward multilayer perceptron (MLP) architecture with three layers and a Back-Propagation (BP) training algorithm along with the Levenberg-Marquardt, Momen-Agriculture 2021, 11, 1191 6 of 17 tum and Conjugate Gradient learning algorithms were applied in the present study. The most appropriate topology in various numbers of hidden layers (1)(2)(3)(4) and neurons related to each layer was determined by trial and error tests. Several activation functions including Tangent Hyperbolic Axon, Linear Tangent Hyperbolic Axon, Sigmoid Axon and Linear Sigmoid Axon were tested with the aim of finding the equation with high capability in both hidden and output layers.

Performance and Sensitivity Analysis
Three statistical quality parameters, mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R 2 ), were used to compare the performance of the developed ANN with different transfer functions and hidden layers and MLR models for estimating the desired output of EOY% and TAY% according to Equations (4)-(6), respectively.
where n is the number of data, y i is the observed values, y j represents the predicted values and the bars denote the mean of the variable. High values of R 2 and low values of RMSE and MAE indicate the better performance of the ANN and MLR model. After developing the final ANN model, a sensitivity test was applied for choosing the most influential input variables on the EOY% and TAY% as the outputs. For this, the dataset was run without any input variables (i.e., SY, NS, NDM, TSW, NU and NI), and the models' performance was assessed using R 2 , RMSE and MAE. Neuro-Solutions software (version 5.0) was used for the ANN model developing, evaluating and sensitivity analysis.

Selection of Input Variables
Since the input variables have a significant effect on the weighted coefficients and the final architecture of the model, the selection of these variables is a very important step for the development of the model [35]. To this end, Pearson's correlation coefficient was used to consider the relationship between dependent variables (EOY% and TAY%) and the other characteristics (as the independent variables) (Figures 1 and 2). EOY% had the highest positive correlation with HI (R = 0.  [41] also reported a negative significant correlation between essential oil yield with plant height and flowering date of Iranian fennel accessions. Overall, the results of the correlation analysis showed that HI, SY/m 2 , NU, SY/P and NS/P are the most important parameters to determine essential oil yield in fennel populations. As reported by Bahmani et al. [11,22] and Cosges [15], there is a significant correlation between the essential oil content of fennel and length of the peduncle, stem diameter, plant height, the weight of dry biomass, number of nodes, number of leaves, length of middle internodes, number of inflorescences and 1000-seed weight [11,15,22]. sential oil yield in fennel populations. As reported by Bahmani et al. [11,22] and Cosges [15], there is a significant correlation between the essential oil content of fennel and length of the peduncle, stem diameter, plant height, the weight of dry biomass, number of nodes, number of leaves, length of middle internodes, number of inflorescences and 1000-seed weight [11,15,22].  Table 2. ns, * and **, non-significant, significant at the 0.05 and 0.01 probability level, respectively. TAY% (as a second dependent variable) had the strongest positive correlation with HI (R = 0.823) followed by SY/P (R = 0.693), SY/m 2 (R = 0.693), EOY% (R = 0.590), NS/P (R = 0.573), NU (R = 0.572) and LFI (R = 0.456), as well as a negative significant correlation with LLOI ( Figure 2). The correlation between various traits can be positively or negatively affected by other variables and these low coefficients can significantly reduce the capability of the correlation analysis to select the input variables [42,43]. However, there are parameters than other morphological and yield components that affect the oil yield and trans-anethole content of fennel. The correlation of climatic data (temperature) with oil yield and trans-anethole content of Iranian fennel accessions was assessed and a negative correlation between oil yield and Tmax and a positive correlation between trans-anethole and Tmax (r = 0.459) were reported [41]. These results indicate the importance of environmental parameters in assessing the correlation analysis of fennel populations. Incorporating such data into the model can increase the decision-making power and accuracy of the predictive model. In addition to perform the correlation analysis, stepwise regression (SWR) analysis was employed in this study to optimize the number of input variables [43].  Table 2. ns, * and **, non-significant, significant at the 0.05 and 0.01 probability level, respectively. Based on the SWR (Tables 3 and 4), nine traits out of eleven (SY/m 2 , NDF 50%, NS/P, NU, FPH, NS/U, NDM, TSW and NI) were entered into the models as the most suitable input variables (the dependent variables were EOY% and TAY%). The number of umbels is an important yield component characteristic that can affect grain yield and subsequently the essential oil yield of fennel populations. This characteristic affect both EOY% and TAY% of the evaluated fennel population of the present study. These results are consistent with the results of Sefidan et al. [44] and Kalleli et al. [45].   Figure 2). The correlation between various traits can be positively or negatively affected by other variables and these low coefficients can significantly reduce the capability of the correlation analysis to select the input variables [42,43]. However, there are parameters than other morphological and yield components that affect the oil yield and trans-anethole content of fennel. The correlation of climatic data (temperature) with oil yield and trans-anethole content of Iranian fennel accessions was assessed and a negative correlation between oil yield and T max and a positive correlation between transanethole and Tmax (r = 0.459) were reported [41]. These results indicate the importance of environmental parameters in assessing the correlation analysis of fennel populations. Incorporating such data into the model can increase the decision-making power and accuracy of the predictive model. In addition to perform the correlation analysis, stepwise regression (SWR) analysis was employed in this study to optimize the number of input variables [43].
Based on the SWR (Tables 3 and 4), nine traits out of eleven (SY/m 2 , NDF 50%, NS/P, NU, FPH, NS/U, NDM, TSW and NI) were entered into the models as the most suitable input variables (the dependent variables were EOY% and TAY%). The number of umbels is an important yield component characteristic that can affect grain yield and subsequently the essential oil yield of fennel populations. This characteristic affect both EOY% and TAY% of the evaluated fennel population of the present study. These results are consistent with the results of Sefidan et al. [44] and Kalleli et al. [45]. Table 3. Stepwise regression analysis for essential oil yield as dependent variable.
Step  Although HI and LFI were significantly correlated with EOY% and TAY%, according to the SWR analysis, these parameters could not be recognized as appropriate input variables (Tables 3 and 4). MLR results revealed the weakness of the correlation method, which can be due to the indirect positive and negative effects of other traits on the correlation between dependent traits (EOY% and TAY%) and other independent traits [43]. Bahmani et al. [11] also applied SWR to identify to most important characteristics affecting the essential oil content of Iranian fennel and reported low partial R 2 values for inserted variables in the model (partial R 2 = 0.32, 0.06, 0.03, 0.02 and 0.02 for number of leaves, length of peduncle, plant height and days to 50% flowering, respectively). The low estimated partial R 2 values for all independent variables indicate the insufficient efficiency of the linear regression model in interpreting the relationships between independent and dependent variables. Therefore, a non-linear model is needed to better interpret these relationships.

Prediction of Dependent Variables Using MLP/ANN Model
Several factors as numbers of the hidden layer (s) and their neurons (nods) and determination of the transfer function are important for the selection of input variables. They can be determined using trial and error [46]. There are four different transfer functions to running supervised neural networks, namely Sigmoid Axon, Linear Sigmoid Axon, Tangent Hyperbolic Axon and Linear Tangent Hyperbolic Axon, to select the proper transfer function ( Table 5).
As presented in Table 6, the lowest values of MAE and RMSE and the highest R 2 values were obtained by the Sigmoid Axon function in both training and testing stages for the prediction of EOY%, as well as the TanhAxon transfer function to predict TAY%. Table 5. Summary of the components of the neural networks used to predict essential oil and trans-anethole yield of fennel populations.

50-2000 Linear Sigmoid Axon
Marquardt TanhAxon  Momentum  Liner TanhAxon Conjugate Gradient The results of the linear transfer functions are not presented to maintain the clarity of the useful and applied results. In the testing and training phases, linear transfer functions reduced the efficiency of the models. These functions apply a simple linear conversion to the processed input variables and transfer it to the output stage, whereas nonlinear and TanhAxon functions produce outputs in the ranges 0 to 1 and −1 to 1, respectively [25]. Similar to the present study, Sigmoid Axon and TanhAxon functions have been applied in previous studies [24,25,32,35]. This is probably due to the high capability of these functions to justify nonlinear changes compared to the other functions. Various ANN models were implemented by selected transfer functions, with 1-5 hidden layers and 1-20 nodes per layer. As shown in Table 6, for the prediction of EOY%, the ANN model with two hidden layers provided the best performances in the training phases (R 2 = 0.953, RMSE = 0.522, and MAE = 0.375) and testing (R 2 = 0.929, RMSE = 0.544 and MAE = 0.385). Therefore, the results of Tables 5 and 6 reveal that the best EOY% (essential oil) predictive model consisted of an input layer with 11 input variables (NDF50%, NDF100%, NDM, FPH, NI, NU, SY/P, SY/m 2 , NS/P, NS/U, and TSW) and two hidden layers with nine and seven neurons in each layer, i.e., the 11-9-7-1 structure (Figure 3). The TanhAxon transfer function, Momentum learning algorithm and one hidden layer (with 11-10-1 structure) were the best parameters in the ANN model to predict TAY% of fennel (Table 6). This topology had the minimum amounts of RMSE and MAE and the highest coefficient of determination (Table 6). Levenberg-Marquardt back-propagation and Logsig and Tansig transfer functions for hidden and output layers algorithm and the number of 10 neurons in the hidden layer have been reported as best parameters of an ANN for the modeling and optimization of anethole ultrasound-assisted extraction from fennel seeds [47]. One of the main objectives of ANN modeling studies is to achieve a simple model with the least number of hidden layers and neurons and the highest performance values [30,32]. Niazian et al. [25] reported an ANN model with a 4-4-1 structure, for the prediction of grain yield in ajowan (Trachyspermum ammi L.) belonging to the Apiaceae family [25]. These results will be useful to fit an excellent model structure of ANN in future research on the Apiaceae family.
of the main objectives of ANN modeling studies is to achieve a simple model with the least number of hidden layers and neurons and the highest performance values [30,32]. Niazian et al. [25] reported an ANN model with a 4-4-1 structure, for the prediction of grain yield in ajowan (Trachyspermum ammi L.) belonging to the Apiaceae family [25]. These results will be useful to fit an excellent model structure of ANN in future research on the Apiaceae family. An insufficient number of epochs can decrease the ANN performance and too many epochs can increase the risk of network overtraining and subsequent memorization [25]. To minimize the over-training and memorization, a pretest using two hidden layers and various numbers of epochs  was conducted and the convergence point between training and validation was considered as the completion of training time to avoid overtraining (Figure 4). An insufficient number of epochs can decrease the ANN performance and too many epochs can increase the risk of network overtraining and subsequent memorization [25]. To minimize the over-training and memorization, a pretest using two hidden layers and various numbers of epochs  was conducted and the convergence point between training and validation was considered as the completion of training time to avoid overtraining ( Figure 4). least number of hidden layers and neurons and the highest performance values [30,32]. Niazian et al. [25] reported an ANN model with a 4-4-1 structure, for the prediction of grain yield in ajowan (Trachyspermum ammi L.) belonging to the Apiaceae family [25]. These results will be useful to fit an excellent model structure of ANN in future research on the Apiaceae family. An insufficient number of epochs can decrease the ANN performance and too many epochs can increase the risk of network overtraining and subsequent memorization [25]. To minimize the over-training and memorization, a pretest using two hidden layers and various numbers of epochs  was conducted and the convergence point between training and validation was considered as the completion of training time to avoid overtraining ( Figure 4). The comparison of predicted and measured EOY% values is shown in Figures 5 and 6 for both training and testing datasets in the form of scatter plots. As shown in the scatter plots, the measured data and the ANN model had the same distribution. The EOY% values predicted by the ANN model tended to follow the corresponding actual ones quite closely.
A scatter plot was also applied to compare observed and predicted values of TAY% from the ANN model in both training and testing datasets. The ability of the ANN model to predict TAY% in training (R 2 = 0.794) and testing (R 2 = 0.777) stages are shown in Figures 7 and 8, respectively. According to the scatter plot, there was no significant difference between predicted data and measured data of TAY% in the ANN model in both training and testing datasets (Figures 7 and 8).
The comparison of predicted and measured EOY% values is shown in Figures 5 and  6 for both training and testing datasets in the form of scatter plots. As shown in the scatter plots, the measured data and the ANN model had the same distribution. The EOY% values predicted by the ANN model tended to follow the corresponding actual ones quite closely.  plots, the measured data and the ANN model had the same distribution. The EOY% values predicted by the ANN model tended to follow the corresponding actual ones quite closely.

Comparing MLR and ANN Models to Predict EOY% and TAY% of Fennel Populations
The MLR models, especially when there are linear relationships between the input and output variables are known as efficient modeling approaches [48]. In order to determine the strength of linear regression to predict EOY% and TAY%, two stepwise regression of MLR models were determined and the following equations (Equations (7) and (8)) were computed to predict dependent variables.
A scatter plot was also applied to compare observed and predicted values o from the ANN model in both training and testing datasets. The ability of the ANN to predict TAY% in training (R 2 = 0.794) and testing (R 2 = 0.777) stages are shown in 7 and 8, respectively. According to the scatter plot, there was no significant di between predicted data and measured data of TAY% in the ANN model in both and testing datasets (Figures 7 and 8).

Comparing MLR and ANN Models to Predict EOY% and TAY% of Fennel Populat
The MLR models, especially when there are linear relationships between th and output variables are known as efficient modeling approaches [48]. In o A scatter plot was also applied to compare observed and predicted values of from the ANN model in both training and testing datasets. The ability of the ANN to predict TAY% in training (R 2 = 0.794) and testing (R 2 = 0.777) stages are shown in 7 and 8, respectively. According to the scatter plot, there was no significant dif between predicted data and measured data of TAY% in the ANN model in both t and testing datasets (Figures 7 and 8).

Comparing MLR and ANN Models to Predict EOY% and TAY% of Fennel Populat
The MLR models, especially when there are linear relationships between th and output variables are known as efficient modeling approaches [48]. In o According to Equation (7), the predicted value of EOY% is a linear transformation of seed yield per square meter, number of days to 50% flowering, number of seeds per plant, number of umbels and final plant height variables. On the other hand, seed yield per square meter, number of seeds per umbel, number of days to maturity, thousand seed weight, number of umbels and number of internodes were entered in the SWR model, when TAY% was considered as a dependent variable (Equation (8)). The depended variables are those that selected based on stepwise regression analysis (Tables 3 and 4). As shown in Tables 3  and 4, independent variables explained approximately 55% and 47% of variations in EOY% and TAY%, respectively. Therefore, the linear model was not strong enough to explain the variations of the dependent variables. The MLR constructions (Equations (7) and (8)) showed the importance and the effect of independent variables on dependent variables and these equations revealed that how the amounts of EOY% or TAY% in fennel can change by different amounts of independent variables. Bahmani et al. [11] used an MLR model to find the relationship between independent variables and grain yield of fennel and showed that 55.41%, 12.72%, 2.21% and 11.63% of total variance of grain yield was explained by weight of dry biomass, days to 50% flowering, number of inflorescent and days to 70% seed pasty, respectively [11]. Niazian et al. [25] studied the seed yield of ajowan using an MLR model and introduced shoot dry weight, number of umbellets in main inflorescence, number of branches and the biological yield as important independent variables [25].
The results of the ANN and MLR models' development based on performance evaluation indices including R 2 , RMSE and MAE provide a set of the reasonable criteria for comparison between two modeling methods. Compared to the MLR model, the ANN models could predict EOY% and TAY% much better than the MLR model with 39.97% and 32.69% increases in R 2 , reductions of 0.30 and 0.20 in RMSE and reductions of 0.25 and 0.12 in MAE, respectively. According to the obtained results, the ANN model had higher predictive power than the MLR model and was more efficient than MLR in predicting EOY% and TAY% traits in fennel populations. The different performance of the two models to predict EOY% and TAY% shows the importance of choosing the more suitable model. The superiority of the ANN modeling methods compared to the MLR methods has been reported in other previous studies [24,25,32,35]. The supremacy of ANN modeling seems to be due to the high capability of this model to capture the highly nonlinear and complex relationship between EOY% or TAY% and the relevant traits [23]. There is a considerable variation among different populations of fennel in terms of seed yield, yield components, essential oil content and essential oil composition [11,49]. This variation along with high genotype × environment interaction create a difficult situation to improve the fennel population for the desired traits in a short period using conventional statistical methods and direct selection [49]. However, using non-linear predicting methods, breeders are able to estimate the desired values of their desired traits in a faster and more confident way. Therefore, an advanced computational method can play a complementary role to conventional statistical methods previously employed to improve the fennel populations [11,49].

Sensitivity Analysis
A sensitivity analysis is a method of studying the behavior of a model and assessing the significance of each input variable on the values of the output variable of the model. Sensitivity analysis provides insight into the usefulness of individual variables. With the help of this kind of analysis, it is possible to judge which inputs for modeling EOY% or TAY% parameters should be considered as the most and least significant ones in the ANN model [36]. For this purpose, the sensitivity tests for ANN and MLR models were performed without a specific input variable, i.e., SY/m 2 , NS/P, NDF50%, NS/U and NDM. The results of the sensitivity test for EOY% showed that the highest RMSE (0.608, 0.911) and MAE (0.439, 0.659) and the lowest R 2 (75. 45, 39.12) were achieved without seed yield per square meter in both ANN and MLR models ( Table 7). Number of days to 50% flowering and number of seeds per plant were the other most effective characteristics on the EOY% of fennel populations. As shown in Table 7, the ANN and MLR models for TAY% without the seed yield per square have the lowest R 2 (61. 43 and 35.52) and highest RMSE (0.332 and 0.532) and MAE (0.487 and 0.471), respectively. As the results showed, SY/m 2 is the most influential factor to predict EOY% and TAY% in both models.
Bahmani et al. [11] investigated the direct and indirect effects of some morphological traits on the essential oil content of fennel populations by using path analysis. They reported that the number of leaves, days to 50% flowering and plant height had direct effects on essential oil content in the first year of the experiment as well as days to 50% flowering, stem diameter and the number of seeds per largest inflorescence in the second year of the experiment [11]. Abdipour et al. [24] used the sensitivity analysis in both ANN and MLR models to find the importance of each input variable on the oil content of sesame and reported capsule number per plant as the most important input variable that can significantly affect RMSE, MAE and R 2 of both ANN and MLR models [24]. In the other study, sensitivity tests were conducted in both MLR and ANN models and results showed that the highest RMSE and MAE and the lowest R 2 were achieved in the MLR and ANN models without biological yield [25]. Table 7. Sensitivity analysis and selecting three of the most influential inputs on the essential oil and trans-anethole yield of fennel populations. Since fennel is an indeterminate plant, having continuous growth of new leaves, flowers and seeds during the growing season, it can be said that indeterminate populations of fennel with a long period from flowering to maturity are the best in stable environmental conditions. This finding can be seen in the results of the sensitivity analysis (Table 7), indicating that the number of days to 50% flowering and number of days to maturity had significant effects on EOY% and TAY%, respectively. The findings of previous studies determined possible differences and similarities in essential oils and chemical composition of various plants at different phenological stages in the various medicinal plants [49][50][51]. The results showed that the EOY% and TAY% depend on different phenological stages. The effect of phenological stages on essential oil and its composition may be due to its effect on enzyme activity and the metabolism of essential oil production [52]. The contribution of each input trait to predict EOY% and TAY% are ranked from highest to lowest in both ANN and MLR models in Table 7.

Conclusions
Identifying high-heritable yield components and using efficient modeling methods can help breeders select the most advantageous traits in medicinal plant breeding programs. The results of the present study revealed that the ANN compared to the MLR was able to predict the EOY% and TAY% of fennel populations with more accuracy. The classical MLR model could not interpret the non-linear relationships between EOY% and TAY% and their corresponding independent variables. However, the ANN model showed more accuracy in interpreting complex relationships among EOY%, TAY% and other variables in the model, according to R 2 , RMSE and MAE indicators. These results showed that the selected ANN model could surely replace MLR to predict EOY% and TAY% of fennel populations. Based on the sensitivity analysis, SY/m 2 , NDF50%, and NS/P were the most important traits to predict EOY%, whereas SY/m 2 , NS/U and NDM were the most important traits to predict the TAY% of fennel populations. The findings of the present study can provide important information to improve the EOY% of the other medicinal plants of the Apiaceae family. Plant breeders can also use the optimized artificial neural network models to model other complicated polygenic traits of medicinal plants, such as the content of various secondary metabolites that are more valuable for the food and pharmaceutics industries.