Agricultural Parameters and Essential Oil Content Composition Prediction of Aniseed, Based on Growing Year, Locality and Fertilization Type—An Artificial Neural Network Approach

Simple Summary
An artificial neural network (ANN) model was developed to predict and optimize aniseed parameters according to growing year, locality and fertilization type. The parameters were plant height, umbel diameter, number of umbels, number of seeds, 1000-seed weight, yield per plant, plant weight, harvest index, yield per ha, essential oil yield, germination energy, total germination and essential oil content, as well as the composition of the obtained essential oil (limonene, cis-dihydro carvone, methyl chavicol, carvone, cis-anethole, trans-anethole, β-elemene, α-himachalene, trans-β-farnesene, γ-himachalene, trans-muurola-4(14),5-diene, α-zingiberene, β-himachalene, β-bisabolene, trans-pseudoisoeugenyl 2-methylbutyrate and epoxy-pseudoisoeugenyl 2-methylbutyrate).

Abstract
Predicting yield is essential for producers, stakeholders and international trade. Most of the variation in yield and essential oil content is associated with environmental factors, including weather conditions, soil type and cultivation practices. Therefore, aniseed production was examined in this study. The categorical input variables for artificial neural network modelling were growing year (two successive growing years), growing locality (three different locations in Vojvodina Province, Serbia) and fertilization type (six different treatments).
The output variables were morphological and quality parameters of agricultural importance, such as plant height, umbel diameter, number of umbels, number of seeds per umbel, 1000-seed weight, seed yield per plant, plant weight, harvest index, yield per ha, essential oil (EO) yield, germination energy, total germination and EO content, as well as the share of EO compounds, including limonene, cis-dihydro carvone, methyl chavicol, carvone, cis-anethole, trans-anethole, β-elemene, α-himachalene, trans-β-farnesene, γ-himachalene, trans-muurola-4(14),5-diene, α-zingiberene, β-himachalene, β-bisabolene, trans-pseudoisoeugenyl 2-methylbutyrate and epoxy-pseudoisoeugenyl 2-methylbutyrate. The ANN model predicted the agricultural parameters accurately, with r² values between 0.555 and 0.918, while the r² values for predicting essential oil content were between 0.379 and 0.908. According to global sensitivity analysis, fertilization type had a greater influence on the agricultural parameters, while locality had a greater influence on essential oil content.


Introduction
Anise (Pimpinella anisum L.), a member of the Apiaceae family, has been known for centuries as a spice, perfumery and medicinal plant [1]. Its fruit is used in the pharmaceutical industry and in daily nutrition because of its many benefits and its ability to mask odors and impart flavor [2,3]. Naturally derived compounds are readily accepted by the human body and are increasingly used as therapeutic options against various diseases, including viral infections [3]. The characteristic aroma of aniseed arises from its high essential oil content (3-4%), with trans-anethole as the main compound [4]. Aniseeds are frequently used as an aromatic ingredient in traditional flavored wines [2]. In addition, aniseed is widely used to flavor various foods, including soups [5], poultry [6], pickles [7], salads [8], drinks [9] and confectionery, giving items such as chewing gum, jelly beans and candy a licorice flavor [10,11]. It can also be used as a carminative and sedative agent (owing to its antioxidant and antimicrobial properties) [12]. Aniseed is also added to seafood dishes to sweeten the breath and aid digestion [12,13]. Anise seeds contain nearly 4% essential oil, a multicomponent blend of volatile compounds, typically terpenes and their derivatives; E-anethole accounts for about 90% of this oil, which also contains estragole, anisaldehyde, γ-himachalene, isoeugenol, anisol, p-anisic acid and acetoanisol [14]. Besides essential oil, aniseeds also contain significant amounts of antioxidants, including phenolic acids and flavonoids [15]. The application of blended fertilizers can significantly improve biomass and essential oil yield [16]. In addition, the growing environment and the weather conditions in the growing year can have a substantial impact on essential oil composition [17].
The complex relationship between growing conditions and essential oil composition can be successfully captured by mathematical modeling.
The artificial neural network (ANN) has recently been recognized as an attractive mathematical method for exploring agricultural production systems [18][19][20]. An ANN model does not require explicitly defined model parameters; instead, it learns from experimental data, can handle complex systems with nonlinearities, and can elucidate the relationships between variables [21].
ANN models have been applied in numerous agricultural production studies [22]. Recently, ANN modelling has also proven useful for categorizing drought tolerance indices [23].
Because agricultural production is costly, its outcomes should be predicted numerically as far as possible. One way to lower costs is to use suitable tools that predict agricultural production and the variation in kernel properties achieved through breeding. Moreover, the level of agrotechnology applied in cultivation, particularly nitrogen fertilization, influences seed characteristics that are difficult to predict.
Silitonga et al. [24] coupled ANN models with ant colony optimization to perform multi-objective optimization (MOO) of biodiesel production process parameters; a similar approach can be used to adjust agricultural production parameters to maximize yields.
In the present study, MOO combining an ANN with a genetic algorithm (GA) was applied to the agricultural process, bearing in mind that conflicting objective functions may not admit a unique solution [18,25,26]. The MOO solution was therefore estimated using a Pareto-optimal approach [18].

Experiment Design
The research was carried out over two successive years at three localities in Vojvodina Province, Serbia (detailed information about the locations and soil conditions is given in Table 1). The experiment was conducted in the field under different microclimatic and soil conditions. Field experiments were set up as a randomized block design with four replications. The experimental plot size was 5 m² (5 rows, each 3 m long). Sowing was carried out at the optimum time (in April) with a hand seeder. The duration of the vegetation period (in days) and climatic conditions, such as growing degree days (GDD), precipitation and insolation, for both investigated years and all localities are given in Table 1. Meteorological data were obtained from the nearest meteorological station for each experimental field (<30 km away). The experiments analyzed the influence of six treatments on different properties of anise: control (without fertilizers), Slavol, BactoFil B-10, Royal Ofert, vermicompost and NPK. Detailed information about the fertilizers is given in Table 2. Harvesting was performed by hand at full ripeness. Morphological parameters (plant height, umbel diameter, number of umbels, number of seeds, yield per plant, plant weight and harvest index) were evaluated on 10 randomly selected plants from the central row of each fertilization treatment. Quantity and quality parameters (seed yield per ha, EO yield per ha, EO content, 1000-seed weight, germination energy and total germination) were evaluated by harvesting all plants from the three central rows (the two outer rows were excluded to avoid marginal effects).
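To make the size of the design concrete, the sketch below generates one possible plot layout. It assumes, as the text does not state explicitly, that every year × locality combination forms its own randomized complete block design in which the six treatments are shuffled independently within each of the four blocks. The function name `rcbd_layout` and the seed are hypothetical; the actual field randomization is not reported here.

```python
import random

# Factor levels are taken from the text; the plot layout itself is hypothetical.
YEARS = ("year 1", "year 2")
LOCALITIES = ("Mošorin", "Veliki Radinci", "Ostojićevo")
TREATMENTS = ("control", "Slavol", "BactoFil B-10", "Royal Ofert", "vermicompost", "NPK")
N_BLOCKS = 4  # four replications


def rcbd_layout(seed=None):
    """Return {(year, locality): [plot order in block 1, ..., block 4]}.

    Assumes a separate randomized complete block design for every
    year x locality combination, with the six treatments shuffled
    independently inside each block.
    """
    rng = random.Random(seed)
    layout = {}
    for year in YEARS:
        for locality in LOCALITIES:
            blocks = []
            for _ in range(N_BLOCKS):
                order = list(TREATMENTS)  # fresh copy for this block
                rng.shuffle(order)        # random plot order within the block
                blocks.append(order)
            layout[(year, locality)] = blocks
    return layout


layout = rcbd_layout(seed=1)
n_plots = sum(len(block) for blocks in layout.values() for block in blocks)
print(n_plots)  # 2 years x 3 localities x 4 blocks x 6 treatments = 144
```

Under this assumption the full experiment comprises 144 plots of 5 m² each.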

ANN Modelling
A multi-layer perceptron (MLP) model with three layers (input, hidden and output) was used to construct the ANN model. This ANN architecture is well established for its strong ability to approximate nonlinear functions [27][28][29][30][31].
Before the ANN computation, the input and output data must be normalized to improve the performance of the ANN [32]. During construction of the ANN model, the input data were presented to the network repeatedly [31][32][33]. The training process was repeated 100,000 times to test various ANN structures, with different numbers of neurons in the hidden layer (from 5 to 20), alternative activation functions (logarithmic, hyperbolic tangent, logistic or identity) and random initial values of the weight coefficients and biases. The optimal ANN structure was selected as the one achieving the minimum validation error. The BFGS method was used to solve the unconstrained nonlinear optimization problem during ANN training [34].
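The text does not specify which normalization was used. A minimal sketch, assuming min-max scaling onto [0, 1] (a common choice for MLP inputs and outputs), together with the inverse transform needed to report predictions in original units; the helper names and the plant-height values are hypothetical.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) used to scale one variable."""
    return min(values), max(values)


def minmax_scale(values, lo, hi):
    """Map values linearly onto [0, 1]; a constant column maps to 0."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def minmax_inverse(scaled, lo, hi):
    """Undo minmax_scale so network outputs are reported in original units."""
    return [lo + s * (hi - lo) for s in scaled]


heights = [40.0, 50.0, 60.0]  # hypothetical plant heights, cm
lo, hi = minmax_fit(heights)
scaled = minmax_scale(heights, lo, hi)
print(scaled)                          # [0.0, 0.5, 1.0]
print(minmax_inverse(scaled, lo, hi))  # [40.0, 50.0, 60.0]
```

In practice the scaling constants would be computed on the training subset only and then reused for the validation and test subsets, so that no information from held-out data leaks into training.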
The agricultural production dataset used for ANN modelling was randomly divided into training, cross-validation and testing sets (60%, 20% and 20% of the experimental data, respectively). The training set was used during the learning cycle of the ANN, for evaluating the optimal number of neurons in the hidden layer, and for computing the weight coefficients of the individual neurons in the network [35].
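A random 60/20/20 partition of this kind can be sketched as follows. The function name, the seed and the 100 placeholder records are hypothetical; the remainder after rounding is assigned to the test set so that every record is used exactly once.

```python
import random


def split_60_20_20(records, seed=None):
    """Randomly partition records into training, cross-validation and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(0.6 * n)
    n_val = round(0.2 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test


train, val, test = split_60_20_20(range(100), seed=0)  # 100 hypothetical records
print(len(train), len(val), len(test))  # 60 20 20
```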
The weight coefficients and biases associated with the hidden and output layers of the ANN model are collected in the matrices and vectors W1 and B1, and W2 and B2, respectively. The neural network model can be written as the matrix equation:

$$Y = f_2\left(W_2 \cdot f_1\left(W_1 \cdot X + B_1\right) + B_2\right) \qquad (1)$$

where Y is the matrix of outputs, f1 and f2 are the transfer functions in the hidden and output layers, respectively, and X is the matrix of inputs [36]. The elements of W1 and W2 are computed during the learning cycle, in which they are iteratively adjusted by an optimization method to minimize the discrepancy between the data and the model [34,37,38]. The BFGS algorithm was used to improve the estimation and stabilize the convergence of the solution [39]. The coefficients of determination were used to monitor the performance of the resulting ANN model.
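Equation (1) amounts to two matrix-vector products with a nonlinearity in between. The sketch below evaluates it for a single input vector. The choice of hyperbolic tangent in the hidden layer and identity in the output layer is an assumption made for illustration (both appear among the candidate activation functions above, but the selected ones are not stated here), and the toy weights are hypothetical, not the values in the supplementary tables.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp_forward(x, W1, B1, W2, B2, f1=math.tanh, f2=lambda z: z):
    """Evaluate Y = f2(W2 * f1(W1 * X + B1) + B2) for one input vector x.

    W1: hidden x inputs, B1: hidden biases,
    W2: outputs x hidden, B2: output biases.
    f1 = tanh and f2 = identity are assumptions, not the published choice.
    """
    hidden = [f1(z + b) for z, b in zip(matvec(W1, x), B1)]
    return [f2(z + b) for z, b in zip(matvec(W2, hidden), B2)]


# Toy weights (hypothetical), chosen so the result is easy to check by hand:
# hidden pre-activations are 0, tanh(0) = 0, so the output equals B2.
W1 = [[0.5, -0.5], [0.0, 1.0]]
B1 = [-0.5, 0.0]
W2 = [[1.0, 2.0]]
B2 = [0.25]
print(mlp_forward([1.0, 0.0], W1, B1, W2, B2))  # [0.25]
```

Training, whether by BFGS or another optimizer, adjusts the entries of W1, B1, W2 and B2 so that this forward pass reproduces the experimental outputs as closely as possible.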

Global Sensitivity Analysis
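Yoon's global sensitivity equation was applied to the weight coefficients of the developed ANN model to determine the relative influence of the input variables on the output variables [40]:

$$RI_{ij}\,(\%) = \frac{\sum_{k=0}^{n}\left(w_{ik} \cdot w_{kj}\right)}{\sum_{i=0}^{m}\left|\sum_{k=0}^{n}\left(w_{ik} \cdot w_{kj}\right)\right|} \cdot 100\%$$

where w is the weight coefficient in the ANN model, i is the input variable, j is the output variable, k is the hidden neuron, n is the number of hidden neurons and m is the number of inputs.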
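Yoon's formula needs only the two weight matrices of the trained network. A minimal sketch, assuming W1 is stored as hidden × inputs (so W1[k][i] is w_ik) and W2 as outputs × hidden (so W2[j][k] is w_kj), matching the layout used in Equation (1); the small network below is hypothetical.

```python
def yoon_relative_importance(W1, W2):
    """Relative importance RI[j][i] (%) of input i on output j (Yoon's formula).

    W1[k][i]: weight from input i to hidden neuron k (w_ik).
    W2[j][k]: weight from hidden neuron k to output j (w_kj).
    The sign is kept: a negative RI means the input lowers the output.
    """
    n_hidden, n_inputs = len(W1), len(W1[0])
    result = []
    for out_weights in W2:
        # numerator: sum over hidden neurons of w_ik * w_kj, for every input i
        numer = [sum(W1[k][i] * out_weights[k] for k in range(n_hidden))
                 for i in range(n_inputs)]
        denom = sum(abs(v) for v in numer)  # sum over inputs of |numerator|
        result.append([100.0 * v / denom if denom else 0.0 for v in numer])
    return result


# Hypothetical 2-input, 2-hidden-neuron, 1-output network
W1 = [[1.0, 2.0],
      [0.5, -1.0]]
W2 = [[1.0, 1.0]]
print(yoon_relative_importance(W1, W2))  # [[60.0, 40.0]]
```

Because of the absolute value in the denominator, the magnitudes of RI for each output always add up to 100%, which makes inputs directly comparable across outputs.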

Error Analysis
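The numerical validity of the developed model was assessed using the coefficient of determination (r²), reduced chi-square (χ²), mean bias error (MBE), root mean square error (RMSE) and mean percentage error (MPE). These commonly used measures can be calculated as follows [41]:

$$\chi^2 = \frac{\sum_{i=1}^{N}\left(x_{\mathrm{exp},i} - x_{\mathrm{pre},i}\right)^2}{N - n}, \qquad RMSE = \left[\frac{1}{N}\sum_{i=1}^{N}\left(x_{\mathrm{pre},i} - x_{\mathrm{exp},i}\right)^2\right]^{1/2}$$

$$MBE = \frac{1}{N}\sum_{i=1}^{N}\left(x_{\mathrm{pre},i} - x_{\mathrm{exp},i}\right), \qquad MPE = \frac{100}{N}\sum_{i=1}^{N}\frac{\left|x_{\mathrm{pre},i} - x_{\mathrm{exp},i}\right|}{x_{\mathrm{exp},i}}$$

where x_exp,i denotes the experimental values and x_pre,i the values predicted by the model, while N and n are the number of observations and the number of constants, respectively.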
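The error measures above translate directly into code. In the sketch below, χ², RMSE, MBE and MPE follow the equations as written; r² is computed as 1 − SS_res/SS_tot, which is one common definition and is an assumption here, because the text does not give its formula. The data are hypothetical.

```python
import math


def error_metrics(x_exp, x_pre, n_const):
    """chi2, RMSE, MBE and MPE as defined above; r2 is an added assumption.

    x_exp: experimental values, x_pre: model predictions,
    n_const: number of model constants (n). MPE assumes no x_exp is zero.
    """
    N = len(x_exp)
    diffs = [p - e for e, p in zip(x_exp, x_pre)]  # x_pre,i - x_exp,i
    ss_res = sum(d * d for d in diffs)
    mean_exp = sum(x_exp) / N
    ss_tot = sum((e - mean_exp) ** 2 for e in x_exp)
    return {
        "chi2": ss_res / (N - n_const),
        "rmse": math.sqrt(ss_res / N),
        "mbe": sum(diffs) / N,
        "mpe": 100.0 / N * sum(abs(d) / e for d, e in zip(diffs, x_exp)),
        # r2 as 1 - SS_res/SS_tot: one common definition, assumed here
        "r2": 1.0 - ss_res / ss_tot,
    }


m = error_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 33.0, 40.0], n_const=1)
print({k: round(v, 4) for k, v in m.items()})
# chi2 ≈ 3.6667, rmse ≈ 1.6583, mbe = 0.75, mpe = 6.25, r2 = 0.978
```

A positive MBE indicates that the model overestimates on average, while MPE expresses the mean absolute deviation relative to the measured values.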

Multi-Objective Optimization
The developed ANN model was used for the MOO calculation, with the aim of finding the agricultural production conditions that would yield the maximum values of plant height, umbel diameter, number of umbels, number of seeds, 1000-seed weight, yield per plant, plant weight, harvest index, yield per ha, EO yield, germination energy, total germination and EO content, as well as of the content of the obtained EO compounds: limonene, cis-dihydro carvone, methyl chavicol, carvone, cis-anethole, trans-anethole, β-elemene, α-himachalene, trans-β-farnesene, γ-himachalene, trans-muurola-4(14),5-diene, α-zingiberene, β-himachalene, β-bisabolene, trans-pseudoisoeugenyl 2-methylbutyrate and epoxy-pseudoisoeugenyl 2-methylbutyrate. The result of the MOO was extracted as a Pareto front, i.e., the set of solutions for which no objective function can be improved without worsening at least one of the others [18]. The genetic algorithm (GA), a stochastic method inspired by natural evolution that applies mutation, selection, inheritance and crossover, was used to solve the MOO problem [42,43]. The MOO computation was performed in Matlab R2018b (Gamax Laboratory Solutions Kft., Budapest, Hungary) using the multi-objective function. The initial population was generated randomly as a set of points in the design space. The populations of subsequent generations were determined using distance measures and non-dominated ranking of the individual points within the current generation [18,43,44].
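The core of non-dominated ranking is the Pareto dominance test. The sketch below shows only that filter, with all objectives maximized as in this study; it is not the full GA, and it omits the distance (crowding) measures and the ranking into successive fronts that the Matlab solver applies. The candidate points are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one
    (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) values, both to be maximized
candidates = [(1, 5), (2, 4), (3, 1), (1, 3), (2, 2)]
print(pareto_front(candidates))  # [(1, 5), (2, 4), (3, 1)]
```

None of the three surviving points can be improved in one objective without losing in the other, which is exactly the trade-off property that defines the Pareto front.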

ANN Model
Supplementary Table S1 presents the agricultural parameters of aniseed based on growing year, locality and fertilization type, while Supplementary Table S2 shows the quantitative profile of Pimpinella anisum L. essential oil. The model was determined using Equation (1). The resulting ANN model showed sufficient generalization ability to predict the experimental data. The optimal number of neurons in the hidden layer for predicting plant height, umbel diameter, number of umbels, number of seeds, 1000-seed weight, yield per plant, plant weight, harvest index, yield per ha, EO yield, germination energy, total germination and EO content, as well as the content of the obtained EO compounds, was selected on the basis of ANN model performance. It was 10 (network MLP 10-10-30), which gave high r² values (0.936 overall for the training period) and as low a sum of squares (SOS) as possible (Table 3). Table S4 displays the elements of matrix W2 and vector B2 (bias) for the output layer of the ANN, used in the calculation with Equation (1). The lack-of-fit test for the ANN model was not significant, which suggests that the model satisfactorily predicted the agricultural parameters and essential oil composition of aniseed based on growing year, locality and fertilization type. The quality of the model fit was tested, and the residual analysis of the developed model is presented in Tables 4 and 5. A high r² indicates that the variation was accounted for and that the data fitted the proposed model satisfactorily [36]. The ANN model predicted the experimental variables quite satisfactorily over a wide range of parameters, as shown in Figures 1 and 2, where the experimental and ANN-predicted values are displayed.
In most cases, the predicted values were close to the experimental values, giving the desired r² values for the ANN model. The SOS obtained with the ANN model was of the same order of magnitude as the experimental errors (Figure 1). Similar comparisons of experimentally obtained output values with ANN-predicted values are reported in [34,39,44].
The ANN model is complex (208 weights and biases) because of the high nonlinearity of the studied system [34,45]. The r² values between the experimental and ANN model outputs for plant height, umbel diameter, number of umbels, number of seeds, 1000-seed weight, yield per plant, plant weight, harvest index, yield per ha, EO yield, germination energy, total germination and EO content, as well as for the content of the obtained EO compounds (limonene, cis-dihydro carvone, methyl chavicol, carvone, cis-anethole, trans-anethole, β-elemene, α-himachalene, trans-β-farnesene, γ-himachalene, trans-muurola-4(14),5-diene, α-zingiberene, β-himachalene, β-bisabolene, trans-pseudoisoeugenyl 2-methylbutyrate and epoxy-pseudoisoeugenyl 2-methylbutyrate) were: 0. The quality of the ANN model fit can be seen in Tables 4 and 5, where the χ², MBE, RMSE and MPE values are low [41]. A residual analysis of the developed model was also conducted. Skewness measures the deviation of a distribution from symmetry: a nonzero skewness indicates an asymmetric distribution, whereas the normal distribution is symmetric. The "peakedness" of a distribution is assessed by kurtosis. The (excess) kurtosis of the normal distribution is zero; a positive value indicates a distribution more peaked than normal, and a negative value a flatter one. A high r² suggests that the variation was accounted for and that the data fit the proposed model adequately [46][47][48].
The goodness of fit between the experimental and model-estimated outputs, expressed as ANN model performance (r² between measured and calculated parameters), is shown in Table 3.
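The residual statistics described above can be computed from the central moments of the residuals. This is a minimal sketch of the moment-based (population) estimators; statistical software often applies small-sample bias corrections, so its reported values may differ slightly, and the software used for Tables 4 and 5 is not stated here. The residuals below are hypothetical.

```python
def skewness_kurtosis(residuals):
    """Moment-based skewness and excess kurtosis of a residual sample.

    Both are 0 for a normal distribution. Statistical packages often apply
    small-sample bias corrections, so their values can differ slightly.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Symmetric and flat sample: skewness ≈ 0.0, excess kurtosis ≈ -1.5
print(skewness_kurtosis([-1.0, 0.0, 1.0]))
```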

Multi-Objective Optimization of the Outputs of the ANN
One of the main goals of this research was to optimize the ANN output variables for agricultural production simultaneously, using the developed ANN model while varying the input variables. This problem was solved with the ANN model through MOO computation in Matlab. The MOO method was set up to find the most suitable combinations of agricultural conditions by maximizing the output variables of the ANN model. The optimization was constrained to the experimental ranges of the parameters. The number of generations reached was 495, with a population size of 100 for all input variables, and the Pareto front contained 232 points. The computed maxima of the output variables were reached in the first investigated year at Mošorin without fertilization for harvest index; with Slavol for cis-dihydro carvone and trans-anethole content; with BactoFil for γ-himachalene content; with Royal Ofert biohumus for plant height; with vermicompost for number of umbels and α-himachalene content; and with NPK for yield per plant, plant weight, yield per ha and EO yield, as well as methyl chavicol, trans-β-farnesene, α-zingiberene, β-bisabolene and epoxy-pseudoisoeugenyl 2-methylbutyrate content. In the first investigated year at Veliki Radinci, Royal Ofert biohumus gave the maximum values for umbel diameter and EO content. In the same year at Ostojićevo, the maxima were obtained with Slavol for α-himachalene content; BactoFil for 1000-seed weight, germination energy and total germination; Royal Ofert biohumus for trans-β-farnesene content; vermicompost for cis-anethole content; and NPK for carvone and limonene content. In the second investigated year at Veliki Radinci, NPK gave the maximum number of seeds, and in the same year at Ostojićevo, BactoFil gave the maximum content of the EO compound β-elemene.
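Because all three inputs are categorical, the full input space contains only 2 × 3 × 6 = 36 combinations, so per-output maxima such as those listed above could also be cross-checked by evaluating a fitted model at every combination. A minimal sketch, assuming the trained network is wrapped in a hypothetical `predict(year, locality, treatment)` function that returns one value per output; `toy_predict` is a made-up stand-in, not the published model. This enumeration only finds single-objective maxima, whereas the GA-based MOO searches for trade-offs across all outputs at once.

```python
from itertools import product

YEARS = (1, 2)
LOCALITIES = ("Mošorin", "Veliki Radinci", "Ostojićevo")
TREATMENTS = ("control", "Slavol", "BactoFil", "Royal Ofert", "vermicompost", "NPK")


def best_condition_per_output(predict, output_names):
    """Return, for each output, the (year, locality, treatment) maximizing it.

    `predict` is a hypothetical wrapper around a fitted model that takes the
    three categorical inputs and returns one predicted value per output.
    """
    combos = list(product(YEARS, LOCALITIES, TREATMENTS))  # 2 x 3 x 6 = 36
    preds = {c: predict(*c) for c in combos}               # evaluate each once
    return {name: max(combos, key=lambda c: preds[c][j])
            for j, name in enumerate(output_names)}


def toy_predict(year, locality, treatment):
    """Stand-in model with an obvious optimum per output (illustration only)."""
    return [
        float((year, locality, treatment) == (1, "Mošorin", "NPK")),
        float((year, locality, treatment) == (2, "Veliki Radinci", "NPK")),
    ]


print(best_condition_per_output(toy_predict, ["output A", "output B"]))
```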
The optimal results obtained using MOO for plant height, umbel diameter, number of umbels, number of seeds, 1000-seed weight, yield per plant, plant weight, harvest index, yield per ha, EO yield, germination energy, total germination and EO content were: 54.389; 6