The Application of Multiple Linear Regression and Artificial Neural Network Models for Yield Prediction of Very Early Potato Cultivars before Harvest

Yield forecasting is a rational and scientific way of predicting future occurrences in agriculture, namely the level of production effects. Its main purpose is reducing the risk in the decision-making process affecting the yield in terms of quantity and quality. The aim of the following study was to generate a linear and a non-linear model to forecast the tuber yield of three very early potato cultivars: Arielle, Riviera, and Viviana. To achieve this goal, data from the period 2010–2017 were collected from official varietal experiments carried out in northern and northwestern Poland. The linear model was created using multiple linear regression analysis (MLR), while the non-linear model was built using artificial neural networks (ANN). The created models can predict the yield of very early potato varieties on 20th June. Agronomic, phytophenological, and meteorological data were used to prepare the models, and the correctness of their operation was verified on separate sets of data that did not participate in the construction of the models. For the validation of the models, four forecast error metrics were used: global relative approximation error (RAE), root mean square error (RMS), mean absolute error (MAE), and mean absolute percentage error (MAPE). As a result of the conducted analyses, the forecast error results for most models did not exceed 15% of MAPE. The predictive neural model NY1 was characterized by better values of quality measures and ex post forecast errors than the regression model RY1.


Introduction
Estimating the yield of arable crops can be defined as predicting the size of the final crop yield of a given plant species, assuming that the environmental conditions characterizing a given growing season will be similar to many-year averages. Timely and accurate forecasts of crop yields before harvest are critical to the functioning of food markets. In addition, they are an important element of the organization of agricultural of environmental, climatic or constraint conditions, i.e., effects of pathogens, supply of available nutrients, environmental factors [20]. Known yield models for potato are related to growth, plant development and potential yield: POTATO 1. [21,22], POTATO 2. [23]; and actual yield: SUBSTOR-Potato [24][25][26], LINTUL-POTATO [27], POTATOS [28], NPOTATO [28], CropSystVB-CSPotato [29]. Predictive models and decision support systems are also worth mentioning: PLANT-PLUS makes it possible to predict the occurrence of various diseases in potato cultivation, especially potato blight [30]; MAPP (Management Advisory Package for Potatoes) makes it possible to optimize the planting and harvesting process based on data on the cultivar grown, price and size of seed potatoes, and expected profits [20].
Nowadays, efforts are made to produce more accurate predictive models, such as artificial neural networks (ANN) [31][32][33][34][35]. These models belong to the group of nonlinear models that enable the description of complex phenomena and processes occurring in nature. They are characterized by a better quality and accuracy of the forecast, allow for quick and precise analysis with many input variables, and permit the use of linguistic (qualitative) parameters [3,12,17,36,37,38]. An important cognitive aspect is to enrich the analyses of neural modeling with the evaluation of factors responsible for the yielding of plants under field conditions. Indicating the ranks of individual independent variables is particularly important for those parameters that can be controlled during the growing season before the harvest [38]. Predicting crop yields makes sense only if the forecasts are made before harvest. That is why it is so important to choose the key development stages in terms of yield formation. From the point of view of very early potato cultivation, the development of yield prognostic models before the planned harvest will significantly facilitate production and enable control of the harvest date.
The aim of the following work is to develop forecasting models of tuber yield of three very early potato cultivars (Arielle, Riviera, and Viviana) grown in Poland. The forecast takes place on a specific day of the calendar year; i.e., June 20. In the article, the authors compared the prediction accuracy of the regression model (RY1) and the neural model (NY1).

Experiment Location and Research Material
For the purposes of this research, data from the system field trial books of the Research Center for Cultivar Testing (COBORU) [39,40] were used. These books were created on the basis of the results of official cultivar experiments with very early potato harvested 40 days after full emergence. Predictive linear and non-linear models were built for three potato varieties: Arielle, Riviera, and Viviana. These varieties occupy leading positions in potato tuber production in Poland. The field experiments were performed in the field units of COBORU and the Pomeranian Agricultural Advisory Center in Lubań in 2010-2017. The following research uses data from the area of northern and northwestern Poland (Figure 1), i.e., the Experimental Station for Variety Testing in Karzniczka and Szczecin Dąbie, the Experimental Station for Variety Testing in Rarwino and Białogard, and the Pomeranian Agricultural Advisory Center (PODR) in Lubań. The Central Research Center for Cultivar Testing is a unit responsible for conducting and adapting the system of Polish cultivar experimentation to the market economy and European Union standards. The results used in the presented study come from the fields managed as part of the Post-Registration Variety Testing. Data for the construction of linear and non-linear models can be divided into several groups. The first one represents agronomic and phytophenological data as well as yielding results. These data were obtained from COBORU's system field books. The second group contains weather data obtained directly from the electronic database of meteorological phenomena and observations as well as meteorological summary charts registered at each of the research points.
In the absence of measurement data from the meteorological stations of the above-mentioned units, for example the insolation sum, observation and measurement data from the archival resources of the Institute of Meteorology and Water Management-National Research Institute were used, taken from the synoptic and climatic stations located as close as possible to the experimental points. If, for some reason, data on soil abundance in basic nutrients were missing, the results of current reports on soil tests carried out by Regional Chemical-Agricultural Stations were used during the preparation of the data for the models. Figure 2 shows the general framework of the paper.

Field Experiments
The experiments are performed on very early potatoes intended for early harvest in three trials, each being a separate replication. For experiments in which the number of tested variants is less than or equal to 15, a randomized complete block design is applied. When the number of objects is greater than 16, the experiment is set up as an incomplete block design (reducible 1-simple). The area of a single plot is about 15 m², depending on the adopted spacing between the rows. When using the recommended row spacing of 75 cm, the distance between plants in a row is 33 cm. In a single plot, 60 tubers are planted in two adjacent ridges. Harvesting takes place approximately 40 days after full emergence.

Building an Experimental Database
The total number of plots in the experiments with the Arielle, Riviera, and Viviana potato cultivars in the Experimental Station for Variety Testing (SDOO) in Karzniczka and Szczecin Dąbie, ZDOO in Białogard and Rarwino, and PODR in Lubań in 2010-2017 amounted to 324. Each plot represented a separate analyzed case. Information from a single plot was the basis for constructing predictive models using two separate sets: Ap and Bp. The Ap set (300 cases) contained data that were used to build the linear and nonlinear models. The Bp set (24 cases) did not participate directly in the construction of the models because the data it contained were used for their validation. It should also be added that the cases included in the Bp set were not selected completely at random: from each research year, one case representing each variety was selected. The dependent variable for each of the prognostic models was the tuber yield (t·ha⁻¹) collected 40 days from full emergence [YIELDP1].
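The composition of the two sets can be illustrated with a short sketch. It assumes that each case is a record with hypothetical `plot_id`, `year`, and `cultivar` fields and that the single case per year and cultivar is drawn at random within its group; the source states only that one case per variety was taken from each research year. The records below are synthetic placeholders, not the COBORU data.

```python
import random
from collections import defaultdict

CULTIVARS = ("Arielle", "Riviera", "Viviana")

def split_holdout(cases, seed=0):
    """Return (ap, bp), where bp holds one case per (year, cultivar) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for case in cases:
        groups[(case["year"], case["cultivar"])].append(case)
    # One validation case from every year-by-cultivar group (random pick is an assumption).
    bp = [rng.choice(groups[key]) for key in sorted(groups)]
    bp_ids = {case["plot_id"] for case in bp}
    ap = [case for case in cases if case["plot_id"] not in bp_ids]
    return ap, bp

# Synthetic placeholder records covering 8 years and 3 cultivars.
cases = [
    {"plot_id": i, "year": 2010 + i % 8, "cultivar": CULTIVARS[(i // 8) % 3]}
    for i in range(324)
]
ap, bp = split_holdout(cases)
print(len(ap), len(bp))
```

With 8 research years and 3 cultivars, this selection yields 8 × 3 = 24 validation cases and leaves 324 − 24 = 300 cases for model building, which matches the set sizes reported above.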
The construction of the linear (RY1: R-regression model, Y-yield, 1-first crop of the year) and non-linear (NY1: N-neural model, Y-yield, 1-first crop of the year) yield prediction model was made on the basis of the expected date of the calendar year, i.e., 20th of June. Analysis of the experimental data from five research points from 2010-2017 showed that the Arielle, Riviera, and Viviana varieties intended for early harvest were collected at the earliest on June 24 and at the latest on July 11. The most common harvest took place on June 30. The suggested date of prediction, which is June 20, is the period of variety full bloom in a typical year. This means that forecasting the yield before the proposed date would be unjustified due to the intensive accumulation of biomass and building of tuber by flowering plants.

Selecting Variables for Building Predictive Models
Reliable evaluation of the prognostic properties of the developed linear and nonlinear models is possible when these models are created and verified on the basis of the same dependent and independent variables. The detailed definitions of independent variables and dependent variable used in the regression and neural model are presented in Table 1.

The Method of Building a Linear Forecasting Model (MLR)
Multiple linear regression is the most commonly used form of linear regression. As a predictive tool, it allows us to explain the relationship between many independent variables (X1, X2, ..., Xk) and the tested dependent variable (Y) [41]. The coefficient of determination R² gives the proportion of the variability of the dependent variable that is explained by the model. In other words, it is a measure of the model's goodness-of-fit.
The computational problem of multiple regression is to fit a linear function (a straight line in the case of a single predictor) to a set of points. The most frequently used method for its implementation is the least squares approach. The method adjusts the parameters of the regression equation so that the sum of squared distances of the measurement points from the fitted function is as small as possible.
The linear model RY1 was constructed on the basis of the data contained in the Ap set and verified on the Bp set.
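As an illustration of the least squares approach described above, the following minimal sketch fits a multiple regression by solving the normal equations. It is not the procedure used to build RY1. The function names `fit_mlr`, `predict`, and `r_squared` are hypothetical, the toy data are generated from a known equation, and the solver assumes that the predictors are not collinear.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows; an intercept column is added here.
    Returns the coefficients [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, x):
    """Evaluate the fitted regression equation for one case."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Toy data generated from y = 2 + 3*x1 - 1*x2 without noise (illustration only).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * a - b for a, b in X]
beta = fit_mlr(X, y)
fitted = [predict(beta, x) for x in X]
print([round(b, 6) for b in beta], round(r_squared(y, fitted), 6))
```

In practice, a statistical package would also report standard errors and significance levels for each coefficient, which this sketch omits.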

The Method of Building a Non-Linear Forecasting Model (ANN)
The NY1 non-linear forecasting model was built on the basis of the independent variables presented in Table 1. As in the case of the linear model, the variable explained by the non-linear model was the tuber yield (t·ha⁻¹) harvested 40 days from full emergence [YIELDP1].
The choice of the best network architecture and the optimal learning method was made on the basis of the assessment of the network's ability to generalize and approximate, based on the established measures of network quality. It was assumed that the best network is the one with the smallest sum of squared errors. Using Statistica v7.1, it was possible to test networks with different architectures. The best network was selected by evaluating the following quality parameters: σB, the standard deviation of the error; x MB, the mean value of the error modules; Iσ, the standard deviation quotient; and r, the correlation coefficient. The Automatic Network Designer (AND) was used to test 10,000 networks [42,43]. The performed calculations and a detailed analysis of the literature allowed us to select the best type of neural network architecture. For the discussed case, an MLP (Multilayer Perceptron) network with two hidden layers was selected, i.e., 13 neurons in the input layer, 20 neurons in the first hidden layer, 10 neurons in the second hidden layer, and 1 neuron in the output layer. For the purposes of training and testing the MLP network, the Ap subset, on the basis of which the neural model was built, was randomly divided into training (U), validation (W), and test (T) sets. The sets kept a constant proportion of 50%, 25%, and 25%. The set sizes were as follows: training: 150 cases, validation: 75 cases, test: 75 cases. The MLP 13:13-20-10-1:1 neural network was trained using two methods, i.e., backpropagation (100 epochs) and conjugate gradients (135 epochs).
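The topology and the data division described above can be sketched in a few lines. This is only an illustration under stated assumptions: the hidden-layer activation (tanh), the linear output, and the random weight initialization are not given in the source, and the helper names are hypothetical. Training with backpropagation and conjugate gradients is not shown.

```python
import math
import random

LAYERS = [13, 20, 10, 1]  # inputs, first hidden layer, second hidden layer, output

def init_network(layers, rng):
    """One (weights, biases) pair per layer transition, small random weights."""
    net = []
    for n_in, n_out in zip(layers, layers[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        net.append((weights, biases))
    return net

def forward(net, x):
    """Propagate one input vector: tanh on hidden layers, linear output (assumed)."""
    for idx, (weights, biases) in enumerate(net):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if idx == len(net) - 1 else [math.tanh(v) for v in z]
    return x[0]

def count_parameters(layers):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

def split_cases(n_cases, rng):
    """Shuffle case indices and cut them 50% / 25% / 25%."""
    idx = list(range(n_cases))
    rng.shuffle(idx)
    n_train, n_val = n_cases // 2, n_cases // 4
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

rng = random.Random(42)
train_idx, val_idx, test_idx = split_cases(300, rng)
net = init_network(LAYERS, rng)
print(len(train_idx), len(val_idx), len(test_idx), count_parameters(LAYERS))
```

For the 13-20-10-1 topology, the count gives 501 adjustable weights and biases, fitted here on 150 training cases.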
In order to determine the forecast errors of tuber yield of very early potato cultivars, the differences between the actual values and those predicted by the RY1 and NY1 models were used. Determining the accuracy of the forecasts was carried out by calculating the values of the forecasting properties of the models.
RAE: relative approximation error; RMS: root mean square error; MAE: mean absolute error; MAPE: mean absolute percentage error; where $n$ is the number of observations, $y_i$ are the actual values obtained during the tests, $\bar{y}$ is their average value, and $\hat{y}_i$ are the values determined by the model.
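A minimal sketch of how the four ex post measures can be computed is given below. It assumes their commonly used definitions; in particular, normalizing RAE by the sum of squared actual values is an assumption, as is the function name `forecast_errors`. The input values are invented for illustration.

```python
import math

def forecast_errors(actual, predicted):
    """Ex post forecast errors for paired actual and predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    sum_sq = sum(r * r for r in residuals)
    return {
        # Assumed form: sqrt(sum of squared residuals / sum of squared actual values).
        "RAE": math.sqrt(sum_sq / sum(a * a for a in actual)),
        "RMS": math.sqrt(sum_sq / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        # Expressed in percent; requires non-zero actual values.
        "MAPE": 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n,
    }

# Invented yields in t/ha, for illustration only.
errors = forecast_errors([20.0, 25.0, 40.0], [22.0, 25.0, 36.0])
print({name: round(value, 4) for name, value in errors.items()})
```

RMS and MAE are expressed in the units of the yield, whereas RAE and MAPE are relative measures and can be compared across data sets with different yield levels.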

Neural Network Sensitivity Analysis
The last stage of constructing the NY1 neural model was to perform a sensitivity analysis of the neural network. Such an analysis indicates the importance (rank) of each explanatory variable in shaping the variability of the explained variable. The result of this test is expressed in numerical form: the greater the value assigned to a given independent variable, the more it affects the yield. A value below 1 indicates that the given independent variable has only a small effect on the yield. Such a variable can be removed from the model, after which the analyses should be performed again.
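One common way to obtain such a rank is an error quotient: the network error measured with a given variable "switched off" (here, replaced by its mean) divided by the error of the full model. The sketch below illustrates this idea under that assumption; it is not necessarily the procedure implemented in Statistica, and `toy_model` is a hypothetical stand-in for a trained network.

```python
import math

def rms_error(model, X, y):
    """Root mean square error of a model over a data set."""
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))

def sensitivity_ratios(model, X, y):
    """Error quotient per variable: error with the variable replaced by its
    mean, divided by the baseline error of the unchanged model."""
    baseline = rms_error(model, X, y)
    ratios = []
    for j in range(len(X[0])):
        mean_j = sum(row[j] for row in X) / len(X)
        X_off = [row[:j] + [mean_j] + row[j + 1:] for row in X]
        ratios.append(rms_error(model, X_off, y) / baseline)
    return ratios

# Hypothetical stand-in for a trained network: strong effect of x0,
# weak effect of x1, no effect of x2.
def toy_model(x):
    return 5.0 * x[0] + 0.5 * x[1]

X = [[1.0, 4.0, 7.0], [2.0, 1.0, 3.0], [3.0, 3.0, 9.0], [4.0, 2.0, 1.0]]
noise = [0.3, -0.3, 0.3, -0.3]
y = [toy_model(x) + e for x, e in zip(X, noise)]
ratios = sensitivity_ratios(toy_model, X, y)
print([round(r, 2) for r in ratios])
```

A quotient close to 1 means that the model loses almost nothing without the variable, while larger quotients mark the variables on which the forecast depends most.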

Comparing Quality of Forecasting Models of Potato Tuber Yield 40 Days after Full Emergence
The developed RY1 regression model was based on 13 independent variables (Table 1). As in the case of the NY1 model, the dependent variable was the tuber yield harvested 40 days from full emergence. The detailed results of the multiple regression analysis for the presented independent features and the dependent feature are presented in Table 2. Based on the results from Table 2, the multiple regression equation was constructed, which took the form of Equation (5).
The selection of the best neural network that forecasts the yield of tubers of three potato varieties (Arielle, Riviera, and Viviana) on June 20 was based on the analysis of the values of basic qualitative measures and error values for the training, validation, and test sets. The overall results for the network were taken into account (Table 3).

Forecasting Properties of Linear and Nonlinear Models
The proper functioning of the RY1 and NY1 models was verified by comparing the obtained forecasts with the actual yielding results for Arielle, Riviera, and Viviana tubers. In order to verify the prognostic properties, the data constituting the Bp subset were used. Four measures of forecast accuracy (ex post) were used in the study: relative approximation error (RAE), root mean square error (RMS), mean absolute error (MAE), and mean absolute percentage error (MAPE) (Table 4). Their calculation made it possible to determine the quality of the models and their usefulness in forecasting crops harvested 40 days from full emergence. The results presented in the previous stages were supplemented by additional analyses and visualizations of the relation between the actual tuber yields achieved 40 days from full emergence and the values forecasted by the RY1 and NY1 models. The results of the analyses are shown in Figures 4 and 5. The NY1 model produced forecasts with greater accuracy: the value of the coefficient of determination R² was 0.8623. For the RY1 model, this parameter was much lower (0.3483).

The Results of the Sensitivity Analysis of the MLP 13:13-20-10-1:1 Neural Network
The sensitivity analysis carried out for the MLP 13:13-20-10-1:1 neural network built on the Ap set has shown that the factor with the greatest influence on the yield of three very early potato varieties harvested 40 days from full emergence was the planting date [PLANT], defined as the number of days from the beginning of the year (Table 5). Removing this variable from the model would increase the cumulative error of the neural network 1.79 times. The second most significant factor in explaining the variability of the dependent variable was the emergence date [EMERG], also defined as the number of days from the beginning of the year. Removing this variable from the model would increase the error 1.57 times. The third important factor was the total dose of nitrogen fertilization [kg·ha⁻¹] [NITRO].

Discussion
The research results presented in the following study show that modeling the yield of very early potato varieties during the growing season is reasonable and brings promising application possibilities. Forecasting models are usually created based on the results collected during many years of field experiments. Still, the amount of empirical data included in the modeling remains a controversial issue. In many works, the authors of the models use a lot of independent variables [3,12,44,45] or use classical prognostic models developed exclusively for potato: SUBSTOR-Potato [46,47], LINTUL-Potato-DSS Model [48], etc. When a model tries to estimate too many unknowns for the number of observations made, its ability to detect real relationships is severely limited [49]. The presented prognostic models RY1 and NY1 are based on 13 independent variables aiming to explain the variability of the tuber yield of the cultivars Arielle, Riviera, and Viviana harvested about 40 days from full emergence. The models are based on the results of varietal experiments carried out in 2010-2017 at five research sites located in northern and northwestern Poland. According to the data of the Voivodeship Inspectorate of Plant Health and Seed Inspection, the Arielle, Riviera, and Viviana varieties occupy leading positions in the popularity rankings of varieties grown for propagation purposes.
An important aspect in dealing with artificial neural networks is the selection of the right learning method and network topology. The testing stage of network topology showed that the MLP network, which solves prediction problems in agriculture very well [50,51], is the best network for potato tuber yield forecasting. In the presented work, single-direction learning of two-layer MLP neural networks was based on a two-stage process. In the first step, the generated networks were trained using the backward propagation of errors method. The second stage involved learning with the use of the conjugate gradients method. While choosing the right network topology for a specific task is not a big issue, it is quite problematic to determine the optimal number of neurons in successive hidden layers. The selection of the number of neurons in successive layers is a key issue that is decisive for the generalizing properties of the network. Determining their optimal number depends mainly on the experience of the network creator. A relatively large number of neurons improves the computing power of the network; however, exceeding a certain number may result in overfitting. The network selected on the basis of a detailed analysis of the values of its quality parameters has 20 neurons in the first hidden layer and 10 neurons in the second hidden layer. The obtained results allow us to assume that a good fit of the model to the data does not always require many neurons in hidden layers. Applying the right number of neurons is a complicated process, requiring a lot of testing, compromises, and excellent data preparation. Although the complexity of the model depends on the nature of the modeled issue, neural networks with a relatively simple structure are more desirable and are also accepted by other researchers [52,53].
The selection of the network that best fulfills the yield forecasting tasks for three selected potato varieties was based on the results for the training, validation, and test sets, as well as on generalized data for the network. The quality of training, validation, and testing is a measure of the accuracy of a trained network. Neural predictive models can be assessed by analyzing the correspondence between the actual data and those predicted by the model. Such a form of presenting research results can often be found in the literature [54,55]. However, more objective conclusions can be formulated thanks to the quantitative method of model evaluation, which takes place in two stages. In the first step, after generating the trained networks, the so-called regression statistics are calculated. They include: the arithmetic mean, calculated on the basis of the database used to build the models; the standard deviation of the actual data; the standard deviation of errors for the dependent variable; the mean absolute error; the quotient of the standard deviation of the errors and the standard deviation of the actual data; the correlation coefficient between actual and forecast values; and the determination coefficient, which is a measure of the model's goodness-of-fit with respect to the training data. The results of our own research take into account the values of all regression statistics presented above. Two of the most important parameters characterizing the quality of the neural network are the deviation quotient (Iσ) and the determination coefficient (R²). The value of the deviation quotient (Iσ) for the constructed models usually oscillates between 0.1 and 0.7. Values below 0.1 indicate a very good network accuracy, whereas networks with an Iσ value above 0.7 should not be used in modeling.
In the case of the MLP 13:13-20-10-1:1 network, the value of the deviation quotient was 0.455, which indicates its satisfactory quality. The value of the determination coefficient R² lies in the range 0-1. The closer the value of R² is to 1, the better the model's goodness-of-fit. The value of this parameter for the MLP 13:13-20-10-1:1 network is 0.793 and is greater than for the regression model RY1 (R² = 0.532).
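Two of the regression statistics listed above can be computed as in the following sketch, which follows the verbal definitions in the text: the deviation quotient as the standard deviation of the errors divided by the standard deviation of the actual data, and the correlation coefficient between actual and forecast values. The use of the sample (n − 1) standard deviation and the function names are assumptions, and the numbers are invented.

```python
import math

def std(values):
    """Sample standard deviation (n - 1 in the denominator, an assumption)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def deviation_quotient(actual, predicted):
    """Standard deviation of the errors divided by that of the actual data."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return std(errors) / std(actual)

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and forecast values."""
    ma = sum(actual) / len(actual)
    mp = sum(predicted) / len(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Invented values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
i_sigma = deviation_quotient(actual, predicted)
r = correlation(actual, predicted)
print(round(i_sigma, 3), round(r, 3))
```

A quotient near 0 means that the spread of the errors is small relative to the natural spread of the yield, whereas a quotient near 1 means that the model does little better than predicting the mean.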
The forecasting process begins with the determination of predicted values. Once pairs of actual and forecasted values are available, the values of the forecast errors (ex post) are determined, i.e., RAE, RMS, MAE, and MAPE [56][57][58]. These measures of predictive properties were calculated separately for the regression model RY1 and the neural model NY1.
A highly effective parameter determining the quality and usefulness of performed forecasts is the mean absolute percentage error (MAPE). It can be interpreted as the average percentage deviation between the forecast value and the actual value. Peng et al. [59] provide threshold values for a proper MAPE evaluation. If the error is less than 10%, then the degree of goodness-of-fit of the model is perfect, while a range from 10 to 20% indicates a good fit and a range from 20 to 30% an acceptable one. A MAPE above 30% implies poor accuracy of the model and disqualifies it from practical use. In our own research, MAPE achieved low values. For the RY1 model, it was 15.667%, and for the NY1 model, it was 7.203%. These results confirm the goodness-of-fit of the models (especially the neural model) to the real dataset and thus open up wide application possibilities. Moreover, MAPE values are widely used by other researchers for interpreting the usefulness of predictive models [54,60]. For example, MAPE error values (0.5%) were applied to evaluate the forecasts made by Khoshnevisan et al. [61], who estimated the potato yield on the basis of energy inputs using intelligent systems based on adaptive neuro-fuzzy inference systems (ANFIS) and artificial neural networks (ANNs). This interpretation tells us that the discussed model can lead to very reliable forecasts. Bearing in mind the significant influence of random factors on the natural processes being modeled, the obtained measures of the prognostic properties of the neural models are satisfactory, which indicates the possibility of using these tools in practice.
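The threshold scheme attributed to Peng et al. [59] can be written as a small helper. The grade labels follow the wording used above; the function name `mape_grade` and the assignment of the exact boundary values (10, 20, and 30%) to the higher class are assumptions.

```python
def mape_grade(mape_percent):
    """Map a MAPE value (in percent) to the qualitative grades quoted above."""
    if mape_percent < 10:
        return "perfect"
    if mape_percent < 20:
        return "good"
    if mape_percent < 30:
        return "acceptable"
    return "poor"

# MAPE values reported in the text for the two models.
grades = {name: mape_grade(value) for name, value in {"RY1": 15.667, "NY1": 7.203}.items()}
print(grades)
```

Under this scheme the regression model RY1 falls in the "good" class and the neural model NY1 in the "perfect" class.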
The results of the above studies show that the goodness-of-fit between the actual values and those predicted by the RY1 and NY1 models was statistically significant (α = 0.05). The comparison covered 24 cases constituting the Bp verification subset. The relations between the discussed values were described by equations (Figures 4 and 5) and the determination coefficients R². The NY1 neural model was characterized by a better fit of the generated forecasts to the actual yield (R² = 0.8623) compared with the regression model RY1 (R² = 0.3483). This is another argument in favor of neural modeling over regression methods in yield prediction.
The results obtained by means of the sensitivity analysis of the MLP 13:13-20-10-1:1 neural network forming the NY1 model are fully consistent with the available literature. The analysis confirmed the significance of all tested independent variables in explaining the yield variability of the Arielle, Riviera, and Viviana cultivars. The most important factors in shaping the yields of these genotypes were the planting date [PLANT], the date of emergence [EMERG], and the total dose of nitrogen fertilization [NITRO]. The performed multiple regression analysis excludes the significance of the daily average air temperature [TEMP], the sum of phosphorus fertilization [PHOSP], and the sum of potassium fertilization [POTAS] in determining the yield at the assumed significance level α = 0.05. Factors that are unrelated to the inputs in potato cultivation but, as our own research indicates, have a great impact on the final yield are the planting date and the date of emergence. Many authors emphasize that delaying the planting date has adverse effects on the growth and development of the potato, causing the shortening of successive phases of plant development. Research by Kawakami et al. [62], although conducted in completely different climatic conditions, confirms that early planting promotes an increase in tuber yield. The early planting date for very early potato varieties in the Baltic Sea region carries the risk of reducing the yield of potatoes intended for early harvest. The recommended planting period is the turn of April and May. If good thermal and humidity conditions occur during this period, the emergence of plants is observed even several days after planting. The results of the conducted research show that among all the nutrients provided to the cultivation of very early potato varieties, nitrogen plays a fundamental role.
Maintaining a rational fertilization level with this element is particularly important in the production of varieties intended for early harvest, being a typical yield-forming nutrient, both in quantitative and qualitative terms [63]. According to some authors, each increase in nitrogen dose causes a marked increase in potato productivity compared to the lower dose [64]. In turn, Olivier et al. [65] and Jamaati-e-Somarin et al. [66] report that the increase in tuber yield is observed with respect to certain doses of fertilization; however, after exceeding the upper limit, the increase in yield is no longer statistically significant, and the yield decreases. Moreover, nitrogen deficiency may result in premature aging of plants or a visible reduction in yield [67,68]. On the other hand, increasing doses of nitrogen fertilization lead to a significant deterioration of the quality characteristics of tubers-the reduction of starch content and dry matter content [69]. One should keep in mind that excessive nitrogen application creates environmental problems related to nitrate leaching or run-off [70]. In the case of growing very early and early potato varieties, relatively low doses of nitrogen should be applied. Plants fertilized with high doses of nitrogen absorb it intensively, starting from the early stages of development, but due to the shortened growing season, they do not metabolize it completely. In such cases, nitrates (V) accumulate in tubers intended for harvest.
Growing interest in specialized cultivation of potato, in connection with a declining supply of agricultural land, introduces the need for greater control of plant yield, conscious management of production, and making the right decisions before harvesting [54]. The presented results of our own research show that artificial neural networks are a very useful tool in forecasting the yield of very early potato varieties: Arielle, Viviana, and Riviera. Pre-harvest forecasts are a valuable source of knowledge prior to the harvest, sales, and storage of agricultural produce.

Conclusions
The presented modeling methods are an extension of the forecasting models used so far. One of the innovations in the presented concept of yield modeling is the possibility of performing simulations before harvesting in the current agrotechnical season. The presented models can be applied in precision agriculture as an element of decision support systems. Detailed analysis of the values of ex post predictive measures indicates a greater accuracy of the neural model in the implementation of the forecasts than the regression model. The MAPE value for the NY1 model, amounting to 7.203%, proves high-quality prediction.
Working with predictive models, both regression and neural, is burdened with certain limitations. In the case of regression models, it is impossible to use data in qualitative form. For building models, it is recommended to use the full set of experimental data. Some of these limitations also apply to neural models. To be able to fully use the model, it is necessary to obtain complete sets of source files: the set for building the network, the network file, and the verification set. A great advantage of neural models is their ability to deal with incomplete source datasets and their ability to work with qualitative data. Of course, the research results we present relate to a specific cultivation area and specific potato varieties. This, too, is a barrier to universal use of the model. It should be remembered that our research used only those variables that we could obtain from official experiments conducted by COBORU. It is worth mentioning that the well-known classical models of potato growth, development, and yielding are often not easy to use in practice. They require conducting strict experiments, taking measurements with the use of specialist equipment, etc. So far, no universal model has been developed that predicts potato yield for a whole continent, for all cultivars admitted to cultivation in a given area, and for various agrotechnical methods.
Future research on the improvement of neural models in the production of very early potato varieties can be carried out on several levels. First of all, the influence of other equally important independent variables, often occurring in linguistic form, should be carefully considered. Secondly, a greater number of plots should be taken into account to increase the dataset. This would allow for even more accurate forecasts for the adopted research area. Finally, optimization of significant controllable independent factors toward maximizing the tuber yield is needed.