Neural Modeling of the Distribution of Protein, Water and Gluten in Wheat Grains during Storage

An important requirement in the grain industry is fast information on the quality of purchased and stored grain. It is therefore important to search for innovative solutions for monitoring and rapidly assessing the quality parameters of stored wheat. This paper presents the results of evaluating total protein, water and gluten content by means of near-infrared spectrometry. Multiple linear regression (MLR) and neural modeling were used to analyze the results. The results show no significant changes in total protein (13.13 ± 0.15), water (10.63 ± 0.16) or gluten (30.56 ± 0.54) content during storage. On the basis of the collected data, an artificial neural network (ANN) model, MLP 52-6-3, was created which, using four independent features, allows changes in the water, protein and gluten content of stored wheat to be determined. The selected network returned good error values: learning, below 0.001; testing, 0.015; and validation, 0.008. The results and their interpretation are an important element for the warehouse industry. Information obtained in this way about the quality of stored grain will allow a fast reaction when the quality parameters of the stored grain are at risk of deteriorating.


Introduction
Meeting the growing demand for adequate food supplies that results from world population growth is an increasing challenge for mankind. By 2050, the population is projected to reach 9.1 billion, and food production will have to increase by about 70% to cover all food needs [1][2][3]. Most of this population growth is expected to occur in developing countries, some of which already face hunger and a lack of adequate food. Additional factors increasing the demand for food will be urbanization, climate change and the use of land for non-food production [4][5][6][7].
Wheat, maize, rye, oat and rice grains are an important source of nutrients and energy. They have to be transported and stored in a manner that preserves their quality parameters [8,9]. Cereal grains have been stored for millennia. Correct transport and storage are necessary to maintain grain quality because cereal production is seasonal and, depending on location, takes place under different conditions. The aim of the study was to determine changes in the water, protein and gluten content of wheat grain during storage in silos. Near-infrared spectrometry with an Infratec 1241 analyzer was used, together with neural modeling methods for analyzing the empirical results. These methods are intended for determining important process parameters during the storage of wheat grain.

Materials and Methods
The research was carried out from 3 June to 30 December 2019 in grain warehouses on the premises of a production plant in Brzeg municipal district, Opolskie Voivodeship. The quality winter wheat cultivar "Patras" was tested. Samples were taken twice a week at 7:00 a.m., on Mondays and Thursdays (i.e., at intervals of 3 to 4 days). Fifty series of measurements were carried out. In each series, five samples were taken from a randomly selected batch of stored grain and their results were averaged. In total, 250 results were analyzed. The tested wheat grain was stored in grain warehouses with usable areas of 1217.3 m², 885 m² and 1173 m².
The mass of a single sample was 50 ± 0.01 g. Samples were collected in accordance with the PN-EN ISO 24333:2009 standard, which covers all variants of sampling of cereal material [32].
The samples were collected using a prototype multi-chamber probe with an overall length of 150 cm, equipped with an AM2302 humidity and temperature sensor (Figure 1). The probe enables the collection of a representative cross-sectional sample at depths of up to 120 cm and the measurement of temperature and humidity in the intergranular space at a depth of about 100 cm. The prototype was built on an ARDUINO programmable board with the AM2302 sensor. The sensor parameters were as follows: sensor AM2302 (DHT22); supply voltage 3.3 V to 5.5 V; module dimensions 40 mm × 15 mm × 10 mm; temperature measuring range −40 °C to +80 °C, resolution 8 bits (0.1 °C), accuracy ±0.5 °C; humidity measuring range 0% to 100% RH, resolution 8 bits (0.1% RH), accuracy ±2% RH. Measurements followed this procedure: connecting the probe to the control element, starting the device, inserting the probe into the grain mass, starting the measurement, collecting the grain, pulling the probe out of the grain mass, pouring the grain into the container and ending the measurement. Protein, water and gluten content were determined by near-infrared spectrometry using an Infratec 1241 grain analyzer (Foss Analytical, Hilleroed, Denmark) [33].
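To illustrate how a reading from the probe's AM2302 sensor can be interpreted, the Python sketch below decodes one 40-bit AM2302 (DHT22) frame as laid out in the sensor's datasheet: a 16-bit humidity word and a 16-bit temperature word, both in units of 0.1 (the top bit of the temperature word is a sign flag), followed by an 8-bit checksum. The function name is hypothetical, and the firmware actually running on the prototype's ARDUINO board is not described in the text; this is only a sketch of the frame format.

```python
def decode_am2302(frame: bytes) -> tuple[float, float]:
    """Decode a 5-byte AM2302 (DHT22) frame into (relative humidity %, temperature °C)."""
    if len(frame) != 5:
        raise ValueError("an AM2302 frame is exactly 5 bytes (40 bits)")
    rh_hi, rh_lo, t_hi, t_lo, checksum = frame
    # The checksum is the low 8 bits of the sum of the four data bytes.
    if (rh_hi + rh_lo + t_hi + t_lo) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    humidity = ((rh_hi << 8) | rh_lo) / 10.0
    # The most significant bit of the temperature word marks a negative value.
    magnitude = (((t_hi & 0x7F) << 8) | t_lo) / 10.0
    temperature = -magnitude if t_hi & 0x80 else magnitude
    # Reject values outside the measuring ranges quoted for the sensor.
    if not 0.0 <= humidity <= 100.0:
        raise ValueError("humidity outside 0-100 % RH")
    if not -40.0 <= temperature <= 80.0:
        raise ValueError("temperature outside -40 to +80 °C")
    return humidity, temperature


# Example frame: humidity word 0x028C (652 -> 65.2 % RH), temperature word 0x015F (351 -> 35.1 °C).
rh, temp = decode_am2302(bytes([0x02, 0x8C, 0x01, 0x5F, 0xEE]))
```

Validating the checksum before converting the words means a corrupted transmission is rejected rather than silently logged as an implausible grain-mass reading.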
Sustainability 2020, 12, 5050
The analysis of cereals by near-infrared spectrometry is most accurate for measurements in transmission mode rather than reflection mode [33]. Transmission measurements were performed in the wavelength range 570–1050 nm, whereas in reflection measurements the basic quality information is obtained in the range 1100–2500 nm. The higher radiation energy in the 570–1050 nm range allows the measuring beam to penetrate deeper into the kernels, so that not only their surface but also their whole interior is covered by the measurement. This allows larger sample volumes to be used for transmission measurements, making the analyzed samples more representative.

The results were subjected to statistical analysis using the Statistica 13.1 package. Multiple linear regression (MLR) was applied to assess statistical significance, and a scatter plot of observed versus predicted values was created. This analysis indicated that the relationships in the data were not linear; therefore, artificial neural networks were used for further analyses.

It was assumed that, under repeatability conditions, the value of the input quantity X_i is Q. For n statistically independent observations (n > 1), the estimator x̄ of Q was the arithmetic mean of the individual observations x_j (j = 1, 2, ..., n). The standard uncertainty associated with x̄ was determined from the experimental variance s²(x_j) of the observations x_j:

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad s^2(x_j) = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_j-\bar{x}\right)^2, \qquad u(\bar{x}) = s(\bar{x}) = \sqrt{\frac{s^2(x_j)}{n}}$$

Before the regression model was built, the correlations between the individual quantities were examined using Pearson's correlation coefficient. The model was built using the Multiple Regression module in Statistica 13.1, which supports model optimization by, among other methods, stepwise regression.
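The Type A evaluation described in this section (arithmetic mean, experimental standard deviation with an n − 1 denominator, and standard uncertainty of the mean u = s/√n) reduces to a few lines of standard-library Python. This is a minimal sketch; the function name is hypothetical, and the five example readings are invented placeholders that merely mirror the five samples taken per series, not data from the study.

```python
import math
from statistics import mean, stdev


def type_a_uncertainty(observations):
    """Return (mean x̄, experimental standard deviation s(x_j), standard uncertainty u(x̄))."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are needed (n > 1)")
    x_bar = mean(observations)
    s = stdev(observations)  # sample standard deviation, n - 1 denominator
    u = s / math.sqrt(n)     # standard uncertainty of the mean
    return x_bar, s, u


# Hypothetical readings standing in for the five samples of one series.
x_bar, s, u = type_a_uncertainty([13.1, 13.0, 13.2, 13.15, 13.05])
```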
In the first step, standard regression was performed, which identified the statistically significant and insignificant variables of the model at a significance level of α = 0.05. If statistically insignificant variables were identified despite the previously confirmed correlations, stepwise regression was applied, so that only statistically significant predictors were retained in the model. Backward stepwise regression was used: all predictors are initially entered and are then removed from the model one at a time. Model fit was assessed using two statistics: the coefficient of determination (R²) and the F statistic. The F (Fisher) test, carried out within the Multiple Regression module, assessed the significance of the model by examining the ratio of variance estimates. The final assessment of model fit was based on plots of predicted versus observed values.
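Backward elimination can be sketched with NumPy and SciPy. This is not Statistica's Multiple Regression module: the sketch fits ordinary least squares with an intercept, computes two-sided t-test p-values for the slopes, repeatedly removes the least significant predictor until every remaining p-value is at most α, and reports R² together with the overall F statistic F = (R²/k) / ((1 − R²)/(n − k − 1)). The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_summary(X, y):
    """OLS with intercept: coefficients, two-sided p-values, R² and overall F statistic."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df_resid = n - k - 1
    sigma2 = resid @ resid / df_resid             # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of the coefficients
    t_values = beta / np.sqrt(np.diag(cov))
    p_values = 2.0 * stats.t.sf(np.abs(t_values), df_resid)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)   # explained vs. residual variance
    return beta, p_values, r2, f_stat


def backward_stepwise(X, y, names, alpha=0.05):
    """Drop the least significant predictor until every remaining slope has p <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        beta, p_values, r2, f_stat = ols_summary(X[:, keep], y)
        slope_p = p_values[1:]                    # ignore the intercept
        worst = int(np.argmax(slope_p))
        if slope_p[worst] <= alpha:
            return [names[i] for i in keep], beta, p_values, r2, f_stat
        del keep[worst]
    raise ValueError("no predictor is significant at the chosen alpha")
```

Comparing R² and F before and after elimination mirrors the comparison of the full and reduced models in the text; as the study itself stresses, a high F alone does not rule out a non-linear relationship, which is why the predicted-versus-observed plots are still needed.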
The neural network wizard in Statistica 13.1, which mainly builds multilayer perceptrons (MLPs), was used for neural modeling. The network consists of input, hidden and output layers. Each input variable has its own neuron in the input layer. When a neural network is used for regression, each output variable is represented by one neuron in the output layer. The hidden-layer neurons process the information. In addition to correctly assigning the variables to the individual layers, it is important to divide the data set into learning, validation and testing subsets, here at a ratio of 70:15:15. One hundred networks were created to find the best-fitting network, and the top five were selected.
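The layered structure and the 70:15:15 split described above can be sketched in NumPy. This is not Statistica's implementation: the layer sizes 52-6-3 are taken from the network selected later in the study, while the tanh hidden activation, the linear output layer, the weight initialization, the number of cases and all helper names are assumptions for illustration; training is omitted.

```python
import numpy as np


def split_70_15_15(n_cases, seed=0):
    """Randomly partition case indices into learning, validation and testing subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cases)
    n_learn = int(round(0.70 * n_cases))
    n_valid = int(round(0.15 * n_cases))
    return idx[:n_learn], idx[n_learn:n_learn + n_valid], idx[n_learn + n_valid:]


def init_mlp(n_in=52, n_hidden=6, n_out=3, seed=0):
    """Random weights for an n_in-n_hidden-n_out perceptron (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def mlp_forward(params, X):
    """Input layer -> tanh hidden layer -> linear output layer (one neuron per output variable)."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


# Hypothetical number of cases, used only to show the split proportions.
learn_idx, valid_idx, test_idx = split_70_15_15(100)
params = init_mlp()
outputs = mlp_forward(params, np.zeros((4, 52)))  # 4 cases -> 3 outputs each
```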
The quality of the network was assessed on the basis of its learning, validation and testing errors. The learning error is the most important, as the network is fitted to the learning cases. The other errors should be comparable in magnitude and only slightly larger than the learning error. When error values are similar, network quality is taken into account: a well-fitted network gives responses of similar quality for the learning and validation data.
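The selection criteria above can be expressed as a simple rule. The Python sketch below is one possible reading of them, not Statistica's procedure: it keeps networks whose learning error is within a tolerance of the lowest one and whose validation and testing errors are not smaller than the learning error, then prefers the smallest gap between learning and validation quality. The class, function and tolerance value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateNetwork:
    name: str
    learn_error: float
    valid_error: float
    test_error: float
    learn_quality: float
    valid_quality: float


def select_network(candidates, error_tolerance=1e-3):
    """Pick a network following the criteria in the text (the tolerance is an assumption)."""
    best_learn = min(c.learn_error for c in candidates)
    # 1. The learning error matters most: keep networks close to the lowest one,
    #    whose validation and testing errors are not below the learning error.
    shortlist = [
        c for c in candidates
        if c.learn_error - best_learn <= error_tolerance
        and c.valid_error >= c.learn_error
        and c.test_error >= c.learn_error
    ]
    if not shortlist:
        raise ValueError("no candidate satisfies the error criteria")
    # 2. Among similar errors, prefer similar response quality on learning and validation data.
    return min(shortlist, key=lambda c: abs(c.learn_quality - c.valid_quality))
```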
In addition, a sensitivity analysis can be performed for neural networks. It ranks the input variables by relevance, assigning each a degree of usefulness. In Statistica 13.1, this analysis is performed automatically. The data are presented to the network repeatedly, and in each repetition one input variable is treated as missing. The total error is calculated as in standard learning. The measure of sensitivity is the quotient of the error obtained when the network is run without the variable and the error obtained with the full set of variables. A quotient greater than 1 indicates that the variable is important, and the larger the quotient, the more useful the variable. A quotient equal to 1 means that removing the variable does not affect the result, and a quotient below 1 means that removing the variable may even improve the quality of the network.
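The error quotient can be sketched for any fitted model exposed as a `predict` function. In this Python sketch, treating a variable as missing is emulated by replacing its column with the column mean, and the error is a sum of squared residuals; both choices are assumptions, since the text does not specify how Statistica substitutes missing inputs. The function name is hypothetical.

```python
import numpy as np


def sensitivity_ratios(predict, X, y):
    """Error quotient (error without variable j) / (error with all variables) for each input.

    'Without variable j' is emulated by replacing column j with its mean,
    an assumed stand-in for Statistica's missing-value handling.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base_error = np.sum((y - predict(X)) ** 2)
    if base_error == 0:
        raise ValueError("baseline error is zero; the quotient is undefined")
    ratios = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_missing = X.copy()
        X_missing[:, j] = X[:, j].mean()          # remove the information carried by input j
        ratios[j] = np.sum((y - predict(X_missing)) ** 2) / base_error
    return ratios
```

Read as in the text: a quotient above 1 marks a useful input, exactly 1 an input the model ignores, and below 1 an input whose removal would not hurt.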

Results
The average levels of protein, water and gluten are shown in Table 1.
Table 1. Average protein, water and gluten content in wheat samples.

The protein and gluten content of the analyzed wheat samples were within the standard (Table 1). In quality wheat, a gluten content below 20% and a protein content below 9.5% of dry matter indicate that the grain is suitable only for producing flour for certain confectionery products. The minimum criterion for the suitability of wheat grain for producing bread-making flour is a gluten content of at least 25% and a protein content of at least 11.5% of dry matter. Grain that may act as a so-called "improving agent" in milling mixes with grain of medium or low baking value should contain over 14% protein and more than 30% gluten [4]. The gluten content of the tested grain ranged from 29% to 32%.

As can be seen in Figures 2–4, the protein content of the tested wheat samples varies more than the gluten and water content. The increase in protein content at the beginning of the measurements may result from differences in harvesting, the prevailing weather conditions in the area and the cultivation technology. Post-harvest ripening processes take place in grain harvested from the field; they last longer if the weather is rainy while protein and gluten are being formed. The grain reaches its optimum value within about 6 weeks after harvesting. Cultivation technology (medium-intensive or intensive) can also have a significant impact on the protein content of the grain [34]. Next, the protein, water and gluten content of the tested samples in the individual weeks of storage were analyzed statistically. Correlations between the data were examined in order to assess the variability of the content of the three fractions during grain storage. Pearson's linear correlation coefficients are collected in Table 2.
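The protein and gluten thresholds quoted above can be read as a simple decision rule. In this minimal Python sketch the function name and category labels are hypothetical, the protein value is assumed to be expressed as a percentage of dry matter, and because the quoted criteria do not say how grain falling between the thresholds is classified, the sketch reports such grain as not covered.

```python
def baking_class(protein_dm: float, gluten: float) -> str:
    """Classify wheat grain using the protein/gluten thresholds quoted in the text.

    protein_dm: protein content, % of dry matter (assumed basis)
    gluten: gluten content, %
    """
    if protein_dm > 14.0 and gluten > 30.0:
        return "improving agent"
    if protein_dm >= 11.5 and gluten >= 25.0:
        return "bread-making"
    if protein_dm < 9.5 and gluten < 20.0:
        return "confectionery only"
    # The quoted criteria leave the intermediate range undefined.
    return "not covered by the quoted thresholds"
```

The stricter "improving agent" test is checked first, because any grain that meets it also meets the bread-making minimum.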
Multiple regression was performed to generate a regression model. Since all pairs of variables showed correlations, all variables were taken into account. However, the initial regression model based on four variables contained statistically insignificant variables; a summary of this model is presented in Table 3. Backward stepwise multiple regression was therefore applied to eliminate the insignificant variable; a summary of the resulting model is presented in Table 4. Comparing the two models, Fisher's F statistic increased from 1044.2 for four parameters (3 degrees of freedom) to 1559.4 for three parameters (2 degrees of freedom). A higher F value indicates a better model fit.
To check whether this high correlation was spurious, the data were examined using plots of predicted versus observed values for the multiple regression models (Figures 5 and 6). As can be seen, there are deviations suggesting that, despite the high regression coefficient, this model is not correct. As stated in the Materials and Methods section, the final evaluation of the multiple regression models was based on these plots. Because of the deviations, an attempt was made to create a non-deterministic model based on neural networks. To generate the neural networks, the data were restructured: output values for each of the three characteristics (water, protein and gluten content) were added for each week, so that the initial value for week k was the final value for week k − 1. Owing to the properties of the network, protein was not excluded despite the results of the earlier multiple regression modeling, because the influence of this variable could be significant in a non-deterministic model. The data were finally divided into three types:
• qualitative input: week;
• quantitative input: water, protein, gluten;
• quantitative output: water output, protein output, gluten output.
Table 5 shows the learning parameters of the five selected networks. On the basis of the correlation coefficients between the target values and the learning quality, the second network, MLP 52-6-3, was selected. This network had the highest quality values (learning, 0.999; testing, 0.978; validation, 0.948) and good error values (learning, below 0.001; testing, 0.015; validation, 0.008).
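The restructuring of the data into week-to-week input/output pairs can be sketched as follows. The sketch assumes one row of (water, protein, gluten) values per week in chronological order; the function names and the one-hot coding of the qualitative week input are illustrative assumptions, since the text does not describe how Statistica encodes categorical inputs internally.

```python
import numpy as np


def make_lagged_dataset(weekly):
    """Turn weekly (water, protein, gluten) rows into input/output pairs.

    weekly: array-like of shape (n_weeks, 3), rows ordered by week.
    For each week k >= 1, the inputs are the values of week k - 1
    (the initial values of week k) and the outputs are the values of week k.
    """
    weekly = np.asarray(weekly, dtype=float)
    if weekly.ndim != 2 or weekly.shape[1] != 3 or weekly.shape[0] < 2:
        raise ValueError("expected at least two rows of (water, protein, gluten)")
    weeks = np.arange(1, weekly.shape[0])
    return weeks, weekly[:-1], weekly[1:]


def encode_inputs(weeks, quantitative):
    """One-hot code the qualitative week input and append the quantitative inputs."""
    categories = np.unique(weeks)
    one_hot = (weeks[:, None] == categories[None, :]).astype(float)
    return np.hstack([one_hot, quantitative])
```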
On the basis of the generated MLP 52-6-3 network, the changes in water, protein and gluten content in wheat during storage are presented graphically in Figures 7–9.
As stated in the Materials and Methods section, it is also important to determine the sensitivity of the network. In Statistica, the network sensitivity analysis is performed automatically. The sensitivity values obtained for the individual inputs of the MLP 52-6-3 network are given in Table 6. All variables in the model were thus shown to be significant, as no error quotient was less than or equal to 1.

Discussion
On the basis of the conducted tests, we can conclude that the water, protein and gluten content of wheat during storage was within the standard. Neither the statistical analysis nor the neural modeling showed significant changes in water, protein or gluten content during wheat storage. The generated MLP 52-6-3 neural network satisfactorily described the changes in water, protein and gluten content in wheat during storage, with good error values: learning, below 0.001; testing, 0.015; and validation, 0.008. Niedbała et al. likewise found the use of artificial neural networks for predicting winter rapeseed yield to be satisfactory [35]. According to the authors, the network suitable for prediction was MLP 21:21-13-6-1:1, with a learning error of 0.07376, a testing error of 0.08536 and a validation error of 0.07346 [36]. A correctly constructed predictive model should accurately describe the analyzed phenomenon [28], that is, it should resemble the empirical system under study. A common problem is therefore choosing an appropriate neural network topology for a given problem, which is most often done by reviewing many topology variants. The MLP is the network most commonly used for predictive applications [37][38][39][40]. In wheat and flour processing, quality control requires fast analytical tools for predicting physical, rheological and chemical properties. Mutlu et al. [41][42][43] used near-infrared spectrometry (NIR) combined with an artificial neural network to predict flour quality parameters such as protein content, moisture content, Zeleny sedimentation, water absorption, dough development time, dough stability time, dough softening degree, tenacity, extensibility, strength and the baking test (loaf volume and weight) [41].
In total, 79 flour samples of different wheat cultivars grown in different regions of Turkey were analyzed chemically, and both the NIR spectra (400–2498 nm) and the chemical analysis results were used for training and testing different ANN architectures. The results showed very good accuracy, with coefficients of determination (R²) ranging from 0.83 to 0.952. They indicate that NIR combined with ANN can be successfully used to forecast wheat flour quality parameters [41]. Studies on wheat quality indicators also point to a significant correlation between protein and gluten content during storage. However, the attempt to apply multiple linear regression to the results of this study showed that this relationship cannot be treated as linear. The neural network chosen to describe the regression model has good parameters. The learning error, which was the lowest for the selected network at 0.001, was taken into account first. This network also had comparable validation and test errors, 0.008 and 0.015, respectively. As the errors of the other networks were comparable, the quality of responses to the learning and validation data was also taken into account in evaluating the network. The selected MLP 52-6-3 network showed the smallest difference between these two quality values. This means that this network shows the best fit, which is also confirmed by the response surface plots of the individual variables. In their study of wheat protein content, Mao et al. applied near-infrared reflectance and a radial basis function (RBF) neural network [44]. This method was chosen because of its high speed and better efficiency compared with the traditional method.
An improved algorithm was proposed to optimize the clustering of data in the hidden layer of the RBF neural network. The experimental analysis showed that the improved algorithm significantly reduced network complexity and improved the quality and learning speed of RBF networks [14]. Since there were no studies on modeling proteins in plant material during processing, Pietsch et al. [9] attempted neural modeling and found that neural modeling forecast gluten behavior more accurately than isothermal modeling. The RBF neural network is a type of feedforward network that is widely used because of its strong global optimization capacity and good generalization ability. Therefore, NIR and RBF neural networks have been combined to establish predictive models for measuring wheat protein [45,46]. In real applications, the number of cluster centers in the hidden layer of the RBF neural network and the output weight values strongly affect its performance, so establishing the correct number of hidden-layer cluster centers is crucial; a wrong choice can easily lead to the "curse of dimensionality" [47]. So far, however, there are no effective methods for calculating the optimal number of clusters theoretically, so this value can only be obtained experimentally. To some extent, this makes the RBF neural network more difficult to use and limits its widespread practical application.
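How an RBF network depends on the number of hidden-layer centers, and how that number is found experimentally, can be illustrated with a small NumPy sketch. It is a generic RBF regressor, not the improved algorithm of Mao et al.: centers come from plain k-means, the Gaussian width uses the common heuristic d_max/√(2k), the output weights are solved by linear least squares, and the candidate center counts passed to `choose_n_centers` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def rbf_features(X, centers, width):
    """Gaussian hidden-layer activations plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.column_stack([np.exp(-d2 / (2.0 * width ** 2)), np.ones(len(X))])


def fit_rbf(X, y, k, seed=0):
    centers = kmeans(X, k, seed=seed)
    if k > 1:
        d_max = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)).max()
        width = d_max / np.sqrt(2.0 * k)          # heuristic Gaussian width
    else:
        width = float(X.std()) or 1.0
    weights, *_ = np.linalg.lstsq(rbf_features(X, centers, width), y, rcond=None)
    return centers, width, weights


def predict_rbf(model, X):
    centers, width, weights = model
    return rbf_features(X, centers, width) @ weights


def choose_n_centers(X_learn, y_learn, X_valid, y_valid, candidates):
    """Try each candidate number of centers; keep the one with the lowest validation MSE."""
    errors = {}
    for k in candidates:
        model = fit_rbf(X_learn, y_learn, k)
        errors[k] = float(np.mean((predict_rbf(model, X_valid) - y_valid) ** 2))
    best_k = min(errors, key=errors.get)
    return best_k, errors
```

The search loop in `choose_n_centers` is the "several experiments" the text refers to: each candidate count requires a full refit, which is part of what makes RBF networks harder to apply in practice.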

Conclusions
On the basis of the research and neural modeling, it can be concluded that the water (10.63 ± 0.16), protein (13.13 ± 0.15) and gluten (30.56 ± 0.54) contents of the wheat samples were within the standard. The protein content results varied more than the gluten and water content results. The correlation results do not indicate a statistically significant relationship between protein, water and gluten content during storage. The application of artificial neural networks made it possible to determine changes in the water, protein and gluten content of wheat during storage. The MLP 52-6-3 network is suitable for forecasting, with good error values: learning, below 0.001; testing, 0.015; and validation, 0.008. All input variables used in the model were shown to be significant, as no error quotient was lower than or equal to 1. The proposed sampling system with a multi-chamber probe enabled correct sampling. The interpretation of the results is essential for the storage industry. There is a constant need for new solutions that allow fast evaluation of selected quality parameters of stored grain. Information about the quality of stored grain will allow a rapid reaction when the quality parameters of the stored grain are at risk of deteriorating. The proposed method can be successfully applied in companies that purchase and store wheat, allowing rapid assessment of selected quality parameters of stored grain and enabling the control of different Triticum species.