Use of Artificial Neural Network Model for Rice Quality Prediction Based on Grain Physical Parameters

The main goal of this study was to test the ability of an artificial neural network (ANN) to predict rice quality from grain physical parameters and to compare it with multiple linear regression (MLR) using 66 samples in duplicate. The parameters used for rice quality prediction are related to biochemical composition (starch, amylose, ash, fat, and protein concentration) and pasting parameters (peak viscosity, trough, breakdown, final viscosity, and setback). These parameters were estimated from grain appearance (length, width, length/width ratio, total whiteness, vitreous whiteness, and chalkiness) and milling yield (husked, milled, head) data. The MLR models were characterized by very low coefficients of determination (R² = 0.27–0.96) and root-mean-square errors (RMSE) of 0.08–0.56. Meanwhile, the ANN models presented R² in the range 0.97–0.99, with R² = 0.98 (training), R² = 0.88 (validation), and R² = 0.90 (testing). According to these results, the ANN algorithms could be used to obtain robust models that predict both biochemical and pasting profile parameters quickly and accurately, which makes them suitable for simultaneous qualitative and quantitative analysis of rice quality. Moreover, the ANN prediction method represents a promising, fast, and clean approach to estimating several targeted biochemical and viscosity parameters, which is of interest to industry and consumers and leads to better assessment of rice classification for authenticity purposes.


Introduction
With the varying market valorization of rice (Oryza sativa L.) production, continuous control of its quality, authentication, and contamination issues is required. Rice quality can be evaluated from grain physical parameters, as well as the milling performance, biochemical composition, and cooking properties [1]. The grain physical parameters include the external and integral properties, such as its appearance (size, shape, smoothness, color), weight, hardness, volume, and flow properties, and are of paramount importance in all activities from harvesting, drying, handling, and storage to milling, packaging, marketing, cooking, product-making, and utilization of rice [1]. In addition to the grain physical parameters, rice quality is characterized by basic chemical composition such as protein, moisture, fat, ash, and amylose content, as well as gelatinization temperature, gel consistency, and pasting viscosity. Amylose content is highlighted due to its correlation with the pasting and retrogradation behavior, influencing the textural properties of cooked rice and the dynamic viscoelasticity of rice starch gel [2]. Proteins and lipids are also parameters currently accepted to define rice quality during processing and storage [3,4]. The acceptability and the commercial value of the paddy according to industrial standards are mostly based on grain milling performance such as the husked, milled, and milled

Milling Yields and Grain Appearance
The potential yields of husked, milled, and head rice were determined according to ISO 6646, 2011 [21]. Biometric parameters of polished rice grains such as length (L), width (W), length/width ratio (L/W), chalkiness (CH), total whiteness (TW), and vitreous whiteness (VW), were evaluated in 50 g samples by image processing (S21 model and software, Suzuki, Brazil).

Biochemical Composition
The polished rice samples were ground using a Cyclone Sample Mill (falling number 3100, Perten, Stockholm, Sweden) with a 0.8 mm screen. Starch (ST), protein (P), fat (FA), and ash (AS) content were assessed using NIR transflection MPA equipment (Bruker Optics, Ettlingen, Germany). The calibrations used were provided by Bruker Company (Billerica, MA, USA). For each sample, approximately 25 cm³ of rice flour was loaded in a circular cup and pressed slightly to obtain a similar packing density. Sixteen consecutive scans were performed over a wavenumber range of 12,000–4000 cm⁻¹ at 16 cm⁻¹ resolution. For each rice sample, two spectra were obtained. Amylose (AMY) content was quantified using a standard curve developed from absorbance values of 4 calibrated samples from standard rice varieties (IR8, IR24, IR64, and IR65) obtained from the International Rice Research Institute. The amylose content was determined using a colorimetric technique with a spectrophotometer (Hitachi, Tokyo, Japan) at 720 nm, according to the ISO 6647-2:2015 method [22]. The determination and evaluation of biochemical parameters were performed in duplicate. The value considered is the average of both samples obtained.

Pasting Parameters
The paste gelatinization and viscosity properties of rice were assessed using a viscosity analyzer (RVA-4, Newport Scientific, Warriewood, Australia). Peak viscosity (PV), setback (SB), breakdown (BD), trough (TR), and final viscosity (FV) were determined according to the AACC International Approved Method 61-02.01. The determination and evaluation of physical parameters were performed in duplicate.

Multilinear Regression
Multiple linear regression (MLR) was used to develop a model for predicting the biochemical parameters that characterize the rice grain. MLR is one of the oldest regression methods and is used to establish linear relationships between several independent variables (x_i) and the dependent variable (sample property) y that depends on them. The model can be represented by Equation (1):

$$ y_j = b_0 + \sum_{i} b_i x_i + e_{i,j} \qquad (1) $$

where y represents the sample property, b_0 the intercept, b_i the computed coefficient for each variable x_i, and e_{i,j} the standard estimation error. Each independent variable was analyzed and correlated with the specific property y_j. After the MLR model was developed, the accuracy in the prediction of the dependent variable was evaluated by computing the correlation coefficient, which is obtained by comparing true values with predicted ones. The determination coefficient (R²) is one of the most used statistical parameters for the assessment of the developed model regardless of the model type (Equation (2)).
The statistical analysis of several parameters was performed using the data analysis toolbox in Excel software for ANOVA processing.
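As an illustration of how the coefficients b_0 and b_i of Equation (1) can be estimated, the following is a minimal ordinary least-squares sketch in plain Python. It is not the procedure used in the study (the MLR models were fitted in Excel); the function names `fit_mlr` and `predict_mlr` and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + sum(bi * xi).

    X is a list of rows (one row of predictors per sample), y a list of
    responses. Returns [b0, b1, ..., bk]. Hypothetical helper, for
    illustration only.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column
    k = len(rows[0])
    # Normal equations A b = c, with A = X^T X and c = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    M = [A[i] + [c[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda p: abs(M[p][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (M[i][k] - tail) / M[i][i]
    return b


def predict_mlr(b, x):
    """Evaluate the fitted linear model for one row of predictors."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Toy data generated from y = 1 + 2*x1 - 3*x2 (not data from the study)
X_demo = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_demo = [1 + 2 * a - 3 * b for a, b in X_demo]
coeffs = fit_mlr(X_demo, y_demo)
```

With nine predictors, as in the models described here, each row of `X` would hold the appearance and milling-yield values of one sample.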

Artificial Neural Network (ANN)
The ANN consisted of one input layer, one hidden layer, and one output layer. The number of nodes in the input layer corresponds to the number of variables tested, while the number of neurons in the output layer corresponds to the number of classes. The number of hidden layers and neurons depends on the complexity of the task and the quantity of training data. In the hidden and output layers, each neuron was connected to all the nodes in the preceding layer by an associated numerical weight. A neural network is an adaptable system that learns relationships from the input and output datasets and predicts a previously unseen dataset of similar characteristics to the input set [23]. A multilayer perceptron (MLP) is a widely used neural network architecture for regression problems, using the backpropagation learning algorithm [24][25][26]. MLPs are usually used for prediction and classification, with suitable training algorithms for the network weights (Figure 1). In the three-layer architecture established here, the input layer accepts the data, the hidden layer processes them, and the output layer returns the resulting outputs of the model [27,28]. A hyperbolic tangent sigmoid transfer function was used at the input layer and the hidden layer, and a pure linear transfer function was used at the output layer. The number of neurons in the input layer is equal to the number of input variables introduced to the networks. For the biochemical and pasting parameters, the output layer contains one neuron for each parameter in the study. A total of 40 of the 66 samples were used for training, and the remaining 26 samples were divided equally between validation and testing.
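The 40/13/13 partition described above can be sketched as follows. The text does not state how samples were assigned to each subset, so the random shuffle, the fixed seed, and the function name `split_samples` are assumptions made only for illustration.

```python
import random


def split_samples(n_samples=66, n_train=40, seed=0):
    """Return index lists for training, validation, and testing.

    Assumption: samples are assigned at random; the remainder after the
    training set is divided equally between validation and testing.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    rest = idx[n_train:]
    half = len(rest) // 2
    return idx[:n_train], rest[:half], rest[half:]


train_idx, val_idx, test_idx = split_samples()
print(len(train_idx), len(val_idx), len(test_idx))  # 40 13 13
```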
Each node, except for the input nodes, is a neuron based on a nonlinear activation function. The MLP can be regarded as a hierarchical mathematical function mapping a set of input values to output values via many simpler functions. Three different numbers of hidden nodes (4, 8, and 12) were tested to select the best models. The multilayer feedforward fully connected ANN was trained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) learning algorithm (200 epochs). The number of neurons in the hidden layer was optimized through an early-stop learning procedure. In this procedure, the best topology of the ANNs was searched using the training, validation, and testing datasets. According to the R² and RMSE values, the best ANN models were developed to predict the different biochemical and rheological parameters. Normally, the nodes are fully linked between layers, and therefore the number of parameters quickly increases, with a considerable risk of overfitting [29]. Activation functions of the artificial neurons in hidden layers are necessary for the network to be able to learn nonlinear functions. For the implementation of the backpropagation algorithm, the hyperbolic tangent function (tansig) was used. The testing models were verified based on the determination coefficient R² and root-mean-square error (RMSE).
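To make the topology concrete, the sketch below evaluates a fully connected network with 9 inputs, one tanh hidden layer, and a single linear output, and counts its adjustable parameters for the three hidden-layer sizes tested. It is a plain-Python illustration rather than the MATLAB implementation used in the study; the bias terms, the random initial weights, and the names `n_parameters`, `init_mlp`, and `forward` are assumptions.

```python
import math
import random


def n_parameters(n_in, n_hidden, n_out=1):
    """Weights plus biases of a fully connected n_in:n_hidden:n_out network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def init_mlp(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an arbitrary initialization)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """tanh hidden layer followed by a linear output neuron."""
    w1, b1, w2, b2 = params
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


for h in (4, 8, 12):
    print(f"9:{h}:1 -> {n_parameters(9, h)} parameters")
# 9:4:1 -> 45 parameters
# 9:8:1 -> 89 parameters
# 9:12:1 -> 133 parameters
```

Under these assumptions the 9:12:1 network has more adjustable parameters (133) than there are training samples (40), which is why the overfitting risk mentioned above is checked against the validation and testing subsets.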
In terms of the model performance analysis, the RMSE and R² of the calibration and validation data were used. A smaller RMSE indicates better model performance (Equation (3)):

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad (3) $$

where n represents the number of observations in the test data, ŷ the values of the output in the test data, and y the value of the predicted output [30]. A significance level of α = 0.05 was used.
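The two performance measures can be computed as in the short sketch below. The R² shown is the common 1 − SS_res/SS_tot form, which is an assumption because the study's own expression for R² is not reproduced in the text; the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives RMSE = 0 and R² = 1, while predicting the mean of the observed values for every sample gives R² = 0.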
The Levenberg-Marquardt algorithm considered during the ANN development is derived from Newton's method and is designed for minimizing functions that are sums of squares of nonlinear functions [31,32]. It is a combination of the gradient descent rule and the Gauss-Newton method. The algorithm uses a parameter to decide the step size, which takes large values in the first iterations (equivalent to the gradient descent algorithm) and small values in the later stages [33]. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm, usually used for nonlinear least squares, is combined with the modified backpropagation algorithm, yielding a new fast training multilayer perceptron (MLP) algorithm (BFGS/AG).
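The interpolation between gradient descent and Gauss-Newton described above is usually written as the update below. This is the textbook form of the Levenberg-Marquardt step, not a formula taken from the study; the symbols w (network weights), e (vector of prediction errors), J (Jacobian of e with respect to w), and μ (the damping parameter) are notation assumed here.

```latex
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
```

For large μ the step approaches (1/μ) JᵀE-type gradient descent with a short step, that is, −(1/μ) Jᵀe; as μ tends to zero it approaches the Gauss-Newton step −(JᵀJ)⁻¹Jᵀe.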
The ANN models were developed using MATLAB® software (R2017a) (MathWorks, Inc., Natick, MA, USA). The MLR models were developed using the Excel application. Ten models were developed separately for predicting biochemical (FA, P, AS, AMY, and ST) and pasting parameters (SB, TR, BD, PV, and FV) based on grain appearance (L, W, L/W, CH, TW, and VW) and milling yields (MYH, MYM, and MIY).

Multilinear Regression
The aim of the present study was to evaluate whether the multilinear regression (MLR) technique and artificial neural network (ANN) algorithms could effectively predict rice biochemical and pasting parameters based on the grain appearance and milling yields. MLR models established a relationship between the biological and processing factors and, consequently, the quality feature. The coefficients of the MLR model, p-value, determination coefficient (R²), and standard error (SE) of each parameter were determined (Table 1). The F-test showed that several independent variables in the multiple linear regression models are significant for all parameters. A low p-value (<0.05) represents high significance of the corresponding coefficient [34]. SB (R² = 0.92) and AMY (R² = 0.86) are characterized by a significant determination coefficient, so both can be evaluated using a significant predictive MLR model. However, BD (R² = 0.74), PV (R² = 0.71), TR (R² = 0.62), and ST (R² = 0.62) were characterized by a low determination coefficient and are considered unsuitable for an accurate evaluation of the parameters. The actual experimental data were plotted against the predicted values, showing a relative correlation for BD, PV, and AMY (Figure 2).
However, it was apparent from the plots that the rest of the models had weaker predictive ability and lower performance (data not shown). Based on these results, the MLR models showed a relative disadvantage because they describe only the linear relationship between variables without considering other types of relations, which can be considered a limitation.
Table 1. Comparative analysis of several ANOVA parameters for the MLR models developed for the different biochemical and pasting parameters based on the biometric and industrial parameters. Peak viscosity (PV); setback (SB); breakdown (BD); trough (TR); final viscosity (FV); starch (ST); protein (P); fat (FA); ash (AS); amylose (AMY); total whiteness (TW); vitreous whiteness (VW); chalkiness (CH); milling yield husked (MYH); milling yield milled (MYM); milling industrial yield (MIY); length (L); width (W).
The predictive models for BD, PV, and TR are characterized by a high determination coefficient, and in them the MYH and W-white parameters present a positive and significant effect (p-value). The acceptability of the paddy according to industrial standards is related to the milling yield, and these parameters can also influence the pasting behavior and the commercial value. Meanwhile, the L-white parameter presents a positive effect in the predictive models of BD and PV but a negative effect in the AMY model. On the other hand, the MYH, L-white, and W-white parameters are characterized by a negative effect in the SB and AMY models. The L/W ratio presents a negative effect in the BD and PV models and a positive effect in the SB, FV, and AMY models.
The relations between milling parameters and AMY are relevant due to their impact on the cooking behavior of rice, directly affecting the water absorption, firmness of grain, and, conversely, the stickiness of cooked rice [35].

The combined knowledge of the physical properties and anatomical composition of the rice grain is a prerequisite to gaining a deep understanding of what happens to the grain in the different postharvest operations. The understanding of the anatomy of the rice grain explains why rice kernels break so easily on mechanical impact during the physical operations of threshing and milling and under thermal stress during the drying process. The variability between rice grain varieties concerning the surface tissue of the kernel and their layers leads the milling industry to select the correct adjustment of hulling machines to prevent breakage and ensure higher milling recovery. However, it is important to note that there are several correlations among agronomic and quality traits [36][37][38]. Milling quality aspects affected by temperature during rice ripening include chalkiness, immature kernels, kernel dimensions, fissuring, protein content, amylose content, and amylopectin chain length [5].
The TW presents a positive effect in the FV, TR, ST, and FA models but a negative effect in the AS and P models. Studies conducted by Chikubu et al. (1985) found that rice protein content had a negative correlation with appearance, aroma, taste, and stickiness but a positive correlation with hardness [39]. The VW presents a positive effect in terms of the AMY and P predictive models but a negative effect in terms of the FV, TR, and FA models. The FV is an important technological property for the assessment of the cooking quality of rice and paste properties of pre-gelatinized flours. According to Hu et al. (2004), the viscosity profile is a useful parameter in the selection of germplasm with certified quality in rice breeding programs, being subject to varietal differences [40].
The degree of milling or polishing is an important factor that influences the quality of milled rice. Rice milling quality refers to the ability of the kernels to withstand the rigors of hulling and bran removal without breaking, and it is significantly influenced by genotype, cultural practices, environment, drying, and milling processes [41]. Rice grain quality involves some complex interrelated traits that cover biochemical composition, cooking, eating, nutritional, and sensory properties. Rice endosperm is composed mainly of starch, and its quality is traditionally defined by characterizing starch structure and composition, which is then correlated with functional properties of the grain [42]. The increase in milling yield may be due to greater agglomeration of starch granules [43], enhancing the endurance of the rice kernels during milling [44]. Milling processes could be influenced by amylose content and starch physicochemical properties. The amylose content in deep milled rice was greater than in regular milled rice [45]. Protein composition is also a factor that influences milling performance [46].
The whiteness and gloss of cooked rice are also affected by amylose content [47]. Whiteness, measured with a colorimeter, is used to indicate the degree of milling. However, a common method for the degree of milling quantification is measuring the fat amount on the milled grains. As milling progresses and the degree of milling increases, the whiteness of milled rice increases, the surface lipid content decreases, and milled rice yield decreases. The currently accepted standard for measuring the degree of milling is the Kett Whiteness Meter, and most commercially milled rice must meet some form of degree of milling specification [48].
The whiteness is an important parameter because it is related to the appearance of the grains. The changes of whiteness during milling can be related to their physicochemical properties [49]. Among the factors that influence the percentage of whiteness of the grains, nitrogen fertilization can change the amylose content of the grains [50]. The degree of milling or polishing (e.g., polishing time and polishing pressure) is an important factor that influences the quality of milled rice. The milling operation influences morphological characteristics such as vitreous whiteness, total whiteness, and chalky area vitreous percentage.
Grain appearance is characterized by biometric parameters (length, width, length/width ratio), total whiteness, vitreous whiteness, and chalkiness, being considered as a crucial factor that affects its market acceptability. The L/W ratio is used internationally to describe the shape and class of the rice variety [6]. The CH is characterized by a negative effect on the PV, FV, TR, FA, and ST models, presenting an opposite effect compared to the previous parameters analyzed, which can be related to the specific grain characteristics and, consequently, its effect in biochemical and pasting parameters. Studies performed by Li et al. (2019) showed a correlation between agronomic traits and yield depending on the ecotypes of rice variety [51]. Many studies have shown that the physical characteristics of the rice grain are associated with the yield of head rice [52].
Chalkiness is an undesirable trait that negatively affects milling, cooking, eating, and grain appearance and represents a major problem in many rice-producing areas of the world, being associated with genetic and enzymatic factors. Chalky grains were found to contain a lower density of starch granules as compared to vitreous ones [53].

Artificial Neural Network
The artificial neural network (ANN) modeling was carried out using the training, validation, and test datasets. Several ANN models were developed in which variables related to grain appearance, such as biometrics, and milling yield parameters were taken as the input parameters, while protein (P), amylose (AMY), peak viscosity (PV), trough (TR), breakdown (BD), final viscosity (FV), setback (SB), ash (AS), starch (ST), and fat (FA) were considered as output parameters. A multilayer perceptron (MLP) training algorithm was used for ANN model development. The training algorithm and kernel function are very important factors in the ANN. The network structures developed included an input layer, one hidden layer, and an output layer. The correlation coefficient between the output and the target simulated value was used to select the optimal number of neurons in the hidden layer: the number of neurons was fixed where the maximum values of the correlation coefficients were found. Three different neuronal structures were tested, characterized by 4, 8, and 12 hidden nodes. The input layer (9 nodes) and the output layer (1 node) were the same for all models (9:4:1, 9:8:1, and 9:12:1). According to the results, the best ANN models were characterized by a network model with 12 hidden nodes, presenting high R² for testing step: ST (0.  nodes and R² = 0.81. The determination coefficient R² = 0.98 showed a suitable match between the observed and predicted data (Table 2). According to these results, the number of nodes in the hidden layer should be correlated with the quantity of input data. The correlation coefficient, R, between the outputs and targets was a measure of how well the variation in the output was explained by the targets. The results obtained revealed that the MLP algorithm associated with the Broyden-Fletcher-Goldfarb-Shanno learning algorithm was more efficient in modeling the different biochemical parameters.
The neural network can learn complex relationships and generalize results from a specific pattern of data, and it is considered an appropriate technique for modeling complex systems. Solving problems using ANNs depends on the magnitude, type, quality, and preprocessing of the training data [54].

Model Validation
The model validation was performed based on the R² and RMSE determined with the calibration, validation, and test datasets for all parameters (Table 2). The number of hidden neurons was varied to find the best result in terms of the correlation coefficient. According to the parameters, the best ANN model was obtained with 12 hidden nodes, being characterized by high R² for the training, validation, and test models, while the error for each parameter was also very low.
The ANN model improved the estimation results by lowering the value of RMSE compared to the MLR models, both for the calibration/training set and for the validation set. It should be noted that for the calibration dataset, the ANN method always performed better than the MLR method.
To test the performance of the developed ANN models, the predicted and experimental datasets of training samples were compared, and the results showed the high ability of the ANN to generate outputs close to the experimental data (Table 3). The average accuracy for the training data (R² = 0.98) indicates that the developed network could be used for testing data in the subsequent analysis. The correlation between the predicted and target values is highly significant. The average testing accuracy (R² = 0.90) indicates that the developed network is efficient and feasible. The error statistics evaluated for the developed ANN models are highly consistent between the training and test data of each output, suggesting a lack of overfitting throughout the training process (Table 3). A key advantage of the ANN model is that it is not necessary to specify a fitting function in advance; it can therefore approximate practically all types of nonlinear functions, which helps in developing accurate prediction models. Based on the high accuracy of the predicted data in both the training and testing processes, the neural networks could predict the biochemical and pasting parameters, which is fundamental for the classification and analysis of rice quality. These achievements are supported by modeling studies previously performed in different research areas that have also indicated higher accuracy of the ANN modeling technique than of regression modeling [55,56].
However, these promising results, which are in line with many correlation studies that have been conducted on the relationships among starch quality parameters, can be affected by the wide diversity of rice germplasm and the complexity of the inheritance of quality parameters [57].
As an effective comparison of the prediction models for the parameters, the observed values were plotted against the predicted values, together with the regression parameters between observed and predicted values, as represented in Table 3 and Figures 3 and 4.
Table 2. Parameters of the ANN models for training, validation, and testing procedures for the biochemical and pasting parameters based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm. The transfer function tansig was used throughout the model development. Peak viscosity (PV); setback (SB); breakdown (BD); trough (TR); final viscosity (FV); starch (ST); protein (P); fat (FA); ash (AS); amylose (AMY).
In this study, the MLR and ANN modeling methods were applied for monitoring rice quality using the experimental data recorded during the study. Figures 3 and 4 show the experimental and predicted values related to the biochemical and pasting ANN models. The ANN models were the most efficient, and the regression line between the observed and the predicted values nearly overlapped the 1:1 line, which was the case for both the calibration and the validation sets. High R² and low RMSE values showed that the ANN models present promising potential to improve the estimation of different biochemical and pasting parameters, being especially able to cope with nonlinearity in the dataset. Furthermore, although ANN models are unable to identify sensible bands due to the nature of the method, they resulted in generally higher R² values and lower RMSE values than linear regression models. This implies that the relationship between biochemical and pasting parameters and biometric properties may indeed be nonlinear. Based on these models, this study showed that the ANN algorithm was an efficient method for biochemical and pasting prediction based on milling yields and grain appearance parameters.

Conclusions
The ANN algorithms tested in the development of prediction models for rice biochemical and pasting parameters based on grain physical data are characterized by significant regression coefficients. These achievements can be considered an added value for rice quality improvement in breeding and processing, being suitable for qualitative and quantitative measurement of different physicochemical features of rice. In the future, based on these promising results, we intend to develop a robust prediction model for several parameters based on a large number of rice varieties from different countries and, consequently, to implement an automatic evaluation system for different pasting and biochemical parameters, reducing costs associated with several time-consuming experimental procedures.