Spectral and Hyperspectral Technologies as an Additional Tool to Increase Information on Quality and Origin of Horticultural Crops

Abstract: Consumer awareness of how the site of origin and the method of production affect the quality and safety of foods, particularly fresh produce, is driving research towards techniques that can assist existing certification, traceability, and audit procedures. With regard to horticultural produce, consumer preferences have shifted toward fruit and vegetables that are healthy and ecologically produced, and toward processed foods that carry sustainability or social certifications and clearly report the site of origin on the label. Some recent studies demonstrate the potential of near infrared (NIR) technology (including hyperspectral imaging) for discriminating fresh and processed horticultural products based on their composition, quality attributes, and origin. These studies principally mention that each biological tissue possesses a fingerprint NIR spectrum, which consists of a unique and characteristic pattern of radiation, distinguishing a particular biological tissue from physically and/or chemically different samples. In particular, recent studies discriminated apples, wine, wheat kernels, and derived flours based on their geographical origins. Spectral information allowed discrimination among growing methods (organic and conventional) for asparagus and strawberry fruits, and among harvest dates for fennels, table grapes, and artichokes. Moreover, information about freshness and storage days after minimal processing can be obtained. Recent literature and original results will be discussed.
From our perspective, present results suggest that these techniques may have the potential to increase information about product history, but only if the variability captured by the classification models is broad, covering diverse samples from various cultivars, varieties, harvest times, cultural practices, geographical origins, storage conditions, and maturity stages, and only if they are used as a complement to the conventional methods, either to make an initial screening of critical features or to add to the amount of available information. Without these sources of variation, a model may still yield good classification results, but it would remain doubtful which factor is actually responsible for the correct classification.


Introduction
Awareness and concerns of modern food consumers regarding food fraud, safety scandals, and globalized food production have oriented research in the food industry toward developing standards for effectively interconnecting the systems of food production and distribution [1]. Food fraud and adulteration are becoming highly sophisticated, increasing the need for high-standard controls on a growing number of samples. As for fresh fruit and vegetables, product authenticity is the major concern, both for consumers and for processors, who are concerned about unfair competition in

NIRS Technique and Application for Fruit and Vegetable Authentication
Near infrared spectroscopy (NIRS) has gained wide attention in the food sector because of its potential to capture spectral fingerprints of products, which arise from the interaction between light and the molecular structure of the food. Every product has a different fingerprint that sets it apart from the others [4]; this fingerprint results from the different pre-harvest factors, which also affect final quality and composition. Hyperspectral imaging devices are being used in food research to evaluate quality, class, authenticity, adulteration, or fraud in a rapid and non-destructive way [5][6][7]. Chemometric techniques, and particularly discriminant analysis, are the statistical tools used to analyze differences between samples or groups of samples with respect to several variables simultaneously. Soft independent modelling of class analogy (SIMCA) [8], partial least squares discriminant analysis (PLSDA) [9], artificial neural networks (ANN) [10], discriminant analysis (DA) [11], and support vector machines (SVM) [12] have been used to discriminate among fruit and vegetables belonging to different quality classes. Reference [13] discussed different class modelling methods for addressing the problems that can arise in food authentication studies, but spectral information and chemometric tools still need to be used in a state-of-the-art way for discrimination purposes.
The origin and production methods of foods have a significant impact on health, social, and environmental domains, and consumers have shown increasing awareness of these aspects. Protected designation of origin is, in fact, also very important for processed products such as wines [14][15][16][17], olive oil [18,19], flour [20], and tomato sauce [21], for which different studies have proved the potential of the method. As for other horticultural products and seeds, a wide range of products have been tested, from seeds such as Arabica coffee and grains up to fruits, as shown in Table 1. Green Arabica coffee seeds from two years, different genotypes, and four different cities of Paraná state (Brazil) were classified in two subsequent studies [22,23] using SVM and PLSDA, respectively, obtaining 100% and 94.4% correct classification of the prediction samples. In these works, no indication about cultural practices was provided, so it is not clear whether the differences found were due only to the location or also to the impact of different cultural inputs on quality and composition; the effect of location was, however, consistent over the different genotypes. In another study, NIRS was applied to determine the geographical origin of 240 wheat samples collected over two years from four different geographical locations, including the main wheat-producing counties and towns and taking the most common varieties in each province, therefore including many sources of variation [24]; the differences based on geographical location were evident in the classification models.

Table 1. Literature review of studies on discrimination of fresh produce based on geographical origin, production system, variety/cultivar, and harvest time.
Study aim | Technique | Main Results in Terms of Achievements (Potentiality, Model Parameters) | Citations
Classification Based on Geographical Origin
Discrimination of green Arabica coffee based on geographical origin in Brazil | NIR, FTIR/SVM | Sensitivity and specificity of 100% were achieved using the NIR-SVM approach, while FTIR-SVM yielded slightly lower performance | [22]
Discrimination of Arabica coffee based on 4 geographical origins from Brazil | NIR/PLSDA | 94.4% correct classification was achieved in the validation | [23]
Discrimination of wheat based on 4 different geographical origins in China | NIR/LDA, DPLS | Using DPLS, classification accuracies as high as 85%-92.5% were achieved | [24]
Discrimination of Fuji apples from 3 major geographical locations in China | NIR/SVM | 92.75% classification rate in the training set and 89.86% in the prediction set. It was proposed that a combination of imaging and SVM classifiers can be a potential way for geographical separation | [25]
Classification of persimmon fruit origin using NIRS from 7 different regions | LS-SVM | The samples were clearly distinguished using the OSC data and SVM, obtaining an R² of 1.00 in training and 0.99 in prediction | [26]
Classification Based on Production System
Discrimination of green organic and conventional asparagus (one conventional and one organic) | NIR/PLSDA | Three NIR devices were compared for classification, and the accuracy ranged between 82.1%-91.3% in class-unbalanced sets and 83.7%-91.2% in class-balanced sets | [27]
Discrimination of strawberries by production system (one conventional and 2 organic) | | The production systems were defined with >95% sensitivity and >94% specificity, which supports the potential of the technique for classification purposes | [28]
Discrimination of organic potatoes from non-organic potatoes and sweet potatoes (5 conventional and one organic) | HSI/PLSDA | Using PLSDA for the classification of organic potatoes, an accuracy of 100% was achieved | [29]
Classification Based on Variety/Cultivar
Discrimination of grapes from 2 varieties | NIR/DA | The classification accuracy ranged between 82.7%-96.2% | [30]
Discrimination of strawberries from 5 varieties | NIR/PLSDA | The classification yielded an accuracy ranging between 57%-78%. There is still scope for improvement in varietal discrimination | [31]
Discrimination of 2 cherry tomatoes varieties | | |
Classification Based on Harvest Time
Discrimination of apricots based on 4 harvest times | NIR/SIMCA | The mean classification rate using SIMCA was 87% | [43]
Discrimination of white asparagus based on 3 harvest times | NIR/PCA, DA | 71% of the asparagus samples were classified correctly in this study, the base of the asparagus being the best part for harvest date discrimination | [44]

Hyperspectral imaging has proved to be a useful tool for the discrimination of 'Fuji' apples from three major production regions of China [25]; 207 apples from these regions were analyzed, with about 60-70 samples per location. K-nearest neighbor (KNN), partial least squares discriminant analysis (PLSDA), and moving window partial least squares discriminant analysis (MW-PLSDA) were compared in another study [41] for the discrimination of apples from four different geographical locations, with classification accuracy as high as 98.61%; MW-PLSDA was recommended as the most suitable method for this purpose. The apples for this experiment were obtained from six local markets, for a total of 500 samples of four varieties (200 'Fuji', 200 'New Jonagold', 50 'Red Star', and 50 'Ralls Janet'). The 'Fuji' samples were collected from four geographical origins (Japan, and Shanxi, Shandong, and Hebei in China), with 50 fruits per location, and all fruit were first grade.
The 'New Jonagold' samples from Shandong in China comprised 50 special grade, 50 first grade, 50 second grade, and 50 substandard grade samples. In this study, including samples from different varieties, locations, and grades greatly enhanced model performance and reliability, but it would be interesting to investigate whether these classification models can cope with the effects of different production systems in future samples. Reference [26] studied persimmon fruit from seven different geographical locations in Spain and used LS-SVM as a classification algorithm for 166 samples taken over two years. Among these samples, 122 were produced under the Protected Designation of Origin 'Ribera del Xúquer', while only a few samples (from 4 to 13) represented each of the other classes; neither the distribution of samples over the two years nor the cultural factors were specified, and these might have contributed to the discrimination results (classification result of 1.00 and 0.99 in the training and prediction models, respectively).
As for production systems, [27] investigated the potential of NIRS for discriminating green asparagus grown under organic and conventional methods, comparing three different spectrophotometers. For the study, 300 asparagus spears (180 conventional and 120 organic) were tested, harvested in two different months but apparently from the same plot. The classification results showed very good performance (91% correct classification) using the diode array instrument, but in the absence of replication of the growing systems, it is difficult to establish how reliable the method is for this purpose. Organic standards define the allowed practices and technical tools, but their application may vary widely depending on the producer. A more recent study developed a classification model for the authentication of strawberries [28] from one conventional and two organic production systems, with different nutrient inputs and controlled growth conditions. The results confirmed, and even improved on, the accuracy reported in previous studies, but in this case too only one experimental field per growing condition was used. Nonetheless, the study showed that the method could detect differences among organic fruits obtained by varying only the fertilization management.
Finally, these techniques are of interest for discriminating minimally processed products, for which the morphological traits that characterize the genotype are lost. Reference [29] investigated the reliability of hyperspectral imaging combined with multivariate techniques for distinguishing sliced organic potatoes from non-organic tubers, with samples taken from four different geographical locations. Organic potatoes were discriminated from conventional tubers with a classification accuracy of 100%.
Moreover, the production system is also relevant for wine derived from organically grown grapes. Mid-infrared (MIR) spectroscopy and discriminant analysis have been used to classify Australian organic and non-organic wines from 13 different regions [11]. More than 85% of the wines were correctly assigned to their respective organic or non-organic class. For wine grapes, variety certification is also highly significant. Two varieties (red and white) of wine grapes grown in the same geographical location were classified based on type and irrigation regime [30] with a miniature fiber optic NIR spectrometer, obtaining accuracy higher than 82%. In this case, a total of 55 samples were used, of which 23 were white grapes and 31 were red grapes grown under two different irrigation regimes. The overall rate of correct PLSDA classification based on the irrigation regime during ripening was 82.69%: 53.85% of the grapes from the regulated deficit irrigation (RDI) regime were correctly classified, against a non-error rate of 92.31% for the rain-fed regime. The study concluded that the higher rate of incorrect classification for the RDI grapes was due to the lower number of samples compared with the other class. Discrimination of five strawberry varieties was less satisfactory: [31] used 300 strawberries from five varieties (60 samples per variety) grown in the same geographical location, and the classification rate varied from 57% for the 'Camarosa' variety to 78% for 'Antilla Fnm'. In this case, all the samples belonged to the same geographical origin and the same ripeness stage, so it may be important to enlarge the dataset to include other geographical locations.
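The gap between the overall rate and the per-class rates reported for the wine grapes in [30] can be checked with simple arithmetic. The sketch below uses hypothetical counts (13 RDI and 39 rain-fed samples in the prediction set). These counts are an assumption, chosen only because they reproduce the reported percentages; they are not taken from the study. The balanced accuracy, which weights both classes equally, is added for comparison.

```python
def non_error_rate(correct, total):
    """Fraction of samples of one class that were assigned to that class."""
    return correct / total


# Hypothetical counts (assumption): chosen to reproduce the reported percentages.
rdi_correct, rdi_total = 7, 13
rainfed_correct, rainfed_total = 36, 39

rdi_rate = non_error_rate(rdi_correct, rdi_total)
rainfed_rate = non_error_rate(rainfed_correct, rainfed_total)

# The overall rate pools all samples, so the larger class dominates it.
overall_rate = (rdi_correct + rainfed_correct) / (rdi_total + rainfed_total)

# The balanced accuracy averages the per-class rates instead.
balanced_accuracy = (rdi_rate + rainfed_rate) / 2

print(f"RDI non-error rate:      {100 * rdi_rate:.2f}%")
print(f"Rain-fed non-error rate: {100 * rainfed_rate:.2f}%")
print(f"Overall rate:            {100 * overall_rate:.2f}%")
print(f"Balanced accuracy:       {100 * balanced_accuracy:.2f}%")
```

Under these assumed counts the overall figure is pulled toward the rain-fed class, which is three times larger, while the balanced accuracy is about ten percentage points lower. This is one reason why per-class rates are worth reporting alongside the overall rate when classes are unbalanced.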
Classification accuracy was higher in a study comparing 11 tomato cultivars, for which the seeds were acquired from the same geographical origin and the plants were grown under similar conditions and fertilization routines to minimize variation due to seasons or growing conditions. In this case, the classification modelling approach yielded 96% and 86% for the high and low sensitivity [33]. Here the classification accuracy can be reliably attributed to cultivar discrimination. Four Chinese bayberry varieties were successfully classified using a PCA-ANN model with a classification accuracy of 95% [39]. Three different classification algorithms were used to classify three orange varieties [40], with a classification accuracy of 95% for each algorithm; LGR, in particular, was reported to reach a classification accuracy of 100%. Two hundred apples belonging to four different varieties and collected from four different geographical locations (50 samples of each variety and geographical location) were discriminated using KNN, PLSDA, and MW-PLSDA; classification accuracies were higher for MW-PLSDA (98.08%) and PLSDA (96.15%), while the KNN model yielded a lower classification accuracy [41]. Even better results were obtained for three hazelnut cultivars [38], five plum varieties [34], and four varieties of apricots [35], pears [36], and apples [37], where 100% correct classification was achieved.

Harvest time significantly influences the quality of fresh produce and is an important factor in postharvest processing and marketing. Therefore, determining the optimal harvest time contributes to better quality, prolonged shelf life, increased profitability, and enhanced consumer satisfaction. Many non-destructive techniques have been used in various studies to classify produce based on harvest time.
Classification models worked satisfactorily for grapes, which are among the most widely consumed fruits in the world and for which the decision on the correct harvest time is critical. The authors of [8] obtained 100% correct classification of samples from five harvest times. Moreover, a recent study [42] showed that, by analyzing spectral changes over time during on-vine holding of table grapes, it was possible to monitor ripening and to correctly classify grapes by harvest time using only 14 wavelengths. These findings encourage further implementation of this method to monitor the ripening of table grapes in the vineyard and to better define the most suitable harvest time. Harvest-time-based classification models were also developed for apricots [43]. Additionally, [9] classified fennels based on harvest time using hyperspectral imaging. The fennels were harvested from the same field and production system at seven different times over a span of three weeks. The results showed that almost all classes were correctly classified (non-error rate of 88.75% in the prediction set). White asparagus was classified according to harvest time [44]; the product was harvested over two years (2003-2004) in nine lots, yielding 71% classification accuracy. Amodio et al. (submitted) classified two cultivars of artichokes, 'Catanese di Brindisi' and 'Violetto Foggiano', from two different geographical locations (Brindisi and Foggia in the Apulia region of Italy) using hyperspectral imaging in the range of 400-1000 nm. 'Catanese' samples were collected from January to April (four harvest times), while 'Violetto Foggiano' samples were collected from December to May (six harvest times), which also allowed classification based on harvest time.
In this case as well, the effects of location and of different cultural practices may have contributed to differentiating the two varieties; therefore, wider sampling over different producers and locations would be advisable for more reliable results.

Method Potentiality and Conclusions
Most users of fresh produce (consumers, distributors, etc.) are interested in simple answers to their concerns about the quality or origin of their products, in the form of good/bad sample or yes/no for queries such as geographical origin. As has been shown, spectral information acquired with different techniques and devices, combined with chemometric and multivariate methods, is able to discriminate among different crops according to product history. This is because every biological sample has a fingerprint NIR spectrum that distinguishes it from other biological tissues. On the other hand, depending on the experimental design, each sample can differ significantly from the others, and the actual focus should be on diagnosing and assigning the reason for these differences. All supervised classification methods, such as partial least squares discriminant analysis, are in fact designed to maximize differences among samples, so they work very well on laboratory-size experiments. Very often, because of the small number of cases analyzed in each experiment and the lack of real replication (in different fields, locations, or years, as is done for instance in variety screening and farming system trials), it is difficult to guarantee that differences in the spectral fingerprints are due to a particular factor of interest, and therefore the reliability of the model cannot be ensured with complete confidence. The majority of the presented studies in fact compare different varieties (usually grown in different locations, with different agricultural practices), different farming systems (usually under similar environmental conditions, but without farming replication), and different geographical origins (in this case too corresponding to different agricultural practices), so the differences found are sometimes the result of several factors.
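The confounding problem described above can be illustrated with a small simulation. The sketch below is deliberately contrived and rests entirely on assumptions: variety has no spectral effect at all, each location adds a fixed offset to a five-band "spectrum", each variety is grown in only two locations, and a 1-nearest-neighbour classifier stands in for the chemometric models discussed in this review. The location names, offsets, and noise level are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical design: variety has NO spectral effect; only location shifts the
# signal. Each variety is grown in two locations, so the factors are confounded.
LOCATIONS = {  # location -> (variety, offset added to every band)
    "loc1": ("A", 0.0),
    "loc2": ("A", 3.0),
    "loc3": ("B", 1.0),
    "loc4": ("B", 4.0),
}
N_PER_LOCATION = 20
N_BANDS = 5

samples = []  # each entry: (spectrum, variety, location)
for loc, (variety, offset) in LOCATIONS.items():
    for _ in range(N_PER_LOCATION):
        spectrum = [offset + random.gauss(0.0, 0.1) for _ in range(N_BANDS)]
        samples.append((spectrum, variety, loc))


def predict_1nn(train, spectrum):
    """Return the variety of the nearest training spectrum (Euclidean distance)."""
    nearest = min(train, key=lambda s: math.dist(s[0], spectrum))
    return nearest[1]


def accuracy(folds):
    """Pooled 1-NN accuracy over a list of (train, test) pairs."""
    correct = total = 0
    for train, test in folds:
        for spectrum, variety, _ in test:
            correct += predict_1nn(train, spectrum) == variety
            total += 1
    return correct / total


# Random 5-fold split: every location appears in both training and test data.
shuffled = samples[:]
random.shuffle(shuffled)
random_folds = [
    ([s for i, s in enumerate(shuffled) if i % 5 != k],
     [s for i, s in enumerate(shuffled) if i % 5 == k])
    for k in range(5)
]

# Leave-one-location-out: the test location is never seen during training.
grouped_folds = [
    ([s for s in samples if s[2] != loc], [s for s in samples if s[2] == loc])
    for loc in LOCATIONS
]

random_acc = accuracy(random_folds)
grouped_acc = accuracy(grouped_folds)
print(f"random-split accuracy:           {random_acc:.2f}")
print(f"leave-one-location-out accuracy: {grouped_acc:.2f}")
```

With a random split the model appears to recognize the variety almost perfectly, although by construction it can only be recognizing the location. When a whole location is held out the apparent skill disappears, and with these particular offsets the predictions are even systematically wrong. Real data would rarely be this extreme, but the comparison shows why validation on independent locations, years, or producers is needed before a difference is attributed to the factor of interest.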
If the classification modelling is aimed at varietal discrimination, it is important to know that the variety itself is being discriminated, without any external influence of other factors such as origin or farming system, or a combination of these. This is a recurrent problem within agricultural studies themselves, where differences in composition and organoleptic traits are often attributed to varieties without considering that different growing conditions may have affected the quality of the crop. Addressing this problem is imperative, and a potentially reliable option is to enlarge the variability of the experimental design, which might reveal changes in the spectra that are still detectable after subtracting a large part of the variation due to different locations, years, and farming inputs. In addition, different chemometric methods can be applied to assign part of the variance to the different factors. Robust discrimination results over a large dataset will increase the chances of correct and reliable classification of new unknown samples (test sets). Samples that do not belong to the conditions included in the calibration models can still behave like the main classifying group, provided that this group is very large. Given these considerations, the state of the art is still far from a full assessment of the potential of the methods, and new case studies should be designed to address these issues. Once the screening capacity has been properly assessed, the implementation of these methods should be considered. It is very unlikely that spectral techniques will replace routine analysis, but in our opinion they can be a valuable tool to complement the official procedures, allowing a preliminary screening of an unlimited number of products to help identify samples lying outside the borders (clouds) of what is expected for that sample (outliers).
In addition, another possible perspective is that these methods will make it possible to add information about the product history. Because NIR devices can also predict internal constituents, it is in fact desirable to develop NIR sensor devices to be placed at the retail level, which can help consumers make their choices by providing information about product quality and origin.
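The screening role proposed above, flagging samples that fall outside the cloud expected for a declared class, can be sketched with a very simple distance check. This is a simplification for illustration and not SIMCA or any method used in the cited studies: the spectra are synthetic, the class model is just a per-band mean and standard deviation, and the threshold of 3.0 is an assumed cut-off.

```python
import math
import random
import statistics

random.seed(1)
N_BANDS = 8


def simulate(offset, n):
    """Hypothetical spectra: a flat baseline at `offset` plus Gaussian noise."""
    return [[offset + random.gauss(0.0, 0.05) for _ in range(N_BANDS)]
            for _ in range(n)]


# Reference samples known to belong to the expected class.
reference = simulate(1.0, 50)

# Class model: per-band mean and standard deviation of the reference spectra.
means = [statistics.mean(band) for band in zip(*reference)]
stdevs = [statistics.stdev(band) for band in zip(*reference)]


def score(spectrum):
    """Root-mean-square z-score of a spectrum relative to the reference class."""
    z = [(x - m) / s for x, m, s in zip(spectrum, means, stdevs)]
    return math.sqrt(sum(v * v for v in z) / len(z))


THRESHOLD = 3.0  # assumed cut-off; in practice tuned on validation data


def is_outlier(spectrum):
    """Flag a spectrum whose distance from the class cloud exceeds the cut-off."""
    return score(spectrum) > THRESHOLD


typical = simulate(1.0, 1)[0]    # drawn from the same population as the reference
atypical = simulate(1.5, 1)[0]   # shifted baseline, e.g. a different product history

print("typical flagged: ", is_outlier(typical))
print("atypical flagged:", is_outlier(atypical))
```

A flagged sample would not be rejected on this basis alone; in line with the complementary role discussed above, it would be passed on to the official analytical procedures for confirmation.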
Author Contributions: M.L.A. and G.C. provided the manuscript outline and made a substantial contribution to drafting and correcting the manuscript; M.M.A.C. made a substantial contribution to the first draft and to correcting the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.