Fast and Reliable Determination of Virgin Olive Oil Quality by Fruit Inspection Using Computer Vision

The presence of minor compounds in virgin olive oils has been shown to play multiple positive roles in health protection, which encourages its production. The key factors influencing oil quality are the ripening stage and the health of the fruit. For this reason, at the oil mill's reception yard, fruits are visually inspected and separated according to their external appearance, so that the process parameters can be adjusted to improve the quantity and/or quality of the olive oil. This paper proposes a method to automatically determine the quality of the oil before it is produced, by inspecting the incoming fruits. Expert assessment of the fruit condition guided the image processing. The proposal was validated through the analysis of 74 batches of olives from an oil mill. The best correlations between the image processing and the analytical data were found for the acidity index, peroxide value, ethyl esters, polyphenols, chlorophylls, and carotenoids.


Introduction
Owing to its healthy composition and high commercial value, the Virgin Olive Oil (VOO) market is currently expanding [1]. Emerging countries are developing their own olive industries, increasing competition with the traditional Mediterranean producers [2]. In this new scenario, widening the range of products is an opportunity to increase profits by opening up new markets.
Existing legislation [3] recognizes only three VOO commercial categories (from highest to lowest quality): Extra Virgin Olive Oil (EVOO), Virgin Olive Oil (VOO), and Lampante Virgin Olive Oil (LVOO). These categories are defined mainly by degradation parameters such as the acidity index, peroxide value, ethyl esters, K232 and K270, whose limit values become progressively less strict from one category to the next. When the values reach the Lampante thresholds, the oil is not fit for consumption and must be refined first.
Olive oil is extracted in the oil mill exclusively by mechanical means, which is why the incoming fruits play a major role in the quality of the oil. Mechanical extraction preserves minor compounds that give the oil its taste, aroma, and colour. These holistic features are a potential point of distinction for commercial brands [4].
Minor compounds comprise two main groups, polyphenols and pigments, which are responsible for relevant health properties and also extend shelf life [5]. Polyphenols are the main group, acting as antioxidant and antiradical agents [6]. These properties have been recognized in a specific health regulation approved by the European Commission [7], owing to the contribution of polyphenols to protecting blood lipids from oxidative damage. In this regard, a recent review [8] proposes extending the types of polyphenols considered in the EC 2012 claim. The other group, pigments, which are related to oil colour [9], protect against degenerative diseases. Green tones are associated with chlorophyll content, while yellow tones are related to carotenoids [10]. Both pigments act as antioxidants or prooxidants, depending on the storage conditions [11].
The quantity of these minor compounds is influenced by the olive variety, fruit ripeness and environmental conditions [6]. Over the whole course of fruit development, the polyphenol content increases with ripeness until it reaches a maximum at the fully ripe stage, and then decreases in overripe stages. Chlorophylls and carotenoids behave similarly to each other but differently from polyphenols: both reach their maximum at the unripe stage, decrease as the fruits ripen, and almost disappear at the overripe stage. The loss of carotenoids, however, is less severe than that of chlorophyll [12].
Overripe fruits have purple or black skin and lack turgor. At this stage they are vulnerable to harvest damage and to falling from the tree (windfall), with subsequent contamination from the ground, which increases the risk of spoiling the oil. Overripe stages are not strictly related to low-quality oil, although careful handling is required to avoid fruit damage during harvest and storage [13].
The ripening stage and quality of the fruits are assessed manually at the reception yard. This inspection has two objectives. First, producing homogeneous oil lots requires homogeneous incoming fruit. Second, damaged fruits should be rejected or stored separately for further processing. The assessment is normally done by a trained operator; it is subjective, time-consuming, highly influenced by lighting conditions, and prone to errors.
For these reasons, automating this procedure is highly recommended. Computer vision has proven to be a feasible technology for accurate external fruit inspection [14]. Riquelme et al. [15] identified different types of olive skin damage, mainly at the unripe stage, using colour and texture features. Further research by Ram et al. [16] incorporated shape, colour and texture features to determine the fruit oil content; it was successfully applied to sets of ten fruits of a single variety. Furferi et al. [17] established a correlation between fruit ripeness, obtained by computer vision, and the polyphenol, oil and sugar content of fruits from different varieties. Guzmán et al. [18] established different levels of external damage in fruit lots imaged with an infrared sensor. In Cáceres et al. [19], the oil acidity index and peroxide value were accurately correlated with colour and texture features by an Artificial Neural Network. In [20], fruits collected from the ground and from the trees, of three different varieties, were separated by processing the skin texture.
In the same context, the mass and size of olive fruits were estimated in [21] by analysing correlations between fruit images and reference weight measurements. Other non-invasive technologies, such as electronic noses [22] and hyperspectral imaging [23], have been applied to food inspection, and to remote sensing for monitoring environmental resources [24,25]. Computer vision techniques have also been applied to measuring the main variables of olive tree architecture [26] and to oil spill surveillance [27]. Recently, Benalia et al. [28] addressed the automatic classification of individual olives according to a standard index [29]. Despite the laborious nature of this task, class determination was successfully performed using principal component analysis in the CIELAB colour space, with the best results at green stages.
The aim of this study is to determine olive oil quality parameters by processing images of olive fruits, guided by expert knowledge. More specifically, an expert carried out an exhaustive evaluation of the fruit condition, which was correlated with analytical data on minor compounds and degradation parameters. This information was incorporated into the image processing to determine the oil quality quickly and reliably. The results of this work will be of great interest to olive oil producers: an estimate of the oil quality before production will allow the production parameters to be adjusted more precisely.
The rest of the paper is organized as follows. Section 2 details the proposed methodology presenting the results of the visual assessment, the hardware set-up for the image acquisition and the image processing algorithms. Section 3 shows the experimental validation of the proposal and a discussion of the results. Finally, the conclusions are drawn in Section 4.

Material and Methods
To evaluate the link between the appearance of olive fruits and the quality of the extracted oil, the methodology shown in Figure 1 was applied. Olive batches from various producers were sampled on arrival at the olive oil mill. Images of these batches were acquired, and a representative group of olives from each batch was selected for visual assessment of their physical appearance. All the olive batches were then processed into oil, which was chemically analysed. Statistical comparisons between the physical appearance of the fruits and the quality parameters of the oil enabled us to establish significant relationships between human visual perception and the quality parameters obtained by analytical methods. The criteria used in the visual inspection were therefore employed to select and extract features from the acquired images. These features served as the basis for predictive models, built with Partial Least Squares (PLS) regression, that automatically determine the olive oil quality parameters.

Olive Batches Preparation and Oil Extraction
From the end of November 2017 until January 2018, a total of 84 olive batches were collected weekly from an Olive Oil Cooperative, and all of them were used by the expert in the visual assessment. The batches were randomly collected after the cleaning step and before hopper storage at the mill yard. Each batch consisted of 3 kg of olives and was shipped to the facilities of the Group of Robotics, Automation and Computer Vision (GRAV) at the University of Jaén to be processed within 12 h. A representative group of 100 fruits was selected from each batch to assess its ripeness and health state. Additionally, two groups of approximately 250 g were used to acquire the corresponding images and were then returned to their batch. Finally, each olive batch was processed to extract its virgin olive oil (VOO).
The VOO was extracted with a laboratory-scale system that reproduces the industrial extraction process, commercially known as Abencor, consisting of a hammer mill, a thermomixer and a centrifuge. All batches were processed with the same process variables: the olive paste was kneaded for 40 min at 30 °C and centrifuged at 3000 rpm for 90 s. The extracted oil was stored in 125 mL dark glass bottles (hereafter referred to as samples). Samples were identified by their batch number and delivered to the accredited chemical laboratory for analysis.
The chemical analyses were performed in triplicate following standard methodologies at the accredited laboratory CM Europa S.L. The acidity index, peroxide value, K270 and K232 were analysed according to EU REG. 2568/91. Ethyl esters were determined following EU REG. 2569/91. Polyphenols were determined according to COI/T.20/Doc No 29. Finally, pigments were measured by spectrophotometry according to Minguez-Mosquera et al. [9].

Visual Assessment
Three ripening groups were initially established according to fruit colour and turgor. Batches containing unripe fruits with a dominant green component were assigned to the first group, including those whose fruits were totally green or had more than 50% green skin. The second group consisted of ripe batches in which more than 50% of the fruits were purple or black, including fully dark fruits that still retained their turgor. Finally, the third group comprised overripe batches containing fully dark but tender fruits. This methodology does not use the eight official ripeness stages (COI/OH/Doc. No 1, 2011), because some of the indexes can only be evaluated by peeling the fruit, which is not applicable to a non-invasive system such as the one proposed here.
Each batch was also evaluated for its general appearance (health, cleanness or damage), which could influence the final oil quality. After these evaluations, the second group was subdivided into two. The first subgroup comprised batches with a higher percentage of ripe fruits in optimal health. The second subgroup included batches with a high number of ripe but spoiled fruits. Considering both fruit ripeness and external appearance, the four categories shown in Figure 2 were therefore defined. The olive oil quality parameters for these categories are presented in Table 1. The acidity index was influenced less by the ripening stage than by fruit health, as revealed by the significant differences between Cat-2 and Cat-3. The peroxide value followed a similar trend. In contrast, the ultraviolet absorbance parameter K232, which indicates primary oxidation products, was significantly different between Cat-1 and the other categories. Owing to the quick processing for oil extraction, the parameter K270, which indicates secondary oxidation products, did not vary significantly among categories; if the analysis were repeated after a period of storage, this parameter could show substantial differences. The last parameter related to fruit degradation, ethyl esters, depended strongly on fruit health. Polyphenol content, however, was higher in Cat-2 and tended to decrease in Cat-3, consistent with the poorer health of those fruits. Significant differences in polyphenols were also found between Cat-1 and Cat-2, because these compounds are still being generated as fruits progress from the unripe to the ripe stage.
As for the chlorophyll and carotenoid pigments, the first two categories presented higher levels of these compounds. The ripening process involves losses of chlorophylls and carotenoids, starting in the skin and then spreading into the pulp. Thus, chlorophylls and carotenoids still remain even in the early black-fruit stages.
These statistical results were useful to justify the extraction of colour and texture features from the acquired images in order to develop predictive models to estimate the aforementioned olive oil chemical parameters.
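The category assignment described in this subsection can be sketched as a simple rule-based function. This is a hypothetical illustration, not the expert's actual procedure: the `BatchSummary` fields, the 0.5 limits for the tender and spoiled fractions, and the mapping of Cat-3 to ripe-but-spoiled and Cat-4 to overripe batches are assumptions inferred from the discussion of Table 1.

```python
from dataclasses import dataclass


@dataclass
class BatchSummary:
    """Hypothetical per-batch fractions (0 to 1) counted on the expert's 100-fruit sample."""
    green: float    # fruits fully green or with more than 50% green skin
    dark: float     # fruits with purple or black skin
    tender: float   # fully dark fruits that have lost turgor
    spoiled: float  # fruits judged damaged or unhealthy


def categorize(b: BatchSummary, overripe_limit: float = 0.5, spoiled_limit: float = 0.5) -> str:
    """Assign a batch to one of the four visual categories (assumed mapping and limits)."""
    if b.green > 0.5:
        return "Cat-1"        # unripe: green fruits dominate
    if b.dark <= 0.5:
        return "unassigned"   # neither green nor dark fruits dominate; left to the expert
    if b.tender > overripe_limit:
        return "Cat-4"        # overripe: dark and tender (assumed label)
    if b.spoiled > spoiled_limit:
        return "Cat-3"        # ripe but spoiled (assumed label)
    return "Cat-2"            # ripe and healthy


print(categorize(BatchSummary(green=0.1, dark=0.9, tender=0.1, spoiled=0.05)))  # Cat-2
```

Checking ripeness before health mirrors the order in which the groups were defined: the health criterion was only used to split the ripe group.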

Image Processing
The olive fruit images were acquired with an ad hoc hardware set-up. Olive batches were placed in a rectangular methacrylate tray (250 × 165 × 20 mm) with a white background. Images were acquired with a MAKO G-223C CMOS colour camera with 2048 × 1088 resolution and 5.5 µm pixel size, fitted with a 25 mm lens and positioned at a distance of 600 mm. Under these conditions, the camera field of view was 270 × 143 mm, covering a large part of the tray.
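The reported field of view can be checked with the pinhole-camera approximation, in which the object-side extent equals the sensor extent multiplied by the working distance and divided by the focal length. This sketch assumes the 600 mm distance is measured to the lens and ignores lens distortion; it also gives the approximate size of one pixel on the tray.

```python
def field_of_view_mm(pixels: int, pixel_size_um: float, focal_mm: float, distance_mm: float) -> float:
    """Pinhole approximation: object-side extent = sensor extent * distance / focal length."""
    sensor_mm = pixels * pixel_size_um / 1000.0  # sensor extent along this axis, in mm
    return sensor_mm * distance_mm / focal_mm


width = field_of_view_mm(2048, 5.5, 25.0, 600.0)   # horizontal extent on the tray
height = field_of_view_mm(1088, 5.5, 25.0, 600.0)  # vertical extent on the tray
print(f"{width:.1f} x {height:.1f} mm, {width / 2048:.3f} mm/pixel")
# 270.3 x 143.6 mm, 0.132 mm/pixel
```

The result agrees with the 270 × 143 mm reported above; since the tray is 165 mm deep, the 143 mm vertical extent explains why only a large part of the tray, not all of it, is imaged.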
The lighting system consisted of one 125 W halogen lamp placed on the camera's optical axis. This high-luminance lighting allows the camera to run at its maximum frame rate of 50 fps. Because of technical problems with the lighting system, the batches sampled on 21 November were discarded, leaving a total of 74 batches for the image processing.
Once acquired, the images were preprocessed to remove the background, and different features were extracted from the resulting images. These features were correlated with the chemical parameters to obtain the predictive models. Both steps are detailed in the following subsections.

Features Extraction
The main goal of this step was to extract the useful information from the acquired images. First, the original images in the RGB colour space (A_{r,g,b}) were converted to grey-level images and then binarized. To separate the olives from the white background, a global threshold binarization algorithm was used, with the threshold fixed heuristically. The inverse of the resulting logical mask (M) was applied to the original images through the logical AND operator (Equation (1)). The result of each step can be seen in Figure 3. Colour and texture features can then be extracted from the images. The masked RGB images I_{r,g,b} were converted to the HSV (I_{h,s,v}) and Lab (I_{l,a,b}) colour spaces. Then, the average grey level of each channel was computed by Equation (2).

$$ I_{r,g,b} = A_{r,g,b} \land \overline{M} \tag{1} $$

$$ \bar{I}_c = \frac{1}{N \times M} \sum_{u=1}^{N} \sum_{w=1}^{M} I_c(u, w) \tag{2} $$

where Ī_c is the mean intensity of channel c and N × M is the number of pixels in the image. The texture features were extracted from the images using the Haralick descriptors [30]: angular second moment, contrast, correlation, variance, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, information measures of correlation and maximal correlation coefficient. These texture features, together with the colour features, were used to build the feature matrix X_{i,j}, where each row i corresponds to a batch and each column j to a feature extracted from that batch (23 features in total). Finally, a vector Y_{i,n} was coded for each analytical parameter n.
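The preprocessing and feature extraction described above can be sketched with NumPy alone. This is a reduced, illustrative version under several assumptions: the grey level is the plain average of R, G and B; the threshold of 200 is arbitrary; HSV is computed with a hand-written conversion; Lab and ten of the fourteen Haralick descriptors are omitted; and the co-occurrence matrix uses a single horizontal offset and 8 grey levels over the whole masked image. Following Equation (2), channel means are taken over all N × M pixels, including the zeroed background.

```python
import numpy as np


def apply_mask(rgb, threshold=200.0):
    """Global-threshold binarisation; zero the bright background, keep the fruit (Eq. 1)."""
    grey = rgb.astype(float).mean(axis=2)   # grey level as plain RGB average (assumption)
    background = grey >= threshold           # logical mask M: True on the white background
    fruit = ~background                      # inverse of M selects the olives
    return rgb * fruit[..., None].astype(rgb.dtype), fruit


def rgb_to_hsv(rgb):
    """Vectorised RGB (0-255) to HSV, all channels scaled to [0, 1]."""
    x = rgb.astype(float) / 255.0
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    v = x.max(axis=-1)
    c = v - x.min(axis=-1)
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    cc = np.where(c > 0, c, 1.0)             # avoid division by zero on grey pixels
    h = np.where(v == r, ((g - b) / cc) % 6.0,
                 np.where(v == g, (b - r) / cc + 2.0, (r - g) / cc + 4.0))
    h = np.where(c > 0, h / 6.0, 0.0)
    return np.stack([h, s, v], axis=-1)


def channel_means(img):
    """Eq. (2): mean of each channel over all N x M pixels of the (masked) image."""
    return img.reshape(-1, img.shape[-1]).astype(float).mean(axis=0)


def glcm(grey, levels=8):
    """Normalised symmetric grey-level co-occurrence matrix for the horizontal neighbour."""
    q = np.clip((grey * levels / 256.0).astype(int), 0, levels - 1)
    p = np.zeros((levels, levels))
    np.add.at(p, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    p = p + p.T
    return p / p.sum()


def haralick_subset(p):
    """Four Haralick descriptors: ASM, contrast, entropy, inverse difference moment."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return np.array([
        (p ** 2).sum(),                      # angular second moment
        ((i - j) ** 2 * p).sum(),            # contrast
        -(nz * np.log2(nz)).sum(),           # entropy
        (p / (1.0 + (i - j) ** 2)).sum(),    # inverse difference moment
    ])


def batch_features(rgb, threshold=200.0):
    """One row of X: 3 RGB means, 3 HSV means, then 4 texture features."""
    masked, _ = apply_mask(rgb, threshold)
    colour = np.concatenate([channel_means(masked), channel_means(rgb_to_hsv(masked))])
    grey = masked.astype(float).mean(axis=2)
    return np.concatenate([colour, haralick_subset(glcm(grey))])
```

Stacking `batch_features` for every batch image row by row would give a reduced version of the matrix X_{i,j}; a full implementation would add the Lab means and the remaining Haralick descriptors.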

Regression Model
The regression model, based on Partial Least Squares (PLS) [31], was applied to correlate the image features with the analytical reference values. PLS models seek the multidimensional direction in the X_{i,j} space that explains the direction of maximum multidimensional variance in the Y_{i,n} space. The general model is given by Equations (3) and (4).

$$ X_{i,j} = T P^{\top} + E \tag{3} $$

$$ Y_{i,n} = U Q^{\top} + F \tag{4} $$

where T and U are the score matrices, P and Q the loading matrices, and E and F the error terms of X and Y, respectively. The number of selected latent variables, denoted by lv, is the number of columns of T, U, P and Q.
The PLS algorithm was applied to each analytical parameter following the methodology presented in Figure 4. It was an iterative process in which the number of latent variables used by the model was increased from 1 to 23, the maximum possible given the number of features. For each number of latent variables, the model was trained and validated 1000 times using 50/50 holdout validation. The prediction ability of the models was compared by means of the following parameters: Root Mean Square Error (RMSE) (Equation (5)), the regression coefficient (R²) (Equation (6)) and the Ratio of Performance to Deviation (RPD) (Equation (7)).
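The model-selection loop described above can be sketched as follows. The sketch implements single-response PLS (PLS1) with the NIPALS algorithm in NumPy, matching the one-model-per-parameter design, and keeps the number of latent variables with the lowest mean validation RMSE. The selection rule, the random seed and the absence of feature scaling are assumptions: the paper compares RMSE, R² and RPD without stating a single criterion.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1. Returns coefficients B and the centring means of X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # no covariance with y left to explain
            break
        w = w / norm
        t = Xk @ w                            # scores
        tt = t @ t
        p = Xk.T @ t / tt                     # X loadings
        c = (yk @ t) / tt                     # y loading
        Xk = Xk - np.outer(t, p)              # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)       # B = W (P^T W)^-1 q
    return B, x_mean, y_mean


def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean


def select_lv(X, y, max_lv=23, k=1000, seed=0):
    """Repeat a random 50/50 holdout k times per lv; return the lv with lowest mean RMSE."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mean_rmse = {}
    for lv in range(1, min(max_lv, X.shape[1]) + 1):
        errors = []
        for _ in range(k):
            idx = rng.permutation(n)
            train, test = idx[: n // 2], idx[n // 2:]
            model = pls1_fit(X[train], y[train], lv)
            residual = y[test] - pls1_predict(model, X[test])
            errors.append(np.sqrt(np.mean(residual ** 2)))
        mean_rmse[lv] = float(np.mean(errors))
    best_lv = min(mean_rmse, key=mean_rmse.get)
    return best_lv, mean_rmse
```

With 74 batches, each holdout trains on 37 and validates on 37; calling `select_lv(X, y_acidity)` with a 74 × 23 feature matrix would reproduce the structure, though not necessarily the numbers, of the reported procedure.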

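[Figure 4: PLS training and validation loop; number of PLS components from lv = 1 to lv = 23; k = 1000 iterations for each lv.]

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{5} $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{6} $$

$$ \mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}} \tag{7} $$

where N is the number of predicted samples, y_i is the target value, ŷ_i the predicted value, ȳ the average of the analytical reference values, and SD the standard deviation of the analytical reference values.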
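Equations (5) to (7) translate directly into code. In this sketch, SD is taken as the sample standard deviation (N − 1 denominator) of the reference values; this is an assumption, since some authors use the population form.

```python
import numpy as np


def rmse(y, y_hat):
    """Equation (5): root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    """Equation (6): 1 - residual sum of squares / total sum of squares about the mean."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def rpd(y, y_hat):
    """Equation (7): SD of the reference values over RMSE (sample SD, ddof=1, assumed)."""
    return float(np.std(np.asarray(y, dtype=float), ddof=1) / rmse(y, y_hat))


y_ref = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_ref, y_pred), 4), round(r_squared(y_ref, y_pred), 4), round(rpd(y_ref, y_pred), 3))
# 0.1581 0.98 8.165
```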

Results and Discussion
This section presents and discusses the results of this work. First, the olive batches used in this research are characterized through the analytical parameters of the oil samples. This characterization provides the basis for understanding the distribution of the analytical parameters used in the second part of this section, which presents the image processing results.

Laboratory Results for the Analytical Parameters
To establish the oil quality, the most relevant parameters were analysed in each oil sample. Table 2 shows the maximum, minimum, mean and standard deviation of the quality parameters for the eight sampling dates (acidity expressed as % oleic acid). The number of olive batches obtained to produce the oil samples varied according to the production capacity of the oil mill. Fruits for each batch were randomly taken from the conveyor belts in the reception yard. The random selection of batches covered different producers and locations, which implied differences in ripening stages and health conditions. The standard deviations of the analytical parameters show that the lack of homogeneity among batches was not related to the sampling date.
Table 2 provides additional information. In general, the acidity index, peroxide value, K270, K232 and ethyl esters, including ethyl palmitate and ethyl oleate, tend to rise at the last sampling dates. This is due to a prevalence of overripe fruits at the end of the season. Had the study been extended beyond January, more samples with higher values of these parameters would have been obtained.
In contrast, polyphenols, chlorophyll and carotenoids follow the usual decreasing trend, since these quality components are present in higher amounts at early ripening stages, which predominate at the beginning of the season. In this case, starting the study earlier would have yielded higher levels of these parameters.

Image Processing Results
As detailed in Section 2, the visual assessment of the olive fruits was correlated with the VOO quality parameters (Table 1). This analysis showed that the colour and condition of the fruits provide valuable information for determining the quality parameters.
This knowledge was used to design a computer vision methodology that automates the expert's task by extracting from the images the same features assessed in the manual process. Fruit maturity was analysed by processing the images in different colour spaces, and fruit health was characterized by computing texture features of the fruit skin. A PLS prediction model was then developed for each chemical parameter related to VOO quality. The validation results of these models are presented in Table 3.
As shown in Table 3, each PLS model used a different number of latent variables. Most of them achieved a regression coefficient above 0.7, even though most of the fruits were at the ripe and overripe stages. This is a significant advantage over earlier studies, in which these stages were difficult to evaluate [15,28]. The most notable regression plots are presented in Figure 5; in each subfigure, the x-axis is the target (the analytical value from the chemical laboratory) and the y-axis is the regression model output. One of the two best correlations was obtained for the acidity index (Figure 5a), with R_v = 0.84, RMSE_v = 0.12 and RPD = 2.16. As the acidity index is the most important parameter for establishing the commercial category of an olive oil, these values are particularly relevant. Moreover, this parameter generally remains stable throughout the production process and during oil storage [32], in contrast to the other chemical parameters considered in this study. Figure 5b corresponds to the peroxide value, which also correlated well (R_v = 0.74 and RMSE_v = 4.12). According to [33], this is easily explained: the peroxide value, which represents primary oxidation compounds, rises when the olive fruit is spoiled, so it is directly proportional to the degradation of the fruits, and it decreases as these compounds are transformed into secondary oxidation compounds. The secondary oxidation compounds, assessed by the K270 parameter, gave lower correlation results (R_v = 0.64 and RMSE_v = 0.04). This is consistent with the lack of significant differences in this parameter among the olive fruit categories (Table 1), although such differences might appear in the oil during storage. Although K232 also measures primary oxidation compounds, its correlation results (R_v = 0.69 and RMSE_v = 0.15) were slightly below those of the peroxide value.
However, the assessment of K232 is optional, so this lower fit is not critical. The authors of [34] reached a similar conclusion. Ethyl esters, another degradation parameter related to early fermentative reactions, increase in damaged fruits. Their limits are the subject of ongoing discussion about lowering them for the highest-quality oils. Recent research has found that ethyl ester production starts while the fruits are still on the tree and increases at overripe stages [13], and that ethyl esters depend on the condition of the ripe fruit and on the variety. Since varietal influence was not considered in this research, the correlation results (R_v = 0.72 and RMSE_v = 16.89, Figure 5c) represent a good fit.
The fit obtained for the polyphenol content (Figure 5d) is remarkable. This parameter is influenced by many factors, such as fruit variety, environmental conditions and agronomic practices [6]. Accurate prediction of this parameter would offer the potential to improve the health benefits and shelf life of the oil to be extracted. Although the RMSE_v for this parameter is quite high, it should be considered against the wide range of polyphenol content across the samples, from 171 to 1275 mg/kg (Table 2).
The fit differed between the two pigments. Chlorophyll (Figure 5e) had the best correlation of all the parameters studied (R_v = 0.85 and RMSE_v = 6.71), which shows that the colour features were properly selected, as this pigment is strongly linked to skin colour [10]. Carotenoids (Figure 5f), by contrast, showed a good but less accurate fit (R_v = 0.72 and RMSE_v = 3.40). This can be explained by the inherent behaviour of these compounds: their decline is less pronounced and they remain more stable across ripening stages [11].

Conclusions
Assessing the condition of the olive fruits at the start of the virgin olive oil production process is critical to optimizing the quality of the oil produced. Currently, the master miller supervises the fruits brought by the farmers and selects the production line according to their quality. In this paper, we have presented the relationship between colour and texture features extracted from olive fruit images and different chemical parameters of the olive oil produced from these fruits. To this end, 10 prediction models based on Partial Least Squares (PLS) were implemented, one for each quality parameter. The best predictions were achieved for the acidity index (R_v = 0.84 and RMSE_v = 0.12) and chlorophylls (R_v = 0.85 and RMSE_v = 6.71). The proposed method could be implemented online in an olive mill to classify olive batches at the beginning of the industrial process, thus avoiding quality losses in the olive oil produced. Finally, prediction models integrated into an online system could help comply with regulation RES-2/94-V/06 on quality management guides for the olive oil industry, proposed by the International Olive Oil Council.

Funding:
This work has been partially supported by the project DPI2016-78290-R.

Acknowledgments:
The authors thank the PICUALIA oil mill for the olive batches provided to carry out this study and the accredited laboratory CM Europa S.L. for the analytical tests.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

VOO: Virgin olive oil
VOOs: Virgin olive oils
EVOO: Extra virgin olive oil
LVOO: Lampante virgin olive oil
PLS: Partial least squares
RMSE: Root mean square error
R: Regression coefficient
RPD: Ratio of performance to deviation