Effect of Spanish-Style Table Olive Processing on Fatty Acid Profile: A Compositional Data Analysis (CoDA) Approach

This manuscript treats the fat compositions of the Manzanilla and Hojiblanca cultivars as compositional data (CoDa). Thus, the work applies CoDa analysis (CoDA) to investigate the effect of processing and packaging on the fatty acid profiles of these cultivars. To this aim, the fat components, expressed as percentages, were first subjected to exploratory CoDA tools and then transformed into ilr (isometric log-ratio) coordinates in the Euclidean space, where they were subjected to the standard multivariate techniques. The results from the first approach (bar plots of geometric means, tetrahedral plots, compositional biplots, and balance dendrograms) showed that the effect of processing was limited, while most of the variability among the fatty acid (FA) profiles was due to cultivars. The application of the standard multivariate methods (i.e., Canonical variates, Linear Discriminant Analysis (LDA), ANOVA/MANOVA with bootstrapping and n = 1000, and nested General Linear Model (GLM)) to the ilr coordinates, built following either Ward’s clustering or the descending order of variances, showed effects similar to those of the exploratory analysis but also showed that Hojiblanca was more sensitive to fat modifications than Manzanilla. However, associating GLM changes in ilr coordinates with individual fatty acids was not straightforward because of the complex deduction of some coordinates. Therefore, according to the CoDA, table olive fatty acid profiles are scarcely affected by Spanish-style processing compared with the differences between cultivars. This work has demonstrated that CoDA can be successfully applied to study the fatty acid profiles of olive fat and olive oils and may represent a model for the statistical analysis of other fats, with the advantage of applying appropriate statistical techniques and preventing misinterpretations.


Introduction
Over the last decades, a comprehensive study of Spanish table olive cultivars' fatty acid profiles was undertaken by López et al. [1]. In green Spanish-style table olives, the proportion of fat in the edible flesh ranged from 11 (Gordal) to 16 (Manzanilla) g/100 g wet flesh. Their major fatty acid components were C18:1c, C16:0, C18:2n-6, and C18:0 [1], in proportions comparable to those reported for olive oil [2].
Usually, the fatty acid profile of olive oil is estimated, according to Commission Regulation (EEC) No 2568/91 [3], by $w_i = (A_i / \sum A) \times 100$, where $A_i$ is the area under the peak of each fatty acid (FA) methyl ester ($i$) and $\sum A$ is the sum of the areas of all FA peaks. The composition of olive oil in FAs is then expressed as the percentages by mass of the corresponding methyl esters. These percentages are always positive and sum to a constant value, usually 100.
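As an illustration of the closure operation in the formula above, the following minimal Python sketch converts peak areas into percentages. The function name `closure` and the peak areas are hypothetical and are not taken from the Regulation or from this study.

```python
def closure(areas, total=100.0):
    """Rescale strictly positive peak areas so that they sum to `total`."""
    if any(a <= 0 for a in areas):
        raise ValueError("peak areas must be strictly positive")
    s = sum(areas)
    return [total * a / s for a in areas]


# Hypothetical peak areas (arbitrary detector units), not data from this study.
peak_areas = [150.0, 30.0, 700.0, 120.0]
percentages = closure(peak_areas)

# Doubling every area (e.g., a larger injection) leaves the profile unchanged,
# which is why such percentages carry only relative information.
doubled = closure([2.0 * a for a in peak_areas])

print([round(p, 2) for p in percentages])
print([round(p, 2) for p in doubled])
```

Both printed profiles coincide, which illustrates that the closed values depend only on the ratios between peak areas.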
According to the literature, these values are compositional data (CoDa) [4] and belong to the simplex metric space [5], where they should be represented and interpreted. Thus, the standard multivariate methods developed for the Euclidean space should not be applied to them [6]. Van den Boogaart and Tolosana-Delgado [7] give several reasons why standard multivariate statistics, developed for real-valued data sets, are inappropriate for compositions: (i) individual components that are mixed and closed exhibit negative correlations, which contradicts the usual interpretation of correlations and covariances; (ii) correlations and covariances between two parts depend on the components included in the analysis; (iii) because of the constant row (case) sum, variance matrices are singular; and (iv) the bounded range of values implies that components cannot be normally distributed.
Recent studies on the overall fat composition of the most relevant cultivars devoted to table olive revealed that several FA proportions in their fats did not agree with the limits established for the characteristics of the olive oil categories [8][9][10][11][12]. Furthermore, marked differences are also observed between the limits established by the IOC/EU and other international regulations [13]. In addition, the percentages of components depended on the number of FAs considered in the analysis. On the contrary, the application of partial CoDa analysis (CoDA) to the fatty acid profile of Spanish-style Gordal [8], pig fat [14], or a preliminary study of a reduced set of these data [15] was successful. Recent advances and publications on CoDA [16,17] make the application of the new statistical tools to table olive fatty acid profiles challenging. This work represents a simple approach to studying them exclusively with CoDA.
There is a general concern about the relationship between health and food [18]. Many governments have developed legislation to improve information on food nutrients [19,20]. Table olives have been essential in the Mediterranean diet, but they are currently consumed worldwide, reaching a production of about $3 \times 10^6$ tonnes for the 2020/2021 season [21]. The green Spanish-style table olives enjoy consumers' preference because of their attractive aspect, organoleptic characteristics, and numerous presentations. Their processing includes lye treatment, washing with tap water, brining, fermentation, and packaging, which produce marked transformations in the fruits' physicochemical characteristics. Their primary nutrient is fat [22]. Therefore, studying the modifications in the fatty acid profile with the appropriate statistical tools seems a reasonable challenge.
This manuscript treats the fat compositions of the Manzanilla and Hojiblanca cultivars as compositional data (CoDa). Thus, the work applies CoDa analysis (CoDA) to investigate the effect of processing and packaging on the fatty acid profiles of these cultivars. To this aim, the fat components, expressed as percentages, were first subjected to exploratory CoDA tools and then transformed into ilr (isometric log-ratio) coordinates in the Euclidean space [6,16,17,23]. Since the standard multivariate techniques were developed for this space, they were applied to such coordinates without any restriction. The results of this study would demonstrate the possibility of using CoDA for the statistical analysis of fat compositions in general.

Cultivars
The raw material was hand-picked olives of the Manzanilla and Hojiblanca cultivars, at the so-called green maturation stage (Maturity Index = 1, according to Ferreira [24]), provided by a local processor (JOLCA S.L., Huevar, Sevilla, Spain). The fruits were subjected to processing 24 h after harvesting.

Processing
The olives were processed according to the green Spanish style [22]. The fresh olives were treated with 2.5 and 3.0 g/100 mL NaOH solutions for Manzanilla (M) and Hojiblanca (H), respectively, until the lye had penetrated 2/3 of the flesh (≈7 h). Then, the olives were submerged in tap water for 18 h and brined in a 9 g NaCl/100 mL solution. After seven months of spontaneous fermentation, the olives were packed and pasteurised (80 °C for 15 min). The experiment was run in duplicate. Two samples from the fresh fruits of each cultivar (T0), as well as from each replicate of the lye-treated olives (T1), the fermented fruits (T2), and the fruits two months after packaging (T3), were successively withdrawn. They were coded as MT0, MT1, MT2, and MT3 and HT0, HT1, HT2, and HT3.

Oil Extraction
The fruits were ground using an Ultraturrax T25 (IKA-Labortecnik, Staufen, Germany), and the oil was removed by centrifugation in ABENCOR equipment. The method is fully described elsewhere [25,26]. Because of the mild extraction conditions, the procedure causes minimal changes in the oil [8].

Fatty Acid Composition
The method for analysing the fatty acid profiles is described elsewhere [25,26]. The methyl esters were determined using a Hewlett-Packard 5890 series II gas chromatograph, a fused silica capillary column Select FAME (100 m × 0.25 mm, 0.25 µm film thickness) (Varian, Bellafonte, PA, USA), and a flame ionisation detector. Quantification was achieved according to Commission Regulation (EEC) No 2568/91 [3]. Values are the average of two determinations per sample.

Statistical Analysis
Fatty acid profiles of olive oil or table olive fats are habitually studied by standard multivariate statistical tools. However, according to Aitchison [4], they represent a straightforward case of compositional data which contains only relative information. The techniques used for their analysis must respect their scale properties [17] and the new CoDa geometrical and statistical approach. The CoDA tools applied were bar plots of geometric means, tetrahedral plots, compositional biplot, balance dendrogram, and ilr transformation of the original values into coordinates in the Euclidean space (to apply standard multivariate techniques). Appendix A includes a succinct description of them. Some brief explanations are also included in the next text sections when appropriate.

Results and Discussion
The physicochemical characteristics of brines after fermentation were: pH, ~4.0; titratable acidity, ~1 g lactic acid/100 mL; combined acidity, ~0.12 Eq/L; NaCl level, ~6.0 g/100 mL. Equilibrium conditions in brine after packaging were 5 g lactic acid/100 mL and 5.5 g NaCl/100 mL. These parameters' levels are typical for green Spanish-style fermentation and commercial products, respectively [22].

Fatty Acid Data Set
The data matrix comprised 28 rows (i), four for the raw material (two cultivars × two samples each) + 24 for the processing phases (two cultivars × three processing phases × two replicates × two samples), and 20 columns (j) (one for the names of the observations (samples) and 19 for the quantified fatty acids) (Table S1). Each cell $x_{ij}$ was the average fatty acid value of two analyses per sample.
First, it was checked whether the conditions that make conventional statistics inappropriate for table olive (or olive oil) fatty acid profiles were present in the data. If they are, studies applying such statistics might reach misleading conclusions. In agreement with reason (i) stated in the Introduction section, the initial data set (expressed as percentages) led to multiple negative correlation and covariance values between the same pairs of FAs in both the complete set and the sub-compositions. For example, see the results for only the most relevant fatty acids, C16:0, C18:0, C16:1, C18:1c, and C18:2n-6, representing 97.57% of the total variance (Tables 1 and 2). Moreover, as expected from argument (ii), the correlation and covariance values depended on the components in the sub-composition (Tables 1 and 2). Finally, reasons (iii) and (iv) are present in the usual FA data sets just because of their structure (constant row sums). Therefore, applying standard multivariate methods to the habitual table olive (and olive oil) FA profiles suffers from the drawbacks mentioned by Van den Boogaart and Tolosana-Delgado [7], and the analysis of table olive fat by CoDA techniques is not optional but the appropriate statistical approach.
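The dependence of correlations on the parts included (argument (ii)) can be reproduced with a toy example. The sketch below uses hypothetical three-part percentages (not the data in Tables 1 and 2) and the helper names `closure` and `pearson`, which are introduced here only for illustration. It compares the correlation between the first two parts in the full composition with the correlation after re-closing the two-part sub-composition.

```python
import math


def closure(row, total=100.0):
    """Rescale a row of positive values so that it sums to `total`."""
    s = sum(row)
    return [total * v / s for v in row]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical three-part compositions (percentages), not data from this study.
full = [
    [60.0, 30.0, 10.0],
    [50.0, 20.0, 30.0],
    [45.0, 35.0, 20.0],
    [30.0, 30.0, 40.0],
]

# Correlation between parts 1 and 2 within the full composition.
r_full = pearson([r[0] for r in full], [r[1] for r in full])

# Same two parts after dropping part 3 and closing again to 100.
sub = [closure(r[:2]) for r in full]
r_sub = pearson([r[0] for r in sub], [r[1] for r in sub])

print(round(r_full, 3), round(r_sub, 3))
```

In the two-part sub-composition the second part is always 100 minus the first, so the correlation is forced to -1 regardless of the underlying data, whereas in the full composition the same pair is only weakly correlated.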

Evaluation of Group Trends by Geometric Mean Bar Plots
The geometric mean is recommended as a proper alternative to the standard mean for describing the central tendency of data in the simplex or Aitchison geometry [5]. It can also give an overview of the effect of treatments on compositions [16]. The most abundant FAs in olive fat were C18:1c, C16:0, C18:2n-6, and C18:0 (Table 3), with proportions similar to those in olive oil [2] or green Gordal table olive fat [8]. The geometric mean of C16:0 was higher in Manzanilla than in Hojiblanca (Table 3). The opposite relationship was observed for C18:1c. The remaining comparisons between the geometric mean contents in the two cultivars are similarly deduced from Table 3.
Table 3. Overall (estimated from the complete set of fatty acid percentages) geometric means as well as geometric means of cultivar and processing phase treatments.
The plot of the log-ratio (that is, the difference between their natural logs) of each FA content (geometric mean), according to processing phases and cultivar, over the overall geometric mean is an approach for estimating the effects of such variables (Figure 1). The graph shows that C16:0 (mainly) or C18:0 (Figure 1A) and C20:0 or C16:1 (mostly) (Figure 1C) are more abundant in Manzanilla (positive log-ratios) than in Hojiblanca (negative). On the contrary, C15:1 (mostly) (Figure 1B), C20:1 (Figure 1C), and C18:3n-3 (Figure 1D) were more abundant in Hojiblanca. Differences in percentages between cultivars were previously reported for C18:0, C20:0, and C16:1 (higher in Manzanilla) as well as for C18:1c, C18:3n-3, and C22:6n-3 (higher in Hojiblanca) but not for C16:0, C15:1, or C20:1 [3]. Regarding processing phases (within cultivar), the changes were always observable but less pronounced than those between cultivars: e.g., C14:0 or C15:0 in Manzanilla (Figure 1A) or C16:1 in Hojiblanca (Figure 1C). Moreover, the log-ratio values (Table 3) could also be subjected to standard statistical tests.
The bar plot discloses the changes intuitively but depends on the entire set of variables through the overall geometric means. This circumstance is a clear limitation and makes it necessary to complement this information with other statistical techniques.
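The quantity displayed in such bar plots can be sketched in a few lines of Python. The example assumes unclosed column-wise geometric means and uses hypothetical three-part percentages. The function names are illustrative and do not correspond to the software used in this work.

```python
import math


def geometric_means(rows):
    """Column-wise geometric means of a list of compositions."""
    n = len(rows)
    return [math.exp(sum(math.log(v) for v in col) / n) for col in zip(*rows)]


def log_ratio_to_overall(group_rows, all_rows):
    """ln(group geometric mean / overall geometric mean) for every part."""
    g_group = geometric_means(group_rows)
    g_all = geometric_means(all_rows)
    return [math.log(a / b) for a, b in zip(g_group, g_all)]


# Hypothetical percentages for three parts; two samples per cultivar.
manzanilla = [[14.0, 3.0, 83.0], [15.0, 2.8, 82.2]]
hojiblanca = [[11.0, 2.5, 86.5], [11.5, 2.4, 86.1]]
all_samples = manzanilla + hojiblanca

lr_m = log_ratio_to_overall(manzanilla, all_samples)
lr_h = log_ratio_to_overall(hojiblanca, all_samples)

for part, (m, h) in enumerate(zip(lr_m, lr_h), start=1):
    print(f"part {part}: Manzanilla {m:+.3f}, Hojiblanca {h:+.3f}")
```

A positive bar means that the part is more abundant in the group than in the whole data set. With two groups of equal size the bars of each part mirror each other, because the overall log geometric mean is the average of the two group values.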

Study of Log-Ratio Dispersion through the Variation Array Matrix
In CoDA, there is no counterpart to standard deviation directly constructed from the original composition. Instead, considering that compositional data only carry relative information, the approach for measuring dispersion focuses on pairwise log-ratios, with their variances gathered in the so-called variation array [4]. The overall variance of a specific part is deduced from the row-wise log-ratios of each component over the other parts (Table 4, upper half diagonal) or from its clr coefficients (Table 4, last column on the right). The value for each part is the so-called clr variance, which includes the variance due to differences between groups and other sources. The clr variance is sometimes the criterion for selecting the variables used for checking the effects of the design factors [31,32]. The greatest clr variance value was for C15:1, followed by C18:3n-6, C22:6n-3, C16:1, C18:3n-3, C18:0, C24:0, C18:2t, C15:0, C20:0, C20:1, C21:0, C16:0, and so on (Table 4, last column on the right). In the order mentioned, they could then be candidates for the sequential binary partition (SBP) since the first balances (ilr coordinates) may explain the greatest proportions of variance. These proportions usually decrease as the clr variances become lower.
The variation array is symmetric regarding its diagonal, so the lower half is substituted with the corresponding log-ratios to provide additional information. These values were initially used for standard multivariate analysis, but nowadays, other transformations with better mathematical properties (e.g., ilr coordinates) are preferred.
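The relation between the variation array and the clr variances can be checked numerically. The following sketch uses hypothetical four-part data (not Table 4) and helper names introduced only for illustration. It computes the variances of all pairwise log-ratios and the clr variances and verifies that both give the same total variance.

```python
import math


def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def variation_matrix(rows):
    """T[i][j] = variance of ln(x_i / x_j) over the samples."""
    d = len(rows[0])
    return [[variance([math.log(r[i] / r[j]) for r in rows]) for j in range(d)]
            for i in range(d)]


def clr(row):
    """Centred log-ratio coefficients of one composition."""
    logs = [math.log(v) for v in row]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def clr_variances(rows):
    """Variance of each clr coefficient over the samples."""
    return [variance(col) for col in zip(*[clr(r) for r in rows])]


# Hypothetical four-part percentages; the third part is a minor, noisy FA.
data = [
    [14.0, 3.0, 0.5, 82.5],
    [15.0, 2.8, 0.2, 82.0],
    [11.0, 2.5, 0.9, 85.6],
    [11.5, 2.4, 0.4, 85.7],
]
d = len(data[0])
T = variation_matrix(data)
cv = clr_variances(data)

# Total variance from the upper half of the variation array and from clr.
total_from_T = sum(T[i][j] for i in range(d) for j in range(i + 1, d)) / d
total_from_clr = sum(cv)

# Parts ranked by decreasing clr variance (candidate order for an SBP).
ranking = sorted(range(d), key=lambda i: -cv[i])
print(round(total_from_T, 6), round(total_from_clr, 6), ranking)
```

In this toy data set the minor part ranks first, which is consistent with the general observation that low contents tend to show large relative variability.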

Tetrahedral Display
The triangle and tetrahedral plots are the simplest forms of CoDa representation in the Aitchison geometry (or simplex). They show the cases as a function of three and four parts, respectively [33]. The lower the abundance of a part, the closer the sample is to the border (face) opposite its vertex. Usually, centring the data before plotting prevents the samples from grouping close to faces or vertices. These plots can offer a rough image of the data distribution and the groups' trends [17]. Compositions with numerous parts may be plotted as a function of those with the highest variance and, therefore, the greatest expected segregation power. Here, C15:1, C18:3n-6, C22:6n-3, and C16:1 (with the largest clr variances in the variation array) were chosen (Figure S1). Notice that the centred samples from the two cultivars, regardless of the processing phase, are fairly well segregated. However, no trend within cultivars is observed, in agreement with Figure 1, where differences between cultivars were apparent but no effect of processing phases was clear. No linear trend along Principal Components (PCs) was noticed either. In Gordal olives [8], the FAs with the greatest variance were C14:0, C18:2t, C18:3n-6, and C22:6n-3, and the tetrahedral graph of samples as a function of them showed that the FA profile of the fresh fruits was affected by processing. Besides, the extraction by Soxhlet significantly affected the FA profiles compared to the method using Abencor. Notice that C18:3n-6 and C22:6n-3 showed high variability both in this work and in the Gordal experiments [8].
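Centring, as mentioned above, can be sketched as a perturbation of each sample by the inverse of the data centre (the closed vector of geometric means). The example below is a minimal illustration with hypothetical values for a four-part sub-composition. The function names are assumptions and not those of any particular package.

```python
import math


def closure(row, total=100.0):
    """Rescale a row of positive values so that it sums to `total`."""
    s = sum(row)
    return [total * v / s for v in row]


def centre(rows):
    """Closed vector of column-wise geometric means."""
    n = len(rows)
    gm = [math.exp(sum(math.log(v) for v in col) / n) for col in zip(*rows)]
    return closure(gm)


def centred(rows):
    """Perturb every sample by the inverse of the centre and close again."""
    g = centre(rows)
    return [closure([v / c for v, c in zip(r, g)]) for r in rows]


# Hypothetical raw percentages of four minor FAs, dominated by the last one.
raw = [
    [0.10, 0.30, 0.20, 1.50],
    [0.05, 0.35, 0.25, 1.40],
    [0.20, 0.25, 0.15, 1.60],
]
sub = [closure(r) for r in raw]   # four-part sub-composition
sub_centred = centred(sub)        # samples moved around the barycentre

print([round(v, 2) for v in centre(sub)])
print([round(v, 2) for v in centre(sub_centred)])
```

Before centring, the centre lies close to the vertex of the dominant part. After centring, it coincides with the barycentre of the tetrahedron, so the samples spread around the middle of the plot.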

Compositional Biplot
The adaptation of the traditional biplot to compositional data, based on clr coefficients, was developed by Aitchison and Greenacre [34] and is usually drawn in two dimensions. On it, row and column points are both centred at the origin of the display. Depending on the factorisation, two main types are obtained: (i) the covariance biplot (ν = 0), which preserves the covariance structure between log-ratios (Figure 2), and (ii) the form biplot (ν = 1), which preserves distances between observations (rows) (Figure S2). Following Aitchison and Greenacre [34], the distances between the ends of two rays (parts) in the covariance biplot approximate the standard deviation of their log-ratios (Figure 2); the highest values are between C15:1 and C16:1 (and the other FAs in the first quadrant). Likewise, the log-ratios of C18:3n-6 over C22:6n-3, C16:1, or practically any other FA have large standard deviations (Figure 2). These log-ratios thus have good potential segregation power between processing phases. Moreover, the cosine of the angle between two links estimates the correlation between the two log-ratios; thus, ln(C15:1/C16:1) and ln(C18:3n-6/C22:6n-3) are practically uncorrelated since their links form an angle of ~90° (whose cosine is about 0), whereas ln(C18:3n-6/C22:6n-3) and ln(C18:3n-6/C14:0) are closely associated since the angle between their links is small (Figure 2). Thus, the covariance biplot proved a useful tool to identify the log-ratios with considerable variation between groups. In the form biplot (Figure S2), the separation between individuals is an approximation of the distances between observations (rows). There is segregation between cultivars (Hojiblanca on the left and Manzanilla on the right), based mainly on PC1 (strongly associated with C15:1 (negatively), C16:1 (positively), and many other FAs); however, there was no clear segregation among processing phases, in agreement with the previous results. PC2 (linked negatively to C18:3n-6 and positively to C22:6n-3) had low differentiation power.
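The two biplot types can be summarised with the singular value decomposition of the clr matrix. The block below is a sketch under stated assumptions: $Z$ denotes the column-centred matrix of clr coefficients scaled by $1/\sqrt{n-1}$, $F$ and $G$ hold the row and column markers, and the exponent convention for ν is assumed to match the one quoted in the text. The equalities hold when all dimensions are retained and become approximations in a two-dimensional display.

```latex
% Sketch: Z is the centred, scaled clr matrix (an assumption of this example).
\begin{align*}
Z &= U\,\Gamma\,V^{\top}, \qquad F = U\,\Gamma^{\nu}, \qquad G = V\,\Gamma^{1-\nu} \\[4pt]
\nu = 0:\quad G\,G^{\top} &= V\,\Gamma^{2}\,V^{\top} = Z^{\top}Z
  \quad\text{(covariance matrix of the clr coefficients)} \\
\lVert \mathbf{g}_i - \mathbf{g}_j \rVert^{2}
  &= \operatorname{var}(\mathrm{clr}_i) + \operatorname{var}(\mathrm{clr}_j)
     - 2\operatorname{cov}(\mathrm{clr}_i, \mathrm{clr}_j)
   = \operatorname{var}\!\left(\ln\frac{x_i}{x_j}\right) \\[4pt]
\nu = 1:\quad F\,F^{\top} &= U\,\Gamma^{2}\,U^{\top} = Z\,Z^{\top}
  \;\Longrightarrow\;
  \lVert \mathbf{f}_k - \mathbf{f}_l \rVert^{2} = \lVert \mathbf{z}_k - \mathbf{z}_l \rVert^{2}
\end{align*}
```

The second line explains why the length of a link between two rays in the covariance biplot is read as the standard deviation of the corresponding log-ratio, and the last line why distances between row markers in the form biplot are read as (scaled) Aitchison distances between samples.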
Differences in FA compositions have been reported for Turkish cultivars, allowing the segregation of the Hurma cultivar from the Erkence and Gemlik cultivars [35]. Issaoui et al. [36] studied the autochthonous Meski, Picholine, and Manzanilla cultivars processed according to two Tunisian methods, observing significant differences among cultivars but no effect of processing. However, they applied standard multivariate statistics. Appreciable differences between cultivars were also observed among the oil fatty acid compositions of eighteen Mediterranean cultivars grown under the arid conditions of Boughrara in Tunisia [9].

Application of the Sequential Binary Partition for Deducing ilr Coordinates
As mentioned by several authors [6,16,17,23], the transformation of the original data set into ilr coordinates is a necessary step for applying standard multivariate techniques in CoDA since it moves the values into the Euclidean space (see brief explanation in Supplementary Material). There are no specific criteria for selecting the fatty acid combinations used to construct the ilr coordinates. Ideally, the procedure chosen should lead to the maximum segregation power when evaluating the effects of variables. In this work, two procedures were used: the decreasing order of clr variances and Ward's clustering.

Sequential Binary Partition Following the Decreasing Order of clr Variances
The SBP was applied following the formula provided in Appendix A. It was expected that the best segregation between observations, using the decreasing order of clr variance, would include in the first ilr coordinate, apart from the normalisation parameter, the log-ratio between the fatty acid with the greatest clr variance (C15:1) and the geometric mean of the rest. With the 19 FAs $x_1, \ldots, x_{19}$ ordered by decreasing clr variance ($x_1$ = C15:1, $x_2$ = C18:3n-6), the first two coordinates are
$$\mathrm{ilr}_1 = \sqrt{\frac{18}{19}}\,\ln\frac{x_1}{\left(\prod_{j=2}^{19} x_j\right)^{1/18}}, \qquad \mathrm{ilr}_2 = \sqrt{\frac{17}{18}}\,\ln\frac{x_2}{\left(\prod_{j=3}^{19} x_j\right)^{1/17}},$$
where C15:1, already used in ilr1, is removed from the calculation not only of ilr2 but also of the subsequent coordinates. The decreasing order of variances criterion was applied in the first 13 balances. However, after this point, the SBP followed the successive unused FA over the remaining ones because of the similarity of their variances. The operation was repeated row-wise for all the samples of each cultivar-processing phase combination. The SBP is summarised in a matrix where each FA is identified, row-wise, by 1, −1, or 0 when its value in the initial data set is included in the numerator, is included in the denominator, or does not take part in the partition, respectively (e.g., C15:1 shows 1 for the first balance and 0 for the rest) (Table S2). This matrix allows reproduction of the SBP and is also helpful for back-transformation (ilr coordinates into the original scale). In this way, the actual fatty acid composition matrix (D parts) is substituted row-wise with its D − 1 ilr coordinate matrix (Table S3) since the last partition combines the cells of the last two columns into just one. The data defined by the SBP matrix represent a set of ilr coordinates on orthonormal axes (basis) that can be subjected to the standard multivariate techniques. However, it has previously been observed that clr variance may be caused not only by the effect of treatments but also by determination errors in cases of low contents [8,14]. Therefore, checking alternative criteria is important.
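The way an SBP sign matrix is turned into ilr coordinates can be sketched as follows. The example assumes a four-part composition whose parts are already ordered by decreasing clr variance, and it uses hypothetical values and function names (`balance`, `pivot_sbp`, `ilr`). It is not the code used to produce Tables S2 and S3.

```python
import math


def balance(x, signs):
    """Balance of composition x for one SBP row of +1 / -1 / 0 codes."""
    num = [math.log(v) for v, s in zip(x, signs) if s == 1]
    den = [math.log(v) for v, s in zip(x, signs) if s == -1]
    r, s = len(num), len(den)
    coef = math.sqrt(r * s / (r + s))          # normalisation parameter
    return coef * (sum(num) / r - sum(den) / s)  # ln of geometric-mean ratio


def pivot_sbp(d):
    """Sign matrix in which balance k opposes part k to all later parts."""
    return [[0] * k + [1] + [-1] * (d - k - 1) for k in range(d - 1)]


def ilr(x, sbp):
    """All D - 1 balances of composition x for the given sign matrix."""
    return [balance(x, row) for row in sbp]


def clr(x):
    """Centred log-ratio coefficients, used here only to check the isometry."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


# Hypothetical composition, parts assumed ordered by decreasing clr variance.
x = [0.5, 1.5, 28.0, 70.0]
sbp = pivot_sbp(len(x))
coords = ilr(x, sbp)

for row, c in zip(sbp, coords):
    print(row, round(c, 4))
```

Each printed row shows the sign codes of one partition next to its balance. The first part, once used in the numerator of the first balance, is coded 0 in all the following rows, as described for C15:1 above.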

Use of Cluster Analysis for Sequential Binary Partition Selection
Clustering parts in CoDA faces the problem of choosing the measure for distances or dissimilarities. The variation matrix is usually the proper data set [17]. Ward's clustering method (Figure 3) minimises the within-cluster variance [17]. When the SBP followed the clustering order, the decreasing heights were successively observed for C15:1 over the remaining fatty acids, C18:3n-6 over its remaining fatty acids, and C22:6n-3 over the still remaining fatty acids. Therefore, these log-ratios could be chosen for the first three steps of ilr coordinates and could provide the most remarkable segregation power of observations. In addition, these ilr coordinates were similar to those selected using the decreasing order of clr variances. However, the following partitions corresponded to groups of FAs. Thus, in the next log-ratio, C16:1, C16:0, C18:2t, C18:0, and C20:0 were assigned to the numerator and the remaining ones on their right to the denominator. Similar binary partitions could be deduced by following the successive cluster sequences (Figure 3), which oppose single FAs or groups of FAs within the following clusters of two groups. As observed in Figure 3, most of the consecutive steps included several FAs until ending with the opposition of only two FAs. As previously, the SBP is summarised in a matrix (Table S4), which identifies each FA, row-wise, by 1, −1, or 0 when its value in the initial data set is included in the numerator, is included in the denominator, or does not take part in the partition, respectively. As before, this matrix allows reproduction of the SBP and the back-transformation (ilr coordinates into the original scale). The resulting ilr coordinates matrix is presented in Table S5. As previously mentioned, their values are on an orthonormal basis, and the means and variances of the resulting data set can be obtained by the usual procedures (Table S6, for overall estimations). Moreover, this data set can also be analysed using standard statistics [6,16,17,23].
The values obtained using the clr decreasing variance and Ward's criteria differ (Table S6) because of their different bases, but their total variance (0.607) and the conclusions after applying the standard multivariate techniques to both are the same. Also, both reproduce the original data structure.
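That two different SBPs yield different coordinates but the same total variance can be verified on a toy data set. The sketch below uses hypothetical four-part percentages and two illustrative sign matrices (one part-by-part, one opposing groups of parts, in the spirit of a cluster-based partition). Neither matrix reproduces Tables S2 or S4.

```python
import math


def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def balance(x, signs):
    """Balance of composition x for one SBP row of +1 / -1 / 0 codes."""
    num = [math.log(v) for v, s in zip(x, signs) if s == 1]
    den = [math.log(v) for v, s in zip(x, signs) if s == -1]
    r, s = len(num), len(den)
    return math.sqrt(r * s / (r + s)) * (sum(num) / r - sum(den) / s)


def coordinate_variances(rows, sbp):
    """Variance of every balance over the samples."""
    return [variance([balance(x, signs) for x in rows]) for signs in sbp]


# Hypothetical four-part percentages (not data from this study).
data = [
    [14.0, 3.0, 0.5, 82.5],
    [15.0, 2.8, 0.2, 82.0],
    [11.0, 2.5, 0.9, 85.6],
    [11.5, 2.4, 0.4, 85.7],
]

# Two valid sequential binary partitions of the same four parts.
sbp_one_by_one = [[1, -1, -1, -1], [0, 1, -1, -1], [0, 0, 1, -1]]
sbp_grouped = [[1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]

var_a = coordinate_variances(data, sbp_one_by_one)
var_b = coordinate_variances(data, sbp_grouped)

print([round(v, 4) for v in var_a], round(sum(var_a), 6))
print([round(v, 4) for v in var_b], round(sum(var_b), 6))
```

The individual variances differ between the two bases, while their sums coincide, because both sets of balances are orthonormal coordinates of the same data.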

Balance Display by the Compositional Dendrogram
The CoDa dendrogram is the graphical presentation of balances and variances [6,23]. To display the effects of the factors, the balances were estimated based on overall averages and according to groups (cultivars and processing phases). In the plots (Figures 4 and S3), the overall balance is represented on the horizontal axis. Those corresponding to groups are displayed as boxes situated below these axes. A displacement towards the right (left) indicates that the geometric mean of parts in the SBP numerator predominates over those in the denominator (or vice versa). A centred position means equilibrium between them. The vertical lines stand for the variances of the corresponding balances. The plot allows the comparison of the overall balances (and their variances) as well as the contrasts among the levels of factors (and their variances). The first three balances (B1-B3) are displaced toward the left (negative log-ratios) (Figures 4 and S3), signifying an overall lower presence of C15:1, C18:3n-6, and C22:6n-3 over the geometric means of the remaining FAs. However, the segregation power between cultivar and processing phase was limited (scarce displacements of boxes below the horizontal axes) (Figures 4 and S3).
Regarding variances (vertical lines), those corresponding to the three first balances are the most relevant irrespective of the criterion used for the SBP estimation. Under them, the balances of the processing steps of Hojiblanca and Manzanilla are visible, although their comparison is difficult because of the numerous factor levels. For variances according to groups, those from MT2 and HT2 (fermented olives) showed the largest values. An

Balance Display by the Compositional Dendrogram
The CoDa dendrogram is the graphical presentation of balances and variances [6,23]. To display the effects of the factors, the balances were estimated based on overall averages and according to groups (cultivars and processing phases). In the plots (Figures 4 and  S3), the overall balance is represented on the horizontal axis. Those corresponding to groups are displayed as boxes situated below these axes. A displacement towards the right (left) indicates that the geometric mean of parts in the SBP numerator predominates over those in the denominator (or vice versa). A centred position means equilibrium between them. The vertical lines stand for the variances of the corresponding balances. The plot allows the comparison of the overall balances (and their variances) as well as the contrasts among the levels of factors (and their variances). The first three balances (B1-B3) are displaced toward the left (negative log-ratios) (Figures 4 and S3), signifying an overall lower presence of C15:1, C18:3n-6, and C22:6n-3 over the geometric means of the remaining FAs. However, the segregation power between cultivar and processing phase was limited (scarce displacements of boxes below the horizontal axes) (Figures 4 and S3. So, the two criteria followed to define the SBP led to dissimilar CoDa dendrograms (only sharing the three first balances) with different averages and variances because of the diverse basis. The basis used to determine the ilr coordinates were just two of the possible options, but there is no guarantee that any other non-tested were more appropriate for describing the effects of cultivars and processing phases. However, they will explain the same overall and total clr variances ( Table 4). Regardless of the criterion, the C15:1 was the FA of Manzanilla and Hojiblanca with the highest variance in its balance, while C14:0 was in Gordal [8]. This may indicate fatty acid profile structural differences among cultivars.

Canonical Variates Plot and Linear Discriminant Analysis
A canonical variate is a new variable obtained as a linear combination of the original ones attempting to distinguish between groups. As Martin et al. [16] state, a canonical variate in CoDA is where is a log-contrast. In this technique, the first canonical variate is defined by the vector 1 which maximizes the F value associated with the ANOVA test: H0: 1 · 1 = ⋯ = 1 , = 1, … , (centres of the groups) It can be proved that the vector 1 is the eigenvector of the matrix −1 associated with its maximum eigenvalue, where W is the within-group and B the between-groups sums of squares, verifying these matrices' variability decomposition property (T (total) = B + W). Following this process iteratively, the D − 1 eigenvectors that define the corresponding canonical variates can be obtained. Importantly, if the procedure is applied to coordinates using a different basis, the same canonical coordinate is obtained; that is, the canonical covariate is invariant to the change of basis. Since the coordinates on their respective basis were different, the scores and the proportion of variances accounted for by the axes in both cases are different. Regardless of the ilr coordinates, there was a net separation between cultivars (Figures 5 and S4), which responded differently to the treatments. In Regarding variances (vertical lines), those corresponding to the three first balances are the most relevant irrespective of the criterion used for the SBP estimation. Under them, the balances of the processing steps of Hojiblanca and Manzanilla are visible, although their comparison is difficult because of the numerous factor levels. For variances according to groups, those from MT2 and HT2 (fermented olives) showed the largest values. An association between low contents with large variances is generally apparent, as already pointed out in the literature [37]. The remaining balances showed alternant signs and reduced variances (Figures 4 and S3). 
Thus, the effects of cultivar and processing phases on most FAs were relatively low.
Thus, the two criteria followed to define the SBP led to dissimilar CoDa dendrograms (sharing only the first three balances), with different averages and variances because of the different bases. The bases used to determine the ilr coordinates were just two of the possible options, and there is no guarantee that an untested basis would not be more appropriate for describing the effects of cultivars and processing phases. However, any basis will explain the same overall and total clr variances (Table 4). Regardless of the criterion, C15:1 was the FA with the highest variance in its balance in Manzanilla and Hojiblanca, while in Gordal it was C14:0 [8]. This may indicate structural differences in the fatty acid profiles among cultivars.

Canonical Variates Plot and Linear Discriminant Analysis
A canonical variate is a new variable obtained as a linear combination of the original ones that attempts to distinguish between groups. As Martín-Fernández et al. [16] state, a canonical variate in CoDA is $y = \mathbf{v} \cdot \mathrm{ilr}(\mathbf{x})$, where $y$ is a log-contrast. In this technique, the first canonical variate is defined by the vector $\mathbf{v}_1$ which maximizes the F value associated with the ANOVA test $H_0\colon \mathbf{v}_1 \cdot \boldsymbol{\mu}_1 = \cdots = \mathbf{v}_1 \cdot \boldsymbol{\mu}_g$, where $\boldsymbol{\mu}_k$, $k = 1, \ldots, g$, are the centres of the groups. It can be proved that the vector $\mathbf{v}_1$ is the eigenvector of the matrix $W^{-1}B$ associated with its maximum eigenvalue, where $W$ is the within-group and $B$ the between-groups sums of squares, which satisfy the variability decomposition property $T = B + W$ ($T$, total). Following this process iteratively, the $D - 1$ eigenvectors that define the corresponding canonical variates can be obtained. Importantly, if the procedure is applied to coordinates using a different basis, the same canonical coordinate is obtained; that is, the canonical variate is invariant to the change of basis. Since the coordinates on their respective bases were different, the scores and the proportions of variance accounted for by the axes differ between the two cases.
Regardless of the ilr coordinates, there was a clear separation between cultivars (Figures 5 and S4), which responded differently to the treatments. In Hojiblanca, the fresh fruits (HT0) diverged from the treated ones, and, as the process progressed, the fruits became increasingly differentiated from the raw material. In Manzanilla, the composition of the fresh product was comparable to that of the fermented fruits, while the lye-treated olives had the most different composition.
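The eigenvector computation described for the first canonical variate can be illustrated with a minimal sketch. It assumes only two ilr coordinates, so that the 2 × 2 algebra can be written out with the Python standard library. The data values and the function names (`first_canonical_vector`, `between_within_ratio`) are hypothetical and are not taken from the study.

```python
import math

def mean_vec(rows):
    """Column means of a list of 2-D points."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def first_canonical_vector(groups):
    """Unit eigenvector of W^-1 B with the largest eigenvalue (2-D case only)."""
    grand = mean_vec([r for g in groups for r in g])
    W = [[0.0, 0.0], [0.0, 0.0]]  # within-group sums of squares
    B = [[0.0, 0.0], [0.0, 0.0]]  # between-groups sums of squares
    for g in groups:
        c = mean_vec(g)
        d = [c[0] - grand[0], c[1] - grand[1]]
        for i in range(2):
            for j in range(2):
                W[i][j] += sum((r[i] - c[i]) * (r[j] - c[j]) for r in g)
                B[i][j] += len(g) * d[i] * d[j]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    Winv = [[W[1][1] / det, -W[0][1] / det],
            [-W[1][0] / det, W[0][0] / det]]
    M = [[sum(Winv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Largest root of the characteristic polynomial of the 2 x 2 matrix M.
    tr = M[0][0] + M[1][1]
    dt = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - dt, 0.0))
    # Solve (M - lam I) v = 0 for a non-zero v.
    if abs(M[0][1]) > 1e-12:
        v = [M[0][1], lam - M[0][0]]
    elif abs(M[1][0]) > 1e-12:
        v = [lam - M[1][1], M[1][0]]
    else:
        v = [1.0, 0.0] if M[0][0] >= M[1][1] else [0.0, 1.0]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm], lam

def between_within_ratio(groups, v):
    """Between/within sum-of-squares ratio of the scores projected on v.

    This ratio is proportional to the one-way ANOVA F value.
    """
    scores = [[r[0] * v[0] + r[1] * v[1] for r in g] for g in groups]
    flat = [s for g in scores for s in g]
    grand = sum(flat) / len(flat)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in scores)
    within = sum((s - sum(g) / len(g)) ** 2 for g in scores for s in g)
    return between / within

# Hypothetical ilr coordinates (two per sample) for two groups of four samples.
groups = [
    [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (-0.1, 0.2)],
    [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)],
]
v1, lam = first_canonical_vector(groups)
print("first canonical vector:", v1)
print("largest eigenvalue of W^-1 B:", lam)
```

In this sketch, the between/within ratio of the scores along `v1` coincides with the largest eigenvalue and is never smaller than the ratio along either original coordinate axis, which is the maximization property stated in the text.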
Principal Component Analysis (PCA) applied to Gordal table olive FA compositions expressed as percentages led to slightly different results than using ilr coordinates [8]. The biplot of the FAs of Aloreña de Málaga based on ilr coordinates produced clearer segregation between fresh and fresh green packaged olives with respect to stored and packaged products [38]. These results, however, deserve attention since the ilr coordinates reproduce the original structure of the initial data [6]. Also, standard chemometrics based on percentages gave non-conclusive results when analysing the effect of ripe olive processing (darkening phases) on the oil composition but detected differences among cultivars [25]. Moreover, PCA based on FA proportions properly segregated naturally debittered Turkish cultivars [35]. Apparently, when the differences in the FA compositions are marked (e.g., between cultivars), standard statistics applied to data expressed in percentages may lead to about the same results as CoDA; however, at least in this case, the canonical variate plot based on ilr coordinates was somewhat more efficient at segregating changes due to processing phases.
Moreover, the ilr coordinates obtained by SBP based on the two options (decreasing clr variance and Ward's clustering) were subjected to standard linear discriminant analysis (LDA) using bootstrapping (1000). All the samples were correctly assigned (100% success; Table S7); however, the results were less satisfactory in the leave-one-out cross-validation (shown in parentheses). Overall, ilr coordinates deduced following Ward's criterion tended to give better assignations than those using the decreasing order of clr variance (Table S7) and could be preferable. The initial proportion of correct assignations in this work was higher than in the segregation by season, montanera length, or sampling location in Iberian pigs [14], but the cross-validation results were not, possibly because of the higher number of observations in that study.

Study of the Effects by ANOVA/MANOVA Contrasts
The one-way ANOVA tests (bootstrapping = 1000) showed no significant differences between groups (irrespective of the estimation choice) for ilr1 and ilr2, as well as for ilr11, ilr6c, ilr7c, ilr10c, ilr13c, and ilr18c (c stands for ilr deduced following Ward's clustering; otherwise, from decreasing variance) (Table 5). Notice that the ilr coordinates from the two fatty acids with the largest clr variances, which formed separate Ward's clusters, did not show significant differences between processing phases. The large variability of the log-ratios with these FAs in the numerator was due more to their low contents, and the consequent large relative errors, than to treatment effects. This analysis was therefore a tool to detect such circumstances and prevent improper conclusions based only on the variances of the log-ratios.
According to Martín-Fernández et al. [16], considering $X_k$ as a random composition of group $k$, where $k = 1, 2, \ldots, g$, a basic model may be generated by assuming that $X_k$ is obtained by adding a random variability $\varepsilon_k$ around a centre $\mu_k$ in a multiplicative way: $X_k = \mu_k \oplus \varepsilon_k$, where $\oplus$ denotes perturbation (part-wise multiplication followed by closure). In this case, the expected value of the variability is 1, and the model is equivalent to $\mathrm{ilr}(X_k) = \mathrm{ilr}(\mu_k) + \mathrm{ilr}(\varepsilon_k)$, where $\mathrm{ilr}(\varepsilon_k)$ is centred at the origin of coordinates. That is, working in coordinates, one can assume the same model as for interval-scale data in real space. Therefore, the ilr coordinates were subjected to MANOVA (bootstrapping = 1000), and the multivariate tests (Wilks, Hotelling-Lawley, Pillai, and Roy) were all significant (p < 0.0001), regardless of the criterion used for the ilr definition. Then, the effects of cultivar and processing phases (nested in cultivar because of the differences in behaviour observed for each cultivar in the canonical analysis) were checked by GLM, using the two ilr estimation options.
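The equivalence between the multiplicative model in the simplex and the additive model in ilr coordinates can be checked numerically. The following is a minimal sketch for a hypothetical three-part composition. The basis used in `ilr3`, the numeric values, and the function names are illustrative assumptions and do not correspond to the SBP used in this work.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(mu, eps):
    """Perturbation: part-wise product followed by closure."""
    return closure([m * e for m, e in zip(mu, eps)])

def ilr3(x):
    """ilr coordinates of a 3-part composition for one assumed basis:
    part 1 against parts 2 and 3, then part 2 against part 3."""
    b1 = math.sqrt(2 / 3) * math.log(x[0] / math.sqrt(x[1] * x[2]))
    b2 = math.sqrt(1 / 2) * math.log(x[1] / x[2])
    return [b1, b2]

# Hypothetical group centre and multiplicative error (values close to 1).
mu = closure([70.0, 20.0, 10.0])
eps = [1.05, 0.95, 1.00]

x = perturb(mu, eps)
lhs = ilr3(x)                                      # ilr(X_k)
rhs = [a + b for a, b in zip(ilr3(mu), ilr3(eps))]  # ilr(mu_k) + ilr(eps_k)
print(lhs)
print(rhs)
```

Both printed vectors agree up to rounding error, which is why standard ANOVA/MANOVA models for real-valued data can be applied to the coordinates.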
Focusing on the significantly affected ilr coordinates, two groups were observed: (i) those not (or scarcely) influenced by processing phases but showing significant differences between cultivars, and (ii) those affected by the processing phases in one or both cultivars. Because of the large number of significant ilr coordinates, only a few of them are discussed. Regarding (i) (Figure S5), ilr16, ilr12, and ilr10 were greater for Manzanilla than for Hojiblanca, whereas the opposite was observed for ilr5. To relate such trends to the FA profiles, one should consider the respective ilr definitions. For ilr16 (C17:1 over the geometric mean of C18:1c, C20:1, and C18:2n-6), C18:1c and C20:1 had higher values in Hojiblanca than in Manzanilla, while the other two FAs had similar levels in both cultivars, in agreement with Figure 1C. As C18:1c and C20:1 are included in the denominator of ilr16, they are responsible for the lower ilr16 values in Hojiblanca than in Manzanilla (Figure S5A). A similar strategy can relate the remaining ilr changes (Figure S5) to the FAs involved.
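As an illustration of the reasoning above, the verbal definition of ilr16 can be written out explicitly. The expression below assumes the standard normalized balance formula, with one part in the numerator and three in the denominator. The normalizing constant follows from that assumption and is not quoted from the supplementary tables.

```latex
\[
\mathrm{ilr}_{16} \;=\; \sqrt{\frac{1 \cdot 3}{1 + 3}}\;
\ln \frac{x_{\text{C17:1}}}
         {\left( x_{\text{C18:1c}} \, x_{\text{C20:1}} \, x_{\text{C18:2n-6}} \right)^{1/3}}
\]
```

Under this form, at similar levels of C17:1 and C18:2n-6, higher proportions of C18:1c and C20:1 increase the denominator and therefore lower ilr16, which is the pattern described for Hojiblanca with respect to Manzanilla.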
Regarding the (ii) cases (Figure 6), some ilr estimation formulas are simple, but the explanations are not always obvious.
Regarding the ilr coordinates from Ward's clustering (Figure 7), indicated by a "c" after the ilr number, ilr17c (C18:1c over the geometric mean of C17:1 and C17:0) underwent only slight, non-significant changes (in opposite directions for each cultivar) during processing (Figure 7A). However, in ilr14c (C15:0 over the geometric mean of C14:0, C17:0, C17:1, C18:1c, and C18:2n-6), the decrease after lye treatment in both cultivars (Figure 7D) could be caused by the partial loss (possibly degradation) of C15:0 (mainly in Manzanilla), in agreement with Figure 1A. Still, the assignation is not conclusive because of the several FAs involved. Similar strategies can be applied to the remaining ilr coordinates from Ward's clustering.
Figure 7. Examples of the influence of processing phases on ilr coordinates deduced from SBP following Ward's clustering ("c" after the ilr coordinate number). (A-D) ilr coordinates that showed significant differences between cultivars. T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives.
As observed, associating the effects on ilr coordinates with the FAs involved is not always straightforward in the current CoDA state of the art. Therefore, identifying the FAs responsible for the changes is sometimes complex, and planning processing modifications to prevent such changes may still not be possible. The problem is currently under study, but the solutions are not simple [39].
No classification of Manzanilla and Hojiblanca table olive fat is intended in this work, which is exclusively devoted to CoDA, since the "official" classification in the current legislation of olive oil is based on limits established in percentages [3]. However, this circumstance may be an opportunity for CoDA to be used to develop new standards defined according to the true structure of FA profiles.
Furthermore, many ratios between FAs are highly relevant in oil chemistry/biochemistry. As an example, the ratio C18:1c/C16:0 is considered of interest in nutrition because palmitic acid has a regulatory influence on thrombogenic and fibrinolytic markers during the postprandial state in healthy subjects [40]; however, the recommended value (<5) [41] is deduced from percentages. Since CoDA is based on relative concentrations, this approach may provide this and many other (log) ratios habitually found in the nutrition/olive oil literature in a natural way, with solid statistical support, and lead to thresholds that are easy to handle and interpret.
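The way such a ratio fits into the log-ratio framework can be sketched as follows. The fatty acid percentages are hypothetical and are not data from this study. The threshold of 5 is the value quoted above [41], expressed here on the logarithmic scale.

```python
import math

# Hypothetical fatty acid profile in percentages (illustrative values only).
profile = {"C18:1c": 72.0, "C16:0": 13.5, "C18:2n-6": 6.0, "C18:0": 3.0, "others": 5.5}

def log_ratio(composition, numerator, denominator):
    """Natural log of the ratio between two parts of a composition."""
    return math.log(composition[numerator] / composition[denominator])

# Log-ratio computed from the full profile.
lr_full = log_ratio(profile, "C18:1c", "C16:0")

# The same log-ratio after re-closing the two-part subcomposition to 100%.
sub_total = profile["C18:1c"] + profile["C16:0"]
sub = {k: 100.0 * profile[k] / sub_total for k in ("C18:1c", "C16:0")}
lr_sub = log_ratio(sub, "C18:1c", "C16:0")

# A threshold on the ratio becomes a threshold on the log-ratio.
below_threshold = lr_full < math.log(5.0)

print(lr_full, lr_sub, below_threshold)
```

The log-ratio does not change when the data are rescaled or when only a subcomposition is available, so a limit defined on it does not depend on which other components were quantified.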

Conclusions
This work has demonstrated that green Manzanilla and Hojiblanca table olive FA profiles can be considered as compositional data and, consequently, statistically analysed with CoDA tools. In line with this hypothesis, the application of the typical CoDA exploratory tools was helpful to study the effects of green Spanish-style processing and packaging on the FA composition of the mentioned cultivars. A simple geometric mean barplot displayed the changes in each FA relative to the overall mean and allowed observation of the effects of the different elaboration steps and cultivars. Furthermore, CoDA exploratory techniques such as the tetrahedral display, biplot, and CoDa dendrogram agreed with the barplot and also showed that the most relevant differences were between the FA profiles of the cultivars, while the processing effects were scarce.
The transformation of the original data in percentages into ilr coordinates allowed the application of standard multivariate techniques to the new data set. Canonical variate analysis confirmed the differences between cultivars and showed that Hojiblanca was more prone to FA modifications than Manzanilla. LDA (bootstrapping = 1000) was 100% successful in assigning cultivars, but the leave-one-out cross-validation was not so efficient, with the ilr coordinates from Ward's clustering leading to a greater level of successful LDA assignations than those from the decreasing order of variances (100 vs. 50% for MT1, MT3, or HT0). The MANOVA analysis found significant differences between cultivars and processing phases. The p-values differed between ilr balances (coordinates) of the same order because of the different bases; however, both showed that the first two ilr coordinates (based on the same FAs) were not significantly different between treatments (p-values: 1.663 (ilr1) and 0.609 (ilr2), decreasing order of variance; 0.176 (ilr1) and 0.742 (ilr2), based on Ward's clustering). That is, their high variances were due to large errors because of the low proportion of the FA in the numerator. The GLM studied the MANOVA differences in detail (nested model), detecting significant differences in ilr balances (coordinates) between cultivars or changes during processing. The FAs responsible for the changes were identified in the simpler cases (e.g., the greater value of ilr16 in Manzanilla than in Hojiblanca could be assigned to the higher values in Hojiblanca of C18:1c and C20:1, which are in the denominator of the formula), but in most cases were hardly identifiable because of the complex ilr structures.
Overall, according to the CoDA, table olive fatty acid profiles are different between cultivars and are scarcely affected by Spanish-style processing. Moreover, this work has demonstrated that fatty acid profiles, as compositional data, can be successfully studied by CoDA and that the same strategy can be applied to olive oils and fats in general, with the advantage of applying appropriate statistical techniques and preventing misinterpretations. This survey may represent a model for such statistical analyses.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods11244024/s1, Figure S1: Tetrahedral display of the processing phases, according to cultivars, as a function of the fatty acid showing the largest clr variances. M, Manzanilla; H, Hojiblanca; T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives; Figure S2: CoDa form biplot, which preserves distances between treatments, helpful for studying differences among groups. PC1 accounted for 45.70% variance and PC2 for 26.07% (together, 71.77%). M, Manzanilla; H, Hojiblanca; T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives; Figure S3: Balance dendrogram, based on the sequential binary partition of FAs according to the descending order of variance. M, Manzanilla; H, Hojiblanca; T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives. Only first relevant balances are enumerated; Figure S4: Segregation of processing phases, according to cultivars, by canonical variate plot of ilr coordinates. The coordinates were obtained by following the clr decreasing variance for the first 11 fatty acids and just the variable order for the remaining balances. The overall centres for Manzanilla and Hojiblanca are also plotted. M, Manzanilla; H, Hojiblanca; T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives; Figure S5: Processing phase effects, according to cultivar, on ilr coordinates. Similar changes were also observed for ilr8, ilr6, ilr11c, ilr9c, ilr8c, ilr5c, and ilr4c (with "c" indicating obtained following Ward's clustering), but with the values of Manzanilla always above those of Hojiblanca. T0, fresh olives; T1, lye-treated olives; T2, fermented olives; T3, packaged olives; Table S1: Fatty acid composition of the Manzanilla and Hojiblanca fat throughout the Spanish-style processing and packaging; Table S2: Sequential binary partition (SBP) matrix using the decreasing order of fatty acid clr variances (except last five balances, which just followed the variable order of the remaining acids); Table S3: Ilr coordinates obtained by SBP based on the decreasing order of the fatty acid clr variance (except last five balances, which just followed the variable order of the remaining acids); Table S4: SBP matrix, obtained following Ward's clustering sequence; Table S5: Ilr coordinates obtained by SBP, based on Ward's clustering sequence; Table S6: Means and variances of the balances obtained by SBP based on the decreasing clr variance and Ward's clustering sequence; Table S7: Proportion of successful assignation after application of Linear Discriminant Analysis, using bootstrapping (1000), to the ilr coordinates based on the clr variance decreasing order and Ward's clustering sequences.
In the variation array, whose elements are the variances $\mathrm{var}\left[\ln(x_i/x_j)\right]$, the diagonal terms are zeros (log-ratio of each component by itself) and the array is symmetric with respect to the diagonal. To prevent unnecessary repetition, the bottom half of the array is used to show the corresponding (average over treatments) log-ratios $\ln(x_i/x_j)$. The total variance of the components is summarized by: $\mathrm{totvar} = \frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D}\mathrm{var}\left[\ln\frac{x_i}{x_j}\right]$
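The array of log-ratio variances and the total variance described above can be computed as in the following minimal sketch. It assumes the usual definition of the total variance, namely the sum of all pairwise log-ratio variances divided by 2D. The sample values and function names are hypothetical.

```python
import math
from statistics import variance

def variation_array(samples):
    """Symmetric D x D array of var(ln(x_i / x_j)) over the samples (rows)."""
    D = len(samples[0])
    T = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in range(i + 1, D):
            logratios = [math.log(row[i] / row[j]) for row in samples]
            T[i][j] = T[j][i] = variance(logratios)
    return T

def total_variance(T):
    """Sum of all pairwise log-ratio variances divided by 2D."""
    D = len(T)
    return sum(sum(row) for row in T) / (2 * D)

def clr(x):
    """Centred log-ratio of one composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical three-part compositions (percentages), four samples.
samples = [
    [70.0, 20.0, 10.0],
    [65.0, 25.0, 10.0],
    [72.0, 18.0, 10.0],
    [68.0, 20.0, 12.0],
]
T = variation_array(samples)
totvar = total_variance(T)

# Cross-check: the total variance equals the sum of the clr variances.
clr_rows = [clr(row) for row in samples]
clr_total = sum(variance([row[j] for row in clr_rows]) for j in range(3))
print(totvar, clr_total)
```

The cross-check at the end reflects the fact that the total variance can also be obtained column-wise from the clr-transformed data.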

Log-ratio transformation of compositional data
The most common transformations are the centred log-ratio (clr) and the isometric log-ratio (ilr); the latter leads to orthonormal coordinates in the Euclidean space.
The clr is defined as: $\mathrm{clr}(\mathbf{x}) = \left[\ln\frac{x_1}{g_m(\mathbf{x})}, \ldots, \ln\frac{x_D}{g_m(\mathbf{x})}\right]$, where $g_m(\mathbf{x})$ is the geometric mean of the parts. It is estimated row-wise and, apart from the CoDa biplot, has diverse applications since it allows relating results directly to the original variables. The variance of each clr part (clr variance) is estimated column-wise by the standard method [6].
The ilr coordinates are obtained by the successive partition of the composition into two mutually exclusive groups (SBP). The process ends when the last group to be divided contains only two single parts. The number of coordinates is always $D - 1$, where $D$ is the number of parts. Their normalized log-ratios (balances or coordinates) are estimated, row-wise, by: $b_i = \sqrt{\frac{r_i s_i}{r_i + s_i}}\,\ln\frac{\left(\prod_{+} x_j\right)^{1/r_i}}{\left(\prod_{-} x_k\right)^{1/s_i}}$, where $r_i$ and $s_i$ are the numbers of parts in the numerator (+) and in the denominator (−) of the $i$-th partition. The parts in the numerator, denominator, or absent are coded by 1, −1, and 0, respectively, and are gathered in a matrix, which is required for back-transformation of the obtained coordinates into the original units [6].
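The clr transformation and the balances derived from an SBP matrix coded with 1, −1, and 0 can be sketched as follows. The four-part composition and the SBP below are hypothetical and do not correspond to Tables S2 or S4. The balance formula is the standard normalized log-ratio of geometric means, taken here as an assumption.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def balance(x, signs):
    """Normalized log-ratio for one SBP row coded 1 (numerator),
    -1 (denominator), or 0 (part not involved)."""
    num = [math.log(v) for v, s in zip(x, signs) if s == 1]
    den = [math.log(v) for v, s in zip(x, signs) if s == -1]
    r, s = len(num), len(den)
    return math.sqrt(r * s / (r + s)) * (sum(num) / r - sum(den) / s)

def ilr(x, sbp):
    """One balance per SBP row, giving D - 1 coordinates for D parts."""
    return [balance(x, row) for row in sbp]

# Hypothetical four-part composition (percentages) and SBP matrix.
x = [70.0, 15.0, 10.0, 5.0]
sbp = [
    [1, -1, -1, -1],  # part 1 against parts 2, 3, and 4
    [0, 1, -1, -1],   # part 2 against parts 3 and 4
    [0, 0, 1, -1],    # part 3 against part 4
]
coords = ilr(x, sbp)
print(coords)
```

Because the basis is orthonormal, the squared length of the ilr vector equals that of the clr vector, and the coordinates do not change if the composition is rescaled (for instance, from percentages to proportions).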
Readers are encouraged to consult specialised texts for more detailed information on CoDA techniques [6,7,17,33]. Nevertheless, most of the techniques are briefly commented on when introduced in the text. The standard multivariate techniques were always applied to the ilr-transformed data (log-ratios usually known as coordinates).