Methodology for the Automatic Inventory of Olive Groves at the Plot and Polygon Level

Abstract: The aim of this study was to develop and validate a methodology to carry out olive grove inventories based on open data sources and automatic photogrammetric and satellite image analysis techniques. To this end, tools and protocols were developed that automate the capture of images of different characteristics and origins, enable the use of open data sources, and integrate and annotate them with metadata. These data can then be used for the development and validation of algorithms that improve the characterization of olive grove surfaces at the plot and cadastral polygon scales. With the proposed system, an inventory of the Andalusian olive grove was carried out automatically at the level of cadastral polygons and provinces, accounting for a total of 1,519,438 hectares and 171,980,593 olive trees. These data were compared with various official statistical sources, which supported their reliability and even revealed inconsistencies or errors in some of those sources. Likewise, the capacity of Sentinel 2 satellite images to estimate the fraction canopy cover (FCC) at the cadastral polygon, parcel and 10 × 10 m pixel level was demonstrated and quantified, as well as the opportunity to carry out inventories with temporal resolutions of up to approximately 5 days.


Introduction
The olive tree is one of the most representative crops in the Mediterranean basin and is closely linked to the economy and culture of the region. Worldwide, the Mediterranean basin produces 99% of olive oil and consumes 87% [1].
Traditional olive groves have been and still are a very important component of the Mediterranean landscape. To a considerable extent we could classify them as forests that produce an important range of ecosystem services: healthy food, biodiversity, living soils, carbon sequestration, culture, employment, life in villages, etc. Particularly with regard to olive grove landscapes, the EU has shown great interest in their conservation through strategies such as the European Landscape Convention [2], developing a specific tool for the protection and management of olive groves.
The lack of economic sustainability is causing traditional olive groves to be at serious risk of survival, disappearing on many occasions. More recently, this crop has been changing from traditional rainfed olive groves with a low density of trees (less than 100 trees per hectare) to olive groves of medium and high density, mostly associated with the introduction of irrigation, which is also promoting the substitution of various crops (wheat, barley, sunflower, cotton) by high-density olive groves [3,4]. All this is causing major changes in the management of the olive groves, especially in their intensification, as well as in the economic, social and environmental impacts.
These changes have been strongly noted in one of the main olive-growing regions in the world, Andalusia. Figure 1 and Table 1 show the difference in olive grove areas between 2015 and 2018. In general, there has been an increase in high-density plantations to the detriment of low-density ones [5]. Studies such as [3] indicate that this is causing a high environmental impact, among which the problems derived from water needs stand out. However, there is also evidence that the intensification of deficit irrigation has improved carbon sequestration, as investigated by [6], through the modeling of the implications of climate variability and agricultural management on the productivity and environmental performance of olive crops in the Mediterranean. In addition, an increase in such intensification would increase irrigation needs. For this reason, it is becoming more and more necessary to be able to systematize the monitoring of olive tree density with detailed and permanently updated information in large areas. Studies such as [7,8] show the need to provide more detailed information on olive farming practices and to make and quantify proposals to increase specific sustainable practices at the farm level [9].
In this regard, there are numerous publications related to data-capture mechanisms in the field and the use of platforms for their management and visualization [10–15]. In a complementary way, important efforts are being made to create common data spaces in the agricultural field [16–22], which try to overcome the existing barriers regarding the global management of data. The main obstacles found are the following: the complexity of data management [23], the lack of interoperability [16,17,22], the insufficiency of storage units and processing platforms [24,25], as well as the scarcity of reference architectures [23,26–30]. Overcoming these limitations would make it possible to take full advantage of the potential of data analysis and management, strengthening the capabilities of decision-making support systems.
Further, the characterization and monitoring of large areas of crops is becoming a key factor to improving and supporting decision-making. The combination of remote data with ground measurements, obtained from interpretations of high spatial resolution aerial photogrammetry through image analysis, significantly improves the ability to study land processes. In this line, different studies are being promoted, as is the case of [31] where a methodology for landscape sampling, mapping and characterization of a complex agroforestry system in sub-Saharan Africa is provided.
Therefore, the use of remote sensing has an increasingly important role in the continuous, effective, precise and complete monitoring of large areas, being key in the decision-making processes of agroforestry management [32,33]. It allows crop mapping to be carried out at a low cost and with high frequency, which makes it possible to extend these studies [6–8] to large areas.
For this reason, an aspect of great importance is the possibility of exploiting the synergies between automated procedures to identify and characterize the different ecological units in olive groves from very high resolution images, such as orthophotos with a spatial resolution of 0.5 m or finer. The analyses carried out with satellite images of lower spatial resolution could substantially improve their usefulness and accuracy if models based on spectral mixtures were developed using previous segmentations carried out with image analysis techniques. These studies would make it possible to complement the temporal resolution of the systems for obtaining digital aerial orthophotography, such as the National Plan for Aerial Orthophotography (PNOA) [34], whose update period of three years makes it impossible to update inventories more frequently.
Regarding the treatment and processing of satellite images, there are platforms such as Google Earth Engine (GEE), which provide easy access to a wide catalogue of images, including those captured by the Sentinel 2 MSI (MultiSpectral Instrument) satellites, and allow for the extraction of relevant vegetation indices in a simple way [35]. Vegetation indices are capable of monitoring crop growth with high-resolution satellite images [36]. Among these indices, the NDVI (Normalized Difference Vegetation Index) has been identified as a good tool to indicate significant changes in land use and cover [37,38]. NDVI shows better results than other indices such as the Soil-Adjusted Vegetation Index (SAVI) [38] and is one of the most widespread, due to its simplicity and availability [39,40]. In addition, according to [41], the analysis of the NDVI to estimate crop surface and to qualitatively evaluate crops under water stress can lead to an optimization of irrigation management systems.
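As a brief illustration of the index discussed above, the NDVI of a pixel is the normalized difference between its near-infrared and red reflectance, NDVI = (NIR − Red) / (NIR + Red). The following is a minimal, dependency-free sketch and not the code used in the study; the function names are hypothetical, the use of Sentinel 2 bands B8 (NIR) and B4 (red) is an assumption not stated in the text, and returning 0 for a pixel with zero total reflectance is an arbitrary choice.

```python
def ndvi(nir, red):
    """Return the NDVI of one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        # Assumption: an undefined NDVI (no signal in either band) is reported as 0.
        return 0.0
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average the per-pixel NDVI over the pixels of a plot or polygon."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)


# Hypothetical reflectance values for three pixels (e.g., B8 and B4).
nir_pixels = [0.45, 0.50, 0.30]
red_pixels = [0.10, 0.08, 0.20]
print(mean_ndvi(nir_pixels, red_pixels))
```

Values close to 1 indicate dense green vegetation, whereas bare soil typically yields values close to 0, which is why the mean NDVI of a plot can be expected to vary with its canopy cover.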
Thus, it is very important to be able to automate the inclusion of new data, coming from: (i) high-resolution aerial photogrammetry image analysis, such as the crown area, Fraction Canopy Cover (FCC), tree density or the identification of different typologies; (ii) analysis of the NDVI at the pixel and sub-pixel level; (iii) existing open data sources such as the Geographic Information System of Agricultural Parcels SIGPAC [42], the Andalusian Phytosanitary Information and Alert Network RAIF [43] and the Integrated Treatments in Andalusia in Agriculture TRIANA [44]. Furthermore, it is necessary to enable the creation of common data spaces that make it possible to value the tools for the conservation of the olive grove and the detection of changes in its management.
In addition, the processing of measurements from high-resolution aerial frames is very useful for improving the interpretation of satellite images, since they provide field data from which it is possible to calibrate the models. Recent studies have focused on the development of tools to generate automated agroforestry inventories for the analysis of large areas [31]. These allow us to calibrate and optimize the analyses carried out with lower resolution satellite images, which are needed in order to complement the studies with spectral mixture analysis techniques of automatic pixel analysis, to improve feature extraction at the olive tree level. Nevertheless, further work in this direction is still needed to optimize the results. In this study, the FCC-NDVI relationship has been evaluated.
We can summarize that it is necessary to obtain detailed information on large olive grove areas, having the plot as the sampling unit and, where appropriate, scaling it to larger territories, which will make it possible to carry out studies at the farm level and characterize the ecosystem services of the olive grove crop providing tools to help decision making.
In this regard, the general objective of the work was to develop and validate a methodology to carry out olive grove inventories based on automatic analysis techniques of photogrammetric images of PNOA, satellite images of the Sentinel constellation and open data sources.

Study Area
The study has been carried out in 11,488 olive grove polygons, which occupy a total of 1,519,438 ha (shown in red in Figure 2) and represent 92% of the total olive grove cultivation in Andalusia. The area was divided into eight zones that correspond to the eight provinces of Andalusia, thus covering the most widespread varieties such as Picual, Hojiblanca, Manzanilla Verdial, Lechín, Empeltre, Blanqueta, Farga and Arbequina, as well as different plantation frameworks.

Data Set
The set of high-resolution images used to characterize the olive grove comes from aerial orthophotographs of the PNOA [34] obtained by photogrammetric flights with a high-resolution digital camera. This image dataset has a spectral resolution of 3 bands (blue, green, and red), a radiometric resolution of 12 bits per band, and a spatial resolution of 50 cm.
To obtain the NDVI, images from Sentinel 2 [45] have been used, which provide multispectral data with 13 bands in the visible, near-infrared and short-wave infrared part of the spectrum. They have a spatial resolution of 10 m and a temporal resolution of approximately 5 days.
The contrast and validation of the data was carried out through manual counts and information from the RAIF of olive grove plots [43].
Geospatial and crop data were obtained from SIGPAC [42] and varietal data from TRIANA [44].
The different data were taken for the time interval between 1 January 2019 and 31 December 2019. This interval was selected based on the dates of the most recent PNOA photos at the time of the study, which dated from 2019.

Programming Languages
The programming languages and the specific processes for which they have been used are as follows:
• MATLAB 2021a (9.10) with two libraries, the Image Analysis Processing toolbox [46] and openearthtools [47], was used for:
i. High resolution digital image processing.
• Python 3.6.13 was used for:
i. Obtaining NDVI data through the Google Earth Engine platform [35].
ii. The creation of linear regression models.
iii. The calibration of the results.
• Python 3.8.5 was used for:
i. Automatic acquisition of PNOA images.
ii. The identification of the area of interest through shapefile files of the geographic information system of agricultural parcels (SIGPAC).
iii. Development of APIs for the integration of different open data sources (SIGPAC, RAIF, TRIANA).
iv. Data cleaning and preprocessing.

Procedure
The first step was to adapt and validate the tool developed in the study [48] for the case of olive groves. Subsequently, the olive grove inventory was created from the data obtained in the image processing with the validated tool. Then, the integration of metadata from different open data sources such as SIGPAC, RAIF and TRIANA was carried out. Finally, with the information generated in the inventory, the FCC-NDVI relationship was evaluated at the polygon, plot, and pixel levels.

Validation of the Tool for Olive Groves
In order to validate the tool, the following steps were carried out: (i) manual counts of 33 olive grove plots chosen at random from the set of olive groves in Andalusia (Figure 3); (ii) checks by an observer on the FCC mask obtained by the tool to validate the FCC (Figure 4).

Figure 5 shows a summary of the methodology followed to prepare the inventory. Each of the modules is detailed below.

Open data sources
The data sources consulted to extract olive grove information were: the Andalusian Phytosanitary Alert Network (RAIF) [43] for phytosanitary information and the Geographic Information System for Agricultural Parcels (SIGPAC) [42,49] and the TRIANA program [44] for geographic and crop information.
RAIF is a set of open data obtained from the monitoring of pests and diseases in the biological control stations, in addition to generic crop information. The data are displayed in Excel files by crop type from 2006 to 2021. The information is updated every week. The main parameters selected for this study were crop type, cultivated area, tree density and crown diameter. These parameters have been used for the automatic validation of the methodology.
TRIANA is a computer program for crop management that provides crop information in addition to phytosanitary information. The information provided by this data source is the following: crop type, cultivated area, irrigation, nearest climatic station, planting frame, main variety and its planting date, and secondary variety and its planting date. This information has been contrasted and included in the inventory.
The SIGPAC allows the geographic identification of parcels declared by farmers and ranchers. It is accessible through WMS services [49]. The land is sectorized, and the smallest unit treated in this study is the plot (PROV;NUM;POL;PLOT). By selecting such a sector, its relevant information is obtained, such as geospatial information, land use, cultivated area, irrigated area and soil slope. The geospatial information provided has been used to automate the download of high resolution images, and land use to identify the area of interest (AOI).
For the different data sources, aspects such as complexity, access limitations, automation capacity, temporal and spatial frequencies, as well as the available time range were analyzed. After that, data acquisition was automated, carrying out temporally and geographically limited tests with the aim of evaluating the availability of and access to the data. Then, different analysis, preprocessing and cleaning techniques were applied: erroneous data were eliminated through Random Forest techniques [50] and missing values were imputed through the imputeTS and missForest libraries [51]. Lastly, to check their consistency, the data from the different sources were compared and non-coherent data were eliminated, thus generating a unified access point for reliable and contrasted agronomic data.

Automatic image acquisition
This module automates the downloading, identification and delimitation of study areas from PNOA images for different ecosystems or crops and zones, obtaining a high-resolution image of each polygon/plot and a shapefile with geographic and crop information.
The image download procedure was as follows:
i. Geospatial and crop information of the area of interest was obtained from the SIGPAC shapefiles.
ii. The database stores geospatial and crop information at the polygon and/or plot level.
iii. The PNOA Downloader script takes the geospatial information from the database and calls the IGN WMS services to download the orthophotography.
iv. The PNOA Downloader script generates two outputs: the orthophoto of the polygon/plot with the AOI of the selected crop (olive grove for this particular study) and a shapefile with metadata.
v. This processed information is saved in the olive grove data inventory for later interpretation by the tool for precise olive grove characterization.
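The request made in step iii can be sketched as follows. This is only an illustration under stated assumptions, not the PNOA Downloader script itself: the endpoint URL, the layer name, the image format and the EPSG:25830 reference system are assumptions that should be checked against the capabilities document of the IGN service, and `build_getmap_url` is a hypothetical helper. Only the standard WMS 1.3.0 GetMap parameters are used, and the image size is derived from the bounding box so that one pixel corresponds to 0.5 m.

```python
from urllib.parse import urlencode

# Assumed endpoint and layer name; verify both against the service capabilities.
PNOA_WMS_URL = "https://www.ign.es/wms-inspire/pnoa-ma"
PNOA_LAYER = "OI.OrthoimageCoverage"


def build_getmap_url(bbox, resolution_m=0.5, crs="EPSG:25830"):
    """Build a WMS 1.3.0 GetMap URL for a bounding box given in projected metres.

    bbox is (min_x, min_y, max_x, max_y); the image size is chosen so that
    each pixel covers resolution_m metres on the ground.
    """
    min_x, min_y, max_x, max_y = bbox
    width = round((max_x - min_x) / resolution_m)
    height = round((max_y - min_y) / resolution_m)
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": PNOA_LAYER,
        "STYLES": "",
        "CRS": crs,
        "BBOX": f"{min_x},{min_y},{max_x},{max_y}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",  # assumed to be offered by the service
    }
    return f"{PNOA_WMS_URL}?{urlencode(params)}"


# Hypothetical bounding box of a plot, 100 m wide and 50 m high.
print(build_getmap_url((350000.0, 4180000.0, 350100.0, 4180050.0)))
```

In practice the bounding box would come from the SIGPAC geometry stored in the database, and large polygons would probably need to be requested in tiles, since WMS servers usually limit the maximum image size.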

Identification of elements of interest
Image analysis techniques were used for preprocessing [52,53], segmentation [54,55] and classification [56]. In addition, specific developments carried out in previous studies [48] were used to identify the elements of interest in the high-resolution images obtained in the automatic image acquisition module. Based on the regularity present in the olive grove, false positives were considered to be those elements whose area or eccentricity was much higher than the average of the detected objects, which allowed for the elimination of elements of other species. As a result, a .TIFF image with the identified elements and a shapefile with metadata were obtained per polygon, thus providing automated information on the number of trees, FCC, tree density and tree canopy area. These data obtained from the interpretation of the images were stored in the database to form part of the olive grove data inventory.
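The false-positive rule and the derived inventory variables can be illustrated with the sketch below. It is not the tool described in [48]: the text only states that objects with an area or eccentricity "much higher than the average" are discarded, so the threshold of the mean plus `k` standard deviations, the value `k = 2`, the dictionary keys and both function names are assumptions made for illustration. FCC is expressed here as a fraction between 0 and 1.

```python
from statistics import mean, stdev


def filter_false_positives(objects, k=2.0):
    """Drop objects whose area or eccentricity is far above the average.

    objects is a list of dicts with the keys 'area_m2' and 'eccentricity'.
    Assumption: "far above" means more than k standard deviations over the mean.
    """
    areas = [o["area_m2"] for o in objects]
    eccentricities = [o["eccentricity"] for o in objects]
    area_limit = mean(areas) + k * stdev(areas)
    ecc_limit = mean(eccentricities) + k * stdev(eccentricities)
    return [
        o for o in objects
        if o["area_m2"] <= area_limit and o["eccentricity"] <= ecc_limit
    ]


def summarize(objects, surface_ha):
    """Compute the inventory variables of a plot or polygon from its crowns."""
    canopy_m2 = sum(o["area_m2"] for o in objects)
    return {
        "n_trees": len(objects),
        "fcc": canopy_m2 / (surface_ha * 10_000),  # canopy area / total area
        "density_trees_ha": len(objects) / surface_ha,
        "mean_crown_m2": canopy_m2 / len(objects),
    }


# Hypothetical detections: ten regular crowns and one large, non-olive object.
detected = [{"area_m2": 12.0, "eccentricity": 0.2 if i % 2 else 0.4} for i in range(10)]
detected.append({"area_m2": 200.0, "eccentricity": 0.3})
trees = filter_false_positives(detected)
print(summarize(trees, surface_ha=0.1))
```

A threshold relative to the statistics of the detected objects adapts to each plantation frame, which is consistent with the regularity argument given in the text, although the exact criterion used by the tool may differ.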

FCC-NDVI Evaluation
The evaluation of the FCC-NDVI relationship was carried out taking the polygon as the sampling unit and the whole of Andalusia as the geographical area, and also with a second approach taking the pixel as the sampling unit and a single plot as the geographical area.
The steps followed are detailed below:
i. Polygon-level study. The polygon was taken as the sampling unit and the different months of the year 2019 and provinces were evaluated. For this calculation, the mean values of the NDVI of the polygons were taken for each month and the FCC obtained in the characterization of olive groves through orthophotos of the PNOA of 2019. A total of 10,031 polygons were used.
ii. Plot-level study. The plot was taken as the sampling unit. The evaluation was carried out with a subset of plots from one of the provinces. In the same way as for the polygon, the mean values of the NDVI of the plots for the summer months and the FCC obtained in the characterization of the olive grove through orthophotos of the PNOA of 2019 were taken. A total of 287 plots were used.
iii. Pixel-level study. Finally, a pixel-level study was carried out, selecting a plot (PROV_14, NUM_900, POL_18, PLOT_14) with diversity in terms of plantation frames and a Sentinel image corresponding to the month of August 2019. This evaluation was carried out by comparing the mean NDVI of each pixel of the Sentinel image with the pixels of a 10 × 10 m resolution image generated from the FCC obtained in the characterization of olive groves. An image with a size of 95 × 57 pixels was used (see Figure 6).
In all the procedures, it has been considered that the FCC does not present significant variability in one year. Figure 7 shows the methodology followed to develop the FCC estimation model from the NDVI.
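For the pixel-level comparison, the canopy mask obtained from the 0.5 m orthophoto has to be brought to the 10 × 10 m grid of the Sentinel image. One way to do this, shown here only as a sketch, is to average the binary mask over blocks of 20 × 20 orthophoto pixels, so that each coarse cell holds the fraction of its surface covered by canopy. The function name is hypothetical, the mask is a plain list of lists to keep the example dependency-free, and the sketch assumes that the mask is already aligned with the Sentinel grid; incomplete blocks at the borders are simply discarded.

```python
def fcc_grid(mask, block=20):
    """Aggregate a binary canopy mask (1 = canopy) into coarse FCC cells.

    With 0.5 m pixels, block=20 yields cells of 10 x 10 m. Each output value
    is the fraction of canopy pixels inside the corresponding block.
    """
    rows, cols = len(mask), len(mask[0])
    grid = []
    for r0 in range(0, rows - block + 1, block):
        grid_row = []
        for c0 in range(0, cols - block + 1, block):
            covered = sum(
                mask[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            grid_row.append(covered / (block * block))
        grid.append(grid_row)
    return grid


# Hypothetical mask of 20 x 40 pixels (10 m x 20 m): canopy fills the upper
# half of the left cell, and the right cell is bare soil.
mask = [[1 if (r < 10 and c < 20) else 0 for c in range(40)] for r in range(20)]
print(fcc_grid(mask))
```

Each cell of the resulting grid can then be paired with the NDVI of the Sentinel pixel covering the same ground area.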

Olive grove data extraction
From the olive grove data inventory, the shapefile (shp) files were obtained with the necessary geographic information to obtain the NDVI by remote sensing and the FCC of the polygons and plots for the validation data.

Remote sensing data
A Python script was developed that obtained the NDVI mean time series for each plot and polygon from the shapefile obtained in the characterization of the olive grove, as well as the images necessary for the evaluation at the pixel level.

Validation data
The data considered as real for the calibration and validation of the model were extracted from the FCC obtained in the olive grove characterization.

Data analysis
The data collected were grouped by province and month, and two simple linear regression models were evaluated to model the relationship between FCC and NDVI, following the scikit-learn and statsmodels approaches (Equation (1)):
$$\mathrm{FCC} = \beta_0 + \beta_1 \cdot \mathrm{NDVI} \quad (1)$$
where $\beta_0$ is the intercept and $\beta_1$ is the slope of the fitted line.
To evaluate the model, the data were divided into two groups: a training group for calibration, in which 80% of the data were used, and a test group, with the remaining data, to evaluate the predictive capacity of the model. The goodness of fit of the model was evaluated using the coefficient of determination (R-squared), the p-value statistics were determined (using the F-test), and the performance of the model was evaluated using the root mean squared error (RMSE). After generating the model, the confidence interval was used to measure the uncertainty associated with the prediction.

Table 2 shows the results of the estimation and real value of the number of trees and FCC, as well as the relative error of the 33 plots where the manual counts were made.
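The calibration and evaluation steps described in the data analysis can be sketched as follows. This is a dependency-free illustration of ordinary least squares with an 80/20 split, not the scikit-learn or statsmodels code used in the study; the function names, the random seed and the synthetic data are assumptions, and the F-test p-value and the confidence interval mentioned in the text are omitted for brevity.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate(x, y, intercept, slope):
    """Return (R-squared, RMSE) of the fitted line on the given data."""
    predicted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


def calibrate(ndvi, fcc, train_fraction=0.8, seed=0):
    """Fit FCC = b0 + b1 * NDVI on a random training subset and test on the rest."""
    indices = list(range(len(ndvi)))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]
    intercept, slope = fit_line([ndvi[i] for i in train], [fcc[i] for i in train])
    r2, rmse = evaluate([ndvi[i] for i in test], [fcc[i] for i in test], intercept, slope)
    return {"intercept": intercept, "slope": slope, "r2": r2, "rmse": rmse}


# Synthetic, noise-free example; real data would be the mean NDVI and the FCC
# of each polygon or plot for a given province and month.
ndvi_values = [0.2 + 0.01 * i for i in range(50)]
fcc_values = [0.05 + 0.9 * v for v in ndvi_values]
print(calibrate(ndvi_values, fcc_values))
```

Evaluating R-squared and RMSE on the held-out 20% rather than on the training data gives an estimate of how well the fitted line predicts the FCC of polygons that were not used for calibration.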

Validation of the Tool for Olive Groves
As an example, Figure 8 shows

Creation of Inventories of Olive Groves
After the validation of the developed automatic analysis tool, the olive grove area of Andalusia as a whole was characterized. Table 3 details the entire surface characterized by the olive grove tool. The columns Processed Surface, FCC and number of trees contain the data calculated by the automatic analysis tool. The Total Surface column corresponds to the total hectares of olive groves by province, according to the olive grove production capacity data collected by the Andalusian Government for the 2021-2022 campaign.

Individualized information was obtained from the units of interest at the polygon and plot level. These processed data form part of the olive grove inventory, which served as support for advanced analytical applications.
Considering the polygon as the study unit allows for a broader view of the olive grove landscape and minimizes edge-processing errors from image analysis, in addition to eliminating the double counting of trees located on plot edges. Another important aspect is that the analysis at the polygon level allows for a perception of the landscape as a whole, identifying areas with trees and their continuity between plots, as well as other units larger than the plot and of interest for decision-making on a larger scale. The plots can then be segregated from the polygons to obtain more precise information on the plantation framework and the estimation of the crown area. Figure 9 shows an example of a processed polygon, where different plantation frames and treeless areas can be seen.
In the process of integrating and contrasting the information generated by the tool with the different data sources consulted (RAIF, SIGPAC and TRIANA), some inconsistent data were found regarding the cultivated area of olive groves reported by the RAIF and the SIGPAC. Table 4 shows some records where there is no correspondence between the cultivated area reported by the RAIF and the cadastral references (SIGPAC). To avoid errors and inconsistency, only the RAIF data that were consistent between the two sources were considered. This demonstrated the usefulness of the tool for analyzing and cross-checking different data sources.
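The consistency filter described above can be pictured with a short sketch. It keeps only the cadastral references whose cultivated area agrees in both sources; the dictionary layout, the reference identifiers and the tolerance are assumptions made for the example, since the matching criterion actually applied is not detailed here.

```python
def consistent_records(raif_area, sigpac_area, tolerance_ha=0.01):
    """Keep references whose cultivated area agrees in both sources.

    Both arguments map a cadastral reference to an area in hectares.
    The tolerance is a hypothetical parameter, not a value from the study.
    """
    kept = {}
    for reference, area in raif_area.items():
        other = sigpac_area.get(reference)
        if other is not None and abs(area - other) <= tolerance_ha:
            kept[reference] = area
    return kept


# Made-up records: one match, one mismatch, one missing in SIGPAC.
raif = {"ref-001": 12.50, "ref-002": 3.20, "ref-003": 7.00}
sigpac = {"ref-001": 12.50, "ref-002": 4.75}

kept = consistent_records(raif, sigpac)
rejected = sorted(set(raif) - set(kept))
print("kept:", kept)
print("rejected:", rejected)
```

Listing the rejected references alongside the kept ones is what allows records such as those in Table 4 to be reported back as potential errors in the sources.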
Figure 9. Example of processing a polygon (14,55,17) belonging to Córdoba: (a) Control image generated by the tool; (b) Binary mask with identified trees.

After contrasting and integrating the different sources of information, the data included in the olive grove inventory were: (i) crown area, number of trees, FCC, cultivated hectares and tree density, from the automatic analysis tool; (ii) plantation frame, crown diameter and tree density (no. trees/ha cultivated), from the RAIF; (iii) type of crop, cultivated area, irrigation and geographical delimitation, from the SIGPAC; (iv) crop type, cultivated area, irrigation, nearest climatic station, planting frame, main variety, planting date, secondary variety and planting date, from TRIANA.
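To make the link between a binary mask such as the one in Figure 9b and the inventory metrics more concrete, the sketch below derives an FCC value and a tree count from a small mask. It is only an illustration under simplifying assumptions: the mask is a list of rows of 0/1 values, trees are counted as 4-connected groups of crown pixels (so touching crowns would be merged), and the function name is hypothetical. It is not the algorithm of the analysis tool.

```python
def mask_metrics(mask):
    """Return (fcc, tree_count) for a binary mask given as rows of 0/1.

    FCC is the fraction of pixels flagged as crown; trees are counted as
    4-connected groups of crown pixels.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    crown_pixels = 0
    trees = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # New, unvisited crown pixel: flood-fill its whole component.
            trees += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                crown_pixels += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return crown_pixels / (rows * cols), trees


# Made-up 4 x 5 mask containing three separate crowns.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
fcc, tree_count = mask_metrics(mask)
print(f"FCC = {fcc:.2f}, trees = {tree_count}")
```

Multiplying the crown pixel count by the ground area of one pixel would give the crown area, and dividing the tree count by the cultivated hectares would give the tree density listed in item (i).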
The evaluation of 1,519,438 ha of olive groves and the inclusion of metadata from open data sources have allowed inventories to be created automatically, which will facilitate evolutionary analyses and establishes the data structure for a more in-depth characterization that identifies peculiarities automatically. In addition, it allows general metrics to be obtained, such as those shown in Figures 10 and 11. Specifically, Figure 10 shows the metrics of the ecological units obtained from the olive grove tool, such as the FCC and the number of trees, and contrasts them with other data sources consulted, such as the hectares of irrigated versus dry land in the processed area, obtained from the SIGPAC, which adds value to the data. Figure 11 shows the estimate of the area by province, with data calculated by the olive grove tool.

Additionally, to identify critical points in the robustness of the tool, the calculated tree density and crown diameter were compared with one of the most complete sources on the characteristics of olive groves at the plot level (RAIF). To do so, the 200 plots that presented non-null values of these parameters for the year 2019 were selected. Table 5 shows the comparative analysis for a random set of 35 RAIF plots in Jaén.

Table 6 shows the results of the simple linear regression analysis between the NDVI and FCC values at the cadastral polygon level, aggregated at the province level and for the months of the year with the best results.

Table 6. R-squared, rmse, month, and best-fit simple regression model for each observation area (* p-values: <0.05; Models: S-L: Scikit-Learn; S-M: Stats-Models).

Figure 12 shows the best R-squared and rmse results and the month for which they were obtained.

Table 7 shows the results of the simple linear regression analysis of the different models evaluated. The results show that the model that obtained the best approximation was the simple Stats-Models model. In both cases, the p-values and the F-Test were significant.

Table 7. R-squared, rmse, month and best fit model for simple regression (* p-values: <0.05).

Table 8 shows the results of the simple linear regression analysis of the different models evaluated. The results show that the model that obtained the best approximation was the Scikit-Learn model, for the month of August. In both cases, the p-values and the F-Test were significant.

Discussion
The objective of this work was to develop a system for creating inventories of olive groves at different scales by integrating open data sources with metrics calculated automatically through image analysis. As a result, the characterization of 1,519,438 ha of olive groves (92% of the olive grove area of Andalusia) was obtained. This study is in line with [8], where a systematic analysis of the effects on the typology of the olive grove in the countryside of Córdoba was carried out, and with the strategies of the European Landscape Convention [2]. In addition, it provides specific information at the polygon and plot level, which makes it possible to evaluate specific practices at the farm level, a need detected in previous studies [7,9].
Our proposal has achieved unified and operational access to the different data sources, allowing their publication and consumption through intuitive interfaces and addressing the lack of interoperability indicated in [16,17,22]. To achieve this objective, configurable algorithms have been developed that extract key agronomic information for different attributes, including: (i) crop and phytosanitary information; (ii) access to PNOA high-resolution aerial photogrammetry; (iii) access to images for remote sensing; (iv) time series of the main vegetation indices. All these developments have great potential to be used for other purposes and crops.
Furthermore, the analysis and integration of the different data sources has allowed them to be evaluated and compared. With this, it has been possible to identify some inconsistent data between the different sources studied (see Table 4). Additionally, as can be seen in Table 5, in some cases there are considerable differences between the data provided by the RAIF and the estimated data, with relative errors of up to 37%. A detailed analysis of these discrepancies has shown that these deviations usually occur in super-intensive olive groves, where the estimation of the area with the methods used loses precision. For this reason, the use of different methodologies for calculating areas based on the plantation framework is proposed for future work. Other discrepancies could be partially explained by the fact that the data provided by the RAIF may to some extent be the result of rounding; at least the most important discrepancies with the estimated data would therefore merit a comparison with real data. In the same sense, the automatic inclusion of this type of measurement would improve the confidence and precision of the data collected in the RAIF.
Another point to highlight is the interpretation of data models through image analysis and remote sensing, which is key for the effective and continuous monitoring of large areas [32,33]. The results of our study indicate that the NDVI calculated from Sentinel-2 MSI images, particularly in the summer season, is closely related to the FCC in all provinces. During this period, the NDVI signal is not influenced by vegetation cover between trees [57,58]. For the same reason, the prediction errors were greater in the remaining seasons (winter, spring and autumn), since the signal is then influenced by the vegetation growing in the lanes between tree rows, which would also allow this herbaceous stratum to be characterized by subtraction.
Our proposal has achieved precise approximations in the different provinces (R-squared between 0.43 and 0.815), similar to those presented in other investigations for other crops [59]. At the plot level, an R-squared of 0.79 was reached. Beyond estimating the FCC, the plot-level models make it possible to identify the plantation framework and to approximate the crown area more precisely, so that more detailed cartographic information can be included in data sources, including existing ones such as RAIF, SIGPAC or TRIANA.
The studies carried out at the pixel level, where R-squared results of 0.655 were achieved, have allowed us to delve into the calculation of spectral mixtures within pixels. Despite the fact that the results were worse than those achieved at the plot level, it is still an attractive line of work to interpret the satellite images at the pixel level and their distribution in the territory.
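One common simplification of the mixture idea mentioned above is a two-component linear model, in which the NDVI of a pixel is treated as a weighted average of a pure canopy value and a pure soil value, with the canopy fraction as the weight. The sketch below inverts that relation. It is offered only as an illustration: the endmember values and the function name are hypothetical, and this is not presented as the mixture calculation performed in this study.

```python
def canopy_fraction(pixel_ndvi, soil_ndvi, canopy_ndvi):
    """Canopy fraction of a pixel under a two-endmember linear mixture.

    Assumes pixel = f * canopy + (1 - f) * soil and solves for f,
    clipping the result to the physically meaningful range [0, 1].
    """
    if canopy_ndvi <= soil_ndvi:
        raise ValueError("canopy endmember must exceed soil endmember")
    fraction = (pixel_ndvi - soil_ndvi) / (canopy_ndvi - soil_ndvi)
    return min(1.0, max(0.0, fraction))


# Hypothetical endmembers: bare soil 0.15, closed olive canopy 0.75.
SOIL, CANOPY = 0.15, 0.75
fractions = [canopy_fraction(v, SOIL, CANOPY) for v in (0.10, 0.45, 0.90)]
print([round(f, 2) for f in fractions])
```

Under such a model, herbaceous cover in the lanes acts as a third component that raises the pixel value, which is consistent with the better results reported for summer imagery.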
Regarding high-resolution image analysis, the tool developed in [48] was used, parameterized for olive cultivation and for the different geographical areas of Andalusia, which has shown that this tool can be extrapolated to other ecosystems and study areas. In this sense, the shapefiles generated with detailed geographic and agronomic information are a valuable contribution to the inventory of olive groves and allow further study of topics such as sub-pixel classification and the estimation of mixtures, with the aim of accurately classifying and identifying the elements of the olive groves. This raises an interest in the multispectral images provided by remote sensing, which is proposed for future studies.

Conclusions
The tools and protocols developed make it possible to automate the capture of images of different characteristics and origins, to draw on different open data sources, and to integrate these data and annotate them with metadata, so that they can later be used for the development and validation of algorithms that improve the characterization of olive grove surfaces at the plot and cadastral polygon scales.
The proposed system allows for identifying, locating, counting and measuring the fraction of canopy cover (FCC) of olive trees in different locations, plantation frameworks, varieties and tree cultivation techniques. It is robust and useful for carrying out automated inventories of olive groves and incorporating them into decision support strategies.
An inventory of the Andalusian olive grove has been automatically carried out at the level of cadastral polygons and provinces, accounting for a total of 1,519,438 hectares and 171,980,593 olive trees. These data have been contrasted with various official statistical sources, which supports the reliability of this study and has even made it possible to identify inconsistencies or errors in some sources.
Obtaining singular information at the tree level opens up a great opportunity to systematize the measurement of the impact of various farming practices, the measurement of ecosystem services, the control of compliance with regulations and the granting of public aid.
The ability of Sentinel 2 satellite images to estimate the FCC at the cadastral polygon, plot and 10 × 10 m pixel levels, as well as to perform inventories with temporal resolutions of approximately up to 5 days, has been demonstrated and quantified.
The combination of object-oriented automatic image recognition techniques with automatic pixel analysis techniques has allowed us to explore the opportunity of mixture analysis to improve the estimation of olive trees and their characteristics, although further work is still needed to optimize the results.