Monitoring System for the Management of the Common Agricultural Policy Using Machine Learning and Remote Sensing

Abstract: The European Commission promotes new technologies and the data generated by the Copernicus Programme. These technologies are intended to improve the management of Common Agricultural Policy aid, implementing new monitoring controls that replace on-the-spot checks and can be applied continuously to up to 100% of applications throughout an agricultural year. This paper presents a generic methodology developed for implementing monitoring controls. To achieve this, the dataset provided by the Sentinel-2 time series is transformed into information by combining machine learning classifications using random forest with remote sensing-based biophysical indices. This work focuses on monitoring the aid line associated with rice cultivation, using 13 Sentinel-2 images whose grouping and characteristics change depending on the event or landmark being sought. Moreover, the functionality to check, before harvesting the crop, that the area declared is equal to the area cultivated is added. The 2020 results are around 96% for most of the metrics analysed, demonstrating the potential of Sentinel-2 for controlling subsidies, particularly for rice. After the quality assessment, the hit rate is 98%. The methodology is transformed into a tool for regular use to improve decision making by determining which declarants comply with the crop-specific aid obligations, contributing to optimising the administrations' resources and a fairer distribution of funds.


Introduction
The main objective of the new European Common Agricultural Policy (CAP) (EU Regulation No. 1307/2013) is the development of favourable and sustainable actions to promote efficient agricultural practices. To this end, they must ensure food security, provide raw materials to the agri-food industries, and generate benefits for a sustainable and competitive agricultural sector [1]. To achieve these objectives, the European Commission supports the implementation and use of new technologies to simplify the implementation and management of CAP support. Furthermore, with the revision made by the Commission Implementing Regulation (EU Regulation 2018/746), remote sensing-based methodologies have become an effective mechanism for agricultural monitoring and management. In this respect, monitoring controls play an important role as the primary control methodology to be applied during the period of the new CAP reform to area-based interventions, replacing classical on-the-spot checks. Monitoring controls are a procedure of periodic and systematic observation of the earth, based mainly on satellite images. These checks are preventive, and their objective is the regular and continuous verification of the compatibility between the agricultural activity declared by the farmer and that observed in the time series of satellite images [2]. This information is processed automatically and periodically throughout the agricultural campaign, under the premise of monitoring compliance with the obligations undertaken by the declarants on the aid line associated with rice. The application of this work contributes to automating and improving the management of aid, bringing about a change of approach in the management of CAP aid, with automated mass control and almost no field visits or measurements.
In summary, it was possible to design a methodology that combines the results obtained by three intermediate products, i.e., biophysical indices, RF classification for a specific date, and RF classification of a time series, to determine whether the obligations of the rice aid were being met with up to 97% effectiveness. The time-series RF classification carried the most weight in the final performance. However, it was clear that the combination of all intermediate products was instrumental in minimising errors, resulting in a better and fairer distribution of CAP resources. This methodology contributed to detecting 12 fraudulent aid applications, representing 4% of the total number of applications monitored. The process was carried out continuously on 100% of the aid applications for the agricultural year without any field visits.

Related Work
Today, different research projects promote the use of AI, of which we highlight the following: NaLamKI [11], which aims to promote more efficient and sustainable agriculture by developing artificial intelligence methods to analyse remote sensing data for modelling agricultural processes and for 5G networks on farms, or knowlEdge [12], which proposes, among other things, a knowledge-based platform to distribute and exchange trained AI models. Other, more concrete, examples allow us to solve needs in the agricultural sector: reducing the environmental impact of pesticides in pest management [13] or controlling the artificial lighting system to optimise the energy efficiency of greenhouses [14].
Currently, several studies use satellite images for crop identification, which serve as a helpful tool for managing the different CAP campaigns. The work carried out by Sarvia, F. et al. [15] was based on the classification of the Normalised Difference Vegetation Index (NDVI) time series of S2 using supervised classification algorithms (minimum distance and random forest) in the province of Vercelli (Northwest Italy). Siesto, G. et al. [16] presented a novel crop identification approach by applying neural networks with multiple satellite images using synthetic inputs at a pixel level. Moreover, crops that were abandoned and did not qualify for the CAP basic payment were detected. In the work of Portalés-Julià, E. et al. [17], such plots in the province of Valencia (Spain) were determined by evaluating time series of S2-derived spectral indices, together with machine learning and deep learning algorithms, to discriminate abandoned plots.
Different works have addressed the detection of rice cultivation using satellite imagery. Hedayati et al. combined rice phenology and an object-based classification method using Landsat-8 image series, MODIS sensor temperature data, and a digital terrain elevation model. Ramadhani et al. [18] used different Sentinel (1 and 2) and MODIS data sources combined with the support vector machine classification method to generate a multitemporal rice map. Moreover, rice was monitored using synthetic aperture radar-based solutions that are immune to rainfall and clouds, as proposed by Chang et al. [19]. The use of time series of S2 images combined with the help of different vegetation indices allowed the quantification of the rice yield in each of its phenological stages, as proposed by Nazir et al. [20]. Sitokonstantinou et al. [21] implemented a complete line for rice mapping at massive scales and high spatial resolution, using a distributed RF classifier that was trained with pseudo-labels on a time series of Sentinel image data.
A major constraint in rice crop monitoring for the management of CAP subsidies is the mandatory use of Sentinel imagery, which makes using other satellites impossible. The combination of RF classification for a time series and a specific date increases the accuracy from 95% to 97%, which may not be considered a significant improvement. However, it is associated with generating a more efficient aid distribution from European funds, representing considerable savings. The use of deep learning was not considered, as 97% accuracy was deemed sufficient to determine fraudulent applications.

Study Area
After choosing rice as the crop to be identified through the defined methodology, one of Spain's most important producing areas was selected. The municipality of Calasparra (Coordinates: 38.13, −1.42) is located in the northwest of the Region of Murcia (Spain) and is known for its rice (especially the Bomba variety), which has the Denomination of Origin Calasparra Rice [22], the rice reserve being delimited by three municipalities: Calasparra, Moratalla (Murcia Region), and Hellín (Autonomous Community of Castilla-La Mancha).
This work focused on the 2020 crop year, which had 64 registered dossiers, representing a total of 324 declaration lines associated with rice cultivation, with around 340 hectares declared. Figure 1 shows the different plots that form part of the declared lines under study. Contrary to what may occur in other areas, due to the orography of the terrain, the study area contains plots of small dimensions and heterogeneous shapes. Table 1 specifies the relationship between the number of plots and the range of surface area. Finally, it should be noted that vegetables and leguminous plants are usually grown in the study area, a circumstance which, together with the shape and layout of the plots and the spatial resolution of the S2 images, complicates the identification of rice cultivation.

Materials
The materials used come from three different sources: information on rice cultivation and farming practices, satellite image data, and vector data.
For the development of the implemented system, tools based on free software were used. Python libraries, such as rasterio, shapely, or rasterstats, were used for GIS data processing, and the Python scikit-learn library was used for ML functionality. A PostgreSQL database with its PostGIS spatial extension was used to store the data. To access the S2 image repository, the Python requests library and the Copernicus Open Access API Hub [23] were used.
Finally, the implemented tool was deployed on a proprietary server with Windows 2016 Server as the operating system, 64 GB of RAM, 4 TB of hard disk, and an RTX2080Ti graphics card.

Phenological Information
The methodology developed requires information on the different phenological stages of the development of the crop to be identified. With the data obtained from the phases and their cultivation dates, we can locate the phenological stage of the crop [24].
In the specific case of rice cultivation, the previous work showed that two main events allowed us to identify this crop. The first is the flooding of the agricultural plot, which takes place from May to June, and the second is the vigour that the crop reaches during its development between June and August. Figure 2 shows these two events.

Sentinel Data
The Copernicus Programme includes the Sentinel missions, each made up of two satellites to provide ideal coverage and observation frequency and to offer a robust data set. There are five Sentinel missions [25], although only Sentinel-1 (S1) and S2 focus on land monitoring. S1 provides radar imagery, while S2 provides optical imagery [26].
Due to the low cloud cover and the temporal resolution (5 days) offered by S2, these data were considered sufficient for this work, so the use of S1 was discarded. The study area is entirely covered by orbit number 051 and S2 tile T30SXH, in the EPSG:32630 reference system.
The previously published work concluded that monitoring agricultural plots outside the crop development period did not provide additional accuracy to the model. Therefore, concerning the satellite images, this work focused on the period where crop development was most noticeable, based on the phenological information. For this reason, a time series of 13 elements was used, from 20 May 2020 to 28 August 2020, both dates inclusive.
All the elements of this time series had 0% cloud cover. S2 carries a multispectral instrument (MSI), defined as a high-resolution wide-band multispectral imaging system operating in 13 spectral bands at multiple spatial resolutions. The product used was the so-called S2MSI2A, whose main features are level 2A processing, orthorectification, UTM geocoding, and bottom-of-atmosphere (BOA) multispectral reflectance [27].
The S2 bands used in this work were B2, B3, B4, B8, and B11, because of their spatial resolution and because they are necessary for the generation of level 3 products used as complementary data. Table 2 shows detailed technical information for these bands:

Vectorial Data
The aid declarations of the applicants are entered in the aid management system (SGA) application, offered by the Spanish Agricultural Guarantee Fund (FEGA) to the different autonomous communities. The aid declarations entered in the system contain both alphanumeric and spatial information. The spatial information is based on the Geographical Information System of Agricultural Parcels (SIGPAC), although the declarant can also digitise their agricultural parcel. As for the alphanumeric information supplied by the SGA, we can use the identification of the aid line, the applicant, the identification of the enclosure, or the type of aid requested.

Overview
The generic methodology followed to monitor crops, especially rice, using S2 imagery and machine learning and traditional remote sensing techniques to replace the field controls of CAP aid management, follows the scheme presented in Figure 3.
When defining the methodology, the findings of previous studies played an important role, especially the fact that monitoring the crop on dates of little relevance did not contribute significantly to the final result. It should be noted that, in the different phenological phases of the crop, only those characteristics were used that helped to identify the rice crop unambiguously, thus avoiding the creation of large matrices with data that would probably contribute little value to the final result and introduce uncertainty into the model [28].
Since monitoring must be treated as a continuous process, the methodology used was carried out on 13 occasions, as many times as the number of S2 images available for the area, with zero cloud cover, between the dates when the rice crop was most easily recognizable, either because of agricultural practices or because of its vegetative development, more specifically between May and September.
In short, this work aimed to develop a methodology that would make it possible to identify a crop, specifically rice, continuously, during a complete agricultural campaign, using classical remote sensing and RF classification with satellite data and, in addition, to discern whether the principle of cardinality was met, which implies that 100% of the area declared in the aid application was cultivated.

Sentinel-2 Repository
A process was implemented that, using the Copernicus Hub API, connected to the Hub and, for a specific date, checked the cloud cover over the study area using the scene classification map (SCL) of the T30SXH scene. If the cloud cover in the area was zero, a raster of the study area was generated using the bands indicated in Table 2. After applying a clipping operation with the study area, the resulting raster had a size of 1538 rows and 1003 columns, with a total of 1,542,614 pixels.
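The cloud-screening step can be sketched as follows. This is a minimal illustration, assuming the SCL band has already been read into a NumPy array (in the pipeline it is retrieved through the Copernicus Hub API and read with rasterio); the function names are ours, and the class codes are those defined for the Sentinel-2 Level-2A scene classification.

```python
import numpy as np

# SCL classes treated as cloud contamination: 3 = cloud shadow,
# 8/9 = cloud medium/high probability, 10 = thin cirrus.
CLOUD_CLASSES = {3, 8, 9, 10}

def cloud_fraction(scl: np.ndarray) -> float:
    """Fraction of pixels flagged as cloud/shadow in an SCL array."""
    return float(np.isin(scl, list(CLOUD_CLASSES)).mean())

def is_usable(scl: np.ndarray) -> bool:
    """A scene is used only when cloud cover over the study area is zero."""
    return cloud_fraction(scl) == 0.0

# Synthetic example: 4 = vegetation, 5 = bare soil, 6 = water, 9 = cloud
clear = np.array([[4, 4, 5], [6, 4, 5]])
cloudy = np.array([[4, 9, 5], [6, 4, 5]])
print(is_usable(clear), is_usable(cloudy))  # True False
```

Only scenes passing this test proceed to the clipping step that produces the 1538 × 1003 study-area raster.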

Biophysical Indices
Using as input the raster generated in the previous point, the plot with the geometries of the aid declarations, and the phenological characteristics of the crop, we proceeded with the creation of the most representative biophysical index, according to the stage of development of the crop in which we found ourselves.
The NDVI index is widely used to estimate the quantity, quality, and development of vegetation based on measuring the intensity of radiation from certain bands of the electromagnetic spectrum that vegetation emits or reflects. It normalises the scattering of green leaves at near-infrared wavelengths (S2 band 8) with chlorophyll absorption at red wavelengths (S2 band 4). It ranges in value from −1 to 1. Negative values close to −1 correspond to water, and values close to zero usually correspond to arid areas. Values between 0.2 and 0.4 typically represent shrubs and grasslands, while high values represent living green vegetation.
The NDVI index [29] was used for the initial non-crop stage and the growth phase and is defined as

NDVI = (B8 − B4) / (B8 + B4)

The Normalised Difference Water Index (NDWI) is used for the detection of water bodies. Water bodies absorb light in the visible to infrared electromagnetic spectrum (S2 band 8); the NDWI uses the green band (S2 band 3) and the near-infrared to highlight water bodies. Near-infrared minimises the low reflectance of water features and maximises the high reflectance of terrestrial vegetation and soils, while green maximises the reflectance of water features.
The NDWI index [30], used for the flooding phase of the agricultural plots, is defined as

NDWI = (B3 − B8) / (B3 + B8)

Values higher than −0.2 correspond to water bodies flooding the crop plots in the study area. Figure 4 shows how the NDWI index highlights flooded plots and how the NDVI index highlights plots with optimal vegetative development. With the resulting raster and the geometries of the parcel with the aid applications, a zonal statistic was carried out, obtaining a series of basic statistics, such as the mean, standard deviation, or number of pixels, which were finally stored in the database generated for this work for subsequent processing.
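The two indices reduce to simple band arithmetic. The sketch below, with illustrative reflectance values, shows how a flooded pixel and a vigorously vegetated pixel separate under the NDWI and NDVI thresholds discussed above; the function names are ours.

```python
import numpy as np

def ndvi(b8: np.ndarray, b4: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (b8 - b4) / (b8 + b4)

def ndwi(b3: np.ndarray, b8: np.ndarray) -> np.ndarray:
    """NDWI (McFeeters) = (Green - NIR) / (Green + NIR)."""
    return (b3 - b8) / (b3 + b8)

# Two illustrative pixels: index 0 is flooded, index 1 is dense green vegetation
b3 = np.array([0.06, 0.04])  # green
b4 = np.array([0.05, 0.03])  # red
b8 = np.array([0.02, 0.40])  # near-infrared

print(ndwi(b3, b8))  # flooded pixel above -0.2, vegetated pixel strongly negative
print(ndvi(b8, b4))  # vegetated pixel well above the 0.54 development threshold
```

In the pipeline, the band arrays would come from the clipped study-area raster, and the resulting index rasters feed the zonal statistics described above.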

Learning Matrix for Classifications by Machine Learning
The learning matrix used contains elements of two classes: code 80 to identify the rice crop and code 1 for any other crop or type of surface.
The samples were generated using information obtained from field visits and the photo interpretation of a Pleiades-1 image with a spatial resolution of 50 centimetres. An attempt was made to avoid the inclusion of agricultural plots that had applied for aid in the 2020 campaign. As a result, a sample of 35,275 pixels was obtained, distributed between the two classes as shown in Table 3.

Table 3. Distribution between classes of the learning matrix.

For the non-rice category (class 1), agricultural plots were selected with vegetable and leguminous crops, wild vegetation on the levee of the Segura river, forest areas, and, to a lesser extent, tilled and bare soil.
For the independent variables, based on the study results in the 2017-2018 campaigns, it was decided to generate different learning matrices, depending on the event to be found.
As far as the raw S2 data were concerned, the bands indicated in Table 2 were used for the flooding phase, and the same bands except for B11 were used for the vegetative development phase of the crop.
As far as level 3 S2 products were concerned, the NDWI indices by the McFeeters method, NDWI according to the Gao method [31], and the Modified Normalised Difference Water Index (MNDWI) [32] were used for the flooding stage.
For the maximum vigour event, in addition to the S2 data, the NDVI index, the Green Normalised Difference Vegetation Index (GNDVI) [33], and the Enhanced Vegetation Index (EVI) [34] were used. These same features were also used for the cardinality check.
EVI = 2.5 * (B8 − B4) / (B8 + 6.0 * B4 − 7.5 * B2 + 1.0) (6)

Table 4 shows a summary of the characteristics of the learning matrix per phase. On this occasion, the learning matrix was not divided between training and test sets; instead, K-Fold cross-validation was used for model fitting and validation [35], with k = 5.
Finally, for the scaling of the learning matrix, a series of pre-tests were conducted, comparing the results obtained with the scaled and unscaled matrix, and no representative improvements were observed.

Machine Learning Classification
In the defined methodology, two classifications were generated with machine learning in raster format, both at a pixel level. The only difference between the two was the learning matrix.
The first one, called classification with machine learning at the moment (CLAML-M), only included in the learning matrix the features indicated in the previous section (Section 4.2.4), depending on the phenological phase of the crop, for the specific date on which the algorithm was executed.
The second one, called classification with machine learning with cumulative time series (CLAML-TS), used a learning matrix composed of the same features as in the previous case but accumulating the features of the previously processed dates within the same phenological phase [36]. Thus, toward the end of the flooding phase, the learning matrix contained the base features corresponding to the dates 20 May 2020, 30 May 2020, 14 June 2020, and 29 June 2020, which resulted in a raster with about 30 bands. By the end of the phase of maximum crop development, the learning matrix was composed of the features corresponding to the dates 4 July 2020, 9 July 2020, 19 July 2020, 23 July 2020, 29 July 2020, 8 August 2020, 18 August 2020, and 28 August 2020, obtaining approximately 60 bands.
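The construction of the cumulative learning matrix can be sketched as a column-wise concatenation of per-date feature matrices. This is an illustration with random data; the feature count per date (8) is only indicative of the bands-plus-indices sets described above.

```python
import numpy as np

def cumulative_matrix(per_date_features: list) -> np.ndarray:
    """Stack per-date feature matrices of shape (n_pixels, n_features)
    along the column axis, as in the CLAML-TS learning matrix."""
    return np.concatenate(per_date_features, axis=1)

rng = np.random.default_rng(0)
n_pixels = 100
# Four flooding-phase dates, each contributing 8 per-pixel features
dates = [rng.random((n_pixels, 8)) for _ in range(4)]
X_ts = cumulative_matrix(dates)
print(X_ts.shape)  # (100, 32): roughly the ~30-band raster described above
```

Each new usable acquisition within a phase appends its columns, which is why the matrix grows from one run to the next.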
Once the learning matrix was available, the RF classifier [37] was used as a model, optimising specific hyperparameters [38] through K-Fold cross-validation, with k = 5. The parameters evaluated in this optimisation process were the number of estimators, the function that measures the quality of the splits, the maximum number of features, the maximum depth, the minimum elements of a split, the minimum elements of a leaf, the maximum number of nodes per leaf, and the weights associated with the classes.
With our optimised and adjusted model, a validation of the model was carried out with the whole learning matrix through K-Fold cross-validation, with k = 5, and ending with the storage in a database of the mean values and the standard deviation of different metrics, such as accuracy, precision, F1 score, and recall.
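A minimal sketch of this fitting and validation step with scikit-learn, on synthetic data: the hyperparameter grid shown is illustrative, not the one used in the study, and only two of the parameters mentioned above are searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate

# Toy two-class problem standing in for the rice / non-rice learning matrix
rng = np.random.default_rng(0)
X = rng.random((300, 8))
y = (X[:, 0] + 0.1 * rng.random(300) > 0.5).astype(int)

# Hyperparameter optimisation with 5-fold cross-validation (illustrative grid)
grid = {"n_estimators": [50, 100], "max_depth": [5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)

# Validation over the whole matrix, storing the metrics the paper records
scores = cross_validate(search.best_estimator_, X, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"])
print({k: round(v.mean(), 3) for k, v in scores.items() if k.startswith("test_")})
```

The mean and standard deviation of each metric array are what would be written to the database for each run.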
With the RF classifier optimised and adjusted, the prediction was carried out on the total surface of the study area, and as a result, two different classifications were obtained, both at a pixel level. These classifications were translated into a raster composed of three bands, the first one relating to the prediction class, the second one with the probability that it was not rice, and the third one with the probability that it was rice.
In order to issue a traffic-light colour for each of the agricultural plots, as indicated in the CAP monitoring technical guidelines, it was necessary to move from a per-pixel classification to an object-based classification at the enclosure level. For this purpose, the algorithm obtained different zonal statistics between the raster resulting from the RF classification and the parcels with the aid declarations. At this point, generic statistics were stored, such as the mean, standard deviation, number of pixels, or majority class, and others were created on demand, such as the number of rice-class and non-rice-class pixels and the average prediction probability for the rice-class and non-rice-class pixels.
On the zonal statistics of the previous paragraph, an adjustment was made, consisting of analysing the results obtained by excluding the pixels on the perimeter of the enclosure. Once these pixels were excluded, statistical information was generated regarding the number of pixels, the number of pixels in the rice class and non-rice class, and the average prediction probability for pixels classified as rice and for those classified as non-rice.
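The perimeter adjustment can be sketched as an erosion of the plot's pixel mask before recomputing the zonal statistics. The example below uses a toy 4 × 4 window and a hand-rolled 4-neighbour erosion; in the pipeline the mask would be rasterised from the declared geometry (e.g. with rasterio/rasterstats), and the statistics would be the full set described above.

```python
import numpy as np

def erode(mask: np.ndarray) -> np.ndarray:
    """Drop pixels touching the boundary of a boolean mask (4-neighbour erosion)."""
    m = mask.copy()
    m[1:, :] &= mask[:-1, :]; m[:-1, :] &= mask[1:, :]
    m[:, 1:] &= mask[:, :-1]; m[:, :-1] &= mask[:, 1:]
    m[0, :] = m[-1, :] = False
    m[:, 0] = m[:, -1] = False
    return m

classes = np.array([[1, 1, 1, 1],
                    [1, 80, 80, 1],
                    [1, 80, 80, 1],
                    [1, 1, 1, 1]])              # 80 = rice, 1 = non-rice
plot = np.ones_like(classes, dtype=bool)        # whole window belongs to the plot

inner = erode(plot)                             # perimeter pixels excluded
rice_share_all = (classes == 80).mean()
rice_share_inner = (classes[inner] == 80).mean()
print(rice_share_all, rice_share_inner)         # mixed border pixels drop out
```

The example shows why the adjustment matters: mixed pixels on the plot boundary drag the rice share down from 100% to 25%, so statistics on the eroded mask better reflect the cultivated interior.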
In short, this part of the methodology generated more than 80 parameters, which, to a greater or lesser extent, had a direct impact on the traffic-light colour established for the enclosures.

Traditional Decision Tree
The final step in the defined methodology consisted of introducing all the information generated by analysing the biophysical indices and the RF classifications into a traditional decision tree. Depending on the crop phase in which we found ourselves, one data set prevailed over another.
Thus, in the flooding phase, the set of results that carried the most weight was CLAML-TS, followed by CLAML-M, and traditional remote sensing, with the NDWI flooding index, only came into play if the RF classifications raised doubts because the number of pixels labelled as rice was very close to the defined threshold (60%). On the other hand, for the crop development phase, the NDVI vegetation index had to be above the predefined threshold (0.54) obtained from the crop survey in the area in the 2017 and 2018 campaigns. In addition, the number of pixels marked as rice had to be above the parametrised threshold (60%) in the RF classifications.
The decision tree also carried the traffic-light status forward to the next phase. Thus, if it was established that a specific enclosure had been flooded, the green colour of the traffic light was automatically set for all subsequent dates until the crop growth phase was reached. The same was true for the vegetative growth phase of the crop. On the other hand, if an enclosure was established as non-flooded, it automatically turned red at the end of the flooding phase, as flooding of the agricultural plot was considered a prerequisite for it to be considered rice cultivated.
The so-called cardinality check was introduced, referring to whether 100% of the declared area was cultivated on an enclosure. This condition was analysed in the traditional decision tree toward the end of crop development, i.e., end of August or beginning of September. For this point to be considered, the enclosure had to arrive with the green traffic light, and only the results obtained by the RF classifications were taken into account. This point defined a very restrictive threshold on the number of pixels detected as rice without considering those included in the perimeter of the plot. To assume that a plot complied with the cardinality principle, 80% of the pixels, excluding the periphery, had to be labelled as the rice class.
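The decision rules described in this section can be summarised as plain threshold logic. The sketch below uses the thresholds reported in the text (60% rice pixels, NDVI 0.54, 80% for cardinality); the function names are ours, and the "doubt margin" that triggers the fallback to the NDWI index is an assumption, since the paper does not quantify when the RF result "raises doubts".

```python
# Thresholds from the text; DOUBT is our hypothetical margin around the
# 60% rice-pixel threshold that triggers the NDWI fallback.
RICE_PCT, NDVI_MIN, CARD_PCT, DOUBT = 0.60, 0.54, 0.80, 0.05

def flooding_light(ts_rice_pct: float, ndwi_mean: float) -> str:
    """Flooding phase: CLAML-TS result, with the NDWI flooding index
    as a tie-breaker when the RF rice share is near the threshold."""
    if abs(ts_rice_pct - RICE_PCT) <= DOUBT:
        return "green" if ndwi_mean > -0.2 else "red"
    return "green" if ts_rice_pct > RICE_PCT else "red"

def development_light(rice_pct: float, ndvi_mean: float) -> str:
    """Development phase: both the RF rice share and the mean NDVI
    must clear their thresholds."""
    return "green" if rice_pct > RICE_PCT and ndvi_mean > NDVI_MIN else "red"

def cardinality_ok(inner_rice_pct: float) -> bool:
    """Cardinality: 80% of non-perimeter pixels labelled as rice."""
    return inner_rice_pct >= CARD_PCT

print(flooding_light(0.62, -0.5), development_light(0.7, 0.6), cardinality_ok(0.85))
```

An enclosure with an RF share of 62% sits inside the doubt margin, so a dry NDWI mean turns it red despite the nominally passing RF result, mirroring the tie-breaking role of the flooding index.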

Results
This section discusses in detail the metrics obtained by the RF models, an analysis of how the selected features influenced the RF-generated models, the results of the quality assessment that was performed based on the technical guidelines issued by the JRC, and a comparison of the results obtained by the different methods individually, without resorting to the traditional decision tree.

Machine Learning Metrics
The tool implemented in this work recorded a varied set of metrics each time the process was executed, both for CLAML-M and CLAML-TS. Tables 5 and 6 show the values, for each of the runs, of the metrics recorded by the two types of ML classification implemented (Section 4.2.5). From these tables, we obtain the mean values and standard deviations reflected in Table 7. Graphically, Figure 5 shows the evolution of some metrics of the two RF classifiers implemented over successive runs. In summary, it can be observed that the CLAML-TS metric scores were systematically better than the CLAML-M metrics as dates were added to the time series.

Feature Importance
Thanks to the metrics generated by the algorithm implemented in this work, it was possible to analyse the use made by the models of the features provided in the learning matrix. Figure 6 shows that, to some extent, no single characteristic becomes more prominent than the others. Another way to analyse this information is to generate the cumulative importance that shows the contribution to the overall importance of each additional variable, which can be seen in Figure 7. The dashed line is set at 95% of the total accounted importance, which, although an arbitrary threshold, can be the meeting point between the model performance and optimisation.
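The cumulative-importance analysis reduces to sorting and accumulating the RF importances. A minimal sketch, with illustrative importance values and our own function name:

```python
import numpy as np

def features_for_threshold(importances, threshold: float = 0.95) -> int:
    """Number of features needed to reach the given share of the total
    accounted importance (the dashed 95% line in Figure 7)."""
    ranked = np.sort(np.asarray(importances))[::-1]   # most important first
    cumulative = np.cumsum(ranked)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Illustrative importances for a 7-feature model (they sum to 1.0)
imp = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]
print(features_for_threshold(imp))  # 5 features reach 95% of total importance
```

In scikit-learn, the input would simply be the fitted model's `feature_importances_` attribute.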
It can be seen from the figure above that only a few features made a merely residual contribution to the model performance. Indeed, in the CLAML-TS model for the flooding phase, all features in the learning matrix were of considerable importance.

Quality Evaluation
Following the end of the 2020 agricultural season, a quality assessment of the monitoring was carried out, following the guidelines issued by the Joint Research Centre (JRC) [39]. This quality assessment was based on the international standard ISO 2859/2 [40] and had associated acceptance numbers that indicated whether or not the tests were passed depending on the errors found. In addition, a series of batches were created to define the events or milestones to be evaluated.
In this quality assessment, the implemented algorithm's performance was tested in identifying the rice crop and the so-called cardinality. Each batch was assigned a sample of 200 elements, which did not need to coincide between batches. Supporting material included the S2 images used for automatic monitoring and a high-resolution image of the study area from the Pleiades-1 satellite, with a spatial resolution of 50 cm, taken at the beginning of September.
The results found zero errors for the crop identification batch and two for the cardinality batch. With the errors obtained, the tests were considered to have passed.
For example, in Figure 8, what looks like an area of trees and even a small building was included in the aid declaration in the lower right part of the figure. The model correctly marks this area as non-rice (red). To avoid this error, the cardinality threshold, i.e., the minimum percentage of pixels labelled as the rice class (green), should be increased.

Biophysical Indices vs. CLAML-M vs. CLAML-TS
Thanks to the information gathered during the quality assessment process, we were able to compare the results that would have been obtained in the hypothetical case of not implementing the traditional decision tree, which jointly treats the results of the intermediate products, i.e., the biophysical indices and the RF classifications CLAML-M and CLAML-TS.
For the rice crop identification batch, we found the error ratio reflected in Table 8. For the batch associated with cardinality, Table 9 lists the errors that would have been obtained if the intermediate products were treated individually. In this batch, the results related to the biophysical indices were not treated, as it is not possible to check cardinality with this type of product. For the crop identification batch, combining the data generated by the three intermediates gave optimal results. This situation is more evident in the case of the cardinality batch.
Analysing the detailed data from which Tables 8 and 9 are derived, it was observed that, in the case of the crop identification batch, the combination of the three products within the decision tree succeeded in eliminating the errors of the CLAML-TS product. In contrast, the errors of the CLAML-TS product for the cardinality batch (Table 9) were not corrected by introducing the CLAML-M product into the final decision tree.

Conclusions
Crop monitoring controls based on satellite imagery aim to perform a periodic and continuous check that verifies the activity declared by the farmer, observed through time series of images. In this work, we proposed a decision support system based on ML, S2 imagery, and derived products that allow the simple integration of heterogeneous data sources to assist and improve the decision-making process on granting CAP subsidies. The foreseeable savings in field trips and in situ controls will improve the efficiency in the management and granting of aid promoted by the CAP.
The methodology defined is based on combining the phenology of the crop being identified, in this case rice, with S2 images and the land parcels with the aid applications. As the process must be carried out continuously throughout the campaign, and depending on the agricultural practices applied in each phenological phase of the crop, the features used were rotated so as always to use those that helped to determine the crop in question uniquely. In parallel, data were generated from three different sources: biophysical indices, per-pixel RF classification for a specific point in time, and per-pixel RF classification using a cumulative time series. Since any improvement, however small, directly impacted better management and distribution of CAP resources, it was decided to implement a traditional decision tree to which the data obtained from the three previous sources were introduced. Finally, a chromatic indicator was issued to indicate whether the requested aid obligations were being met (green) or not (red), or whether the automatic system could not issue a verdict (yellow).
The algorithm associated with the methodology was run 13 times, once for each available zero-cloud-cover image of the study area, resulting in RF classifications with an accuracy of 92% (±0.02%) at the moment (CLAML-M) and 97% (±0.02%) in the time series (CLAML-TS), while traditional biophysical indices reported a 16% error. At this point, it was clear that RF classification using the cumulative time series gave the best results. With the introduction of the final decision tree, which evaluated the results of each of these three products with different weights, the quality control was 100 percent effective in terms of pure rice crop identification and 99 percent effective in terms of cardinality.
An improvement of this line of research would be to introduce the concept of feature engineering by incorporating into the learning matrix information related to location (using the coordinates of each pixel, or its row and column within a matrix) and detailed date information, such as the day number, month, year, day of the week, or week of the year. Moreover, it is considered necessary to carry out a study on how dimensionality reduction or feature selection affects the final performance of the model. In addition, this methodology can be extrapolated to identify other types of crops in a larger study area. Finally, we intend to use deep learning to identify the rice crop and to compare the results obtained with those of this work. One option would be to define a neural network that uses the learning matrix of this work to make predictions; another would be computer vision applied to satellite images employing convolutional neural networks [41] and semantic segmentation architectures [42,43].

Funding: This research was funded by the European Regional Development Fund (ERDF), through project FEDER 14-20-25 "Impulso a la economía circular en la agricultura y la gestión del agua mediante el uso avanzado de nuevas tecnologías-iagua".