Use of Machine Learning in Evaluation of Drought Perception in Irrigated Agriculture: The Case of an Irrigated Perimeter in Brazil

Abstract: This study aimed to understand the perception of drought among farmers, in order to support decision-making in the water allocation process. The study was carried out in the Tabuleiro de Russas irrigated perimeter, in northeast Brazil, over the drought period of 2012–2018. Two analyses were conducted: (i) drought characterization, using the Standardized Precipitation Index (SPI) based on drought duration and frequency criteria; and (ii) analysis of farmers' perceptions of drought via selection of explanatory variables using the Random Forest (RF) and the Decision Tree (DT) methods. The 2012–2018 drought period was defined as a meteorological phenomenon by local farmers; however, an SPI evaluation indicated that the drought was of a hydrological nature. According to the RF analysis, four of the nine study variables were more statistically important than the others in influencing farmers' perception of drought: number of cultivated land plots, farmer's age, years of experience in the agriculture sector, and education level. These results were confirmed using DT analysis. Understanding the relationship between these variables and farmers' perception of drought could aid in the development of an adaptation strategy to water deficit scenarios. Farmers' perception can be beneficial in reducing conflicts, adopting proactive management practices, and developing a holistic and efficient drought early warning system.


Introduction
Droughts, which are becoming increasingly prolonged and severe globally, have prompted research on water security and prioritized water management issues on the international scale. They are recognized as a slow, creeping natural hazard that occurs in virtually all climatic zones [1,2]. Droughts result from a reduction in precipitation over an extended period of time and can eventually translate into a runoff deficit, which is a major indicator of the propagation of a meteorological drought into a hydrological drought [3].
The diversity of drought definitions and the lack of consensus in the scientific community on the establishment of universal drought indices make it important to understand users' perception, as that will influence their acceptance of mitigation actions [4–6].
Irrigated agriculture is one of the activities most affected by water deficits, given its high vulnerability to droughts. Thus, efficient irrigation measures that focus on arid and semi-arid regions can increase resilience to droughts [7]. Such measures require efficient resource management.
This study was carried out with the farmers from the Tabuleiro de Russas irrigated perimeter. This perimeter is located in the municipalities of Russas, Limoeiro do Norte, and Morada Nova in the state of Ceará, Brazil (5°37'20" S, 38°07'08" W). The study site is located at an altitude of 82 m. This area was chosen for the study because it is one of the main irrigated perimeters in this state.
The perimeter extends over a total area of 18,276 ha. According to the National Department of Actions Against Droughts (DNOCS), the farmers of this region grow a variety of crops, including various fruits, such as bananas, guavas, and coconuts, as well as vegetables, grains, green corn, forage grasses, sugarcane, and oilseeds, among others [32]. Local irrigation technologies include microsprinkler systems, drip irrigation, and center pivots.
The perimeter is located in the river basin of Baixo Jaguaribe (Figure 1). The climate of the region is hot and semiarid, characterized as dry and very hot as per the Köppen classification. The mean annual precipitation of 720 mm is irregularly distributed throughout the year, falling mostly between December and June or July, with the highest precipitation rates between February and April. The mean relative air humidity and the mean monthly temperature are 60% and 27 °C, respectively, while the evaporative demand is 2900 mm year⁻¹ [33]. Peter et al. reported that government departments at the federal and state levels adopted the policy of building reservoirs with storage capacities ranging from 10³ to 10⁹ m³ in order to adapt to water shortages and recurrent cyclical droughts [34]. The Banabuiú and Castanhão reservoirs (located in the river basin of the Médio Jaguaribe) are the main supply sources for the irrigation perimeter, with total capacities equal to 1.6 and 6.7 km³, respectively, although the supply was later sourced from the Castanhão reservoir only [35].

Data and Methods
This study analyzed the 2012-2018 drought period that affected Ceará, focusing on the Tabuleiro de Russas irrigated perimeter. The methodology used to develop the study consisted of the following steps: bibliographic research, questionnaire preparation, application of the questionnaire using snowball sampling, and data tabulation and analysis. The last step (data tabulation and analysis) involved two sub-steps: (i) characterization of drought using the SPI and farmers' perceptions; and (ii) characterization of farmers' perceptions of the drought by selecting explanatory variables using the RF and DT methods (Figure 2). These steps are described in the following sections.
Figure 2. Methodological flowchart describing the steps taken in this study. Bibliographic research was followed by questionnaire preparation, application of questionnaires via the snowball method, and data tabulation and analysis. Two types of data tabulation and analysis were used: characterization of drought using SPI and the selection of explanatory variables for farmers' perception using RF and DT techniques.

Semi-Structured Interviews
The data used in this research were obtained by creating questionnaires and applying them to the studied perimeter through face-to-face interviews with the farmers.
The snowball sampling technique was used to determine the sample size. This selection was made given the descriptive dataset and because this methodology allows exploration of the subjectivity and personal notions of the respondent in light of his/her experiences [36]. Snowball sampling follows a referral system, whereby each respondent recommends the next person to be interviewed. Each respondent was asked to indicate three referrals: a small, a medium, and a large irrigator. Given that the study site is characterized by large tracts of inactive, cultivated land that are difficult to access, this technique helped the authors reach a considerable number of farmers and active (irrigated) fields.
The questionnaire was based on the works of Udmale et al. [37] and Cunha et al. [38], whose studies covered social and political relations relevant to drought, the implications of drought, and the adaptation strategies associated with the agriculture sector. The questionnaire included open- and closed-ended (dichotomous, multiple choice, and rating-scale) questions. The answers were coded from 0 to 9 according to the question profile, and tabulated using a semi-structured template. The data were processed and analyzed statistically using descriptive and inferential techniques.

Meteorological Data
The daily precipitation dataset (1911-2018) used to analyze and characterize drought in the irrigated perimeter was obtained from the Hidroweb Portal, which was made available by the National Water Agency [39]. The areal mean precipitation for each of the two river basins of Ceará (Baixo Jaguaribe and Médio Jaguaribe) was obtained by interpolating the daily precipitation data from each pluviometer. The data were interpolated onto grid points spaced 0.05° apart using the inverse distance weighting method with distances raised to the second power. Using areal means allowed the inclusion of both the perimeter and the river basin. The mean of the interpolated precipitation values was then extracted for each analyzed area. In addition to homogenizing the drought analysis, the river basin scale was used to reduce the random fluctuations that arise when applying a point-based approach [22].
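The interpolation step can be illustrated with a minimal sketch of inverse distance weighting with a power of two, averaged over a regular 0.05° grid. This is not the authors' code. The gauge coordinates and rainfall values are hypothetical, the function names `idw` and `areal_mean` are introduced only for illustration, distances are taken in plain degrees, and a rectangular bounding box stands in for the actual basin outline.

```python
import math

def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations."""
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with a gauge
        w = 1.0 / d ** power  # weight decays with distance to the given power
        num += w * value
        den += w
    return num / den

def areal_mean(stations, lon_range, lat_range, step=0.05, power=2.0):
    """Mean of the IDW estimates over a regular grid covering a bounding box."""
    n_lon = int(round((lon_range[1] - lon_range[0]) / step)) + 1
    n_lat = int(round((lat_range[1] - lat_range[0]) / step)) + 1
    total = 0.0
    for i in range(n_lon):
        for j in range(n_lat):
            point = (lon_range[0] + i * step, lat_range[0] + j * step)
            total += idw(stations, point, power)
    return total / (n_lon * n_lat)

# Hypothetical gauges: (longitude, latitude, daily rainfall in mm)
gauges = [(-38.2, -5.0, 12.0), (-38.0, -5.2, 4.0), (-37.9, -4.9, 0.0)]
basin_mean = areal_mean(gauges, (-38.3, -37.8), (-5.3, -4.8))
print(round(basin_mean, 2))
```

Repeating this calculation for every day of the record would yield one areal mean precipitation series per basin, which is the input required by the SPI.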

Drought Characterization
The drought analysis and characterization were based on the SPI calculated at a 12-month time scale (SPI12) [40]. This SPI compares the precipitation accumulated over 12 consecutive months in the study period with the precipitation accumulated over the same 12 consecutive months in all preceding years of the historical data series. In accordance with Silva et al. [41], the time scale used for calculating the SPI is directly related to the time needed for the effects of the drought to be experienced in the different economic sectors and the various water resources of the region. As such, this index facilitates the determination of the intensity, magnitude, duration, and onset probability of a specific drought, given a historical data series.
To convert the continuous information provided by the SPI moving window into annual values, a discretization process was carried out [22]. A new time series was generated using the December value of each year (SPI12_DEC). This series contained accumulated annual information, with negative values indicating drought and positive values indicating wet periods. This process aimed to obtain independent random variables that represented total annual precipitation, smooth the time series, and avoid false SPI information influenced by values above or below the mean precipitation over the course of the dry season months (July to December).
The SPI classification with respect to drought severity is displayed in Table 1, according to McKee et al. [40]. SPI classification values were obtained for each month. For this classification, the onset of a drought was identified by means of a retroactive procedure: an event was classified as a drought when the SPI remained continuously negative and reached a value of −1 or less. The duration corresponds to the number of years that elapsed between the beginning and the end of the event.
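The event definition can be sketched in a few lines of Python. The sketch rests on one reading of the retroactive procedure: a run of consecutive negative SPI12_DEC values counts as a drought only if at least one value in the run is −1 or less. The SPI values below are invented for illustration and are not the study's data; `drought_events` is a hypothetical helper name.

```python
def drought_events(years, spi, threshold=-1.0):
    """Return (start_year, end_year, duration_in_years, minimum_spi) per event."""
    events = []
    start = None  # index where the current run of negative values began
    for i, value in enumerate(list(spi) + [0.0]):  # sentinel closes a trailing run
        if value < 0:
            if start is None:
                start = i
        elif start is not None:
            run = spi[start:i]
            if min(run) <= threshold:  # the run must reach -1 or less
                events.append((years[start], years[i - 1], i - start, min(run)))
            start = None
    return events

# Illustrative annual SPI12_DEC values (made up for this example)
years = list(range(2008, 2019))
spi = [0.8, 1.2, -0.4, 0.9, -1.6, -2.1, -1.3, -0.9, -1.1, -0.7, -0.3]

events = drought_events(years, spi)
print(events)  # one event: 2012 to 2018, seven years long
```

In this example the isolated negative value in 2010 is not counted, because it never reaches −1. The drought frequency over a period is then simply the number of events returned.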

Selection of Explanatory Variables and Classification of Drought Perception
Understanding the perception of a given population is a difficult task, given that this depends on several factors (e.g., social, cultural, political, and economic). Evaluating the variables which influence perception allows for a better understanding of the factors that guide the formation of the perception and how these factors interact.
These variables are called predictive variables and can interact in complex and nonlinear ways. Thus, interpreting them with common statistical techniques may be challenging. Given their technical character, superior performance, ease of visual interpretation, and availability in R software, machine learning techniques are considered an alternative approach to conventional statistical models [20].
This study used an RF method to rank the explanatory variables and a DT methodology to understand the synergy between these variables. The choice of the predictive variables used as input data for the model was based on the drought literature [6,18,37,38].
The farmers' perceptions of drought were characterized using nine explanatory variables: gender, age, experience in the agriculture sector, education, number of actively cultivated land plots, years of droughts experienced, opinion on the main cause of droughts, number of information sources regarding climate, and participation in discussion groups regarding droughts. Table 2 shows the selected variables. For the perception variable, the respondents were asked about the increased duration and frequency of droughts, with the option to remain neutral or respond positively or negatively. These responses were considered as indicators of perception. Perception was classified into four categories: indifferent, high, low, and no perception. In the questionnaires, these categories corresponded to the following codes: indifferent (0), high (1), low (2), and no perception (3).
Water 2020, 12, 1546

Random Forest
RF is a prediction method, based on Breiman's classic algorithm, that selects variables as part of its learning process. It is a popular machine learning algorithm that combines several classification and/or regression tree (CART) models trained with bootstrap aggregation (bagging) [42]. The bagging technique trains each tree on a random resample of the original dataset drawn with replacement. Therefore, while some data can be used more than once during training, other data may not be used at all. The final decision for the selection of variables comes from the votes of the several trees that were built from this sampling with repetition (bootstrap). This method was applied to evaluate the importance of each explanatory variable of perception.
At the outset, the selection of the training set for each tree left out approximately one-third of the observations for performance evaluation, resulting in the Out-of-Bag (OOB) sample. This sample was used to obtain an unbiased estimate of the prediction error as well as an estimate of variable importance [43].
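The "approximately one-third" figure follows from sampling with replacement: the chance that a given observation is never drawn in n draws is (1 − 1/n)^n, which approaches e⁻¹ ≈ 0.368 as n grows. A minimal sketch, using the sample size of 29 and the 100 trees reported for this study, illustrates this. The helper `bootstrap_sample` and the random seed are assumptions made for the example.

```python
import random

def bootstrap_sample(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(0)       # fixed seed, chosen arbitrarily for repeatability
n_obs, n_trees = 29, 100     # sample size and number of trees used in the study

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_sample(n_obs, rng)
    oob_fractions.append(len(oob) / n_obs)

mean_oob = sum(oob_fractions) / n_trees
print(f"mean OOB fraction over {n_trees} trees: {mean_oob:.3f}")
print(f"theoretical value (1 - 1/n)^n: {(1 - 1 / n_obs) ** n_obs:.3f}")
```

Each tree therefore has its own set of observations it never saw during training, and these serve as a built-in test set for that tree.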
The RF method evaluates the importance of each variable based on the increase in the mean squared error (MSE), calculated on the OOB sample subset (Equation (1)) [44,45]. By permuting the values of a variable in the test data, the percent increase in MSE (IncMSE) is computed, which measures the mean decrease in accuracy, i.e., how much the prediction worsens when that variable changes its value. The higher the IncMSE, the more important the variable.

$$\mathrm{MSE}_{OOB} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{\,OOB}\right)^2 \qquad (1)$$

Here, $n$ is the number of observations, $y_i$ corresponds to the observed value of the variable for the ith observation, and $\hat{y}_i^{\,OOB}$ is the mean of the OOB predictions for the ith observation. IncMSE was considered as a measure of variable importance. One hundred randomly selected test sets were used. The median IncMSE over the 100 test sets was used to rank the variables.
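The permutation idea behind IncMSE can be sketched without a full forest. In the example below, a fixed prediction function stands in for a fitted RF, the data are synthetic, and the importance is the raw increase in MSE after shuffling one column, summarized by the median over 100 shuffles. These are all assumptions made for illustration. The scaling applied by the R implementation and the authors' exact resampling scheme are not reproduced.

```python
import random
import statistics

def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_increase(predict, rows, y, column, rng):
    """Increase in MSE after shuffling one column of the held-out rows."""
    baseline = mse(y, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(y, [predict(r) for r in permuted]) - baseline

rng = random.Random(1)
# Synthetic data: the response depends on column 0 only.
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(29)]
y = [2.0 * r[0] + rng.gauss(0, 0.5) for r in rows]

def predict(row):
    """Stand-in for a fitted forest (an assumption, not the authors' model)."""
    return 2.0 * row[0]

# Median increase in MSE over 100 random permutations of each variable.
importance = [
    statistics.median(
        permutation_increase(predict, rows, y, c, rng) for _ in range(100)
    )
    for c in (0, 1)
]
print(importance)
```

Shuffling the informative column degrades the predictions, whereas shuffling the column the model ignores leaves the error unchanged. This contrast is what the ranking of explanatory variables relies on.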
The sample size (an) was based on the size of the dataset (an = 29) for this classification model. The number of trees (ntree) was changed from the default value (ntree = 500) to 100. The remaining parameters followed the defaults defined by Breiman et al. [42].

Decision Tree
The decision analysis was based on the DT model, a tool that provides decision-making support, and describes the logical structure, uncertainties, and positive results of decisions [46]. Fundamentally, a DT consists of a hierarchy of internal and external nodes connected by branches.
In a DT, the route begins with the presentation of a dataset to an initial node (or root node) of the tree. Thereafter, depending on the result of the logical test used by the node, the tree branches to one of the child nodes, repeating the procedure until an end node is reached. This repetition characterizes the recursion of the DT.
Instead of classifying end nodes, in the CART methodology, the tree grows until a limit is reached (for instance, when the minimum number of data remains in the sample).
With regard to the classification tree, the value (class) assigned to an end node is the mode of the training observations that fall in it. As such, the value of the corresponding mode is attributed to a new observation.
Finally, the CART method requires locating the ideal division to minimize the impurities in the DT. To measure the impurities, the algorithm uses the Gini Index [42] as a measure for the best division selection. This index is defined as

$$I_G\left(t_{X(x_i)}\right) = 1 - \sum_{j=1}^{m} f\left(t_{X(x_i)}, j\right)^2$$

where the sum runs over the $m$ possible values of $j$ and $f(t_{X(x_i)}, j)$ corresponds to the proportion of samples in which $x_i$ belongs to leaf $j$ in node $t$.
The lowest impurity index determines which attribute is chosen and, consequently, the division of the DT. The DT had two main parameters: Minsplit, namely the minimum number of observations that must exist in a node for a division to be made, and Cost, which is a vector of non-negative costs, one for each variable in the model. In this study, the split that costs the least was chosen. For the model, Minsplit was defined as 2, while Cost took the default value (Cost = 1).
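The split selection can be illustrated with a short sketch that computes the Gini impurity of a set of class labels and searches for the threshold on one numeric variable that minimizes the weighted impurity of the two resulting nodes. The ages and perception codes are hypothetical, and `gini` and `best_split` are illustrative names rather than functions of the software used in the study. Only the Minsplit parameter is mimicked.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels, minsplit=2):
    """Return (threshold, weighted_gini) of the best split, or None if no split."""
    if len(labels) < minsplit:
        return None  # too few observations in the node to attempt a division
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical farmers: age in years and perception code (1 = high, 2 = low)
age = [25, 32, 41, 48, 55, 63]
perception = [2, 2, 2, 1, 1, 1]

print(gini(perception))             # impurity before splitting
print(best_split(age, perception))  # threshold that separates the two classes
```

A full CART implementation would repeat this search over every explanatory variable at each node and recurse on the two resulting subsets until the stopping limit is reached.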

Drought Characterization
Droughts can extend over many years or recur over short periods of time. A comparison between drought duration and frequency was adopted to parameterize farmers' perceptions. To understand the impact of these criteria on farmers' perceptions, it was worthwhile to visualize two areas of concern: the local reality (Baixo Jaguaribe) and the supply reservoir (Médio Jaguaribe).
Figure 3a,b presents the duration of drought events and their frequency during the period 1911-2018 for Baixo Jaguaribe and Médio Jaguaribe, respectively. The drought duration and frequency have not increased monotonically over the years. Instead, cyclical behavior was observed for both. Drought duration reached its peak recorded value in the most recent drought (which began in 2012 and lasted seven years), which affected both regions. Drought frequency has increased since the 2000s in both regions.
The 2012-2018 drought, which severely affected the northeast region of Brazil, deserves special attention, because it is essential for understanding the perceptions of farmers in relation to drought characterization. Rainfed agriculture suffered the most over the first two years of the drought [22], during 2012-2013, causing many farmers to completely abandon their crops. The abandoned soil adapted to dry conditions, and the native vegetation recovered despite the drought. Irrigated agriculture was practically exempt from consequences at the beginning of the drought; its water supply was guaranteed by multi-annual reservoirs with considerably high volumes of stored water. However, the long drought period drastically reduced these levels, affecting the allocation of water used for irrigation. The Tabuleiro de Russas irrigated perimeter was directly impacted by these restrictions, resulting in the abandonment of cultivated land and the loss of a large crop area.
The drought events for the two studied river basins in the state of Ceará were identified using SPI12 (Figure 4a,b). For the 2012-2018 drought, the majority of the SPI12 values were found to be within the range of −1 to 0 (mild) or −1 to −1.49 (moderate), which was linked to the persistence of this event. Extreme drought conditions occurred between 2012 and 2014, with the SPI values reaching and exceeding the threshold of −2. Despite being the longest drought on record, it was not the first drought to reach this magnitude in these regions. The events in the 1930s and during the 1970s/1980s, for instance, are displayed in the historical series (Figure 1), and were highlighted by the farmers as difficult years for agriculture.
The intensity of drought in Médio Jaguaribe remained constant after 2015, while the drought intensity for Baixo Jaguaribe decreased after 2017. To compare these results with the field survey, Questions 12–14 were included in the questionnaire (Appendix A) to represent the long-term memory of the irrigators. In comparison, field research identified 2015–2018 as the most severe and damaging years for farmers, i.e., 90% of irrigators identified the years 2015–2018 as drought years (Figure 5). Of these, 10%, 17%, 17%, and 23% indicated, respectively, that 2015, 2016, 2017, and 2018 was the most severe drought year within the time period they worked with irrigated agriculture.
Even though 2013 was the most critical year in terms of drought intensity according to the SPI (Figure 3), the drought years that most affected the farmers were concentrated after 2015. Of these, 2018 is worthy of attention, as it was considered the most severe drought year based on the farmers' perceptions, although this is contradicted by the meteorological data. This fact can be explained by the decrease in the storage levels of the reservoir after 2016, which significantly affected the users' perceptions.
Figure 6 displays the water reserves of Castanhão over the period 2010-2018, showing a notable reduction in the percentage of stored water since 2012. The reservoir reached its dead volume in 2017, which in turn negatively influenced irrigated agriculture within the perimeter in 2018 and led to a hydrological drought [47]. At the beginning of the drought, the reservoir held more than 75% of its capacity, since 2011 was quite rainy in the region. These reserves were used to guarantee the supply during the persistent years of drought, making the reservoir work as a filter that prevented the irrigators from perceiving the severity of the existing drought.
The impacts of the drought felt at the Tabuleiro de Russas irrigated perimeter can be expressed as a function of the volumes of water supplied by the Castanhão reservoir that were allocated to the perimeter during 2012-2018 (Table 3). In the state of Ceará, due to the concentration of rainfall in only one half of the year, the water allocation decision is made in July, in a participatory process based on the rainy season (February to May). According to values provided by the DNOCS (Table 3), the negotiated allocation of water to the perimeter decreased only from 2015 onwards, even though the region had been facing meteorological drought since 2012, which shows a time lag between the beginning of the drought and the beginning of its impacts on the irrigation perimeter.

Farmers' Perception to Drought
Drought was characterized by 60% of the respondents as lack of rain, indicating that most respondents perceived the drought to be the result of a meteorological phenomenon. This perception may be linked to the trajectory of rainfed agriculture experienced by the farmers before they settled on the perimeter. From the perspective of irrigated agriculture, drought can be better defined in terms of a hydrological phenomenon associated with supply reservoirs than in terms of a local meteorological phenomenon. Therefore, hydrological drought is the most likely explanation for the phenomenon of droughts in the irrigated areas.
Regarding the reason for the droughts, 70% of the respondents defined natural disasters as the main cause for droughts, compared to 23% who blamed lack of planning by the responsible agencies, and 7% who pointed to poor water management as the main reason. Thus, the drought was intrinsically associated with local precipitation, with the issue of poor water management comparatively ignored. As such, these numbers were consistent with the farmers' understanding of drought as a meteorological phenomenon.
For the drought frequency and duration, the classification of the importance of the explanatory variables via the RF method is shown in Figure 7a,b. Farmers' perception was considered at two levels: perception of the temporal evolution of drought frequency and perception of the increase or decrease in drought frequency.
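To make the idea of an importance ranking concrete, the following is a minimal sketch and not the procedure used in this study. It replaces a full RF with a deliberately simplified stand-in: bootstrapped one-split trees, each built on one randomly drawn variable, scored by the mean decrease in Gini impurity. The function names, the data, and the rule that generates the labels are all hypothetical and were invented for illustration only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_stump_gain(values, labels):
    """Largest impurity decrease achievable with one threshold on one variable."""
    best = 0.0
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v < threshold]
        right = [lab for v, lab in zip(values, labels) if v >= threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, gini(labels) - weighted)
    return best


def stump_forest_importance(table, labels, n_trees=200, seed=0):
    """Impurity decrease per variable, averaged over all bootstrapped stumps."""
    rng = random.Random(seed)
    names = list(table)
    n = len(labels)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        name = rng.choice(names)                     # one random candidate variable
        values = [table[name][i] for i in rows]
        totals[name] += best_stump_gain(values, [labels[i] for i in rows])
    return {name: total / n_trees for name, total in totals.items()}


# Hypothetical data (NOT the survey responses): 60 invented respondents.
data_rng = random.Random(1)
land = [data_rng.choice([1, 2, 3]) for _ in range(60)]
table = {
    "land": land,
    "age": [data_rng.randint(30, 60) for _ in range(60)],
    "info": [data_rng.randint(0, 4) for _ in range(60)],
}
# Invented rule: only the number of plots drives the label in this toy example.
labels = ["high" if plots <= 2 else "none" for plots in land]

importance = stump_forest_importance(table, labels)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Because the invented labels depend only on `land`, that variable is ranked first, while `age` and `info` receive small scores that reflect chance splits. A full RF grows deeper trees and evaluates several candidate variables at each split, so its scores would differ in magnitude.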
With respect to drought duration (Figure 7a), the RF method indicated that the number of actively cultivated land plots belonging to the farmer (land) and the years of work experience of the farmer in the agriculture sector (time) were the most important explanatory variables for drought perception. Other significant explanatory variables were: the respondent's opinion regarding the reason for drought (reason), age of the respondent (age), and the respondent's level of education (educ).
With respect to drought frequency (Figure 7b), the RF method indicated that the respondent's age (age), education level (educ), and number of actively cultivated land plots (land) were the most important variables in influencing the farmers' perceptions. Other significant explanatory variables, in order, were: the number of droughts experienced (time), years of work experience in the agriculture sector (years), and the number of climate information sources (info).
Predictors pertaining to cultivated land (land), age of the respondent (age), years of work experience in the agriculture sector (time), and level of education (educ) were among the top five (although not necessarily in that order) for both the duration and frequency analyses, indicating their importance in understanding farmers' perceptions.
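The overlap between the two rankings can be checked with a few lines of code. The sketch below uses the short variable labels exactly as they are listed in the text for Figure 7a,b; treating the listing order as the importance rank is an assumption, and the helper name `shared_top` is hypothetical.

```python
# Variable labels in the order they are listed in the text for Figure 7a,b.
# Assumption: the listing order reflects the importance rank.
duration_ranking = ["land", "time", "reason", "age", "educ"]
frequency_ranking = ["age", "educ", "land", "time", "years", "info"]


def shared_top(first, second, k=5):
    """Return the labels that appear in the top k of both rankings."""
    return sorted(set(first[:k]) & set(second[:k]))


shared = shared_top(duration_ranking, frequency_ranking)
print(shared)
```

Under that assumption the result contains the four labels named above: age, educ, land, and time.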
A comparison with the literature showed that some of these variables have previously been identified as influencing perception. Many factors, such as age, gender, religion, access to education, and means of communication, have been previously cited in studies on climate and risk perception [48][49][50].
Batha et al. [18] indicated that education is an important factor in the development of community resilience to the impact of drought. Poor education can be associated with marginalization and poverty; therefore, the lower the education level of farmers, the greater the chance that they are susceptible to the impacts of drought. Another hypothesis, developed by Lidoso, is that many youths lead lives that are increasingly disconnected from agricultural activities and increasingly connected to their studies and urban activities, which could influence their perception of environmental tendencies and cycles [51]. The variable educ exhibited considerable importance in our model, as shown in Figure 7b.
Udmale et al. [37] confirmed that farmers with the greatest amount of land, or the most plots, demonstrated more resilience and less vulnerability to drought scenarios, adopting measures such as changing their agricultural calendar and prioritizing crops that consumed less water. These practices were less frequently observed among farmers with smaller and more restricted land holdings. In our study, the respondents were mainly distinguished by their cultivated crops and the size of their plots, since their irrigation and handling techniques were largely similar. Drought had a different effect on these farmers depending, for example, on their physical environment, type and degree of involvement in agricultural activities, and the level of impact on their financial well-being. A relationship between the area of cultivated land and farmer productivity can be established under the hypothesis that more cultivated land is associated with larger producers, leading to greater purchasing power and, possibly, access to newer technologies. This discussion is highly pertinent, given that this variable was shown to have considerable importance for both analyses.
Years of experience in the agricultural sector (years) also played an important role and significantly affected the recovery and expectations of farmers [19]. As such, it was necessary to observe how the daily lives and experiences of farmers affected their perception of drought as well as their resilience and adaptation to the impacts and risks. Ashraf et al. [19] analyzed the experiences of rural workers and showed that the most experienced farmers perceived variations in temperature and rainfall as more significant, while those less experienced claimed to be unaware of the role played by these factors or simply exhibited no opinion. Years of experience was also an important variable in this study, as shown in Figure 7b. In this manner, it can be inferred that years of experience of a farmer in the agriculture sector is a major factor with respect to his/her adaptation to drought scenarios, which, consequently, influences his/her perception of drought.
A correlation between age and education level was observed. In this study, the age of the respondents ranged from 30 to 60 years, with 37% of them being 40-50 years old. This age concentration can be explained by various factors, including incentives to educate their children (given the ease of access to education compared to past decades), which in turn points to a lower involvement of the younger age group in agricultural practices. In addition, the difficulties imposed by low levels of water availability for agriculture cause young people to avoid agriculture and pursue other livelihoods. Figure 7b highlights age as the most important predictor for the perception of drought frequency.

Synergy between Explanatory Variables
The synergies between explanatory variables were also analyzed using a DT. For drought duration, the method identified cultivated land (land), number of droughts experienced (years), and age of the respondent (age) as the most important variables for explaining and classifying the farmers' perceptions; for drought frequency, it identified age of the respondent (age), level of education (educ), and gender of the respondent (gender).
With respect to drought duration (Figure 8), each independent variable was accompanied by its respective decision threshold value. The construction of the DT involved selecting the variable that maximized the quality of the dataset and minimized its entropy. In this case, the first node or root node was represented by the number of cultivated land plots, with a lower threshold of two actively cultivated plots being stipulated for the next subdivision. If this condition was not met, the second node corresponded to years of agricultural experience, for which the threshold limit was set to less than 5.5 experienced drought years. If the second node was negative, the model considered a third node, using the variable age and a threshold greater than or equal to 51 years. The percentage of observations for the respective response appears below each result in Figure 8.
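The entropy criterion mentioned above can be illustrated with a short sketch. It computes the Shannon entropy of a set of responses and selects the variable and threshold whose split reduces that entropy the most (the information gain). The function names and the eight invented respondents are hypothetical; the thresholds found here are not those of Figure 8.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value < threshold`."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_split(table, labels):
    """Return (gain, variable, threshold) for the most informative split."""
    best = (0.0, None, None)
    for name, values in table.items():
        distinct = sorted(set(values))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            gain = information_gain(values, labels, threshold)
            if gain > best[0]:
                best = (gain, name, threshold)
    return best


# Hypothetical toy data (NOT the survey responses): 8 invented farmers.
table = {
    "land": [1, 1, 1, 1, 2, 3, 3, 3],
    "age": [35, 52, 41, 58, 47, 44, 39, 60],
}
labels = ["high", "high", "high", "high", "high", "none", "none", "high"]

gain, variable, threshold = best_split(table, labels)
print(variable, threshold, round(gain, 3))
```

In this toy example the root split falls on `land`, because separating the farmers with three plots from the rest leaves one branch completely pure. A full DT would then repeat the same search inside each branch.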
Similarly, Figure 9 analyzes the relationships between the explanatory variables with respect to drought frequency. The analysis was conducted in a similar manner as that shown in Figure 8. The first node considered the variable age, the most important factor in the RF analysis, with a threshold greater than or equal to 57 years. If this condition was met, a further sub-division was tested for a new threshold, greater than or equal to 58 years. This process continued until an age value of greater than 68 years. If the condition for the first node was not met, education was then considered, with a lower threshold of 5 (incomplete higher education), corresponding to a positive incidence of 83%. If this condition was not met, a new node was considered for gender (female = 0, male = 1), in which the incidence of women was the same as that of men (considering the prior results).
The parameters identified in the DT are shown in the captions of the figures, corresponding to the group of variables affecting perception and the percentage of observations in the node seen under each partition.
Comparing the DTs helped to identify the similarities between the group results. For drought duration (Figure 8), the sum of the percentages that indicated high perception at the end nodes comprised 90% of the responses, compared to 10% in the group with no perception and no answers for the other groups. Similar results were presented for drought frequency, but with responses in all groups; the total responses were 89%, 3%, 3%, and 3% for the group with high perception, those who were indifferent, those with low perception, and those with no perception, respectively (Figure 9).
Analyzing the trees separately, Figure 8 indicates that farmers with more than two cultivated land plots, who had lived through more than 5.5 years of drought, and were aged 51 years or younger presented no perception (Group 3). These results do not agree with those of Udmale et al. [37] and Ashraf et al. [19], who reported high perception among farmers with more cultivated land plots and broader experience with drought. However, our hypothesis is that the perception of drought was lower when analyzed in terms of duration for farmers with greater land ownership, given that these individuals also have greater purchasing power and can adapt more effectively to dry scenarios, either through the acquisition of new technologies or through easier access to technical knowledge.
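The path described above can be written as a single rule. The sketch below is only one reading of the prose: the function name `duration_perception` is hypothetical, the handling of boundary values (for example, exactly two plots or exactly 51 years) is an assumption, and every other path is taken to end in high perception because only the high-perception and no-perception groups received responses for drought duration. Figure 8 remains the authoritative source for the tree.

```python
def duration_perception(land, drought_years, age):
    """One reading of the Figure 8 path described in the text.

    Assumption: the comparison operators follow the wording of the paragraph;
    the published figure may treat the boundary values differently.
    """
    if land > 2 and drought_years > 5.5 and age <= 51:
        return "no perception"
    # Assumption: all remaining paths end in the high-perception group.
    return "high perception"


# Hypothetical respondents, invented for illustration only.
print(duration_perception(land=3, drought_years=6, age=45))
print(duration_perception(land=1, drought_years=6, age=45))
```

Writing the path this way makes it explicit that three conditions must hold at once for a respondent to fall into the no-perception group.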
Age was the most important factor that guided farmers' decisions relative to drought frequency. Education level and gender were also selected by the DT as important predictors. According to the DT in Figure 9, farmers aged 57 years or younger (90% of the data) with incomplete higher education (educ > 5) were influenced by gender, with women showing high perception (perc = 1) and men showing low perception (perc = 3). Those who were indifferent and those who had low perception were captured in the DT branches associated with farmers aged less than 63 or 58 years, respectively. As this parameter is associated with frequency, a lower age plays a role in the farmers' experience of drought and the manner in which they tracked its recurrence.
Comparing the methods of analysis, the number of cultivated land plots (for duration) and age (for frequency) were revealed to be the most important in the RF method, and this result was confirmed by the DTs. Thus, these variables are fundamental for understanding farmers' perceptions of drought. The vast majority expressed a perception that was classified as "high", and approximately 10% of the farmers exhibited indifference or lack of perception with respect to drought risk scenarios.

Conclusions
The objective of this study was to evaluate farmers' perceptions of the drought that occurred from 2012 to 2018 in the Tabuleiro de Russas irrigated perimeter of Ceará state, northeast Brazil, using SPI values to characterize the drought and machine learning algorithms to evaluate the importance of the explanatory variables for the farmers' perceptions.
An SPI analysis over a 12-month period for both hydrographic regions allowed a hydrological analysis of the 2012-2018 drought, with the aim of determining the farmers' perception of the drought. The farmers, who use irrigated agriculture supplied by a large reservoir, classified the drought as meteorological, which contrasts with our statistical analysis.
The SPI results indicate that the beginning of the drought (2012-2013) was the most severe; however, irrigation was not affected because the reservoirs held a large volume of water owing to an elevated inflow in the previous year. As the drought persisted, water levels fell drastically, resulting in the introduction of stern measures to cut water demand, including the reduction of water allocation for irrigation. Therefore, the farmers' perception of drought indicated that 2016 was the most severe period. These characteristics mean that the drought in the region, as perceived locally, is better described in hydrological than in meteorological terms. The lack of a single drought definition makes it difficult for stakeholders to understand the real drought state during an event. Understanding farmers' perceptions of drought may facilitate communication and increase engagement with proposed measures to mitigate the impacts.
The field research was directed to small irrigators (1-3 irrigation lots) due to the social network of the respondent and the difficulty of access to irrigated areas. The farmers' perceptions of drought were characterized using nine explanatory variables: gender, age, experience in the agriculture sector, education, number of actively cultivated land plots, years of droughts experienced, opinion of the main cause of droughts, number of information sources regarding climate, and participation in discussion groups regarding droughts. These variables were evaluated with the RF method to determine their importance in understanding farmers' perceptions of drought.
The number of actively cultivated land plots, age of the farmer, and education level were among the five most important variables for both analytical criteria (the duration and frequency of droughts).
The importance of the first two was reaffirmed by the DT analysis. It is understandable that the increase in the number of cultivated land plots is a major factor influencing access to more effective irrigation methods and technology, which affects farmers' experiences of the drought. The importance of age is also understandable, since the age of the farmer is associated with his/her experience in both drought scenarios.
The analysis of this study could be adopted to formulate strategies for allocating water, thereby leading to participatory management. The perceptions of farmers could also be incorporated when developing preparatory plans for drought and adopting educational measures to raise awareness regarding water use and the optimal tools for adapting to drought.
This study is important because it identified the most influential variables affecting the technical understanding and perception of farmers regarding drought, and the methodology used in this work can be replicated in other regions to help them to adapt to water deficit scenarios. This work could also be continued with a methodological model of perception, taking the variables selected by the machine learning applications and adjusting the criteria for analysis and applicability to different sectors that use water.
Facing drought events adequately is crucial for the proper management of water resources. In present-day society, it is unacceptable that participatory water resource management is not promoted. Considering farmers' perception of drought as part of the decision-making process is fundamental in planning measures to mitigate drought and would facilitate their engagement with proposed measures.