Modeling Sentiment–Hydrology Interaction Using LLM: Insights for Adaptive Governance in Ceará’s Water Management

Batista, Tatiane Lima; Studart, Ticiana Marinho de Carvalho; Duarte, Marlon Gonçalves; Souza Filho, Francisco de Assis de

doi:10.3390/w17172615

by Tatiane Lima Batista ¹,*, Ticiana Marinho de Carvalho Studart ¹, Marlon Gonçalves Duarte ² and Francisco de Assis de Souza Filho ¹

¹ Department of Hydraulic and Environmental Engineering, Federal University of Ceará, Pici Campus, Bloco 713, Fortaleza 60400-900, CE, Brazil

² Master’s and PhD Programs in Computer Science, Federal University of Ceará, Pici Campus, Bloco 910, Fortaleza 60440-900, CE, Brazil

* Author to whom correspondence should be addressed.

Water 2025, 17(17), 2615; https://doi.org/10.3390/w17172615

Submission received: 6 August 2025 / Revised: 29 August 2025 / Accepted: 30 August 2025 / Published: 4 September 2025

(This article belongs to the Special Issue Application of Hydrological Modelling to Water Resources Management)

Abstract

This study analyzes the relationships between the concerns and sentiments of stakeholders and the drought stage in a semi-arid region of Ceará, using Language Technologies based on Artificial Intelligence. The dataset comprises 36 meeting minutes of water management bodies (2007–2024), of which 17 correspond to dry periods and 19 to normal periods (reservoir volume > 50%). Natural Language Processing (NLP) techniques were applied to generate word clouds, and sentiment analysis was performed using a Large Language Model (Llama 3.2, 3B). Sentiment scores were compared with reservoir volume data. Results show that both perceptions and themes differed between drought and normal phases, with higher water availability coinciding with more positive sentiments. A moderate positive correlation was found between sentiment and reservoir volume (r = 0.53, p = 0.00095, 95% CI [0.24, 0.73]). Statistical tests confirmed differences between periods (Welch’s t-test, p = 0.0018; Mann-Whitney, p = 0.0039). Box-plot analyses indicated that over 75% of sentiments were positive in normal phases, while about 65% were negative in drought phases. These findings highlight the sensitivity of human perceptions to hydrological conditions and point to the potential of LLMs as innovative instruments for integrating qualitative data into complex socio-environmental analyses.

Keywords:

drought; natural language processing; socio-hydrology; artificial intelligence

1. Introduction

Hydrological phenomena are complex and deeply connected to social dynamics, exerting a reciprocal influence that is fundamental to a comprehensive understanding of water resource management. The demand for water resources to ensure water, food, and energy security for multiple uses imposes the need for management strategies that reconcile cooperation and conflict resolution, considering the specificities of each region [1]. In this context, recent studies have been dedicated to investigating the complex interactions between social and hydrological systems, seeking to advance the understanding of these relationships [2,3,4,5,6].

The literature still reflects a significant gap, often limited by traditional quantitative approaches that may neglect humanistic aspects and local perceptions. The use of qualitative data has, therefore, been a critical addition to these analyses, providing a dimension that illustrates the complex interactions and human influences on natural events [7].

In this context, sentiment analysis emerges as an alternative with the potential to assist in this task [8]. Sentiment analysis was used to monitor public opinion during the water crisis in Chennai. The research analyzed data from social media to capture the emotional reactions of the population to water scarcity, revealing frustrations and concerns that could better inform communication and crisis management strategies. In another study, [1] explores socio-hydrological modeling in the Lancang-Mekong River. This study integrates sentiment analysis to assess cross-border cooperation in response to extreme climate events, such as floods and droughts. The research highlights the importance of considering human perceptions in hydrological models to promote the sustainable management of water resources.

Sentiment analysis has proven useful in climate events, for example, where public perceptions can influence government responses [9]. The ability to analyze large volumes of textual data from social media and other sources allows for a real-time assessment of the emotions and concerns of affected communities. This is particularly important in the context of climate change, where public perception can influence the acceptance and success of implemented policies [10].

Droughts have been reported in Northeast Brazil (NEB) since the colonial period. The region is characterized by a predominantly semi-arid climate, marked by long periods of drought [11]. The semi-arid region is characterized by factors such as strong climatic seasonality, high variability of rainfall and flows, shallow soils predominantly over crystalline rocks, and high evapotranspiration rates, making the impacts of drought even more severe, such as those that occurred in 1877–1879, which led to the death of approximately 500,000 people in Ceará [12].

According to the study by [12], Northeast Brazil (NEB) experienced one of the worst droughts ever recorded, from 2012 to 2018, resulting in devastating and widespread impacts on water storage, agriculture, livestock, and industry. In the State of Ceará alone, 39 of the 153 monitored reservoirs completely collapsed, another 42 reservoirs reached their minimum operational water level, and 52% of the municipalities in the State faced interruptions in water supply until the end of 2016.

Furthermore, climate change has intensified climatic variability and complicated the management of water resources, especially in regions that already exhibit high natural variability, such as Ceará [13]. In this context, adaptive governance emerges as an instrument to strengthen resilience and society’s capacity to adapt to climate change by incorporating diverse social and environmental perspectives into decision-making processes [14].

Societies adapt to the environment shaped by climatic factors [15], and in the case of Ceará, this adaptation was centered on two pillars. The first was the construction of large surface reservoirs, used as a public policy to help deal with drought, and the second was participatory water management [16]. The River Basin Committees and the Management Commissions of Water Systems belong to this second pillar, the latter being a particularity of the water resource management model of the State of Ceará [17].

In the exercise of water governance in Ceará, fundamentally based on popular participation, a large volume of text documents is generated over time, such as minutes of meetings of entities, technical and administrative reports, and even opinions issued through social networks on topics associated with water management. This range of information, associated with the availability of adequate computational models, such as Language Technologies based on Artificial Intelligence (AI), enables the investigation of spatiotemporal patterns and relationships between various variables, which can generate important information for water resource management [18].

Large Language Models (LLMs) present remarkable textual analysis capacity, allowing not only the identification of linguistic and structural patterns but also the extraction of complex semantic insights. These models are capable of interpreting the global context of a text, identifying central themes, and even detecting discursive nuances that may go unnoticed by human eyes or in traditional natural language processing approaches [19].

Furthermore, LLMs possess advanced competence in sentiment analysis, going beyond the simple categorization of positive, negative, or neutral polarity. They can identify emotional subtleties, such as irony, ambiguity, and variations in tone, providing a more in-depth understanding of communicative intentionality. This capability makes LLMs valuable tools for applications that demand refined interpretation of language, such as opinion analysis, discourse monitoring, and persuasive communication studies [19].

In this context, socio-hydrology, which studies the interactions of human beings with water, starts from the premise that a complete understanding of these water systems requires including people in the analysis [7]. Integrating qualitative perspectives into environmental modeling through hybrid frameworks has proven necessary to encompass the complexity of water resource management. Two examples from the literature illustrate this: [20] proposes a socio-hydrological framework that incorporates social processes into hydrological models, allowing coevolutionary dynamics between social and ecological systems to be observed more realistically; [21] reinforces this approach by highlighting that hydrological modeling in isolation is insufficient to deal with water management problems, and that social actors must be involved through processes that strengthen trust in the models and their legitimacy.

This research follows this approach and aims to analyze the relationships between the concerns and sentiments of stakeholders and the drought stage in a semi-arid region of Ceará, using Language Technologies based on Artificial Intelligence.

This work explored the minutes of meetings of the Management Commission of the Arneiroz II Reservoir through the use of Language Technologies based on Artificial Intelligence (AI). Initially, a word cloud was generated through Natural Language Processing (NLP) to identify recurring terms in the texts. Then, an LLM was used to perform sentiment analysis, allowing an assessment of the perceptions expressed in the minutes and their relationship with the drought in the region. Given the critical context of water resource management in semi-arid regions such as Ceará, this research is relevant because it integrates sentiment analysis and AI-based Language Technologies into socio-hydrology. This integration offers a new perspective on the governance of water resources, enabling more informed and effective mitigation and adaptation strategies in the face of droughts.

This research focuses on the Management Commission of the Arneiroz II Reservoir, located in the Hydrographic Region of Alto Jaguaribe, in the State of Ceará, which served as a microcosm to study the interrelationship between human management and hydrological variability. This dam was chosen because it lies in a region severely affected by the 2012–2018 drought [12] and because minutes of its Management Commission exist for the period from 2008 to 2024, which covers the period to be analyzed. The examination of historical minutes, especially those covering the critical period of the 2012–2018 drought, offers insights into how community decisions, perceptions, and emotions were modulated by water availability. It is hypothesized that the volume of water in the dam is related to the sentiments expressed by the community that manages it. The minutes of meetings of the Management Commissions, which stand out for their inclusive and participatory approach, provide qualitative data that can capture discontinuities or social transitions that purely quantitative models could ignore.

The Large Language Model represents a technological innovation by enabling context-aware sentiment analysis in formal and extensive texts, overcoming the limitations of traditional NLP methods. Furthermore, integrating these qualitative sentiment results with hydrological indicators, linking human perspectives to reservoir volume data, constitutes a second level of innovation in this study.

2. Materials and Methods

The study area of this work was the Arneiroz II Reservoir, located in the municipality of Arneiroz in the state of Ceará, Northeast region of Brazil. The Arneiroz II Reservoir, which dams the Jaguaribe River, was built in 2005 and has a storage capacity of 178.13 hm³, being part of the Hydrographic Region of Alto Jaguaribe (RHAJ) (Figure 1). The RHAJ is located in the southwest/south portion of Ceará, between latitudes 5°24′15″ S–7°23′00″ S and longitudes 38°47′26″ W–40°51′14″ W, occupying an area of 25,261.77 km² [22].

The state of Ceará is located in the semi-arid climate region. Surface reservoirs are the main source of water in the region for multiple uses. The Management Commissions of surface reservoirs therefore arise from the need to manage water reservoirs more closely, mediating conflicts over water uses. The Management Commissions represent an evolution of the User Commissions of surface reservoirs, acquiring greater formality and standing out as organizational structures aimed at meeting the growing management demands in the strategic reservoirs of Ceará. Linked to the Integrated Water Resources System, these commissions aim to defend the principles established by the State Water Resources Policy (Law No. 14,844/2010, Brazil), promoting local organization, regulation of activities carried out in the surface reservoirs, and the insertion of the normative apparatus of water resources in the management routine of the reservoirs [17]. Consequently, the meeting minutes become important records of the decision-making process, as they capture conflicts, community concerns, and the negotiation dynamics that shape local water governance. The Arneiroz II Reservoir has officially had a Management Commission since 2010.

The conceptual framework used is the Integrated Socio-Hydrological Analysis (ASHI). This framework integrates AI-based sentiment analysis with quantitative hydrological data to understand socio-hydrological dynamics.

The research consists of five stages (Figure 2).

2.1. Data Collection and Organization

The data used in this work were:

Qualitative data: Minutes of meetings of the Management Commission of the Arneiroz II Reservoir (2007–2024)
Quantitative data: historical series of volumes (2005–2024), in percentage, of the reservoir (Figure 2).

Both are public information and were provided by the Water Resources Management Company of the state of Ceará [22]. The volume data of the historical series are available in a daily format, but with some days without data in some months. To better visualize the data, they were organized into monthly averages for each month of the historical series, from March 2005 to December 2024 (Figure 3).
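The monthly aggregation described above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `monthly_means` and the sample readings are hypothetical, and the text does not state which tool was actually used for this step. Days without data are simply absent from the input, so each monthly mean uses only the days that were recorded.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(daily):
    """Average daily volume readings (percent of capacity) by calendar month.

    `daily` is an iterable of (date, volume_percent) pairs. Missing days are
    not filled in; each mean is taken over the readings that exist.
    """
    buckets = defaultdict(list)
    for day, volume in daily:
        buckets[(day.year, day.month)].append(volume)
    # Sort by (year, month) so the result follows the historical series.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Illustrative values only, not data from the study.
sample = [
    (date(2012, 8, 1), 52.0),
    (date(2012, 8, 15), 50.0),
    (date(2012, 9, 3), 48.0),
]
result = monthly_means(sample)
```

Here `result` maps `(year, month)` keys to the mean percentage volume for that month.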

The most recent drought, which started around 2012, reduced the volume of water stored in the region's reservoirs, as can be seen in Figure 4. The volume of the Arneiroz II Reservoir decreased from the beginning of the series in 2011 until 2016, when a recharge raised the level to almost 30%. It then fell again, with little or no recovery around March of the following years. The next significant recharge occurred in 2020, when the volume exceeded 90%, starting a visibly different period (2020 to 2024) in which the volumes were always above 50%.

The period from March 2005 to March 2008 was also an unfavorable period for the reservoir, in which the volumes were always below 50%. From April 2008 to August 2012, the volumes were always above 50% (Figure 3). Four distinct periods can therefore be defined from the available data: a first dry period (March 2005 to March 2008), a first period with favorable water availability (April 2008 to August 2012), a second dry period (September 2012 to March 2020), and a second period with favorable water availability (April 2020 to December 2024).

In terms of annual averages of stored volume, the effect of the 2012 drought can also be seen in Figure 4a. The year 2012 has a percentage volume above 50% but marks the start of a decline in the graph, followed by a small increase in 2016 and a more favorable period from 2020 to 2024. The average volumes accumulated in each month of the year (Figure 4b) reflect the rainfall regime of the region that supplies the reservoir: rain falls mainly in the first half of the year, with little or none in the second half. The highest averages occur between April and June, with a fall until January and an increase from January to May (rainy season).

The meeting minutes received were organized by date from 2007 to 2024. In this research, inclusion criteria based on the relevance and adequacy of the document for sentiment analysis were adopted. After excluding duplicate documents, 39 minutes remained for analysis. Of these, three documents corresponded exclusively to administrative meetings (e.g., regulation of the committee, training, and member appointments) and were excluded, leaving 36 minutes for further analysis. Of these 36 documents, 17 correspond to the dry period and 19 to the normal period.

Ceará State is characterized by pronounced seasonality in precipitation patterns, concentrated within four months—February through May. During the remaining eight months, precipitation is close to zero. Consequently, at the end of the rainy season, Ceará’s reservoirs reach their maximum storage for any given year.

The volumes of water from the reservoirs that will be available to users throughout the dry season (and the associated risks) are negotiated. The decisions are endorsed by COGERH, which operates the reservoir system and verifies water uses according to the stakes defined in the participative decision-making process.

In isolated reservoirs, that is, those that are not part of a perennialized valley system (whose reservoirs operate jointly), the management of accumulated water is governed by the Reservoir Management Commission.

The Reservoir Management Commissions are watershed-based organizations affiliated with the River Basin Committees (CBH). They comprise water users, representatives from organized structures, and public sector representatives, featuring a plenary assembly and secretariat in their organizational structure, and operate exclusively within the scope of their respective reservoir. Their institutional function consists of negotiated water allocation [23], conducted invariably after the rainy season, when the discharge to be released from the reservoir for each use during the eight-month dry season is determined through participatory processes. During these sessions, the reservoir Management Commissions not only establish water allocation but also mediate inter-user conflicts and formalize a collective framework for sustainable water resource utilization.

Due to the seasonal characteristics of rainfall patterns, the Management Commissions convene only a few times per year, resulting in a limited number of meetings and, consequently, a restricted corpus of minutes. This article analyzes the complete repository of relevant deliberative records from the Management Commission of the Arneiroz II Reservoir over a 17-year period, thereby ensuring the comprehensiveness and representativeness of the decision-making process dataset.

Furthermore, the Llama 3.2 (3B) model was implemented in inference mode, which obviates the need for extensive training datasets, as it leverages its pre-trained knowledge base to extract contextual sentiment patterns.

2.2. Textual Analysis

The 36 minutes were evaluated in search of two pieces of information: (1) a summary of the main themes addressed from the word cloud and (2) a sentiment analysis of the text.

2.2.1. Word Cloud

For the first step of the textual analysis of the minutes of meetings of the Management Commission of the Arneiroz II Reservoir, a natural language processing (NLP) methodology was applied with the aim of identifying the most recurring terms in the documents and generating a word cloud representative of the content discussed.

Initially, the necessary packages were installed and loaded in the R language (version 4.4.0), including pdftools for extracting text from PDF files, tm for text processing, wordcloud and ggwordcloud for visualizing the terms in a word cloud, and ggplot2 for adjustments in the visual representation.

Then, the minutes were imported from a PDF file containing the records of the meetings held in the respective period. The text was extracted and concatenated into a single textual corpus, allowing its manipulation and cleaning.

The pre-processing of the texts involved the following steps: conversion of all text to lowercase; removal of punctuation and numbers; exclusion of stopwords and specific terms considered irrelevant for the analysis, such as proper names and generic terms related to the structure of the minutes; and elimination of extra white spaces.

After pre-processing, a term matrix was created, allowing the counting of the frequency of occurrence of each word. The visual representation was generated using ggwordcloud, where the most frequent terms were highlighted through variations in size and color. The color scheme was adjusted to highlight the most recurring words in shades of red and blue. To avoid terms of low relevance, only words with a frequency equal to or greater than five were included in the analysis.
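The pre-processing and frequency filter described above were carried out in R with the tm package. Purely as an illustration of the same steps, the sketch below reproduces them in Python with the standard library. The stopword set is a hypothetical placeholder (the study's actual list, including proper names and structural terms, is not given in the text), and the function name `term_frequencies` is an assumption.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "and", "to", "a", "in"}


def term_frequencies(text, stopwords=STOPWORDS, min_count=5):
    """Return {term: count} for terms occurring at least `min_count` times."""
    text = text.lower()                      # convert to lowercase
    text = re.sub(r"[^\w\s]|\d", " ", text)  # drop punctuation and numbers
    tokens = text.split()                    # split() also collapses extra spaces
    counts = Counter(t for t in tokens if t not in stopwords)
    # Keep only terms with frequency equal to or greater than the cutoff.
    return {term: n for term, n in counts.items() if n >= min_count}


# Illustrative text only, not taken from the minutes.
demo = term_frequencies("Flow, flow; FLOW! flow 2016 flow. The release of release")
```

In this toy input only "flow" reaches the cutoff of five occurrences, so it is the only term that would be passed on to the word cloud.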

Two periods were considered (Dry and Normal) with the corresponding meeting minutes. The months in which the reservoir volume was below 50% were considered as Dry periods. The rest of the analyzed period (volume greater than 50%) was considered as Normal. Therefore, two word clouds were created, one for each period, with the intention of trying to summarize the main themes that emerge from the discussions recorded in the minutes, and whether it is possible to perceive differences between the themes addressed in each period.
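The period rule can be written as a one-line classifier, which is then used to split the minutes into the two groups that feed the word clouds. The sketch below is illustrative: the function names and the sample volumes are hypothetical, and treating a month at exactly 50% as Normal is an assumption, since the text only specifies "below 50%" and "greater than 50%".

```python
def classify_period(volume_percent, threshold=50.0):
    """Label a month 'Dry' when stored volume is below the threshold."""
    # Assumption: a volume exactly at the threshold is labeled Normal.
    return "Dry" if volume_percent < threshold else "Normal"


def group_minutes(meeting_volumes, threshold=50.0):
    """Split meeting identifiers by the period of their meeting month.

    `meeting_volumes` maps a meeting identifier to the reservoir volume
    (percent of capacity) in the month of that meeting.
    """
    groups = {"Dry": [], "Normal": []}
    for meeting_id, volume in meeting_volumes.items():
        groups[classify_period(volume, threshold)].append(meeting_id)
    return groups


# Illustrative values only, not data from the study.
example = group_minutes({"2016-06": 28.0, "2021-06": 91.5})
```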

The definition of “Dry” and “Normal” periods based on the 50% storage threshold, although apparently simplistic, aligns with water governance practices in Ceará. The state has a unique hydrological characteristic: its water availability is almost entirely dependent on the reserves accumulated in surface reservoirs, a direct result of the intermittence of the region’s rivers and the scarcity of groundwater due to the predominance of the crystalline basement. In this context, the volume stored in the reservoirs becomes the most relevant and direct indicator of water availability in the region, integrating rainfall, evaporation, and water use.

The Arneiroz II Proactive Drought Management Plan establishes storage thresholds (níveis meta) as triggers for transitions between drought states, from Normal to Alert, Drought, and Severe Drought, with operational rules for each state [24]. For this study, the 50% threshold was adopted as a practical simplification to distinguish periods of relative water security from those requiring stricter management, reflecting a critical point recognized in local drought strategies and decision-making.

The reservoir volume is a direct reflection of the climatic conditions of the region, naturally integrating variables such as precipitation, evaporation, and water use, thus serving as an indicator of general water conditions. This definition is aligned with the water management practices adopted by Water Resources Management Company of Ceará (COGERH), reflecting the local operational reality. In addition, the analysis considers the variation of this indicator over an extended period (2005–2024), thus capturing complete cycles and long-term trends. Therefore, this approach is not only suitable for the specific context of Ceará but also provides a solid basis for the analysis of the interactions between water conditions and social perceptions captured in the minutes of the meetings of the Management Commission of the Arneiroz II Reservoir.

2.2.2. Sentiment Analysis

The sentiment analysis was performed using the Llama 3.2 (3B) model through the Ollama (version 0.11.8) framework (Figure 5), a tool for running open-source or free LLMs such as Llama, BERT, Gemma, Mistral, among others. Pretrained on extensive multilingual corpora, Llama 3.2 is recognized for robustness in contextual text analysis and its ability to capture nuanced sentiments beyond lexicon-based methods. Recent studies indicate that LLMs, including Llama variants, can compete with or surpass high-performing transfer learning models in sentiment classification, even without task-specific fine-tuning [25]. All runs were performed with the default inference parameters of Ollama, without any manual adjustment. Information about available models and instructions for using Ollama can be found in the Ollama documentation.

The experiments were performed on a personal machine equipped with an Intel Core i5-12450H processor, a 240 GB SSD, and 24 GB of RAM. Throughout the execution, the equipment remained connected to the power source to ensure maximum performance. This hardware was one of the factors that motivated the choice of Ollama, given that it runs readily in environments with limited computational power [26]. The tool allows AI models to be managed and executed efficiently in a local environment, without connecting to external servers, which is advantageous for data privacy and for scientific reproducibility, since the user controls the server.

In this step, the Python language (version 3.12) was used to access the Ollama API and invoke the model with the prompt to be analyzed. The general architecture of the system was designed in a modular way, with the aim of facilitating maintenance and allowing the replication of experiments. The LLM is accessed through a local REST API, made available by Ollama on the default port 11434. The analysis process consists of sending pre-configured prompts, containing both the entire content of the minutes and explicit instructions on the task to be performed by the model.

The source code was structured in four main scripts. The first of them, pdf_to_str.py, is responsible for extracting the text contained in the PDF files of the minutes. For this, the PyMuPDF library (version 1.26.1), also known as fitz, is used to read the textual content of each page. During this process, lines containing headers, footers, or other standardized institutional marks are automatically removed based on a similarity calculation (Jaccard index) against a list of pre-defined excerpts. The resulting text is a clean and condensed representation of the minutes, suitable for analysis without noise.
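The Jaccard-based line filter can be illustrated as follows. This is a minimal sketch, not the code of pdf_to_str.py: the text does not say whether similarity is computed over words or characters, nor which cutoff is used, so word-level sets and a threshold of 0.8 are assumptions, and the header string in the example is invented.

```python
def jaccard(a, b):
    """Jaccard index between the word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty lines are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def strip_boilerplate(lines, known_snippets, threshold=0.8):
    """Drop lines that are too similar to any known header/footer excerpt."""
    return [
        line
        for line in lines
        if not any(jaccard(line, s) >= threshold for s in known_snippets)
    ]


# Hypothetical header text and page lines, for illustration only.
snippets = ["Water Resources Management Company"]
page = ["Water Resources Management Company", "The approved flow was maintained"]
cleaned = strip_boilerplate(page, snippets)
```

A set-based measure tolerates small differences in word order, which can help when the same header is extracted slightly differently from page to page.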

The second component is the script prompt_generator.py, which receives as input the path to the PDF of the minutes and returns the complete prompt to be sent to the model. This prompt is composed of an introduction explaining the context of the Management Commissions, followed by the entire content of the minutes, and ends with the specific analysis instructions. This step is a typical example of prompt engineering, where the clarity and adequate contextualization of the task are fundamental to the performance of the model. The prompt sent to the model had the following structure:

“I am sharing a minute from a meeting of the Management Commission of the Arneiroz II Reservoir. These commissions aim to promote local organization, regulation of activities carried out in the Superficial reservoirs, and the insertion of the normative apparatus of water resources in the management routine of the reservoirs. Below is the content of the minute:
Your task is to act as an engineer, specialist in the area of Water Resources and Sentiment Analysis, responsible for analyzing the main events related to the management of the water of the Arneiroz II Reservoir, based on the minutes.
The action to be performed is to calculate the overall polarity of the text, assigning a value (note) between −1 and +1, where −1 indicates an extremely negative sentiment, +1 indicates an extremely positive sentiment, and 0 indicates neutrality. The final answer must include a brief justification for the calculation of the note, highlighting the main points that influenced the assignment of the note.”
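The three-part structure of the prompt (context, document body, task instructions) can be sketched as a simple string assembly. The snippet is illustrative only: `INTRO` and `TASK` are abridged from the quotation above rather than reproduced in full, the function takes already extracted text instead of a PDF path, and the joining with blank lines is an assumption.

```python
# Wording abridged from the prompt quoted in the text; shortened here only to
# keep the sketch compact.
INTRO = (
    "I am sharing a minute from a meeting of the Management Commission of the "
    "Arneiroz II Reservoir. Below is the content of the minute:"
)
TASK = (
    "The action to be performed is to calculate the overall polarity of the "
    "text, assigning a value (note) between -1 and +1."
)


def build_prompt(minutes_text):
    """Assemble context, document body, and task instructions, in that order."""
    return "\n\n".join([INTRO, minutes_text.strip(), TASK])


prompt = build_prompt("  BODY OF THE MINUTES  ")
```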

Next, the script send_prompt_to_API_model.py is responsible for interfacing between the generated prompt and the language model. It loads the textual content of the minutes from a .txt file, sends the HTTP request via the POST method to the Ollama API, and processes the received response, which is streamed line by line in JSON format. The code verifies the integrity of the response and ensures that the done field is present, indicating the end of processing. The final response is a concatenation of the content of the response field, which contains the assigned polarity note and the corresponding justification.
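The streaming exchange can be sketched as below. This is not the authors' script: it uses Python's standard urllib rather than the requests library mentioned in the text, the `/api/generate` path and the model tag `llama3.2:3b` are assumptions about the local Ollama setup, and the function names are hypothetical. The parsing step is kept separate from the network call so it can be checked without a running server.

```python
import json
import urllib.request

# Assumed endpoint on the default local port mentioned in the text.
OLLAMA_URL = "http://localhost:11434/api/generate"


def collect_response(ndjson_lines):
    """Concatenate the `response` fields of a streamed, line-delimited reply.

    Raises ValueError if the stream ends without a chunk whose `done` is true.
    """
    parts, finished = [], False
    for raw in ndjson_lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            finished = True
            break
    if not finished:
        raise ValueError("stream ended without a chunk marked done")
    return "".join(parts)


def ask_model(prompt, model="llama3.2:3b", url=OLLAMA_URL):
    """POST the prompt to a local Ollama server and return the full answer."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return collect_response(reply)  # the reply object iterates line by line


# Offline check of the parser with hand-written chunks.
chunks = [
    '{"response": "Score: ", "done": false}',
    '{"response": "0.4", "done": false}',
    '{"response": "", "done": true}',
]
answer = collect_response(chunks)
```

Calling `ask_model(...)` requires Ollama to be running locally with the model already downloaded.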

Finally, the script gerar_analise_ata.py works as an orchestrator of the process, using a class called TerminalLogger that redirects the standard output and errors of the terminal to a .txt log file, allowing the complete recording of all the analyses performed. This file serves both as evidence of execution and as material for later analysis.
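One way such a logger could work is shown below. This is a hypothetical sketch, not the class from gerar_analise_ata.py: it is written as a context manager, it mirrors output to the terminal as well as the file (the text says only that output is redirected), and it sends both streams through a single writer for simplicity.

```python
import sys


class TerminalLogger:
    """Context manager that copies stdout and stderr into a .txt log file."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self._file = open(self.path, "a", encoding="utf-8")
        self._stdout, self._stderr = sys.stdout, sys.stderr
        # Simplification: both streams go through this one object.
        sys.stdout = sys.stderr = self
        return self

    def write(self, text):
        self._stdout.write(text)  # still show the text on the terminal
        self._file.write(text)    # and record it in the log file
        return len(text)

    def flush(self):
        self._stdout.flush()
        self._file.flush()

    def __exit__(self, *exc_info):
        # Restore the original streams even if the block raised.
        sys.stdout, sys.stderr = self._stdout, self._stderr
        self._file.close()
        return False
```

Wrapping the analysis loop in `with TerminalLogger("log.txt"):` would then record every printed answer for later inspection.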

From a methodological point of view, the project adopted a statistical robustness strategy by repeating the analysis of each set of minutes in 30 distinct rounds. As LLMs have a stochastic component in their text generation, this repetition allows the calculation of a more representative polarity average, reducing the impact of possible fluctuations in the model’s response. To quantify this variability, the standard deviation and 95% confidence intervals of the polarity scores across the 30 runs were computed, providing a measure of uncertainty for each estimate. It is important to note that, despite the overall consistency of the results, variations between different executions of the LLM for the same text were observed. These fluctuations are not seen as a limitation, but rather as a reflection of the complexity and subjectivity inherent in socio-hydrological interactions. The inclusion of confidence intervals highlights the importance of a robust statistical approach in the interpretation of the results.
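The per-document summary over the 30 rounds can be sketched as follows. The text does not state how the confidence interval was computed, so a Student-t interval for the mean is assumed here; the critical value 2.045 is the two-sided 95% quantile for 29 degrees of freedom and must be changed if the number of runs differs. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t quantile for 29 degrees of freedom (30 runs).
T_CRIT_DF29 = 2.045


def summarize_runs(scores, t_crit=T_CRIT_DF29):
    """Return (mean, sample standard deviation, 95% CI) of repeated scores."""
    m = mean(scores)
    sd = stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * sd / sqrt(len(scores))
    return m, sd, (m - half_width, m + half_width)


# Illustrative scores only, not results from the study.
runs = [0.2] * 15 + [0.4] * 15
avg, spread, (ci_low, ci_high) = summarize_runs(runs)
```

A narrow interval indicates that the model's answers for that document were stable across rounds; a wide one flags a document whose score should be read with more caution.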

Replication by other researchers is fully feasible. The requirements include the installation of Python, the fitz and requests libraries, as well as the local installation of Ollama and the download of the Llama 3.2 model. The expected input is PDF files containing minutes of meetings, and the output consists of .txt files with the generated analyses. The modularity of the code also allows adaptation to different contexts, such as other LLM models, different types of textual analysis, or even other structures of institutional meetings.

2.2.3. Socio-Hydrological Interpretation

For the generated word clouds, an analysis was performed of the existence or non-existence of thematic changes between dry and normal periods, reflected by the words that appeared more frequently in the set of minutes of each period.

After obtaining the polarity scores (notes) for each of the 36 minutes, this series of values was compared with the volume stored in the dam in the month in which each meeting occurred, to verify whether any sentiment pattern exists in relation to water conditions.

The relationship between volume and scores was assessed using Pearson’s correlation, with a significance level of 5%. The correlation coefficient r, the associated p-value, and the 95% confidence interval were calculated, and a scatter plot was generated with volumes on the x-axis and scores on the y-axis for visual analysis. To better evaluate the difference in scores between the Normal and Dry periods, a box plot was generated for each period. Subsequently, Welch’s t-test was applied to verify the statistical significance of the difference between the two groups, and the non-parametric Mann–Whitney test was also performed to confirm the robustness of the results.
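To make the two group-comparison statistics concrete, the sketch below computes Welch's t statistic with its Welch–Satterthwaite degrees of freedom and the Mann–Whitney U statistic using only the standard library. The function names and the sample values are assumptions for illustration and are not the study's data or code. It does not compute p-values.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a_i > b_j, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical scores, for illustration only (not the study's data).
normal = [0.6, 0.4, 0.7, 0.2, 0.5]
dry = [-0.3, 0.1, -0.5, -0.2, 0.2]
t, df = welch_t(normal, dry)
u = mann_whitney_u(normal, dry)
print(f"t={t:.2f}, df={df:.2f}, U={u}")
```

For p-values, a statistics library is the practical choice, for example `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.mannwhitneyu`.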

3. Results and Discussion

3.1. Word Cloud

The word clouds generated for the Dry (Figure 6) and Normal (Figure 7) periods allow the identification of general topics present in the discussions of the commission in two periods with distinct water characteristics of the reservoir.

The word clouds generated from the meeting minutes show that certain terms recur in both periods, such as “flow,” “operation,” “supply,” and “release.” This indicates that the management of water resources remains a central axis of the discussions regardless of seasonality. Indeed, the Management Commission of the Arneiroz II Reservoir is responsible for monitoring the operation of the dam and approving the flows to be released to meet the uses of the basin.

In the Normal period, some terms gained prominence, such as “fishing”, “fishermen”, “discharge”, “fish”, “canoes”, “irrigation”, “spillway”, and “shrimp.” This pattern suggests discussions focused on the use of water from the dam for fishing and irrigation, as well as on the discharge of the reservoir. In this period, the Management Commission discussed drafting an agreement on sustainable fishing in the Arneiroz II Reservoir, so this theme appeared frequently across the meetings. In the dry period, these discussions did not appear frequently.

The word “work” also appears prominently. Associated words such as “recovery,” “inspection,” and “construction” appear in a smaller size, also indicating themes related to works associated with the dam that were carried out in the period. These works included the recovery of nearby dams and the construction of wet crossings and pipelines belonging to the Water Mesh project (Projeto Malha Dágua).

In the dry period, some words stand out that do not appear in the cloud of the Normal period, such as “drought”, “closure,” “river,” “evaporation,” “valve,” and “water trucks.” These words indicate concern with factors that can affect the availability of water, such as evaporation. The operation of the dam was closed for some sections at certain moments during the drought, which is associated with the word “closure.” Water trucks are a common solution in periods of scarcity and were used to supply some communities as the reservoir level dropped. During the drought, the water release valve of the dam was also closed at times, and this was reported in the meetings. These words therefore reflect this situation.

The word “pipeline” also appears prominently in the cloud. In the context of the 2012 drought, a commonly adopted emergency measure was the construction of pipelines to carry water to more distant places without the transit losses that occur with perennization through the riverbed. With the river dry, the water released for perennization is absorbed by the dry soil, causing a very high loss in transit, so perennization is not suitable for dry periods. The need to build pipelines was discussed frequently in this period.

Another solution widely adopted in this context was the drilling of more wells to increase the water supply in the region. This theme appears in the cloud through the words “wells” and “drilling,” although the word “well” also appears in the cloud of the Normal period, in a smaller size. The recharge of existing wells was also discussed frequently in this period.

Various other words are scattered throughout the cloud, indicating other themes addressed in the meetings.

3.2. Sentiment Analysis

The sentiment score considered for each set of minutes was the average of the 30 rounds performed by the model (Figure 8 and Figure 9).

The correlation between the sentiment scores and the volume of the reservoir in the corresponding period was moderate and positive (r = 0.53, p = 0.00095, 95% CI [0.24, 0.73]), indicating a statistically significant association. Both Welch’s t-test (t = 3.44, df = 29.16, p = 0.0018) and the Mann-Whitney test (W = 251, p = 0.0039) revealed a significant difference between the scores in dry and normal periods. This suggests that participants’ perceptions are sensitive to variations in reservoir volume, with drier conditions associated with lower scores.
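As a consistency check on the reported interval, the sketch below computes a 95% confidence interval for Pearson's r through the Fisher z-transformation, using the reported r = 0.53 and n = 36 minutes. The source does not state how the interval was computed, so the use of this method is an assumption, and the function name `pearson_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def pearson_ci(r, n, confidence=0.95):
    """Confidence interval for Pearson's r via the Fisher z-transformation."""
    z = atanh(r)                       # Fisher transform of r
    se = 1 / sqrt(n - 3)               # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return tanh(z - crit * se), tanh(z + crit * se)

low, high = pearson_ci(0.53, 36)       # reported r and number of minutes
print(f"95% CI = [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval rounds to [0.24, 0.73], in line with the values reported above.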

When the distribution of the scores is analyzed chronologically (Figure 8), the scores in the normal periods (2 and 4 in Figure 8) are predominantly positive, except in the transitions from the normal period to the dry period (from 2 to 3) and from the dry period to the normal period (from 3 to 4). The dry period (2012 to 2019) concentrates most of the negative scores. The first negative score appears in 2012, and the last appears in 2020, during the transition to the normal period.

In the first dry period represented in the graph (1 in Figure 8), prior to 2008, only one set of minutes was analyzed due to the limited availability of data. The score assigned was positive. However, the date of these minutes (2007) is very close to the transition to the normal period. According to Figure 3, the reservoir volume also increased in 2007, although the accumulated volume did not reach 50%, the threshold established for analysis in this research, because of the low volumes recorded in previous years.

After plotting the volume data on the x-axis and the scores on the y-axis (Figure 9a), an increasing trend is apparent, indicated by the fitted linear regression line, although the points show some dispersion around the line. Negative scores (below the red dashed line) were given predominantly to minutes associated with lower reservoir volumes (<50%). Likewise, positive scores (above the red dashed line) were given predominantly to minutes associated with higher reservoir volumes (>50%), with a few exceptions.

The same analysis can be performed from a box plot of the sentiment scores for the two periods, Normal and Dry (Figure 9b). The box for the Normal period sits higher than the box for the dry period, with a higher frequency of positive scores (more than 75%). The box for the dry period lies predominantly below the zero-score axis (around 65% of the scores are negative). The minutes corresponding to the Negotiated Allocation meetings (red dots) do not show a distinct behavior.
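The quantities read off the box plot can also be computed directly. The following sketch, with hypothetical names and made-up values, returns the quartiles of a set of scores together with the shares of positive and negative scores. A first quartile above zero corresponds to roughly 75% or more of the scores being positive.

```python
from statistics import quantiles

def share(scores, predicate):
    """Fraction of scores satisfying the predicate."""
    return sum(1 for s in scores if predicate(s)) / len(scores)

def box_summary(scores):
    """Quartiles plus the shares of positive and negative scores."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "q1": q1, "median": q2, "q3": q3,
        "positive": share(scores, lambda s: s > 0),
        "negative": share(scores, lambda s: s < 0),
    }

# Hypothetical scores (not study data).
example = [-0.2, 0.1, 0.3, 0.4, 0.6]
summary = box_summary(example)
print(summary)
```

Applying `box_summary` separately to the Normal and Dry groups would give the numbers behind each box.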

Meeting minutes are generally longer and more technical documents when compared to texts of social media publications. They are also strongly influenced by the editor, which can introduce variations in the structure, detail, and language used. In the context of the meetings of the Management Commissions over the years, the alternation of those responsible for drafting the minutes can modify the form of expression and the textual approach, making it difficult to standardize the sentiment analysis. This particularity, inherent to the type of document analyzed, may have impacted certain aspects of the evaluation, especially in cases where the scores approached neutrality, even with very high or very low volumes in some cases.

In addition, other recurring themes in the minutes may influence the analysis, such as the engagement of the community to solve problems in dry periods, pulling the analysis to a more positive bias, or the discussion of inspection measures in periods of high water availability, which may reduce the positivity of the text. Also, expressions of sentiments may not be so sensitive to more subtle variations in the volume of the reservoir, making it possible to observe clearer differences when periods with larger variations in water availability are compared. While other factors—such as government policies, management changes, or social events—may also influence stakeholder emotions, the primary focus of these meetings is the discussion of reservoir-related issues. Therefore, it is reasonable to assume that reservoir volume plays a major role in shaping both the topics discussed and the sentiments expressed.

It should also be noted that the results presented here are specific to the case of the Arneiroz II Reservoir and cannot be directly extrapolated to other regions without considering their social, cultural, and hydrological particularities. However, the methodology employed in this study can be applied to other regions in future research, allowing for comparative analyses of how water availability influences stakeholders’ perceptions. In this context, it is important to emphasize that cultural background and language characteristics may affect the results. For example, the formal administrative language of official meeting minutes may introduce biases different from those found in other types of texts, such as social media posts. Future research, therefore, can assess how cultural and linguistic specificities shape sentiment detection in water governance contexts. Moreover, the methodology could also be extended to other types of textual data beyond meeting minutes, such as news articles or social media posts, to explore consistency and applicability across diverse sources.

Another relevant factor that may influence the attribution of sentiment scores is the way the model deals with subtle or context-dependent expressions, which may impact the result of the analysis [27]. Future studies may evaluate other types of documents, such as news published in the press, publications in portals specialized in water resources, and social media content focused on the theme of drought.

These factors may help explain why the correlation coefficient is only moderate (0.53). Nevertheless, the complementary analysis shown in Figure 9b points to a noticeable difference between the two periods, suggesting that the volume of the reservoir influences the perception expressed in the minutes.

Based on the results obtained, it is observed that the application of Large Language Models (LLM) to the minutes of the Management Commission of the Arneiroz II Reservoir allowed for the identification of a relationship between the sentiments expressed by the stakeholders and the hydrological conditions of the region. The analyses indicate that dry periods are predominantly associated with negative sentiments, while periods of greater water availability tend to present more positive evaluations. This correlation suggests that the perceptions of the meeting participants reflect the variation of the reservoir volumes, evidencing the impact of water availability on the social and decision-making dynamics of water resource management in the region.

The findings are consistent with international research that emphasizes the integration of unconventional and qualitative data into water management analyses. A previous study has demonstrated the potential of participatory approaches to assess socio-environmental impacts of droughts [28]. Ref. [29] utilized data-driven analysis incorporating socio-economic, demographic, geo-climatic, and technological factors to optimize water management for agricultural practices, demonstrating how diverse data types can enhance water resource management effectiveness. In line with these precedents, the present study highlights the sensitivity of human perceptions to hydrological variability and extends the discussion by applying Large Language Models to systematically capture stakeholders’ concerns and sentiments.

Although not addressed in this study, research in other domains shows that sentiment analysis can strengthen early warning systems, such as in credit risk [30] and disaster management [31]. These precedents suggest opportunities for future applications in water governance, particularly in contexts of hydrological risk.

A limitation of this study is the absence of a formal validation procedure against a manually annotated dataset. Nevertheless, the sentiment outputs showed consistency with contextual expectations and with hydrological indicators, supporting their interpretative validity. Future work can include a systematic evaluation of model performance, for example, through expert annotation and inter-rater agreement analysis, to further strengthen the reliability of LLM-based sentiment analysis in environmental governance research.

4. Conclusions

This study proposes an innovative approach by employing Artificial Intelligence (AI)-based Language Technologies to integrate textual and quantitative data in the analysis of social and hydrological dynamics during drought periods. The use of word clouds allowed the identification of recurring themes and their variations between normal and dry periods, while sentiment analysis revealed consistent differences, indicating shifts in the general sentiment of participants in the Reservoir Management Committee meetings. Regarding the LLM employed, results indicate that, although variations occur between different runs for the same text, the overall outcome is consistent with the assessed context, reflecting the inherent subjectivity of socio-hydrological analysis.

These findings corroborate the interrelationship between human behavior and the hydrological characteristics of the system, suggesting that coping with drought, a natural feature of the semi-arid region, requires consideration of both hydrological and social phenomena. Limitations of the study highlight directions for future research, including generalization to other geographical and cultural contexts, expansion of textual data sources (social media, technical reports), and the development of more specialized AI models for analyzing water management documents.

Beyond its academic value, the proposed approach has practical applications: it can support historical reconstruction of hydrological scenarios, inform water governance in scarcity contexts, and provide a foundation for the development of early warning systems integrating social perceptions with hydrological indicators. Consequently, this work contributes to consolidating the integration of socio-hydrology and artificial intelligence as a promising pathway to address the water-related challenges of the twenty-first century.

Author Contributions

Conceptualization, T.L.B., T.M.d.C.S. and M.G.D.; Data curation, T.L.B. and M.G.D.; F.d.A.d.S.F., T.L.B., T.M.d.C.S. and M.G.D.; Funding acquisition, T.M.d.C.S. and F.d.A.d.S.F.; Investigation, T.L.B., T.M.d.C.S. and M.G.D.; Methodology, T.L.B., T.M.d.C.S. and M.G.D.; Project administration, T.L.B., T.M.d.C.S. and M.G.D.; Resources, T.M.d.C.S.; Software, M.G.D.; Supervision, T.M.d.C.S. and F.d.A.d.S.F.; Validation, T.M.d.C.S. and F.d.A.d.S.F.; Visualization, T.L.B., T.M.d.C.S. and F.d.A.d.S.F.; Writing—original draft, T.L.B. and T.M.d.C.S.; Writing—review & editing, T.L.B., T.M.d.C.S., M.G.D. and F.d.A.d.S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ceará Foundation for Scientific and Technological Development (FUNCAP) under project UNI-0210-00316.01.00/23 from the UNIVERSAL Call—Edital Nº 06/2023.

Data Availability Statement

The original data and source code used in this study are openly available in GitHub at https://github.com/eletromarlon/MAS_LLM (accessed on 1 August 2025).

Acknowledgments

The authors would like to thank the Water Resources Management Company of Ceará (COGERH) for providing the data. The authors also acknowledge the support from the Ceará Foundation for Scientific and Technological Development (FUNCAP) through grant 29012.006446/2024-76 to the first author and the National Council for Scientific and Technological Development (CNPq), through grant 314861/2023-8 awarded to the second author.

Use of Generative AI

During the preparation of this manuscript, the author(s) used Claude for the purposes of improving the readability, language, and translation of the manuscript. The author(s) have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Lu, Y.; Tian, F.; Guo, L.; Borzì, I.; Patil, R.; Wei, J.; Liu, D.; Wei, Y.; Yu, D.J.; Sivapalan, M. Socio-hydrologic modeling of the dynamics of cooperation in the transboundary Lancang–Mekong River. Hydrol. Earth Syst. Sci. 2021, 25, 1883–1903.
2. Gohari, A.; Savari, P.; Eslamian, S.; Etemadi, N.; Keilmann-Gondhalekar, D. Developing a system dynamic plus framework for water-land-society nexus modeling within urban socio-hydrologic systems. Technol. Forecast. Soc. Change 2022, 185, 122092.
3. Shi, Y.; Wang, Z.; Zhao, J.; Chen, J.; Chen, J. A socio-hydrology model for water-urban-land-population-production nexus. J. Clean. Prod. 2024, 482, 144202.
4. Lyu, J.; Mo, S.; Jiang, K.; Yan, S. Seeking a pathway towards a more sustainable human-water relationship by coupled model—From a perspective of socio-hydrology. J. Environ. Manag. 2024, 368, 122231.
5. Bahra, M.; Fennan, A. Smart city: An advanced framework for analyzing public sentiment orientation toward recycled water. Int. J. Electr. Comput. Eng. 2024, 14, 1015–1026.
6. Wheeler, S.A.; Zuo, A.; Pickersgill, J.W. The propensity for negative media reporting of the Murray-Darling Basin Plan in Australia. J. Rural Stud. 2024, 109, 103320.
7. Srinivasan, V.; Sanderson, M.; Garcia, M.; Konar, M.; Blöschl, G.; Sivapalan, M. Prediction in a socio-hydrological world. Hydrol. Sci. J. 2017, 62, 338–345.
8. Xiong, J.; Hswen, Y.; Naslund, J.A. Digital surveillance for monitoring environmental health threats: A case study capturing public opinion from Twitter about the 2019 Chennai water crisis. Int. J. Environ. Res. Public Health 2020, 17, 5077.
9. Jaiswal, R.; Gupta, S.; Tiwari, A.K. Decoding mood of the Twitterverse on ESG investing: Opinion mining and key themes using machine learning. Manag. Res. Rev. 2024, 47, 1221–1252.
10. Ibrohim, M.O.; Bosco, C.; Basile, V. Sentiment analysis for the natural environment: A systematic review. ACM Comput. Surv. 2023, 56, 88.
11. Marengo, J.A.; Torres, R.R.; Alves, L.M. Drought in Northeast Brazil—Past, present, and future. Theor. Appl. Climatol. 2016, 129, 1189–1200.
12. Pontes Filho, J.D.; Souza Filho, F.A.; Martins, E.S.P.R.; Studart, T.M.C. Copula-based multivariate frequency analysis of the 2012–2018 drought in Northeast Brazil. Water 2020, 12, 834.
13. IPCC. Summary for Policymakers. In Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Core Writing Team, Lee, H., Romero, J., Eds.; IPCC: Geneva, Switzerland, 2023; pp. 1–34.
14. Zhang, Q. Land use and adaptive governance under climate change: Analysis of four cases in pastoral areas of China. Front. Environ. Sci. 2023, 11, 922417.
15. Arthur, J.L. Impact of Geography on Adaptation for the Future Sustainability of Human Society on Earth. Open J. Soc. Sci. 2021, 9, 188–217.
16. Campos, J.N.B. Secas e políticas públicas no semiárido: Ideias, pensadores e períodos. Estud. Avançados 2014, 28, 65–88.
17. Frota, P.V.; Silva, U.P.A.; Sales, C.A.J.; Sousa Filho, F.A. Comissões gestoras de sistemas hídricos no estado do Ceará. In Proceedings of the XX Simpósio Brasileiro de Recursos Hídricos, Bento Gonçalves, Brazil, 17–22 November 2013; ABRHidro: Porto Alegre, Brazil, 2013.
18. Carvalho, T.M.N.; Souza Filho, F.A.; Brito, M.M. Unveiling water allocation dynamics: A text analysis of 25 years of stakeholder meetings. Environ. Res. Lett. 2024, 19, 044066.
19. Zhang, W.; Deng, Y.; Liu, B.; Pan, S.J.; Bing, L. Sentiment analysis in the era of large language models: A reality check. Find. Assoc. Comput. Linguist. NAACL 2024, 2024, 246–257.
20. Lu, Z.; Wei, Y.; Feng, Q.; Western, A.W.; Zhou, S. A framework for incorporating social processes in hydrological models. Curr. Opin. Environ. Sustain. 2018, 33, 42–50.
21. Sowby, R.B.; South, A.J.; Jones, N.L.; Hopkins, E.G.; Ames, D. More than modelling: Building trust for positive change in water resources management. Environ. Model. Softw. 2025, 189, 106465.
22. COGERH. Water Resources Plan for the Alto Jaguaribe River Basin; Water Resources Management Company (COGERH): Fortaleza, Brazil, 2022.
23. Formiga-Johnsson, R.M.; Kemper, K.E. Institutional and Policy Analysis River Basin Management: The Jaguaribe River Basin, Ceará, Brazil; Policy Research Working Paper no. 3649; World Bank: Washington, DC, USA, 2005.
24. CEARÁ. Proactive Drought Management Plan—Arneiroz II Hydrosystem: Fortaleza, Brazil; CEARÁ: Ceará, Brazil, 2025.
25. Krugmann, J.; Hartmann, J. Sentiment analysis in the age of generative AI. Cust. Needs Solut. 2024, 11, 1–19.
26. Gruber, J.B.; Weber, M. rollama: An R package for using generative large language models through Ollama. arXiv 2024.
27. Moraes, M.; Lima, J.; Costa, F. Linguistic ambiguity analysis in large language models (LLMs). Texto Livre 2024, 18, 53181.
28. Friesen, J.; Sinobas, L.R.; Foglia, L.; Ludwig, R. Environmental and socio-economic methodologies and solutions towards integrated water resources management. Sci. Total Environ. 2017, 581, 906–908.
29. Kalu, C.K.; Sakilu, O.B.; Ebhota, S. Innovative data-driven analysis of water management for effective agricultural practices. J. Food Technol. Nutr. Sci. 2023, 156, 2–21.
30. Karentia, A.; Suhartono, D. The influence of sentiment analysis in enhancing early warning system model for credit risk mitigation. IAES Int. J. Artif. Intell. (IJ-AI) 2025, 14, 1829–1838.
31. Wu, D.; Cui, Y. Disaster early warning and damage assessment analysis using social media data and geo-location information. Decis. Support Syst. 2018, 111, 48–59.

Figure 1. Location of the Arneiroz II Reservoir and the Alto Jaguaribe Hydrographic Region.

Figure 2. Research design.

Figure 3. Historical series of volumes of the Arneiroz Reservoir (2005–2024).

Figure 4. Average volumes of the historical series: (a) average annual volumes (%) and (b) average monthly volumes (%).

Figure 5. The sequence of steps followed in sentiment analysis.

Figure 6. Word cloud for the dry period.

Figure 7. Word cloud for the Normal period.

Figure 8. Sentiment analysis scores over time. Note: The gray rectangle corresponds to the range of negative scores; the green dashed lines divide the data set into 4 periods: the 1st and 3rd are periods in which the reservoir volume was less than 50%, and the 2nd and 4th are periods in which the reservoir volume was above 50%. The blue dashed line represents the confidence interval.

Figure 9. Result of the sentiment analysis of the minutes. In (a): Scatter plot of the scores against the volume of the reservoir for each set of minutes. In (b): Box plot of the scores for the minutes in each period, normal and dry. Note: In (a), the green dashed line corresponds to the 50% reservoir volume that separates normal from dry periods in this analysis; the red dashed line marks the zero score of the sentiment analysis.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
