River Stage Variability and Extremes in the Itacaiúnas Basin in the Eastern Amazon: Machine Learning-Based Modeling

Costa, Luiz Rodolfo Reis; Ferreira, Douglas Batista da Silva; Senna, Renato Cruz; de Sousa, Adriano Marlisom Leão; Carmo, Alexandre Melo Casseb do; Silva, João de Athaydes; de Souza, Felipe Gouvea; de Souza, Everaldo Barreiros

doi:10.3390/hydrology12050115


by Luiz Rodolfo Reis Costa ¹, Douglas Batista da Silva Ferreira ², Renato Cruz Senna ³, Adriano Marlisom Leão de Sousa ⁴, Alexandre Melo Casseb do Carmo ⁵, João de Athaydes Silva, Jr. ⁵, Felipe Gouvea de Souza ⁶ and Everaldo Barreiros de Souza ¹,⁵,*

¹ Programa de Pos-Graduação em Ciências Ambientais (PPGCA), Instituto de Geociências (IG), Universidade Federal do Pará (UFPA), Belem 66075-110, PA, Brazil

² Instituto Tecnológico Vale (ITV), Belem 66055-090, PA, Brazil

³ Instituto Nacional de Pesquisas da Amazônia (INPA), Manaus 69067-375, AM, Brazil

⁴ Instituto Socioambiental e dos Recursos Hídricos (ISARH), Universidade Federal Rural da Amazônia (UFRA), Belem 66077-830, PA, Brazil

⁵ Faculdade de Meteorologia (FAMET), Programa de Pos-Graduação em Gestão de Risco e Desastre na Amazônia (PPGGRD), Instituto de Geociências (IG), Universidade Federal do Pará (UFPA), Belem 66075-110, PA, Brazil

⁶ Laboratório de Genética Humana e Médica (LGHM), Instituto de Ciências Biológicas (ICB), Universidade Federal do Pará (UFPA), Belem 66075-110, PA, Brazil

* Author to whom correspondence should be addressed.

Hydrology 2025, 12(5), 115; https://doi.org/10.3390/hydrology12050115

Submission received: 1 April 2025 / Revised: 25 April 2025 / Accepted: 6 May 2025 / Published: 8 May 2025

(This article belongs to the Section Hydrology–Climate Interactions)


Abstract

This study fosters tropical hydroclimatology research by implementing a computational modeling framework based on artificial neural networks and machine learning techniques. We evaluated two models, Multilayer Perceptron (MLP) and Support Vector Machine (SVM), in their ability to simulate 20-year monthly time series (2000–2021) of minimum and maximum river stage in the Itacaiúnas River Basin (BHRI), located in the eastern Brazilian Amazon. The models were configured using explanatory variables spanning meteorological, climatological, and environmental dimensions, ensuring representation of key local and regional hydrological drivers. Both models exhibited robust performance in capturing fluviometric variability, with a comprehensive multimetric statistical evaluation indicating MLP’s superior accuracy over SVM. Notably, the MLP model reproduced the maximum river level during a sequence of extreme hydrological events linked to natural disasters (floods) across BHRI municipalities. These findings underscore the computational model’s potential for refining hydrometeorological products, thus supporting water resource management and decision-making processes in the Amazon region.

Keywords:

hydrological modeling; Amazon; artificial neural networks; machine learning; hydrometeorology; floods

1. Introduction

Hydrological modeling is essential for water resource management and decision-making to reduce the impacts of hydrometeorological extreme events [1,2]. In particular, the Amazon, which is the focus region of this work, has been experiencing a succession of extreme hydrological and climate events, which are exacerbated by global climate change and enhanced by regional factors linked to deforestation, environmental degradation, and urbanization throughout the territory [3]. Gloor et al. [4] evidenced an intensification of the Amazon hydrological cycle in the 1990s to 2000s, with a strong trend toward an increase in the annual amplitude of river discharge and severity of events. Similarly, Barichivich et al. [5] analyzed the historical series of water levels in the Amazon basin and reported that, despite the occurrence of several drought episodes, the largest change in recent decades is a marked increase in very severe floods. Such increased flooding conditions along the Amazon are linked to strengthening of the Walker circulation, resulting from strong tropical Atlantic warming which in turn is related to global anthropogenic and natural factors. Corroborating the positive trends of extreme events in the Amazon, de Souza et al. [6] quantified the official data extracted from the Digital Atlas of Natural Disasters in Brazil [7] and found consistent intensification in the annual occurrence of natural disasters (76% belong to the hydrological group, i.e., floods and inundations) in the municipalities of the state of Pará (eastern Brazilian Amazon) over the past 25 years (1999 to 2023). These disasters’ catastrophic impacts on people and the economy were also documented by these authors, showing a significant rise in the number of homeless individuals and those directly affected, alongside considerable material damage and economic losses for both the public and private sectors. 
Therefore, given this threatening scenario of hydroclimate extremes, there is an urgent need to develop forecasting tools that can effectively assist both in decision-making and the management of water resources, as well as in the generation of early warning products, which are essential for safeguarding human well-being and strengthening the resilience of Amazonian communities vulnerable to climate change.

In a number of meteorological and hydroclimatological forecasting applications, mathematical and computational modeling tools based on artificial intelligence (AI) algorithms have emerged as a cost-effective and reliable alternative to the development of hydrological models [8]. Rozos et al. [9] proposed using such computational methods not as a substitute for hydrological models but as independent tools to evaluate their performance and contribute to studies in the field. In the context of artificial neural networks (ANNs), pioneering studies date back to the late 1980s; the foundational work performed by Hornik et al. [10] established the formulation of multilayer feedforward networks as universal approximators. This technique was first applied in the 1990s for short-term rainfall–runoff modeling [11]. Over the past two decades, significant scientific and technological advancements have been made in AI modeling using ANNs and machine learning (ML). Dawson et al. [12] outlined the fundamental principles of ANN modeling, including common network architectures and training algorithms for hydrological applications. Samper-Pilar et al. [13] summarized key ANNs and ML techniques for analyzing complex and high-dimensional hydrological datasets, enabling novel insights and spatiotemporal predictive capabilities. These techniques include supervised learning (e.g., random forests, Support Vector Machines, and gradient boosted trees, which are useful for classification and regression), deep learning (e.g., recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) for time-series forecasting and spatial pattern recognition), unsupervised learning (e.g., clustering, dimensionality reduction, principal component analysis, and T-distributed stochastic neighbor embedding (t-SNE) for anomaly detection in water quality data), and hybrid models integrating ML with traditional and numerical hydrological modeling approaches.

Recent research efforts leveraging computational methods based on ANNs and ML architectures have fostered hydrological modeling for the Amazon basin, with simulation studies of the river level or stage variability being carried out in some tributaries, such as the Xingu, Tocantins, Purus, and Madeira rivers [14,15,16,17,18]. These studies primarily incorporate meteorological variables (e.g., point-based or spatially distributed precipitation within watersheds) and climate data (oceanic indices from the Pacific and Atlantic) as explanatory variables for short- and long-term hydrological regime forecasting. The practical outcomes of these studies have directly supported flood monitoring and prevention initiatives across the region.

In this work, we address computational hydrological modeling for the Itacaiúnas River Basin (Bacia Hidrográfica do Rio Itacaiúnas—BHRI), located in the eastern Brazilian Legal Amazon, at the transition between the Amazon and Cerrado biomes. This hydrographic unit was selected due to its ecological significance in terms of biodiversity conservation (hosting biodiverse ecosystems with endemic flora and fauna, including federally protected primary Amazonian forests) and economic importance (encompassing the Carajás Mineral Province, a major contributor to Brazil’s trade balance) [19,20,21,22]. Souza-Filho et al. [19] examined in situ and satellite data, and Serrão et al. [20] conducted a modeling study on the environmental changes in the BHRI over the past four decades and their consequences on regional hydroclimatology, both indicating a transition to warmer atmospheric conditions and a positive trend in river flow near the basin’s mouth. Therefore, the aim of this work is to develop a hydrological modeling approach to simulate 20 years (2000 to 2021) of monthly data of minimum and maximum river stages in the BHRI. We employ two widely used methods in regional hydrological studies [23], the Multilayer Perceptron Network (MLP) and Support Vector Machine (SVM) based on ANNs and ML architectures, respectively. Our goal is to evaluate which algorithm best simulates monthly fluviometric variability, particularly extreme hydrological years that triggered natural disasters associated with floods in BHRI municipalities. The models are configured using explanatory variables selected in the meteorological, climatological, and environmental dimensions, ensuring representation of key local and regional hydrological drivers. A novel focus is the inclusion of satellite-derived vegetation data in sensitivity tests, an underexplored aspect in prior research, to evaluate environmental influences on water regime dynamics.

2. Materials and Methods

2.1. Study Area and Databases

Figure 1 illustrates a map of the study area encompassing the BHRI, located in the southeast of the state of Pará, in the southeastern portion of the Brazilian Legal Amazon (see the reference map on the bottom left). The BHRI belongs to the hydrographic unit of the Araguaia-Tocantins River Basin near the eastern border of the Amazon River Basin. According to Serrão et al. [16], this basin is located within geographic coordinates 7.50 to 3.50 S and 51.60 to 48.50 W, covering an area of about 42,000 km². Among the 10 municipalities present in the study area, 8 have their urban centers (municipal seats) located within the BHRI, totaling around 742,240 inhabitants [24]. The main course of the Itacaiúnas River flows from south to north and then to the northeast, converging with the Tocantins River at its mouth in the municipality of Marabá (top right corner on BHRI map). The main tributaries are the Parauapebas, Vermelho, and Sororó rivers, which flow northwards, contributing to the regime in the middle and upper courses of the Itacaiúnas River. On the BHRI land use and land cover map, it can be seen that the watercourses cross large areas covered by primary forests characteristic of the Amazon biome [25], at altitudes of 400–900 m, as well as anthropized areas with extensive pasturelands and agriculture, at altitudes ranging from 80 to 300 m [19,26]. The area has a mosaic of conservation units and protected areas, as well as human and industrial interventions dedicated to mining, agriculture, and livestock [21,22].

Table 1 lists the details of the databases in various dimensions and the respective variables used in this study, all covering the period from January 2000 to July 2021 (almost 20-year time series). The meteorological data were obtained from the Instituto Nacional de Meteorologia (INMET: https://bdmep.inmet.gov.br, accessed on 20 December 2024), with the availability of two stations (conventional and automatic) in Marabá containing complete atmospheric monitoring data. From the hydrological network of the Agência Nacional de Águas e Saneamento Básico (ANA: https://www.snirh.gov.br/hidroweb/serieshistoricas, accessed on 16 December 2024), we used nine pluviometric stations (Fazenda Caiçara, Fazenda Santa Elisa, Fazenda Surubim, Eldorado, Parauapebas, Serra Pelada, PA-150, Fazenda Alegria, and Marabá) distributed throughout the basin with precipitation as the only variable. Hydrological data with information on the river level or stage monitored using a limnimetric staff gauge installed on the Itacaiúnas River, in the city of Marabá (mouth of the basin), were acquired from ANA. Instead of stream flow or discharge (in m³/s), we used river stage (in cm), as this information is more relevant in civil defense applications that monitor flood conditions in the region. Thus, our time series consists of maximum (Max_WL) and minimum (Min_WL) values of the fluviometric level observed in each month in the period from January 2000 to July 2021. The geographic locations of all stations are indicated on the map in Figure 1. The climate data were accessed from the Climate Prediction Center (CPC/NOAA: https://www.cpc.ncep.noaa.gov/data/indices, accessed on 20 December 2024) with the monthly series of sea surface temperature (SST) and mean sea level pressure (SLP) that are representative of traditional oceanic areas for monitoring the climate mechanisms of the Pacific and Atlantic (see areas in Figure 1, top left map). 
In addition, to represent the environmental component, three vegetation products were collected: the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Leaf Area Index (LAI). These variables are based on MODIS sensor images present on the TERRA satellite, and we used the Google Earth Engine platform v.1.5.0.3 to extract the time series cropped (georeferenced spatial average) within the BHRI geographic domain. The source of MODIS images is the Land Processes Distributed Active Archive Center (LP DAAC: https://lpdaac.usgs.gov, accessed on 8 January 2025) that operates as a partnership between the U.S. Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA). The map of land use and land cover classes in Figure 1 shows extensive areas of forests and pastures through which the BHRI rivers flow, whose regional environmental factor represented by the NDVI, EVI, and LAI indicators will be investigated in the hydrological regime analysis. Official data on natural disasters were obtained from the Digital Atlas of Disasters in Brazil (https://atlasdigital.mdr.gov.br, accessed on 27 January 2025), managed by the National Secretariat for Civil Defense and Protection (SEDEC) of the Ministry of Integration and Regional Development (MDR) of Brazil [7]. Disaster data (annual total), particularly from the hydrological group (inundations, floods and flash floods) in the BHRI municipalities, will be analyzed together with data on maximum river levels during the occurrence of extreme hydrological years that mainly impacted the cities of Marabá, Parauapebas, and Eldorado dos Carajás.

2.2. Statistical and Computational Methods

All data preprocessing calculations and the subsequent application of the computational methods used in this work were developed in Python v. 3.10 with the following main libraries: sklearn.impute, statsmodels.api, scipy.stats, sklearn.preprocessing, sklearn.model_selection, sklearn.metrics, sklearn.linear_model, sklearn.neural_network, tensorflow.keras.models, and sklearn.svm.

Except for the data from the climatological and environmental dimensions (complete time series), the meteorological and hydrological series presented individual gaps or failures of around 2 to 3% of the total data, which were imputed using the statistical technique SimpleImputer based on neighboring stations or temporal trends, according to Carvalho [27]. The complete data series were then generated and organized in CSV spreadsheets, which were subjected to descriptive statistics calculations in the data exploration and visualization steps, including the application of the Shapiro–Wilk normality test at a 5% significance level, which is an important procedure in hydrological studies [28].

2.2.1. Spearman Correlation

Based on Spearman’s correlation coefficient (ρ), a correlation matrix (correlogram) was created between the hydrological dependent variables (Max_WL and Min_WL) and the independent variables of the meteorological, climatological, and environmental dimensions (listed in Table 1) with the purpose of evaluating the intensity and direction of the relationships and selecting the most significant predictors to be used in computational modeling. The Spearman method measures the monotonic relationship (increasing or decreasing trend) between the variables, being less sensitive to outliers and not requiring the assumption of linearity [14].
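As an illustration of the predictor screening described above, the following minimal sketch computes Spearman's ρ with `scipy.stats.spearmanr`. The two series are synthetic stand-ins (the names `precip` and `max_wl` and all numeric values are assumptions, not the study's data); the second call shows that a monotonic but nonlinear transform of the predictor leaves ρ unchanged, which is the property that motivates the choice of this coefficient.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for one predictor and one target (hypothetical values).
precip = rng.gamma(shape=2.0, scale=100.0, size=258)               # monthly rainfall
max_wl = 300.0 + 50.0 * np.sqrt(precip) + rng.normal(0, 20, 258)   # river stage

# Spearman's rho and its p-value for the predictor/target pair.
rho, p_value = spearmanr(precip, max_wl)

# A strictly increasing transform of the predictor keeps the ranks,
# so the coefficient is unchanged.
rho_log, _ = spearmanr(np.log(precip), max_wl)
```

In a full correlogram, the same call would be repeated for every pair of dependent and independent variables, keeping the predictors whose coefficients are statistically significant.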

2.2.2. Data Normalization

In computational modeling, it is common to use activation functions applied to the hidden and output layers, which require data on a scale of 0 to 1 for the proper functioning of mathematical methods [16]. Therefore, the time series of all variables with a sample size of 258 monthly records were normalized using the following equation:

Xn = (Xj − Xmin)/(Xmax − Xmin)

(1)

where Xn is the normalized value (with output ranging from 0 to 1); Xj is the value of the variable in row j; and Xmin and Xmax are, respectively, the minimum and maximum values of the variable over the entire time series.
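Equation (1) can be sketched in a few lines of NumPy. The function name `min_max_normalize` and the sample river stages are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D series to the range [0, 1] following Equation (1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Hypothetical river stages in cm, used only to illustrate the scaling.
stage_cm = np.array([250.0, 400.0, 1000.0, 700.0])
stage_norm = min_max_normalize(stage_cm)
```

The minimum of the series maps to 0 and the maximum to 1, so every variable enters the models on the same scale regardless of its physical unit.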

2.2.3. Artificial Neural Networks (ANNs)

ANNs are computational techniques based on the neural structure of living organisms and perform computational tasks through experience acquired by learning, with their main advantages being the ability to adapt, generalize, and be fault tolerant. In addition, ANNs act as universal approximators, i.e., with enough hidden neurons they can approximate any nonlinear continuous function to arbitrary accuracy [10]. In this work, a model based on an ANN of the Multilayer Perceptron Network (MLP) type was used. This type of architecture was chosen due to its ability to adapt well to environmental data with nonlinear behavior. This method belongs to the feedforward class, allowing for application in several complex situations and in the execution of functions that include data analysis, pattern identification, hydrological and climate forecasting, categorization, image analysis, and projections of seismic events, among others [8,29].

The composition of the ANN with MLP is illustrated in Figure 2 and consists of neurons organized into layers: an input layer, one or more intermediate (hidden) layers, and, in the case of a regression model, an output layer with a single neuron [16,30]. Each neuron in the input layer is connected to all neurons in the first intermediate layer, and all neurons in the last intermediate layer are connected to the single output neuron [31,32].

The learning process takes place within the intermediate layers through the processing performed by each neuron in the network [14]. A neuron in a given layer integrates the activities of the neurons connected to it in the previous layer, and its output value, Sj, which is passed on to the next layer, is given by the following:

Sj = f (∑ Wij Xi + θj)

(2)

where Xi represents the input values in the network; Wij is the weight of the connection from input neuron i to hidden neuron j; θj is the bias incorporated throughout the intermediate layers; and f() is the activation function used to generate the results that will be stored in the next layer.

Taking into account the results obtained in the output layer, which result from the sum of the synaptic connections with the last intermediate layer, a function is applied to obtain the output layer predicted by the model, according to the following expression:

y_k = g (∑ β_kj S_j + θ_k)

(3)

where S_j represents the values stored in the last intermediate layer; β_kj is the weight of the connection of the neuron of the last intermediate layer with the output layer; θ_k is the bias incorporated into the output layer; and g() is the activation function used to generate the network output results, with the values of the target parameter subsequently being compared.
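Equations (2) and (3) amount to two matrix products, each followed by an activation. The sketch below is a minimal NumPy forward pass for a network with one intermediate layer and a single output neuron; the layer sizes and random weights are hypothetical, and the default activations (hyperbolic tangent for f, ReLU for g) follow the choices stated in this section.

```python
import numpy as np

def forward(x, W, theta_h, beta, theta_o,
            f=np.tanh, g=lambda z: np.maximum(0.0, z)):
    """Forward pass through one intermediate layer and one output neuron."""
    S = f(x @ W + theta_h)       # Equation (2): S_j = f(sum_i W_ij X_i + theta_j)
    y = g(S @ beta + theta_o)    # Equation (3): y_k = g(sum_j beta_kj S_j + theta_k)
    return S, y

rng = np.random.default_rng(42)
n_inputs, n_hidden = 4, 5                     # hypothetical layer sizes
x = rng.random(n_inputs)                      # normalized inputs in [0, 1]
W = rng.normal(size=(n_inputs, n_hidden))     # W[i, j]: input i -> hidden neuron j
theta_h = np.zeros(n_hidden)                  # hidden-layer biases
beta = rng.normal(size=(n_hidden, 1))         # hidden neuron j -> output neuron
theta_o = np.zeros(1)                         # output bias

S, y = forward(x, W, theta_h, beta, theta_o)
```

Training consists of adjusting `W`, `theta_h`, `beta`, and `theta_o` so that `y` approaches the normalized target value.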

We used the activation function known as the Rectified Linear Unit (ReLU) [34,35] in the network output layer with the following expression:

f(X) = max (0, X); f′(X) = 1 if X > 0, and 0 otherwise

(4)

where X is the network output value itself in the case of a positive value. This function helps to mitigate the problem of vanishing gradients, common in deep networks, ensuring that the gradients transmitted during error backpropagation do not become too small. Another important point in choosing this function is the fact that it keeps positive values unchanged and disregards negative values, which are assigned a value of zero, which is important in the output of the application of regression methods in hydrological modeling, as reported in [12,36].

However, for the intermediate layers, the selected function was the hyperbolic tangent (Tanh), which is widely used in hidden layers due to its ability to handle positive and negative inputs and its similarity to the sigmoid function [37,38], as indicated in [16,30,39,40]. This function is given by

f(xi) = (e^xi − e^−xi)/(e^xi + e^−xi)

(5)

where xi is each input value received by a neuron in the intermediate layers of the network.
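The two activation functions can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the ReLU derivative at exactly zero is set to 0 by convention, and the hyperbolic tangent is written in its standard exponential form and compared against NumPy's built-in `np.tanh`.

```python
import numpy as np

def relu(x):
    """ReLU: keeps positive values and maps negative values to zero."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return (x > 0).astype(float)

def tanh_exp(x):
    """Hyperbolic tangent in exponential form, bounded in (-1, 1)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = relu(z)
relu_slope = relu_grad(z)
tanh_out = tanh_exp(z)
```

Because ReLU never returns a negative number, using it in the output layer guarantees non-negative normalized river stages, while Tanh keeps hidden activations centered on zero.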

The backpropagation algorithm is a widely used approach to adjust the synaptic weights of the neural network based on the gradient descent technique [14,41]. According to [42], the synaptic weight correction process occurs in two stages: (i) a forward stage, in which the input values are passed through the network until the output values are generated, with no change to the network weights; and (ii) a backward stage, in which the error generated in (i) is propagated backwards through the connection weights, from the output neuron toward the input neurons. This learning process is repeated until the values estimated by the network approximate the target values of the prediction as closely as possible [16]. However, in order to speed up the process of obtaining the minimum global error generated by the network in relation to the test data, we chose to use the adaptive moment estimation optimizer, Adam [43], which combines the momentum optimizer (which keeps an exponentially decaying average of past gradients) and RMSprop (which keeps an exponentially decaying average of past squared gradients). The momentum optimizer reduces the variation in the gradient over the iterations, stabilizing the updates and helping to avoid sudden oscillations in the adjustment of the weights, with the expression given as follows:

m = β1 m − (1 − β1) ∇_θ J(θ)

(6)

where m represents the exponential moving average of the gradients, storing a smoothed average of the gradients over iterations, which helps capture the overall direction of the gradient more stably; β1m is the smoothing term for the moving average; and β1 is a constant between 0 and 1 that controls how much of the previous value of m will be kept. The larger β1 is, the slower the update of m will be, making it smoother; the term (1 − β1) ∇_θ J(θ) is the current gradient contribution of the loss function J(θ) with respect to the parameter θ. This weighted term represents the recent update of the gradient, and (1 − β1) is the weight that regulates the influence of the new gradient compared to the accumulated value of m.

RMSprop controls the magnitude of the update step by individually adjusting the learning rate for each parameter based on the variance of the gradient (parameters with large gradients have smaller updates), with the expression given as follows:

s = β2 s + (1 − β2) ∇_θ J(θ) ⊗ ∇_θ J(θ)

(7)

where s represents a state variable, which accumulates a weighted average of the squared gradients of the loss function J(θ) with respect to the parameter θ (⊗ denotes element-wise multiplication); the term β2 s represents the previous value of the variable s, multiplied by a damping factor, β2, which is a constant between 0 and 1, representing how much of the previous value of s will be retained; the term (1 − β2) ∇_θ J(θ) ⊗ ∇_θ J(θ) indicates the current contribution of the squared gradient of the loss function J(θ) with respect to the parameter θ; and the term (1 − β2) is a weighting factor that determines the influence of the new gradient update.

Since m is initialized at zero, its values tend to be biased towards zero at the beginning of training, so the following correction needs to be applied:

m_r = m/(1 − β1^t)

(8)

where m_r represents a variable that is corrected or adjusted to eliminate the bias introduced in the original calculation of m in early iterations. It is considered an unbiased value; m is the exponential moving average of the gradients, calculated in previous steps; (1 − β1^t) is the bias correction factor; β1 is a constant between 0 and 1 that determines the decay rate; and t is the current iteration number.

Likewise, the values of s obtained from Equation (7) also need to be corrected, as follows:

ŝ = s/(1 − β2^t)

(9)

where ŝ represents the corrected (debiased) variable, which compensates for the bias in the early stages of the second-order moving average, i.e., the mean of the squares of the gradients; s is the exponential moving average of the squared gradients, which accumulates information about the variance of the gradients; the term (1 − β2^t) is the bias correction factor; β2 is a decay constant between 0 and 1; and t is the iteration number. This factor reduces the impact of the initial bias, especially in the early iterations, making ŝ more representative of the variance of the gradients.

In the end, the combination of the two algorithms results in the update of the weights of the computational network considering both the momentum (average direction of the gradient) and the adaptation of the learning rate based on the variance of the gradient, shown through the following expression:

θ_r = θ + η m_r/√(ŝ + ε)

(10)

where θ_r is the updated value of the model parameters; θ represents the current value of the model parameters that will be adjusted; η is the learning rate, controlling the size of the steps taken in the direction of the gradient; m_r is the corrected moving average of the gradients, representing the direction of the gradient adjusted to eliminate the bias of moving averages in early steps; and √(ŝ + ε) is the square root of the corrected moving average of the variance of the gradients ŝ, added to a small term ε that is used to avoid division by zero.
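The five update rules in Equations (6) to (10) can be traced in a short NumPy sketch. It keeps the sign convention used in this section, in which m accumulates the negative gradient and the final step adds the scaled term to θ, and it uses the squared gradient in the second-moment update, as in the standard Adam formulation [43]. The function name `adam_step`, the default hyperparameters, and the toy loss J(θ) = (θ − 3)² are assumptions made only for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, s, t,
              eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update written with the sign convention of Equations (6)-(10)."""
    m = beta1 * m - (1.0 - beta1) * grad              # Eq. (6): first moment
    s = beta2 * s + (1.0 - beta2) * grad * grad       # Eq. (7): squared gradient
    m_r = m / (1.0 - beta1 ** t)                      # Eq. (8): bias correction
    s_hat = s / (1.0 - beta2 ** t)                    # Eq. (9): bias correction
    theta = theta + eta * m_r / np.sqrt(s_hat + eps)  # Eq. (10): parameter update
    return theta, m, s

# Toy problem (hypothetical): minimize J(theta) = (theta - 3)^2.
theta = np.array([0.0])
m = np.zeros_like(theta)   # both moments start at zero
s = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, s = adam_step(theta, grad, m, s, t, eta=0.05)

# A single first step from theta = 0, for inspection.
first_theta, _, _ = adam_step(np.array([0.0]), np.array([-6.0]),
                              np.zeros(1), np.zeros(1), t=1, eta=0.05)
```

On the very first iteration the bias corrections cancel the (1 − β) factors, so the parameter moves by almost exactly η in the downhill direction, whatever the gradient magnitude.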

2.2.4. Support Vector Machine (SVM)

The SVM learning method was originally proposed for data classification, where it demonstrated high precision. However, the method is versatile: besides classification, it can also be used for linear and/or nonlinear regression, making it promising for hydrological and environmental modeling [44,45]. We used the SVM extension known as Support Vector Regression (SVR), defined by the following functions:

f(x) = w^T ϕ(x) + b

(11)

Rsvm (C) = ½ ||w||² + C ∑ (ξi + ξi*)

(12)

f(x) = ∑ (αi − αi*) K (xi, x) + b

(13)

where w is the weight vector; ϕ(x) represents the transformation of the data into a higher-dimensional feature space; b represents the bias; ξi and ξi* are the slack variables used to deal with outliers; C is the regularization parameter that controls the trade-off between model complexity and error penalty; ε is the error tolerance margin; αi and αi* are the associated Lagrange multipliers; and K is the kernel function, which evaluates the inner product of ϕ(xi) and ϕ(x) without computing ϕ explicitly.

There are several types of kernels in the literature, but the most common are linear, polynomial, and radial basis function (RBF). In this study, the kernel used was the RBF, as it has been widely used in environmental studies, demonstrating efficiency in analyses aimed at implementing regression and classification [46,47]. The RBF kernel is represented by the following equation:

K (x_i, x_j) = exp (−γ || x_i − x_j ||²)

(14)

where x_i and x_j are feature vectors taken from the input data, either training or test data, and γ is the kernel coefficient that controls the width of the radial basis function.
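To make Equations (13) and (14) concrete, the following NumPy sketch evaluates the RBF kernel and the resulting SVR prediction. The support vectors, the coefficients (αi − αi*), the bias, and γ are hypothetical numbers chosen for easy checking; in practice they come out of the fitting procedure.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), as in Equation (14)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def svr_predict(X_new, X_sv, dual_coef, b, gamma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, as in Equation (13)."""
    return rbf_kernel(X_new, X_sv, gamma) @ dual_coef + b

X_sv = np.array([[0.0, 0.0], [1.0, 0.0]])   # hypothetical support vectors
dual_coef = np.array([0.5, -0.25])          # hypothetical (alpha_i - alpha_i*)
K = rbf_kernel(X_sv, X_sv, gamma=1.0)
y_new = svr_predict(np.array([[0.0, 0.0]]), X_sv, dual_coef, b=0.1, gamma=1.0)
```

The kernel equals 1 for identical points and decays with squared distance, so larger γ makes each support vector influence a narrower neighborhood.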

2.2.5. Setup of Computational Simulations and Performance Analysis

After comprehensive pre-processing and preparation of the databases, the time series were divided, with 70% of the observations used for network training and 30% for testing and simulations. Since our time series consist of a sample of 258 data points from January 2000 to July 2021, 78 data points (30%) are simulated. Unlike most traditional regression models, which test the equations only in the final periods of the data series, here, we chose to apply the simulation tests to months selected from all years of the available data series. Thus, the test data were extracted systematically by removing months at random from every year of the time series, obeying the assumption of equal probability in the composition of the test base. Table A1 in Appendix A contains the dates of the Max_WL and Min_WL simulations selected by the computational network.
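A 70/30 random split that draws test months from every year can be sketched with `train_test_split` from `sklearn.model_selection`. Stratifying by a year label is an assumption made here to guarantee that each year contributes test months; the text does not state how the selection was implemented. The arrays `X`, `y`, and `years` are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_months = 258
X = np.arange(n_months).reshape(-1, 1)     # placeholder predictors
y = np.arange(n_months, dtype=float)       # placeholder target
years = np.arange(n_months) // 12          # hypothetical year label per month

# Shuffled 70/30 split, stratified so each year is represented in the test set.
X_train, X_test, y_train, y_test, years_train, years_test = train_test_split(
    X, y, years, test_size=0.30, shuffle=True, stratify=years, random_state=0)
```

With 258 records and `test_size=0.30`, scikit-learn rounds the test size up, which gives the 78 test months mentioned above and leaves 180 for training.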

The tests carried out to select the best configuration for the computational network and the number of hidden layers and neurons in each layer were established in two stages:

A network with 1 hidden layer was assembled, with the number of neurons varying from 5 to 30 neurons.
An additional hidden layer was incorporated in which each value from the first hidden layer used was combined with all the values from the second hidden layer, with the latter also varying from 5 to 30 neurons.

Using these criteria and procedures, 144 different arrangements were executed to simulate the river water level of the Itacaiúnas (dependent hydrological variable) by using a set of objectively selected variables (through Spearman’s correlogram analysis) that express the independent variables of the meteorological, climatological, and environmental dimensions. The MLP and SVM models were executed in two different runs, one for Max_WL and another for Min_WL. To meet the objectives proposed in the hydrological modeling approach, the models were executed with the selected independent variables in all dimensions (MLP and SVM) and again after removing only the environmental variables (MLP_noEnv and SVM_noEnv). Thus, we have 4 model configurations for each hydrological variable (Max_WL and Min_WL). The summary of the configuration used to create the neural network is shown in Table 2.

The configuration for the SVM model is shown in Table 3. Parameters C, γ, and ε determine the prediction accuracy of the RBF-type kernel function.

Finally, we reach the important stage of using metrics to evaluate the performance of the simulations generated by the computational networks, but first, the output values are rescaled to real values using the following expression:

Yi^real = Yi^min + (Yi^max − Yi^min) Yi^network

(15)

where Yi^real is the value estimated by the network, rescaled to real values; Yi^min is the lower value, while Yi^max is the higher value of the dependent variable in the complete database; and Yi^network is the value estimated by the network on a scale between 0 and 1.
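Equation (15) is the inverse of the min-max normalization. A minimal sketch follows; the function name `denormalize` and the minimum and maximum stages are hypothetical.

```python
import numpy as np

def denormalize(y_network, y_min, y_max):
    """Rescale network outputs in [0, 1] back to real units, Equation (15)."""
    return y_min + (y_max - y_min) * np.asarray(y_network, dtype=float)

y_min, y_max = 250.0, 1000.0             # hypothetical stage range in cm
y_network = np.array([0.0, 0.2, 1.0])    # hypothetical network outputs
y_real = denormalize(y_network, y_min, y_max)
```

The same minimum and maximum used to normalize the dependent variable must be reused here, otherwise the simulated stages are shifted or stretched relative to the observations.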

In the quantitative analysis of the performance of the computational models in predicting Max_WL and Min_WL up to 1 month in advance, we used six statistical metrics listed in Table 4, with detailed descriptions of the concept and the purpose of use, including the interpretation of the ranges of results for each metric to be quantitatively analyzed.

We use a comprehensive multimetric analysis to evaluate the performance of the models, with the RMSE and MAE showing the absolute error between the observed and simulated values, NSE and KGE quantifying the efficiency of the models, PBIAS monitoring the systematic bias of the simulations in underestimating or overestimating the variability observed, and R² capturing the fraction of the variability explained by the models in relation to the observations. The NSE is a classical metric originally proposed to evaluate hydrological forecasts [48], whose formulation was decomposed in [49] to include the combination of the correlation coefficient, bias, and variability, thus receiving the name KGE. Along with the traditional methods described in Table 4, the NSE and KGE have been widely used in model skill analyses in the areas of climatology and hydrology in the Amazon [14,16,17,20,50]. Complementing the methodology, we adopt an objective criterion to compare and rank the performance of the models. We have a total of four models, so the best result receives a score of 1, and the worst receives a score of 4 in each metric according to the range sequence explained in Table 4. The average rank considering the multimetric analysis will objectively classify the models ranging from best (value close to 1) to worst (value close to 4), and a two-dimensional graph is used in the overall visualization of the models. Recent studies have addressed similar model ranking strategies in verifying the performance of general circulation model ensembles in representing the current climate in the Brazilian domain [51,52]. Thus, we applied multimetric statistical evaluation analysis in BHRI hydrological simulations for the whole period. 
Nevertheless, another fundamental aspect of the hydroclimatology of the Amazon is its pronounced seasonality [53,54], which we will analyze separately to indicate, from a climatological point of view, the months in which the seasonal regimes occur, as well as to evaluate the performance of the models in capturing the seasonal behavior of the Itacaiúnas River level. After separating the time series of the observed and simulated data in each seasonal hydrological regime, a multimetric analysis is applied to investigate which model and which seasonal regime present the best results.
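The six metrics can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the exact formulation of Table 4: KGE is written in its commonly used form combining correlation, variability ratio, and bias ratio; R² is taken as the squared Pearson correlation; and PBIAS uses the sign convention in which negative values indicate underestimation, which matches the seasonal analysis. All function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    x_bar, y_bar = _mean(x), _mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error, in the units of the data."""
    return _mean([abs(s - o) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (1 is a perfect fit)."""
    o_bar = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson_r(obs, sim)
    o_bar, s_bar = _mean(obs), _mean(sim)
    std_o = math.sqrt(_mean([(o - o_bar) ** 2 for o in obs]))
    std_s = math.sqrt(_mean([(s - s_bar) ** 2 for s in sim]))
    alpha = std_s / std_o   # variability ratio
    beta = s_bar / o_bar    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def pbias(obs, sim):
    """Percent bias; negative means underestimation (assumed sign convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def r2(obs, sim):
    """Coefficient of determination, assumed here as squared Pearson correlation."""
    return pearson_r(obs, sim) ** 2


# Synthetic values in cm, for illustration only: every simulation is 10 cm too low.
obs = [100.0, 200.0, 300.0, 400.0]
sim = [90.0, 190.0, 290.0, 390.0]
```

With these synthetic series, the constant 10 cm offset leaves the correlation perfect while RMSE, MAE, NSE, KGE, and PBIAS all register the bias, which is why the metrics are read together rather than individually.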

3. Results

3.1. Correlation Analysis and Objective Selection of Independent Variables

Spearman’s correlation calculations were performed considering lags of 1 to 3 months behind the hydrological data (river level). The 1-month lag gave the strongest correlations for the meteorological variables, while the climate and environmental variables showed slightly higher values at the 2-month lag. However, when considering the entire set of variables, lag 1 presents the largest number of statistically significant correlations. Accordingly, Figure 3 depicts the 1-month lag Spearman correlogram calculated for the entire database from 2000 to 2021, whose results allow us to identify and select the explanatory variables that are most important in modulating the hydrological regime of the BHRI. To aid interpretation, values close to zero were omitted, only statistically significant correlations (p-value < 0.05) were colored in the matrix, and absolute values above 0.5 are indicated by circles.
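A lagged Spearman correlation of this kind can be sketched as below. The sketch assumes that the explanatory variable at month t − lag is paired with the river level at month t and that both series are equally long and gap-free; the significance test (p-value) is omitted. Function names and the short series are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, applied to ranks to obtain Spearman's coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(predictor, river_level, lag):
    """Correlate the predictor at month t - lag with the river level at month t."""
    if len(predictor) != len(river_level):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(predictor) - 1:
        raise ValueError("lag must leave at least two paired months")
    x = predictor[: len(predictor) - lag]
    y = river_level[lag:]
    return spearman(x, y)


# Synthetic monthly series, for illustration only: the level follows
# the previous month's rainfall monotonically.
rain = [10.0, 50.0, 30.0, 80.0, 20.0, 60.0]
level = [0.0, 20.0, 100.0, 60.0, 160.0, 40.0]
rho_lag1 = lagged_spearman(rain, level, 1)
```

Repeating the call for lags 1 to 3 and for each candidate variable would fill one row of a correlogram like the one in Figure 3.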

In the meteorological dimension, Pr (station) and Pr_area (spatial average) presented respective values of 0.82 and 0.84 for Min_WL and 0.84 and 0.85 for Max_WL, demonstrating the direct contribution of local rainfall and mainly the spatial integration of precipitation within the basin as the key variable in the variability of the river stage at the mouth of the Itacaiúnas. In addition, a local relationship between RH (correlations of 0.77 and 0.78) and river water levels is noticed, reinforcing the role of rainfall in surface atmospheric humidity conditions. The other meteorological variables presented moderate values for Ta (around −0.6) and very low values for surface pressure (below −0.2).

In terms of the modulation of the climate dimension variables on the river’s fluviometric levels, the statistically significant correlations of SST in the northern (NATL: correlations of −0.85 for Min_WL and −0.80 for Max_WL) and southern (SATL: correlations of 0.83 for Min_WL and 0.81 for Max_WL) basins of the Atlantic Ocean are noticeably more intense. The relationship with the Pacific Ocean is weaker, with negative signals around −0.3 in the western portion (NINO4), while in the central and eastern sectors, the correlation signal is positive and higher, with values of 0.43 and 0.40 in NINO3 and 0.80 and 0.77 in NINO1+2 for Min_WL and Max_WL, respectively. Correlations with SLP at Darwin and Tahiti are between −0.64 and −0.75, reflecting atmosphere–ocean coupling in the Pacific.

On the other hand, the results for the vegetation indicators, representing the environmental dimension, exhibit positive correlations with values of 0.56 and 0.52 for LAI, 0.69 and 0.72 for EVI, and 0.70 and 0.70 for NDVI with Min_WL and Max_WL, respectively.

Therefore, the correlogram helped, as a first approximation, to identify the explanatory variables that directly influence the response variable (river level) and, as a second approximation, to eliminate variables that were interrelated across dimensions. The latter aspect is important to avoid redundancy among the independent variables used in the modeling process. Finally, after some initial sensitivity tests of the computational models, we objectively chose the explanatory variables Pr_area, NATL, SLPTahiti, and NDVI for the final configuration, which represent the meteorological, climatological, and environmental dimensions (see highlighted text to the left of Figure 3).

3.2. Observations and Simulations over the Whole Period, Including Extreme Hydrological Years with Natural Disasters

Figure 4 shows the time series of the observed and simulated Max_WL in the four configurations of the computational models in the period from 2000 to 2021. The simulations were tested on monthly data selected by the computational network across all years; the dates for each year are listed in Table A1 (Appendix A). The high temporal variability of the maximum and minimum river levels is evident over the last two decades. Visually, in the periods with seasonally lower levels, the SVM and SVM_noEnv models present simulated Max_WL values around 100 cm that are systematically below the observed ones, while the MLP and MLP_noEnv models are close to the observations. This aspect is not observed in the Min_WL results, which reveal simulated values compatible with the observed minimum levels. On the other hand, when the river is seasonally elevated, all models tend to approximate the observed data in terms of the intensity of Max_WL and Min_WL. A very important aspect, according to the objectives of this work, is whether the models were able to reproduce the extreme years in which hydrological disasters were officially recorded in cities downstream within the BHRI. Figure 5 compiles the data on natural disasters specific to the hydrological group, i.e., prolonged inundations and floods, reported on the official SEDEC portal. Almost every year during the rainy season, the maximum fluviometric levels measured at the mouth of the basin (green bars in Figure 4) exceed the alert threshold (1000 cm) defined by the State Civil Defense. As a result, such disasters recur annually in the region, as illustrated in Figure 5c. Analyzing the model simulations, particularly in the sequence of years with observations above 1000 cm, we verified that MLP and MLP_noEnv are able to capture these extreme values in the years 2000, 2001, 2002, 2003, 2006, 2008, 2009, 2012, 2014, 2018, and 2019.
The SVM and SVM_noEnv models simulate values systematically below the extreme stage observed in the river in these years. All models failed in the extreme hydrological years of 2004, 2005, 2007, 2011, and 2020. The years 2018, 2020, and 2021 in Figure 5c deserve special attention because consecutive disasters were confirmed in several cities of the BHRI in those years, with emergency conditions and a state of public calamity being declared. The annual total reached eight disasters in 2018 (three records in Eldorado dos Carajás, two in Parauapebas, and one each in Marabá, Ourilândia do Norte, and Piçarra), seven disasters in 2020 (three in Parauapebas, two in Marabá, and two in Eldorado dos Carajás), and four disasters in 2021 (two each in Parauapebas and Marabá). The Max_WL variable in Figure 4 shows that all models underestimated the extreme fluviometric levels in 2020, but the simulations generated using MLP were able to predict the maximum levels above the alert limit in 2018 and 2021. In the base period from 2000 to 2021, the highest frequency of natural disasters in the hydrological group occurs in municipalities located in the northern portion, close to the mouth of the BHRI (map in Figure 5b), with 40% in Marabá, 31% in Parauapebas, and 22% in Eldorado dos Carajás (Figure 5a), which are cities located in the lowest topographic region (map in Figure 5c). This result agrees with the recent mapping of natural disasters in the state of Pará [6], which showed that Parauapebas and Marabá occupy the third and tenth positions in the general ranking of the Amazonian state.
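The check against the alert threshold can be sketched as a simple hit-and-miss classification. The function name and the peak values in the example are hypothetical; only the 1000 cm threshold and the two lists of years come from the text.

```python
ALERT_THRESHOLD_CM = 1000  # alert threshold cited in the text


def classify_extreme_years(observed_peak, simulated_peak, threshold=ALERT_THRESHOLD_CM):
    """Split years whose observed peak exceeds the threshold into hits and misses.

    A hit is a year in which the simulated peak also exceeds the threshold.
    """
    hits, misses = [], []
    for year in sorted(observed_peak):
        if observed_peak[year] <= threshold:
            continue  # not an extreme year
        if simulated_peak.get(year, float("-inf")) > threshold:
            hits.append(year)
        else:
            misses.append(year)
    return hits, misses


# Hypothetical annual peaks in cm, for illustration only.
observed = {2018: 1150, 2019: 980, 2020: 1210}
simulated = {2018: 1040, 2019: 1010, 2020: 930}
hits, misses = classify_extreme_years(observed, simulated)

# Years listed in the text as captured by the MLP-based models and missed by all models.
captured_in_text = [2000, 2001, 2002, 2003, 2006, 2008, 2009, 2012, 2014, 2018, 2019]
missed_in_text = [2004, 2005, 2007, 2011, 2020]
hit_rate = len(captured_in_text) / (len(captured_in_text) + len(missed_in_text))
```

Applied to the two lists of years given above, the tally is 11 captured years out of 16, a hit rate of about 69%.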

Table 5 shows information about the adopted computing architectures and the quantitative values of the performance analysis of the Max_WL and Min_WL simulations during the study period. When first analyzing the architecture of computational networks (Table 5, left part), for the Max_WL variable, the MLP presented the best accuracy using two hidden layers (HLs), with 25 neurons in HL1 and 10 in HL2, while in the MLP_noEnv model, the best architecture was also the two hidden layers with 30 neurons in HL1 and 20 in HL2. The Min_WL followed this pattern, with the most effective architecture consisting of two hidden layers, with 30 neurons in HL1 and 5 in HL2 for MLP and 25 neurons in HL1 and 15 in HL2 for MLP_noEnv.

In comparing the results between models (Table 5, right part), statistical analysis revealed consistently superior performance by the MLP models across all evaluation metrics. Both RMSE and MAE values were substantially lower for MLP compared to the SVM and SVM_noEnv configurations for both hydrological variables. The model efficiency metrics (NSE and KGE) ranged between 0.87 and 0.88 for MLP, indicating excellent predictive capability for maximum and minimum river levels. The PBIAS analysis showed low underestimation errors of 2–3% for Max_WL simulations in the MLP-based models, while the Min_WL predictions exhibited overestimation errors of −3% to −2%. The high R² values (0.882 for Max_WL and 0.881 for Min_WL) confirm that MLP accounts for the majority of the observed data variability in the BHRI. A comprehensive evaluation through integrated metric ranking (Table 4, Figure 6) clearly identified MLP as the superior modeling approach for hydrological simulations in the study area. The model’s performance advantage is particularly evident when comparing the extreme positioning of MLP and SVM in the ranking visualization (see the separation of these models, one at each extreme in Figure 6). Furthermore, the enhanced performance of MLP over MLP_noEnv underscores the importance of including environmental variables for an accurate fluviometric regime simulation at the basin mouth, particularly for applications requiring precise water level predictions during extreme hydrological events.
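The integrated ranking can be sketched as follows: each model receives a position from 1 (best) to 4 (worst) in every metric, and the positions are averaged. The ordering rules are an assumption meant to mirror Table 4 (lower is better for RMSE and MAE, higher is better for NSE, KGE, and R², and closest to zero is best for PBIAS). The model labels and scores are placeholders, not values from Table 5, and ties are not treated specially.

```python
# Assumed ordering rule per metric: "low", "high", or "zero" (closest to zero is best).
BEST_IS = {
    "RMSE": "low",
    "MAE": "low",
    "NSE": "high",
    "KGE": "high",
    "R2": "high",
    "PBIAS": "zero",
}


def sort_key(metric, value):
    """Return a key that sorts metric values from best to worst."""
    rule = BEST_IS[metric]
    if rule == "low":
        return value
    if rule == "high":
        return -value
    return abs(value)


def average_rank(scores):
    """Average each model's position (1 = best) across all metrics.

    Ties keep their input order, which is a simplification of this sketch.
    """
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {model: 0 for model in models}
    for metric in metrics:
        ordered = sorted(models, key=lambda m: sort_key(metric, scores[m][metric]))
        for position, model in enumerate(ordered, start=1):
            totals[model] += position
    return {model: totals[model] / len(metrics) for model in models}


# Placeholder scores for four hypothetical models, for illustration only.
scores = {
    "A": {"RMSE": 100, "MAE": 80, "NSE": 0.90, "KGE": 0.90, "R2": 0.90, "PBIAS": 2.0},
    "B": {"RMSE": 110, "MAE": 90, "NSE": 0.85, "KGE": 0.85, "R2": 0.85, "PBIAS": -3.0},
    "C": {"RMSE": 150, "MAE": 120, "NSE": 0.70, "KGE": 0.70, "R2": 0.70, "PBIAS": 8.0},
    "D": {"RMSE": 160, "MAE": 130, "NSE": 0.60, "KGE": 0.60, "R2": 0.60, "PBIAS": -12.0},
}
ranks = average_rank(scores)
```

A model that is best in every metric obtains an average rank of exactly 1, which is the situation reported for MLP in the recession regime.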

3.3. Analysis of Seasonal Hydrological Regimes

The seasonality is demonstrated in Figure 7 with the monthly climatological cycle of Min_WL and Max_WL considering the hydrological year. The Itacaiúnas River begins its rising regime in November and increases progressively until reaching its maximum annual peak in the months of February (Min_WL of 887 cm and Max_WL of 1089 cm) and March (Min_WL of 895 cm and Max_WL of 1099 cm), then presents a descending regime from April onwards. The seasonal regime characterized as a flood pulse occurs between January and April, when the maximum values throughout the year are observed. Conversely, the fluviometric level reaches the annual minimum stage between July and October, when the seasonal recession regime is noted, with Min_WL values between 204 and 201 cm and Max_WL values between 263 and 283 cm during August and September.

The results of the multimetric performance analysis, particularly in the seasonal hydrological regimes of the flood pulse (only the Max_WL variable) and recession (only the Min_WL variable), with sample sizes of 31 and 30, respectively, are shown in Table 6 and Figure 8.
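The seasonal subsetting can be sketched as below, assuming each sample is stored as a (year, month, observed, simulated) record. The regime months follow the text (January to April for the flood pulse, July to October for the recession); the record layout, function names, and values are hypothetical.

```python
FLOOD_PULSE_MONTHS = {1, 2, 3, 4}   # January to April
RECESSION_MONTHS = {7, 8, 9, 10}    # July to October


def select_regime(records, months):
    """Keep (year, month, observed, simulated) records whose month is in the regime."""
    return [rec for rec in records if rec[1] in months]


def split_obs_sim(records):
    """Return parallel lists of observed and simulated values."""
    obs = [rec[2] for rec in records]
    sim = [rec[3] for rec in records]
    return obs, sim


def mean_absolute_error(obs, sim):
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical records in cm, for illustration only.
records = [
    (2001, 2, 1050.0, 1020.0),
    (2001, 8, 210.0, 230.0),
    (2002, 3, 1100.0, 1080.0),
    (2002, 12, 500.0, 520.0),
]

flood_obs, flood_sim = split_obs_sim(select_regime(records, FLOOD_PULSE_MONTHS))
rec_obs, rec_sim = split_obs_sim(select_regime(records, RECESSION_MONTHS))
flood_mae = mean_absolute_error(flood_obs, flood_sim)
```

Any of the six metrics can then be applied to each subset in the same way as for the whole period.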

In the flood pulse regime (Table 6), the error metrics range from the lowest to the highest values of RMSE and MAE in the sequence of MLP, MLP_noEnv, SVM, and SVM_noEnv. The negative PBIAS indicates that all computational models underestimate the seasonally higher hydrological regime of the Itacaiúnas River, but the percentages are relatively low, between −1.4% and −3.2%. Despite the small variation in the NSE and KGE values, the models’ efficiency is better in the MLP-based networks and worse in the SVM-based networks. These results are also reflected in R², with a value of 0.535 for MLP, which was the model that best explained the variability observed in Max_WL. Integrating all metrics, the overall average rank points to MLP as the best hydrological model to simulate BHRI floods compared to the others.

On the other hand, in the recession regime (Table 6), the RMSE and MAE results show that the absolute errors between the observed and simulated values are lower in the MLP-based architectures and higher in the SVM-based ones. In contrast to the flood pulse regime, the PBIAS is positive in the recession regime, meaning that all models overestimate the observed values, with higher percentage errors: MLP and MLP_noEnv show errors of 6% to 8%, while SVM and SVM_noEnv exhibit significant errors of nearly 20%. Notably, the R² values of 0.479 for MLP and 0.444 for SVM are relatively higher than those for MLP_noEnv (0.361) and SVM_noEnv (0.308), indicating that the complete models capture a larger fraction of the observed data’s variability, i.e., a stronger linear relationship is observed in simulations incorporating environmental indicators (NDVI). In the general average ranking, MLP, with a value of 1, stands out significantly and emerges as the best model to simulate the lowest seasonal regime of the river in the BHRI.

Figure 8 graphically illustrates the average rank obtained through the comprehensive multimetric statistical analysis of the models’ performance. The position of the MLP-based networks in the best-ranking quadrant (blue) is clearly distinguishable, while the SVM-based networks fall in the worst-ranking quadrant (red). Individually, MLP is the best computational model for accurately simulating the complex hydrological variability in the BHRI, with relatively better predictive capacity in the seasonal recession regime than in the flood regime. Despite this higher skill in the recession regime, MLP has an additional advantage: its PBIAS is only −3% (underestimation) for maximum fluviometry during flood pulse compared to the 6% (overestimation) error for minimum fluviometry during recession. This result is particularly relevant given the extreme hydrological events recurring in the region, where flooding disasters have impacted most municipalities of this eastern Amazon hydrographic unit.

4. Discussion

Taking advantage of the emerging and promising potential of mathematical and computational tools [8,9,13], we implemented a hydrological modeling approach to simulate 22 years of monthly data (2000 to 2021) of minimum and maximum river levels over the BHRI, a hydrographic sub-unit of great socioeconomic and environmental importance for the Amazon [19,20,21,22].

In the initial step of this work, we made a comprehensive and objective selection of the explanatory variables to be used in the modeling. The regional factor of the spatial rainfall distribution in the contributing area (Pr_area) appeared as a key meteorological parameter, followed by the environmental factor associated with the conditions of the vegetation cover estimated by satellite (NDVI), as well as the large-scale external aspects related to the oceanic (NATL SST) and atmospheric (SLPTahiti) variables representative of the tropical climate mechanisms over the Atlantic and Pacific that regulate the Amazon’s hydroclimatology.

Previous studies have demonstrated the strong role of spatial rainfall variability in regulating the seasonal flood and recession regimes of the eastern Amazon rivers, such as the Xingu, Tapajós, and Tocantins [15,55,56,57]. In monthly hydrological forecasting studies using classical multiple regression techniques, it is common to use SST predictor variables from adjacent tropical oceans, with robust signals of influence from the central Pacific and the northern Atlantic basin [58,59,60], which is consistent with the climate variables selected in the present study. The influence of the Pacific occurs through the anomalous ocean–atmosphere patterns associated with the warm phase of El Niño and the cold phase of La Niña, which modify the structure of the general circulation of the tropical atmosphere, modulating the seasonal precipitation dynamics within the Amazon basin [61,62]. Equally important are the SST conditions in the northern and southern Atlantic basins, whose anomalous patterns interfere with the positioning of the Intertropical Convergence Zone (ITCZ), which is the main meteorological system that induces rainfall in the Amazon [63].

In the second step of this work, the computational modeling was developed considering two different artificial intelligence architectures, neural networks (MLP) and machine learning (SVM). Among similar previous studies on the Amazon, Silva et al. [14] developed a monthly prediction system for the Xingu River based on a multilayer perceptron neural network, using precipitation in the internal basins and Pacific/Atlantic SST values from 1979 to 2016. The test simulation, particularly for the year 2016, obtained an R² value of 0.99 between the observed and simulated data, which is higher than the R² value of 0.89 found in a similar study carried out by Franco et al. [55] using multiple linear regression in the same region. Using artificial neural networks with feedforward MLP, Salame et al. [15] adjusted the model in the period from 1970 to 2015 and tested the streamflow forecasts of the Tocantins River (Travessão station) for the years 2016 and 2017, whose monthly values were within the 95% confidence interval of the observed data. Santos Neto et al. [16] proposed a hydroclimatological model based on MLP for predicting floods (maximum fluviometry) in the Acre River basin (southwestern Amazon), using monthly SST and SLP data from the Pacific and Atlantic. The model was trained in the period from 1970 to 2016 and tested in the last 6 years of the series (2011 to 2016). When evaluating 30 network architectures, the configuration with 17 and 25 neurons in the hidden layer for a 1-month lag was the one that generated the best results (R² = 0.836). These results were promising, and these authors implemented computational modeling in the operational centers of the western Amazon to assist in decision-making and flood risk management in the state of Acre. Mendonca et al. 
[40] developed rainfall–runoff hydrological models for the Guamá River basin (northeastern Pará state and eastern Amazon) based on two neural network designs, MLP and Nonlinear Autoregressive with Exogenous Inputs (NARX). The years between 2009 and 2015 were used for model calibration, 2016 and 2017 were used for cross-validation, and 2018 and 2019 were used for testing. In the comparison of the results, with an R² value of 0.99 and a percentage error of around 4.4%, the NARX method was more reliable in hydrological simulations, and the authors recommended the use of this computational network in small and medium-sized basins, especially those with absent or limited hydrometeorological data. Duarte et al. [17] evaluated random forest, artificial neural networks, support vector regression, and M5 model tree architectures, fitted with selected meteorological variables within the basin, and they found better performance scores for the M5 model in the prediction of daily streamflow at three stations in the Tocantins River Basin. Sousa et al. [64] applied machine learning to predict streamflow on a daily scale in the upper Teles Pires River in the southern Amazon by using the rainfall estimated by remote sensing (PERSIANN products). Among four different models, random forest exhibited the best performance for estimating streamflow with a horizon of up to 3 days.

Although there are still few studies using computational modeling approaches applied to hydrological modeling in the Amazon, recent efforts made in the aforementioned studies have effectively contributed to the advancement of this topic. In our model development using MLP and SVM for the BHRI in the eastern Amazon, we achieved a high R² value of around 0.88 and low systematic errors of 2 to 3% considering the complexity of the Itacaiúnas River’s behavior, whose performance allows for the generation of reliable forecasts in both seasonal hydrological regimes. Thus, our study also corroborates the scientific scope in Amazonian hydrology, and an important point to emphasize is the inclusion of modeling based not only on the effects of local meteorological and external climate explanatory variables but also the integration of regional environmental variables related to vegetation cover indicators. Particularly for the BHRI, Souza-Filho et al. [19] determined the consequences of multi-decadal LCLU changes on the regional hydroclimatology, revealing that, from the 1970s to the 2000s, this watershed changed to warmer and drier environmental conditions and had an increase in river discharge at the mouth in Marabá. So, the inclusion of the environmental component in the modeling is essential to express the hydrological processes more completely. In this context, Liu and Ayres [65] used a classic statistical regression model with rainfall and NDVI data to simulate the monthly river water level in the Upper Paraguay River Basin (southwestern Brazil), and the results demonstrated that inundation events could be predicted 1 month in advance with reasonable success. 
Here, the results obtained using the two computational models (MLP and SVM) showed that the simulations of the Itacaiúnas river stage present relatively better performance when using the combination of meteorological, climatological, and environmental variables (complete model) compared to models without the environmental component. It is interesting that both models better simulate the recession regime when the river reaches seasonal minimum levels (July to October) compared to the flood pulse regime (January to April). As reported by de Souza et al. [53], in the rainy regime of eastern Amazonia (first half of the year), there is a predominance of large-scale meteorological systems and tropical climate mechanisms, while in the dry regime (second half of the year), with the absence of these larger-scale factors, there is a notable tendency for local/regional factors related to the geography and the environment, e.g., the heterogeneity of vegetation cover and topography, to have more evident effects on hydrology.

5. Conclusions

This study assessed the contribution of AI modeling to understanding and predicting extreme hydrological events in the eastern Brazilian Amazon, particularly in the Itacaiúnas River Hydrographic Basin (BHRI). Through a comprehensive evaluation of the MLP and SVM models, we identified key quantitative differences in their ability to simulate maximum (Max_WL) and minimum (Min_WL) river stages in the BHRI over a 22-year period (2000–2021).

The inclusion of selected explanatory variables, such as local meteorological data (spatial rainfall distribution), external climate indices (tropical Pacific and Atlantic dynamics), and regional environmental indicators (vegetation cover based on satellite-estimated NDVI), significantly improved the hydrological regime simulations. Both models exhibited reduced performance when environmental variables were excluded, highlighting the necessity of integrated hydroclimate and environmental modeling. This approach is fundamental for advancing scientific understanding and addressing sustainability challenges in the Amazon.

The MLP exhibits superior performance, with the Nash–Sutcliffe Efficiency and Kling–Gupta Efficiency values ranging between 0.87 and 0.88, along with high R² values (0.882 for Max_WL and 0.881 for Min_WL), thus confirming its high predictive accuracy. Notably, MLP successfully captured 11 out of 16 extreme flood years (river level > 1000 cm), including the critical disaster periods in 2018 and 2021, during which multiple municipalities declared states of emergency. Conversely, SVM-based models systematically underestimated extreme events, failing to reproduce peak flood levels.

A seasonal analysis further highlighted MLP’s robustness. During the flood pulse regime, MLP maintained a low underestimation error (PBIAS −3%) and explained 53.5% of observed variability. In the recession regime, despite a slight overestimation (PBIAS +6%), MLP outperformed the SVM model, which exhibited errors nearing 20%. Model performance was improved by adding environmental variables (NDVI), reinforcing their significance in hydrological simulations.

The distribution of natural disasters revealed that 93% of events occurred in three municipalities near the basin’s mouth (Marabá: 40%, Parauapebas: 31%, and Eldorado dos Carajás: 22%), aligning with peak fluviometric levels. MLP’s ability to predict these extremes provides crucial support for disaster risk management in flood-prone Amazonian regions.

This study concludes by highlighting the importance of MLP-based hydrological modeling as a reliable tool for simulating Amazonian river dynamics, particularly under extreme conditions. The model’s high accuracy in reproducing the hydrological regime offers a scientific basis for improving early warning systems and mitigating hydrological disasters in vulnerable Amazonian communities.

Future research should further refine AI model architectures (taking into account more sophisticated machine learning techniques) and expand spatial–temporal scales (including the spatial distribution of daily precipitation data estimated using satellites) to improve applicability across various and heterogeneous Amazonian basins. These scientific developments will be instrumental in fostering the resilience of Amazonian communities vulnerable to escalating hydrological extremes exacerbated by climate change.

Author Contributions

Conceptualization, L.R.R.C. and E.B.d.S.; methodology, software, validation, and formal analysis, L.R.R.C., E.B.d.S., F.G.d.S. and A.M.C.d.C.; investigation, L.R.R.C. and E.B.d.S.; writing—original draft preparation, L.R.R.C., D.B.d.S.F., R.C.S., A.M.L.d.S., A.M.C.d.C., J.d.A.S.J., F.G.d.S. and E.B.d.S.; writing—review and editing, L.R.R.C., E.B.d.S. and A.M.C.d.C.; visualization, L.R.R.C., E.B.d.S. and F.G.d.S.; supervision, project administration, and funding acquisition, E.B.d.S. and D.B.d.S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Instituto Tecnológico Vale (ITV) and CNPq through the project 420142/2023-1.

Data Availability Statement

Databases with respective sources and references are described in the Materials and Methods Section.

Acknowledgments

We thank CAPES and CNPq for the student scholarships and the Research Productivity Grants provided to E.B.S. (314809/2023-6).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Dates (month-year) of the Max_WL and Min_WL simulations selected by the computer network.

Max_WL:
Mar-00, Jul-00, Oct-00, Nov-00
Mar-01, Apr-01, Jul-01, Aug-01
Jan-02, Feb-02, Jul-02, Sep-02, Oct-02
Jan-03, Mar-03, Jun-03, Oct-03, Nov-03
Jan-04, Aug-04, Oct-04
Jan-05, Jul-05, Aug-05, Sep-05
Apr-06, Jun-06, Aug-06, Nov-06
Jan-07, Mar-07, Jul-07, Sep-07
Feb-08, Mar-08, Jun-08
Mar-09, Apr-09, Aug-09, Sep-09
Apr-10, May-10, Jun-10
Apr-11, May-11, Aug-11, Sep-11
Jan-12, Apr-12, Jun-12
Apr-13, Sep-13, Dec-13
Feb-14, Mar-14, Oct-14, Dec-14
Jan-15, Feb-15, Apr-15
Feb-16, Aug-16
Jan-17, Feb-17, Jun-17, Aug-17
Jan-18, Feb-18, Nov-18, Dec-18
Mar-19, Apr-19
Jan-20, Feb-20, Aug-20, Nov-20
Feb-21, Apr-21

Min_WL:
Mar-00, Apr-00, Jul-00, Oct-00, Nov-00
Mar-01, Apr-01, Jul-01, Aug-01, Sep-01
Jan-02, Feb-02, Aug-02, Oct-02
Mar-03, Oct-03, Nov-03
Aug-04, Nov-04, Dec-04
Jan-05, May-05, Aug-05, Sep-05, Oct-05
Jan-06, Feb-06, Jun-06, Nov-06
Jan-07, Mar-07, Apr-07, Jul-07, Sep-07
Feb-08, Mar-08, Jun-08, Sep-08
Jan-09, Feb-09, Apr-09, Aug-09, Oct-09
Jun-10, Jul-10, Sep-10
Jan-11, May-11, Jun-11
Apr-12, Sep-12, Oct-12
Apr-13, Nov-13, Dec-13
Mar-14, Oct-14, Dec-14
Jan-15, Apr-15, May-15
Jul-16, Aug-16, Sep-16
Jan-17, Jun-17, Aug-17
Jun-18, Jul-18, Aug-18
Jul-19, Aug-19
Mar-20, Jun-20, Aug-20, Nov-20
Mar-21, Jun-21

References

Paiva, R.C.D.; Buarque, D.C.; Collischonn, V.; Bonnet, M.-P.; Frappart, F.; Calmant, S.; Mendes, C.A.B. Large-scale hydrologic and hydrodynamic modeling of the Amazon River basin. Water Resour. Res. 2013, 49, 1226–1243. [Google Scholar] [CrossRef]
Towner, J.; Cloke, H.L.; Zsoter, E.; Flamig, Z.; Hoch, J.M.; Bazo, J.; Coughlan de Perez, E.; Stephens, E.M. Assessing the performance of global hydrological models for capturing peak river flows in the Amazon basin. Hydrol. Earth Syst. Sci. 2019, 23, 3057–3080. [Google Scholar] [CrossRef]
Marengo, J.A.; Souza, C.M., Jr.; Thonicke, K.; Burton, C.; Halladay, K.; Betts, R.A.; Alves, L.M.; Soares, W.R. Changes in Climate and Land Use Over the Amazon Region: Current and Future Variability and Trends. Front. Earth Sci. 2018, 6, 228. [Google Scholar] [CrossRef]
Gloor, M.; Brienen, R.J.W.; Galbraith, D.; Feldpausch, T.R.; Schöngart, J.; Guyot, J.-L.; Espinoza, J.C.; Lloyd, J.; Phillips, O.L. Intensification of the Amazon hydrological cycle over the last two decades. Geophys. Res. Lett. 2013, 40, 1729–1733. [Google Scholar] [CrossRef]
Barichivich, J.; Gloor, E.; Peylin, P.; Brienen, R.J.W.; Schöngart, J.; Espinoza, J.C.; Pattnayak, K.C. Recent intensification of Amazon flooding extremes driven by strengthened Walker circulation. Sci. Adv. 2018, 4, eaat8785. [Google Scholar] [CrossRef] [PubMed]
de Souza, E.B.; Ferreira, D.B.S.; Anjos, L.J.S.; Cunha, A.C.; Silva, J.A., Jr.; Coutinho, E.C.; Sousa, A.M.L.; Souza, P.J.O.P.; Correa, W.P.M.; Dias, T.S.S.; et al. Intensification of Natural Disasters in the State of Pará and the Triggering Mechanisms Across the Eastern Amazon. Atmosphere 2025, 16, 7. [Google Scholar] [CrossRef]
Brasil Ministério da Integração e do Desenvolvimento Regional; Secretaria de Proteção e Defesa Civil; Universidade Federal de Santa Catarina; Centro de Estudos e Pesquisas em Engenharia e Defesa Civil. Atlas Digital de Desastres no Brasil; MIDR: Brasília, Brazil, 2023. Available online: https://atlasdigital.mdr.gov.br/arquivos/Atlas_Digital_Desastres_Manual_Aplicacao.pdf (accessed on 2 August 2024).
Oliveira, E.C.L.; Nogueira Neto, A.V.; Santos, A.P.P.; da Costa, C.P.W.; Freitas, J.C.G.; Souza-Filho, P.W.M.; Rocha, R.L.; Alves, R.C.; Franco, V.S.; Carvalho, E.C.; et al. Precipitation forecasting: From geophysical aspects to machine learning applications. Front. Clim. 2023, 5, 1250201. [Google Scholar] [CrossRef]
Rozos, E.; Dimitriadis, P.; Bellos, V. Machine Learning in Assessing the Performance of Hydrological Models. Hydrology 2022, 9, 5. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
Minns, A.W.; Hall, M.J. Artificial neural networks as rainfall-runoff models. Hydrol. Sci. J. 1996, 41, 399–417. [Google Scholar] [CrossRef]
Dawson, C.W.; Wilby, R.L. Hydrological modelling using artificial neural networks. Prog. Phys. Geogr. 2001, 25, 80–108. [Google Scholar] [CrossRef]
Samper-Pilar, J.; Samper-Calvete, J.; Mon, A.; Pisani, B.; Paz-González, A. Machine Learning Analysis of Hydrological and Hydrochemical Data from the Abelar Pilot Basin in Abegondo (Coruña, Spain). Hydrology 2025, 12, 49. [Google Scholar] [CrossRef]
Silva, A.G.; Castro, A.R.G.; Vieira, A.C. Modelo de previsão hidrológica utilizando redes neurais artificiais: Um estudo de caso na bacia do Rio Xingu—Altamira-PA. Rev. Bras. Comput. Apl. 2018, 10, 55–62. [Google Scholar] [CrossRef]
Salame, C.W.; Queiroz, J.C.B.; Souza, E.B.; Farias, V.J.C.; Rocha, E.J.P.; Moura, H.P. A comparative study of Box Jenkins models and artificial neural networks in forecasting pluviometric flows and precipitations of Araguaia-Tocantins basin/Brazil. Rev. Bras. Ciências Ambient. 2019, 52, 28–43. [Google Scholar] [CrossRef]
Santos Neto, L.A.; Maniesi, V.; Querino, C.A.S.; Silva, M.J.G.; Brown, V.R. Modelagem hidroclimatologica utilizando redes neurais multilayer perceptron em bacia hidrográfica no sudoeste da Amazônia. Rev. Bras. Climatol. 2021, 16, 26. [Google Scholar] [CrossRef]
Duarte, V.B.R.; Viola, M.R.; Giongo, M.; Uliana, E.M.; Mello, C.R. Streamflow forecasting in Tocantins river basins using machine learning. Water Supply 2022, 22, 6230–6244. [Google Scholar] [CrossRef]
Brandão, J.F.S.; Correa, F.W.S.; Guedes, E.B. A Comparative Analysis of Artificial Neural Networks on River Level Forecasting for the Rio Madeira Basin. In Proceedings of the Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), Belo Horizonte, Brazil, 25–29 September 2023; Sociedade Brasileira de Computação: Porto Alegre, Brazil, 2023; pp. 141–155. [Google Scholar] [CrossRef]
Souza-Filho, P.W.M.; de Souza, E.B.; Silva Júnior, R.O.; Nascimento, W.R.; Versiani de Mendonça, B.R.; Guimarães, J.T.F.; Dall’Agnol, R.; Siqueira, J.O. Four decades of land-cover, land-use and hydroclimatology changes in the Itacaiúnas River watershed, southeastern Amazon. J. Environ. Manag. 2016, 167, 175–184. [Google Scholar] [CrossRef]
Serrão, E.A.O.; Silva, M.T.; Sousa, F.A.S.; Lima, A.M.M.; Santos, C.A.; Ataide, L.C.P.; Silva, V.P.R. Four decades of hydrological process simulation of the Itacaiúnas river watershed, southeast Amazon. Bol. Ciências Geodésicas 2019, 25, e2019018. [Google Scholar] [CrossRef]
Pontes, P.R.M.; Cavalcante, R.B.L.; Sahoo, P.K.; Silva Júnior, R.O.; da Silva, M.S.; Dall’Agnol, R.; Siqueira, J.O. The role of protected and deforested areas in the hydrological processes of Itacaiúnas River Basin, eastern Amazonia. J. Environ. Manag. 2019, 235, 489–499. [Google Scholar] [CrossRef]
Nunes, S.; Cavalcante, R.B.L.; Nascimento, W.R., Jr.; Souza-Filho, P.W.M.; Santos, D. Potential for Forest Restoration and Deficit Compensation in Itacaiúnas Watershed, Southeastern Brazilian Amazon. Forests 2019, 10, 439. [Google Scholar] [CrossRef]
Ibrahim, K.S.M.H.; Huang, Y.F.; Ahmed, A.N.; Koo, C.H.; El-Shafie, A. A review of the hybrid artificial intelligence and optimization modelling of hydrological streamflow forecasting. Alex. Eng. J. 2022, 61, 279–303. [Google Scholar] [CrossRef]
Silva, R.C.F.; Pimentel, M.A.D.S.; Araújo, A.N. Caracterização Morfométrica e Geomorfológica da Bacia Hidrográfica do Rio Itacaiunas (BHRI), Amazônia Oriental, Brasil. Rev. Bras. Geogr. Física 2022, 15, 1556–1563. [Google Scholar] [CrossRef]
Viana, P.L.; Gil, A.S.B. Flora das cangas da Serra dos Carajás, Pará, Brasil: Cannabaceae. Rev. Rodriguésia 2018, 69, 49–51. [Google Scholar] [CrossRef]
Silva, R.O.; de Souza, E.B.; Tavares, A.L.; Mota, J.A.; Ferreira, D.B.S.; Souza-Filho, P.W.M.; Rocha, E.J.D. Three decades of reference evapotranspiration estimates for a tropical watershed in the eastern Amazon. An. Acad. Bras. Ciênc. 2017, 89, 1985–2002. [Google Scholar] [CrossRef]
Carvalho, G.L.B.; Ferreira, D.H.L.; Sugahara, C.R. Aplicação da análise envoltória de dados na gestão de recursos hídricos. Rev. Bras. de Iniciação Científica 2022, 9, e022021. Available online: https://periodicoscientificos.itp.ifsp.edu.br/index.php/rbic/article/view/612 (accessed on 10 January 2025).
Razali, N.; Wah, Y. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J. Stat. Model. Anal. 2011, 2, 21–33. Available online: https://www.nrc.gov/docs/ml1714/ml17143a100.pdf (accessed on 2 December 2024).
Shah, H.; Ghazali, R. Prediction of Earthquake Magnitude by an Improved ABC-MLP. In Proceedings of the 2011 Developments in E-systems Engineering, Dubai, United Arab Emirates, 6–8 December 2011. [Google Scholar] [CrossRef]
Coutinho, E.R.; Silva, R.M.; Delgado, A.R.S. Utilização de Técnicas de Inteligência Computacional na Predição de Dados Meteorológicos. Rev. Bras. Meteorol. 2016, 31, 24–36. [Google Scholar] [CrossRef]
Hall, T.; Brooks, H.E.; Doswell, C.A. Precipitation Forecasting Using a Neural Network. Weather Forecast. 1999, 14, 338–345. [Google Scholar] [CrossRef]
Asaduzzaman, M.; Ahmed, S.U.; Khan, F.E.; Shahjahan, M.; Murase, K. Making use of damped noisy gradient in training neural network. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–5. [Google Scholar] [CrossRef]
Leonel, J.S. Deep Learning. Available online: https://deeplearningbrasil.wordpress.com/author/jorgesleonel (accessed on 15 December 2024).
Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall PTR: Hoboken, NJ, USA, 1998. [Google Scholar]
Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, 21–24 June 2010. [Google Scholar]
Godinho, J.; Gomes, J.S.G.; Malheiro, R.; Santana, L.E. Previsão hidrológica na bacia do rio Macaé com redes neurais artificiais. Rev. Bras. Comput. Apl. 2022, 14, 70–80. Available online: http://seer.upf.br/index.php/rbca/article/view/12964 (accessed on 3 December 2024).
Chollet, F. Deep Learning with Python; Manning Publications: Shelter Island, NY, USA, 2018. [Google Scholar]
Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed.; O’Reilly Media: Sebastopol, CA, USA, 2017. [Google Scholar]
Debastiani, A.B.; Da Silva, R.D.; Neto, S.L.R. Eficácia da arquitetura MLP em modo closed-loop para simulação de um Sistema Hidrológico. Rev. Bras. Recur. Hídricos 2016, 21, 821–831. [Google Scholar] [CrossRef]
Mendonça, L.M.; Gomide, I.S.; Sousa, J.V.; Blanco, C.J.C. Modelagem chuva-vazão via redes neurais artificiais para simulação de vazões de uma bacia hidrográfica da Amazônia. Rev. Gestão Água América Lat. 2021, 18, e2. [Google Scholar] [CrossRef]
Simpson, P.K. Artificial Neural Systems: Foundations, Paradigms, Applications, and Implementations; Pergamon Press: Oxford, UK, 1999. [Google Scholar]
Albarakati, N.; Kecman, V. Fast neural network algorithm for solving classification tasks: Batch error back-propagation algorithm. In Proceedings of the IEEE Southeastcon 2013, Jacksonville, FL, USA, 4–7 April 2013; pp. 1–8. [Google Scholar] [CrossRef]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Vapnik, V.; Golowich, S.E.; Smola, A. Support vector method for function approximation, regression estimation and signal processing. Adv. Neural Inf. Process. Syst. 1997, 9, 281–287. Available online: https://dl.acm.org/doi/abs/10.5555/2998981.2999021 (accessed on 2 December 2024).
Li, X.-L.; Lü, H.; Horton, R.; An, T.; Yu, Z. Real-time flood forecast using the coupling support vector machine and data assimilation method. J. Hydroinform. 2014, 16, 973–988. [Google Scholar] [CrossRef]
Roy, C.; Motamedi, S.; Hashim, R.; Shamshirband, S.; Petković, D. A comparative study for estimation of wave height using traditional and hybrid soft-computing methods. Environ. Earth Sci. 2016, 75, 590. [Google Scholar] [CrossRef]
Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290. [Google Scholar] [CrossRef]
Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91. [Google Scholar] [CrossRef]
Oliveira, R.F.; Zolin, C.A.; Victoria, D.C.; Lopes, T.R.; Vendrusculo, L.G.; Paulino, J. Hydrological calibration and validation of the MGB-IPH model for water resource management in the upper Teles Pires River basin in the Amazon-Cerrado ecotone in Brazil. Acta Amaz. 2019, 49, 54–63. [Google Scholar] [CrossRef]
Oliveira, D.M.; Ribeiro, J.G.M.; De Faria, L.F.; Reboita, M.S. Performance dos modelos climáticos do CMIP6 em simular a precipitação em subdomínios da América do Sul no período histórico. Rev. Bras. Geogr. Física 2023, 16, 116–133. [Google Scholar] [CrossRef]
Medeiros, F.J.; Oliveira, C.P.; Avila-Diaz, A. Evaluation of extreme precipitation climate indices and their projected changes for Brazil: From CMIP3 to CMIP6. Weather Clim. Extrem. 2022, 38, 100511. [Google Scholar] [CrossRef]
De Souza, E.B.; Ferreira, D.B.S.; Guimarães, J.T.F.; Franco, V.S.; Azevedo, F.T.M.; Moraes, B.C.; Souza, P.J.O.P. Padrões climatológicos e tendências da precipitação nos regimes chuvoso e seco da Amazônia oriental. Rev. Bras. Climatol. 2017, 21, 81–93. [Google Scholar] [CrossRef]
Espinoza Villar, J.C.; Ronchail, J.; Guyot, J.L.; Cochonneau, G.; Naziano, F.; Lavado, W.; De Oliveira, E.; Pombosa, R.; Vauchel, P. Spatio-temporal rainfall variability in the Amazon basin countries (Brazil, Peru, Bolivia, Colombia, and Ecuador). Int. J. Climatol. 2009, 29, 1574–1594. [Google Scholar] [CrossRef]
Franco, V.S.; de Souza, E.B.; Lima, A.M.M.; Sousa, A.M.L.; Pinheiro, A.N.; Dias, T.S.S.; Azevedo, F.T.M. Previsão hidrológica de cheia sazonal do rio Xingu, Altamira-PA. Rev. Bras. Climatol. 2018, 14, 22. [Google Scholar] [CrossRef]
Batista, R.G.; Costa, C.E.A.S. Influence of climatic phenomena on rainfall in the Tapajós River Basin. Rev. Agrogeoambiental 2024, 16, e20241847. [Google Scholar] [CrossRef]
Collischonn, B.; Collischonn, W.; Tucci, C.E.M. Daily hydrological modeling in the Amazon basin using TRMM rainfall estimates. J. Hydrol. 2008, 360, 207–216. [Google Scholar] [CrossRef]
Ronchail, J.; Cochonneau, G.; Molinier, M.; Guyot, J.L.; De Miranda Chaves, A.G.; Guimarães, V.; De Oliveira, E. Interannual rainfall variability in the Amazon basin and sea-surface temperatures in the Equatorial Pacific and the tropical Atlantic Oceans. Int. J. Climatol. 2002, 22, 1663–1686. [Google Scholar] [CrossRef]
De Linage, C.; Famiglietti, J.S.; Randerson, J.T. Statistical prediction of terrestrial water storage changes in the Amazon Basin using tropical Pacific and North Atlantic Sea surface temperature anomalies. Hydrol. Earth Syst. Sci. 2014, 18, 2089–2102. [Google Scholar] [CrossRef]
Yoon, J.H.; Zeng, N. An Atlantic influence on Amazon rainfall. Clim. Dyn. 2010, 34, 249–264. [Google Scholar] [CrossRef]
Marengo, J.A.; Espinoza, J.C. Extreme seasonal droughts and floods in Amazonia: Causes, trends and impacts. Int. J. Climatol. 2016, 36, 1033–1050. [Google Scholar] [CrossRef]
Ambrizzi, T.; de Souza, E.B.; Pulwarty, R.S. The Hadley and Walker Regional Circulations and Associated ENSO Impacts on South American Seasonal Rainfall. In The Hadley Circulation: Present, Past and Future. Advances in Global Change Research; Diaz, H.F., Bradley, R.S., Eds.; Springer: Dordrecht, The Netherlands, 2004; Volume 21. [Google Scholar] [CrossRef]
de Souza, E.B.; Kayano, M.; Ambrizzi, T. Intraseasonal and submonthly variability over the Eastern Amazon and Northeast Brazil during the autumn rainy season. Theor. Appl. Climatol. 2005, 81, 177–191. [Google Scholar] [CrossRef]
Sousa, M.F.; Uliana, E.M.; Aires, R.V.U.; Rápalo, L.M.C.; da Silva, D.D.; Moreira, M.C.; Lisboa, L.; da Silva Rondon, D. Streamflow prediction based on machine learning models and rainfall estimated by remote sensing in the Brazilian Savanna and Amazon biomes transition. Model. Earth Syst. Environ. 2024, 10, 1191–1202. [Google Scholar] [CrossRef]
Liu, W.T.; Ayres, F.M. Upper Paraguay River inundation prediction using rainfall and NDVI. Int. J. Remote Sens. 2005, 26, 4455–4470. [Google Scholar] [CrossRef]

Figure 1. The study area in the hydrographic basin of the Itacaiúnas River (BHRI, map on the right) with hydrology (main rivers) and land use and cover classes, including the locations of stations (symbols: circle, cross, and triangles). The reference map (bottom-left) shows the BHRI over the state of Pará within the main hydrographic basins in tropical Brazil. The top left map indicates the climate monitoring areas of the Pacific and Atlantic Oceans.

Figure 2. The architecture of the artificial neural network adopted for this study (adapted from [33]).

Figure 3. One-month lag Spearman correlogram calculated for the entire database from 2000 to 2021. Values close to zero are omitted, only statistically significant correlations (p-value < 0.05) are colored in the matrix, and values above 0.5 in absolute value are indicated by circles.
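As a rough illustration of how a one-month lag Spearman coefficient such as those in Figure 3 could be obtained, the sketch below ranks two monthly series (averaging tied ranks) and computes the Pearson correlation of the ranks. It assumes that "one-month lag" means the predictor at month t − 1 is paired with the river stage at month t; the function names and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranked series."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(predictor, target, lag=1):
    """Correlate the predictor at month t - lag with the target at month t."""
    if lag > 0:
        predictor, target = predictor[:-lag], target[lag:]
    return spearman(predictor, target)


# Hypothetical monthly values, for illustration only (not data from the study).
precip_mm = [310, 280, 150, 60, 20, 10, 30, 90, 180, 260]
max_wl_cm = [700, 900, 850, 600, 400, 300, 250, 280, 400, 600]
rho = lagged_spearman(precip_mm, max_wl_cm, lag=1)
```

The significance filter mentioned in the caption (p-value < 0.05) is not reproduced here; it would require an additional test on each coefficient.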

Figure 4. Time series of observed (green bars) and simulated (colored lines) Max_WL (top) and Min_WL (bottom) from 2000 to 2021, considering the dates shown in Table A1 (Appendix A). The bold dashed line is the Civil Defense alert threshold.

Figure 5. Natural disasters of the hydrological group (inundations and floods) recorded in the BHRI municipalities from 2000 to 2021: (a) percentages by municipalities, (b) a map with the topography (m) and the impacted municipalities, and (c) the annual total of disasters (vertical bars).

Figure 6. Average ranking positions for each computational model for Max_WL and Min_WL.

Figure 7. The annual climatological cycle of the hydrological regime given by variables Max_WL and Min_WL at the mouth of the BHRI, emphasizing the extremes in the seasonal regimes.

Figure 8. Average ranking positions for each computational model for Max_WL and Min_WL in the seasonal regimes.

Table 1. Databases and variables used in this study.

Dimension	Variables	Acronym	Location	Unit	Source
Meteorological	Air temperature	Ta	Marabá station	°C	INMET
	Relative humidity	RH		%
	Surface atmospheric pressure	Pa		hPa
	Wind speed	Vv		m/s
	Precipitation	Pr		mm
	Precipitation	Pr_area	Average of the 9 stations within the basin	mm	ANA
Climatological	Mean sea level pressure	SLP	Darwin and Tahiti	hPa	CPC/NOAA
	Sea surface temperature	SST	NINO1+2, NINO3, NINO3.4, NINO4, NATL, and SATL	°C	CPC/NOAA
Environmental	Normalized Difference Vegetation Index	NDVI	Spatial average within the basin		USGS, NASA
	Enhanced Vegetation Index	EVI
	Leaf Area Index	Lai
Hydrological	Maximum river water level	Max_WL	Marabá station	cm	ANA
	Minimum river water level	Min_WL	Marabá station	cm	ANA
Natural disasters	Annual number of hydrological disasters	–	Municipalities within the basin	–	SEDEC/MDR

Table 2. The configuration adopted in the MLP computational model. 1st HL is the first hidden layer; 2nd HL is the second hidden layer; F1st HL is the activation function of the first hidden layer; F2nd HL is the activation function of the second hidden layer; FCS is the activation function of the output layer; and NI is the number of iterations.

Model	1st HL	2nd HL	F1st HL	F2nd HL	FCS	NI
MLP	5, 10, 15, 20, 25 and 30	5, 10, 15, 20, 25 and 30	Tanh	Tanh	ReLU	up to 4000
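To make the search space in Table 2 concrete, the following sketch enumerates the candidate hidden-layer combinations and runs a single forward pass through an untrained two-hidden-layer network with Tanh, Tanh, and ReLU activations. It is a minimal illustration only: the weight initialization, the number of inputs (8), and all function names are assumptions, and training (the optimizer and the up-to-4000 iterations) is deliberately left out.

```python
import itertools
import math
import random

HIDDEN_SIZES = [5, 10, 15, 20, 25, 30]  # candidate sizes listed in Table 2

# Every (1st HL, 2nd HL) combination: 6 x 6 = 36 candidate architectures.
ARCHITECTURES = list(itertools.product(HIDDEN_SIZES, HIDDEN_SIZES))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer (assumed init)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by its activation function."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def build_network(n_inputs, hl1, hl2, seed=0):
    """Create an untrained network: inputs -> hl1 -> hl2 -> 1 output."""
    rng = random.Random(seed)
    return [init_layer(n_inputs, hl1, rng),
            init_layer(hl1, hl2, rng),
            init_layer(hl2, 1, rng)]


def forward(x, layers):
    """Tanh -> Tanh -> ReLU forward pass, following the activations in Table 2."""
    h1 = dense(x, layers[0], math.tanh)
    h2 = dense(h1, layers[1], math.tanh)
    return dense(h2, layers[2], relu)[0]


# Untrained network with 25 and 10 hidden neurons; the 8 inputs stand in for
# a hypothetical set of normalized predictors.
network = build_network(n_inputs=8, hl1=25, hl2=10)
prediction = forward([0.1] * 8, network)
```

A ReLU output layer can only return non-negative values, which is compatible with a river stage target that has been scaled to a non-negative range.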

Table 3. The configuration adopted in the SVM computational model. K is the kernel function (RBF, radial basis function); C = 1 is the regularization parameter, which limits the maximum values during optimization so that the model is not too flexible; ϵ = 0.1 means that only errors larger than 0.1 are penalized in the objective function; γ = 1 sets the width of the Gaussian curve relative to the amplitude of the data, so that as many points as possible fall within the pre-established criteria.

Model	K	C	ϵ	γ
SVM	RBF	1	0.1	1
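The roles of γ and ϵ in Table 3 can be illustrated with the two building blocks of support vector regression: the Gaussian (RBF) kernel and the ϵ-insensitive loss. The sketch below is not the authors' implementation; the function names, support vectors, coefficients, and bias are hypothetical. In a fitted model, C bounds the magnitude of each dual coefficient.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon tube cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Evaluate a fitted SVR: f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical fitted model with two support vectors; each |coef| <= C = 1.
support_vectors = [[0.0], [1.0]]
dual_coefs = [0.5, -0.5]
estimate = svr_predict([0.0], support_vectors, dual_coefs, bias=0.2, gamma=1.0)
```

With γ = 1, a training point one unit away (in the scaled input space) contributes exp(−1) ≈ 0.37 of the weight of a coincident point; larger γ values would make each point's influence more local.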

Table 4. Statistical metrics for analyzing model performance.

Metric	Description	Range and Interpretation of Quantitative Results
RMSE	Root Mean Square Error: This measures the square root of the mean of the squared errors. It penalizes larger errors more heavily than smaller errors.	[0, +∞) lower is better
NSE	Nash–Sutcliffe Efficiency: This measures the predictive quality of the model relative to the mean of the observed values.	(−∞, 1] values close to 1 indicate better fit
KGE	Kling–Gupta Efficiency: This improves the NSE by decomposing performance into three components, namely linear correlation, bias, and variability.	(−∞, 1] values close to 1 indicate better fit
MAE	Mean Absolute Error: This is the average of the absolute error values. It is less sensitive to outliers than the RMSE.	[0, +∞) lower is better
PBIAS	Percent Bias: This measures the average tendency of the simulated values to be higher (overestimation) or lower (underestimation) than the observed values.	(−∞, +∞) values close to 0 indicate less bias
R²	Coefficient of Determination: This is the proportion of the variance in the observed variable that is explained by the simulated variable. It measures the strength of the linear relationship between observed and simulated values.	[0, 1] values close to 1 indicate better correlation
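The six metrics in Table 4 can be sketched in a few lines of Python. The formulas below follow common textbook definitions and are assumptions about the exact variants used: KGE uses the form with correlation r, variability ratio α = σ_sim/σ_obs, and bias ratio β = μ_sim/μ_obs; R² is taken as the squared Pearson correlation; and PBIAS is written as 100·Σ(sim − obs)/Σobs, so that positive values indicate overestimation (some references use the opposite sign). The sample series are hypothetical.

```python
from statistics import mean, pstdev


def rmse(obs, sim):
    return (sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs)) ** 0.5


def mae(obs, sim):
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """1 minus the ratio of squared residuals to the spread around the observed mean."""
    obs_mean = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual / spread


def pearson_r(obs, sim):
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / (var_o * var_s) ** 0.5


def r_squared(obs, sim):
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)
    beta = mean(sim) / mean(obs)
    return 1.0 - ((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2) ** 0.5


def pbias(obs, sim):
    """Assumed sign convention: positive means the simulation overestimates."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical river stages in cm: the simulation is 10 cm too high every month.
observed = [100.0, 200.0, 300.0, 400.0]
simulated = [110.0, 210.0, 310.0, 410.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "NSE": nse(observed, simulated),
    "KGE": kge(observed, simulated),
    "MAE": mae(observed, simulated),
    "PBIAS": pbias(observed, simulated),
    "R2": r_squared(observed, simulated),
}
```

In this toy case the constant offset leaves R² at 1 while RMSE, MAE, PBIAS, NSE, and KGE all register the bias, which is one reason a multimetric evaluation is more informative than correlation alone.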

Table 5. Quantitative values of the statistical performance metrics and the average rank for Max_WL and Min_WL simulations. HL means hidden layer.

Variable	Model	Network Architecture			Statistical Metrics						Average Rank
Variable	Model	HL1	HL2	Epoch	RMSE	NSE	KGE	MAE	PBIAS	R²	Average Rank
Max_WL	MLP	25	10	4000	123.48	0.878	0.881	90.89	2.07	0.882	1.00
	SVM	–	–	–	174.51	0.840	0.860	135.58	14.14	0.840	2.00
	MLP_noEnv	30	20	2000	141.44	0.757	0.834	101.13	3.46	0.846	3.50
	SVM_noEnv	–	–	–	175.33	0.755	0.835	133.07	13.55	0.831	3.50
Min_WL	MLP	30	5	1000	92.99	0.880	0.884	66.76	0.29	0.881	1.17
	SVM	–	–	–	96.98	0.878	0.887	72.47	−3.76	0.876	1.83
	MLP_noEnv	25	15	1000	93.53	0.869	0.865	69.19	1.00	0.880	3.17
	SVM_noEnv	–	–	–	97.20	0.869	0.863	73.63	−2.23	0.873	3.83
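One plausible way to obtain an average rank like the one in the last column of Table 5 is to rank the models on each metric separately and then average the positions. The sketch below applies this to the Min_WL rows for MLP and SVM, copied from Table 5. The ranking rules are assumptions: lower is better for RMSE and MAE, higher is better for NSE, KGE, and R², PBIAS is compared by absolute value, and only these two models are ranked against each other.

```python
# Cost functions: a smaller cost means a better score on that metric.
METRIC_COSTS = {
    "RMSE": lambda v: v,        # lower is better
    "NSE": lambda v: -v,        # higher is better
    "KGE": lambda v: -v,        # higher is better
    "MAE": lambda v: v,         # lower is better
    "PBIAS": lambda v: abs(v),  # closer to zero is better
    "R2": lambda v: -v,         # higher is better
}

# Min_WL rows for MLP and SVM, copied from Table 5.
SCORES = {
    "MLP": {"RMSE": 92.99, "NSE": 0.880, "KGE": 0.884,
            "MAE": 66.76, "PBIAS": 0.29, "R2": 0.881},
    "SVM": {"RMSE": 96.98, "NSE": 0.878, "KGE": 0.887,
            "MAE": 72.47, "PBIAS": -3.76, "R2": 0.876},
}


def average_rank(scores, metric_costs):
    """Rank models per metric (1 = best) and average the ranks over all metrics.

    Ties are not treated specially: tied models keep their dictionary order.
    """
    totals = {model: 0 for model in scores}
    for metric, cost in metric_costs.items():
        ordered = sorted(scores, key=lambda m: cost(scores[m][metric]))
        for position, model in enumerate(ordered, start=1):
            totals[model] += position
    return {model: total / len(metric_costs) for model, total in totals.items()}


ranks = average_rank(SCORES, METRIC_COSTS)
```

Under these assumptions MLP ranks first on five of the six metrics (SVM leads only on KGE), giving average ranks of about 1.17 and 1.83, which agree with the Min_WL values reported for the two models in Table 5. Whether the same rule was applied to every row of the table is not stated here.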

Table 6. Quantitative values of the statistical performance metrics and the average rank for Max_WL in the flood pulse and Min_WL in the recession regime.

Variable and Seasonal Regime	Model	RMSE	NSE	KGE	MAE	PBIAS	R²	Average Rank
Max_WL Flood pulse (January to April)	MLP	130.01	0.518	0.653	108.80	−3.04	0.535	1.33
	MLP_noEnv	132.99	0.495	0.632	110.80	−3.18	0.514	2.50
	SVM	133.23	0.494	0.626	109.87	−1.38	0.500	2.67
	SVM_noEnv	134.85	0.481	0.632	111.13	−2.18	0.495	3.50
Min_WL Recession (July to October)	MLP	37.56	0.253	0.682	30.31	6.19	0.479	1.00
	MLP_noEnv	43.53	−0.004	0.591	34.92	8.27	0.361	2.16
	SVM	58.46	−0.810	0.556	46.00	19.34	0.444	2.83
	SVM_noEnv	58.65	−0.822	0.510	48.12	19.70	0.308	4.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

