Article

Generative Artificial Intelligence and Probabilistic Trees for the Linguistic Data Summarization in Wave Energy Decision-Making

by
Iliana Pérez Pupo
1,2,3,4,*,
Luis Segundo Alvarado Acuña
1,4,
Pedro Y. Piñero Pérez
1,2,3,4,
Raykenler Yzquierdo Herrera
3 and
Maikel Yelandi Leyva Vázquez
5
1
IADES Alliance, Artificial Intelligence for Sustainable Development, Antofagasta 1270709, Chile
2
IADES, Sociedad Mercantil, Havana 10400, Cuba
3
Centro de Investigación Desarrollo e Innovación en Inteligencia Artificial, Universidad Central del Este, Punta Cana 23000, Dominican Republic
4
Departamento de Gestión de la Construcción, Facultad de Ciencias de la Ingeniería en la Construcción, Consorcio HEUMA, Universidad Católica del Norte, Antofagasta 1270709, Chile
5
Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador
*
Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(6), 157; https://doi.org/10.3390/make8060157 (registering DOI)
Submission received: 11 April 2026 / Revised: 25 May 2026 / Accepted: 1 June 2026 / Published: 9 June 2026

Abstract

This paper presents a hybrid model that combines linguistic data summarization techniques, algorithms for constructing probabilistic trees, and several generative artificial intelligence models to learn and generate linguistic summaries that support decision-making. The proposal is validated using methodological triangulation, which shows high consistency in the knowledge discovered. Different generative artificial intelligence models are also compared; among the evaluated models, Gemini achieved the best performance. However, in certain contexts and tasks, small language models (SLMs) can be effective, yielding results comparable to those of large language models (LLMs) at a lower computational cost. The algorithms are applied in a case study analyzing oceanographic data from Northern Chile. In the validation scenario, the combination of linguistic data summarization methods with unsupervised learning techniques effectively models human tolerance for imprecision when processing complex data and generates linguistic summaries that human decision-makers can interpret easily and with high confidence. Studies of the energy capacities of the region and their behavior in both winter and summer are presented.

1. Introduction

Decision-making in complex domains is increasingly challenged by growing data volumes and inherent uncertainty. Modern information systems generate vast amounts of numerical and categorical data. However, these representations are often difficult for humans to interpret, leading to “data-rich but information-poor” environments [1]. In this context, methods capable of transforming raw data into interpretable, reliable, and uncertainty-aware knowledge representations are essential.
One scenario where this particular situation arises is in the analysis and decision-making related to ocean behavior. In this context, the analysis of variables that allow for the evaluation of wave or ocean current behavior is particularly interesting. For example, the analysis of the variables ugos_mean (absolute zonal velocity) and vgos_mean (absolute meridional velocity) is directly related to the kinetic energy of ocean currents and can be used to estimate tidal energy potential. Furthermore, the study of ocean and sea behavior also impacts the analysis of effects associated with climate change.
Specifically, in this research scenario, a structured database is available that compiles satellite information from all the oceans and seas of the planet. From the global dataset, only the region corresponding to Northern Chile (approximately 156,400 km²) was selected. With an oceanographic spatial resolution of approximately one sampling point per 27 km², the sub-dataset includes 5793 spatial points. Combining these points with a temporal coverage of 365 days per year, six variables, and an 18-year observation period yields a large-scale dataset of 5793 × 365 × 6 × 18 = 228,360,060 data points per tree, which are analyzed independently. The following research problem then arises:
  • How can we process these volumes of information to aid decision-making in this context?
To answer this question, the authors propose the following objectives:
  • To develop a hybrid method that combines linguistic data summarization techniques, probabilistic tree learning, and the inference capabilities of generative artificial intelligence to facilitate the analysis of large volumes of structured data in support of decision-making.
  • To validate the proposal through its application in a case study related to the analysis of oceanographic data in the Norte Grande region of Chile.
The potential of the selected techniques to address the problem is explained below.
Linguistic data summarization (LDS) [2,3,4,5,6] is a technique within computational intelligence that encompasses algorithms for learning linguistic summaries in natural language to summarize the behavior of large volumes of structured data. Its main contribution lies in generating summaries that are both semantically meaningful and cognitively accessible, facilitating decision-making even for non-expert users [7,8,9,10,11]. This approach has been extensively studied within the fuzzy logic and computational intelligence communities, where protoforms and linguistic quantifiers serve as a central mechanism for representing the generalized behavior of data [12,13,14,15]. For example, applying these techniques, from the analysis of structured data associated with ocean behavior, summaries such as the following could be obtained:
“Approximately half of the records in the database report that 44.93% of the time, the winter periods of 2004 with a very low zonal anomaly (ugosa_mean) have a high meridional component (vgos_mean)”.
The interpretation of this summary by experts means that approximately half of the analyzed data shows that, during the winters of 2004, when the zonal (east–west) current or wind is practically zero, simultaneously, the meridional (north–south) component is high. This condition could negatively affect the introduction of energy generation technologies such as turbines that depend on constant and preferably unidirectional water flows. If the zonal component (main flow direction in many coastal regions) is zero and there is only a high meridional component, the turbines could become misaligned, drastically reducing their efficiency or even stopping their generation.
The probabilistic graphical models in this research provide the capacity for the explicit modeling of dependencies between variables. Their ability to discover pairwise dependencies makes them suitable as an intermediate layer to reduce the search space and guide the construction of candidate summaries. These algorithms have been previously combined with linguistic summarization techniques in earlier work by the author of this research [7]. However, dependency trees produce mathematically sound representations that are not inherently human-interpretable. This limits their direct use in decision-making, thereby motivating the use of Linguistic Data Summarization and Generative Artificial Intelligence (GenAI) to translate these outputs into natural language descriptions, suitable for decision-making.
Finally, in the context of this research, the artifacts and principles of linguistic data summarization (LDS) act as an intermediate, consistent, and human-interpretable knowledge layer that can be integrated into generative models through mechanisms such as retrieval-augmented generation and modular knowledge adapters, thus strengthening decision-making under uncertainty.
The remainder of the article is organized as follows. Section 2 presents an analysis of the use of probabilistic networks in the linguistic summarization of data, situating the proposal within the state of the art and highlighting the limitations of existing approaches. Section 3 describes the proposed algorithmic framework based on the integration of probabilistic trees, linguistic data summarization techniques, and GenAI models. Section 4 presents the experimental evaluation through a case study with real oceanographic data, along with a comparative discussion of the results obtained with different generative models. Finally, Section 5 summarizes the main conclusions and proposes avenues for future research.

2. Materials and Methods

This section is organized as follows. Section 2.1 presents the notation used throughout the research, facilitating the understanding of the algorithms. Section 2.2 presents the proposed method, and Section 2.3 describes the algorithms.

2.1. Concepts and Mathematical Notation Relevant to the Proposed Algorithms

To aid understanding of the proposed algorithms and methods, the mathematical notation used in this research is presented below. These elements are common in publications on linguistic data summarization [7,8,9,10]. A linguistic summary is generally represented by a grammatical structure called a “protoform”, of the form “QRy’s are S”, as shown below:
  • QRy’s are S: Linguistic summaries composed of a quantifier “Q”, a set of filters “R”, and summarizers “S” that describe the behavior of objects “y”. For example:
    Qy’s are S: Unfiltered linguistic summaries in which the statement has the form “Q objects are S”. For example: “Most workers are punctual”, where Q = “Most”, S = “are punctual”, and they characterize “y” = “workers”.
    QRy’s are S: Filtered linguistic summaries in which the statement has the form “Q objects R are S”, where R can be formed by one or more attributes that qualify the object. Example: “Most young workers are unpunctual”. In this second example, similar to the previous one, R = “young” is incorporated, which characterizes or filters the set of analyzed objects.
  • Q: A quantifier is a fuzzy set with the universe of discourse in the interval [0, 1] expressing a quantity, for example, “most”, “60%”, or “more than half”.
  • R: A qualifier or filter, which is another attribute that determines a fuzzy subset of the objects yi, for example, “young” for the attribute “age”.
  • S: A summarizer is an attribute with a linguistic value (fuzzy predicate) defined in the domain of attribute Aj, for example, “low salary” for the attribute “salary”.
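The parts of a protoform can be held in a small record type. The following Python sketch is only an illustration of the “Qy’s are S” and “QRy’s are S” structures described above; the class name, field names, and `render` method are hypothetical and not part of the original proposal.

```python
from dataclasses import dataclass, field


@dataclass
class LinguisticSummary:
    """Hypothetical container for the parts of a protoform."""
    quantifier: str                                   # Q, e.g. "Most"
    objects: str                                      # y, e.g. "workers"
    summarizer: str                                   # S, e.g. "punctual"
    filters: list[str] = field(default_factory=list)  # R, e.g. ["young"]

    def render(self) -> str:
        # With no filters this yields "Q y's are S";
        # with filters it yields "Q R y's are S".
        qualifier = " ".join(self.filters)
        subject = f"{qualifier} {self.objects}".strip()
        return f"{self.quantifier} {subject} are {self.summarizer}"


unfiltered = LinguisticSummary("Most", "workers", "punctual")
filtered = LinguisticSummary("Most", "workers", "unpunctual", ["young"])
print(unfiltered.render())
print(filtered.render())
```

The two printed sentences reproduce the worker examples given in the list above.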
Linguistic summaries learned through the application of linguistic data summarization techniques constitute learning objects that can be consulted through the independent analysis of each of their parts. In this sense, different protoforms are established for the construction of linguistic summaries and the retrieval of information from them.
The structures of linguistic summaries have been conceptualized as protoforms, as shown in Table 1. This table presents a categorization of the protoforms, indicating in the “Protoform” column the grammatical structure of the summary; in the “Known” column, the knowledge available at a given time or about the objects of study; and in the “Doubt” column, the question or information sought.
The following examples illustrate different situations associated with the protoforms in Table 1:
The term SStructure means that the variables that make up the summary are known; while SValue denotes that the value of the summarizer is unknown.
Protoform 0: Corresponds to the structure QRy’s are S. It has the lowest level of abstraction, as it assumes that all the elements that make up the summary are known. What we want to know is their truth value T.
Protoform 1: Has a Qy’s are S structure, the summarizer (S) is known, and we want to discover the quantifier (Q). Protoforms 1 and 2 can generate summaries from SQL statements or their various extensions [10].
Protoform 2: Has a QRy’s are S structure, the summarizer (S) and the filter (R) are known, and we want to discover the quantifier (Q). As with protoforms 0 and 1, these summaries can be obtained through database queries.
Protoform 3: Has a Qy’s are S structure, and the quantifier (Q) and the summary structure are known; the goal is to discover the summarizer (S).
Protoform 4: Has a QRy’s are S structure, and the quantifier (Q) and the summary structure are known; the goal is to discover the summarizer (S) and the filter (R).
Protoform 5: Has a QRy’s are S structure. This is the highest level of abstraction, as no elements are known, and therefore, the goal is to discover everything. These are the most complex summaries and are the main focus of this work.
The following notations will be applied in the process of generating the summaries:
  • U: Database used in this research to represent the oceanographic dataset.
  • Y = {y1, …, yn}: Set of objects (records) used in the investigation.
  • A = {A1, …, Am}: Set of attributes that describe objects. In our case, these constitute the variables of the oceanographic study.
  • ALV: Set of linguistic variables (see Definition 2) that describe the attributes A.
  • D = {[ALV1(y1), ALV2(y1), …, ALV m(y1)], …, [ALV 1(yn), ALV 2(yn), …, ALV m(yn)]}: Dataset resulting from the fuzzification process of universe U.
  • Aj(yi) ∈ D: Denotes the value of the attribute Aj for the object yi, for example, “young” for the attribute “age”. In this research, for example, the value “High” represents the linguistic value of a measurement of one of the variables, such as vgos_mean.
Let Y = {y1, …, yn} be a database of objects (e.g., “oceanographic measurements”) described by a set of m attributes A = {A1, …, Am} (e.g., “vgos_mean”), where each attribute Aj takes values in a domain Xj, for example, {−0.2, …, 0.2}. Then, di = Aj(yi) represents the value of the attribute Aj for the object yi, taken from the domain Xj.
Linguistic summaries are evaluated as follows.
Let the summarizer S = {S1, S2, …, Sm} be composed of several attributes Sj. Then $\mu_S(y_i)$, $i = 1, 2, \ldots, n$, represents the certainty grade of the summarizer and can be defined as $\mu_S(y_i) = \min_{j \in \{1, 2, \ldots, m\}} \left[ \mu_{S_j}(A_j(y_i)) \right]$.
If the summary contains filters, it is defined as:
  • $\mu_S(y_i) = \min_{j \in \{1, 2, \ldots, m\}} \left[ \mu_{S_j}(A_j(y_i)) \wedge \mu_{R_g}(A_g(y_i)) \right]$, where “∧” is a t-norm, and $\mu_{R_g}(A_g(y_i))$ represents the certainty grade of the summary filters.
As part of the notation relevant to understanding the proposal, a set of quality indicators for linguistic summaries is presented [7,9,10,12]; these indicators are useful for the subsequent processing of the generated knowledge.
T1: Degree of Truth: Determines the validity of a summary, a criterion introduced by Yager [8]. Equation (1) shows how to calculate this indicator for unfiltered summaries, and Equation (2) gives the calculation of T1 for summaries with filters R.
\[ T(Q\,y\text{'s are } S) = \mu_Q\left( \frac{1}{n} \sum_{i=1}^{n} \mu_S(y_i) \right) \tag{1} \]
\[ T(Q\,R\,y\text{'s are } S) = \mu_Q(r) \tag{2} \]
\[ \mu_Q\left( \frac{1}{n} \sum_{i=1}^{n} \mu_S(y_i) \right) \tag{3} \]
\[ r = \frac{\sum_{i=1}^{n} \left( \mu_R(y_i) \cap \mu_S(y_i) \right)}{\sum_{i=1}^{n} \mu_R(y_i)} \tag{4} \]
\[ \mu_Q(x) = \begin{cases} 1 & \text{for } x \ge 0.8 \\ 2x - 0.6 & \text{for } 0.3 < x < 0.8 \\ 0 & \text{for } x \le 0.3 \end{cases} \tag{5} \]
where $\mu_R(y_i)$ is the degree of membership of the filter of the summary.
  • $\mu_S(y_i)$ is the membership degree of the summarizer of a summary.
  • ∩ represents a t-norm, which could be the “minimum” operation or the product.
  • n is the number of objects in the database.
  • $\mu_Q(r)$ is the membership function that represents the linguistic quantifier Q [9,10,12].
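As an illustrative sketch of the degree of truth, the Python code below evaluates the unfiltered and filtered cases using the piecewise-linear quantifier given above. The function names are hypothetical, the membership degrees are assumed to be precomputed lists, and the t-norm defaults to the minimum, which is one of the two options named in the text.

```python
def quantifier_eq5(x: float) -> float:
    """Piecewise-linear quantifier: 0 up to 0.3, 1 from 0.8, linear between."""
    if x >= 0.8:
        return 1.0
    if x <= 0.3:
        return 0.0
    return 2.0 * x - 0.6


def truth_unfiltered(mu_s, mu_q=quantifier_eq5):
    """T1 for 'Q y's are S': mu_Q of the mean summarizer membership."""
    return mu_q(sum(mu_s) / len(mu_s))


def truth_filtered(mu_r, mu_s, mu_q=quantifier_eq5, t_norm=min):
    """T1 for 'Q R y's are S': mu_Q(r), with r = sum(R t-norm S) / sum(R)."""
    denominator = sum(mu_r)
    if denominator == 0:
        return 0.0  # assumption: no object satisfies the filter at all
    r = sum(t_norm(a, b) for a, b in zip(mu_r, mu_s)) / denominator
    return mu_q(r)


# Toy membership degrees for four objects (illustrative values only).
mu_s = [1.0, 0.5, 0.5, 0.0]
mu_r = [1.0, 1.0, 0.0, 0.5]
print(truth_unfiltered(mu_s))
print(truth_filtered(mu_r, [1.0, 0.5, 1.0, 0.5]))
```

Passing `t_norm=lambda a, b: a * b` would switch the sketch to the product t-norm.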
T2: Degree of Imprecision: This is an obvious and important validity criterion [9,10,12], since a highly imprecise linguistic summary with a high Degree of Truth is not very useful. For example, the summary “on almost every winter day, the temperature is quite cold” has a very high truth value (T1 tends towards 1); however, it is very imprecise and therefore provides little valuable information. Equations (6) and (7) are used to calculate this indicator. As Equation (7) shows, this indicator depends on the summarizer of the summary and not on the database; that is, the records are not traversed to calculate it, since only the cardinality of the support of each linguistic set of the summarizer and the cardinality of its domain are considered.
\[ T_2 = 1 - \sqrt[m]{\prod_{j=1}^{m} \operatorname{in}(S_j)} \tag{6} \]
\[ \operatorname{in}(S_j) = \frac{\operatorname{card}\{ x \in X_j : \mu_{S_j}(x) > 0 \}}{\operatorname{card}(X_j)} \tag{7} \]
where m is the number of fuzzy sets in the summarizer S = {S1, S2, …, Sj, …, Sm}, and card denotes the cardinality of the corresponding set.
  • $X_j$ is the domain, representing all finite values in the fuzzy set of the summarizer Sj.
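A minimal sketch of the T2 calculation follows, assuming each domain is available as a finite list of values and each summarizer as a membership function; the function and variable names are hypothetical. It reads the indicator as one minus the geometric mean of the fraction of each domain on which the fuzzy set has non-zero membership.

```python
def degree_of_imprecision(domains, memberships):
    """T2 = 1 - (product of in(S_j)) ** (1 / m).

    domains:     list of finite value lists X_j, one per summarizer.
    memberships: list of functions mu_Sj(x), one per summarizer.
    """
    m = len(memberships)
    product = 1.0
    for domain, mu in zip(domains, memberships):
        support = sum(1 for x in domain if mu(x) > 0)  # card{x : mu(x) > 0}
        product *= support / len(domain)               # in(S_j)
    return 1.0 - product ** (1.0 / m)


# Toy domain of ten values and two crisp-edged sets (illustrative only).
domain = list(range(10))
narrow = lambda x: 1.0 if x >= 8 else 0.0   # covers 2 of 10 values
broad = lambda x: 1.0 if x >= 1 else 0.0    # covers 9 of 10 values
print(degree_of_imprecision([domain], [narrow]))
print(degree_of_imprecision([domain], [broad]))
```

Under this formula the narrow set scores higher than the broad one, because its support covers a smaller share of the domain.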
T3: Coverage Degree: Indicates how many of the objects in the database that meet the filter R are covered by the summary whose summarizer is S. Its interpretation is simple; for example, a value of 0.15 means that 15% of the objects are consistent with the summary in question. The value of this indicator clearly depends on the content of the database [9,10,12] and is calculated as shown in Equations (8)–(10).
\[ T_3 = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} h_i} \tag{8} \]
\[ t_i = \begin{cases} 1 & \text{if } \mu_S(y_i) > 0 \text{ and } \mu_R(y_i) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{9} \]
\[ h_i = \begin{cases} 1 & \text{if } \mu_R(y_i) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{10} \]
This indicator is based on the conditional probability of the summarizers appearing given that the filters take place and on the concept of a support set discussed in the fuzzy logic.
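The coverage degree can be sketched in a few lines of Python, assuming the filter and summarizer membership degrees of each object are already available as lists; the function name is hypothetical.

```python
def coverage_degree(mu_r, mu_s):
    """T3: share of objects matching the filter that also match the summarizer."""
    covered = sum(1 for r, s in zip(mu_r, mu_s) if r > 0 and s > 0)  # sum of t_i
    matching_filter = sum(1 for r in mu_r if r > 0)                  # sum of h_i
    if matching_filter == 0:
        return 0.0  # assumption: undefined ratio is reported as zero coverage
    return covered / matching_filter


# Three objects satisfy the filter; two of them also satisfy the summarizer.
print(coverage_degree([0.9, 0.4, 0.0, 0.7], [0.8, 0.0, 0.6, 0.1]))
```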
T4: Degree of Suitability or Appropriateness: This is the most relevant indicator according to [9,10,12]; see Equations (11)–(14). Assume that the summary contains a description (fuzzy set) S = (S1, S2, …, Sm), which is partitioned into m summarizers defined on the attributes A1, A2, …, Am, such that each summarizer corresponds to a fuzzy set; then, it is denoted as
\[ S_j(y_i) = \mu_{S_j}(A_j(y_i)) \tag{11} \]
\[ T_4 = \operatorname{abs}\left( \prod_{j=1}^{m} r_j - T_3 \right) \tag{12} \]
\[ r_j = \frac{\sum_{i=1}^{n} h_i}{n}, \quad j = 1, \ldots, m \tag{13} \]
\[ h_i = \begin{cases} 1 & \text{if } S_j(y_i) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{14} \]
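The following sketch shows one way to compute T4, assuming the memberships S_j(y_i) are stored as one list per summarizer and that T3 has already been computed; the function name and data layout are hypothetical.

```python
def appropriateness(summarizer_memberships, t3):
    """T4 = abs(product of r_j - T3).

    summarizer_memberships[j][i] holds S_j(y_i) for summarizer j and object i.
    """
    n = len(summarizer_memberships[0])
    product = 1.0
    for column in summarizer_memberships:
        # r_j: fraction of objects with non-zero membership in S_j.
        r_j = sum(1 for value in column if value > 0) / n
        product *= r_j
    return abs(product - t3)


# Two summarizers over four objects: r_1 = 0.75 and r_2 = 0.5.
memberships = [[0.5, 0.0, 0.2, 0.9], [0.0, 0.0, 0.3, 0.4]]
print(appropriateness(memberships, t3=0.5))
```

The product of the r_j values is the coverage that would be expected if the summarizers were independent, so T4 measures how far the observed coverage T3 departs from that expectation.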
T5: Length of a Summary: This is an important indicator, because the longer the summary, the harder it is to understand; see Equation (15):
\[ T_5 = 2 \left( 0.5^{\operatorname{card}(S)} \right) \tag{15} \]
T6: Strength of Discovered Dependencies [7,12]: Measures the algorithms’ ability to detect summaries with strong filter–summarizer relationships. This indicator is measured by analyzing the frequency of occurrence of quantifiers that indicate strong dependency relationships, such as “most”, “almost all”, and “many”. The following steps were applied to calculate this indicator:
Step 1:
For each database, the relative frequency of occurrence of each quantifier in the linguistic summaries generated by each algorithm is calculated.
Step 2:
For each database, the weighted sum of the relative frequencies is calculated using the following weights: 0 for the quantifiers “very few”, “few”, and “some”, while the quantifiers “approximately half”, “many”, “most”, and “almost all” were assigned weights of 0.08, 0.1, 0.4, and 0.42, respectively.
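The two steps above can be sketched as follows for a single database. The weights are those stated in Step 2; the function name is hypothetical, and quantifier labels are assumed to match the listed names after lowercasing.

```python
from collections import Counter

# Weights stated in Step 2 of the text.
QUANTIFIER_WEIGHTS = {
    "very few": 0.0,
    "few": 0.0,
    "some": 0.0,
    "approximately half": 0.08,
    "many": 0.1,
    "most": 0.4,
    "almost all": 0.42,
}


def dependency_strength(quantifiers):
    """T6: weighted sum of the relative frequencies of the quantifiers."""
    counts = Counter(q.lower() for q in quantifiers)  # Step 1: frequencies
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no summaries means no discovered dependency
    # Step 2: weight each relative frequency; unknown labels raise KeyError.
    return sum(QUANTIFIER_WEIGHTS[q] * c / total for q, c in counts.items())


# Quantifiers of four generated summaries (illustrative values only).
print(dependency_strength(["Most", "most", "few", "almost all"]))
```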
T7: Integrated Evaluation (CWW): An integrated summary of T1–T6, calculated as an average using computing with words (CWW) via a two-tuple method. It integrates all previous indicators into a single value to globally assess the quality of the summaries or algorithms and is used as the final metric for ranking [7,12]; see Equation (16):
\[ T_7 = \sum_{i=1}^{k} w_i T_i \tag{16} \]
Other notions relevant to the algorithms are defined below.
Definition 1.
Controlled natural language (CNL) is a constructed language based on a specific natural language (e.g., Spanish, English, Arabic, Japanese, etc.). It is more restrictive in terms of vocabulary, syntax, and/or semantics while preserving most of its natural properties, so that speakers of the base language can intuitively and accurately understand texts in the controlled natural language, at least to a substantial degree. It includes a CNLGrammar and a CNLDictionary with simple phrases that describe the variables and attributes of the problem at hand.
Definition 2.
A linguistic variable is defined by a quintuple (x, T(x), X, G, M), where x is the name of the variable, T(x) is the set of linguistic terms, X is the universe of discourse, M is a semantic rule that associates each linguistic value Z with its meaning M(Z) and where M(Z) denotes a fuzzy set in X, and G is the set of syntactic rules for generating compound terms from the atomic terms that make up the sentences that give rise to each linguistic value. An example of linguistic variables (LVs) for the variable “vgos” is shown below.
  • X is the universe of discourse, defined here on the interval [−0.8, 0.8], which is the range covered by the membership functions listed below.
  • T(x) = {Extreme Low, Very Low, Low, Medium, High, Very High, Extreme High}.
  • M(Z) represents the fuzzy sets represented in this proposal by triangular membership functions:
    Extreme Low (−0.8, −0.8, −0.5);
    Very Low (−0.75, −0.5, −0.25);
    Low (−0.5, −0.25, 0);
    Medium (−0.25, 0, 0.25);
    High (0, 0.25, 0.5);
    Very High (0.25, 0.5, 0.75);
    Extreme High (0.5, 0.8, 0.8).
  • G is the syntactic rule that describes the relationship between the fuzzy sets, which can be represented by triangular functions and their overlaps.
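The triangular fuzzy sets listed above can be evaluated with a short function. The sketch below assumes each triple is read as (left foot, peak, right foot) and selects a label by the maximum membership principle; the names `VGOS_TERMS`, `triangular`, `memberships`, and `label_of` are hypothetical.

```python
# Triangular fuzzy sets from the example, read as (left foot, peak, right foot).
VGOS_TERMS = {
    "Extreme Low": (-0.8, -0.8, -0.5),
    "Very Low": (-0.75, -0.5, -0.25),
    "Low": (-0.5, -0.25, 0.0),
    "Medium": (-0.25, 0.0, 0.25),
    "High": (0.0, 0.25, 0.5),
    "Very High": (0.25, 0.5, 0.75),
    "Extreme High": (0.5, 0.8, 0.8),
}


def triangular(x, a, b, c):
    """Membership of x in the triangle (a, b, c); a == b or b == c is allowed."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def memberships(x, terms=VGOS_TERMS):
    """Membership degree of x in every linguistic term."""
    return {label: triangular(x, *abc) for label, abc in terms.items()}


def label_of(x, terms=VGOS_TERMS):
    """Maximum membership principle: the label with the highest degree."""
    degrees = memberships(x, terms)
    best = max(degrees, key=degrees.get)
    if degrees[best] == 0.0:
        raise ValueError(f"{x} lies outside every fuzzy set")
    return best


print(memberships(0.3)["High"], memberships(0.3)["Very High"])
print(label_of(0.3), label_of(0.0), label_of(-0.8))
```

A value of 0.3 belongs partly to “High” and partly to “Very High”, which illustrates the overlap between neighbouring sets mentioned under G.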

2.2. Proposed Method

This section presents a new hybrid framework for linguistic data summarization (LDS), which integrates probabilistic trees and other probabilistic graphical models with GenAI to produce interpretable and uncertainty-aware linguistic summaries.
The framework operates in three main stages:
  • First, a probabilistic model is learned from the input dataset using graph-based learning techniques.
  • Second, candidate linguistic summaries are generated from the structure of the probabilistic trees. Each tree defines a set of relationships where the root node is interpreted as the summarizing attribute and the connected nodes as filtering conditions. This transformation reduces the search space and ensures that only statistically relevant relationships are considered.
  • Third, GenAI models are used to transform candidate summaries into natural language expressions. This process is guided by a controlled natural language (CNL) grammar based on linguistic protoforms, ensuring that the generated summaries are semantically consistent, interpretable, and aligned with human reasoning patterns.
The overall workflow of the method integrates statistical learning and language generation into a unified pipeline, enabling the construction of linguistic summaries that are both mathematically grounded and cognitively meaningful. Additionally, the framework incorporates quality evaluation metrics (T1–T7) to assess the validity, precision, coverage, and interpretability of the generated summaries; these metrics are presented in Section 2.1.
This hybrid approach allows the system to leverage the strengths of probabilistic modeling for knowledge extraction and GenAI for natural language production, resulting in a scalable and robust solution for decision-making under uncertainty.

2.3. Algorithm Description

The proposed algorithm uses probabilistic graph learning techniques to discover relationships between variables [11]. From the graph learning, a set of probabilistic trees is generated that represent the different relationships between the variables (Algorithm 1).
Algorithm 1. Generation of linguistic summaries combining probabilistic models and GenAI
  Input:
    Dataset U = {x1, x2, …, xn}, where each xi is an object (row) described by p attributes, and n = |U| is the total number of objects.
    Controlled natural language definition CNL = (CNLGrammar, CNLDictionary), where CNLGrammar denotes the grammar of the linguistic summaries and CNLDictionary contains simple phrases that describe the variables and attributes of the problem (see Definition 1 in Section 2.1).
    List of indicators T = (T1, T2, …, T7) for the quality assessment of linguistic summaries (see Section 2.1).
  Output:
    Set of linguistic summaries S
    Quality evaluation scores Q(S)
  Begin
Step 1. Candidates = ∅
Step 2. D = build_fuzzy_dataset(U) // Algorithm 2
Step 3. prob_graph = build_probabilistic_graph(D)
Step 4. For each tree prob_graphi in prob_graph do
Step 4.1.   If prob_graphi has more than one vertex then
Step 4.2.     candidatesi = do_candidate_from_branches(prob_graphi)
Step 4.3.     Candidates = Candidates ∪ candidatesi
            End of the conditional statement started in step 4.1
          End of the cycle started in step 4
Step 5. Summaries = agent_generate_summaries(U, Candidates, CNL)
Step 6. Q = calculate_T(U, Summaries, T)
Step 7. return Summaries, Q
  End
In step 2 of Algorithm 1, build_fuzzy_dataset receives the dataset U as input and returns the fuzzified dataset D (see Algorithm 2).
Algorithm 2. build_fuzzy_dataset(U)
  Input:
    U: dataset
  Output:
    D: fuzzified dataset
  Begin
Step 1. U’ = clean_DataSet(U)
Step 2. ALV = ∅
Step 3. For each numeric attribute Ai in U’ do
Step 4.   LVi = discretize_build_linguistic_variables(Ai, U’) // Algorithm 3
Step 5.   ALV = ALV ∪ {LVi} // obtain the linguistic variable for each attribute (Definition 2)
Step 6. D = {[ALV1(y1), ALV2(y1), …, ALVm(y1)], …, [ALV1(yn), ALV2(yn), …, ALVm(yn)]} // dataset obtained from the fuzzification of U’
        // Each numeric value in U’ is transformed into a linguistic label by applying the maximum membership principle, considering its membership in the fuzzy sets of the linguistic variable in ALV that represents the attribute in question.
Step 7. return D
  End
In the first step of Algorithm 2, the preprocessing of dataset U is performed, and the dataset is cleaned to reduce information uncertainty. The output is dataset U’.
In the context of this research, uncertainty refers to the lack of certainty or precise knowledge about a phenomenon, process, or prediction that can affect the decision-making process [9,16]. It can be caused by different factors, including:
  • Data noise: This can be caused by errors in measurement processes, including calibration errors associated with measurement technologies or equipment, distorting the actual signal being analyzed. In this particular study, there is noise present in the satellite data used in the experiment. However, this is acceptable given the proposed objectives associated with the macro-analysis of the area; an excessively high level of precision is not required.
  • Vagueness of concepts: This prevents the definition of clear boundaries between linguistic categories (e.g., “high”, “low”, and “acceptable”). These terms depend on the context and even on the personal preferences of those evaluating them, introducing subjectivity into the interpretation.
  • Data incompleteness: This reflects the absence of relevant information, whether due to omission, lack of measurement, or the inability to record all necessary variables. This leads to conclusions that are always provisional and subject to revision.
  • Inconsistencies: This refers to internal contradictions in the available information, such as when two reliable sources offer opposing data. In real-world settings, these inconsistencies are frequent and add an additional layer of uncertainty.
The data are checked for incompleteness, inconsistencies, and outliers, and records in U affected by any of these three problems are removed. The data used, available in [17], are nevertheless of high quality, and removing the problematic records does not significantly affect the sample given the amount of data available.
In steps 3 to 5 of Algorithm 2, each numeric attribute is discretized, and a fuzzy set is constructed for each resulting interval or cluster. The following strategies can be applied in the proposed approach:
  • Follow a simple attribute discretization process, using the following strategy: constructing intervals of equal size or intervals with equal frequency. The advantages of this approach are its low computational complexity and the fact that it does not require experts. A disadvantage is that the constructed intervals may not accurately represent the natural grouping of the data.
  • Another alternative is to carry out an attribute-level clustering process to construct the intervals. The main advantages of this method are that it does not require experts and that the discovered intervals better represent the natural grouping of the data (see Algorithm 3). In this case, seven clusters are used for each attribute. Seven is the number recommended for constructing valid linguistic structures, because it balances the level of granularity against semantic coherence and comprehension by human experts [7,9,10].
Algorithm 3. discretize_build_linguistic_variables(A, U’)
    // The algorithm applies to numerical data, as in the proposed scenario
  Input:
    A: numeric attribute to discretize.
    U’: cleaned dataset.
    K = 7: number of clusters.
  Output:
    LV: linguistic variable associated with attribute A.
  Begin
Step 1. V = retrieve the values of attribute A from the records in U’
Step 2. Randomly select K points of V as the initial centroids (centers of the clusters C)
Step 3. Calculate the Euclidean distance between each data point and each centroid.
Step 4. Assign each point to the nearest centroid.
Step 5. Recalculate the position of each centroid as the average of all the points assigned to its cluster.
Step 6. Repeat steps 3 to 5 until the centroids no longer change significantly or a maximum number of iterations is reached.
Step 7. C = set of clusters obtained from steps 2 to 6.
Step 8. LV = construct the linguistic variable associated with the attribute; each resulting cluster in C constitutes a fuzzy set (see Definition 2).
Step 9. Assign each fuzzy set a semantic meaning expressed as a linguistic label from the set LBTL = {Extreme Low, Very Low, Low, Medium, High, Very High, Extreme High}. The cluster with the lowest values receives the linguistic label Extreme Low, and so on in increasing order.
Step 10. return LV
  End
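A minimal Python sketch of Algorithm 3 for a single attribute is given below. The clustering follows the k-means steps listed above. How each cluster is turned into a fuzzy set is not specified in the text, so the sketch assumes triangular sets with peaks at the sorted centroids and feet at the neighbouring centroids (or at the data minimum and maximum); this choice, the function names, and the fixed random seed are assumptions.

```python
import random

LABELS = ["Extreme Low", "Very Low", "Low", "Medium",
          "High", "Very High", "Extreme High"]


def kmeans_1d(values, k=7, max_iter=100, tol=1e-9, seed=0):
    """Steps 2 to 6: k-means on one numeric attribute; needs k distinct values."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(values)), k)           # Step 2
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:                                      # Steps 3 and 4
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        new_centroids = [sum(c) / len(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]     # Step 5
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:                                       # Step 6
            break
    return sorted(centroids)


def build_linguistic_variable(values, labels=LABELS):
    """Steps 8 and 9: one labelled triangular set per cluster, low to high."""
    centers = kmeans_1d(values, k=len(labels))
    feet = [min(values)] + centers + [max(values)]
    # Assumed construction: triangle i peaks at centroid i and reaches
    # down to the previous and next centroid (or the data bounds).
    return {label: (feet[i], feet[i + 1], feet[i + 2])
            for i, label in enumerate(labels)}


values = [i / 100 for i in range(-80, 81)]   # synthetic attribute values
for label, abc in build_linguistic_variable(values).items():
    print(label, tuple(round(v, 3) for v in abc))
```

In one dimension the Euclidean distance reduces to the absolute difference, which is what the sketch uses.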
Then, in step 3 of Algorithm 1, the dataset is used to learn the probabilistic model that best approximates the behavior of the data [18,19]. The following presents different alternative algorithms that can be used due to their capabilities for learning probabilistic graphs from data:
  • The Chow–Liu algorithm was initially proposed for constructing dependency trees from data [20,21].
  • The Rebane–Pearl algorithm [22,23] extends the Chow–Liu algorithm and enables the learning of polytrees, allowing for the description of higher-order interactions.
  • The Polytree Approximation Algorithm (PA) [24] bases its learning on the calculation of marginal and conditional Mutual Information.
  • The Learning Polytree Algorithm (LPA) [24] calculates marginal and conditional dependencies using computational methods based on the concept of entropy.
The algorithms in step 3, which learn the probabilistic models, generate a polytree. Figure 1 represents an example of one polytree generated.
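To make the tree-learning step concrete, the sketch below implements the basic Chow–Liu idea on discretized data: pairwise mutual information is estimated from label counts, and a maximum-weight spanning tree is built with Kruskal's procedure. It is an illustration only and not the implementation used in this research; it returns an undirected tree, does not orient edges as the polytree algorithms do, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two label columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def chow_liu_edges(columns):
    """Maximum-weight spanning tree over pairwise mutual information.

    columns: dict mapping an attribute name to its list of discrete labels.
    Returns a list of (attr_a, attr_b, mutual_information) edges.
    """
    names = list(columns)
    weighted = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    parent = {name: name for name in names}   # union-find forest

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    edges = []
    for weight, a, b in weighted:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # the edge does not close a cycle
            parent[root_a] = root_b
            edges.append((a, b, weight))
    return edges


# Toy fuzzified columns (illustrative labels only).
data = {
    "a": ["L", "L", "H", "H", "L", "H", "L", "H"],
    "b": ["L", "L", "H", "H", "L", "H", "L", "H"],
    "c": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
for a, b, w in chow_liu_edges(data):
    print(a, b, round(w, 4))
```

Because columns "a" and "b" are identical, their edge carries the largest mutual information and is selected first.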
Then, in step 4 of Algorithm 1, candidate objects are generated for each tree, such that the leaf node is identified as the summarizing attribute and the nodes on the branches are identified as filters.
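One possible reading of this step is sketched below: every root-to-leaf branch of a tree becomes one candidate, with the leaf as the summarizing attribute and the preceding nodes as filters. The sketch assumes the tree has already been rooted and is given as a mapping from each node to its children; the function name and data layout are hypothetical.

```python
def candidates_from_branches(children, root):
    """Enumerate root-to-leaf branches of a rooted tree.

    children: dict mapping a node to the list of its child nodes.
    Returns one candidate per branch: {"filters": [...], "summarizer": leaf}.
    """
    candidates = []
    stack = [(root, [])]            # (current node, nodes above it)
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            if path:                # a single-vertex tree yields no candidate
                candidates.append({"filters": path, "summarizer": node})
            continue
        for kid in kids:
            stack.append((kid, path + [node]))
    return candidates


# Toy tree over variables named in the text (structure is illustrative only).
tree = {"ugosa_mean": ["ugos_mean", "vgos_mean"]}
for candidate in candidates_from_branches(tree, "ugosa_mean"):
    print(candidate)
```

A tree with a single vertex produces no candidates, which matches the check in step 4.1 of Algorithm 1.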
In step 5 of Algorithm 1, candidate summaries are generated from the list of candidate objects by transforming each object into a summary object whose attributes are the filters and the summarizers. This step involves agents supported by the GenAI algorithm “agent_generate_summaries”, which construct linguistic summaries from the pre-constructed candidate summaries.
The fundamental contribution of generative artificial intelligence in this step is its ability to generate content in natural language, the explainability of the models, and its multilingual approach [25,26,27,28]. Essentially, prompting techniques are applied, and the desired structure of the summaries is precisely specified through a grammar in a natural language. These prompts, specially designed for generating summaries from the submitted candidate summaries, are then executed using LLMs or SLMs to obtain the final summaries. The multilingual processing capabilities of the models facilitate the use of the proposed algorithms by researchers and professionals from different countries and languages.
The following techniques are used to carry out this task:
  • Prompt algorithms are used to exploit the capabilities of different GenAI models. The experimental design of this work includes an analysis of the effectiveness of different models (SLMs and LLMs). The objective is to achieve the execution of the algorithms in both cloud and on-premises or offline environments.
  • Furthermore, a grammar based on a controlled natural language is designed, structured according to the protoforms of linguistic data summaries, explained in Section 2.1 [8,9,10]. To achieve this objective, the following elements are established:
    • Linguistic summaries must comply with the following structures or protoforms: “Qy’s are S” and “QRy’s are S” (see Section 2.1).
    • A controlled natural language (CNL) is constructed for the application context of the proposal. See Definition 1 of the CNL in Section 2.1. In this way, linguistic summaries can be constructed that speakers of the base language can understand intuitively and correctly. From a practical standpoint, a CNL is defined by a grammar that establishes the syntax and a dictionary that describes the lexicon and semantics [1].
Below is one of the English-language templates used as linguistic molds to ensure that the generated summaries follow a consistent, semantically valid format and are aligned with the proposed protoform theory. Similar templates for the other languages are presented in [1].
  • <Linguistic Summary>::= <D> <data descriptor connector> <Q> <quantifier connector> <y’> <filter connector> <R> <summary connector> <S> | <Q> <y’> <filter connector> <R> <summary connector> <S> | <Q> <y’> <summary connector> <S>
  • <D>::= “almost all” | “most” | “many” | “around a half of” | “some” | “few” | “very few”
  • <data descriptor connector>::= “records show, that”
  • <quantifier connector>::= “of times”
  • <filter connector>::= “with”
  • <summary connector>::= “have”
  • <y’>::= <subject>
  • <subject>::= <simple phrase>
  • <Q>::= <linguistic quantifier> | <numeric quantifier> | <mixed quantifier> | <percent quantifier>
  • <linguistic quantifier>::= “almost all” | “most” | “many” | “around a half of” | “some” | “few” | “very few”
  • <numeric quantifier>::= “more than 95% of” | “around 85% of” | “around 75% of” | “approximately 50%” | “close to 33%” | “less than 17%” | “less than 5%”
  • <mixed quantifier>::= “very few (less than 5%)” | “few (around of 15%)” | “some of (close to 33%)” | “around a half of” | “many (close to 65%)” | “most of (around of 83%)” | “almost all of”
  • <percent quantifier>::= <percent connective> <percent numeric value>
  • <percent connective>::= “in”
  • <S>::= <phrase>
  • <R>::= <phrase>
  • <phrase>::= <phrase> <logical operator> <simple phrase> | <simple phrase>
  • <logical operator>::= <conjunction> | <disjunction>
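To make the template concrete, the two shorter alternatives of <Linguistic Summary> can be rendered with a small helper. This is only a sketch: the function name is hypothetical, only linguistic quantifiers are accepted, and the connectors “with” and “have” are taken from the grammar above.

```python
LINGUISTIC_QUANTIFIERS = (
    "almost all", "most", "many", "around a half of", "some", "few", "very few",
)

def render_summary(quantifier, subject, summarizer, qualifier=None):
    """Render "<Q> <y'> with <R> have <S>" or, without a filter, "<Q> <y'> have <S>"."""
    if quantifier not in LINGUISTIC_QUANTIFIERS:
        raise ValueError(f"unknown quantifier: {quantifier!r}")
    parts = [quantifier, subject]
    if qualifier is not None:
        parts += ["with", qualifier]  # <filter connector> <R>
    parts += ["have", summarizer]     # <summary connector> <S>
    return " ".join(parts)

text = render_summary(
    "most", "records", "very high ugos_mean", qualifier="very high ugosa_mean"
)
# text == "most records with very high ugosa_mean have very high ugos_mean"
```

In the proposal itself the wording is produced by the GenAI agent; a deterministic renderer of this kind is useful mainly for checking that a generated summary conforms to the grammar.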
In the experimentation with the AI models, the following hyperparameters were established to help control the models’ hallucination:
  • Temperature = 0.0;
  • top_p = 0.95;
  • top_k = 40 (Gemini);
  • max_output_tokens (Gemini) = 8192;
  • max_completion_tokens (GPT-5 and GPT-5 Mini) = 16,384;
  • response_mime_type (Gemini) = “application/json”;
  • response_format (GPT-5 and GPT-5 Mini) = {“type”: “json_object”};
  • stop_sequences (Gemini) = not used;
  • presence_penalty (GPT-5 and GPT-5 Mini) = 0.0;
  • frequency_penalty (GPT-5 and GPT-5 Mini) = 0.0;
  • stop (GPT-5 and GPT-5 Mini) = null;
  • Seed = 42;
  • logit_bias (GPT-5 and GPT-5 Mini) = not applied;
  • logprobs = true;
  • top_logprobs (GPT-5 and GPT-5 Mini) = 5;
  • reasoning_effort (GPT-5) = “high”;
  • safety_settings (Gemini) = BLOCK_NONE.
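For reference, the settings listed above can be grouped per model family as plain Python dictionaries. This is only an organizational sketch: the dictionary names and the lowercase spelling of `temperature` and `seed` are assumptions, and no provider SDK call is shown, since the text does not describe how the values are passed to each API.

```python
# Settings listed without a model family in the text.
SHARED = {"temperature": 0.0, "top_p": 0.95, "seed": 42, "logprobs": True}

GEMINI = {
    **SHARED,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "application/json",
    "stop_sequences": None,           # listed as "not used"
    "safety_settings": "BLOCK_NONE",
}

GPT5 = {
    **SHARED,
    "max_completion_tokens": 16384,
    "response_format": {"type": "json_object"},
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "stop": None,
    "logit_bias": None,               # listed as "not applied"
    "top_logprobs": 5,
    "reasoning_effort": "high",       # listed for GPT-5 only
}

# GPT-5 Mini shares the GPT-5 settings except reasoning_effort.
GPT5_MINI = {k: v for k, v in GPT5.items() if k != "reasoning_effort"}
```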
Based on the definition of the above elements, an intelligent agent, specialized in constructing linguistic summaries, was built. It was established that this agent would be supported by a GenAI model and the elements declared above.
Next, in step 6, the quality indicators for each linguistic summary are calculated. The indicators are Degree of Truth (T1), Degree of Imprecision (T2), Degree of Coverage (T3), Degree of Suitability or Appropriateness (T4), Length (T5), Strength of Discovered Dependencies (T6), and Integrated Evaluation (CWW) (T7); they are explained in Section 2.1. Essentially, they assess how well the linguistic summary in question covers or represents the objects of dataset D processed from step 2 of the algorithm.
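As an illustration of the first indicator, the degree of truth of a summary of the form “QRy’s are S” is classically computed with Zadeh's calculus of quantified propositions. The sketch below follows that classical formulation, which may differ in detail from the definition given in Section 2.1; the piecewise-linear membership function for “most” and the sample membership degrees are assumptions.

```python
def mu_most(r):
    """A commonly used membership function for the quantifier "most" (assumed)."""
    if r >= 0.8:
        return 1.0
    if r <= 0.3:
        return 0.0
    return 2.0 * r - 0.6

def degree_of_truth(mu_r, mu_s, mu_q=mu_most):
    """T1 for "Q R y's are S": mu_Q( sum(min(mu_R, mu_S)) / sum(mu_R) )."""
    total_r = sum(mu_r)
    if total_r == 0:
        return 0.0  # no object satisfies the filter R
    proportion = sum(min(a, b) for a, b in zip(mu_r, mu_s)) / total_r
    return mu_q(proportion)

# Hypothetical membership degrees of four records in R and in S.
t1 = degree_of_truth([1.0, 0.5, 0.5, 0.0], [1.0, 0.5, 0.0, 1.0])
```

Here the proportion of R-records that are also S is 1.5/2.0 = 0.75, so the truth degree under the assumed “most” is 2 × 0.75 − 0.6 = 0.9.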
The following section presents the validation results of the proposed algorithm. The proposed algorithm is applied to the analysis of kinetic energy along the coasts of Northern Chile.

2.4. Implementation Details and Reproducibility

All experiments were implemented in Python 3.14. The data processing and analysis pipeline was developed using the libraries Xarray, Pandas, and Streamlit, all in versions compatible with Python 3.14.
The construction of probabilistic models was based on the Chow Liu algorithm, applied over oceanographic variables obtained from the Copernicus Marine Service dataset [17] (last accessed on 29 January 2026). The dataset includes Sea Level Anomaly (SLA), Absolute Dynamic Topography (ADT), and geostrophic velocities (absolute and anomalies) derived from multi-mission satellite altimetry using optimal interpolation techniques.
Given the high quality of the Copernicus data, preprocessing was minimal and consisted primarily of data structuring using Xarray. Standardization and normalization procedures were considered and partially applied depending on the variable, although no aggressive cleaning or filtering was required. No pruning strategies were applied to the probabilistic trees.
The generation of linguistic summaries was performed using GenAI models. Small models were deployed locally using OpenWebUI v0.6, while larger models were accessed externally. Default model configurations were used, including temperature and maximum token parameters, as defined by each model and the OpenWebUI framework.
Each experiment was repeated 11 times to ensure the stability and consistency of the results. Statistical analyses were conducted using PSPP, applying non-parametric tests (Friedman and Wilcoxon) with Holm–Bonferroni correction at a significance level of α = 0.05.
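The Holm–Bonferroni step-down procedure mentioned above can be sketched in a few lines. The analyses in this work were run in PSPP; the function below is only an illustration of the correction itself, and the p-values in the example are invented.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all remaining (larger) p-values are not rejected
    return reject

# Three invented pairwise p-values.
decisions = holm_bonferroni([0.001, 0.04, 0.03])
# decisions == [True, False, False]
```

The smallest p-value is compared with α/3, the next with α/2, and the procedure stops at the first comparison that fails.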
All experiments were executed on a local computing environment with the following specifications: 13th Gen Intel® Core™ i5-13420H CPU (2.10 GHz), 16 GB RAM, GPU with 6 GB VRAM, and 477 GB of storage, running a 64-bit operating system.
The probabilistic trees, generated linguistic summaries, and experimental outputs are available and can be shared via Google Drive upon reasonable request to facilitate reproducibility.
Figure 2 illustrates the overall architecture of the proposed framework. This figure represents, in IDEF0 format, all the processes that occur, including their inputs, outputs, and mechanisms. The diagram more clearly illustrates the following elements that contribute to the reproducibility of the proposal:
  • In the linguistic summary generation process, the combined use of generative AI models with controlled natural language grammars is key. This combination is achieved through best practices in prompting.
  • An important element is the evaluation of the quality of the generated linguistic summaries using indicators T1 through T7.
  • Later in the experimentation process, to compare the effectiveness and efficiency of different generative AI models, linguistic summaries are generated for each model and evaluated using the proposed indicators T1 through T7.
  • Finally, the generated summaries are evaluated by external human evaluators following Human-in-the-Loop principles for final decision-making.

3. Results

The work was validated through its application in oceanographic studies with a direct impact on a northern region of Chile. Specifically, it was used in the exploratory study of the region and the analysis of the installation capacity for tidal energy generation technologies. This research is essential for aiding in decision-making about costly investment projects in tidal and wave energy technologies.
The validation of the proposed method was conducted using 23 years of daily records from the European Union’s Copernicus Marine Service [17]. The variables included in the study are:
  • ‘ugos_mean’ (mean zonal absolute geostrophic velocity);
  • ‘vgos_mean’ (mean meridional absolute geostrophic velocity).
Description: Indicates the direction and magnitude of permanent ocean currents influenced by the Earth’s rotation (Coriolis effect) and pressure gradients. They are essential for understanding the transport of mass, heat, and properties on a large scale in the ocean.
  • ‘ugosa_mean’ (mean zonal geostrophic velocity anomaly);
  • ‘vgosa_mean’ (mean meridional geostrophic velocity anomaly).
Description: These represent the deviations of geostrophic velocities from the climatological mean. Their analysis allows us to track the evolution of transient phenomena and understand their role in the redistribution of heat, salinity, nutrients, and organisms.
  • ‘adt_mean’ (mean Absolute Dynamic Topography).
Description: Represents the height of the sea surface with respect to a reference geoid (an equipotential surface of the Earth’s gravity field). It is fundamental to understanding the overall transport of water masses.
  • ‘sla_mean’ (Mean Sea Level Anomaly).
Description: Represents the temporal variation in sea level with respect to a reference mean level (Topographic Dynamic Mean, ‘TDM’).
To validate the proposal, methodological triangulation techniques are applied, and the following experiments are designed:
Experiment 1: Statistical tests comparing the performance of GenAI agents used in step 5 of the proposed algorithm.
Experiment 2: Demonstration of the applicability of the results in a case study, using a single case related to the study of the coasts of the Norte Grande region of Chile.
Experiment 3: Decision-making support, analysis of Geostrophic Kinetic Energy (KKE) based on linguistic summaries and the variables studied.

3.1. Experiment 1: Statistical Tests Comparing the Performance of GenAI Models Used in the Application of the Proposal

In step 5 of Algorithm 1, linguistic summaries are generated by combining different techniques and leveraging the capabilities of Generative AI models.
To identify the most suitable generative AI models, two large models, GPT-5 (LLM) and Gemini Flash 2.5 (LLM), and one small model, GPT-5 Mini (SLM), were selected. The large models were chosen for their proven effectiveness in different scenarios. The small model was chosen because it belongs to the same family as one of the large models, which reduces the impact of external variables, and because a small model improves the usability of the proposed solution in on-premises environments or with limited computational resources.
The experiment consists of taking the candidate summaries generated in step 3 of Algorithm 1 and independently generating the final summaries with each model. The group of summaries generated by each model is then evaluated using indicators T1 through T7, which provides the basis for assessing the effectiveness of the analyzed models.
Finally, the models were compared based on the quality of the summaries, as measured by these indicators.

3.1.1. Validation of the Framework Associated with the Generative AI Models Used

Normality tests and descriptive analyses of variables were performed. The Shapiro–Wilk test was applied to each sample, as shown in Table 2.
The variables do not follow a normal distribution (p ≤ 0.05). Therefore, the Friedman test is applied to these related samples.
The Friedman test results indicate statistically significant differences (χ2 ≈ 55, df = 2, p < 0.001).
This indicates significant differences in the models’ responses. The calculated effect size, Kendall’s W ≈ 0.77, indicates high agreement in the ranking of the models among the observations. Based on the Friedman analysis, post hoc analysis is necessary. Pairwise comparisons are performed using the Wilcoxon test, following the descending order of the means, and applying the Holm–Bonferroni correction (Table 3).
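The relationship between the Friedman statistic and Kendall’s W, W = χ2/(N(k − 1)) for N related observations and k models, can be illustrated with a short sketch. The analyses reported here were run in PSPP; the functions below use the basic Friedman formula without tie correction, and the score matrix is invented.

```python
def midranks(values):
    """Rank values ascending (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def friedman_chi2(blocks):
    """Friedman statistic (no tie correction) for n blocks of k related scores."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

def kendalls_w(chi2, n, k):
    """Kendall's coefficient of concordance derived from the Friedman statistic."""
    return chi2 / (n * (k - 1))

# Invented scores: 4 observations (rows) x 3 models (columns).
scores = [[3, 2, 1], [3, 2, 1], [3, 1, 2], [3, 2, 1]]
chi2 = friedman_chi2(scores)                            # 6.5
w = kendalls_w(chi2, n=len(scores), k=len(scores[0]))   # 0.8125
```

Values of W close to 1 indicate that the observations rank the models in nearly the same order, which is how the effect sizes in this section are interpreted.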
The analysis indicates that Gemini Flash 2.5 is significantly superior to both GPT models. Furthermore, no significant differences were found between GPT5 and GPT5 Mini.

3.1.2. Comparison of the Models with Respect to Variable T2

Normality tests and descriptive analyses of the variables were performed. The Shapiro–Wilk test was applied to each sample, as shown in Table 4.
None of the variables meet the normality assumption (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test is applied to compare the three models with related samples.
Friedman results: χ2 (Friedman) = High (≫critical χ2), p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.85, indicates high agreement in the model rankings among the observations. Since Friedman was significant, pairwise comparisons using the Wilcoxon signed-rank test for related samples are performed, applying the Holm–Bonferroni correction (see Table 5).
The conclusion of the analysis is that T2: Gemini Flash 2.5 is significantly superior to both GPT models. T2: GPT5 also significantly outperforms T2: GPT 5 Mini, although with a smaller effect size.

3.1.3. Comparison of the Models with Respect to Variable T3

Normality tests and descriptive analyses of variables were performed. The Shapiro–Wilk test was applied to each sample, as shown in Table 6.
None of the variables met the assumption of normality (discrete/degenerate distributions). Parametric tests were discarded, and non-parametric tests for related samples were used. The Friedman test was applied to compare the three models with related samples.
Friedman results: χ2 (Friedman) = 48.7, df = 2, and p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.68, indicates high agreement in the model rankings among the observations. Since Friedman was significant, pairwise comparisons were performed using the Wilcoxon signed-rank test for related samples, applying the Holm–Bonferroni correction (see Table 7).
The Wilcoxon post hoc test confirms significant differences between all pairs with large effect sizes. The final performance ranking, from best to worst, is Gemini Flash 2.5 > GPT 5 > GPT 5 Mini.

3.1.4. Comparison of the Models with Respect to Variable T4

Normality tests and descriptive analyses of variables are performed. The Shapiro–Wilk test is applied to each sample, as shown in Table 8.
None of the variables meet the assumption of normality (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test is applied to compare the three models with related samples.
Friedman results: χ2 (Friedman) = 56.4, df = 2, and p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.78, indicates high agreement in the model rankings among the observations. Since Friedman was significant, pairwise comparisons using the Wilcoxon signed-rank test for related samples are performed, applying the Holm–Bonferroni correction (see Table 9).
The Wilcoxon post hoc test confirms significant differences between all pairs with large effect sizes. The final performance ranking, from best to worst, is Gemini Flash 2.5 > GPT 5 > GPT 5 Mini.

3.1.5. Comparison of Models with Respect to Variable T5

Normality tests and descriptive analyses of variables are performed. The Shapiro–Wilk test is applied to each sample, as shown in Table 10.
None of the variables meet the assumption of normality (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test is applied to compare the three models with related samples.
Friedman results: χ2 (Friedman) = 72.0, df = 2, and p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.95, indicates high agreement in the model rankings among the observations. Since Friedman was significant, pairwise comparisons using the Wilcoxon signed-rank test for related samples are performed, applying the Holm–Bonferroni correction (see Table 11).
The conclusion of the analysis is that T5: GPT 5 Mini is significantly superior to both of the other models. There are no significant differences between T5: GPT5 and T5: Gemini Flash 2.5.

3.1.6. Comparison of the Models with Respect to Variable T6

Normality tests and descriptive analysis of variables were performed. The Shapiro–Wilk test was applied to each sample as shown in Table 12.
None of the variables meet the assumption of normality (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test is applied to compare the three models with related samples.
Friedman results: χ2 (Friedman) = 44.2, df = 2, and p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.61, indicates high agreement in the ranking of the models among the observations. It is confirmed that the models do not exhibit an equivalent performance and that the observed differences are not attributable to chance. Since Friedman was significant, pairwise comparisons are performed using the Wilcoxon signed-rank test for related samples, applying the Holm–Bonferroni correction (see Table 13).
The Wilcoxon post hoc test confirms significant differences between all pairs with large effect sizes. The final performance ranking, from best to worst, is Gemini Flash 2.5 > GPT 5 > GPT 5 Mini.

3.1.7. Comparison of Models with Respect to Variable T7

Normality tests and descriptive analyses of variables are performed. The Shapiro–Wilk test is applied to each sample, as shown in Table 14.
None of the variables meet the assumption of normality (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test with related samples is applied.
Friedman results: χ2 (Friedman) = 61.4, df = 2, and p-value < 0.001. This indicates significant differences in the models’ response. The calculated effect size, Kendall’s W ≈ 0.85, indicates high agreement in the model rankings among the observations. It is confirmed that the models do not exhibit an equivalent performance and that the observed differences are not attributable to chance. Since Friedman was significant, pairwise comparisons with the Wilcoxon signed-rank test for related samples are applied using the Holm–Bonferroni correction (see Table 15).
The Wilcoxon post hoc test confirms significant differences between all pairs with large effect sizes. The final performance ranking, from best to worst, is Gemini Flash 2.5 > GPT 5 > GPT 5 Mini.

3.1.8. Overall Comparison of Models

This section analyzes all algorithms based on their overall performance, considering all indicators used in the comparisons in the previous sections. Comparison procedure:
  • Selection of the ranking criteria, which, in our case, are indicators T1, T2, T3, T4, T5, T6, and T7.
  • Assignment of ranking values: each algorithm is given a position based on its performance in each of the previously performed comparison tests, considering the following elements:
  • For each indicator, each algorithm is assigned a ranking value corresponding to its position according to that indicator. The best algorithm is placed first, and the rest are ranked according to their results.
  • If algorithms are ranked the same for the same indicator, they are all assigned the same value corresponding to the midpoint of the positions they would occupy if they had significant differences. For example, if two algorithms are ranked first for a given indicator, each is assigned a value of 1.5.
  • Ordering of the samples based on the sums of the rankings.
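The ranking protocol above can be sketched as follows. The function names and the algorithm labels A, B, and C are hypothetical; each indicator contributes an ordered list of tiers, where the algorithms inside a tier showed no significant differences and therefore share the midpoint of the positions they occupy.

```python
def positions_with_ties(tiers):
    """`tiers` lists groups of algorithms from best to worst; group members are tied."""
    result, next_pos = {}, 1
    for tier in tiers:
        shared = next_pos + (len(tier) - 1) / 2  # midpoint of the occupied positions
        for name in tier:
            result[name] = shared
        next_pos += len(tier)
    return result

def rank_sums(per_indicator_tiers):
    """Sum the positions of each algorithm over all indicators."""
    totals = {}
    for tiers in per_indicator_tiers.values():
        for name, pos in positions_with_ties(tiers).items():
            totals[name] = totals.get(name, 0.0) + pos
    return totals

# Hypothetical outcome for two indicators and three algorithms.
totals = rank_sums({
    "X": [["A", "B"], ["C"]],    # A and B tied for first place: 1.5 each
    "Y": [["A"], ["B"], ["C"]],
})
# totals == {"A": 2.5, "B": 3.5, "C": 6.0}
```

Lower sums correspond to better overall positions, and the resulting sums are the samples that are then ordered and compared.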
Application of the mean comparison test:
The application of the protocol yielded the following results. Normality tests and descriptive analyses of variables were performed. The Shapiro–Wilk test was applied to each sample, as shown in Table 16.
None of the variables meet the assumption of normality (discrete/degenerate distributions). Parametric tests are discarded, and non-parametric tests for related samples are used. The Friedman test with related samples is applied.
Friedman results: χ2 (Friedman) was significant, df = 2, and p-value < 0.001. This indicates significant differences in the models’ responses. The calculated effect size, Kendall’s W ≈ 0.85, indicates high agreement in the model rankings among the observations. It is confirmed that the models do not exhibit an equivalent performance and that the observed differences are not attributable to chance. Since Friedman was significant, pairwise comparisons with the Wilcoxon signed-rank test for related samples are applied using the Holm–Bonferroni correction (see Table 17).
The analysis concludes that Gemini Flash 2.5 is significantly superior to both GPT models. Furthermore, no significant differences were found between GPT5 and GPT5 Mini.
Complementarity and exclusivity in summary generation between models:
In this step, 144 linguistic summaries, generated by the four GenAI models used in both winter and summer, were analyzed. Twelve groups were created, as shown below, where summaries were grouped by similarity. The notation used was the model’s name followed by a number, which was the number of the specific summary:
  • Group 1: Summaries with high semantic similarity (General): This group demonstrates that the four methods agree on the physical core of the problem: the relationship between an absolute variable (e.g., ugos) and its anomaly (ugosa) is almost linear and extremely strong. There is statistical stability where mesoscale events (anomalies) dominate the total flow signal. If the model detects a high zonal flow, it is almost certain (75–89%) that the anomaly will also be high.
    - GPT5 1 to 15 (all describe the high percentage relationship between anomalies and absolute values).
    - Gemini 1, 2, 3, and 4 (ugos/ugosa relationships with percentages >75%) and Gemini v1, v5, v9, v12, and v15 (semantic similarity in the zonal U relationship in summer).
    - GPT5Mini 1 and 2 and GPT5Mini v1 and v2 (similarities in the reporting of high percentage agreement).
  • Group 2: Similarity only between GEMINI and GPT5 (low similarity with the rest):
    - GEMINI 10 to 18 (focus on Mutual Information, MI).
    - GPT5 16, 17, and 18 (temporal evolution and levels of Mutual Information).
    - Gemini v18 and GPT5 v1 (detailed description of summer periods with similar technical language).
  • Group 3: Similarity only between GEMINI and GPT5NANO (low similarity with the rest):
    - GEMINI 10, 11, and 12 (report approximate MI values).
    - GPT5NANO v7, v8, and v9 (report MI values for specific years such as 2010).
  • Group 4: Similarity only between GEMINI and GPT5MINI (low similarity with the rest): GEMINI 1, 2, 3, and 4. High similarity with GPT5Mini 1, 2, 3, and 8.
  • Group 5: Similarity only between GPT5 and GPT5MINI: they focus on seasonal persistence and long-term database behavior. High match with GPT5 16 and 17 and GPT5Mini v10 and v15.
  • Group 6: Similarity only between GPT5 and GPT5NANO. Cross-dependency relationships and mention of relative stability. Summaries with high match: GPT5 18 and GPT5NANO 10 and 15.
  • Group 7: Similarity only between GPT5MINI and GPT5NANO. Both models tend to use abbreviated technical nomenclature (e.g., ugosa_mean-ugos_mean) and year-to-year increment comparisons. GPT5NANO 1 to 18, GPT5Mini v3, v4, and v14.
  • Group 8 (Gemini only): Gemini summaries 5, 6, 7, 8, and 9. These are distinguished by reporting low to moderate percentages (30–45%) and unusual cross-relationships (zonal vs. meridional).
  • Group 9 (GPT5 only): GPT5 summaries 1 to 15 when analyzed as a compact historical block from 2001 to 2018 with an identical structure.
  • Group 10 (GPT5NANO only): GPT5NANO summaries 4, 12, and 16. These focus almost exclusively on the “weakness” of dependencies (MI < 0.07).
  • Group 12 (GPT5MINI only): GPT5Mini v11 and v12 summaries. They are the only ones that mention advanced statistical metrics such as “Lift” and “Support_pct”.

3.1.9. Analysis by Similarity Groups, Different from Group 1

  • Groups 2 to 7: Synergies Between Models:
    - GEMINI and GPT5 (structural): Both prioritize temporal evolution. They do not simply provide data but rather attempt to narrate how the MI (Mutual Information) relationship remains “High” or “Very High” over the years.
    - GPT5NANO and GPT5MINI (Technical–Analytical): These models tend to be more descriptive of weak dependencies. While GEMINI and GPT5 focus on what “does happen”, the NANO and MINI models are more accurate in reporting what “doesn’t happen” (cross-dependencies such as vgos vs. ugosa with MI < 0.05).
  • Groups 8 to 12: Biases and Specializations
    - GEMINI: It is the most sensitive to exceptions. It reports low-probability relationships (30–40%) that others omit, suggesting a more thorough “distribution tail” analysis.
    - GPT5MINI: It is the only one that introduces association rule metrics (Lift, support). This indicates that its internal logic is based on data mining rather than simple descriptive statistics.

3.1.10. Global Consistency Analysis Between the Models of Generative AI (Summer vs. Winter)

  • In both summer and winter, ugos–ugosa and vgos–vgosa relationships remain above 80% confidence across all models. This suggests that the geostrophic dynamics in the Norte Grande region do not undergo a reversal of structural mechanisms between seasons but only changes in intensity.
  • A divergence is observed in the ADT-SLA relationship:
    - In winter, models report a “Medium or High” intensity relationship (MI ≈ 0.37–0.47).
    - In summer, the dependence tends to be reported as “Moderate or Low” by GPT5NANO (MI ≈ 0.13), while GPT5MINI maintains that it is “Frequent” (67%).
    - The relationship between the Absolute Dynamic Height and Sea Level Anomaly is identified as seasonally sensitive; smaller models (NANO/MINI) detect a decoupling in summer that larger models tend to smooth out.
  • Cross-Dependencies (Noise vs. Signal): There is complete consistency in that the cross-variables (e.g., vgos vs. ugosa) have almost no dependence. This physically validates the models: the zonal and meridional components act independently, and none of the four methods “hypothesizes” a non-existent relationship between them.
  • Cross-Reliability: All four methods are highly reliable for identifying strong trends. If GEMINI and GPT5 agree at a rate >80%, the data can be considered a robust physical fact in the database.
  • Sensitivity to Detail: GPT5NANO and GPT5MINI are better at identifying loss of correlation. When it is necessary to know when one variable ceases to be useful for predicting another (especially in summer), these models offer greater granularity.

3.1.11. Analysis of the Internal Variability of Each Model Across Its Repetitions (Runs)

In this analysis, the data from each model in the 11 runs are taken, and the variability is analyzed. The indicators proposed in Section 2.1 (T1 to T7) are used as the unit of measurement.
  • Regarding T1, Gemini Flash 2.5 shows a significantly higher mean (3.69) and a lower standard deviation (0.66), indicating not only better average performance but also high consistency in its responses. In contrast, GPT-5 and GPT-5 Mini exhibit greater dispersion, especially in the standard GPT-5. This greater variability suggests that both GPT models are more sensitive to contextual nuances. Significant differences between Gemini and GPT-5 are demonstrated (r ≈ 0.75). Possible causes of this behavior include better inferential alignment of Gemini for semantic reasoning tasks and lower entropy in its generation.
  • Regarding T2, the most relevant finding is the low standard deviation of GPT-5, indicating deterministic behavior and stability in the evaluation of summaries in this respect. Gemini, on the other hand, achieves a better balance between accuracy and contextual flexibility. GPT-5 Mini exhibits intermediate variability but lower performance.
  • Regarding T3, Gemini maintains low dispersion. However, the most striking finding is the drop in GPT-5 Mini and its large effect size compared to GPT-5 (r ≈ 0.71). This indicator appears to measure a deeper or context-dependent reasoning ability, where reducing the parameters of the Mini model leads to structural degradation. The variability of GPT-5 Mini indicates that its performance is sensitive to variations in the input. A possible cause is the lack of generalization in tasks that require maintaining multiple inference steps.
  • Regarding T4, greater variability is identified between the models. Gemini reflects more stable behavior, while GPT-5 Mini’s mean value decreases. In this indicator, the effect size between Gemini and GPT-5 (r ≈ 0.86) is the highest in the study. This pattern suggests that T4 measures a competence related to structured reasoning or symbolic abstraction, where Gemini’s architectural advantages become more evident. Gemini’s low variability indicates that these capabilities are activated consistently, while GPT-5 Mini shows some functional limitations.
  • Regarding T5, from the perspective of variability analysis, the relevant point is that Gemini and GPT-5 are equivalent in this indicator, with very low dispersion, indicating that both models solve this task similarly.
  • Regarding T6, GPT-5 presents the highest standard deviation, while Gemini maintains its low dispersion profile. Gemini prioritizes more predictable and conservative solutions. This interpretation is consistent with the design philosophy: GPT-5 would seek generative breadth, while Gemini would prioritize precision and stability. The difference in variability is, therefore, an indicator of decoding and alignment strategies.
  • Regarding T7, the dominant pattern is replicated, Gemini > GPT-5 > GPT-5 Mini, with high significance and large effect sizes. The low standard deviation of Gemini (0.58) and the greater dispersion in GPT-5 (0.87) and GPT-5 Mini (0.75) confirm that the behavioral profiles are stable across different types of tasks.
Overall, the analysis suggests that the variability observed between generative AI models may be caused by the following factors:
  • Differences in the contextual sensitivity of the models.
  • Architecture and parametric capacity: the larger models (Gemini, GPT-5) show better performance and, in the case of Gemini, also greater consistency.
  • Alignment with human experience: Gemini prioritizes safe, accurate, and stable responses. GPT-5 tolerates greater generative diversity, possibly to preserve creativity.
  • Variability is not experimental noise, but a rich tracer of profound differences in the design, training, and purpose of each model. Gemini stands out for its stability; GPT-5, for its flexibility; and GPT-5 Mini, for its structural limitations.
Gemini showed the least variability across all indicators and was the most consistent model over the 11 repetitions, producing similar responses in every run. These numerical values agree with the analyses presented in the preceding sections.
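As an illustration of the two quantities used in this variability analysis, the sketch below computes a sample standard deviation over repeated runs and an effect size of the form r = |Z|/√N. It is a minimal sketch under assumptions: the scores are invented placeholders, the function names are hypothetical, and the r = |Z|/√N convention (with N taken as the number of paired observations) is a common choice for Wilcoxon tests that this section does not confirm.

```python
from math import sqrt
from statistics import mean, stdev


def dispersion(scores):
    """Return (mean, sample standard deviation) of one model's repeated scores."""
    return mean(scores), stdev(scores)


def effect_size_r(z, n_pairs):
    """Effect size r = |Z| / sqrt(N) for a paired rank test (assumed convention)."""
    return abs(z) / sqrt(n_pairs)


# Placeholder scores for 11 repetitions (NOT the study's data).
runs = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4]
avg, sd = dispersion(runs)
```

A lower standard deviation over the repetitions corresponds to the "more stable" behavior described above, while r summarizes how strongly two models differ on a given indicator.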

3.2. Experiment 2: Applicability of the Results—Case Study Analysis of the Coasts of the Norte Grande Region of Chile

In this second experiment, using data from the European Union’s Copernicus Marine Service [17], as explained earlier in this section, the proposal is validated based on its applicability and impact on studies of the coastal zone of the Norte Grande region of Chile.
Specifically, the proposed algorithms were applied, and the following results were obtained.

3.2.1. Some of the Linguistic Summaries Obtained Are Listed Below That Express Behavior in Winter (1995–2018)

  • “Most records in 2003 database report that 84.78% of the time, zonal mean geostrophic velocity anomalies (‘ugosa_mean’) with a very high value have a zonal mean absolute geostrophic velocity (‘ugos_mean’) that is also very high”.
  • “Most records in 2017 database report that, 85.51% of the time, zonal mean geostrophic velocity anomalies (‘ugosa_mean’) with a very low value have a zonal mean absolute geostrophic velocity (‘ugos_mean’) that is also very low”.
  • The majority of records in the 2006 summer database report that 85.507% of the time, records with a very high geostrophic velocity anomaly U component (Whirlpools, mesoscale, and events) also have a very high absolute geostrophic velocity U component (ocean currents).
  • The majority of records in the 2012 summer database report that 81.159% of the time, records with a very high geostrophic velocity anomaly component V (Whirlpools, mesoscale, events) also have a very high absolute geostrophic velocity component V (ocean currents).
  • The majority of records in the 2018 summer database report that 71.739% of the time, records with a very high level of absolute geostrophic velocity component V (ocean currents) also have a very high level of geostrophic velocity anomaly component V (Whirlpools, mesoscale, and events).
  • The majority of records in the 2014 summer database report that 82.609% of the time, records with a very low level of absolute geostrophic velocity component U (ocean currents) also have a very low level of geostrophic velocity anomaly component U (Whirlpools, mesoscale, and events).
  • The majority of records in the 1999 summer database report that 81.884% of the time, records with a very high geostrophic velocity anomaly component U (Whirlpools, mesoscale, and events) also have a very high absolute geostrophic velocity component U (ocean currents).
  • The majority of records in the 2000 summer database report that 77.536% of the time, records with a very low geostrophic velocity anomaly component V (Whirlpools, mesoscale, and events) also have a very low absolute geostrophic velocity component V (ocean currents).
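The summaries listed here share a protoform of the type "Most records report that p% of the time, records with a very high A also have a very high B". The following minimal sketch shows one way such a percentage could be obtained and rendered. It assumes an equal-frequency partition into five linguistic labels; the label set, the function names, and the template wording are illustrative assumptions and not the authors' implementation.

```python
from bisect import bisect_right

LABELS = ["very low", "low", "medium", "high", "very high"]  # assumed partition


def quantile_edges(values, k=len(LABELS)):
    """Cut points that split the sorted values into k roughly equal-count bins."""
    s = sorted(values)
    return [s[(i * len(s)) // k] for i in range(1, k)]


def to_label(value, edges):
    """Map a numeric value to its linguistic label."""
    return LABELS[bisect_right(edges, value)]


def co_occurrence_pct(xs, ys, x_label, y_label):
    """Percentage of records labelled x_label on X that are labelled y_label on Y."""
    ex, ey = quantile_edges(xs), quantile_edges(ys)
    selected = [y for x, y in zip(xs, ys) if to_label(x, ex) == x_label]
    if not selected:
        return 0.0
    hits = sum(1 for y in selected if to_label(y, ey) == y_label)
    return 100.0 * hits / len(selected)


def render(year, x_name, y_name, level, pct):
    """Fill a fixed protoform template (wording is illustrative)."""
    return (f"Most records in the {year} database report that {pct:.2f}% of the time, "
            f"records with a {level} {x_name} also have a {level} {y_name}.")
```

For example, `render(2003, "ugosa_mean", "ugos_mean", "very high", co_occurrence_pct(xs, ys, "very high", "very high"))` would produce a sentence in the style of the first summary, given two equally long numeric series `xs` and `ys`.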

3.2.2. Analysis and Interpretation of the Linguistic Summaries and Results Found from Experiment 2

Analysis of summaries showing the dynamics between permanent circulation and mesoscale variability (eddies) in the Humboldt Current System.
The linguistic summaries found showed that the relationship between absolute velocities (ugos, vgos) and their anomalies (ugosa, vgosa) is extremely strong and positive, indicating that in the Norte Grande of Chile, mesoscale variability dominates the total flow signal.
  • Zonal Component (U): Exhibits the strongest dependence. With a mean MI of 0.71 and occurrence rates frequently exceeding 84% (e.g., 2013, 2011), it is observed that when there are intense east/west anomalies, the overall current follows that direction almost linearly.
  • Meridional Component (V): Although still strong (mean MI of 0.63), it is slightly weaker than the zonal component. The coincidence rates range from 79% to 81%.
  • Physical Difference: The stronger correlation in U suggests that zonal jets and filaments, common in this area due to upwelling, are the main drivers of absolute variability, while the meridional flow (V) may be more influenced by the larger-scale structure of the Chile–Peru Current.
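The MI values quoted for the zonal and meridional components measure how much knowing one variable's linguistic label reduces uncertainty about the other's. A minimal sketch of the computation on discretized labels follows. The function name is hypothetical, and the result is expressed in bits; the units and any normalization used in the study are not stated in this section, so the absolute values are not directly comparable.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long label sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Two identical label sequences give the maximum value (the entropy of the labels), whereas statistically independent sequences give zero.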
Time Series Analysis (1995–2018)
Looking at the decades of the 90s, 00s, and 10s, the following behaviors are identified:
  • Velocity Stability: The ugos–ugosa and vgos–vgosa relationships remain remarkably stable. There is no degradation of the correlation over the decades, implying that the mesoscale regime (eddies) has been the persistent driver of kinetic energy in the region.
  • Sea Level Sensitivity: The adt_mean–sla_mean relationship fluctuates the most. With an MI of 0.36, it shows that the overall dynamic level is not always dictated by local anomalies.
  • Trend: A slight decrease is observed in recent years, with marked peaks in 2001 and 2005. This suggests that, during certain periods, large-scale factors (such as equatorial Kelvin waves or El Niño/La Niña events) decouple the local anomaly from the absolute mean sea level.
Seasonal Comparison: Winter vs. Summer
In summer, the variability of the percentages (especially in component V, with minimums of 71%) suggests a more chaotic or energetic dynamic, possibly linked to a more intense coastal upwelling that generates short-lived eddies. In winter, the forcing mechanisms appear to be more uniform, maintaining more consistent correlations. See Table 18.

3.3. Experiment 3: Support for Decision-Making Based on Linguistic Summaries, Analysis of Kinetic and Potential Energy, and Electrical Generation

Based on the analysis of the linguistic summaries and the variables studied, support for decision-making in the analysis of kinetic energy and energy potential by zone is demonstrated.
Geostrophic Kinetic Energy is defined as shown in Equation (17):
\[ \mathrm{EKE} = \frac{1}{2}\left( \mathrm{ugosa}^{2} + \mathrm{vgosa}^{2} \right) \qquad (17) \]
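Equation (17) can be evaluated record by record from the two anomaly components. The sketch below is a minimal illustration; the function name is hypothetical, and it assumes velocities in m/s, so that the result is an energy per unit mass in m²/s².

```python
def eddy_kinetic_energy(ugosa, vgosa):
    """EKE = 0.5 * (ugosa^2 + vgosa^2) for each pair of anomaly values."""
    return [0.5 * (u * u + v * v) for u, v in zip(ugosa, vgosa)]


# Illustrative values only (not taken from the study's dataset).
eke = eddy_kinetic_energy([0.3, -0.1], [0.4, 0.2])
```

Because both components are squared, the sign (direction) of the anomaly does not matter; only its magnitude contributes to the energy.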
Linguistic summaries confirm that anomalies (ugosa, vgosa) are the main constituents of absolute velocity. Given that reports indicate that “Very High” anomaly values coincide with “Very High” absolute velocity values in more than 80% of cases, we can infer that:
  • Mesoscale kinetic energy (EKE) is the dominant component of total kinetic energy in the Norte Grande region.
  • Periods identified with high percentages of anomaly occurrence (such as the winter of 2015 with 86.96% for V) correspond to phases of high eddy activity and, therefore, peak EKE.
  • The region is characterized by a mesoscale dominance over the average flow. While currents are stable in their long-term correlation structure, sea level (adt/sla) shows greater vulnerability to interannual forcing.
Based on the analysis of geostrophic variables and Mesoscale Kinetic Energy (EKE), we can evaluate the potential for marine (hydrokinetic) energy extraction in Northern Chile.
The Mutual Information (MI) analysis and linguistic summaries reveal two distinct scenarios for energy generation (see Table 19):
  • Winter: The “Baseline Supply”. With a very stable and high correlation between ugos and ugosa (MI ~0.71), winter offers more predictable energy potential. The consistency of the records (84–86% agreement at high levels) suggests that the generation infrastructure would have a more constant load factor, with fewer fatigue events due to unforeseen extreme turbulence.
  • Summer: The “Peak Scenario”. Although the occurrence percentages are slightly lower and more variable (71–81%), summer in the far north is usually associated with a higher energy density due to the intensification of coastal jets caused by upwelling. However, the greater dispersion in the data indicates a more intermittent and difficult-to-forecast resource.
Mesoscale Energy (EKE) Dominance: Given that EKE is the main driver (confirmed by the very high correlation between anomalies and absolute velocities), energy generation in this zone does not depend on a constant, unidirectional “river current” but on the activity of eddies and filaments.
Strategic Location: The potential is not uniform. Energy is concentrated at the edges of eddies emanating from key geographic points (such as the Mejillones Peninsula).
Directionality: The greater strength of the zonal component (U) suggests that generating devices must be capable of capturing flows along the east–west axis, and not just the Humboldt Current flowing northward.
El Niño Years: There is a significant increase in generation potential due to the arrival of Kelvin waves and the rise in the dynamic level (adt). The 86.96% coincidence of high anomalies (2015) suggests that, during these events, the availability of kinetic energy increases dramatically.
Northern Chile possesses significant and persistent kinetic energy potential, but harnessing it requires technology designed for mesoscale (changing) flows rather than constant currents. Winter is the ideal period for stable generation, while summer and El Niño years offer the greatest energy surpluses, although with greater technical challenges for the electrical grid.
Based on the oceanographic dynamics of Northern Chile, Geostrophic Kinetic Energy is not distributed uniformly but rather concentrates at topographic rupture points. These are the “hot spots” where the Humboldt Current deviates from the coast, generating meanders and releasing mesoscale eddies that travel toward the open ocean (See Table 20).
Mejillones Peninsula (Antofagasta): This is undoubtedly the point of greatest dynamic interest in the far north of Chile.
  • Mechanics: The peninsula acts as a massive physical barrier to northward flow. Upon passing it, the current experiences boundary layer separation, generating a zone of high Mesoscale Kinetic Energy (EKE) immediately north and west of the peninsula.
  • Potential: This is an area of frequent cyclonic and anticyclonic eddies. The peninsula’s wake concentrates very powerful zonal (east–west) flows that coincide with the high MI values mentioned in the data.
Iquique Zone (20° S–21° S): This area is recognized as a permanent upwelling center and a filament formation node.
  • Mechanics: The interaction between persistent trade winds and the continental slope at this latitude favors the formation of coastal jets.
  • Potential: Here, kinetic energy manifests as filaments of cold water extending hundreds of kilometers offshore. These filaments are channels of high geostrophic velocity where the ugosa and vgosa anomalies are typically at their maximum.
Arica Zone and the Arica Elbow (18° S): Near the border with Peru, the coast abruptly changes direction, forming a large curve.
  • Mechanics: This change in coastal geometry causes natural baroclinic instability in the currents. It is a zone of “retention” and gyre where kinetic energy tends to stagnate in large and slow but massive eddies.
  • Potential: Although the speeds may be less explosive than in Mejillones, the volume of water in motion (geostrophic transport) is very high, offering a large-scale energy resource.
Canyons and Seamounts (Offshore Taltal/Antofagasta): Here, the topography of the seabed, such as the Peru–Chile Trench system and adjacent seamounts, also acts as a “factory” for eddies.
  • Mechanics: When deep currents collide with these structures, disturbances are generated that propagate to the surface, raising the level of Geostrophic Kinetic Energy (adt).
The results obtained in Experiment 3 demonstrate the availability of marine kinetic energy in the studied area. The generated linguistic summaries provide a knowledge framework capable of supporting both structured and unstructured decision-making processes under conditions of uncertainty.
The dependence of energy generation capacity on climate variability was also identified. For example, the identification of winter periods with highly stable correlations between velocity anomalies and absolute geostrophic velocities allows for predictable operational planning. Under these conditions, decision-makers can establish predefined maintenance programs, optimize energy dispatch forecasting, estimate expected turbine load factors, and reduce operational risks through standardized procedures.
In this scenario, the linguistic summaries function as interpretable indicators that simplify the translation of complex oceanographic signals into useful operational knowledge.
The proposed methodology acts as an AI-assisted decision support system that transforms heterogeneous oceanographic information into interpretable evidence. Instead of replacing expert judgment, linguistic summaries provide cognitively accessible representations of uncertainty.
From a practical perspective, the methodology allows decision-makers to move from raw observations to decision alternatives. The decision support process can be conceptualized as a sequential reasoning chain, as shown below:
Step 1: Oceanographic observations.
Step 2: Knowledge extraction (the proposal of this article).
Step 3: Human-in-the-Loop: Participation of human experts in pattern identification and interpretation of generated linguistic summaries. Supports decision-making.
Step 4: Decision application: Human-in-the-Loop. What types of decisions are possible?
Following this logic, the knowledge generated can support different categories of strategic decisions:
  • Infrastructure decisions: Identifying points along the coast with the best conditions for energy investments, for example, the Mejillones Peninsula.
  • Operational decisions: Identifying periods with higher generation capacity and reinforcing installed generation capacity.
  • Technological decisions: Selecting multidirectional turbines, adaptive control systems, or flexible anchoring mechanisms based on the observed directional and seasonal characteristics of Mesoscale Kinetic Energy.
It is important to emphasize that the methodology does not eliminate uncertainty but rather makes it interpretable and manageable. This characteristic is essential in volatile, uncertain, complex, and ambiguous environments, where decisions are increasingly less structured and depend on incomplete information. By integrating quantitative indicators with linguistic representations, the framework allows experts to combine analytical evidence with contextual knowledge, intuition, and strategic judgment.

4. Discussion

The results obtained in this study confirm that the integration of probabilistic graphical models with linguistic data summarization and GenAI constitutes an effective approach for knowledge extraction under uncertainty. In contrast to traditional LDS approaches, which are predominantly based on fuzzy logic, the proposed method introduces a probabilistic structure that enables the explicit modeling of dependencies between variables, thereby improving the robustness of the generated summaries.
Compared with previous work in linguistic data summarization, the proposed framework addresses two key limitations: (i) the combinatorial explosion associated with exhaustive search strategies, and (ii) the lack of explicit uncertainty management. By using probabilistic trees as an intermediate representation, the search space is significantly reduced while preserving the most informative relationships in the data.
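To make the search-space reduction concrete, the sketch below builds a maximum-weight spanning tree over pairwise dependence scores, in the spirit of the classical Chow–Liu procedure. Whether this matches the authors' tree-construction algorithm is an assumption. The three scores marked as reported are the MI values quoted in Section 3.2.2; the remaining scores and the function name are hypothetical placeholders used only to complete the example.

```python
def maximum_spanning_tree(variables, scores):
    """Kruskal's algorithm: keep the strongest edges that do not close a cycle."""
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (a, b), w in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


variables = ["ugos", "ugosa", "vgos", "vgosa", "adt", "sla"]
scores = {
    ("ugos", "ugosa"): 0.71,   # mean MI reported in Section 3.2.2
    ("vgos", "vgosa"): 0.63,   # mean MI reported in Section 3.2.2
    ("adt", "sla"): 0.36,      # MI reported in Section 3.2.2
    ("ugos", "vgos"): 0.10,    # hypothetical
    ("ugosa", "vgosa"): 0.08,  # hypothetical
    ("adt", "ugos"): 0.05,     # hypothetical
    ("sla", "ugosa"): 0.04,    # hypothetical
}
tree = maximum_spanning_tree(variables, scores)
```

With six variables there are 15 possible pairs, but the tree retains only five edges, so candidate summaries would be generated from those five relationships rather than from every combination.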

4.1. Comparison with Fuzzy-Based Linguistic Data Summarization Approaches

An important aspect of the proposed framework is the replacement of traditional fuzzy-based mechanisms with probabilistic graphical models for the identification of relationships between variables. While fuzzy logic has been the dominant paradigm in linguistic data summarization, the results obtained in this study highlight several advantages of probabilistic modeling.
First, probabilistic models explicitly capture statistical dependencies between variables through well-defined measures such as Mutual Information. This allows for the identification of relationships that are not only linguistically meaningful but also statistically significant. In contrast, fuzzy approaches typically rely on predefined membership functions and linguistic partitions, which may introduce subjectivity and limit the ability to detect data-driven relationships.
Second, probabilistic trees provide a structured representation of variable dependencies, ensuring that candidate summaries are generated from statistically relevant combinations. This reduces the risk of producing spurious or trivial summaries, a known limitation of exhaustive or heuristic search strategies commonly used in fuzzy-based LDS methods.
Third, probabilistic modeling offers a natural mechanism for handling uncertainty based on probability theory. Unlike fuzzy logic, where uncertainty is modeled through degrees of membership, probabilistic approaches quantify uncertainty in terms of likelihood and statistical confidence, which are more directly aligned with data distribution and inferential analysis.
Fourth, the integration of probabilistic structures with generative artificial intelligence enables a clear separation between knowledge extraction and linguistic generation. The probabilistic model ensures that the extracted knowledge is valid and consistent, while the generative model focuses on transforming this knowledge into interpretable language. This modularity is not typically present in traditional fuzzy approaches, where both aspects are often intertwined.
Finally, the experimental results support these theoretical advantages. The high values obtained in indicators such as T4 (Degree of Suitability) and T6 (strength of dependencies) demonstrate that probabilistic models are more effective in capturing meaningful and non-random relationships. This leads to linguistic summaries that are both more reliable and more informative for decision-making.
These findings suggest that probabilistic approaches provide a more robust and scalable foundation for linguistic data summarization, particularly in high-dimensional and data-intensive environments.

4.2. Performance of Generative Models in Linguistic Summarization

The experimental results demonstrate that large language models, particularly Gemini Flash 2.5, achieve superior performance in terms of summary quality. However, the findings also reveal that smaller models can produce competitive results in specific metrics, especially in terms of conciseness and computational efficiency. This suggests that model selection should be context-dependent, particularly in resource-constrained environments.
From an application perspective, the case study in oceanographic analysis highlights the practical value of the approach. The generated summaries successfully capture complex physical relationships, such as the strong dependency between geostrophic velocities and their anomalies, enabling interpretable insights for decision-making in energy systems.
Despite these contributions, the study presents several limitations. First, the computational cost associated with probabilistic model learning and GenAI integration may limit real-time applicability. Second, the approach depends on the quality and representativeness of the input data. Third, the use of generative models introduces variability that, although mitigated through statistical validation, cannot be entirely eliminated.
Future research should focus on optimizing computational efficiency, exploring real-time implementations, and evaluating the approach in additional domains. Furthermore, the integration of explainable AI techniques could enhance transparency in the generation of linguistic summaries.

4.3. Main Contributions

This work presents a novel hybrid framework for linguistic data summarization that integrates probabilistic graphical modeling with generative artificial intelligence, enabling robust and interpretable knowledge extraction under uncertainty. The main contributions of this study can be summarized as follows:
  • A unified probabilistic–generative framework for linguistic data summarization.
    We propose a structured pipeline that combines probabilistic tree learning with controlled natural language generation, bridging the gap between statistically grounded knowledge extraction and human-interpretable linguistic representation.
  • A probabilistic approach to reducing the combinatorial search space in LDS.
    By leveraging probabilistic trees, the framework systematically identifies the most informative variable dependencies, significantly reducing the search space of candidate summaries while preserving statistically meaningful relationships.
  • Integration of GenAI with controlled linguistic protoforms.
    Unlike black-box text generation approaches, the proposed method constrains generative models with controlled natural language (CNL) grammars, ensuring semantic consistency, interpretability, and alignment with established linguistic summarization theory.
  • A comprehensive evaluation framework based on multi-criteria quality indicators (T1–T7).
    The study introduces a rigorous evaluation scheme combining semantic, statistical, and structural metrics, along with non-parametric statistical validation (Friedman and Wilcoxon tests with Holm correction), providing a robust basis for model comparison.
  • A systematic comparative analysis of large and small language models in LDS tasks.
    The work provides empirical evidence that, while large language models achieve superior overall performance, small language models can offer competitive results in specific dimensions such as conciseness and efficiency, highlighting a trade-off relevant for practical deployments.
  • Validation through a large-scale real-world case study in oceanographic decision-making.
    The proposed framework is applied to a high-dimensional, real-world dataset (over 200 million data points per tree), demonstrating its capability to extract meaningful patterns and support decision-making in complex and uncertain environments.
  • Demonstration of decision-support capabilities in renewable energy contexts.
    The generated linguistic summaries enable the interpretation of geophysical dynamics and provide actionable insights for hydrokinetic energy assessment, illustrating the applicability of the approach in energy planning scenarios.
Overall, this work advances the state of the art in linguistic data summarization by introducing a probabilistically grounded and generatively enhanced methodology that improves scalability, interpretability, and practical relevance in data-intensive domains.

4.4. Limitations and Future Research Directions

Despite the promising results obtained, several limitations of the proposed framework must be acknowledged.
First, the computational cost associated with probabilistic structure learning and the integration of generative models remains significant. Although the approach is suitable for the offline analysis of large datasets, its applicability to real-time or streaming environments is still limited.
Second, the framework assumes high-quality and preprocessed input data. In real-world scenarios, data may contain noise, missing values, or inconsistencies, which could affect the stability of the probabilistic models and, consequently, the quality of the generated linguistic summaries.
Third, while the use of GenAI enhances interpretability, it also introduces variability in the generated outputs. Although this variability was mitigated through statistical validation and repeated experiments, full determinism cannot be guaranteed, which may be critical in high-stakes decision-making contexts.
Finally, the validation of the approach was conducted within a specific domain (oceanographic data). Although the results are encouraging, further evaluation across different domains is required to assess the generalizability of the method.
From this research, several lines of future work are identified, among which the following stand out:
  • Reducing computational complexity to enable near real-time applications, and integrating robust data preprocessing and uncertainty quantification mechanisms.
  • Incorporating explainable AI (XAI) techniques to further enhance transparency and user trust in the generated summaries.
  • Performing cross-domain validation and developing domain-adaptive linguistic grammars to extend the applicability of the proposed framework.
  • Comparing the proposal with other algorithms and with results from the authors’ previous research, which focused solely on generating summaries from probabilistic and adaptive models [1,7].

5. Conclusions

This work demonstrates the power of hybridizing linguistic data summarization techniques, algorithms for constructing probabilistic trees, and different models of GenAI. This combination of techniques allowed for the processing of large volumes of data and the generation of linguistic summaries that facilitate data comprehension.
An important element is the treatment of information uncertainty and the triangulation of methods, which yields high consistency in the discovered knowledge. The generation of linguistic summaries gives the proposed algorithms a high capacity for modeling human tolerance for imprecision in the decision-making scenarios presented in the case studies. The proposed algorithms allowed for the processing of numerical, ordinal, and categorical variables. By employing a controlled natural language (CNL) integrated with the GenAI agents, it was possible to “humanize” technical data, transforming it into easily interpretable linguistic protoforms to aid in decision-making.
However, the results identify computational cost as a limitation of the proposal, restricting the application of the algorithms to asynchronous use on historical data rather than real-time processing. Future research should address the efficiency of the proposed model and its extension to real-time use.
The algorithms were validated in three experiments. The first experiment allowed for a comparison of the effectiveness of different GenAI models in generating linguistic summaries.
  • The main conclusion of Experiment 1 is that the results obtained by Gemini were significantly superior to the other two GenAI models used.
  • The GPT-5 LLM performed significantly better than the GPT-5 Mini on specific indicators of accuracy and structure, such as T3 and T4. However, in the overall analysis of the validation scenario, no significant differences were found between these two models.
  • The GPT-5 Mini model, by its very nature, generated significantly shorter or more concise summaries than the Gemini and GPT-5 models.
  • There are specific dimensions of language processing where small language models (SLMs) can be more effective or accurate than their LLM counterparts.
Finally, the validity of these results is supported by high agreement and statistical significance. Friedman tests and Kendall’s W coefficient (with values up to 0.95) were used. Non-parametric tests and the Holm–Bonferroni correction reinforce the integrity of the model comparison study.
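For readers who wish to reproduce this type of validation, the sketch below shows how Kendall's W and Holm-adjusted p-values can be computed. It is a minimal sketch under assumptions: the function names are hypothetical, the input scores are placeholders, and W is computed without a tie correction, which may differ from the exact procedure used in the study.

```python
def rank_row(row):
    """Average ranks within one block (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def kendalls_w(scores):
    """Kendall's W for n blocks (rows) and k treatments (columns), no tie correction."""
    n, k = len(scores), len(scores[0])
    ranks = [rank_row(row) for row in scores]
    totals = [sum(r[j] for r in ranks) for j in range(k)]
    mean_total = n * (k + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (n * n * (k ** 3 - k))


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for pos, idx in enumerate(order):
        running = max(running, min(1.0, (m - pos) * pvalues[idx]))
        adjusted[idx] = running
    return adjusted
```

Here each row of `scores` would hold the values of one indicator for the three models in a single repetition. W close to 1 means the repetitions rank the models almost identically, and the Friedman statistic is related to it by χ² = n(k − 1)W.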
The decision-support capacity of the proposed algorithms is validated through their application in the study of oceanographic data, with measurements taken from satellites where uncertainty and imprecision are present.
The validation of the proposal through methodological triangulation techniques demonstrates that linguistic summaries are highly effective decision-making tools, successfully transforming 23 years of complex geostrophic variable data into readable information with high levels of confidence (frequently exceeding 80%).
The region exhibits a current structure where the anomaly (mesoscale/eddies) is the main driver of absolute velocity. Seasonality primarily affects the sea level (adt/sla) but does not alter the strong dependence between velocity components and their respective anomalies.
A particularly high dependence on the zonal component (U) is noted, suggesting that coast–ocean exchange processes and coastal jets are the main energy vectors in the region.
Regarding temporal and climatic variability, the study concludes that the system shows marked sensitivity to large-scale events such as El Niño (ENSO). During these periods, as observed in 1997–1998 and 2015–2016, sea level and geostrophic velocity anomalies become the main drivers of the system, displacing the influence of permanent currents and increasing Geostrophic Kinetic Energy (EKE).
While winter offers a more stable and consistent scenario for oceanographic analysis, the summer season and years of climate transition present a more energetic but chaotic dynamic, characterized by greater data dispersion and an immediate response to remote forcings such as equatorial Kelvin waves.
Finally, the analysis of marine energy extraction potential reveals that the Norte Grande region possesses a significant hydrokinetic resource, although its exploitation requires technologies adapted to changing mesoscale flows rather than constant currents. Strategic “hot spots” are identified, such as the Mejillones Peninsula, Iquique, and the Arica Elbow, where the interaction of the current with the coastal topography maximizes energy density.
The study reveals that winter is the ideal period for stable electricity generation due to its high predictability, whereas energy peaks occur in summer.

Author Contributions

Conceptualization, I.P.P., L.S.A.A., and P.Y.P.P.; Methodology, I.P.P., L.S.A.A., and P.Y.P.P.; Software, I.P.P., P.Y.P.P., and R.Y.H.; Validation, I.P.P., P.Y.P.P., and M.Y.L.V.; Formal Analysis, I.P.P. and P.Y.P.P.; Investigation, I.P.P., L.S.A.A., and P.Y.P.P.; Resources, I.P.P., L.S.A.A., and R.Y.H.; Data Curation, I.P.P. and P.Y.P.P.; Writing—Original Draft Preparation, I.P.P.; Writing—Review and Editing, L.S.A.A., P.Y.P.P., and R.Y.H.; Visualization, I.P.P. and M.Y.L.V.; Supervision, R.Y.H.; Project Administration, L.S.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to acknowledge the Copernicus Marine Service for providing open access, high-quality oceanographic data essential for this study. We also thank the P.P. Shirshov Institute of Oceanology (IO RAS) for its contribution to oceanographic research and scientific knowledge that supports studies of this nature. During the development of this research, GenAI tools, including GPT-5, GPT-5 Mini, and Gemini Flash 2.5, were used for the generation and evaluation of linguistic summaries. Small language models were deployed locally using OpenWebUI v0.6. All outputs generated by these tools were carefully reviewed, validated, and interpreted by the authors. The authors take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pérez Pupo, I.; Piñero Pérez, P.Y.; Bello Pérez, R.E.; García Vacacela, R.; Villavicencio Bermúdez, N. Linguistic Data Summarization: A Systematic Review. In Artificial Intelligence in Project Management and Making Decisions; Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J., Eds.; Springer: Cham, Switzerland, 2022; pp. 3–21.
  2. Phong, P.D.; Lan, P.T.; Thanh, T.X. Linguistic Summarization and Outlier Detection of Blended Learning Data. Appl. Sci. 2025, 15, 6644.
  3. Tran, X.T.; Pham, D.P.; Pham, T.L. A Novel Linguistic Summarization of Time Series Data Based on Enlarged Hedge Algebra Formalism and Genetic Algorithm. Indones. J. Electr. Eng. Inform. 2026, 14, 281–292.
  4. Veens, M.M.A. The Use of Linguistic Summarization in a Clinical Protocol Improvement Context: A Case Study on a ICU Glucose Control Protocol. Master’s Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, January 2021.
  5. Alvey, B.; Anderson, D.; Keller, J. Minimizing Protoform Redundancy to Enhance Linguistic Summaries in Object Detection. In Proceedings of the 2025 IEEE International Conference on Fuzzy Systems (FUZZ), Reims, France, 6–9 July 2025; pp. 1–8.
  6. Ahmed, M. Data Summarization: A Survey. Knowl. Inf. Syst. 2019, 58, 249–273.
  7. Pérez Pupo, I. Algoritmos Para la Sumarización Lingüística de Datos, Aplicaciones en la Toma de Decisiones en la Gestión de Proyectos. Doctoral dissertation, Universidad de las Ciencias Informáticas, Havana, Cuba, 2021.
  8. Yager, R.R. A new approach to the summarization of data. Inf. Sci. 1982, 28, 69–86.
  9. Kacprzyk, J.; Yager, R.R. Linguistic Summaries of Data Using Fuzzy Logic. Int. J. Gen. Syst. 2001, 30, 133–154.
  10. Kacprzyk, J.; Zadrożny, S. Prioritized Preference Aggregation for Non-Uniform Groups of Agents. In Advances in Fuzzy Logic and Technology; Baczyński, M., De Baets, B., Holčapek, M., Kreinovich, V., Medina, Y.J., Eds.; Springer Nature: Cham, Switzerland, 2025; pp. 29–40.
  11. Zadeh, L.A. The concept of a linguistic variable and its applications to approximate reasoning. Inf. Sci. 1976, 9, 43–80.
  12. Pupo, I.P.; Pérez, P.Y.P.; Vacacela, R.G.; Bello, R.; Santos, O.; Vázquez, M.Y.L. Extensions to linguistic summaries indicators based on neutrosophic theory, applications in project management decisions. Neutrosophic Sets Syst. 2018, 22, 87–100.
  13. Kaczmarek-Majer, K.; Baczyński, M.; Hryniewicz, O.; Miś, K.; Mucha, W.; Wichrowski, F. Fuzzy Linguistic Summaries for Hidden Markov Models. In Information Processing and Management of Uncertainty in Knowledge-Based Systems; Springer: Berlin/Heidelberg, Germany, 2025; pp. 266–276.
  14. Kaczmarek-Majer, K.; Casalino, G.; Castellano, G.; Leite, D.; Hryniewicz, O. Fuzzy Linguistic Summaries for Explaining Online Semi-Supervised Learning. In Proceedings of the 2022 IEEE 11th International Conference on Intelligent Systems (IS), Warsaw, Poland, 12–14 October 2022; pp. 1–8.
  15. Piñero Ramírez, C.M.; Aguiar, D.J.; Pérez Fuentes, A.; Bello Pérez, R.E. Multilingual Linguistic Summarization for Data-Driven Decision-Making. AIAS Artif. Intell. Appl. Sustain. 2025, 1, 26.
  16. Piñero Pérez, P.Y.; Pérez Pupo, I.; Kacprzyk, J.; Bello Pérez, R.E. (Eds.) Computational Intelligence Applied to Decision-Making in Uncertain Environments, 1st ed.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2025; p. xi + 427.
  17. Copernicus: Global Ocean Gridded L4 Sea Surface Heights and Derived Variables Reprocessed (1993–Ongoing). Available online: https://data.marine.copernicus.eu/product/SEALEVEL_GLO_PHY_L4_MY_008_047/description (accessed on 29 January 2026).
  18. Li, B.Z.; Chen, W.; Sharma, P.; Andreas, J. LaMPP: Language models as probabilistic priors for perception and action. arXiv 2023, arXiv:2302.02801.
  19. Wu, H.; Xu, F. Slfnet: Generating semantic logic forms from natural language using semantic probability graphs. arXiv 2024, arXiv:2403.19936.
  20. Gupta, N.; Singh, V.; Iyer, A.; Shiragur, K.; Grover, P.; Bairi, R.B.; Maiti, R.; Damle, S.; Gupta, S.M.; Maurya, R.; et al. Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents. arXiv 2026.
  21. Boix-Adserà, E.; Bresler, G.; Koehler, F. Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models. In Proceedings of the 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), Boulder, Colorado, 7–10 February 2022; pp. 417–426.
  22. Lou, X.; Hu, Y.; Li, X. Learning Linear Polytree Structural Equation Model. arXiv 2025, arXiv:2107.10955.
  23. Rebane, G.; Pearl, J. The Recovery of Causal Poly-Trees from Statistical Data. In Proceedings of the Third Conference on Uncertainty in Artificial Intelligence, Washington, DC, USA, 10–12 July 2013.
  24. Minh, D.L.P. Bayesian networks: Exact inference via macro-node polytrees. Commun. Stat. Theory Methods 2026, 55, 4241–4285.
  25. Saranti, A.; Hudec, M.; Mináriková, E.; Takáč, Z.; Großschedl, U.; Koch, C.; Pfeifer, B.; Angerschmid, A.; Holzinger, A. Actionable explainable AI (AxAI): A practical example with aggregation functions for adaptive classification and textual explanations for interpretable machine learning. Mach. Learn. Knowl. Extr. 2022, 4, 924–953.
  26. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–3 December 2022; Volume 35, pp. 24824–24837.
  27. Holzinger, A.; Longo, L.; Cangelosi, A.; Ser, J.D. Research Frontiers in Machine Learning & Knowledge Extraction. Mach. Learn. Knowl. Extr. 2025, 8, 6.
  28. Hinov, N.; Ivanova, M. LLM-Augmented Algorithmic Management: A Governance-Oriented Architecture for Explainable Organizational Decision Systems. AI 2026, 7, 102.
Figure 1. Example of polytree generated from step 3 of Algorithm 1. Arrows represent the causal or probabilistic dependency directions discovered between nodes.
Figure 2. Graphical Abstract about the architecture of the proposed framework. Arrows indicate the directional flow of data and the sequence of methodological steps within the proposed framework architecture.
Table 1. Classification of linguistic summaries according to Kacprzyk and Zadrożny [9,10].

| Type | Protoform | Knowledge | Doubt | Comments (Interpretation) |
| --- | --- | --- | --- | --- |
| 0 | QRy’s are S | All | T | Conditional summaries using ad hoc queries |
| 1 | Qy’s are S | S | Q | Simple summaries using ad hoc queries |
| 2 | QRy’s are S | S R | Q | Conditional summaries using ad hoc queries |
| 3 | Qy’s are S | Q S^Structure | S^Value | Simple value-oriented summaries |
| 4 | QRy’s are S | Q S^Structure, R | S^Value | Conditional value-oriented summaries |
| 5 | QRy’s are S | Nothing | Q R S | General fuzzy rules |

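To make the protoforms in Table 1 concrete, the following minimal Python sketch evaluates the truth value T of a simple summary ("Q y's are S") and of a conditional summary ("Q R y's are S"). It assumes the classical Zadeh calculus of linguistically quantified propositions used in the Yager and Kacprzyk line of work [8,9]; it is not claimed to be the exact procedure of the proposed framework. The membership functions `mu_high` and `most`, their breakpoints, and the sample values are hypothetical and chosen only for illustration.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def most(proportion):
    """Hypothetical relative quantifier 'most' defined on proportions in [0, 1]."""
    return ramp(proportion, 0.3, 0.8)


def mu_high(value):
    """Hypothetical summarizer 'high' (breakpoints are illustrative assumptions)."""
    return ramp(value, 0.2, 0.6)


def truth_simple(values, mu_s, mu_q):
    """T for 'Q y's are S': quantifier applied to the mean membership in S."""
    if not values:
        raise ValueError("at least one record is required")
    return mu_q(sum(mu_s(v) for v in values) / len(values))


def truth_conditional(records, mu_r, mu_s, mu_q):
    """T for 'Q R y's are S'; each record is a pair (r_value, s_value)."""
    denominator = sum(mu_r(r) for r, _ in records)
    if denominator == 0:
        return 0.0  # no record satisfies the qualifier R to any degree
    numerator = sum(min(mu_r(r), mu_s(s)) for r, s in records)
    return mu_q(numerator / denominator)


if __name__ == "__main__":
    speeds = [0.1, 0.4, 0.6, 0.8]  # made-up observations
    print("T(most records are high):", truth_simple(speeds, mu_high, most))

    # First element: degree to which the record satisfies the qualifier R.
    records = [(1.0, 0.8), (0.5, 0.1), (0.0, 0.8)]
    print("T(most R records are high):",
          truth_conditional(records, lambda r: r, mu_high, most))
```

Under this reading, a Type 0 query supplies Q, R, and S and asks only for T, whereas Types 1 to 5 search over the components listed in the Doubt column and keep the candidates with the highest T.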
Table 2. Descriptive statistics and normality (descending order by mean).

| Model | Mean | Std. Dev. | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| T1: Gemini Flash 2.5 | 3.69 | 0.66 | 4.0 | <0.001 |
| T1: GPT5 | 2.36 | 1.05 | 2.0 | 0.003 |
| T1: GPT 5 Mini | 2.36 | 0.87 | 3.0 | <0.001 |

Table 3. Pairwise comparisons (Wilcoxon + Holm).

| Comparison | p Uncorrected | p Adjusted (Holm) | r (Effect Size) |
| --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT5 | <0.001 | <0.001 | ≈0.75 |
| GPT5 vs. GPT 5 Mini | 0.91 | 0.91 | ≈0.02 |

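For readers who wish to reproduce the kind of correction reported in the pairwise tables, the sketch below shows the Holm step-down adjustment in plain Python. The effect-size helper assumes the common convention r = |Z| / sqrt(N) for Wilcoxon tests; the tables do not state which formula was used, so this is an assumption. The p-values, Z statistic, and sample size in the usage section are hypothetical and are not taken from the study.

```python
from math import sqrt


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original input order."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k-1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def effect_size_r(z_statistic, n_observations):
    """Effect size r = |Z| / sqrt(N) (assumed convention, see lead-in)."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return abs(z_statistic) / sqrt(n_observations)


if __name__ == "__main__":
    raw = [0.03, 0.004, 0.2]  # hypothetical raw p-values
    print("Holm-adjusted:", holm_adjust(raw))
    print("r:", effect_size_r(-3.0, 36))  # hypothetical Z and N
```

Note that the multiplier depends on the total number of comparisons in the family, so the adjusted values in a table can only be reproduced when every comparison that entered the correction is known.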
Table 4. Descriptive statistics and normality test.

| Model | Mean | Std. Dev | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| T2: Gemini Flash 2.5 | 0.486 | 0.507 | 0 | p ≤ 0.001 |
| T2: GPT5 | 1.000 | 0.000 | 1 | p ≤ 0.001 * |
| T2: GPT 5 Mini | 1.189 | 0.392 | 1 | p ≤ 0.001 |

* Statistically significant at the p ≤ 0.001 level.

Table 5. Post hoc comparisons (Wilcoxon).

| Comparison | Adjusted p (Holm) | r (Effect Size) | Interpretation |
| --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT5 | p ≤ 0.001 | r ≈ 0.80 | Significant difference |
| GPT5 vs. GPT 5 Mini | p ≤ 0.01 | r ≈ 0.45 | Significant difference |

Table 6. Descriptive statistics and normality.

| Model | Mean | Std. Dev | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 | 2.75 | 0.55 | 3.00 | <0.001 |
| GPT-5 | 2.36 | 1.03 | 2.00 | 0.018 |
| GPT-5 Mini | 1.44 | 0.83 | 1.00 | <0.001 |

Table 7. Post hoc comparisons (Wilcoxon + Holm).

| Comparison | Raw p-Value | Adjusted p-Value | r (Effect Size) | Interpretation |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT-5 | <0.001 | <0.001 | 0.63 | Significant difference |
| GPT-5 vs. GPT-5 Mini | <0.001 | <0.001 | 0.71 | Significant difference |

Table 8. Descriptive statistics and normality test.

| Model | Mean | Standard Deviation | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 | 3.89 | 0.56 | 4.00 | <0.001 |
| GPT5 | 2.64 | 0.93 | 3.00 | 0.021 |
| GPT5 Mini | 1.86 | 0.97 | 2.00 | 0.008 |

Table 9. Post hoc comparisons.

| Comparison | p (Raw Value) | p (Adjusted Value) (Holm) | r (Effect Size) |
| --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT5 | <0.001 | <0.001 | 0.86 |
| GPT5 vs. GPT5 Mini | 0.004 | 0.008 | 0.61 |

Table 10. Descriptive statistics and normality test.

| Model | Mean | Standard Deviation | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| GPT-5 Mini | 1.42 | 0.50 | 1 | <0.001 |
| GPT-5 | 3.58 | 0.49 | 4 | <0.001 |
| Gemini Flash 2.5 | 3.61 | 0.49 | 4 | <0.001 |

Table 11. Post hoc comparisons (Wilcoxon + Holm).

| Comparison | Uncorrected p | Adjusted p (Holm) | r (Effect Size) |
| --- | --- | --- | --- |
| GPT-5 Mini vs. GPT-5 | <0.001 | <0.001 | 0.88 |
| GPT-5 vs. Gemini Flash 2.5 | 0.317 | 0.317 | 0.10 |

Table 12. Descriptive results and normality.

| Model | Mean | Standard Deviation | Median | p-Value Shapiro–Wilk |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 | 3.22 | 0.47 | 3 | <0.001 |
| GPT-5 | 2.39 | 1.55 | 3 | <0.001 |
| GPT-5 Mini | 2.19 | 1.24 | 3 | <0.001 |

Table 13. Wilcoxon results (with Holm correction).

| Comparison | p (Raw Value) | p (Adjusted Value) (Holm) | r (Effect Size) |
| --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT-5 | <0.001 | <0.001 | 0.78 |
| GPT-5 vs. GPT-5 Mini | 0.041 | 0.041 | 0.35 |

Table 14. Descriptive statistics and normality test.

| Model | Mean | Standard Dev. | Median | Shapiro–Wilk p |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 | 3.72 | 0.58 | 4.0 | <0.001 |
| GPT-5 | 2.19 | 0.87 | 2.0 | 0.012 |
| GPT-5 Mini | 1.83 | 0.75 | 2.0 | 0.004 |

Table 15. Post hoc comparisons.

| Comparison | Uncorrected p | Adjusted p (Holm) | r (Effect Size) |
| --- | --- | --- | --- |
| Gemini Flash 2.5 vs. GPT-5 | <0.001 | <0.001 | 0.78 |
| GPT-5 vs. GPT-5 Mini | 0.031 | 0.031 | 0.35 |

Table 16. Descriptive statistics.

| Model | Mean | Standard Dev. | Median | Shapiro p-Value |
| --- | --- | --- | --- | --- |
| Gemini Flash 2.5 | 1.21 | 0.57 | 1.00 | p ≤ 0.05 |
| GPT-5 | 2.07 | 0.30 | 2.00 | p ≤ 0.05 |
| GPT-5 Mini | 2.50 | 0.87 | 3.00 | p ≤ 0.05 |

Table 17. Wilcoxon results (r = effect size).

| Comparison | p-Value | r (Effect Size) | Interpretation |
| --- | --- | --- | --- |
| Gemini vs. GPT-5 | p < 0.05 | ≈0.60 | Significant difference |
| GPT-5 vs. GPT-5 Mini | p > 0.05 | ≈0.30 | No significant difference |

Table 18. Contrasting the percentages of occurrence (linguistic conditional probability).

| Characteristic | Winter | Summer |
| --- | --- | --- |
| Consistency (U) | Very high (~84–86%) | High (~79–85%) |
| Consistency (V) | High (~81%) | Variable (~71–84%) |
| Stability | Greater stability in MI weights. | Greater dispersion in percentages (e.g., 71% in 2018). |

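Assuming that "MI weights" in Table 18 refers to mutual information between discretized variables, as is usual for the edge weights of Chow-Liu style probabilistic trees, the following minimal Python sketch shows how such a weight can be estimated from paired categorical observations. The function name and the sample labels are hypothetical, and the sketch is not claimed to reproduce the weighting used in the study.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of co-occurring label pairs
    count_x = Counter(xs)         # marginal counts of the first variable
    count_y = Counter(ys)         # marginal counts of the second variable
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), expressed with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical discretized labels for two variables over four time steps.
    u = ["low", "low", "high", "high"]
    v = ["weak", "weak", "strong", "strong"]
    print("MI(U; V) in bits:", mutual_information(u, v))
```

A weight near zero indicates that the two variables are close to independent in the sample, while larger values indicate a stronger statistical dependency and therefore a more likely edge in the tree.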
Table 19. Comparison of potential by periods.

| Period | Energy Potential | Resource Reliability | Technical Challenge |
| --- | --- | --- | --- |
| Winter | Moderate–High | Very High (Consistent Flow) | Smaller, more predictable resource. |
| Summer | High | Medium (High Variability) | Intermittency management. |
| ENSO Events | Extreme | Low (Episodic Events) | Equipment structural strength. |

Table 20. Summary of zones for generation.

| Area | Resource Type | Ideal Seasonality |
| --- | --- | --- |
| Mejillones Peninsula | Detachment Eddies (high EKE) | Summer and Winter (high persistence) |
| Iquique | Coastal Filaments and Jets | Summer (maximum upwelling) |
| Arica (El Codo) | Large-scale Eddies | El Niño Years (maximum amplitude) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pérez Pupo, I.; Alvarado Acuña, L.S.; Piñero Pérez, P.Y.; Yzquierdo Herrera, R.; Leyva Vázquez, M.Y. Generative Artificial Intelligence and Probabilistic Trees for the Linguistic Data Summarization in Wave Energy Decision-Making. Mach. Learn. Knowl. Extr. 2026, 8, 157. https://doi.org/10.3390/make8060157
