1. Introduction
River health assessment is essential for maintaining ecological sustainability and supporting effective watershed management under increasing anthropogenic pressures and climate-driven disturbances. Traditional evaluation tools—such as the River Pollution Index (RPI) and the Carlson Trophic State Index (CTSI)—primarily capture isolated dimensions of environmental quality. While informative for baseline monitoring, these approaches do not adequately represent the complex interactions among hydrological, physicochemical, and biological processes that shape stream ecosystem functioning. As a result, they offer limited diagnostic capacity for understanding degradation pathways or guiding restoration planning.
The Index of Stream Condition (ISC) provides a more holistic framework by integrating multiple ecological dimensions, including hydrology, physical habitat, water quality, and riparian vegetation. Because of this comprehensiveness, ISC has been widely used in ecological assessments, restoration design, and long-term watershed planning [1,2]. However, many river systems exhibit strong non-linear responses to environmental stressors, where indicator relationships cannot be sufficiently captured using linear methods such as principal component analysis (PCA) or traditional clustering. These limitations underscore the need for updated analytical approaches that better reflect ecological complexity.
Conventional ISC-based assessments generally rely on linear dimensionality reduction or expert-defined weighting schemes to aggregate ecological indicators. These approaches assume linearity, orthogonality, or additivity among variables, potentially oversimplifying interdependent ecological processes. As river ecosystems often display non-linear dynamics driven by interactions among flow regimes, water quality gradients, and biotic communities, linear aggregation can obscure meaningful ecological patterns and reduce assessment reliability.
Furthermore, few studies incorporate modern machine learning tools—such as non-linear embedding, density-based clustering, or cluster consistency metrics—to refine or validate the structure of ISC sub-indices. The limited integration of such methods has constrained the evolution of ISC frameworks, particularly in contexts requiring high-resolution evaluation of restoration outcomes.
To address these gaps, this study enhances the ISC through non-linear analytical techniques. The objectives and contributions of the study are threefold:
To develop a t-SNE–enhanced ISC3 framework that reorganizes diverse ecological indicators into data-driven, ecologically coherent clusters, improving representation of hydrological, water-quality, and biological interactions.
To introduce a non-linear analytical approach that captures complex interdependencies among river health metrics, overcoming the constraints of traditional linear methods commonly used in ISC-based assessments.
To demonstrate the practical applicability of the recalibrated framework through a river restoration case study in Taiwan, illustrating how the refined structure provides clearer ecological interpretation and supports more informed watershed management decisions.
Although various nonlinear dimensionality reduction techniques have been applied in environmental analysis, including UMAP and autoencoder-based approaches, t-SNE was selected in this study due to its ability to preserve local neighborhood structure.
This methodological choice is motivated by persistent limitations in conventional composite environmental index frameworks. Recent studies have demonstrated the effectiveness of composite environmental indices in assessing riverine and watershed health by integrating hydrological, physical, chemical, and ecological dimensions. However, conventional index construction approaches typically rely on fixed expert-driven weighting schemes, linear aggregation rules, or statistical normalization methods, which may insufficiently capture nonlinear relationships and latent structural heterogeneity among sub-indicators. Moreover, many existing frameworks assume static importance across sub-indices, limiting their adaptability to spatially heterogeneous or dynamically evolving environmental systems.
Therefore, a methodological gap remains between traditional rule-based index construction and emerging data-driven representations. This study addresses this gap by proposing a density-informed recalibration framework that integrates unsupervised manifold learning with sub-index weight adjustment, enabling adaptive sensitivity to dominant environmental drivers while preserving interpretability and regulatory relevance.
By integrating non-linear dimensionality reduction with a widely used stream health index, this study provides a transferable and ecologically grounded methodology for modern watershed management. The proposed framework advances river health assessment by strengthening diagnostic accuracy, improving interpretability, and supporting evidence-based restoration planning—key priorities aligned with sustainability-oriented river governance.
The main contributions of this study can be summarized as follows:
(1) We propose a density-informed recalibration framework that integrates unsupervised manifold learning with composite index construction, enabling adaptive adjustment of sub-index weights based on intrinsic data structure rather than predefined assumptions.
(2) Unlike conventional index aggregation approaches, the proposed method explicitly incorporates cluster density information to reflect dominant environmental drivers, enhancing both sensitivity and physical interpretability.
(3) The framework is demonstrated through a real-world case study, illustrating its capability to improve sub-index consistency and overall index robustness without compromising transparency or regulatory applicability.
The primary purpose of this study is to develop and validate an integrated river health assessment framework that enhances diagnostic sensitivity and interpretability by recalibrating traditional ISC-based indices using data-driven structural information. Specifically, this study addresses the following research question:
Can the incorporation of nonlinear embedding and density-informed clustering improve the internal consistency, diagnostic resolution, and interpretability of composite river health indices compared to conventional fixed-weight approaches?
To answer this question, the proposed methodology integrates unsupervised manifold learning with density-based recalibration under clearly defined operational assumptions, and its applicability is demonstrated through a real-world river restoration case study in Taiwan.
2. Literature Review
The ISC has long served as a foundational tool for evaluating river health, originating from assessments conducted in Victoria, Australia [3]. Recent research has expanded its application and highlighted its strengths in capturing multidimensional ecological conditions. Al-Mhdawi et al. (2024) emphasized the limitations of traditional single-metric indices and recognized the ISC as a framework capable of integrating hydrological, habitat, and water-quality attributes [4]. Atazadeh [2] demonstrated the ISC’s capacity to synthesize diverse ecological metrics, while Tran et al. [5] adapted the framework for tropical river environments, illustrating its flexibility across climatic and geomorphic contexts [2,5].
To address growing analytical demands, the recent literature has incorporated advanced dimensionality reduction and decision-analysis techniques. t-SNE has emerged as a powerful tool for preserving local neighborhood structure within high-dimensional ecological datasets [6]. Guan et al. (2025) [7] showed that t-SNE improves clustering resolution and data representation, whereas Lin et al. (2025) highlighted its adaptability for analyzing dynamic, high-density river systems [7,8]. Complementary to these advances, multi-criteria decision-analysis methods such as the Analytical Network Process (ANP) [9,10] have been applied to capture interdependencies among ecological indicators and enhance model interpretability [11].
Technological innovations have further supported the refinement of river health assessment tools. Drone-based remote sensing has enabled rapid, high-resolution evaluations of habitat and geomorphic conditions [12]. Integration of ISC with ecosystem service frameworks, such as those explored by Xu et al. (2022) [13] and Yang et al. (2025) [14], provides insight into how ecological conditions influence social and economic benefits, thereby improving restoration planning [13,14]. Additionally, Calo et al. [15] stress the importance of incorporating socio-environmental dimensions to ensure that restoration strategies align with community priorities and sustainability goals.
Despite these advances, limitations remain within ISC-based assessments. Linear aggregation and PCA-based dimensionality reduction do not fully capture non-linear ecological interactions, and inconsistencies may arise when interdependencies among indicators are overlooked [4]. These shortcomings underscore the need for updated analytical approaches that integrate non-linear embedding methods and incorporate modern validation tools to improve the accuracy and reliability of ISC-based river health evaluations.
Existing river health assessment studies have provided valuable insights into index-based evaluation and multivariate environmental analysis. However, most approaches rely on static weighting schemes and predefined aggregation rules, limiting their ability to reflect heterogeneous ecological responses and dominant stressors across spatial contexts. Recent applications of machine learning in environmental assessment have focused primarily on classification or prediction tasks, with limited emphasis on systematic index recalibration.
Building on these observations, this study aims to bridge the methodological gap between conventional ISC-based assessment frameworks and emerging data-driven representations. The specific objective is to recalibrate sub-index contributions using intrinsic data structure while maintaining transparency and regulatory relevance.
3. Materials and Methods
3.1. Study Area and Data Selection Principles
This case study applies the proposed t-SNE-based framework to the Zhuoshui River Basin, the longest river system in Taiwan, and a critical resource supporting agricultural irrigation, biodiversity conservation, and regional development. Owing to its pronounced hydrological variability, ecological heterogeneity, and long-standing pressures from industrial discharge, agricultural runoff, and urban expansion, the basin represents a typical yet challenging context for river health assessment and restoration-oriented analysis. These characteristics make it a suitable testbed for evaluating the sensitivity and internal consistency of composite river health indices. This case-study design allows direct evaluation of whether data-driven recalibration can enhance index consistency and interpretability under real-world monitoring constraints.
The datasets used in this study were selected based on three criteria:
(1) ecological relevance to key river health dimensions,
(2) data availability from official and continuous monitoring programs, and
(3) temporal consistency across sub-indicators.
Accordingly, indicators representing hydrological conditions, physical habitat, water quality, riparian environment, and aquatic life were compiled over a unified assessment period. These data selection principles enhance methodological transparency, ensure reproducibility, and facilitate the future application of the proposed framework to other river basins with comparable monitoring infrastructures.
To ensure data consistency and analytical robustness, ecological and environmental indicators were compiled from official long-term monitoring programs covering multiple sampling stations across the Zhuoshui River basin. The datasets represent temporally consistent observations aggregated at comparable assessment intervals, ensuring cross-indicator comparability. Preprocessing steps included standardization of indicator scales, screening for missing values, and normalization to reduce the influence of extreme observations. These procedures were adopted to enhance data robustness and minimize artifacts prior to nonlinear embedding and clustering.
3.2. Workflow for ISC3 Framework Recalibration
The recalibration of the ISC3 framework followed a six-stage analytical workflow (Figure 1) designed to enhance the ecological representativeness and statistical coherence of the index for the Zhuoshui River. The stages comprise:
(1) Data collection and preprocessing;
(2) Framework optimization;
(3) Neighborhood selection and density-based clustering;
(4) Cluster validation and dimensional framework establishment;
(5) Factor weight calculation;
(6) Non-linear weight adjustment and cluster consistency analysis.
These procedures collectively ensure that ISC3 metrics capture local ecological responses and system-level interactions under subtropical hydrological conditions.
3.3. Data Collection and Preprocessing
The ISC3 sub-indices—hydrology, physical form, streamside zone, water quality, and aquatic life—were recalibrated to reflect the ecological conditions of the Zhuoshui River basin. While the ISC3 framework was retained as the structural baseline, it was adapted to Taiwan’s subtropical hydrological and biological context, incorporating local species composition, riparian vegetation characteristics, typhoon-induced flow variability, and anthropogenic pressures such as industrial effluents.
To complement the quantitative recalibration, a qualitative expert evaluation was conducted to validate and cross-check the relative importance of sub-indices within the river health assessment framework. Five domain experts representing river ecology, watershed management, hydrological engineering, biodiversity conservation, and environmental science participated in this structured assessment. Expert judgments were elicited using pairwise comparisons expressed on a five-point Likert scale and subsequently processed through the ANP to derive relative priority intensities among sub-indices. These ANP-derived priority scores were used as an independent qualitative reference and were normalized prior to integration with the quantitative workflow.
To give the panel adequate context without letting it shape the analysis, the experts were provided with a conceptual overview of the t-SNE-based quantitative framework. Qualitative judgment and data-driven analysis were nevertheless kept strictly separate: the panel supplied an independent qualitative perspective, the embedding and clustering were computed from the observed indicators alone, and expert input was applied exclusively at the post-embedding recalibration stage.
3.4. Framework Optimization Using t-SNE and ANP
(1) Conditional Probability Transformations in t-SNE
t-SNE is a non-linear dimensionality reduction algorithm that prioritizes the preservation of local neighborhood relationships in high-dimensional data; global structure is retained only approximately and depends on the perplexity setting [6]. In this study, t-SNE was used to embed and visualize ISC3 ecological metrics as the basis for clustering, providing insights into their complex interdependencies.
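The algorithm models the similarity between points $x_i$ and $x_j$ in the original space using conditional probabilities:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{1}$$

where $\sigma_i$ controls the local neighborhood size and is related to the perplexity parameter, which influences the balance between local and global structure. The joint probability $P_{ij}$ is then symmetrized as:

$$P_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{2}$$

where $N$ is the total number of data points.

In the reduced space, t-SNE mitigates overcrowding by applying a Student t-distribution for similarity:

$$Q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \tag{3}$$

where $y_i$ and $y_j$ are the low-dimensional counterparts of $x_i$ and $x_j$, and $Q_{ij}$ avoids overcrowding of points in the reduced dimension while preserving significant pairwise relationships.

To align high-dimensional and low-dimensional spaces, t-SNE minimizes the Kullback–Leibler divergence (KLD) between the two similarity distributions [16,17]:

$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} P_{ij} \log \frac{P_{ij}}{Q_{ij}} \tag{4}$$

where smaller $C$ values indicate better fidelity in preserving neighborhood structure.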
Applied to ISC3 metrics such as hydrology, water quality, and aquatic life, t-SNE maps high-dimensional data into interpretable two-dimensional clusters. By optimizing parameters like perplexity, t-SNE balances local ecological site conditions and global river health gradients, uncovering statistically robust and ecologically meaningful clusters essential for river health assessment and conservation planning.
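The quantities in Equations (1)–(4) can be sketched numerically as follows. This is a minimal illustration in NumPy, not the implementation used in the study: it assumes a single shared bandwidth `sigma` instead of the per-point $\sigma_i$ that a perplexity search would produce, and it evaluates the cost for a random two-dimensional layout rather than an optimized one. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def conditional_p(X, sigma=1.0):
    """Conditional similarities p_{j|i}; one shared sigma is a simplifying assumption."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # enforce p_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)       # each row sums to 1

def joint_p(P_cond):
    """Symmetrized joint probabilities P_ij = (p_{j|i} + p_{i|j}) / (2N)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def joint_q(Y):
    """Student-t (one degree of freedom) similarities in the embedding."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq)
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_divergence(P, Q, eps=1e-12):
    """Cost C = sum_ij P_ij log(P_ij / Q_ij), skipping zero entries of P."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Synthetic stand-in for standardized sub-index data (40 sites, 5 metrics).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = rng.normal(size=(40, 2))      # an arbitrary, unoptimized 2-D layout
P = joint_p(conditional_p(X))
Q = joint_q(Y)
C = kl_divergence(P, Q)
```

An optimizer would move the points in `Y` to reduce `C`; the sketch only evaluates the objective once to make the four definitions concrete.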
(2) Interdependency Analysis Through ANP
The ANP developed by Saaty [9] was used to quantify metric interdependencies through pairwise comparisons, producing a weighted supermatrix that captured causal relationships and feedback loops critical for recalibration [9,18]. Key findings revealed hydrology’s indirect influence on water quality and vegetation, along with reciprocal feedback loops between aquatic life and riparian conditions.
The pairwise comparison matrix (A) was derived from expert judgments (Equation (5)), and standard ANP computational procedures generated the weighted supermatrix, resulting in a weight vector for each ISC3 metric (Equation (6)).
The ANP does not influence the t-SNE embedding or the density-based clustering. Both are computed solely from the observed ecological indicators, so the data-driven structural representation remains unbiased by expert judgment.
Once the intrinsic data structure has been identified, the expert-derived ANP weights are applied at the post-embedding recalibration stage to adjust sub-index contributions. Rather than altering the embedding itself, the calibrated weights are mapped onto the t-SNE-derived clusters, so that the metric interdependencies captured by the ANP are retained alongside the cluster structure. This integration facilitates metric prioritization and supports the interpretation and validation of clustering outcomes within an ecologically meaningful decision-support context.
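For readers unfamiliar with how pairwise judgments become priorities, the basic building block can be sketched as below. This is an illustrative sketch only: it shows the principal-eigenvector step for a single reciprocal comparison matrix, which the ANP shares with the simpler AHP, and it does not reproduce the supermatrix construction used in the study. The judgment values are hypothetical and were chosen to be perfectly consistent; the random-index constants are Saaty's commonly tabulated values.

```python
import numpy as np

def priority_vector(A, iters=200):
    """Principal-eigenvector priorities of a reciprocal matrix via power iteration."""
    A = np.asarray(A, dtype=float)
    w = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        w = A @ w
        w /= w.sum()              # keep the priorities normalized to sum to 1
    return w

def consistency_ratio(A, w):
    """Saaty's consistency ratio CR = CI / RI (RI tabulated for n = 3, 4, 5 here)."""
    A = np.asarray(A, dtype=float)
    n = len(w)
    lam_max = float(np.mean((A @ w) / w))   # estimate of the principal eigenvalue
    ci = (lam_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}
    return ci / random_index[n]

# Hypothetical, perfectly consistent judgments: a_ij = s_i / s_j.
s = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = s[:, None] / s[None, :]
w = priority_vector(A)
cr = consistency_ratio(A, w)
```

With consistent judgments the priorities reduce to `s / s.sum()` and the consistency ratio is zero; real expert matrices would typically be accepted when the ratio stays below about 0.1.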
3.5. Neighborhood Selection and Density-Based Clustering
This subsection presents the application of t-SNE to evaluate ISC3 sub-indices (X1–X5), focusing on neighborhood probabilities, density-based clustering, and sub-index priority identification. These steps ensured the recalibration process reflected both statistical fidelity and ecological significance.
1. Neighborhood Probability Analysis
t-SNE projected ISC3 metrics into a two-dimensional space for non-linear recalibration and dimensionality reduction, capturing both local and global interdependencies. Local neighborhood probabilities were computed as the basis for cluster density analysis, serving to inform the recalibrated weight assignments and quantify the relative ecological influence of each sub-index.
2. Density-Based Clustering and Sub-Index Priority Identification
Clusters were identified by systematically varying perplexity values (5–50, optimized at 30 via sensitivity analysis) and minimizing KLD for embedding stability [19,20]. High-density clusters emerged as dominant contributors to the ISC3 framework, reflecting sub-index prioritization. The density-based clustering approach considered:
a. Intra-cluster density: Evaluating compactness within clusters.
b. Inter-cluster separation: Assessing distinctiveness across clusters.
c. Conditional probabilities: Mapping interdependencies and divergences among sub-indices.
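A perplexity sweep of the kind described above might look like the following sketch. It assumes scikit-learn's `TSNE` (the library named later in this section) and uses synthetic data in place of the monitoring records; the sample size, the perplexity grid, and the random seeds are illustrative choices, not the study's settings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for standardized sub-index data (120 observations, X1-X5).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))

results = {}
for perplexity in (5, 15, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    tsne.fit(X)
    # Final value of the KL objective after optimization.
    results[perplexity] = float(tsne.kl_divergence_)

# Perplexity with the lowest final KL divergence on this synthetic data.
best = min(results, key=results.get)
```

One caveat worth keeping in mind: the high-dimensional distribution itself changes with perplexity, so KL values from different settings are not strictly comparable. A sweep like this is usually read together with the visual stability of the embedding across seeds rather than as a sole selection rule.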
Metrics were standardized using z-scores, with missing values imputed. Variables with low ecological relevance or high multicollinearity (VIF > 10) were excluded prior to analysis [21]. This clustering workflow validated the robustness of identified clusters and informed the prioritization of sub-indices, providing the basis for weight recalibration and enhancing the ISC3 framework’s ecological validity.
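The screening steps in the preceding paragraph can be illustrated with a short NumPy sketch. Several details are assumptions rather than facts from the study: median imputation is used because the imputation method is not stated, variables are dropped one at a time starting with the highest VIF, and the data and column names are synthetic. The VIF is computed from the diagonal of the inverse correlation matrix, which equals $1/(1 - R_j^2)$ for each variable.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs with the column median (assumed method; not stated in the text)."""
    X = np.array(X, dtype=float)
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def zscore(X):
    """Column-wise standardization to zero mean and unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_collinear(X, names, threshold=10.0):
    """Iteratively remove the variable with the largest VIF above the threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Synthetic example: the fourth column nearly duplicates the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
near_copy = base[:, 0] + 0.05 * rng.normal(size=200)
raw = np.column_stack([base, near_copy])
raw[5, 1] = np.nan                      # one missing value to impute
Z = zscore(impute_median(raw))
Z_kept, kept = drop_collinear(Z, ["a", "b", "c", "a_copy"])
```

In this example one of the two near-duplicate columns is removed and the remaining three variables all fall below the VIF threshold.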
3.6. Validation of k Values and Dimensional Framework
This step determined the optimal number of clusters (k) and established the dimensional framework for ISC3 metrics. Candidate k values were evaluated using two criteria:
(1) Cumulative cluster probability coverage exceeded 90%, ensuring adequate representation of metric interdependencies [22].
(2) Intra-cluster density deviations remained below 0.25 standard deviations, ensuring internal consistency [23].
A sequential non-linear evaluation assessed cumulative probabilities and density balances across neighborhoods, overcoming the limitations of linear methods like Principal Component Analysis (PCA) scree plots, which often overlook complex ecological dependencies. t-SNE’s capability to maintain local relationships while separating overlapping global structures ensured more precise cluster validation.
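The two acceptance criteria can be expressed as a simple check, sketched below in plain Python. The operational definitions are assumptions made for illustration, since the text does not spell them out: coverage is taken as the share of total neighborhood probability mass carried by points assigned to a cluster (label `-1` marks unassigned points), and the density deviation of a cluster is taken as its internal standard deviation of local density divided by the standard deviation over all points. All values are hypothetical.

```python
from statistics import pstdev

def validate_k(labels, probs, densities, coverage_min=0.90, deviation_max=0.25):
    """Check the coverage and density-deviation criteria for one candidate clustering."""
    total = sum(probs)
    covered = sum(p for lab, p in zip(labels, probs) if lab != -1)
    coverage = covered / total

    global_sd = pstdev(densities)
    deviations = {}
    for cluster in sorted(set(labels) - {-1}):
        members = [d for lab, d in zip(labels, densities) if lab == cluster]
        # Spread of density inside the cluster, in units of the global spread.
        deviations[cluster] = pstdev(members) / global_sd

    accepted = coverage > coverage_min and all(
        dev < deviation_max for dev in deviations.values()
    )
    return coverage, deviations, accepted

# Hypothetical candidate with k = 2 and one unassigned point.
labels = [0, 0, 0, 1, 1, 1, -1]
probs = [0.16, 0.15, 0.16, 0.17, 0.16, 0.15, 0.05]
densities = [1.00, 1.02, 0.98, 2.00, 2.03, 1.97, 0.20]
coverage, deviations, accepted = validate_k(labels, probs, densities)
```

Running the same check over several candidate k values and keeping those that pass both thresholds mirrors the sequential evaluation described above.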
3.7. Factor Weight Calculation via Cluster Densities
Local density values were computed for each data point using the density estimator applied in the t-SNE–derived cluster analysis. Cluster-level densities were then obtained by aggregating the local densities of points belonging to each cluster.
Cluster weights were calculated using Equation (7):

$$W_m = \frac{\sum_{i \in C_m} \rho_i}{\sum_{k=1}^{n} \sum_{j \in C_k} \rho_j} \tag{7}$$

where $C_m$ denotes the m-th cluster, n is the total number of clusters, and $\rho_i$ and $\rho_j$ represent the local density values of data points i and j, respectively [24].
To obtain the weight of each ecological sub-index, the density values associated with that sub-index were normalized over the total density of all sub-indices:

$$w_i = \frac{D_i}{\sum_{j=1}^{m} D_j} \tag{8}$$

where $D_i$ and $D_j$ denote the aggregated density contribution of sub-index i and j, respectively, and m here is the total number of ecological sub-indices considered.
This procedure yields a set of normalized weights for the ecological sub-indices.
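The density-to-weight calculation can be sketched as follows. The density estimator is an assumption, since the text does not name one: local density is taken here as the inverse of the mean distance to the k nearest neighbors in the embedding. The two-cluster layout, the labels, and the function names are hypothetical.

```python
import numpy as np

def knn_density(Y, k=5):
    """Local density as the inverse mean distance to the k nearest neighbors (assumed estimator)."""
    d = np.sqrt(np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1))
    nearest = np.sort(d, axis=1)[:, 1:k + 1]     # column 0 is the self-distance
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def cluster_weights(density, labels):
    """Share of total local density carried by each cluster."""
    total = density.sum()
    return {int(c): float(density[labels == c].sum() / total) for c in np.unique(labels)}

def normalize_contributions(D):
    """Normalize aggregated density contributions so that the weights sum to 1."""
    total = sum(D.values())
    return {name: value / total for name, value in D.items()}

# Hypothetical embedding: one compact cluster and one diffuse cluster.
rng = np.random.default_rng(1)
tight = rng.normal(0.0, 0.1, size=(30, 2))
loose = rng.normal(10.0, 1.0, size=(30, 2))
Y = np.vstack([tight, loose])
labels = np.array([0] * 30 + [1] * 30)

rho = knn_density(Y)
W = cluster_weights(rho, labels)
sub_index_w = normalize_contributions({"X1": 2.0, "X2": 6.0})
```

The compact cluster receives the larger weight, which is the behavior the density-informed recalibration relies on: tightly cohesive groups of observations contribute more to the final weighting.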
3.8. Non-Linear Weight Adjustment and Cluster Consistency Validation
This subsection integrates weight recalibration and cluster consistency validation to ensure the robustness of ISC3 metrics and their ecological interpretability within the t-SNE framework.
(1) Non-Linear Weight Adjustment
The non-linear relationships among ISC3 sub-indices were captured using t-SNE proximity probabilities ($P_{ij}$, $Q_{ij}$), supporting weight recalibration and normalization:
a. Weight Calibration and Normalization: Weights were recalibrated based on t-SNE proximity probabilities and normalized as z-scores for cross-index comparability. Sensitivity analyses across varying perplexity values (5–50) and bootstrap sampling confirmed projection robustness [23].
b. Implications for Subsequent Analyses: Recalibrated weights provide a stable foundation for sub-index reorganizations and downstream ecological assessments within the ISC3 framework.
(2) Cluster Consistency Validation
Cluster consistency was assessed to ensure the statistical stability and structural coherence of recalibrated sub-indices. Intra-cluster density deviations (Dcluster < 0.1) confirmed compact clustering and robust ecological relevance. Metrics such as High Flows, Artificial Barrier, and Variability demonstrated strong internal consistency, reinforcing their roles in promoting stability and ecological coherence. The validated intra-cluster densities further supported localized metric interactions and consistent sub-index organization, ensuring both statistical reliability and ecological interpretability.
(3) Structural Coherence Validation
The robustness of recalibrated clusters was further tested by comparing intra-cluster density deviations with a threshold of 0.1 ($D_{\text{cluster}} < 0.1$), ensuring stable and interpretable groupings [25]. This analysis confirmed that ISC3 sub-indices in the t-SNE space maintain ecological and statistical coherence, supporting the integrity of sub-index weighting and the overall ISC3 framework.
All analyses were conducted using Python 3.13.3 with standard scientific computing libraries, including scikit-learn for t-SNE implementation and custom scripts for density-based clustering and consistency evaluation.
4. Results
The results integrate the analytical procedures described in Section 3.2, including neighborhood probability estimation, t-SNE–based clustering, ANP-supported interdependency evaluation, and non-linear weight recalibration of ISC3 sub-indices. Collectively, these outcomes clarify the ecological relevance of the recalibrated metrics, reveal inter-cluster relationships, and confirm the robustness of the updated ISC3 framework for application to river-health diagnostics and management planning.
4.1. Likert-Scale Scoring and Framework Demonstration
To ensure coherence between expert ecological knowledge and the quantitative recalibration process, expert-based scoring from Section 3.3 was applied to cross-validate the priority structure derived from density-based weighting. This step serves as an external check on the methodological workflow and reinforces the ecological interpretability of the recalibrated ISC3 metrics.
(1) Expert Evaluation and ANP-Derived Priority Scoring
The intermediate ANP-derived priority scores, prior to normalization, were as follows: Hydrology (X1): 2.5; Physical Form (X2): 3.2; Streamside Zone (X3): 3.8; Water Quality (X4): 4.5; and Aquatic Life (X5): 5.5. These values are outputs of the ANP computation rather than raw Likert-scale ratings, which is why a score such as 5.5 can exceed the five-point elicitation scale without violating it. No raw expert rating falls outside the original scale.
The results indicate a convergent expert consensus regarding ecological significance. Aquatic Life (X5) and Water Quality (X4) were consistently identified as primary determinants of ecosystem condition, reflecting the dominant role of biological integrity and water quality stressors in the Zhuoshui River basin. Streamside Zone (X3) exhibited intermediate sensitivity to anthropogenic disturbance, while Hydrology (X1) and Physical Form (X2) were identified as secondary yet supportive contributors, reinforcing their functional but less dominant ecological roles.
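To make the relative standing of the five scores easier to compare, they can be rescaled to shares of their total. The sketch below assumes simple sum-normalization; the text states that the scores were normalized but does not specify the scheme, so the resulting shares are illustrative rather than the study's reported values.

```python
# ANP-derived priority scores as reported in the text (before normalization).
scores = {
    "X1 Hydrology": 2.5,
    "X2 Physical Form": 3.2,
    "X3 Streamside Zone": 3.8,
    "X4 Water Quality": 4.5,
    "X5 Aquatic Life": 5.5,
}

# Assumed scheme: divide each score by the sum so the shares add up to 1.
total = sum(scores.values())
normalized = {name: value / total for name, value in scores.items()}

# Sub-indices ordered from highest to lowest expert priority.
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

Under this assumption the shares are approximately 0.128, 0.164, 0.195, 0.231, and 0.282 for X1 through X5, so Aquatic Life and Water Quality together account for roughly half of the expert-assigned priority.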
(2) Framework Demonstration Using Simulated Data
A simulated dataset was employed to demonstrate the operational behavior of the integrated t-SNE–ANP framework. Although hypothetical, this dataset allows clear visualization of metric differentiation and ecological structure.
t-SNE clustering produced well-defined ecological groupings based on representative values assigned to the five ISC3 sub-indices:
Cluster 1: Strong dominance of Hydrology (X1 = 0.32)
Cluster 2: Clear prominence of Streamside Zone (X3 = 0.35)
Cluster 4: Distinct representation of Aquatic Life (X5 = 0.33)
These cluster patterns illustrate how the t-SNE embedding preserves non-linear relationships and highlight the diagnostic value of the recalibrated sub-indices. Because the data are simulated, the demonstration shows the methodological coherence of the workflow, namely how multidimensional ecological interactions are captured and expressed within the ISC3 framework, rather than providing empirical validation.
4.2. Neighborhood Probability Mapping and Sub-Index Recalibration
The neighborhood probability outputs generated by t-SNE provide a quantitative representation of how each ISC3 sub-index is distributed across ecological clusters. These probability structures translate the conditional-probability formulations defined in Section 3.4, namely the high-dimensional similarity measures $P_{ij}$ and their low-dimensional counterparts $Q_{ij}$, into interpretable ecological patterns. By capturing non-linear relationships among hydrological, geomorphological, riparian, water-quality, and biological conditions, the t-SNE probabilities reveal how each sub-index contributes to ecological variability within the Zhuoshui River system. This probabilistic mapping, therefore, serves as a direct operational link between the methodological framework established in Section 3 and the empirical recalibration of ISC3 sub-indices presented in the Results.
4.2.1. Ecological Interpretation Based on Neighborhood Probabilities
(1) Hydrology (X1): Hydrology exhibits its strongest presence in Cluster 4 (0.33) and a secondary concentration in Cluster 5 (0.23).
Cluster 4 reflects hydrology-driven variability dominated by flow regime alterations and discharge fluctuations.
Cluster 5 indicates interactions with riparian stability and channel adjustments.
Together, these distributions confirm that hydrology exerts a distributed, system-regulating influence rather than localized dominance.
(2) Physical Form (X2): Physical Form is primarily associated with Cluster 1 (0.32) and Cluster 3 (0.22).
Cluster 1 captures channel morphology under stable hydrological conditions.
Cluster 3 reflects geomorphic adjustments linked to sediment dynamics.
These patterns underscore geomorphology’s foundational role in controlling habitat structure.
(3) Streamside Zone (X3): The Streamside Zone shows the highest probability in Cluster 5 (0.36), with meaningful representation in Cluster 4 (0.20).
Cluster 5 represents riparian vegetation’s functions in bank reinforcement, shading, and organic matter delivery.
Cluster 4 highlights interactions between riparian vegetation and biological communities.
This confirms riparian zones as a stabilizing ecological buffer.
(4) Water Quality (X4): Water Quality dominates Cluster 2 (0.35) and contributes to Cluster 1 (0.26).
Cluster 2 reflects sensitivity to nutrient loading and pollutant gradients.
Cluster 1 demonstrates hydrology-mediated water-quality regulation.
The distribution indicates strong dependence on external pressures and hydrological control.
(5) Aquatic Life (X5): Aquatic Life is most prominent in Cluster 3 (0.34) and Cluster 2 (0.28).
Cluster 3 highlights biological responses to habitat heterogeneity and geomorphic complexity.
Cluster 2 captures vulnerability to water-quality stressors.
These results identify Aquatic Life as an integrative indicator of ecosystem integrity.
4.2.2. Synthesis and Recalibration Implications
The t-SNE neighborhood probabilities reveal that:
1. Each sub-index exhibits a distinct ecological signature across clusters.
2. Hydrology and Physical Form act as regulatory and structural drivers.
3. Water Quality and Aquatic Life reflect sensitivity to environmental stressors.
4. The Streamside Zone provides riparian stabilization across hydrological gradients.
This multi-dimensional interpretation, grounded in the probability patterns summarized in Table 1 and fully consistent with the conditional-probability formulations introduced in Section 3.4, provides a coherent basis for recalibrating the ISC3 sub-indices. It ensures that the recalibration process remains logically aligned with the analytical sequence articulated in Section 3 (Steps 3–5) [11] and that the ecological meaning embedded in the neighborhood structures is accurately preserved.
Complementing this probabilistic interpretation, the ANP-derived interdependency weights (Section 3.4) reinforce the causal and feedback relationships among the sub-indices. By integrating t-SNE’s non-linear neighborhood structure with ANP’s systematic representation of metric interdependencies, the recalibrated ISC3 framework maintains both ecological relevance and methodological coherence. Together, these components provide a rigorous, ecologically grounded foundation for the multi-dimensional reinterpretation and refinement of the ISC3 assessment system.
4.3. Cluster Density Analysis and Weight Assignment
The results presented in this section correspond directly to the neighborhood probability and density-based clustering procedures described in Section 3.5. Following the t-SNE embedding, local neighborhood probabilities were translated into density values that characterize the ecological influence of each ISC3 sub-index (X1–X5). These density measures represent how strongly each metric contributes to the internal cohesion of its cluster and therefore form the empirical basis for recalibrating sub-index importance.
Expert Likert-scale evaluations (Section 3.3) were incorporated as secondary adjustments to ensure ecological interpretability of the recalibrated weights. Their influence was intentionally constrained to a small range (±1–3%) relative to the density-derived values and applied only during final normalization to maintain ΣWm = 1.
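The bounded expert adjustment described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the adjustment is multiplicative (each density-derived weight is scaled by at most ±3%) and is followed by a renormalization so that the weights sum to 1. The function name `recalibrate_weights`, the density values, and the expert shifts are hypothetical placeholders.

```python
def recalibrate_weights(density, expert_shift, max_shift=0.03):
    """Scale density-derived weights by bounded expert shifts, then renormalize.

    density:      raw density value per sub-index (any positive scale)
    expert_shift: relative adjustment per sub-index (e.g. 0.02 means +2%)
    max_shift:    assumed cap on the relative adjustment (3% here)
    """
    if set(density) != set(expert_shift):
        raise ValueError("density and expert_shift must cover the same sub-indices")

    total = sum(density.values())
    base = {name: value / total for name, value in density.items()}

    adjusted = {}
    for name, weight in base.items():
        # Clip the expert shift to the allowed range before applying it.
        shift = max(-max_shift, min(max_shift, expert_shift[name]))
        adjusted[name] = weight * (1.0 + shift)

    # Final normalization keeps the weights summing to 1.
    norm = sum(adjusted.values())
    return {name: value / norm for name, value in adjusted.items()}


# Hypothetical inputs, for illustration only.
density = {"X1": 0.13, "X2": 0.21, "X3": 0.26, "X4": 0.03, "X5": 0.37}
expert_shift = {"X1": 0.02, "X2": -0.01, "X3": 0.03, "X4": 0.05, "X5": 0.0}

weights = recalibrate_weights(density, expert_shift)
```

Because the shifts are clipped before renormalization, an out-of-range expert score (the +5% for X4 above) cannot move a weight further than the cap allows, so the density-derived ranking remains the dominant signal.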
The resulting cluster densities and weight contributions are summarized in Table 2. These density values are derived directly from the neighborhood-based clustering procedure described in Section 3.5 and represent each sub-index’s localized ecological influence within the t-SNE embedding. Importantly, Table 1 reports occurrence probabilities (P(Xm|Cluster)), whereas Table 2 expresses density-derived ecological influence. The two tables therefore serve distinct analytical functions: Table 1 characterizes probabilistic distribution patterns across clusters, while Table 2 quantifies the sub-indices’ relative ecological importance for recalibration purposes.
Key Findings
Aquatic Life (X5) (37.4%) exhibited the highest local density, confirming its central contribution to biological integrity and overall ecological stability in the Zhuoshui River Basin.
Streamside Zone (X3) (25.8%) demonstrated strong cluster cohesion and plays a major role in riparian protection, shading, and erosion control.
Physical Form (X2) (20.9%) represented geomorphological stability essential for channel resilience and habitat support.
Hydrology (X1) (12.7%) showed moderate influence, reflecting its indirect but system-wide regulatory function.
Water Quality (X4) (3.3%) exhibited the lowest density contribution, consistent with its high dependency on hydrological and riparian processes rather than independent structural influence.
Overall, the density-derived weights confirm that the recalibration workflow introduced in Section 3 is statistically robust and ecologically coherent. The integration of density-based clustering, neighborhood probability analysis, and expert validation produces an internally consistent and interpretable weighting structure suitable for advancing the ISC3 assessment framework.
4.4. Sub-Index Prioritization and Probabilistic Mapping via t-SNE Projections
The t-SNE configuration employed in this study met the cluster validation criteria specified in Section 3.6, including achieving cumulative neighborhood-probability coverage above 90% and maintaining intra-cluster density deviations below 0.25 standard deviations. These conditions confirm that the clusters displayed in Figure 2 are statistically stable and ecologically interpretable. Building on the density-based recalibration (Step 1), the two-dimensional t-SNE projections were then used to visualize and prioritize the ISC3 sub-indices while preserving their non-linear interdependencies. The five sub-indices, Hydrology (X1), Physical Form (X2), Streamside Zone (X3), Water Quality (X4), and Aquatic Life (X5), formed distinct, well-separated clusters, indicating minimal redundancy and strong ecological differentiation. Among them, X1, X3, and X5 emerged as dominant clusters, reflecting their comparatively greater ecological influence on the composite ISC3 index.
Stable embeddings were achieved through iterative optimization of perplexity (5–50) and minimization of the KLD. Unlike linear dimensionality-reduction techniques that rely on eigenvalues and cumulative variance ratios, t-SNE employs high-dimensional conditional probabilities (Pij) and low-dimensional probabilities (Qij) to preserve neighborhood similarities. Gaussian kernels in high-dimensional space and Student’s t-distributions in low-dimensional space enhance contrast between densely and sparsely clustered regions, enabling clear visual differentiation of ecological domains.
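For reference, the quantities named in this paragraph take the following form in the standard formulation of t-SNE. The notation is assumed here and may differ from that of Section 3.4: x_i are the high-dimensional observations, y_i their low-dimensional images, σ_i the per-point Gaussian bandwidth, and n the number of observations.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
P_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \\
Q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \\
\mathrm{KLD}(P \,\Vert\, Q) &= \sum_{i \neq j} P_{ij} \log \frac{P_{ij}}{Q_{ij}},
\qquad
\mathrm{Perp}(P_i) = 2^{-\sum_{j} p_{j|i} \log_2 p_{j|i}}.
\end{align}
```

The heavier tails of the Student's t kernel in Q_ij let moderately distant points sit further apart in the embedding than a Gaussian would allow, which is what produces the visual contrast between dense and sparse regions.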
Optimized perplexity values ensured balanced local and global structure retention, minimizing deviations between Pij and Qij. Dynamic neighborhood weighting, adjusted to local density distributions, further refined cluster fidelity and reduced mapping errors. These probabilistic refinements validate the robustness of the recalibrated ISC3 framework and establish a quantitative foundation for the density-based clustering, k value validation, and non-linear weight adjustment analyses presented in the remainder of Section 4.
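How a perplexity setting translates into neighborhood probabilities can be illustrated with a short sketch. It assumes the standard t-SNE calibration, in which each point's Gaussian bandwidth is tuned by bisection until the perplexity of its conditional distribution matches the target. The toy coordinates, function names, and target value are hypothetical and unrelated to the ISC3 dataset.

```python
import math


def conditional_probs(sq_dists_row, i, sigma):
    """Gaussian conditional probabilities p_{j|i} for point i, with p_{i|i} = 0."""
    # Shift by the nearest-neighbor distance so the largest weight is exactly 1;
    # this avoids a zero denominator when sigma is very small.
    d_min = min(d for j, d in enumerate(sq_dists_row) if j != i)
    weights = [
        0.0 if j == i else math.exp(-(d - d_min) / (2.0 * sigma ** 2))
        for j, d in enumerate(sq_dists_row)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H of a discrete distribution (H = Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def calibrate_sigma(sq_dists_row, i, target, lo=1e-3, hi=1e3, iters=100):
    """Bisect (on a log scale) for the bandwidth whose perplexity equals target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists_row, i, mid)) > target:
            hi = mid  # neighborhood too broad: shrink the bandwidth
        else:
            lo = mid  # neighborhood too narrow: widen the bandwidth
    return math.sqrt(lo * hi)


# Toy 2-D points (hypothetical) and their pairwise squared distances.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (4.0, 4.0), (5.0, 0.5)]
sq_dists = [
    [(ax - bx) ** 2 + (ay - by) ** 2 for (bx, by) in points]
    for (ax, ay) in points
]

target_perplexity = 3.0
sigma_0 = calibrate_sigma(sq_dists[0], 0, target_perplexity)
p_0 = conditional_probs(sq_dists[0], 0, sigma_0)
```

Perplexity acts as an effective neighbor count: a target of 3 makes point 0 spread its probability mass over roughly three neighbors, so points in dense regions receive a smaller bandwidth than points in sparse regions.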
4.5. t-SNE–Based Factor Clustering and Optimal k Determination
The analyses in this section correspond to the cluster-validation and sub-index reorganization procedures detailed in Section 3.6. As noted earlier, the factor-weight computation introduced in Section 3.7 is presented in Section 4.3; in contrast, Section 4.5 focuses solely on validating cluster structure and refining metric groupings rather than deriving weights.
The t-SNE–based clustering procedure reorganized the ISC3 metrics into ecologically cohesive sub-indices by identifying non-linear density relationships across hydrological, riparian, water-quality, and biological dimensions. To strengthen cluster cohesion, metrics exhibiting low explanatory power or inconsistent density profiles (particularly those associated with X2) were systematically removed according to the density-validation criteria defined in Section 3.6. This refinement enhanced interpretability and ensured that the resulting clusters captured genuine ecological dependencies rather than artifacts of multicollinearity or noise within the original dataset.
1. Cluster Visualization and Structure Interpretation
Figure 3 presents the two-dimensional t-SNE projection of ISC3 sub-indices, revealing three ecologically meaningful and statistically stable clusters: X1, X3, and X5. Iterative optimization of perplexity values (5–50) and minimization of KLD yielded reproducible embeddings, preserving both local and global neighborhood structures. Within this configuration, X3 (Streamside Zone) exhibited the highest local density, emphasizing its role in riparian stability, while X1 (Hydrology) displayed consistent cross-cluster connectivity essential for geomorphic resilience. X5 (Aquatic Life) reinforced biological integrity through strong interdependence with hydrological and habitat metrics.
2. Optimal k Value Determination
The optimal cluster number (k) was determined through KLD minimization and cumulative probability coverage (CPC) analysis. As summarized in Table 3 and visualized in Figure 4, k = 3 achieved a CPC of 92.66%, exceeding the 90% validity threshold and satisfying the predefined density deviation criterion (<0.1; see Table 4) [22]. This outcome confirms that three clusters sufficiently represent the dominant ecological gradients in the ISC3 dataset while maintaining low intra-cluster variance and minimal redundancy.
Figure 4 further validates k = 3 by showing that cumulative neighborhood probabilities exceed the 90% threshold, confirming that the three identified clusters capture the majority of ecological variability within the ISC3 dataset. Among these clusters, X3 (Streamside Zone) contributed most strongly to localized density relationships, underscoring its ecological importance in maintaining riparian stability and catchment resilience. X1 (Hydrology) exhibited cross-cluster consistency, reflecting its foundational role in sustaining hydrological and geomorphic processes, while X5 (Aquatic Life) reinforced biological connectivity and served as an indicator of overall ecosystem integrity.
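The threshold part of this selection can be sketched as a simple screen. The sketch assumes that the preferred k is the smallest candidate whose CPC exceeds 90% and whose density deviation is below 0.1; it does not reproduce the KLD minimization step. Only the CPC of 92.66% for k = 3 comes from the text; every other number, and the function name `select_k`, is a hypothetical placeholder.

```python
def select_k(candidates, cpc_threshold=90.0, deviation_threshold=0.1):
    """Return the smallest k meeting both validity criteria, or None.

    candidates maps k -> (CPC in percent, intra-cluster density deviation).
    """
    for k in sorted(candidates):
        cpc, deviation = candidates[k]
        if cpc > cpc_threshold and deviation < deviation_threshold:
            return k
    return None


candidates = {
    2: (85.0, 0.15),   # hypothetical values
    3: (92.66, 0.08),  # CPC from the text; deviation is hypothetical
    4: (95.0, 0.12),   # hypothetical values
}

best_k = select_k(candidates)
```

Preferring the smallest qualifying k is a parsimony choice made for this illustration; a larger k can raise coverage further while fragmenting clusters, which is why the deviation criterion is checked alongside CPC.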
3. Sub-Index Reorganization and Validation
The optimized configuration retained 12 core ISC3 metrics distributed across three refined sub-indices:
Hydrology: High Flows, Artificial Barrier, Cover of Trees and Shrubs, Structure
Water Quality: pH, Salinity, Total Phosphorus
Aquatic Life: Ausrivers, Number of Families, Instream Woody Habitat, Seasonality, Variability
This reorganization, validated through intra-cluster probability densities (Section 3.6), preserved ecological specificity while eliminating multicollinearity inherent in linear clustering approaches such as PCA [26].
4. Regression-Based Validation of Reorganized Metrics
To evaluate the predictive utility of the t-SNE–reorganized framework, a comparative regression analysis was conducted using the recalibrated sub-indices as explanatory variables of the composite ISC3 score.
Using all 23 original metrics, the regression model yielded an R² of 67.78%, reflecting moderate reliability with considerable redundancy among predictors. In contrast, the regression using the 12 reorganized metrics derived from the t-SNE clustering achieved an R² of 88.15%, indicating substantially improved explanatory power and reduced multicollinearity (Figure 5 and Figure 6).
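As a reminder of what the reported statistic measures, the coefficient of determination can be computed from observed and fitted values as sketched below. The adjusted variant is included because the two models differ in predictor count (23 versus 12). The sample scores, the sample size used in the usage line, and the function names are hypothetical; the regression specification itself is not reproduced here.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R² penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical composite scores and model predictions, for illustration only.
observed = [0.42, 0.55, 0.61, 0.70, 0.83]
fitted = [0.45, 0.52, 0.63, 0.69, 0.80]

r2 = r_squared(observed, fitted)

# Hypothetical sample size of 50 applied to the reported 12-metric R².
adj_12 = adjusted_r_squared(0.8815, n_samples=50, n_predictors=12)
```

The adjusted value is always at or below the raw R² when at least one predictor is present, and the gap widens as predictors are added relative to the sample size.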
These results confirm that the t-SNE–driven reorganization enhances the predictive coherence of the metric structure, even though t-SNE itself remains an unsupervised dimensionality reduction technique. The post-regression validation thus provides a quantitative measure of the reorganized framework’s practical reliability in capturing ecological gradients represented by the ISC3 index.
4.6. Cluster Consistency and Reliability Analysis Using t-SNE
Cluster consistency analysis provides a critical evaluation of the robustness and ecological reliability of the recalibrated ISC3 framework. Following the procedures outlined in Section 3.8, consistency was examined at both the sub-index and metric levels to ensure that the t-SNE–derived clusters remain statistically stable and ecologically interpretable within the non-linear embedding space.
4.6.1. Sub-Index Consistency Verification
Sub-index consistency was first evaluated using the intra-cluster density deviation metric Dcluster, which quantifies localized density variation within the nonlinear t-SNE manifold. All five ISC3 sub-indices exhibited Dcluster < 0.1, confirming compact and internally coherent clustering (Table 5). Dcluster replaces classical CI/CR indices by providing a direct measure of structural cohesion within nonlinear ecological spaces [25].
Among the sub-indices, the Streamside Zone (X3) exhibited the largest deviation (0.087), reflecting natural ecological variability associated with riparian function; however, this value remains well within the acceptable threshold. These results agree with earlier ecological studies emphasizing the stabilizing influence of hydrological and riparian processes in nonlinear clustering systems [27, 28].
Figure 7 provides a complementary graphical illustration of sub-index consistency, showing normalized Dcluster values for the five sub-indices. Lower values for Hydrology (0.0078) and Physical Form (0.0036) indicate highly coherent clusters, while moderately higher values for X3 and X4 reflect ecological gradients without compromising overall structural reliability.
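The acceptance rule applied here amounts to screening each Dcluster value against the 0.1 threshold, which can be sketched as below. The three values are those quoted in the text; values for X4 and X5 are not quoted in this section and are therefore omitted. The function name `screen_consistency` is hypothetical, and the computation of Dcluster itself (Section 3.8) is not reproduced.

```python
THRESHOLD = 0.1  # acceptance threshold for D_cluster stated in the text


def screen_consistency(values, threshold=THRESHOLD):
    """Map each sub-index to True if its D_cluster is below the threshold."""
    return {name: value < threshold for name, value in values.items()}


# D_cluster values quoted in the text (X4 and X5 are not quoted here).
quoted = {
    "Hydrology (X1)": 0.0078,
    "Physical Form (X2)": 0.0036,
    "Streamside Zone (X3)": 0.087,
}

passed = screen_consistency(quoted)
least_coherent = max(quoted, key=quoted.get)
```

Reporting the least coherent sub-index alongside the pass/fail flags keeps attention on how much margin remains below the threshold, not only on whether it was met.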
4.6.2. Global and Sub-Index Structural Reliability
To extend the sub-index consistency assessment, hierarchical reliability was evaluated across both global and sub-index levels. Table 5 reports the consistency values derived from the same t-SNE framework.
All consistency values—including the global ISC3 index (0.0218)—remain well below 0.1, confirming strong alignment between local (cluster-level) and global (index-level) structures.
These results confirm that the recalibrated ISC3 structure preserves cross-scale stability, reflecting ecologically interpretable nonlinear relationships across hydrological, riparian, and biotic dimensions.
4.6.3. Metric-Level Consistency Verification
Metric-level consistency was assessed using normalized density inconsistency values, providing a fine-scale evaluation of ecological coherence across the 23 metrics. Table 6 shows that all metrics remained below the threshold of 0.1 (Dcluster < 0.1), validating their suitability for inclusion within the reorganized t-SNE framework.
High Flows, Artificial Barrier, Variability, and Instream Woody Habitat demonstrated particularly low inconsistency, highlighting their importance in stabilizing cluster cohesion.
Taken together, the consistency analyses indicate:
High cluster consistency (all Dcluster < 0.1)
Strong cross-level reliability (global and sub-index consistency < 0.03)
Stable metric-level structure across nonlinear ecological gradients
Compared with linear clustering approaches such as PCA, ANOVA, or entropy-based ANP models, which emphasize global variance but overlook spatial heterogeneity, the t-SNE–based recalibration provides a more adaptive and ecologically realistic representation of the Zhuoshui River ecosystem [11, 27].
Together, the consistency results confirm the robustness of the updated ISC3 framework for adaptive river-health monitoring and establish a methodological benchmark for applying nonlinear clustering techniques in environmental risk assessment.
4.7. Density-Informed Weight Recalibration Using t-SNE Cluster Densities
Table 7 summarizes the normalized density-informed weights derived from the t-SNE cluster structure for the five ISC3 sub-indices. The resulting values reveal distinct patterns of ecological dominance reflected in the cluster density distributions. Among the reported weighting outcomes, the normalized density-informed weights presented in Table 7 constitute the authoritative weighting scheme adopted for all subsequent analyses and management interpretations.
The recalibrated weights show that Water Quality (X4) and Aquatic Life (X5) jointly contribute the largest proportions to the overall index, together accounting for approximately 57% of the total weighting. Their relatively high densities indicate strong internal cohesion and a prominent role in shaping localized ecological conditions within the t-SNE manifold.
Streamside Zone (X3) also exhibits a substantial weight, reflecting the importance of riparian vegetation, shading, bank stability, and lateral habitat connectivity in mediating ecological states across the catchment.
In comparison, Hydrology (X1) and Physical Form (X2) demonstrate smaller but stabilizing influences. Their lower density values correspond to more homogeneous cluster structures, indicating that these indicators contribute more uniformly and exert less differentiation across ecological regimes.
Overall, the density-informed weighting results reveal a more ecologically realistic and spatially responsive importance structure than traditional linear weighting schemes, with physicochemical and biological sub-indices playing central roles in defining ecological variability across the study area.
4.8. Implications and Future Applications for River Health Assessment
The Zhuoshui River case study demonstrates the practical potential of the t-SNE–recalibrated ISC3 framework in adapting international river health indices to Taiwan’s subtropical ecological and climatic conditions. The customized sub-indices—particularly Aquatic Life and Streamside Zone—illustrate the framework’s flexibility to accommodate biodiversity and riparian processes unique to subtropical river systems, while preserving methodological rigor.
At the diagnostic level, the t-SNE approach enhances the detection of non-linear and spatially heterogeneous patterns that conventional linear methods (e.g., PCA or entropy-weighted ANP) tend to obscure. By preserving neighborhood topology and density-driven relationships, the recalibrated framework allows more sensitive differentiation among hydrological, physicochemical, and biological dimensions, supporting early identification of degradation signals such as riparian instability or nutrient enrichment.
At the management level, the validated cluster configuration provides a scientifically grounded tool for adaptive and data-driven decision-making. Quantified density deviations and inter-cluster dependencies pinpoint leverage points—such as flow regulation or riparian restoration—that can yield significant ecological improvement. These results enable evidence-based prioritization in policy design, ecological compensation, and basin-scale performance benchmarking.
Finally, the methodology contributes to the broader advancement of machine-learning–supported ecological diagnostics. The t-SNE–based recalibration offers a replicable and scalable protocol for other Taiwanese river systems (e.g., Gaoping, Danshui) and for international applications facing complex, high-dimensional environmental datasets. Coupled with real-time monitoring and predictive modeling, the density-informed ISC3 framework can evolve into a dynamic diagnostic platform for scenario testing and resilience planning under global sustainability targets (SDGs 6.6 and 15.1).
Overall, this framework bridges quantitative precision with ecological realism, establishing a modern, adaptive foundation for river health evaluation and sustainable watershed governance.
5. Discussion
This study proposes a density-informed recalibration framework for river health assessment that integrates nonlinear embedding and unsupervised clustering into a conventional ISC-based index structure. The results demonstrate that the proposed approach improves internal consistency among sub-indices and enhances diagnostic resolution without compromising interpretability. These findings support the applicability of data-driven structural information for refining composite environmental indices in river restoration contexts.
From a methodological perspective, the proposed framework differs from traditional ISC-based assessment approaches, which typically rely on fixed expert-defined weights and linear aggregation rules. While such conventional methods offer transparency and regulatory familiarity, they often lack adaptability to heterogeneous ecological responses and dominant stressors across spatially diverse river systems. By contrast, the density-informed recalibration adopted in this study enables adaptive sensitivity to intrinsic data structure, allowing dominant environmental drivers to exert proportionate influence on the composite index.
Recent advances in river health assessment have increasingly incorporated machine learning techniques, including clustering, dimensionality reduction, and predictive modeling, to improve pattern recognition and classification accuracy. However, many of these studies primarily focus on prediction or categorization outcomes and often treat learned representations as black-box outputs, limiting their interpretability and practical relevance for management and policy applications. In contrast, the present framework explicitly embeds unsupervised learning outcomes into the index recalibration process, preserving the conceptual structure of ISC while enhancing its responsiveness to empirical data patterns.
Compared with existing achievements in river health assessment, the key contribution of this study lies in bridging the gap between rule-based index construction and data-driven environmental analysis. Rather than replacing conventional indices, the proposed method augments them through a transparent recalibration mechanism grounded in cluster density and structural coherence. This balance between methodological innovation and interpretability distinguishes the present approach from prior machine learning–based assessments and aligns it with the practical needs of river restoration planning and decision support. Importantly, lower recalibrated weights should be interpreted as reflecting reduced discriminatory power within the observed data structure, rather than diminished ecological importance of the corresponding sub-index.
The application to the Zhuoshui River basin further illustrates the framework’s potential to support future assessments under real-world monitoring constraints. By maintaining reproducibility and regulatory relevance while improving diagnostic clarity, the proposed approach offers a scalable pathway for enhancing river health evaluation in other basins with comparable ecological complexity and data availability.
Despite these advantages, several limitations of the proposed framework should be acknowledged. Density-based recalibration may be sensitive to data distribution and sampling variability, particularly under sparse or uneven monitoring conditions. In addition, although parameter tuning (e.g., perplexity selection) was systematically evaluated, the potential risk of overfitting cannot be entirely eliminated when optimizing nonlinear embeddings.
For example, under the traditional ISC weighting scheme, restoration efforts would prioritize hydrological modification; however, the recalibrated ISC3 framework indicates that riparian rehabilitation and nutrient management would yield greater marginal ecological gains in the Zhuoshui River, leading to a different sequencing of management interventions. By enabling transparent recalibration while preserving regulatory relevance, the proposed framework provides a practical and scalable pathway for supporting river restoration planning and evidence-based environmental decision-making.
6. Conclusions
This study developed a t-SNE–recalibrated ISC3 framework that improves river health assessment by integrating non-linear dimensionality reduction with density-informed indicator weighting. Applied to the Zhuoshui River, the framework captured key hydro-ecological gradients through three dominant clusters, reduced the indicator set from 23 to 12 metrics, and raised the regression R² from 67.78% to 88.15%. Water quality, aquatic life, and riparian conditions were identified as the principal drivers of ecological resilience in subtropical river systems.
The study makes three main contributions:
- (1) Methodological: It introduces a reproducible, non-linear recalibration mechanism that overcomes the limitations of PCA- and entropy-based weighting.
- (2) Ecological: It preserves essential linkages among hydrology, habitat, and biota, enhancing the ecological realism of composite river-health indices.
- (3) Governance-oriented: It provides a data-driven basis for identifying high-impact management leverage points and supports alignment with sustainability objectives such as SDGs 6.6 and 15.1.
The framework is compatible with IoT monitoring, remote sensing, and AI-based analytics, offering potential for real-time diagnostics and adaptive watershed management. With continued calibration and expanded long-term datasets, the ISC3–t-SNE model can serve as a scalable, transferable foundation for intelligent and sustainable river-governance strategies across diverse hydro-ecological contexts.