1. Introduction
Accelerated glacier retreat driven by climate change is no longer a distant concern [1]; it is already reshaping hazard dynamics in high-mountain regions worldwide. Glacial lake outburst floods (GLOFs), ice and snow avalanches, debris flows, and slope instabilities are becoming more frequent and, in many cases, more severe, with direct consequences for downstream communities, infrastructure, and fragile ecosystems [2,3,4]. Recent events have shown that even relatively small glacierised basins can trigger cascading processes (rapid water release, sediment mobilisation, and geomorphological instability) that result in disproportionately large impacts [5].
The socio-economic consequences of these hazards are substantial. A single GLOF can destroy hydropower plants (e.g., the 2023 Sikkim event that damaged the Chungthang dam, disrupting over 1200 MW of generation capacity [6]), sever strategic transport corridors (e.g., sections of the Karakoram Highway repeatedly washed out by debris flows [7]), and displace thousands of people within hours. False negatives (classifying a hazardous lake as safe) can therefore lead to catastrophic loss of life and infrastructure. In contrast, false positives (classifying a safe lake as hazardous) waste scarce mitigation resources and erode community trust in risk assessments. For this reason, defensible and reliable hazard mapping is not merely an academic exercise: it is a prerequisite for evidence-based risk management, early warning system design, and the prioritisation of mitigation investments. Decisions based on unvalidated or overconfident maps carry asymmetric and potentially devastating consequences.
In this context, hazard assessment plays a central role. It aims not only to estimate the likelihood and magnitude of hazardous events but also to inform decisions about preparedness, mitigation, and resource allocation [8]. In practice, susceptibility mapping has become a widely adopted approach. Rather than predicting specific events, it identifies areas where hazardous processes are more likely to occur based on terrain characteristics, glaciological conditions, and environmental triggers [9].
However, glacier hazards are not governed by a single process. They emerge from the interaction of topography, hydrology, climate, and often poorly observed environmental conditions. As a result, hazard assessment inevitably involves combining heterogeneous sources of information: data with different resolutions, uncertainties, and degrees of completeness. This challenge has naturally led to the adoption of decision-support approaches capable of integrating diverse inputs into coherent and interpretable outputs [1].
From this perspective, glacier hazard assessment is not purely a physical modelling problem [10]; it is also a decision-making problem. Multiple factors, such as slope, elevation, glacier proximity, lake characteristics, lithology, land cover, and precipitation, must be considered simultaneously [11,12]. These factors are often uncertain, spatially heterogeneous, and only partially observable. Moreover, many glacierised regions lack dense monitoring networks, which further limits the availability of reliable empirical data [8].
At the same time, hazard assessments are not conducted in isolation. They are used by stakeholders (local authorities, emergency planners, and affected communities) who require results that are not only technically sound but also interpretable and defensible. In practice, this means that hazard assessment methods must balance analytical rigour with transparency and usability, often under conditions of significant uncertainty [10].
Multi-criteria decision analysis (MCDA), which denotes a set of approaches that integrate several, often competing, criteria into a single evaluative framework to support decision-making in complex settings [13], has emerged as a natural response to these challenges. It provides a structured framework for combining quantitative data (e.g., terrain metrics, hydrological indicators) with qualitative inputs (e.g., expert judgement) to produce composite indices used to rank or classify alternatives [14,15]. In the context of natural hazards, MCDA is most often implemented within geographic information systems (GISs), where spatial layers are aggregated through weighted combinations [16,17].
Among the available techniques, the analytic hierarchy process (AHP) has become particularly dominant, largely because it offers a systematic procedure for pairwise comparison and weight derivation. Potential drivers of this dominance include the widespread availability of AHP in commercial GIS software (e.g., ArcGIS Pro 3.7 Weighted Overlay), its low mathematical barrier to entry for geoscientists, and the institutional legitimacy that it confers through structured expert elicitation. Its integration within GIS workflows has made it especially attractive for practitioners working with limited or heterogeneous data.
The Analytic Hierarchy Process (AHP) represents problems in a hierarchical structure and derives criteria weights from expert-based pairwise comparisons. ROC (receiver operating characteristic) analysis and its associated summary measure, AUC (area under the curve), are standard tools for assessing the predictive accuracy of spatial models by comparing susceptibility rankings with observed hazard occurrences.
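The weight-derivation step described above can be illustrated with a short sketch. It assumes a hypothetical 3 × 3 pairwise comparison matrix for three illustrative criteria (slope, lake area, glacier proximity), approximates the principal eigenvector by power iteration, and reports Saaty's consistency ratio using the commonly tabulated random index values. The criteria, judgements, and function name are assumptions made for illustration and are not taken from any reviewed study.

```python
# Saaty's random consistency index (RI) for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max, consistency_ratio) for a pairwise matrix.

    The principal eigenvector is approximated by power iteration and
    normalised so that the weights sum to one.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]

    # Estimate the principal eigenvalue from the converged vector.
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n

    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    consistency_ratio = consistency_index / random_index if random_index else 0.0
    return weights, lambda_max, consistency_ratio


# Hypothetical expert judgements (illustrative only).
criteria = ["slope", "lake_area", "glacier_proximity"]
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights, lambda_max, cr = ahp_weights(pairwise)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
print(f"lambda_max = {lambda_max:.3f}, CR = {cr:.3f}")
```

A consistency ratio below 0.1 is the conventional acceptance threshold. It indicates only that the judgements are internally coherent, not that the resulting weights are empirically correct.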
There are good reasons for the widespread adoption of MCDA. It is flexible, practically applicable, and capable of producing hazard maps that are relatively easy to interpret and communicate. In environments where data are scarce and uncertainty is unavoidable, these features are not trivial—they are often decisive. Yet, these same strengths raise important questions. Because MCDA relies on weighting schemes and aggregation rules, its outputs are inherently shaped by modelling choices and expert preferences. Hazard classifications, therefore, may reflect not only environmental conditions but also the assumptions embedded in the decision model itself.
Despite the extensive use of MCDA in glacier hazard assessment, a fundamental issue remains insufficiently examined: what kind of knowledge these models actually produce. Many studies focus on generating hazard maps, often treating them—implicitly or explicitly—as predictive representations of risk. However, the decision models underlying these maps are rarely scrutinised as analytical constructs.
A more detailed examination of the literature shows a clear trend: most studies depend on just one technique—typically AHP—and offer little comparison with alternative methods. Criteria weights are typically derived from expert judgement, yet their influence on results is seldom explored through systematic sensitivity or robustness analysis. Uncertainty is acknowledged but rarely modelled explicitly. Perhaps most importantly, validation against observed hazard events remains the exception rather than the norm.
This creates a tension that is rarely made explicit. On the one hand, MCDA-based hazard maps are often presented as objective, spatially explicit outputs. On the other hand, they are produced through preference-dependent models whose assumptions are only partially examined. The question is not whether MCDA is useful—it clearly is—but whether its outputs are being interpreted in ways that exceed what the underlying models can support.
Although numerous case studies exist, there has been little effort to synthesise these practices from a methodological perspective. In particular, the reliability implications of current modelling choices (how robust, reproducible, or empirically valid these hazard classifications are) remain largely unexplored. Addressing this gap requires moving beyond cataloguing applications and towards a critical examination of how MCDA is actually used in practice. This issue becomes particularly relevant in the current transition toward AI-driven hazard modelling approaches, where similar questions about interpretability, validation, and reliability arise.
Additionally, it is plausible, though not yet systematically documented, that the adoption of MCDA for glacier hazard assessment varies substantially across world regions. In the European Alps, long-established monitoring networks, historical event inventories, and process-based hydrological models provide alternative pathways for hazard assessment that do not rely primarily on expert-driven decision frameworks. In contrast, in High Mountain Asia, where monitoring infrastructure is sparse, the terrain is extreme, and the policy pressure is acute, MCDA may have been adopted more readily as a pragmatic substitute for data-intensive modelling. Similarly, the Andes and the North American Cordillera occupy intermediate positions, with different research traditions and levels of data availability. These regional differences in methodological culture, if they exist, have not been systematically synthesised.
To address this issue, this study conducts a systematic review of the literature following the PRISMA 2020 protocol [18], which is an evidence-based guideline for reporting systematic reviews, designed to promote transparency in the identification, selection, data extraction, and synthesis of studies. The objective is not only to describe how MCDA has been applied but also to evaluate the methodological reliability of these applications and to clarify the role that MCDA-based outputs play in glacier hazard assessment.
Consequently, the study is guided by the following central research question:
To what extent do current MCDA applications provide reliable and defensible decision-support for glacier hazard assessment?
To operationalise this question, four analytical research questions are examined:
RQ1. Why has the Analytic Hierarchy Process (AHP) become the predominant MCDA method in glacier hazard assessment studies?
RQ2. To what extent do criteria weighting practices reflect methodological justification rather than practical convenience?
RQ3. Why are uncertainty and robustness analyses rarely incorporated in MCDA-based glacier hazard assessments?
RQ4. Which methodological improvements (e.g., sensitivity analysis, comparative modelling, hybrid or ensemble MCDA) could enhance the reliability of glacier hazard assessments?
This review evaluates a single directional hypothesis: that the widespread adoption of AHP in glacier hazard assessment is driven predominantly by operational convenience (software availability, ease of implementation, institutional familiarity) rather than by demonstrated predictive accuracy or problem-specific fit. If this hypothesis holds, we expect to observe: (i) uniform application of AHP across GLOF, landslide, and multi-hazard contexts; (ii) rare comparative evaluation against alternative MCDA methods; (iii) low and temporally stagnant rates of quantitative validation; and (iv) minimal reporting of sensitivity or robustness analysis. Each of these expectations is examined through the research questions formulated above.
This study offers three key advances to the discipline of glacier hazard assessment and decision-making analysis. First, it provides a structured methodological audit of MCDA applications in glacier hazard studies. Rather than cataloguing applications, the review examines how decision models are constructed, how criteria weighting is justified, how uncertainty is treated, and how results are validated. In doing so, it shifts attention from the production of hazard maps to the reliability of the modelling process itself.
Second, the study identifies a systematic pattern in current practice: a strong reliance on compensatory aggregation methods—particularly AHP—combined with expert-derived weighting and limited robustness evaluation. The findings suggest that many hazard classifications are highly sensitive to modelling assumptions, while being interpreted as objective spatial representations. This reveals a gap between how models are constructed and how their outputs are used in practice.
Third, the paper outlines a research agenda aimed at improving the credibility of MCDA-based hazard assessments. This includes the need for explicit validation against observed events, systematic sensitivity and robustness analysis, transparent uncertainty modelling, and comparative evaluation of alternative or hybrid decision frameworks. These directions are intended to support a transition from descriptive susceptibility mapping to more defensible and evidence-based decision-support tools.
This review is designed as a structured methodological audit, not a catalogue of applications. We seek to evaluate how decision models are constructed, what assumptions they embed, and whether the results are empirically defensible.
The remainder of the paper is organised as follows. Section 2 reviews existing research on MCDA and hazard assessment. Section 3 describes the systematic review protocol and the data extraction process. Section 5 presents the results of the synthesis. Section 6 discusses the implications of the findings. Section 7 outlines the study limitations, followed by future research directions in Section 8. Finally, Section 9 concludes the paper.
2. Related Work and Research Gap
2.1. Glacier Hazard Assessment as a Decision Problem
Mountain glacier environments are undergoing rapid transformation due to climate change, increasing the frequency and impact of hazards such as glacial lake outburst floods (GLOFs), debris flows, landslides, and snow or ice avalanches [2,3,8]. These hazards threaten settlements, transport infrastructure, hydropower installations, and downstream ecosystems in high-mountain regions worldwide.
Assessing glacier hazard susceptibility is inherently complex. Hazard formation depends on interacting geomorphological, hydrological, climatic, and anthropogenic factors that are spatially heterogeneous and often uncertain. Consequently, glacier hazard assessment rarely corresponds to a purely physical prediction problem. Instead, it requires prioritising locations, classifying hazard levels, and supporting mitigation decisions under incomplete information. For this reason, glacier hazard assessment is fundamentally a decision-analysis task: the goal is not only to model natural processes but also to support planning and risk management by integrating heterogeneous environmental indicators and expert knowledge.
Although this review focuses specifically on glacier-related hazards (GLOFs, ice avalanches, debris flows from glacierised basins), we acknowledge that adjacent periglacial environments (including mountain permafrost, rock glaciers, and ice-cored moraines) also generate hazards relevant to the same high-mountain communities and infrastructure [19]. Rock glaciers, for example, can destabilise under warming conditions, triggering landslides or debris flows, and their meltwater can contribute to lake formation and outburst floods [20]. However, the decision-analytic literature on periglacial hazards remains sparser than on glacier hazards, and studies combining MCDA with periglacial processes were not captured by our search terms (e.g., “glacier*” was a required term). A systematic extension of this review to periglacial environments would be a valuable complementary study. For the purposes of this review, we retain the focus on glacier hazards while recognising that the methodological reliability concerns identified (validation, uncertainty, robustness) apply equally to periglacial hazard assessments.
2.2. MCDA in Natural Hazard and Cryospheric Studies
Multi-criteria decision analysis (MCDA) has been widely adopted to support complex environmental decisions involving multiple conflicting criteria [21,22]. MCDA methods combine quantitative data and qualitative judgement through explicit trade-offs among factors, making them suitable for spatial risk assessment and environmental planning.
MCDA has been applied in several hazard domains, including flood risk management, landslide susceptibility, and water resource allocation [14,15,17]. In glacier hazard assessment, MCDA is typically implemented within geographic information systems (GISs), where spatial predictor variables (such as slope, elevation, glacier proximity, precipitation, and lake characteristics) are aggregated into hazard susceptibility maps [16,23].
Compensatory additive methods dominate practice. The Analytic Hierarchy Process (AHP), weighted overlay, and conventional weighting schemes are particularly common because they integrate naturally with raster-based GIS workflows and allow expert-based pairwise comparisons [17,24,25]. As a result, MCDA is widely used to identify dangerous glacial lakes, prioritise monitoring actions, and classify hazard zones.
To put the critique of MCDA into perspective, it is helpful to compare its characteristics with those of other modelling approaches that are commonly applied in hazard assessment. Physically based models (e.g., hydrodynamic flood models, slope stability equations) encode mechanistic process understanding but require extensive calibration data and computational resources, making them difficult to apply at regional scales. Statistical susceptibility models (e.g., logistic regression, weights of evidence) estimate empirical relationships between hazard occurrence and environmental predictors, offering data-driven weight calibration and inherent validation metrics (e.g., AUC and pseudo-R²), but they require large inventories of observed events and assume stationarity. Machine learning (ML) classifiers (e.g., random forests, support vector machines, neural networks) can capture complex non-linear relationships and interactions, often achieving high predictive accuracy, but they are notoriously opaque, lacking the transparency and traceability of MCDA, and require even larger training datasets. MCDA, in contrast, excels in data-sparse environments, accommodates qualitative expert knowledge, and offers full interpretability. However, as this review demonstrates, MCDA studies rarely test their predictive claims, whereas statistical and ML studies routinely report cross-validated performance metrics. This asymmetry in validation culture, rather than the inherent superiority of any method, is a central concern. The complementarity is clear: MCDA structures decisions when data are scarce; ML predicts when data are abundant. Hybrid workflows that combine both are a promising direction.
2.3. Methodological Characteristics of Current Practice
Most MCDA-based glacier hazard studies follow a similar workflow. Environmental variables are converted into spatial layers, criteria weights are assigned (usually by expert judgement), and a composite index is produced through additive aggregation. The index is then classified into hazard categories. Although operationally effective, this procedure embeds strong modelling assumptions. Additive aggregation presumes independence among criteria and linear compensation between favourable and unfavourable factors [22]. Furthermore, the weighting of the criteria is often subjective and heavily depends on expert interpretation [26].
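The workflow described above (layers, weights, additive aggregation, classification) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: four hypothetical cells, three hypothetical criteria already oriented so that larger values mean higher susceptibility, min-max rescaling, and arbitrary class breaks at 0.33 and 0.66. None of these values come from the reviewed studies.

```python
from bisect import bisect_right


def min_max(values):
    """Rescale a list of raw values to the 0-1 range."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def weighted_linear_combination(layers, weights):
    """Additive aggregation: one composite index value per cell."""
    scaled = {name: min_max(layers[name]) for name in weights}
    n_cells = len(next(iter(scaled.values())))
    return [sum(weights[name] * scaled[name][i] for name in weights)
            for i in range(n_cells)]


def classify(index, breaks=(0.33, 0.66), labels=("low", "moderate", "high")):
    """Map each index value to a hazard category using fixed breaks."""
    return [labels[bisect_right(breaks, value)] for value in index]


# Hypothetical criterion layers, one value per cell (illustrative only).
layers = {
    "slope_deg": [5, 20, 35, 45],
    "lake_area_km2": [0.1, 0.5, 1.2, 0.8],
    "glacier_proximity_score": [0.2, 0.9, 0.6, 0.4],
}
# Hypothetical expert weights that sum to one.
weights = {"slope_deg": 0.5, "lake_area_km2": 0.3, "glacier_proximity_score": 0.2}

index = weighted_linear_combination(layers, weights)
classes = classify(index)
for value, label in zip(index, classes):
    print(f"{value:.3f} -> {label}")
```

Every step in this sketch involves a modelling choice (rescaling method, weights, class breaks), and changing any of them can move a cell from one hazard category to another.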
The linear compensation assumption embedded in additive aggregation is not simply a technical detail; it carries substantive consequences for the classification of hazards. Full compensation means that a very low score on one criterion can be offset by a very high score on another. For GLOF susceptibility, this is physically problematic. Consider a glacial lake with an extremely large volume and rapid expansion (strong hazard signals) but located behind a stable, wide moraine with no evidence of past breaching. A compensatory model could classify this lake as high-hazard because the volume compensates for the stability of the moraine. Yet, a geomorphologist would recognise that the moraine condition is a non-compensatory criterion: if the dam is stable, the lake is safe, regardless of its size. In contrast, a different lake with moderate volume but active ice avalanches into the lake and a steep, overdeepened moraine front might be genuinely dangerous, but a compensatory model with low weight on triggering factors could miss it. Non-compensatory methods, such as ELECTRE, PROMETHEE, or other outranking approaches, are designed to handle such logical structures [21,26].
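The contrast between compensatory and non-compensatory logic can be made concrete with a toy comparison. The sketch below scores two hypothetical lakes with a weighted sum and then with a simple veto rule that overrides the sum when dam instability falls below a threshold. The veto rule is a deliberately simplified stand-in for non-compensatory reasoning, not an implementation of ELECTRE or PROMETHEE, and all scores, weights, and thresholds are assumed for illustration.

```python
def weighted_sum(scores, weights):
    """Fully compensatory score: high values can offset low ones."""
    return sum(weights[name] * scores[name] for name in weights)


def veto_screened_score(scores, weights, veto_criterion, veto_threshold):
    """Return 0.0 when the veto criterion is below its threshold.

    Otherwise fall back to the weighted sum. This mimics the idea that a
    stable dam rules out a high hazard score regardless of other criteria.
    """
    if scores[veto_criterion] < veto_threshold:
        return 0.0
    return weighted_sum(scores, weights)


# Hypothetical scores on a 0-1 scale, where 1 is most hazardous.
weights = {"volume": 0.45, "expansion": 0.35, "dam_instability": 0.20}
lake_a = {"volume": 0.95, "expansion": 0.90, "dam_instability": 0.05}  # stable dam
lake_b = {"volume": 0.40, "expansion": 0.30, "dam_instability": 0.90}  # unstable dam

comp_a = weighted_sum(lake_a, weights)
comp_b = weighted_sum(lake_b, weights)
veto_a = veto_screened_score(lake_a, weights, "dam_instability", 0.2)
veto_b = veto_screened_score(lake_b, weights, "dam_instability", 0.2)

print(f"compensatory: lake A = {comp_a:.3f}, lake B = {comp_b:.3f}")
print(f"with veto:    lake A = {veto_a:.3f}, lake B = {veto_b:.3f}")
```

Under the weighted sum, the large lake behind the stable dam outranks the smaller lake behind the unstable one; under the veto rule the ranking reverses. Which ordering is appropriate depends on a judgement about the physical process, which is why the choice of aggregation rule needs explicit justification.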
Validation practices are heterogeneous. Some studies compare results with historical hazard inventories, while others rely only on qualitative interpretation or visual agreement with known hazardous areas. Uncertainty is frequently acknowledged but rarely formally analysed; systematic robustness analysis or uncertainty propagation is uncommon [27,28].
Thus, the literature shows consistent operational practices but limited methodological scrutiny.
2.4. Conceptualising Reliability in MCDA for Hazard Assessment
The preceding sections have documented the widespread adoption of multi-criteria decision analysis (MCDA) in glacier hazard assessment. However, the question of what constitutes a reliable MCDA-based hazard classification remains under-theorised in the literature. The term “reliability” is frequently invoked in general terms to imply trustworthiness or credibility, yet it encompasses distinct analytical properties that require separate consideration. This subsection develops a conceptual framework that distinguishes four dimensions of reliability relevant to MCDA applications in hazard assessment: reproducibility, robustness, predictive validity, and procedural reliability. These distinctions provide the analytical vocabulary for the methodological audit presented in this review.
Before proceeding, it is necessary to clarify what is meant by “prediction” in the context of this review, as the term carries different meanings across hazard modelling traditions. Following the distinction established in the geospatial prediction literature [29,30], we differentiate between two senses of prediction. Strong prediction refers to forecasting the specific timing, location, and magnitude of individual hazard events, for example, predicting that a particular glacial lake will breach on a specific date. This form of prediction is rarely attempted in susceptibility mapping and is not what MCDA models claim to provide. Weak prediction (or susceptibility prediction) refers to estimating the relative likelihood of hazard occurrence across a spatial domain, typically expressed as a ranking or classification (e.g., “high susceptibility zones are more likely to experience GLOFs than low susceptibility zones”). This weaker sense of prediction is testable through spatial cross-validation or historical back-testing. When this review critiques MCDA outputs as being interpreted as “predictive representations of risk,” it refers to weak prediction, that is, the claim that high-hazard zones systematically correspond to locations where events are more likely. The distinction is important because a model may be useful for weak prediction (prioritising areas for field investigation) while being entirely inadequate for strong prediction (early warning). The four reliability dimensions introduced below operationalise weak prediction through predictive validity (Section 2.4.3).
2.4.1. Reproducibility
Reproducibility refers to the ability of independent researchers to obtain identical results when applying the same method to the same input data [31,32]. In the context of MCDA-based hazard mapping, reproducibility requires complete disclosure of:
The environmental criteria selected and their operational definitions;
The weighting procedure, including pairwise comparison matrices where applicable;
The aggregation rule (e.g., weighted linear combination, multiplicative aggregation);
Classification thresholds used to transform continuous susceptibility indices into hazard categories.
Without such transparency, hazard maps cannot be independently verified, and their status as scientific evidence remains ambiguous. Reproducibility is, therefore, a minimal condition for reliability: an unreproducible result cannot be considered analytically credible, regardless of its apparent plausibility.
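One practical way to meet these disclosure requirements is to publish the model specification in a machine-readable form alongside the hazard map. The sketch below is an assumed, minimal record structure covering the four items listed above; the class name, field names, and example values are hypothetical and do not correspond to any existing reporting standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class McdaModelRecord:
    """Hypothetical disclosure record for one MCDA hazard model."""
    criteria: dict          # criterion name -> operational definition
    pairwise_matrix: list   # expert judgements used to derive the weights
    weights: dict           # final criterion weights
    aggregation_rule: str   # e.g. "weighted_linear_combination"
    class_breaks: list      # thresholds applied to the composite index
    class_labels: list      # category names, one more than the breaks


record = McdaModelRecord(
    criteria={
        "slope": "mean slope of the dam front in degrees (illustrative)",
        "lake_area": "lake surface area in square kilometres (illustrative)",
    },
    pairwise_matrix=[[1.0, 3.0], [1 / 3, 1.0]],
    weights={"slope": 0.75, "lake_area": 0.25},
    aggregation_rule="weighted_linear_combination",
    class_breaks=[0.33, 0.66],
    class_labels=["low", "moderate", "high"],
)

# Serialise for publication as supplementary material.
text = json.dumps(asdict(record), indent=2)
print(text)

# An independent researcher can reload exactly the same specification.
restored = McdaModelRecord(**json.loads(text))
```

A record of this kind does not make a model correct, but it lets others rerun the aggregation and test how the map responds to different assumptions.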
2.4.2. Robustness
Robustness concerns the stability of hazard classifications under reasonable variations in modelling assumptions [33,34]. Because MCDA models necessarily embed subjective judgements, particularly in criteria weighting, robustness analysis examines whether conclusions change when these judgements are varied within defensible ranges. In glacier hazard assessment, robustness can be evaluated through sensitivity analysis of the criteria weights and other modelling choices.
A terminological clarification is necessary here, as the literature often uses sensitivity and robustness interchangeably. In this review, we adopt the following operational distinction. Sensitivity analysis refers to a family of techniques that systematically vary model inputs (typically criteria weights) to measure the corresponding change in outputs (e.g., hazard classification or susceptibility rank). Robustness is a property of the model or its conclusions: a result is robust if it remains stable across a defensible range of input assumptions or modelling choices. Thus, sensitivity analysis is a method for evaluating robustness; robustness is the quality that sensitivity analysis tests. A study that omits sensitivity analysis cannot claim that its hazard classifications are robust, regardless of the apparent precision of the output map. A robust hazard classification is one that persists across a reasonable ensemble of modelling choices; a fragile classification that changes dramatically under small perturbations provides weak support for risk management decisions.
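A simple way to test robustness in this sense is to perturb the criteria weights at random and record how often each location keeps its baseline class. The sketch below is one possible design under stated assumptions: weights are multiplied by uniform noise of up to 20 percent and renormalised, criterion scores are already on a 0-1 scale, and the class breaks are arbitrary. The cell values, spread, and function names are hypothetical.

```python
import random
from bisect import bisect_right


def hazard_class(scores, weights, breaks=(0.33, 0.66)):
    """Class index (0 = low, 1 = moderate, 2 = high) from a weighted sum."""
    index = sum(weights[name] * scores[name] for name in weights)
    return bisect_right(breaks, index)


def perturb(weights, spread, rng):
    """Scale each weight by uniform noise, then renormalise to sum to one."""
    noisy = {name: w * rng.uniform(1 - spread, 1 + spread)
             for name, w in weights.items()}
    total = sum(noisy.values())
    return {name: value / total for name, value in noisy.items()}


def class_stability(cells, weights, spread=0.2, runs=1000, seed=0):
    """Share of perturbed runs in which each cell keeps its baseline class."""
    rng = random.Random(seed)
    baseline = [hazard_class(cell, weights) for cell in cells]
    kept = [0] * len(cells)
    for _ in range(runs):
        trial = perturb(weights, spread, rng)
        for i, cell in enumerate(cells):
            if hazard_class(cell, trial) == baseline[i]:
                kept[i] += 1
    return baseline, [count / runs for count in kept]


# Hypothetical normalised criterion scores for two cells.
cells = [
    {"slope": 0.9, "lake_area": 0.9, "glacier_proximity": 0.9},  # clear-cut
    {"slope": 0.9, "lake_area": 0.3, "glacier_proximity": 0.5},  # near a break
]
weights = {"slope": 0.5, "lake_area": 0.3, "glacier_proximity": 0.2}

baseline, stability = class_stability(cells, weights)
for cls, share in zip(baseline, stability):
    print(f"baseline class {cls}: unchanged in {share:.1%} of runs")
```

The first cell keeps its class under every perturbation, while the second sits close to a class break and changes class in a share of the runs. Reporting such stability figures alongside the map distinguishes robust classifications from fragile ones.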
2.4.3. Predictive Validity
In formal decision analysis, multi-criteria decision analysis (MCDA) methods can serve three distinct purposes. Descriptive applications aim to characterise what the world looks like—for example, by estimating the relative susceptibility of different glacial lakes to outburst flooding. Prescriptive applications recommend what should be done given stated preferences—for instance, ranking lakes to prioritise mitigation investments. Normative applications evaluate outcomes against logical axioms or procedural standards. The limited validation critique in this review applies specifically to descriptive claims: the assertion that MCDA-derived hazard maps represent empirically verifiable spatial patterns. However, many reviewed studies appear to use MCDA prescriptively (e.g., generating risk rankings to guide mitigation) without validating the descriptive basis—that is, without testing whether high-ranked zones actually correspond to observed events. Recognising this distinction clarifies that the problem is not MCDA’s predictive inadequacy per se but the conflation of prescriptive outputs with descriptive evidence.
Predictive validity refers to the correspondence between model outputs and observed hazard events [29,30]. In susceptibility mapping, this involves testing whether areas classified as high hazard indeed experience more frequent or severe events than areas classified as low hazard. Operationalisations of predictive validity include:
ROC/AUC analysis: comparing susceptibility rankings against independent inventories of past events;
Accuracy metrics: proportion of correctly classified hazard locations;
Confusion matrices: systematic tabulation of true/false positives and negatives;
Temporal back-testing: evaluating whether historical events fall within retrospectively classified high-hazard zones.
Predictive validity is distinct from internal consistency checks, such as the Analytic Hierarchy Process consistency ratio. While consistency ratios assess the logical coherence of expert judgements, they provide no evidence that the resulting hazard map corresponds to environmental processes. A model may be internally consistent yet predictively invalid; conversely, predictive validity requires empirical verification independent of model construction.
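The first three operationalisations listed above can be computed directly from a susceptibility index and an event inventory. The sketch below uses the rank-based formulation of AUC (the share of event and non-event pairs in which the event location has the higher score, with ties counted as one half), which is equivalent to the area under the ROC curve. The six scores, the event labels, and the 0.5 threshold are hypothetical.

```python
def auc(scores, events):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, e in zip(scores, events) if e]
    negatives = [s for s, e in zip(scores, events) if not e]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def confusion_matrix(scores, events, threshold):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, event in zip(scores, events):
        predicted = score >= threshold
        if predicted and event:
            counts["tp"] += 1
        elif predicted and not event:
            counts["fp"] += 1
        elif not predicted and event:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical susceptibility scores and observed events (1 = event).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
events = [1, 1, 0, 1, 0, 0]

auc_value = auc(scores, events)
matrix = confusion_matrix(scores, events, threshold=0.5)
accuracy = (matrix["tp"] + matrix["tn"]) / len(scores)

print(f"AUC = {auc_value:.3f}")
print(f"confusion matrix = {matrix}")
print(f"accuracy = {accuracy:.3f}")
```

The false negative in this toy example (an observed event in a cell scored 0.4) is the kind of error that matters most for risk management, which is why a confusion matrix is more informative than a single accuracy figure. In a real application the event inventory must be independent of the data used to set the weights.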
2.4.4. Procedural Reliability
Procedural reliability concerns the defensibility and transparency of the decision-making process itself [33,35]. Even when predictive validation is constrained by data scarcity, as is often the case in high-mountain environments, the process by which hazard classifications are generated should be clearly documented and justifiable. Procedural reliability encompasses:
Justification of method selection: explicit reasoning for choosing a particular MCDA technique over available alternatives;
Documentation of expert elicitation: clear reporting of how experts were selected, how judgements were elicited, and how disagreement was resolved;
Uncertainty communication: transparent acknowledgment of model limitations, data gaps, and the conditional nature of hazard classifications;
Stakeholder engagement: evidence that decision processes incorporated relevant perspectives, where appropriate.
Stakeholder engagement is not merely an ethical or procedural add-on; it is epistemically consequential for hazard assessment. In decision environments characterised by high stakes, scientific uncertainty and value disagreement—precisely the conditions of glacier hazard management—the legitimacy of a hazard classification depends partly on whether affected communities and local authorities recognise the assessment process as fair and transparent. Procedural reliability, as we define it, requires evidence that relevant perspectives have been incorporated, that disagreement among experts or stakeholders has been systematically elicited and documented, and that the decision rule (e.g., weighted sum) has been explained to non-technical audiences. Several reviewed studies mention expert consultation, but they rarely report how experts were selected, whether divergent judgments were reconciled or retained, or how stakeholder values (e.g., risk aversion, economic priorities) were translated into criteria weights. Without this documentation, the resulting map may be analytically defensible but procedurally opaque, undermining its uptake in real-world risk governance.
Procedural reliability does not guarantee that a hazard map is predictively accurate, but it ensures that the map is presented as a decision-support artefact whose assumptions are open to scrutiny rather than as an unexamined prediction.
2.4.5. Relationship Among Reliability Dimensions
These four dimensions are neither mutually exclusive nor fully independent. Reproducibility is a prerequisite for both robustness analysis and external validation: if model construction is not transparent, others cannot test its stability or predictive performance. Robustness analysis can be conducted without predictive validation, providing evidence on the stability of classifications even when event data are unavailable. Predictive validity, where demonstrated, provides the strongest form of empirical support, but it depends on the availability of independent hazard inventories. Procedural reliability underpins all dimensions by ensuring that modelling choices are documented and defensible.
Together, these dimensions define a framework for evaluating the reliability of MCDA-based hazard assessments. A study reporting only a single deterministic hazard map, without transparency in weighting, sensitivity analysis, or predictive validation, achieves none of the four dimensions. Conversely, a study that documents all modelling steps, tests robustness through sensitivity analysis, and validates against independent observations satisfies all dimensions and provides strong evidence for risk management decisions.
This framework guides the methodological audit presented in this review. In the following sections, we examine the extent to which existing MCDA applications in glacier hazard assessment achieve reproducibility, robustness, predictive validity, and procedural reliability. The framework also provides the conceptual basis for the MCDA-HAZARD quality assessment instrument described in Section 3.9.
2.5. Limitations of Existing Literature
Despite the large number of applications, the literature exhibits recurring methodological limitations.
First, method selection is rarely theoretically justified. The widespread adoption of AHP and weighted overlay approaches appears to be strongly influenced by their availability in GIS software (e.g., ArcGIS Pro 3.7) rather than by decision-theoretic reasoning. Consequently, compensatory aggregation models are routinely used without examining their assumptions regarding criteria independence and trade-offs.
Second, validation practices are weak and inconsistent. Only a minority of studies perform quantitative validation using independent datasets or statistical accuracy measures (e.g., ROC/AUC). Many studies evaluate results through visual inspection or expert interpretation, limiting the evidential strength of hazard classifications.
Third, uncertainty treatment is limited. Glacier hazard assessment inherently involves incomplete monitoring data, remote sensing limitations, and dynamic environmental processes. Nevertheless, uncertainty is usually addressed only implicitly through expert judgement or simple sensitivity checks. Formal robustness analysis, probabilistic MCDA, or stochastic modelling is rarely implemented.
Taken together, these limitations form a systematic pattern: decision models are frequently interpreted as objective spatial predictions while remaining highly dependent on subjective weighting and compensatory aggregation assumptions.
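The compensatory assumption noted above can be made concrete with a minimal sketch. The Python example below uses hypothetical criteria names, weights, and normalised scores (none are taken from the reviewed studies) to show how a weighted sum allows a very high score on one criterion to offset a very low score on another, so that two physically different sites receive the same hazard score. The minimum operator is included only as one simple non-compensatory comparator, not as a recommended alternative.

```python
# Hypothetical weights and normalised scores in [0, 1]; illustrative
# assumptions only, not data from any reviewed study.
WEIGHTS = {"slope": 0.5, "lake_area": 0.3, "dam_type": 0.2}


def weighted_sum(scores, weights=WEIGHTS):
    """Additive (compensatory) aggregation: sum of weight * score."""
    return sum(weights[name] * scores[name] for name in weights)


def weakest_criterion(scores):
    """A simple non-compensatory comparator: the lowest criterion score."""
    return min(scores.values())


# Site A is moderate on every criterion; site B is extreme on slope
# but has no lake-area contribution at all.
site_a = {"slope": 0.6, "lake_area": 0.6, "dam_type": 0.6}
site_b = {"slope": 1.0, "lake_area": 0.0, "dam_type": 0.5}

score_a = weighted_sum(site_a)
score_b = weighted_sum(site_b)

# The weighted sum cannot distinguish the two sites, whereas the
# non-compensatory comparator separates them clearly.
print(round(score_a, 3), round(score_b, 3))
print(weakest_criterion(site_a), weakest_criterion(site_b))
```

Whether such trade-offs are physically meaningful is a modelling decision that the aggregation rule makes implicitly unless it is stated and justified.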
2.6. Research Gap
A striking feature of the existing literature is not the lack of applications but the lack of reflection on what these applications actually imply. Over the past decade, a substantial number of MCDA-based studies have produced detailed glacier hazard maps for specific locations. These maps are often visually compelling and operationally useful. Yet, when viewed collectively, they raise a more fundamental question: what kind of knowledge do these models produce, and how should that knowledge be interpreted?
Despite the volume of case studies, there has been little effort to examine MCDA-based glacier hazard assessment from a methodological reliability perspective. The literature has largely evolved around producing outputs—hazard maps, rankings, classifications—without systematically questioning the assumptions that underpin them or the extent to which their results can be considered robust, reproducible, or empirically valid.
This absence of critical examination becomes particularly relevant when considering how these outputs are used. In many cases, hazard maps derived from MCDA are implicitly treated as predictive representations of risk. However, these maps are generated through decision models that depend on weighting schemes, aggregation rules, and expert judgements—elements that are rarely subjected to systematic sensitivity analysis, uncertainty modelling, or empirical validation. The issue, therefore, is not simply methodological; it is interpretative.
From this perspective, the gap in the literature is not only about missing techniques or incomplete analyses. It reflects a deeper misalignment between how MCDA models are constructed and how their outputs are understood and applied. This misalignment raises important questions about the reliability of hazard classifications and, more importantly, about the confidence that decision-makers can place in them when supporting risk management and climate adaptation strategies.
Importantly, our contribution is not merely incremental. Previous systematic reviews in natural hazard MCDA, exemplified by de Brito and Evers [36] on flood risk, have focused primarily on method inventories (which MCDA techniques are used), geographical distributions, or conceptual frameworks. What distinguishes the present review is its explicit focus on methodological reliability as an object of analysis. We ask not only “what methods are used?” but also “how defensibly are they applied?”, “are results validated?”, “is uncertainty quantified?”, and “are classifications robust?” To our knowledge, no prior review has operationalised reliability across the four dimensions we introduce (reproducibility, robustness, predictive validity, and procedural reliability) and applied them systematically to glacier hazard assessments. This study, therefore, fills a distinct gap: it moves the field from descriptive cataloguing to critical methodological auditing.
Consequently, the field still lacks a clear understanding of:
Why particular MCDA methods—especially AHP—have become dominant in practice;
Whether criteria weighting reflects methodological justification or practical convenience;
How uncertainty and robustness are actually addressed, beyond implicit acknowledgement;
Whether resulting hazard classifications can be considered reliable decision-support outputs or should be interpreted more cautiously as structured expert judgements.
Addressing this gap requires moving beyond cataloguing applications towards a systematic and critical examination of methodological practice. In particular, it requires assessing not only what methods are used but how they are used, what assumptions they embed, and what claims can reasonably be made about their outputs.
2.7. Contribution of This Study
This study responds to the identified gap by performing a systematic literature review in accordance with the PRISMA 2020 guidelines, concentrating on methodological robustness rather than on how often methods are applied. Yet the value of this work extends beyond providing a structured overview; its main contribution is to reshape the way MCDA-based glacier hazard assessments are conceptualised.
First, the study offers a cross-study methodological audit of MCDA applications in glacier hazard assessment. Rather than cataloguing methods or reporting their frequency of use, the analysis examines how decision models are constructed in practice—how criteria are selected and weighted, how uncertainty is handled, and how (or whether) results are validated. This shift in focus—from methods to modelling practice—allows for a more critical evaluation of the credibility of the resulting hazard classifications.
Second, the study identifies a systematic pattern that has not been explicitly articulated in the literature. MCDA-based hazard maps are commonly presented and interpreted as objective, spatially explicit representations of risk, yet they are typically derived from preference-dependent models with limited validation, minimal robustness analysis, and implicit treatment of uncertainty. The contribution here is not simply empirical but conceptual: the study exposes a misalignment between how these models are constructed and how their outputs are understood and used.
Third, by synthesising evidence across studies, the paper provides a structured basis for reinterpreting the role of MCDA in glacier hazard assessment. The findings suggest that, in most cases, MCDA functions as a decision-structuring framework rather than a predictive modelling approach. Recognising this distinction is essential for the appropriate use of such models, particularly in contexts where they inform risk management and climate adaptation decisions.
Finally, the study outlines a research agenda aimed at improving the reliability and interpretability of MCDA-based hazard assessments. This includes the need for explicit validation against observed events, systematic sensitivity and robustness analysis, transparent modelling of uncertainty, and a comparative evaluation of alternative or hybrid decision frameworks.
Table 1 positions this study relative to previous reviews in natural hazard MCDA research. While existing reviews primarily focus on method inventories or conceptual discussions, they do not systematically evaluate validation practices, uncertainty treatment, or methodological reliability. In contrast, this study provides an integrated, cross-study assessment of these dimensions, establishing a foundation for more robust and defensible decision-support tools in cryospheric risk management.
3. Research Methodology
This section first sets out the rationale and objectives of the review, which were defined a priori, and then describes the methodological procedures, in alignment with the PRISMA 2020 reporting framework.
The increasing application of multi-criteria decision analysis (MCDA) techniques in glacier hazard susceptibility assessment has led to a heterogeneous body of research addressing diverse hazard types, information sources, modelling assumptions, and decision-making contexts. These approaches aim to support hazard zonation, prioritisation of mitigation actions, and resource allocation under conditions of uncertainty. However, despite the growing number of studies, there is limited consolidated evidence regarding which MCDA methods are predominantly used, how they are applied, and how they address uncertainty and decision-makers’ preferences in glacier hazard assessment.
To ensure methodological rigour, transparency, and reproducibility, this study was conducted as a systematic review of the literature (SLR) following the PRISMA 2020 statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). PRISMA provides a standardised reporting framework that supports transparent identification, selection, assessment, and synthesis of studies, while reducing reporting bias and improving reproducibility. The review protocol—including research questions, eligibility criteria, search strategy, data extraction framework, synthesis procedures, and quality assessment approach—was defined a priori to minimise procedural bias and ensure methodological consistency. The protocol was registered in the Open Science Framework (OSF) and is publicly available, supporting open and reproducible research practices.
All relevant reporting items are addressed within the manuscript using PRISMA-consistent section headings. A completed PRISMA 2020 Expanded Checklist, indicating where each item is reported, is provided as Multimedia Appendix A. The following subsections describe the methodological components of the review in alignment with the PRISMA reporting structure, including the rationale, objectives, protocol registration, eligibility criteria, information sources, search strategy, study selection process, data collection procedures, synthesis methods, and quality assessment.
By synthesising evidence across heterogeneous MCDA applications in glacier hazard susceptibility assessment, this review provides a structured overview of decision-analysis approaches used to evaluate glacier-related hazards. The findings identify dominant methods, typical modelling practices, the treatment of uncertainty, and criteria-weighting strategies, thereby offering a consolidated reference for researchers and practitioners working in natural hazard assessment and decision-support systems.
3.1. Rationale
Although numerous studies apply MCDA to the assessment of glacier hazards, the literature has primarily evaluated the resulting hazard maps rather than the decision models that produce them. Consequently, the reliability, robustness, and epistemic validity of MCDA-based hazard classifications remain largely unexamined.
Glacier hazard mapping is not merely a spatial modelling task but a decision-analytic process in which assumptions about weighting, aggregation, and uncertainty directly influence the resulting classifications. Therefore, evaluating which methods are used is insufficient; it is necessary to understand why particular methods dominate practice, whether weighting procedures are scientifically defensible, and whether model outputs can be considered reliable decision-support evidence.
This review is, therefore, motivated not by the need to catalogue applications but by the need to critically audit the methodological foundations of MCDA-based glacier hazard assessment.
3.2. Objectives
The objective of this systematic review is to evaluate the methodological reliability of multi-criteria decision analysis (MCDA) applications in glacier hazard assessment.
Rather than cataloguing applications, the review investigates the decision-analytic logic underlying current practice. Specifically, the review examines the drivers behind the prevalence of particular MCDA methods, the justification of weighting strategies, the treatment of uncertainty, and the extent to which hazard classifications are validated or robust.
The aim is to determine whether current MCDA-based glacier hazard assessments function as reliable analytical models or as context-dependent decision-support interpretations.
More specifically, the review addresses the following research questions:
RQ1: Why has the Analytic Hierarchy Process (AHP) become the predominant MCDA method in glacier hazard assessment studies?
RQ2: To what extent do criteria weighting practices reflect methodological justification rather than practical convenience?
RQ3: Why are uncertainty and robustness analyses rarely incorporated in MCDA-based glacier hazard assessments?
RQ4: Which methodological improvements (e.g., sensitivity analysis, comparative modelling, and hybrid or ensemble MCDA) could enhance the reliability of glacier hazard assessments?
By addressing these research questions, the review moves beyond cataloguing applications and instead examines the methodological logic underlying current practice. The synthesis evaluates why specific MCDA approaches dominate the field, how weighting and uncertainty are treated in decision models, and whether existing validation practices support reliable hazard classification. In doing so, the study identifies recurring assumptions that shape glacier hazard assessments and outlines directions for developing more robust and defensible decision-support frameworks.
3.3. Eligibility Criteria
Studies were included if they:
- Address glacier hazard susceptibility, vulnerability, or risk assessment;
- Apply a multi-criteria decision analysis (MCDA) approach, method, or technique;
- Use MCDA to evaluate, classify, rank, or prioritise hazardous areas or mitigation alternatives;
- Present an empirical application, methodological proposal, or case study related to glacier hazards;
- Provide sufficient methodological description of weighting, aggregation, or validation procedures.
Exclusion criteria included:
- Studies not related to glacier or glacial hazards;
- Studies addressing natural hazards without application of an MCDA method;
- Papers using optimisation, simulation, machine learning, or statistical models without an MCDA component;
- Studies focused solely on physical modelling or calibration of hazard processes;
- Surveys, editorial papers, theses, technical reports, or non-peer-reviewed publications;
- Duplicate records or studies lacking sufficient methodological description;
- Studies reporting hazard maps without methodological explanation of the decision model;
- Studies where MCDA is mentioned but the weighting or aggregation process cannot be reconstructed.
Eligible studies were grouped according to the MCDA method, type of decision-problem, uncertainty treatment, and weighting strategy for synthesis.
3.4. Information Sources
A comprehensive search was conducted in major scientific databases covering earth sciences, environmental sciences, and interdisciplinary research. The following sources were consulted: ScienceDirect (Elsevier), SpringerLink, Wiley Online Library, Taylor & Francis Online, and Scopus.
The search covered publications from 2015 to 2025.
3.5. Search Strategy
The search strategy combined three conceptual components: (1) multi-criteria decision-making methods, (2) hazard or risk assessment, and (3) glacier-related phenomena.
The generic search expression was:
Table 2 presents the conceptual blocks and illustrative terms used to design the search strategy. These terms capture the overlap between multi-criteria decision analysis methods, hazard and risk assessment concepts, and glacier-related phenomena, thereby supporting a search that is both comprehensive and appropriately targeted to the relevant literature.
Database-specific search strings, filters, and field restrictions are reported in Appendix B.
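The three-block structure of the search strategy can be illustrated with a short sketch. The Python example below composes a Boolean query by joining synonyms within each conceptual block with OR and joining the blocks with AND. The term lists and the helper names `or_block` and `build_query` are hypothetical and chosen only for illustration; this does not reproduce the search expression used in the review or the database-specific strings in Appendix B.

```python
def or_block(terms):
    """Join the synonyms of one conceptual block with OR, quoting phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*blocks):
    """Combine conceptual blocks with AND so every block must match."""
    return " AND ".join(or_block(block) for block in blocks)


# Hypothetical example terms for the three conceptual components;
# NOT the terms actually used in the review.
MCDA_TERMS = ["multi-criteria decision analysis", "MCDA", "AHP"]
HAZARD_TERMS = ["hazard", "risk", "susceptibility"]
GLACIER_TERMS = ["glacier", "glacial lake outburst flood", "GLOF"]

query = build_query(MCDA_TERMS, HAZARD_TERMS, GLACIER_TERMS)
print(query)
```

In practice, each database applies its own field tags and truncation syntax, so a generic expression of this kind would still need to be adapted per source.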
3.6. Selection Process
The study selection followed a structured multi-stage screening process:
1. Removal of duplicate records;
2. Screening of titles and abstracts;
3. Full-text eligibility assessment;
4. Final inclusion based on predefined criteria.
Two reviewers independently evaluated each record at the title/abstract and full-text stages. Disagreements were resolved through discussion and consensus. No automation or machine-learning screening tools were used.
3.7. Data Collection Process
Data were extracted using a predefined extraction template designed to capture information necessary to address the research questions. For each included study, information was recorded on the details of the publication, the methodological approach, the application context, and the reported results. Each article was treated as an independent source of evidence. The extraction was performed by one reviewer and validated by a second reviewer. Ambiguous cases were discussed jointly until agreement was reached. When information was missing or unclear, it was coded as “not reported” rather than inferred.
During data extraction, a distinction was made between internal consistency checks and predictive validation. Several studies reported the Analytic Hierarchy Process (AHP) consistency ratio (CR) or similar logical coherence indicators. Although these measures evaluate the internal consistency of expert judgments, they do not assess the predictive performance of the hazard model. Therefore, AHP consistency ratios and comparable internal checks were not classified as quantitative validation.
For the purposes of this review, quantitative validation was strictly defined as the use of predictive performance metrics derived from comparison with independent observations, such as ROC curves, AUC values, precision measures, confusion matrices, or equivalent statistical indicators. This definition was applied consistently across all included studies.
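To illustrate what counts as quantitative validation under this definition, the sketch below computes the area under the ROC curve (AUC) from hazard scores and an independent inventory of observed events. It uses the rank interpretation of AUC: the probability that a randomly chosen event location receives a higher score than a randomly chosen non-event location, with ties counted as one half. The scores and labels are hypothetical, and the function name `roc_auc` is an illustrative choice rather than part of any reviewed workflow.

```python
def roc_auc(scores, labels):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    labels: 1 = observed event, 0 = no observed event.
    Ties between an event and a non-event score count as 0.5.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Hypothetical hazard scores for six lakes and an independent
# inventory of whether an outburst was later observed.
hazard_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
observed = [1, 1, 0, 0, 0, 1]

auc = roc_auc(hazard_scores, observed)
print(round(auc, 3))
```

An AUC near 0.5 indicates that the map ranks event and non-event locations no better than chance, which is information that an internal consistency check cannot provide.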
What the AHP Consistency Ratio Does and Does Not Tell You
The AHP consistency ratio (CR) measures only the logical coherence of an expert’s pairwise comparisons. A CR below 0.10 indicates that any inconsistency among the judgements (e.g., stating A > B, B > C, and C > A) remains within the conventionally accepted tolerance. It does not provide evidence that the resulting criteria weights correspond to physical processes, that the hazard map predicts observed events, or that the classification is robust under alternative weight specifications. A model can be perfectly internally consistent, yet predictively invalid. Conversely, a model with a higher CR may still produce empirically accurate classifications if the underlying expert knowledge is sound. Reporting a low CR is, therefore, at best a supporting condition, and never a sufficient one, for claiming model reliability. This review treats CR reporting separately from predictive validation and does not count CR as evidence of model performance.
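A minimal sketch may help to show why the CR is purely an internal check. The Python example below derives AHP weights and the CR from a pairwise comparison matrix using the standard definitions, CI = (λmax − n)/(n − 1) and CR = CI/RI, with Saaty's random index values. The two matrices are hypothetical, and the principal eigenvalue is approximated by power iteration to keep the example dependency-free. Note that no observed hazard events enter the calculation at any point.

```python
# Saaty's random index (RI) for matrix orders 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights_and_cr(matrix, iterations=100):
    """Return (weights, consistency ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    lambda_max = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        # weights sum to 1, so the sum of A*w estimates the principal eigenvalue
        lambda_max = sum(product)
        weights = [value / lambda_max for value in product]
    if n < 3:
        return weights, 0.0  # matrices of order 1 or 2 are always consistent
    consistency_index = (lambda_max - n) / (n - 1)
    # Orders above 9 are not covered by this table and raise a KeyError.
    return weights, consistency_index / RANDOM_INDEX[n]


# Hypothetical, perfectly consistent judgements (ratios of 0.6 : 0.3 : 0.1).
consistent = [[1.0, 2.0, 6.0],
              [1 / 2, 1.0, 3.0],
              [1 / 6, 1 / 3, 1.0]]

# Hypothetical cyclic judgements: A > B, B > C, and C > A.
cyclic = [[1.0, 3.0, 1 / 3],
          [1 / 3, 1.0, 3.0],
          [3.0, 1 / 3, 1.0]]

weights_ok, cr_ok = ahp_weights_and_cr(consistent)
weights_bad, cr_bad = ahp_weights_and_cr(cyclic)
print([round(w, 3) for w in weights_ok], round(cr_ok, 3))
print([round(w, 3) for w in weights_bad], round(cr_bad, 3))
```

The first matrix passes the 0.10 threshold and the second fails it, yet neither result says anything about whether the implied weights reproduce observed events.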
3.8. Data Items
The following information was extracted from each included study:
- Hazard domain and application objective (e.g., GLOF susceptibility mapping, hazard zonation, prioritisation, or planning support);
- Study region and geographical context;
- MCDA method(s) employed, including hybrid or extended approaches;
- Technological implementation environment (e.g., GIS-based workflow, remote sensing integration, hydrodynamic modelling, or decision-support software);
- Number and type of criteria used in the decision model;
- Criteria weighting procedure and source of weights (expert judgement, literature-based, statistical/data-driven, or software default);
- Presence of methodological justification for the choice of MCDA method;
- Consistency verification of pairwise comparisons (e.g., reporting of the Analytic Hierarchy Process consistency ratio);
- Treatment of uncertainty, including sensitivity analysis, probabilistic modelling, fuzzy approaches, or absence of uncertainty analysis;
- Aggregation structure of the decision model (e.g., additive/weighted overlay, multiplicative, fuzzy, or stochastic);
- Validation procedure applied (e.g., comparison with historical events, ROC/AUC or accuracy metrics, qualitative comparison, or no validation);
- Software and implementation framework used in model construction;
- Limitations explicitly reported by the study authors.
All reported results relevant to these domains were collected. To enable structured synthesis and cross-study comparability, each included study was coded using a predefined classification scheme covering hazard type, MCDA aggregation approach, enabling technologies, uncertainty treatment, and weighting strategy. The operational definitions and category labels used for coding are summarised in Table A2. This coding scheme was applied consistently across the dataset and forms the basis for the descriptive statistics, cross-tabulations, and subgroup analyses reported in Section 5.
Missing or unclear information was documented without assumptions.
3.9. Quality Assessment Protocol
To evaluate methodological reliability beyond descriptive synthesis, we developed a domain-specific quality appraisal instrument, the MCDA-HAZARD framework (Table 3). Because no established tool exists for assessing spatial decision-analysis modelling in cryospheric hazard research, this framework was constructed to address recurring methodological issues identified in the literature.
3.9.1. Domains and Scoring
The instrument assesses five methodological domains:
1. Criteria definition: justification of selected variables and documentation of expert consultation or literature support.
2. Weighting transparency: reporting of weighting procedures, including pairwise comparison matrices, rationale, or consistency verification.
3. Uncertainty analysis: implementation of sensitivity analysis, scenario testing, fuzzy modelling, probabilistic approaches, or other robustness evaluations.
4. Validation: evaluation against independent hazard inventories, statistical performance measures (ROC/AUC, accuracy), or temporal back-testing.
5. Reproducibility: disclosure of data sources, parameters, and sufficient methodological detail to enable replication.
Each domain is scored on a three-point scale: 0 (not reported), 1 (partially addressed), or 2 (clearly implemented and documented). Total scores, therefore, range from 0 to 10.
Table A4 in Appendix C provides a detailed scoring rubric that operationalises each domain and score level.
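The scoring scheme described above can be sketched in a few lines. The Python example below validates the five domain scores and sums them to a total between 0 and 10. The domain identifiers, the function name `total_score`, and the example study are hypothetical; the actual score-level definitions are those of the rubric in Table A4 and are not reproduced here.

```python
# The five MCDA-HAZARD domains, in the order listed in Section 3.9.1.
DOMAINS = (
    "criteria_definition",
    "weighting_transparency",
    "uncertainty_analysis",
    "validation",
    "reproducibility",
)
ALLOWED_SCORES = (0, 1, 2)  # not reported, partially addressed, clearly implemented


def total_score(domain_scores):
    """Validate one study's domain scores and return the 0-10 total."""
    missing = [d for d in DOMAINS if d not in domain_scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if domain_scores[domain] not in ALLOWED_SCORES:
            raise ValueError(f"invalid score for {domain}")
    return sum(domain_scores[d] for d in DOMAINS)


# Hypothetical study: well-documented criteria and weights, but no
# uncertainty analysis or validation, and partial data disclosure.
example_study = {
    "criteria_definition": 2,
    "weighting_transparency": 2,
    "uncertainty_analysis": 0,
    "validation": 0,
    "reproducibility": 1,
}

print(total_score(example_study))
```

Because the total conceals which domains contribute, the per-domain scores are at least as informative as the aggregate when characterising a study.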
3.9.2. Assessment Procedure
All 60 included studies were independently assessed by two reviewers using predefined criteria. Discrepancies were resolved through consensus discussion.
Inter-reviewer agreement was assessed on a random subset of 20 studies (33% of the corpus). For each of the five MCDA-HAZARD domains, both reviewers independently assigned scores (0, 1, or 2). Percentage agreement before consensus ranged from 78% to 92% across domains: criteria definition (85%), weighting transparency (78%), uncertainty analysis (92%), validation (80%), and reproducibility (82%). Disagreements (primarily adjacent score differences, e.g., 1 vs. 2) were resolved through consensus discussion. Although Cohen’s kappa would provide a chance-corrected measure, percentage agreement is reported here as a transparent indicator of coding consistency; full coding data are available in the OSF repository. The purpose of this assessment is not to exclude studies but, rather, to characterise the strength of available evidence and to identify recurring methodological weaknesses.
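The difference between percentage agreement and a chance-corrected measure can be illustrated with a short sketch. The Python example below computes both percentage agreement and Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), for two reviewers scoring a single domain. The ten score pairs are hypothetical and are not the review's coding data, which are held in the OSF repository.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters assigned the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Expected agreement if the raters scored independently with
    # their own marginal score frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores (0, 1, or 2) from two reviewers for ten studies.
reviewer_1 = [0, 0, 1, 1, 2, 2, 1, 0, 2, 1]
reviewer_2 = [0, 0, 1, 2, 2, 2, 1, 0, 1, 1]

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(agreement, 2), round(kappa, 2))
```

In this illustration, 80% raw agreement corresponds to a kappa of roughly 0.70, which shows why the two statistics should not be read interchangeably.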
3.9.3. Relationship to Reliability Framework
The MCDA-HAZARD instrument operationalises the reliability concepts defined in Section 2.4: reproducibility corresponds to domain 5; robustness is captured in domain 3; predictive validity is assessed in domain 4; and procedural reliability is reflected across domains 1, 2, and 5. Aggregate scores are analysed descriptively in Section 5.3 and used to support the interpretation of the review findings.
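The correspondence between reliability dimensions and instrument domains can be expressed as a simple lookup. In the sketch below, the mapping follows the text above, whereas summarising each dimension as the fraction of attainable points is an assumption made for illustration and is not a procedure described in this review. The example scores are hypothetical.

```python
# Reliability dimension -> MCDA-HAZARD domain numbers (as stated in the text).
DIMENSION_DOMAINS = {
    "reproducibility": (5,),
    "robustness": (3,),
    "predictive_validity": (4,),
    "procedural_reliability": (1, 2, 5),
}
MAX_DOMAIN_SCORE = 2


def dimension_profile(domain_scores):
    """Fraction of attainable points per dimension (illustrative summary)."""
    profile = {}
    for dimension, domains in DIMENSION_DOMAINS.items():
        earned = sum(domain_scores[d] for d in domains)
        profile[dimension] = earned / (MAX_DOMAIN_SCORE * len(domains))
    return profile


# Hypothetical study scores keyed by domain number (1 to 5).
example_scores = {1: 2, 2: 2, 3: 0, 4: 0, 5: 1}
profile = dimension_profile(example_scores)
for dimension, share in profile.items():
    print(dimension, round(share, 2))
```

A profile of this kind makes visible that a study can be procedurally well documented while offering no evidence of robustness or predictive validity.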
3.10. Study Risk of Bias Assessment
Formal clinical risk-of-bias tools are not applicable because the included studies are methodological, modelling, or case-study-orientated rather than experimental or intervention-based. Instead, studies’ methodological quality was evaluated according to decision-model reliability criteria:
- Explicit description of the weighting procedure;
- Presence of consistency or sensitivity analysis;
- Validation against independent evidence;
- Transparency of aggregation assumptions.
Two reviewers independently evaluated each study and resolved disagreements by consensus. The quality assessment informed the interpretation but was not used as an exclusion criterion.
3.11. Effect Measures
This review does not perform a quantitative meta-analysis. Therefore, statistical effect measures (e.g., risk ratios and mean differences) are not applicable. The synthesis focuses on qualitative and categorical description of MCDA applications.
3.12. Synthesis Methods
All studies that met the eligibility criteria were included in the qualitative synthesis. The studies were categorised as follows.
1. MCDA method.
2. Type of decision problem.
3. Uncertainty treatment.
The data extracted were standardised into predefined categories (method type, weighting approach, and problem type). The terminological variations between studies were harmonised to allow comparison.
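The harmonisation step can be sketched as a lookup from reported labels to canonical categories, followed by a frequency count. The synonym table below is a hypothetical fragment and not the coding scheme of Table A2; the function name `harmonise` and the raw labels are likewise illustrative. Consistent with the extraction rules, empty labels are coded as “not reported”, and unknown labels are flagged for manual review rather than guessed.

```python
from collections import Counter

# Hypothetical synonym table: lower-cased reported label -> canonical category.
SYNONYMS = {
    "ahp": "AHP",
    "analytic hierarchy process": "AHP",
    "analytical hierarchy process": "AHP",
    "topsis": "TOPSIS",
    "weighted overlay": "Weighted overlay",
}


def harmonise(label):
    """Map a reported method label to a canonical category."""
    key = " ".join(label.lower().split())  # normalise case and whitespace
    if not key:
        return "not reported"
    # Unknown labels are flagged for manual coding, never inferred.
    return SYNONYMS.get(key, "unmapped")


# Hypothetical raw labels as they might appear in extracted records.
raw_labels = ["AHP", "Analytic  Hierarchy Process", "topsis", "",
              "Weighted Overlay", "PROMETHEE"]

counts = Counter(harmonise(label) for label in raw_labels)
print(dict(counts))
```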
In addition to descriptive categorisation, an interpretative cross-study analysis was conducted to identify explanatory patterns behind methodological choices. Studies were compared to determine common drivers of method selection, barriers to robustness analysis, and recurring methodological assumptions.
The results were summarised using descriptive tables, frequency distributions, and graphical representations (e.g., temporal trends, method distributions, and types of hazard). The results of this synthesis are reported separately in Section 5.
A statistical meta-analysis was not conducted because the studies report heterogeneous qualitative and methodological outcomes rather than comparable numerical effect estimates.
Heterogeneity was explored through a comparative grouping of studies by method type, decision problem, and uncertainty treatment.
Sensitivity analysis was not applicable due to the absence of pooled statistical measures. Instead, robustness was assessed by verifying that patterns were supported across multiple independent studies rather than isolated cases.
3.13. Reporting Bias Assessment
Publication bias was mitigated by searching multiple interdisciplinary databases and examining reference lists. The restriction to peer-reviewed articles in English is acknowledged as a potential source of bias. The review also considered methodological reporting bias, recognising that studies reporting successful hazard classifications may underreport model limitations, uncertainty, or failed validation attempts.
3.14. Certainty Assessment
Because the review synthesises methodological and descriptive evidence rather than effect estimates, formal certainty-of-evidence frameworks (e.g., GRADE) are not applicable. Confidence in the findings was evaluated based on the consistency of results across studies and completeness of reporting.
5. Results
5.1. Study Selection Results
The results of the study identification, screening, and eligibility process are summarised in the PRISMA flow diagram shown in Figure 1. Appendix A provides an overview of the included studies.
The database search identified a total of 571 records. After the removal of 59 duplicate records, 512 records were screened based on title and abstract. Of these, 229 records were excluded during the screening stage. A total of 283 full-text reports were sought for retrieval, of which 54 could not be obtained. Consequently, 229 full-text articles were assessed for eligibility. Following the full-text assessment, 169 reports were excluded for predefined reasons. The remaining 60 studies met the inclusion criteria and were included in the final synthesis. The resulting group of included studies forms the evidence base for the synthesis presented in the subsequent subsections. This procedure promotes transparency in how studies were selected and enhances the reproducibility of the review.
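The record counts reported above can be checked for internal consistency with simple arithmetic. The short Python sketch below recomputes each stage of the flow from the figures stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text for each stage of the selection process.
identified = 571
duplicates_removed = 59
excluded_at_screening = 229
not_retrieved = 54
excluded_at_full_text = 169

# Each stage is derived from the previous one.
screened = identified - duplicates_removed
sought_for_retrieval = screened - excluded_at_screening
assessed_for_eligibility = sought_for_retrieval - not_retrieved
included = assessed_for_eligibility - excluded_at_full_text

print(screened, sought_for_retrieval, assessed_for_eligibility, included)
```

The derived values (512, 283, 229, and 60) match the counts reported for each stage.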
Full-text articles were excluded primarily because of limited methodological relevance or insufficient information. The most frequent exclusion reasons were insufficient methodological description, studies that did not address glacier hazards, and the absence of a specific MCDA method. Additional exclusions included non-English publications, conference abstracts without full papers, theses or dissertations, and duplicate content.
5.2. Study Characteristics
Table A3 (Appendix C) presents the detailed characteristics of all included studies. The review includes 60 articles published between 2015 and 2025 that apply multi-criteria decision analysis (MCDA) to glacier-related hazard assessment. For each study, the extracted information records: (i) the hazard type analysed, (ii) the MCDA technique employed, (iii) the technological environment supporting the analysis (e.g., GIS and remote sensing), and (iv) the approach used for criteria weighting and treatment of uncertainty.
The purpose of this table is to document the empirical corpus and ensure the transparency of the data extraction process. The patterns and distributions derived from these characteristics are analysed in the following subsections.
5.2.1. Publication Timeline
Figure 2 shows the annual distribution of the included studies, while Figure 3 presents the cumulative publication trend over time. The results indicate a gradual emergence of research on the application of MCDA techniques to glacier hazard susceptibility assessment, followed by a clear increase in publication activity after 2019.
The cumulative curve highlights a sustained growth pattern, particularly from 2020 onwards, suggesting consolidation of this topic as an established research line within natural hazard assessment and decision-support studies.
Most of the included works correspond to applied case studies in which MCDA methods are used to support hazard mapping, susceptibility zonation, and the prioritisation of mitigation measures in glacierised environments.
Although there were isolated studies on glacier hazards prior to 2015, very few met the eligibility criteria of this review. Earlier research on glacier hazard assessment focused mainly on the physical, geomorphological, or hydrological characterisation of hazards rather than on formal decision-support modelling. In contrast, the studies included in this review apply explicit multi-criteria decision analysis frameworks to integrate heterogeneous spatial information and support hazard zoning or prioritisation tasks. Therefore, the increase in eligible studies after 2015 reflects the progressive adoption of decision-analysis and spatial modelling approaches in assessing glacier hazards, rather than the absence of earlier scientific investigation of glacier-related hazards.
5.2.2. Hazard Types
Figure 4 presents the distribution of the hazard types addressed by the reviewed studies. Glacial lake outburst floods (GLOFs) represent the dominant application domain, accounting for the majority of studies. A smaller but relevant subset of research applies MCDA techniques to landslide susceptibility and debris-flow hazards in glacierised mountain environments. Only a limited number of studies address snow or ice avalanches and other cryospheric hazards.
This distribution indicates that the adoption of MCDA techniques in cryospheric environments has been primarily driven by the need to prioritise potentially dangerous glacial lakes and to support early-warning and mitigation planning in high mountain regions.
5.2.3. MCDA Methods Used
Figure 5 presents the multi-criteria decision analysis techniques used in the reviewed literature. The Analytic Hierarchy Process (AHP) is overwhelmingly the most frequently adopted method. Several studies combine AHP with other approaches (e.g., TOPSIS or COPRAS), forming hybrid MCDA frameworks, while alternative MCDA families appear only sporadically. The prevalence of AHP suggests that practical usability, ease of implementation, and compatibility with GIS-based spatial modelling strongly influence the selection of the method in glacier hazard assessment.
5.2.4. Technological Environment
Figure 6 summarises the technological environments that support the implementation of MCDA. Most studies apply MCDA within geographic information systems (GISs) and combine it with remote sensing datasets and terrain analysis derived from digital elevation models (DEMs). Satellite imagery and spatial overlays represent the primary data sources for evaluating hazard factors.
Recent publications increasingly integrate machine learning and probabilistic modelling with MCDA frameworks, indicating a transition from purely expert-driven assessment toward hybrid data-driven hazard modelling.
5.2.5. Weighting and Uncertainty Treatment
Figure 7 summarises the strategies adopted to assign criteria weights and address uncertainty within the reviewed MCDA applications.
The majority of studies employ expert-driven weighting procedures, most commonly implemented through pairwise comparisons within the Analytic Hierarchy Process (AHP). In many cases, weights are derived from expert judgement or stakeholder consultation rather than from empirical calibration. By contrast, objective or data-driven weighting approaches, such as statistically derived weights or learning-based estimation, appear only in a limited subset of the literature.
A clear distinction is observed between weighting and uncertainty treatment. Although weighting procedures are applied almost universally, the explicit modelling of uncertainty is comparatively rare. Only a small proportion of studies incorporate sensitivity analysis, probabilistic frameworks, or fuzzy set theory to evaluate the stability of hazard classifications. Most studies, therefore, produce a single deterministic hazard map or susceptibility ranking without assessing the variability of results under alternative weighting configurations.
These findings indicate that MCDA methods are widely used as decision-support tools for glacier hazard prioritisation, but robustness evaluation and uncertainty quantification remain underdeveloped components of current practice.
The methodological implications of these patterns are discussed in
Section 6.
5.3. Methodological Quality of the Included Studies
The methodological quality of the 60 included studies was evaluated using the MCDA-HAZARD Quality Assessment Instrument described in
Section 3.9. The instrument assesses five domains: criteria selection, weighting transparency, uncertainty treatment, validation, and reproducibility.
Table 4 presents summary statistics of methodological quality across all 60 studies (more details are provided in
Appendix C). The results reveal a clear hierarchy: criteria definition and reproducibility are relatively well-reported (mean scores 1.57), while uncertainty analysis is almost entirely absent (mean 0.23, only 5% of studies fully reporting). Validation and weighting transparency occupy an intermediate position, with mean scores of 1.12 and 1.13, respectively.
Figure 8 illustrates the distribution of methodological quality by domain.
The results reveal a clear imbalance in methodological practices. Most studies adequately document data sources and environmental criteria selection, and weighting procedures are generally reported, particularly when using the Analytic Hierarchy Process (AHP). However, substantial weaknesses are observed in uncertainty treatment and validation practices.
Only a small subset of studies performs formal sensitivity analysis or tests multiple weighting scenarios. Similarly, independent validation using observed hazard events or statistical performance metrics (e.g., ROC or AUC) is uncommon. In many cases, validation is limited to visual agreement with known hazardous locations or expert judgement.
Reproducibility is also variable. Although spatial datasets are often cited, full disclosure of weighting parameters, pairwise comparison matrices, and modelling assumptions is frequently incomplete, making independent replication difficult.
Overall, the quality appraisal indicates that the primary limitation of the current literature on MCDA in glacier hazard assessment is not a shortage of applications but a lack of systematic reliability evaluation. Most studies produce operational hazard maps, yet comparatively few evaluate whether the resulting classifications are stable, reproducible, or predictive.
These findings support the central argument of this review: current glacier-hazard MCDA models function mainly as decision-support tools rather than validated predictive models.
5.4. Quantitative Synthesis of Evidence
5.4.1. Method Selection and Hazard Domain
To examine whether the choice of the MCDA technique depends on the hazard domain, a cross-tabulation was performed between hazard type and the primary decision method (
Table 5).
The results indicate a pronounced methodological concentration around the Analytic Hierarchy Process (AHP). Across the 60 reviewed studies, AHP was used in 36 cases (60%). Importantly, the method appears consistently across all hazard categories: 13 GLOF-focused studies, 11 landslide studies, and 12 multi-hazard assessments employed AHP as the primary decision framework. No hazard category is associated with a distinct or specialised decision model.
This pattern suggests that method selection is largely independent of the physical characteristics of the hazard being analysed. Instead, a single decision approach is routinely transferred across different problem types without adaptation to hazard-specific analytical requirements.
The distribution also demonstrates limited methodological diversity. Alternative techniques appear only sporadically: fuzzy AHP was identified in 3 studies (5%), the Best–Worst Method (BWM) in 2 studies (3%), and TOPSIS in only 1 study (2%). The remaining studies (18 cases, 30%) used heterogeneous or hybrid approaches rather than clearly defined alternative MCDA frameworks.
Consequently, the literature provides little evidence that method choice is driven by analytical suitability for specific hazards. Rather, the same decision structure is repeatedly applied regardless of whether the problem concerns glacial lake outburst floods, landslide susceptibility, or multi-hazard classification.
Table 5, therefore, indicates that glacier hazard MCDA practice is method-centred rather than problem-centred. The widespread adoption of AHP is unlikely to reflect demonstrated predictive superiority. Instead, its prevalence appears to stem from operational convenience, ease of implementation, and compatibility with GIS-based weighted overlay workflows, which have effectively standardised methodological practice across otherwise heterogeneous hazard contexts.
Table 6 examines whether method choice depends on the hazard domain. The results indicate a pronounced methodological concentration around AHP, which appears in 54.2% of GLOF studies, 73.3% of landslide studies, and 57.1% of multi-hazard assessments. This limited variation—a maximum difference of 19 percentage points—suggests that method selection is largely independent of hazard-specific analytical requirements.
These results indicate that the selection of AHP does not strongly depend on the type of glacier-related hazard under investigation. Instead, the method is applied with comparable frequency across distinct hazard contexts. This pattern suggests that methodological choice is relatively invariant to the physical characteristics of the hazard domain and reflects a broadly standardised modelling approach across applications.
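The domain-level shares quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the AHP counts per domain (13, 11, 12) come from the text, whereas the domain totals (24, 15, 21) are not stated in this section and are back-calculated from the reported percentages, so they should be read as an assumption.

```python
# AHP study counts per hazard domain, as reported in the text.
ahp_counts = {"GLOF": 13, "Landslide": 11, "Multi-hazard": 12}

# ASSUMPTION: domain totals are back-calculated from the reported percentages
# (54.2%, 73.3%, 57.1%); they are not stated explicitly in this section.
domain_totals = {"GLOF": 24, "Landslide": 15, "Multi-hazard": 21}

# Share of AHP studies within each hazard domain, in percent.
shares = {d: 100.0 * ahp_counts[d] / domain_totals[d] for d in ahp_counts}

# Largest gap between any two domains, in percentage points.
spread = max(shares.values()) - min(shares.values())

# Overall AHP share across all reviewed studies.
overall = 100.0 * sum(ahp_counts.values()) / sum(domain_totals.values())

for domain, share in shares.items():
    print(f"{domain}: {share:.1f}%")
print(f"max difference: {spread:.1f} percentage points")
print(f"overall AHP share: {overall:.1f}%")
```

Under these assumed totals, the three domains sum to the 60 reviewed studies and the overall share matches the 36 AHP cases reported earlier.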
Table 7 summarises the overall distribution of MCDA techniques. AHP-based approaches account for 60% of the reviewed studies, while hybrid or loosely specified implementations represent 30%. Clearly alternative, formally distinct MCDA frameworks (BWM, TOPSIS, and fuzzy AHP) appear in only six studies (10%). The distribution indicates a strong methodological concentration around a single decision framework. Although several techniques exist within the MCDA family, most applications rely on a similar modelling structure. The limited representation of alternative methods demonstrates restricted methodological diversity within glacier hazard MCDA studies.
Taken together, the cross-tabulation presented in
Table 5 and the proportional analysis shown in
Table 7 indicate that the predominance of AHP reflects a consistent modelling convention rather than adaptation of the decision method to specific hazard processes. The strong methodological concentration reported in
Table 7 further shows that the same analytical structure is applied across heterogeneous hazard types. These results support the interpretation that current practice is method-centred rather than hazard-centred.
5.4.2. Validation Practices over Time
To evaluate whether methodological rigour has improved over time, studies were grouped by publication year and classified according to whether they reported quantitative validation (e.g., ROC/AUC, accuracy metrics, or confusion-matrix indicators). The yearly distribution is reported in
Table 8 and
Table 9 and illustrated in
Figure 9.
Across the 60 reviewed studies, quantitative validation was reported in 21 cases (35.0%), while 39 studies (65.0%) relied on qualitative comparison, historical-event matching, or no validation procedure (
Table 8). Validation rates varied considerably between years, ranging from 0% to 50%, but no consistent increasing trend is observed.
A temporal analysis does not indicate a sustained methodological improvement (
Figure 9). Early publications (2015–2018) showed validation rates between 0% and 50%, though sample sizes were small. Subsequent years also fail to demonstrate a consistent increase. The highest validation rate occurs in 2018 (50.0%), but this is not maintained in later periods. When grouped into three multi-year periods to smooth annual volatility, validation rates remain essentially stable: 37.5% (2015–2018), 34.8% (2019–2021), and 34.5% (2022–2025). Therefore, the substantial increase in publication volume from 2019 onwards (from 8 studies in 2015–2018 to 52 in 2019–2025) was not accompanied by a corresponding increase in validation practice.
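As a rough check on the pooled rates, the following sketch recomputes them from study counts. Only the 8 studies for 2015–2018, the 52 studies for 2019–2025, and the 21 validated studies overall are stated in the text; the remaining counts are back-calculated from the reported percentages and are therefore assumptions.

```python
# ASSUMPTION: per-period counts are back-calculated from the reported rates.
# Stated in the text: 8 studies in 2015-2018, 52 in 2019-2025, 21 validated overall.
periods = {
    "2015-2018": {"validated": 3, "total": 8},
    "2019-2021": {"validated": 8, "total": 23},
    "2022-2025": {"validated": 10, "total": 29},
}


def validation_rate(validated, total):
    """Share of studies reporting quantitative validation, in percent."""
    return 100.0 * validated / total


rates = {name: validation_rate(**counts) for name, counts in periods.items()}

total_validated = sum(c["validated"] for c in periods.values())
total_studies = sum(c["total"] for c in periods.values())
overall_rate = validation_rate(total_validated, total_studies)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"overall: {overall_rate:.1f}% ({total_validated}/{total_studies})")
```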
Overall, the temporal distribution indicates that methodological verification has not progressed proportionally with publication growth. Most studies continue to evaluate hazard classifications through qualitative agreement with known hazardous locations or expert judgement rather than predictive performance testing. The evidence in
Table 8 and
Figure 9 demonstrates that the expansion of MCDA applications in glacier hazard assessment has occurred without a corresponding increase in quantitative validation.
We define quantitative predictive validity strictly as performance evaluation against hazard observations using statistical metrics (e.g., ROC/AUC, accuracy, or confusion-matrix measures). Studies reporting only internal AHP consistency ratios or qualitative agreement with known hazardous locations were not counted as validated, because these procedures assess internal coherence rather than predictive performance.
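To make this definition concrete, the sketch below shows one way such metrics could be computed from a susceptibility index and a record of observed events. It is a minimal illustration, not a procedure taken from any reviewed study: the function names, the eight mapping units, their scores, and the 0.5 threshold are all hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen event unit scores higher
    than a randomly chosen non-event unit (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event unit")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(scores, labels, threshold):
    """Counts of TP, FP, FN, TN when units scoring >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# HYPOTHETICAL data: susceptibility index per mapping unit and observed events.
susceptibility = [0.91, 0.78, 0.66, 0.52, 0.47, 0.33, 0.21, 0.10]
observed_event = [1, 1, 0, 1, 0, 0, 0, 0]

auc = roc_auc(susceptibility, observed_event)
cm = confusion_matrix(susceptibility, observed_event, threshold=0.5)
accuracy = (cm["TP"] + cm["TN"]) / len(observed_event)
print(f"AUC = {auc:.3f}, accuracy = {accuracy:.3f}, confusion matrix = {cm}")
```

The pairwise formulation of AUC is used here because it needs no external library; for large rasters a rank-based implementation would be more efficient.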
Table 8 and
Figure 9 visualise the temporal distribution of the validation rates. The pattern confirms that validation practice has not improved systematically over the last decade. The highest validation rate (50.0% in 2018) occurs early in the study period and is not sustained. Recent years show considerable volatility: 2021 (27.3%), 2022 (37.5%), 2023 (28.6%), and 2025 (45.5%). In particular, 2024 shows zero validated studies despite three publications. This instability, combined with the absence of a positive trend, indicates that methodological verification remains inconsistent and has not kept pace with the growing volume of MCDA applications in glacier hazard assessment.
Table 10 examines the relationship between validation practice and overall methodological quality. Studies reporting quantitative validation achieve a mean quality score of 6.8, substantially higher than those with only qualitative validation (5.2) or no validation (3.5). This gradient suggests that validation is not an isolated practice but correlates with more rigorous methodology across all domains.
This gradient carries an important diagnostic signal: validation is not only a technical step but also a marker of overall methodological rigour. Studies that invest in quantitative validation also tend to justify criteria selection more thoroughly, report weighting procedures more transparently, and, critically, acknowledge limitations more explicitly. In contrast, studies with no validation procedure exhibit uniformly low scores in all domains, suggesting a general lack of methodological self-scrutiny rather than a focused gap in validation alone. This correlation suggests that improving validation practice may have spillover effects on broader research quality, whereas piecemeal improvements to weighting or criteria selection without validation are unlikely to close the reliability gap.
5.4.3. Geographical Concentration of Case Studies
Figure 10 shows the spatial distribution of case-study locations. The reviewed literature exhibits a pronounced concentration in High Mountain Asia, particularly the Himalayan–Karakoram–Hindu Kush region, including India, Pakistan, Nepal, and the Tibetan Plateau. To complement spatial visualisation,
Figure 11 presents the number of case studies per country. The bar chart confirms a strong geographical imbalance in the reviewed literature. A large proportion of studies is concentrated in countries of High Mountain Asia, particularly India, Pakistan, Nepal, and China (Tibetan Plateau).
In contrast, only a small number of studies are reported from other glacierised regions such as the Andes, Europe, and North America. This distribution indicates that current MCDA practices in glacier hazard assessment are predominantly developed and tested within a restricted geographical context.
The mapped evidence indicates a marked concentration of case studies in High Mountain Asia (notably countries associated with the Himalayan–Karakoram–Hindu Kush region), with comparatively sparse coverage elsewhere. This geographical skew likely reflects both exposure and data availability, but it also limits the external validity of methodological claims, as practices developed in one set of geomorphological and institutional contexts may not transfer directly to other cryospheric regions (e.g., the Andes, Alps, or polar environments).
Table 11 quantifies the spatial distribution of applications. A total of 48 out of 60 studies (80.0%) were conducted in High Mountain Asia, particularly the Himalayan–Karakoram–Hindu Kush region. In contrast, all other glacierised regions collectively account for only 20.0% of the available evidence, including 6.7% in the Andes, 6.7% in Europe, and 1.7% in North America. The empirical basis of glacier hazard MCDA research is, therefore, strongly concentrated within a single geographical context.
Table 12 examines quality variation across regions. European studies show the highest mean quality scores (6.5) and validation rates (50.0%), though the sample size is small (
n = 4). High Mountain Asian studies, which constitute 80% of the evidence base, have mean quality scores near the overall average (5.6). The Andes region shows lower mean scores (4.3), potentially reflecting different research capacity or data availability contexts.
5.4.4. Summary of Key Quantitative Findings
To synthesise the quantitative evidence extracted from the reviewed studies, the principal findings are summarised in
Table 13.
The table consolidates the main patterns identified across the dataset, including method selection, validation practices, and geographical distribution. It provides an aggregated overview of the empirical results that support the detailed analyses presented in the preceding subsections.
Beyond descriptive frequencies, the review also examines associations between methodological practices:
Table 10 presents the correlation (gradient) between validation status and overall quality scores;
Table 9 and
Figure 9 analyse the temporal trends in validation rates over the decade; and
Table 12 compares quality scores between geographical regions. These analyses provide empirical support for the central claim that validation is a marker of overall rigour and that methodological verification has not improved over time.
5.5. Synthesis of Findings by Research Questions
This subsection presents an analytical synthesis of the reviewed studies, structured around the research questions that guide this systematic review. Moving beyond a mere description of individual applications, we interpret the cross-study evidence to understand not only how multi-criteria decision analysis (MCDA) is applied in glacier hazard assessment, but why these practices prevail and what they imply for the reliability of the resulting hazard classifications.
The synthesis critically examines the choices made by researchers regarding method selection, criteria weighting, uncertainty treatment, and validation. Taken together, the results reveal a consistent and consequential pattern: MCDA is widely adopted as a practical decision-support framework, yet its core methodological assumptions are routinely accepted rather than critically evaluated, creating a significant gap between the apparent precision of its outputs and their empirical grounding.
5.5.1. RQ1: Why Has the Analytic Hierarchy Process (AHP) Become the Predominant MCDA Method in Glacier Hazard Assessment Studies?
The review confirms an overwhelming methodological concentration around the Analytic Hierarchy Process (AHP), used as the primary decision method in 60% of studies and in 80% when including its variants (
Table 7). In a typical application, AHP is employed to derive criteria weights through expert pairwise comparison, after which a weighted linear combination is implemented in a GIS to produce a susceptibility map.
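The typical workflow described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the implementation used by any reviewed study: the three criteria, the pairwise judgements, and the cell scores are hypothetical, and the weights are approximated with normalised row geometric means rather than the principal eigenvector.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priorities by normalised row geometric means.
    For a perfectly consistent matrix this equals the eigenvector solution."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


def weighted_linear_combination(cell_scores, weights):
    """Compensatory aggregation: weighted sum of normalised criterion scores."""
    return sum(w * s for w, s in zip(weights, cell_scores))


# HYPOTHETICAL criteria and expert judgements on Saaty's 1-9 scale.
criteria = ["slope", "lake area", "glacier proximity"]
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_weights(pairwise)

# HYPOTHETICAL raster cells with criterion scores already rescaled to [0, 1].
cells = {"cell_1": [0.9, 0.4, 0.7], "cell_2": [0.2, 0.8, 0.5]}
susceptibility = {
    name: weighted_linear_combination(scores, weights)
    for name, scores in cells.items()
}

for name, weight in zip(criteria, weights):
    print(f"{name}: weight = {weight:.3f}")
for name, value in susceptibility.items():
    print(f"{name}: susceptibility = {value:.3f}")
```

Because the aggregation is a weighted sum, a low score on one criterion can be offset by high scores on others, which is the compensatory assumption discussed in this section.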
This dominance is not explained by a demonstrable superiority of AHP for specific hazard types. As
Table 5 shows, its use is uniformly high across GLOF, landslide, and multi-hazard assessments, indicating that method selection is largely independent of the problem’s physical characteristics. Alternative methods like TOPSIS, BWM, or outranking approaches are exceedingly rare, appearing in only 10% of studies. The “methodological monoculture” is, therefore, a product of accessibility, familiarity, and seamless compatibility with standard GIS workflows, rather than a reasoned choice based on the demands of the decision context. This has led to a form of methodological standardisation where a compensatory, preference-based model is applied by default, embedding untested assumptions about criteria independence and trade-offs into hazard assessments that are often presented as objective spatial predictions.
The tight coupling between AHP and ArcGIS, specifically the Weighted Overlay and Weighted Sum tools, has arguably been the single most important driver of the methodological monoculture we observe. This integration is not neutral; it actively shapes research practice. A researcher with a standard ArcGIS licence can, within hours, produce a susceptibility map by: (1) reclassifying raster layers, (2) running AHP pairwise comparisons using spreadsheet templates, and (3) applying Weighted Overlay with the resulting weights. The software provides no built-in sensitivity analysis, no non-compensatory alternatives, no uncertainty propagation, and no validation metrics. The workflow encourages a deterministic, single-map output as the natural endpoint of analysis. Once this pipeline is adopted in a research group or taught in a graduate course, it becomes institutionally entrenched. Switching to alternative MCDA methods (e.g., PROMETHEE in a dedicated decision-support package) or to non-deterministic approaches (e.g., probabilistic SMAA) requires learning new software, new mathematical concepts, and new reporting norms, a transaction cost that few researchers bear unless explicitly incentivised. Thus, the AHP-ArcGIS workflow is not merely a method; it is a sociotechnical system that reproduces itself through software design, training, and publication practices.
5.5.2. RQ2: To What Extent Do Criteria Weighting Practices Reflect Methodological Justification Rather than Practical Convenience?
Criteria weighting is a universal step in the reviewed MCDA applications, yet the practices surrounding it reveal a profound lack of epistemic justification. Weights are almost exclusively derived from expert judgement, typically through AHP pairwise comparisons or direct scoring. Objective, data-driven methods for weight derivation are virtually absent from the literature. More critically, these subjectively derived weights are almost always applied deterministically. Only a tiny fraction of studies (5%) perform systematic sensitivity or robustness checks to explore how alternative, yet equally plausible, weighting schemes might alter the final hazard classification (
Table 4). This treatment of weights as fixed, procedural inputs rather than as testable modelling assumptions is a fundamental weakness. Because the output hazard map is a direct function of these weights, the absence of robustness analysis means that the stability of the resulting prioritisation for high-stakes decisions—such as identifying lakes for GLOF mitigation—is entirely unknown. The practice, therefore, prioritises procedural convenience over the scientific requirement to verify the influence of subjective inputs.
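A simple form of the robustness check discussed above is a one-at-a-time perturbation of the weights. The sketch below is a hedged illustration: the three lakes, their criterion scores, the baseline weights, and the 20% perturbation are hypothetical, and more thorough designs (for example, Monte Carlo sampling of the weight space) would follow the same pattern.

```python
def renormalise(weights):
    """Rescale weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def rank(alternatives, weights):
    """Order alternatives from highest to lowest weighted-sum score."""
    scores = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in alternatives.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def one_at_a_time(alternatives, weights, delta=0.2):
    """Scale each weight by (1 - delta) and (1 + delta), renormalise, and
    record every perturbation that changes the baseline ranking."""
    baseline = rank(alternatives, weights)
    changed = []
    for i in range(len(weights)):
        for factor in (1.0 - delta, 1.0 + delta):
            trial = list(weights)
            trial[i] *= factor
            if rank(alternatives, renormalise(trial)) != baseline:
                changed.append((i, factor))
    return baseline, changed


# HYPOTHETICAL lakes with three criterion scores rescaled to [0, 1].
lakes = {
    "Lake A": [0.9, 0.2, 0.4],
    "Lake B": [0.5, 0.8, 0.6],
    "Lake C": [0.3, 0.4, 0.9],
}
baseline_weights = [0.5, 0.3, 0.2]  # hypothetical expert-derived weights

baseline, changed = one_at_a_time(lakes, baseline_weights)
print("baseline ranking:", baseline)
print("perturbations that change the ranking:", changed)
```

In this toy example the top-ranked lake switches under modest changes to two of the three weights, which is precisely the kind of instability a single deterministic map conceals.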
A related concern, visible across the reviewed studies, is the frequent absence of physical or geomorphic justification for the inclusion of criteria. Factors such as “glacier proximity,” “lake area,” “slope angle,” or “distance to fault line” are often selected because they are measurable from remote sensing data and appear in previous studies, rather than because they are mechanistically linked to hazard initiation. For example, using a fixed distance threshold (e.g., 500 m to glacier terminus) without sensitivity analysis implicitly assumes a step function in hazard potential that rarely exists in nature. Similarly, including “lake area” as a linear predictor assumes that hazard increases proportionally with area, whereas some GLOF mechanisms (e.g., moraine breach) are threshold-dependent and non-linear. This practice risks circularity: factors are selected because they are available and then validated by showing that high-hazard zones spatially coincide with known dangerous lakes, a logic that can confirm any plausible set of criteria. The field would benefit from explicit geomorphic conceptual models (e.g., causal diagrams or Bayesian networks) that justify each criterion’s inclusion, functional form, and expected direction of effect before weighting and aggregation are applied.
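The effect of functional form can be illustrated by comparing a crisp distance threshold with a graded alternative. The sketch is illustrative only: the 500 m threshold echoes the example in the text, while the linear decay and its 250 m and 1000 m breakpoints are hypothetical choices that would themselves need physical justification.

```python
def crisp_score(distance_m, threshold_m=500.0):
    """Step function: full hazard contribution within the threshold, none beyond."""
    return 1.0 if distance_m <= threshold_m else 0.0


def graded_score(distance_m, full_m=250.0, zero_m=1000.0):
    """Linear decay from 1 at full_m to 0 at zero_m.
    Both breakpoints are HYPOTHETICAL and shown only to contrast with the step."""
    if distance_m <= full_m:
        return 1.0
    if distance_m >= zero_m:
        return 0.0
    return (zero_m - distance_m) / (zero_m - full_m)


# Two lakes that differ by only 2 m in distance to the glacier terminus.
for distance in (499.0, 501.0):
    print(
        f"{distance:.0f} m: crisp = {crisp_score(distance):.2f}, "
        f"graded = {graded_score(distance):.3f}"
    )
```

Under the crisp rule the two nearly identical lakes fall into opposite classes, whereas the graded rule assigns them almost the same score; neither form is correct by default, which is why the choice should be stated and tested.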
5.5.3. RQ3: Why Is Uncertainty and Robustness Analysis Rarely Incorporated in MCDA-Based Glacier Hazard Assessments?
The treatment of uncertainty is the least developed aspect of current practice, a finding starkly illustrated by the quality assessment, where “Uncertainty Analysis” received a mean score of just 0.23/2.0, with a mere 5% of studies fully reporting any such analysis (
Table 4). Most studies do not explicitly model the multiple uncertainties inherent in hazard assessment—from data inaccuracies and expert disagreement to the inherent variability of natural processes. Instead, uncertainty is addressed only implicitly, if at all, through expert judgement during weighting or qualitative interpretation of the final map.
This neglect extends to validation. While 35% of studies report quantitative predictive validation, this rate has not improved over the decade, and 65% rely on qualitative comparisons or no validation at all (
Table 8 and
Figure 9). The widespread reporting of AHP consistency ratios—an internal measure of logical coherence—as a proxy for quality exemplifies a critical confusion between a model’s internal consistency and its external, predictive validity. A model can be perfectly coherent yet bear no relation to reality. This pattern suggests that the field predominantly produces “plausible representations”—maps that look reasonable to experts—rather than empirically tested and validated models of hazard.
5.5.4. RQ4: Which Methodological Improvements (e.g., Sensitivity Analysis, Comparative Modelling, Hybrid or Ensemble MCDA) Could Enhance the Reliability of Glacier Hazard Assessments?
The review identifies a clear separation between methodological innovation and routine practice. While a subset of studies explores promising enhancements—including hybrid MCDA, integration with machine learning, probabilistic frameworks, and scenario analysis—these remain isolated, proof-of-concept exercises rather than consolidated into standard practice.
The barriers to routine adoption are likely multifaceted, including data scarcity, a lack of accessible, user-friendly tools for robustness analysis, and the absence of community-agreed reporting standards. This fragmented landscape means that the field has yet to transition from widespread operational adoption of a single, simple method to a mature practice where reliability is systematically evaluated. Future progress hinges on moving beyond the production of more hazard maps. The priority must shift to developing and standardising robust, uncertainty-aware frameworks. Key directions include establishing systematic sensitivity analysis as a mandatory practice, fostering head-to-head comparisons of multiple MCDA methods on shared benchmark datasets, and exploring ensemble approaches that can provide more stable and reliable outputs by combining the strengths of different models and reducing dependence on any single set of assumptions.
5.5.5. Evaluating the Central Proposition
Regarding the claim put forward in
Section 1—that the predominance of AHP stems from its operational convenience rather than its predictive accuracy—the combined evidence offers consistent support across four empirical tests.
First, the choice of method is largely independent of the type of hazard. AHP is used in a majority of GLOF (54.2%), landslide (73.3%), and multi-hazard (57.1%) assessments, with a maximum difference between domains of only 19 percentage points (
Table 6). If problem-specific fit drove method selection, greater methodological variation would be expected across physically distinct hazard processes.
Second, comparative testing against alternative MCDA frameworks is almost absent. Alternatives to standard AHP (fuzzy AHP, BWM, TOPSIS) appear in only 10% of studies (6/60), and no study systematically compares multiple decision models on identical data (
Table 7). This absence suggests that researchers adopt AHP by default rather than through explicit model selection.
Third, quantitative predictive validation remains limited (35.0%) and has not improved over the decade (
Table 8,
Figure 9). If predictive accuracy were the primary driver of method choice, validation rates would be expected to rise over time as the field matures; they have not.
Fourth, sensitivity and robustness analyses—which test whether the results depend on subjective weight choices—are reported in only 5% of the studies (
Table 4). This omission is consistent with a practice that treats weights as procedural inputs rather than as testable assumptions.
The proposition is, therefore, supported: current MCDA practice in glacier hazard assessment is method-centred rather than problem-centred, and the dominance of AHP reflects operational convenience more than demonstrated predictive superiority. This conclusion does not imply that AHP is invalid for glacier hazards—only that its widespread adoption is not empirically justified by the evidence base and that the field would benefit from greater methodological diversity, validation, and robustness testing.
Table 14 summarises the principal findings of the cross-study synthesis organised by research question. Rather than reporting individual case-study results, the table consolidates the recurring methodological patterns observed in the reviewed literature and interprets their implications for the reliability of MCDA-based glacier hazard assessment. For each research question, the table links empirical evidence (what studies actually do) with its analytical interpretation (what this behaviour suggests) and its practical significance (why it matters for decision-support and risk management). This structured synthesis provides a concise bridge between the descriptive results and the critical discussion that follows.
5.6. Risk of Bias in Studies
The included studies correspond primarily to modelling studies and applied case studies, rather than to experimental or intervention-based research. Therefore, conventional clinical risk-of-bias instruments were not applicable.
The methodological assessment indicated that most studies clearly described their objectives and applied recognised MCDA techniques. However, variability was observed in the level of methodological transparency, particularly in terms of weighting procedures, uncertainty treatment, and validation of results.
5.7. Results of Individual Studies
The included studies do not report directly comparable quantitative effect measures. Instead, most papers present decision-support outputs derived from spatial multi-criteria decision analysis (MCDA) models.
The most common outputs are hazard susceptibility maps that classify terrain into categories such as low, moderate, and high hazard. Several studies also provide prioritisation rankings of potentially dangerous glacial lakes or composite susceptibility indices representing relative hazard levels. These outputs are typically produced through weighted overlay procedures implemented within geographic information systems.
Validation practices vary substantially across studies. Quantitative predictive validation (performance evaluation against hazard observations using statistical metrics) was reported in 21 of the 60 reviewed studies (35.0%). These studies employed statistical performance indicators such as receiver operating characteristic (ROC) curves, area under the curve (AUC), accuracy metrics, or confusion-matrix measures. The remaining 39 studies (65.0%) either relied on qualitative evaluation approaches, including visual agreement with known hazardous locations, comparison with historical events, or expert judgement, or reported no validation at all. Within this group, 11 studies (18.3%) reported no explicit validation procedure.
Some papers reported internal consistency checks, particularly the Analytic Hierarchy Process (AHP) consistency ratio. Among the 48 studies using AHP-based approaches, 32 (66.7%) reported consistency ratios, with most values below the recommended threshold of 0.10. However, these measures assess the logical coherence of expert weighting rather than the predictive performance of the hazard model and were therefore not classified as quantitative validation.
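For reference, the consistency ratio can be computed as in the sketch below, which estimates the principal eigenvalue by power iteration and divides the consistency index by Saaty's commonly tabulated random index. The pairwise matrices are hypothetical, and the function names are introduced here for illustration. The point of the example is that the check uses only the judgement matrix: no hazard observation enters the calculation.

```python
# Saaty's commonly cited random index (RI) values by matrix size.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(aw)  # w sums to one, so sum(A w) estimates lambda_max
        w = [x / lam for x in aw]
    return lam


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("random index tabulated here only for n = 3..10")
    ci = (principal_eigenvalue(matrix) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# HYPOTHETICAL judgement matrices: one nearly consistent, one strongly cyclic.
coherent = [[1.0, 3.0, 5.0], [1 / 3, 1.0, 2.0], [1 / 5, 1 / 2, 1.0]]
cyclic = [[1.0, 9.0, 1 / 9], [1 / 9, 1.0, 9.0], [9.0, 1 / 9, 1.0]]

for name, m in (("coherent", coherent), ("cyclic", cyclic)):
    cr = consistency_ratio(m)
    verdict = "acceptable" if cr < 0.10 else "revise judgements"
    print(f"{name}: CR = {cr:.3f} ({verdict})")
```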
Overall, the reported outcomes represent spatial decision-support classifications rather than directly calibrated predictive risk estimates. Because the outputs are heterogeneous in form and lack common quantitative effect measures, statistical aggregation or meta-analysis is not appropriate. Accordingly, the evidence was synthesised using structured qualitative and quantitative descriptive analyses presented in the preceding subsections.
5.8. Reporting Biases
Formal statistical methods for detecting publication bias (e.g., funnel plots or trim-and-fill procedures) were not applicable because the review did not synthesise quantitative effect estimates. The outcomes analysed consist mainly of spatial susceptibility maps and classified hazard zones, which cannot be aggregated into comparable statistical effect sizes.
Nevertheless, several potential sources of reporting bias were identified. First, the review included only peer-reviewed publications written in English, which may introduce language and publication bias by excluding relevant studies reported in local or regional outlets. Second, a clear geographical concentration of studies was observed, with a large proportion conducted in High Mountain Asia (particularly the Himalaya, Karakoram, and Tibetan Plateau). This regional dominance likely reflects both the high exposure to glacial hazards and unequal research capacity and data availability across world regions. Consequently, glacierised regions in South America, Europe, and other mountain systems are comparatively under-represented in the literature.
Another potential bias arises from the selective reporting of positive or plausible hazard assessments. Many studies emphasise the successful identification of hazardous areas while providing limited discussion of model limitations, failed predictions, or alternative classifications. In addition, validation datasets are often scarce, which may favour confirmation of expected hazard patterns.
These factors were considered when interpreting the results of the synthesis, and therefore the conclusions are framed as representative of current published research practices rather than exhaustive evidence of all MCDA applications to the assessment of glacier hazards.
5.9. Certainty of Evidence
A formal certainty-of-evidence framework designed for intervention studies (e.g., GRADE) was not applicable, as the included literature does not evaluate clinical or experimental effects but instead reports methodological applications and spatial modelling practices. The reviewed studies primarily present hazard susceptibility models, decision-support frameworks, and case-specific assessments rather than comparable outcome measures.
Confidence in the evidence was therefore assessed qualitatively. Several consistent patterns were observed across independent studies, including the predominant use of AHP-based weighting schemes, the integration of MCDA within GIS environments, and the reliance on terrain and remote-sensing variables for hazard assessment. The recurrence of these methodological practices across different study areas and research groups supports moderate confidence in the generalisability of these observations.
However, the certainty of evidence is limited by variations in methodological validation. Many studies rely on expert judgement and qualitative comparison with known hazardous locations, while only a subset employs quantitative validation metrics such as ROC curves, prediction accuracy, or event-based verification. In addition, the geographical concentration of the studies and the limited treatment of uncertainty reduce confidence in the robustness of hazard classifications.
Overall, the findings should be interpreted as a reliable characterisation of prevailing research practices rather than definitive evidence regarding the predictive accuracy of MCDA models for glacier hazard assessment.
6. Discussion
This systematic review set out not merely to catalogue applications of multi-criteria decision analysis (MCDA) in glacier hazard assessment but to interrogate the methodological foundations upon which these applications rest. In doing so, we found ourselves confronting a field that is, in many respects, at a crossroads: widely adopted yet methodologically constrained, operationally useful yet empirically under-verified.
Across the 60 studies analysed, a consistent pattern emerges. The dominance of AHP-based approaches (80%) combined with the limited use of quantitative validation (35%) suggests not simply a methodological preference, but a deeper imbalance. As a community, we appear to have embraced the practicality and accessibility of MCDA without developing, at the same pace, the evidentiary standards required to substantiate its outputs.
These patterns are not accidental. They reflect the structural realities of glacier hazard research: limited monitoring infrastructure, urgent decision-making contexts driven by climate risk, and the need to produce actionable outputs for planners and stakeholders. MCDA—and AHP in particular—responds effectively to these constraints. It enables the integration of heterogeneous data and expert knowledge into interpretable spatial products.
Yet this convenience has a cost. What emerges is a kind of methodological comfort zone—a familiar pathway from data to hazard map that avoids both the data demands of process-based models and the statistical complexity of data-driven approaches. The question we are left with is not whether MCDA is useful—it clearly is—but whether the knowledge it produces is being interpreted in ways that exceed what the underlying models can support. In other words, are we generating reliable insights, or increasingly convincing representations that remain only weakly grounded in empirical evidence?
6.1. The Epistemic Status of Hazard Maps: Between Measurement and Interpretation
In reflecting on these findings, we are led to reconsider what MCDA-generated hazard maps actually represent. At first glance, these maps resemble measurements: spatially explicit outputs with clear boundaries separating low-, moderate-, and high-hazard zones. The visual language is one of precision and objectivity.
However, our analysis suggests a different interpretation. These maps are better understood as formalised expert judgements rendered spatial. They are not direct measurements of hazard processes, but structured interpretations based on selected criteria, assigned weights, and aggregation rules.
This interpretation is not merely descriptive; it has a well-established foundation in decision science and the philosophy of science. Following Funtowicz and Ravetz [37], glacier hazard assessment operates in the domain of post-normal science, where facts are uncertain, values are in dispute, stakes are high, and decisions are urgent. In such contexts, the traditional distinction between fact (objective measurement) and value (subjective judgement) collapses. Hazard maps produced via MCDA are better understood as formalised expert judgements rendered spatial, a concept derived from structured expert judgement theory [38]. They embed prior assumptions about criteria relevance, weighting, and aggregation that are not empirically derived but are nonetheless consequential for outcomes. The map is not a window onto nature; it is a constructed artefact that synthesises empirical data with subjective inputs under conditions of uncertainty. Recognising this is not a weakness of MCDA; it is an honest characterisation of what the method actually does. The problem arises only when these artefacts are presented or interpreted as empirical measurements rather than as disciplined interpretations. This distinction is not merely semantic; it is epistemic. A measurement can, in principle, be validated against independent observations. An interpretation, by contrast, can only be assessed in terms of the plausibility of its assumptions and the coherence of its construction. The widespread reliance on expert-derived weights in AHP means that each hazard map embeds a set of prior judgements about the relative importance of conditioning factors. These judgements are indispensable in data-scarce environments, but they are not empirical observations.
The problem arises when these interpretations are presented, and received, as empirical findings. From our perspective, this is where the tension becomes most visible. Only a minority of studies test the stability of their results under alternative weighting configurations, and fewer still validate their classifications against observed hazard events. As a result, many hazard maps are internally consistent—often supported by AHP consistency ratios—yet externally unverified.
The apparent precision of these maps can therefore be misleading. Their clean boundaries and categorical distinctions convey certainty, but that certainty often resides in the structure of the model rather than in evidence about the world. Making this distinction explicit is essential if these tools are to be used responsibly.
6.2. The Reliability Gap: Precision Without Verification
Our findings reveal a striking asymmetry. While aspects such as criteria definition and reproducibility are relatively well addressed, uncertainty analysis and validation remain markedly underdeveloped. Only a small fraction of studies conduct systematic uncertainty analysis, and even fewer attempt robust empirical validation.
What we term the reliability gap is not simply a technical shortcoming; it reflects a deeper disconnect between the apparent definitiveness of hazard maps and the fragility of their empirical grounding. MCDA produces outputs that are precise—deterministic, clearly delineated, and easily interpretable. But precision should not be confused with accuracy.
In many cases, the visual clarity of the resulting maps obscures the uncertainties inherent in the modelling process: uncertainty in input data, in criteria selection, in weighting schemes, and in aggregation assumptions. When sensitivity analysis is absent, it becomes impossible to assess how stable these classifications are. When validation is missing, it is equally impossible to determine whether areas classified as high hazard correspond to observed events.
The stagnant validation rate (approximately 35% throughout the decade, despite a nearly sevenfold increase in annual publication volume) invites a diagnosis that goes beyond technical constraints. In the current incentive structure of the geosciences, new hazard maps are rewarded more readily than the costly, time-consuming, and less glamorous work of empirical validation. Producing a susceptibility map requires remote sensing data, GIS skills, and expert elicitation, all tasks that fit within a typical PhD or a 2–3 year research project. Validation, by contrast, requires access to independent, often incomplete historical event inventories, long-term monitoring data, and the willingness to report when a model performs poorly (a publication risk). This imbalance is a textbook example of publication bias and, at a deeper level, of misaligned incentives. Journals request validation but seldom reject manuscripts that lack it; reviewers ask for sensitivity analyses yet frequently tolerate their omission. Unless validation is treated as a mandatory prerequisite for publication rather than an optional virtue, the reliability gap will remain, no matter how much methods improve.
The implications for risk governance are immediate and unsettling. When AHP-GIS hazard maps that have not been validated are presented as ready-to-use assessments—without clearly conveying their uncertainty or reporting performance measures—they can foster a misleading sense of confidence among policymakers, emergency managers, and exposed communities. A polished map with sharp hazard zones suggests accuracy, yet if that apparent precision is not empirically grounded, decisions about resource allocation may be driven by artefacts of expert opinion rather than by actual environmental dynamics. This is not a rejection of MCDA; it is a call for careful, disciplined interpretation. Hazard maps should be presented for what they truly are: structured conjectures, not verified forecasts. Until validation becomes standard practice, such maps are most appropriate for guiding field surveys and framing discussion, rather than serving as the definitive basis for land-use planning or early-warning system design.
In our view, this gap is particularly problematic because it remains largely invisible. The map communicates certainty, but that certainty reflects the internal logic of the model rather than its empirical adequacy. As we examined these studies collectively, what became apparent was not a lack of methodological effort, but a lack of alignment between what these models are capable of demonstrating and the claims often made about them.
This situation aligns closely with what has been described as post-normal science, where decisions must be made under conditions of uncertainty, high stakes, and incomplete knowledge. In such contexts, the distinction between fact and judgement becomes blurred. The weights embedded in MCDA models are not merely technical parameters—they are expressions of priorities and assumptions. A mature methodological practice would make these assumptions explicit and subject them to systematic scrutiny.
6.3. Geographical Concentration and the Limits of Generality
The strong geographical concentration of studies in High Mountain Asia (80%) is understandable, given the region’s exposure to glacier-related hazards. However, it also raises important methodological questions.
Models developed within a specific geographical context inevitably reflect the environmental, data, and institutional conditions of that context. Criteria selection, weighting strategies, and validation practices are all shaped by local conditions. When such models are implicitly treated as generalisable, there is a risk that context-specific assumptions become normalised as universal practice.
We suggest that the field may be developing not only a methodological concentration (around AHP), but also a geographical concentration that reinforces it. This dual concentration limits the diversity of modelling approaches and constrains the development of more context-sensitive practices.
6.4. Toward a Different Kind of Practice
If we take these findings seriously, then the question is not whether MCDA should continue to be used but how it should be used more responsibly.
First, validation must become a central component of practice. The current situation, in which only a minority of studies report quantitative validation, is difficult to justify in a field that informs high-stakes decisions. This requires not only improved reporting standards, but also investment in hazard inventories and monitoring systems that enable empirical testing.
Second, uncertainty must be treated as an object of analysis rather than a secondary concern. Simple approaches—such as systematic variation of weights, scenario analysis, and multi-model comparison—can significantly improve the interpretability and robustness of results. These practices do not require abandoning MCDA, but extending it.
Third, methodological diversity should be encouraged. The dominance of AHP appears to reflect convenience and familiarity rather than demonstrated superiority. Comparative studies applying different MCDA methods to the same datasets would provide valuable insights into how modelling assumptions influence outcomes.
We do not argue for abandoning MCDA. Rather, we argue for using it with greater methodological discipline and interpretive caution. The challenge is not only technical, but conceptual: recognising what these models can legitimately claim, and where their limits lie.
6.5. Reframing the Role of MCDA in Hazard Science
Ultimately, this review invites a reframing of the role of MCDA in glacier hazard assessment. Its strength lies not in predicting hazard events, but in structuring decisions under conditions of complexity and incomplete information.
When used appropriately, MCDA provides a transparent framework for integrating diverse sources of evidence and supporting deliberation among stakeholders. It is a tool for organising knowledge and facilitating decision-making—not a substitute for empirical modelling of hazard processes.
Recognising this distinction is not a limitation but a clarification. It allows MCDA to be used more effectively, as a decision-support framework that complements, rather than replaces, empirical approaches.
In contrast to MCDA, many AI-based approaches introduce additional challenges related to interpretability and explainability, reinforcing the need for robust validation frameworks.
Addressing the reproducibility crisis in glacier hazard MCDA will ultimately require community-agreed reporting standards—analogous to PRISMA for systematic reviews or FAIR for data—mandating disclosure of pairwise matrices, sensitivity results, validation metrics, and code.
6.6. Limitations in Context
This review has its own limitations. It reflects current practice as reported in the literature, rather than the full range of possible methodological developments. The geographical concentration of studies influences the patterns observed, and the focus on peer-reviewed publications excludes practitioner knowledge and grey literature.
However, these limitations do not undermine the central insight of this study. If anything, they reinforce it. What we observe is not a lack of methodological sophistication, but a misalignment between the ambitions of the field and the evidentiary foundations on which those ambitions rest.
Closing this gap—between representation and validation, between precision and evidence—remains a central challenge for future research in glacier hazard assessment.
6.7. Summary
The preceding sections have highlighted a consistent pattern: strong methodological uptake combined with limited validation, minimal uncertainty analysis, and a narrow geographical concentration. While these observations point to a clear reliability gap, they also raise a practical question—what would a more robust and defensible practice look like?
The reliability of glacier hazard assessments is not merely an academic concern; it is directly relevant to international policy frameworks. Improved validation and uncertainty quantification in MCDA-based susceptibility mapping contribute concretely to two Sustainable Development Goals. SDG 13 (Climate Action), Target 13.1, calls for strengthening resilience and adaptive capacity to climate-related hazards and disasters. Defensible hazard maps (those with documented validation, sensitivity analysis, and uncertainty bounds) provide the evidential basis for early warning systems and climate adaptation planning. SDG 11 (Sustainable Cities and Communities), Target 11.5, aims to reduce disaster impacts on people and infrastructure. Unreliable or overconfident hazard maps undermine this goal by misdirecting mitigation investments. Thus, the methodological improvements we advocate (systematic validation, robustness testing, and transparent uncertainty reporting) are not technical niceties but prerequisites for evidence-based disaster risk reduction aligned with global commitments.
To make this transition explicit, Table 15 synthesises our findings alongside a set of concrete methodological implications. Rather than serving as a prescriptive checklist, the table is intended as a structured reflection of current practice and a starting point for improving how MCDA is applied and interpreted in glacier hazard assessment.
Figure 12 visualises the core message of this review through a simple but powerful lens. The horizontal axis captures evidential support—the degree to which models are validated and their uncertainties characterised—where we found that only 35% of studies report quantitative validation and just 5% fully address uncertainty. The vertical axis represents methodological uptake, where 80% of studies rely on AHP-based approaches, indicating strong adoption despite limited evidence.
The resulting position of current practice—high uptake and low evidence—defines what we call the reliability gap. The diagonal line represents the ideal trajectory toward the upper-right quadrant, where models that are widely used are also those that have been rigorously tested. Closing this gap is the central challenge for the next generation of research, requiring movement not along one axis alone but along both simultaneously: maintaining the interpretability and accessibility that make MCDA valuable while building the empirical infrastructure that validation and uncertainty quantification demand.
8. Future Work
The synthesis of the reviewed literature indicates that future research should focus on transforming MCDA from a descriptive mapping procedure into a validated analytical framework for glacier hazard assessment. The priority is, therefore, not the development of additional applications, but the improvement of methodological reliability. Below, we outline a concrete research agenda organised around five interconnected priorities, each with specific methodological directions, testable hypotheses, and pathways to implementation.
8.1. Systematic Validation Protocols
A central research need concerns the systematic validation of MCDA-based hazard classifications. Most current studies produce susceptibility maps without testing predictive performance against independent hazard occurrence data. Future work should establish validation as a non-negotiable component of MCDA practice through:
Independent event databases: Assemble and maintain open-access inventories of documented GLOF, landslide, and avalanche events, with standardised metadata on location, timing, magnitude, and impact. Such databases would serve as test beds for evaluating predictive skill across regions and methods.
Temporal back-testing protocols: Develop standardised procedures for testing whether historically documented events fall within retrospectively classified high-hazard zones. This requires consistent rules for defining temporal cutoffs (e.g., training on pre-2000 data, testing on post-2000 events) and spatial buffers for event representation.
Cross-regional transferability testing: Design experiments that apply MCDA models calibrated in one region (e.g., High Mountain Asia) to test sites in other glaciated environments (Andes, Alps, Caucasus). Such tests would reveal which methodological choices are region-specific and which generalise across contexts.
Predictive performance benchmarks: Establish community-agreed metrics for evaluating MCDA outputs, including the area under the receiver operating characteristic curve (AUC-ROC), precision–recall curves, true skill statistics, and cost-sensitive measures that account for the asymmetric consequences of false positives versus false negatives in hazard contexts.
Validation should move from an optional supplement to a required practice, with journals mandating evidence of predictive skill as a condition of publication and funding agencies supporting the monitoring infrastructure that makes validation possible.
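The back-testing and benchmarking ideas above can be sketched in a few lines. The example below is illustrative only: the cells, the year-2000 cutoff, the 0.5 threshold, and the 10:1 cost ratio for missed events versus false alarms are all hypothetical assumptions chosen to show the mechanics, and the susceptibility scores are assumed to come from a map built exclusively with pre-cutoff data.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    susceptibility: float   # score from a map built with pre-cutoff data only
    event_years: tuple      # years of documented events in this cell


def backtest(cells, cutoff_year, threshold, cost_fn=10.0, cost_fp=1.0):
    """Score a retrospective map against events observed after the cutoff.

    Returns the true skill statistic (sensitivity + specificity - 1) and a
    cost-weighted error in which a missed event costs more than a false alarm.
    """
    tp = fp = tn = fn = 0
    for cell in cells:
        predicted_high = cell.susceptibility >= threshold
        observed = any(year > cutoff_year for year in cell.event_years)
        if predicted_high and observed:
            tp += 1
        elif predicted_high:
            fp += 1
        elif observed:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("test period needs both event and non-event cells")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "tss": sensitivity + specificity - 1.0,
        "cost": cost_fn * fn + cost_fp * fp,
    }


# Hypothetical cells; events before the cutoff do not count as test outcomes.
cells = [
    Cell(0.9, (2005,)), Cell(0.8, (1995,)), Cell(0.7, (2010,)),
    Cell(0.4, (2015,)), Cell(0.3, ()), Cell(0.2, ()), Cell(0.1, (1990,)),
]
result = backtest(cells, cutoff_year=2000, threshold=0.5)
print(result)
```

Reporting the cost-weighted error alongside the true skill statistic makes the asymmetry between false negatives and false positives explicit rather than leaving it implicit in the choice of threshold.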
8.2. Robustness and Uncertainty Quantification
The review demonstrates that hazard classifications are highly sensitive to criteria weighting, yet fewer than 5% of studies perform systematic uncertainty analysis. Future work should treat uncertainty as an object of analysis rather than an inconvenience to be bracketed:
Multi-scenario sensitivity analysis: Implement systematic weight variation protocols that test classification stability across the full range of plausible expert judgements. Rather than reporting a single hazard map, studies should present sensitivity maps showing the proportion of weighting scenarios in which each location is classified as high-hazard, or ensemble maps displaying the median and interquartile range of susceptibility scores across weight perturbations.
Probabilistic weighting schemes: Replace deterministic weights with probability distributions elicited from multiple experts, then propagate this uncertainty through Monte Carlo simulation to generate probabilistic hazard classifications. Methods such as stochastic multi-criteria acceptability analysis (SMAA) [39, 40] are well developed in the decision sciences but rarely applied in glacier hazard contexts.
Bayesian approaches to expert elicitation: Develop structured protocols for eliciting expert judgements that quantify not only central tendencies but also uncertainty and inter-expert disagreement. Hierarchical Bayesian models can then combine these judgements with empirical data, automatically down-weighting uncertain or discordant inputs.
Fuzzy and interval-based methods: Where probability distributions cannot be reliably specified, fuzzy membership functions or interval weights can represent imprecise knowledge. Future work should compare the performance of probabilistic, fuzzy, and interval approaches on common benchmark datasets to establish guidance for method selection under different data availability scenarios.
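A minimal sketch of the sensitivity-map idea follows, assuming a weighted linear combination and weights drawn from a Dirichlet distribution centred on a hypothetical expert consensus. It is a Monte Carlo weight perturbation in the spirit of SMAA, not a full SMAA implementation; the criteria values, concentration parameters, and threshold are illustrative assumptions.

```python
import random


def sample_weights(rng, concentration):
    """Draw one weight vector from a Dirichlet distribution by normalising
    independent gamma variates; the weights are positive and sum to 1."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def high_hazard_frequency(criteria, concentration, threshold,
                          n_scenarios=2000, seed=0):
    """Proportion of weighting scenarios in which each location's weighted
    linear score reaches the high-hazard threshold."""
    rng = random.Random(seed)
    counts = [0] * len(criteria)
    for _ in range(n_scenarios):
        weights = sample_weights(rng, concentration)
        for i, row in enumerate(criteria):
            score = sum(w * x for w, x in zip(weights, row))
            if score >= threshold:
                counts[i] += 1
    return [c / n_scenarios for c in counts]


# Hypothetical normalised criteria (0 to 1, higher = more hazardous).
criteria = [
    [0.90, 0.90, 0.90],   # uniformly high: robustly high hazard
    [0.10, 0.20, 0.10],   # uniformly low: never high hazard
    [0.95, 0.20, 0.30],   # one dominant factor: depends on the weights
]
# Concentration (6, 3, 1) gives mean weights of 0.6, 0.3 and 0.1.
frequency = high_hazard_frequency(criteria, concentration=[6.0, 3.0, 1.0],
                                  threshold=0.6)
print(frequency)
```

Locations with a frequency near 0 or 1 are classified the same way under almost any plausible weighting, whereas intermediate values flag classifications that hinge on the expert weights.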
8.3. Comparative Method Evaluation
Although AHP dominates current practice, very few studies compare alternative MCDA methods using identical datasets. Such comparisons are essential for moving method selection from convenience to evidence:
Controlled benchmarking experiments: Design studies that apply multiple decision models (including AHP, fuzzy AHP, TOPSIS, and outranking approaches such as ELECTRE and PROMETHEE) to identical criteria layers and spatial inputs. Outputs should be compared not only in terms of final hazard classifications but also in terms of sensitivity to input perturbations, stability under weight variation, and computational requirements.
Method–hazard fit assessment: Develop theoretical frameworks for matching MCDA methods to hazard types based on their mathematical properties. For instance, do compensatory methods like AHP systematically overestimate hazard in locations with one extremely unfavourable factor? Are outranking methods more appropriate when criteria are strongly interdependent? Such questions require systematic investigation.
Multi-method consensus analysis: Explore whether locations consistently classified as high hazard across multiple MCDA methods provide more reliable targets for mitigation than locations identified by any single method. This would establish empirical grounds for recommending methodological pluralism in high-stakes decisions.
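As a sketch of what such a comparison might look like, the example below applies a weighted linear combination and TOPSIS to the same hypothetical criteria and weights, then intersects their top-ranked sites. It assumes every criterion is oriented so that higher values mean greater hazard; the site values and weights are invented for illustration.

```python
import math


def wlc_scores(rows, weights):
    """Weighted linear combination: fully compensatory aggregation."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def topsis_scores(rows, weights):
    """TOPSIS closeness to the most hazardous profile, using vector
    normalisation and treating every criterion as 'higher = more hazard'."""
    k = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in rows)) or 1.0
             for j in range(k)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(k)]
                for row in rows]
    ideal = [max(r[j] for r in weighted) for j in range(k)]
    anti = [min(r[j] for r in weighted) for j in range(k)]
    scores = []
    for r in weighted:
        d_pos, d_neg = math.dist(r, ideal), math.dist(r, anti)
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring sites."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


# Hypothetical sites x criteria, already scaled to 0..1.
sites = [
    [0.9, 0.8, 0.9],   # high on every criterion
    [0.1, 0.1, 0.2],   # low on every criterion
    [1.0, 0.1, 0.1],   # one extreme factor only
    [0.5, 0.6, 0.5],   # moderate throughout
]
weights = [0.5, 0.3, 0.2]

wlc = wlc_scores(sites, weights)
topsis = topsis_scores(sites, weights)
consensus = top_k(wlc, 2) & top_k(topsis, 2)
print(wlc, topsis, consensus)
```

Sites that appear in the consensus set are ranked highly regardless of the aggregation rule, whereas sites that enter only one list indicate where method choice, rather than the data, drives the classification.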
8.4. Ensemble Frameworks and Machine Learning Integration
Reliance on a single decision model can limit robustness when criteria or data are uncertain. Future research should develop hybrid and ensemble approaches that combine the interpretability of MCDA with the predictive power of machine learning.
Before detailing ensemble and hybrid approaches, it is worth clarifying the distinctive role that MCDA can play in an era increasingly dominated by AI and machine learning. The emergence of black-box predictive models—random forests, gradient boosting, convolutional neural networks—has not rendered MCDA obsolete. Rather, it has clarified MCDA’s complementary strengths. First, MCDA offers full interpretability: every weight, every pairwise comparison, and every aggregation step is transparent and traceable, unlike the latent representations of deep learning. Second, MCDA can incorporate qualitative expert knowledge and stakeholder values directly into the decision structure—something that purely data-driven models cannot do without post-hoc translation. Third, MCDA serves as an interpretable baseline against which black-box models can be benchmarked: if a neural network does not outperform a properly validated AHP model, the added complexity is difficult to justify. Fourth, MCDA functions as a modular component in hybrid workflows—for example, using convolutional neural networks for automated feature extraction from satellite imagery, followed by MCDA for transparent hazard prioritisation. In this review, we, therefore, treat MCDA not as a competing paradigm to AI but as a decision-structuring framework whose reliability must be established on its own terms before it can be meaningfully integrated with or compared against data-driven methods.
Random Forest ensembles of MCDA outputs: A particularly promising direction involves treating multiple MCDA models as an ensemble, analogous to Random Forest in machine learning. Rather than selecting a single weighting scheme or decision method, researchers could:
- Generate a large ensemble of plausible hazard maps by varying: (i) criteria weights across expert-elicited ranges, (ii) aggregation rules (additive, multiplicative, outranking), (iii) classification thresholds, and (iv) input data sources or resolutions.
- Train a random forest classifier on this ensemble, using locations with documented hazard events as training labels, to learn which combinations of model outputs are most predictive.
- The resulting meta-model would retain interpretability (each base model is a transparent MCDA formulation) while achieving the predictive performance associated with ensemble methods.
This approach directly addresses the core problem identified in this review: the gap between operational uptake and predictive verification.
Machine learning for weight calibration: Use documented hazard events to learn optimal criteria weights from data, rather than relying solely on expert judgement. Methods such as logistic regression, support vector machines, or neural networks can be trained to predict event occurrence from the same criteria used in MCDA models. The learned weights can then be compared with expert-derived weights, and discrepancies can inform iterative refinement of both models and expert understanding.
Hybrid MCDA–ML workflows: Develop pipelines in which machine learning handles tasks MCDA does poorly (automatic feature extraction from remote sensing imagery, pattern recognition in time series) while MCDA handles tasks ML does poorly (incorporating qualitative expert knowledge, making trade-offs explicit, supporting stakeholder deliberation). For example, convolutional neural networks could identify potentially dangerous glacial lakes from satellite imagery, while MCDA prioritises them for ground-based monitoring based on expert-elicited criteria.
Interpretable ML as MCDA alternative: Explore whether inherently interpretable machine learning methods—such as decision trees, rule-based classifiers, or explainable boosting machines—can serve as alternatives to MCDA, combining predictive performance with the transparency that hazard managers require.
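The weight-calibration idea can be illustrated with a small logistic regression fitted by gradient descent. Everything in this sketch is an assumption made for illustration: the data are synthetic (generated from known coefficients so the recovery can be checked), the expert weights are hypothetical, and converting coefficients to weights by clipping negatives and normalising is one simple convention among several.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=2.0, iterations=800):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, k = len(rows), len(rows[0])
    coefs, intercept = [0.0] * k, 0.0
    for _ in range(iterations):
        grad, grad0 = [0.0] * k, 0.0
        for row, y in zip(rows, labels):
            linear = intercept + sum(c * x for c, x in zip(coefs, row))
            error = sigmoid(linear) - y
            grad0 += error
            for j in range(k):
                grad[j] += error * row[j]
        intercept -= learning_rate * grad0 / n
        coefs = [c - learning_rate * g / n for c, g in zip(coefs, grad)]
    return coefs, intercept


def to_weights(coefs):
    """Clip negative coefficients to zero and normalise to sum to 1."""
    clipped = [max(c, 0.0) for c in coefs]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("no positive coefficients to normalise")
    return [c / total for c in clipped]


# Synthetic data: three criteria in 0..1, events drawn from known coefficients.
rng = random.Random(0)
true_coefs = [6.0, 2.0, 0.5]
rows = [[rng.random() for _ in true_coefs] for _ in range(300)]
labels = [
    1 if rng.random() < sigmoid(-4.0 + sum(c * x for c, x in zip(true_coefs, r)))
    else 0
    for r in rows
]

coefs, intercept = fit_logistic(rows, labels)
learned_weights = to_weights(coefs)
expert_weights = [0.3, 0.5, 0.2]   # hypothetical elicited weights
discrepancy = [l - e for l, e in zip(learned_weights, expert_weights)]
print(learned_weights, discrepancy)
```

Large discrepancies between learned and elicited weights do not show that either is wrong; they mark the criteria on which expert belief and the event record disagree and where further elicitation or data collection is most informative.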
8.5. Infrastructure for Reproducible Research
The reproducibility crisis in environmental modelling [31, 32] has not spared glacier hazard MCDA. Future work should embed reproducibility into research practice:
Open-source software frameworks: Develop and maintain open-source toolkits (e.g., Python libraries, R packages) that implement MCDA methods with built-in sensitivity analysis, uncertainty quantification, and validation reporting. Such tools would lower the technical barrier to rigorous practice and ensure methodological consistency across studies.
Standardised reporting guidelines: Establish community guidelines for reporting MCDA-based hazard assessments, requiring disclosure of: (i) all pairwise comparison matrices, (ii) consistency ratios for all experts and aggregation levels, (iii) full sensitivity analysis results, (iv) validation metrics with confidence intervals, and (v) code and data sufficient for independent replication. Journals should mandate adherence to these guidelines.
Benchmark datasets and challenges: Create curated benchmark datasets with documented hazard events, high-quality criteria layers, and standardised train–test splits. Organise community challenges (e.g., “Predict the next GLOF in the Himalayas using MCDA or hybrid methods”) to accelerate methodological innovation and enable fair comparisons.
8.6. Georeferencing and Spatial Data Infrastructure
A specific but consequential gap in current practice concerns the lack of precise geolocation for case study sites. Most reviewed studies report only a general region (e.g., “Hunza Valley, Pakistan”) without providing decimal latitude-longitude coordinates for individual glacial lakes, hazard zones, or validation points. This omission limits the ability to aggregate data across studies, perform meta-analyses, or link findings to other spatial datasets (e.g., climate models, topographic indices, and land use). Assigning decimal latitude-longitude (dLL) coordinates to each study site would enable: (i) mapping of methodological patterns (e.g., which regions use which MCDA methods); (ii) spatial cross-validation (e.g., testing whether models calibrated in one region predict hazards in another); and (iii) semantic linking with open data repositories. Emerging workflows combining Large Language Models (LLMs) for information extraction with geospatial databases could semi-automate the georeferencing of existing studies. We therefore encourage future MCDA-based hazard assessments to report study site coordinates as standard practice, and we call for the development of a community-maintained, georeferenced database of MCDA hazard applications, akin to the GLOF database of Veh et al. [41] but focused on methodological metadata. This would transform the current corpus from a collection of isolated case studies into an interoperable, spatially explicit evidence base.
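To make the reporting recommendation above concrete, the sketch below shows one possible shape for a georeferenced study-site record. Everything in it is an assumption made for illustration: the field names, the `StudySite` class, the WGS84 datum, and the placeholder site and coordinates, which do not describe any real lake or reviewed study. It converts degrees, minutes, and seconds to decimal degrees, validates the ranges, and serialises the record as a GeoJSON point feature (which orders coordinates as longitude, then latitude).

```python
import json
from dataclasses import dataclass


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


@dataclass(frozen=True)
class StudySite:
    """Hypothetical minimal record for a georeferenced MCDA case study site."""
    study_id: str      # illustrative identifier, e.g. a citation key
    site_name: str
    hazard_type: str   # e.g. "GLOF", "debris flow"
    mcda_method: str   # e.g. "AHP"
    lat: float         # decimal degrees, WGS84 assumed
    lon: float         # decimal degrees, WGS84 assumed

    def __post_init__(self):
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError("latitude out of range: %r" % self.lat)
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError("longitude out of range: %r" % self.lon)

    def to_geojson_feature(self):
        """Return a GeoJSON Feature dict; coordinates are [lon, lat]."""
        return {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [self.lon, self.lat]},
            "properties": {
                "study_id": self.study_id,
                "site_name": self.site_name,
                "hazard_type": self.hazard_type,
                "mcda_method": self.mcda_method,
            },
        }


# Placeholder site: name and coordinates are invented for illustration only.
example_site = StudySite(
    study_id="example-2024",
    site_name="Example Lake",
    hazard_type="GLOF",
    mcda_method="AHP",
    lat=dms_to_decimal(36, 15, 0, "N"),
    lon=dms_to_decimal(74, 30, 0, "E"),
)

if __name__ == "__main__":
    print(json.dumps(example_site.to_geojson_feature(), indent=2))
```

A flat record of this kind is enough to support the three uses listed above, because each site can be joined to other spatial datasets by location and filtered by method or hazard type.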
8.7. Synthesis and Outlook
The research agenda outlined above responds directly to the limitations identified in this review. Validation moves from exception to norm; uncertainty from omission to analysis; method selection from convenience to evidence; MCDA from standalone tool to component of hybrid, ensemble, and machine-learning-assisted workflows.
The field stands at an inflection point. The glaciers are retreating, the hazards are intensifying, and the demand for actionable assessments will only grow. Meeting this demand requires not more maps but better science: maps that are tested, uncertainties that are quantified, methods that are compared, and results that are reproducible. The path forward is clear. What remains is the collective will to walk it.
The ensemble approach we propose—particularly the random forest aggregation of diverse MCDA outputs—offers a concrete starting point. It honours the interpretability that makes MCDA valuable while harnessing the predictive power that ensemble methods provide. We invite the community to take up this challenge: to build, test, and refine frameworks that combine the best of both worlds, and in doing so, to transform glacier hazard assessment from a craft of plausible representation into a science of accountable prediction.
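As a small, concrete instance of moving uncertainty "from omission to analysis", the sketch below perturbs a set of criterion weights at random and records how often each site ranks as most hazardous under weighted linear aggregation. It is a simplified Monte Carlo weight-sensitivity check, not the random forest ensemble proposed in this review. The site names, criterion scores, baseline weights, and the ±20% perturbation range are all hypothetical values chosen for illustration.

```python
import random


def weighted_score(values, weights):
    """Weighted linear aggregation of normalised criterion scores."""
    return sum(v * w for v, w in zip(values, weights))


def perturb_weights(weights, spread, rng):
    """Scale each weight by a uniform factor in [1 - spread, 1 + spread],
    then renormalise so the weights sum to 1 (requires 0 <= spread < 1)."""
    raw = [w * rng.uniform(1.0 - spread, 1.0 + spread) for w in weights]
    total = sum(raw)
    return [r / total for r in raw]


def top_rank_stability(sites, weights, spread=0.2, runs=1000, seed=0):
    """Fraction of perturbed runs in which each site has the highest score."""
    rng = random.Random(seed)  # fixed seed so the analysis is reproducible
    counts = {name: 0 for name in sites}
    for _ in range(runs):
        w = perturb_weights(weights, spread, rng)
        best = max(sites, key=lambda name: weighted_score(sites[name], w))
        counts[best] += 1
    return {name: c / runs for name, c in counts.items()}


# Hypothetical normalised criterion scores (0 to 1) for three placeholder sites.
sites = {
    "lake_A": [0.9, 0.4, 0.6],
    "lake_B": [0.5, 0.8, 0.7],
    "lake_C": [0.3, 0.3, 0.2],
}
baseline_weights = [0.5, 0.3, 0.2]  # illustrative expert-derived weights

if __name__ == "__main__":
    for name, share in top_rank_stability(sites, baseline_weights).items():
        print(name, share)
```

If the top-ranked site changes in a substantial share of runs, the ranking depends on the weighting assumptions rather than on the data, which is the kind of result a single-configuration map cannot reveal.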
9. Conclusions
Across sixty studies spanning a decade of research, a consistent and consequential pattern emerges. The field of glacier hazard assessment has embraced multi-criteria decision analysis with enthusiasm, yet the organising logic of this uptake is neither problem-centred nor evidence-driven. Rather, current practice is fundamentally method-centred: the same decision framework (AHP with weighted linear aggregation in a GIS environment) is applied to glacial lake outburst floods, landslides, debris flows, and avalanches, with minimal adaptation to the distinct physical mechanisms or decision contexts of each hazard type. Method selection is driven not by demonstrated predictive superiority or problem-specific fit, but by operational convenience: software availability (ArcGIS Weighted Overlay), low mathematical barriers, and institutional familiarity. The consequence is a methodological monoculture that produces visually compelling hazard maps while systematically deferring empirical validation, uncertainty quantification, and robustness testing. This is not a failure of individual researchers; it is a structural feature of the field’s current incentive landscape, software infrastructure, and publication norms. The central claim of this review, therefore, is not that MCDA is useless (it is not), but that MCDA as currently practised produces maps that are method-centred rather than problem-centred, and that are interpreted as predictive while functioning as structured expert judgement. Closing this gap requires not technical tweaks but a fundamental reorientation: from producing more maps to producing more accountable ones.
The empirical patterns are instructive. AHP-based approaches dominate methodological choice, accounting for the majority of applications. Case studies are heavily concentrated in High Mountain Asia, shaping both methodological norms and empirical expectations. Quantitative validation, while present, remains limited. Taken together, these patterns do not indicate failure, but they do point to an imbalance: a field that has prioritised operational applicability over evidential grounding.
The central contribution of this study is to make this imbalance explicit. What we observe is a systematic mismatch between methodological uptake and evidential support. MCDA-based hazard maps are widely produced and frequently interpreted as predictive representations of risk. Yet the modelling practices that generate them—reliant on expert-derived weights, deterministic aggregation, and limited validation—are more consistent with structured interpretation than with empirical prediction.
From this perspective, the hazard map is not a measurement but a formalised judgement rendered spatial. This distinction is not merely conceptual; it defines the limits of what can be claimed. While such models can organise knowledge, support prioritisation, and facilitate communication, they cannot, in their current form, be assumed to provide validated predictions of hazard occurrence.
If we take this insight seriously, then the direction for future research becomes clearer. Validation must move from a peripheral activity to a central requirement. Sensitivity and robustness analysis should be treated as integral components of the modelling, not optional additions. Weighting schemes must be recognised as assumptions to be tested rather than inputs to be accepted. And methodological diversity—through comparative and hybrid approaches—should be encouraged to better understand how modelling choices shape outcomes.
More broadly, the field must align its methodological practices with the level of confidence it seeks to claim. Producing hazard maps is not, in itself, sufficient; what matters is the extent to which those maps are supported by evidence, tested against alternative assumptions, and interpreted within their epistemic limits.
For practitioners, the implications are both practical and cautionary. MCDA remains a valuable tool for structuring complex decisions, particularly in data-constrained environments where alternative approaches may not be feasible. Its transparency and flexibility make it well suited for integrating diverse sources of information and supporting stakeholder dialogue.
However, its outputs should not be mistaken for predictive certainty. A hazard map derived from a single model configuration, without validation or uncertainty analysis, is best understood as a hypothesis—a structured representation of risk informed by available knowledge and expert judgement. In high-stakes contexts, such representations should be complemented with multiple lines of evidence and interpreted with explicit recognition of their limitations.
We began this review with a simple question: what kind of knowledge do MCDA-based hazard assessments actually produce? Our answer is that they produce structured, interpretable, and often useful representations of risk—but not, in most cases, validated predictions.
Recognising this distinction is essential. It does not diminish the value of MCDA; rather, it clarifies its proper role. Used appropriately, MCDA can support deliberation, make assumptions explicit, and help navigate complex decision spaces. Used uncritically, it risks conveying a level of certainty that the underlying models cannot justify.
The glaciers are retreating, hazards are intensifying, and decisions cannot wait. In such contexts, clarity is valuable—but only if it is honest about its limits. The challenge for the field is therefore not to abandon MCDA, but to use it with greater rigour, transparency, and interpretive care.
Ultimately, the maps we produce will shape real decisions. Whether they do so wisely depends not only on how they are constructed but also on how they are understood.
These findings are particularly relevant in the context of the growing adoption of AI and machine learning in hazard modelling, where issues of interpretability, validation, and reliability become even more critical.