1. Introduction
The devastating earthquakes that struck Türkiye and Syria in 2023, as well as Myanmar in 2025, have highlighted the urgent global need for robust methods to assess the vulnerability of buildings and infrastructure to seismic hazards. These catastrophic events, which resulted in widespread destruction and significant loss of life, demonstrated that developing accurate vulnerability functions is not only critical for the most affected regions, but also for all communities exposed to earthquake risk. By quantifying how structures respond to varying levels of ground shaking, vulnerability functions provide essential information for disaster preparedness, risk mitigation, and effective emergency response worldwide [1,2].
Apostolaki et al. [1] introduced a rapid damage assessment methodology that leveraged open-access data and tools to estimate building damage from the 2023 Kahramanmaraş, Türkiye earthquake sequence. Their study showed that high-resolution exposure models and updated ground shaking data significantly enhanced the accuracy and reduced the uncertainty of rapid damage estimates, with results closely matching official large-scale damage reports—thereby supporting the effectiveness of this approach for emergency response and recovery planning. Similarly, Cai et al. [2] documented the catastrophic Mw 7.9 earthquake in Myanmar, which caused extensive damage, thousands of fatalities, and left hundreds of thousands homeless, particularly in Mandalay and surrounding regions. Their report underscored the urgent need for improved seismic preparedness, stricter building code enforcement, and the integration of advanced technologies for rapid disaster assessment and response.
Seismic hazards in Thailand, though often overshadowed by other natural disasters, require careful consideration in disaster mitigation planning. The country’s northern and western regions are situated near active tectonic plate boundaries, making them susceptible to occasional earthquakes. Although the seismic activity in Thailand is generally less frequent and less intense than in neighboring countries, and though major earthquakes are rare, events such as the 2014 Chiang Rai earthquake (Mw 6.2) demonstrated the potential for significant damage, largely due to vulnerable local construction [3,4,5]. More recently, the 2025 Sagaing Fault earthquake (Mw 7.9) in nearby Myanmar produced severe shaking in northern Thailand, underscoring the region’s ongoing seismic risk, despite its typically low activity levels [2].
A significant challenge remains the lack of comprehensive research and data collection on seismic hazards in Thailand, which impedes the development of effective mitigation and preparedness strategies. Addressing this gap is essential for strengthening disaster risk reduction efforts. Developing vulnerability functions—which statistically relate ground motion amplitude to expected damage—enables decision-makers to estimate potential losses for various building types and population groups, before an earthquake occurs. By combining empirical damage data from past events with structural modeling, these functions support targeted retrofitting programs, informed land-use planning, and efficient resource allocation. This data-driven approach can promote resilience and help reduce future human and economic losses.
The Chiang Rai earthquake of 5 May 2014, originating from the Pha Yao active fault, was a moderate seismic event with a moment magnitude of 6.2; its location is shown in Figure 1. Despite its relatively small scale, the earthquake caused disproportionate damage and widespread destruction, primarily due to inadequate preparedness and non-compliance with building codes. The event exposed significant vulnerabilities in local construction practices, where homes are built without engineering input and without adherence to structural standards. Prior to this incident, seismic hazards were not prioritized in Thailand, resulting in underdeveloped scientific data, research, and disaster management plans. This lack of preparedness contributed to increased losses and public disarray [6]. The Chiang Rai earthquake highlighted the urgent need for improved seismic risk assessment and disaster management strategies in Thailand. Consequently, this research aimed to develop an empirical vulnerability function based on the 2014 earthquake, providing a valuable reference for future seismic risk assessments and disaster management planning in the region.
This study proposes an alternative approach to vulnerability assessment based on loss data, complementing conventional methodologies. Vulnerability functions, which describe the relationship between hazard level and resulting damage, are essential tools for quantifying the potential consequences of local hazards. The primary objective is to equip policymakers, emergency responders, and planners with a more comprehensive understanding of the risk factors specific to the study area. By applying both conventional and alternative vulnerability function development approaches, this research aimed to capture and analyze the unique vulnerabilities of the local context more effectively. This dual methodology is designed to enhance disaster management strategies by providing more nuanced and tailored insights into local risk factors, based on the uncertainties of the reported loss data, ultimately improving disaster preparedness and response capabilities for future seismic events.
Vulnerability and fragility assessments are integral components of disaster management globally, often discussed in tandem within the literature. These assessments rely on crucial data such as exposure inventories, damage classifications, and hazard levels. Recent studies have contributed to the development of global fragility and vulnerability models, with notable examples including Martins et al. (2020)’s analytical model for common building classes and its application in the Global Earthquake Model Foundation’s seismic risk assessment [7]. Peduto et al. (2017) proposed a three-phase methodological framework for local assessments, encompassing exposure element identification, damage severity classification, and function generation [8]. Rossetto et al. (2018) outlined four key elements of catastrophe models: hazard, exposed assets, fragility, and financial modules, noting that vulnerability combines fragility with financial aspects when economic loss is directly measured [9]. Methodologies for developing these functions can be empirical, heuristic, analytical, or hybrid, with empirical approaches being most common for seismic vulnerability. Japan has been a significant contributor to damage-to-loss function development [10]. The reliability of empirical functions heavily depends on the data quality and quantity, and the procedures for mitigating biases and fitting statistical models. Key challenges include asset characteristic metadata, sample representativeness, survey availability, potential inaccuracies in building safety evaluations, and misclassification errors [9].
Porter (2021) provided a clear distinction between fragility and vulnerability in the “Beginner’s Guide to Fragility, Vulnerability, and Risk”, addressing the frequently interchangeable use of these terms in previous studies [11]. This research adopts a definition of vulnerability as the economic losses associated with repair costs, reflecting the financial impact of disasters on a community. Vulnerability is a crucial component in assessing community resilience, which is defined as the ability to recover from adverse events [12,13,14,15,16,17]. While vulnerability encompasses social, political, environmental, and economic aspects, this study specifically focuses on economic vulnerability in Chiang Rai province. The analysis quantified vulnerability using losses calculated from local government compensation for disaster relief allocated to residential repairs. Importantly, the study differentiates vulnerability from fragility: vulnerability measures the extent of loss (e.g., repair costs), whereas fragility evaluates the probability of damage occurring [11]. This distinction provides a clear framework for analyzing the economic impact of seismic events and contributes to a more nuanced understanding of disaster risk assessment in the region.
This study developed vulnerability functions using loss data derived from damage claims, ensuring an accurate representation of the damage incurred. Data collection was facilitated by a comprehensive engineering survey administered by a governmental agency, enabling unbiased district-wide asset evaluation. To address data uncertainties, the study employed cumulative Dempster–Shafer possibility theory as an alternative to the conventional cumulative probability lognormal method. Unlike probability theory, which is designed to manage single uncertainties, possibility theory accommodates multiple uncertainties, offering improved correlations for empirical vulnerability function development. This approach enhanced the robustness of the vulnerability assessment, providing a more nuanced understanding of seismic risk in the study area and potentially improving the accuracy of future disaster management strategies.
2. Research Gap and Methodological Context
Seismic vulnerability assessment is a critical component of disaster risk management, as it enables authorities and stakeholders to understand how different buildings and infrastructure are likely to perform during earthquakes. By identifying which structures are most at risk, decision-makers can prioritize retrofitting, enforce building codes, and allocate resources more effectively, to reduce potential losses and enhance community resilience. Globally, several approaches are used for seismic vulnerability assessment: empirical methods rely on observed damage data from past earthquakes; analytical methods use structural modeling and simulations; heuristic (expert-based) approaches draw on professional judgment, where data are limited; and hybrid methods combine elements from both empirical and analytical techniques, to improve reliability and adaptability [18,19]. These diverse methodologies provide a robust toolkit for tailoring risk-reduction strategies to different regions and data environments.
Key methodologies for developing seismic vulnerability functions include empirical, analytical, hybrid, and heuristic approaches. Empirical methods rely on observed damage or loss data from past earthquakes to statistically relate ground motion amplitude to expected damage. Their main advantage is that they directly reflect real-world performance, making them practical and straightforward, where sufficient data exists. However, they are limited by the availability and quality of post-event data and may not cover all building types or seismic scenarios [18,19]. Analytical methods use structural modeling and simulations (such as nonlinear static or dynamic analyses) to predict how buildings will respond to different earthquake intensities. These methods are highly flexible and can be applied to new or untested building types, but they require detailed information about structural properties and can be computationally intensive. Their accuracy also depends on the assumptions and models used [19]. Hybrid approaches combine empirical data with analytical modeling to overcome data limitations and improve reliability, while heuristic (expert-based) methods draw on professional judgment when data are scarce. Hybrid methods offer adaptability and robustness, but can be complex to implement, whereas heuristic approaches are fast, but may introduce subjectivity and uncertainty [18,19].
This study focuses on the empirical method, using observed data from past earthquake events to evaluate how buildings actually performed under real seismic conditions. By analyzing these real-world outcomes, the research aims to better understand vulnerabilities and improve preparedness for future earthquakes. In application, the cumulative probability lognormal method is the most prevalent approach for developing empirical fragility or vulnerability functions in seismic risk assessment. Several studies have contributed to refining this methodology and enhancing its accuracy. Yamaguchi et al. (2000) improved fragility functions for the 1995 Kobe earthquake using a larger, higher-quality dataset [20], while Yamazaki et al. (2000) utilized updated Kobe City survey data to identify structure type and construction period as key damage probability factors [21]. Yamazaki et al. (2019) later developed fragility functions for the 2016 Kumamoto earthquake, revealing greater vulnerability in wooden and older structures [22]. Midorikawa et al. (2011) analyzed data from seven earthquakes between 2003 and 2008, demonstrating a stronger correlation between peak ground velocity and damage compared to peak ground acceleration [23]. More recently, Torisawa et al. [24] examined building damage in Uki City following the 2016 Kumamoto earthquake, confirming higher damage ratios in wooden and older buildings. By combining data from Uki City and Mashiki Town, they improved the fragility function accuracy, particularly for wooden structures. These studies collectively demonstrate the ongoing refinement of fragility and vulnerability assessment methodologies, emphasizing the importance of comprehensive data analysis for improving seismic risk evaluation.
Uncertainty is an inherent part of seismic vulnerability assessment, and managing it effectively is essential for producing reliable results. Typically, uncertainty is addressed by first identifying its main sources, such as variability in ground motion estimates, inconsistencies in damage or loss classification, and limitations in sample size or data aggregation. Statistical techniques, e.g., Bayesian methods, bootstrapping, and model diagnostics (including residual analysis and likelihood-based criteria), are commonly used to quantify and communicate these uncertainties. Transparent reporting of data quality, model assumptions, and confidence intervals further helps users understand the reliability and limitations of assessment outcomes [18,19]. By systematically evaluating and disclosing uncertainty at each stage, analysts can improve the credibility and usefulness of vulnerability functions for disaster risk management.
Locally, Latcharote et al. (2020) assessed seismic risk in Chiang Mai Municipality by adapting fragility functions for low-rise buildings, employing rapid visual screening and deterministic earthquake scenarios to estimate potential physical damage and emphasize the need for preparedness [25]. Foytong et al. (2020) developed empirical seismic fragility curves using post-seismic survey data from the 2014 Mae Lao earthquake, analyzing 26,551 buildings and demonstrating a strong correlation between observed damage and peak ground acceleration, with engineered reinforced concrete buildings showing better seismic performance [26]. Ornthammarath et al. [27] focused on damage to public buildings following the 2014 Mae Lao earthquake, deriving fragility functions and comparing them with those for residential buildings. Their research revealed significant nonstructural damage, particularly in community health centers and temples, and highlighted the poor seismic performance of local temples during moderate ground shaking, underscoring the need for improved safety strategies to mitigate future earthquake risks. These studies collectively contribute to a growing body of knowledge on seismic vulnerability in Thailand, emphasizing the importance of continued research and preparedness efforts.
There are still many promising directions for advancing seismic vulnerability research in Thailand. While recent studies have provided valuable insights, there is a clear need for more robust empirical vulnerability functions that are specifically tailored to the loss data collected by local government agencies. These datasets often contain uncertainties, due to variations in data acquisition methodologies, making it important to address and account for such challenges. Exploring advanced uncertainty modeling techniques, such as possibilistic approaches, could further help manage data limitations and clarify ambiguous results. By focusing on these areas, future research can deliver more reliable and actionable guidance for disaster risk reduction and earthquake preparedness in Thailand.
This study directly addresses key gaps in seismic vulnerability research in Thailand by applying both probabilistic and possibilistic models to a uniquely large, standardized loss dataset. Leveraging detailed damage assessments collected by volunteer engineers across nine districts in Chiang Rai, covering over 15,000 damaged residences and more than 230,000 residents, the research ensured a robust empirical grounding. Methodologically, the study introduces innovations such as data batching and explicit treatment of uncertainty, with particular attention given to classification errors arising from human judgment during field assessments. By systematically quantifying and managing these uncertainties, the approach enhances the reliability and transparency of vulnerability functions. The expected impact extends beyond local practice, offering a replicable framework for other regions facing similar data and uncertainty challenges, and contributing valuable insights to the international discourse on empirical vulnerability modeling and disaster risk reduction.
A key limitation of this study lies in the process of classifying earthquake-induced damage into small, medium, and large categories. This classification was based on field assessments conducted by trained engineers and volunteers using standardized guidelines. However, despite these efforts to promote consistency, the process inherently involved subjective human judgment. Such subjectivity can introduce bias and epistemic uncertainty, potentially affecting the accuracy of the derived vulnerability functions and the reliability of the subsequent risk evaluations. The possibility of misclassification, whether due to differences in assessor experience, interpretation of criteria, or field conditions, means that some degree of uncertainty was unavoidable in the loss data.
In this study, a sensitivity analysis was conducted to evaluate how potential misclassification of damage levels could have influenced the accuracy of the empirical vulnerability function. Specifically, 10% of cases originally classified as medium loss were reassigned to either the small or large loss categories, reflecting the likelihood of such errors in real-world assessments. The results revealed that moving 10% of medium losses to the small loss class led to a substantial 21.47% increase in the total root mean square error (RMSE), indicating a significant impact on model accuracy. In contrast, shifting 10% from medium to large loss resulted in only a minimal 0.13% change in RMSE. This comparison, shown in Figure 2, illustrates the sensitivity of model accuracy to potential misclassification in damage assessment.
The findings demonstrate that the vulnerability function is considerably more sensitive to underestimating medium losses, i.e., misclassifying medium damage as small, than to overestimating them by classifying them as large. This behavior is rooted in the asymmetric distribution of the loss categories: small losses make up approximately 31% of the dataset, while large losses account for only about 3%. When 10% of medium loss (M) cases were reclassified as small (MtoS), this introduced higher actual losses into the dominant small-loss category, significantly increasing the discrepancy between observed and predicted values, and thereby causing a substantial rise in RMSE. In contrast, when a similar 10% shift was made from the medium to the large loss category (MtoL), the RMSE changed only marginally, because the small proportion of large losses reduced the statistical influence of such misclassifications. Thus, the model is more vulnerable to misclassification when it affects high-frequency categories, underscoring the need for particularly careful delineation between small and medium loss categories in empirical data assessments, to preserve the reliability of seismic vulnerability estimations.
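To make the reclassification test concrete, the sketch below reproduces the mechanics of the scenario in Python with synthetic placeholder records; the class proportions echo the reported shares, but the sample, the reassign helper, and the percentage-change bookkeeping are illustrative assumptions rather than the authors’ actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse(observed, predicted):
    """Root mean square error between observed and model-estimated loss fractions."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def pct_change(rmse_base, rmse_scenario):
    """Percentage increase in total RMSE relative to the baseline classification."""
    return 100.0 * (rmse_scenario - rmse_base) / rmse_base

# Synthetic damage-class labels with roughly the reported shares (31% S, 66% M, 3% L).
labels = rng.choice(["S", "M", "L"], size=5000, p=[0.31, 0.66, 0.03])

def reassign(labels, frac, source, target):
    """Move `frac` of the records labeled `source` to `target` (misclassification scenario)."""
    out = labels.copy()
    idx = np.flatnonzero(out == source)
    moved = rng.choice(idx, size=int(frac * idx.size), replace=False)
    out[moved] = target
    return out

scenario_m_to_s = reassign(labels, 0.10, "M", "S")  # medium misread as small
scenario_m_to_l = reassign(labels, 0.10, "M", "L")  # medium misread as large
# The vulnerability function is then refit for each scenario and pct_change()
# applied to the resulting total RMSE (the study reports +21.47% and +0.13%).
```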
A related limitation is the uncertainty arising from this subjective damage classification, where the boundaries between small, medium, and large losses are often unclear. This ambiguity, common in post-earthquake assessments, can impact the accuracy of vulnerability functions and risk estimates. To address this, the possibilistic method was applied to develop vulnerability functions [28]. This approach also underscores the need for careful interpretation given the inherent limitations of the data.
3. Loss Data Gathering and Processing
The northern region of Thailand exhibits significantly higher seismic activity compared to other parts of the country. This area has a documented history of seismic events, with the 2014 earthquake garnering substantial public attention due to the extensive damage it caused. The severity of the impact was exacerbated by the fact that most structures in the region were constructed without adherence to seismic engineering regulations [6]. The 2014 seismic event was recorded as the most economically damaging earthquake in Thailand’s history, profoundly affecting local communities. Given its seismic significance and the considerable socio-economic impact of the 2014 earthquake, this northern region was selected as the focus of this study. The choice of this study area provides an opportunity to analyze vulnerability in a context where seismic risk has been historically underestimated, potentially offering valuable insights for future disaster preparedness and mitigation strategies.
This study analyzed the monetary losses incurred for repairing residential properties following the 2014 earthquake in Chiang Rai, which had a significant economic impact on the community, as illustrated by the loss distribution in Figure 3. The loss data were derived from compensation requests submitted by local residents to government authorities after the declaration of an official disaster. The allocation process, determined at the discretion of local government, involved homeowners initiating compensation requests, which were subsequently evaluated by volunteer engineers. These engineers categorized the damage into three levels: small, medium, and large, with compensation provided based on these categories and corresponding damage extent limits. This systematic approach to data collection and damage assessment provides a comprehensive foundation for understanding the economic vulnerability of residential structures in the region, offering valuable insights for future disaster preparedness and mitigation strategies.
This study addresses geographical variations in monetary value by calculating the average Replacement Cost New (RCN) per square meter using publicly available data from government-funded construction projects. The research employs a direct unit pricing method to estimate the current cost new for each component necessary to replace or recreate the subject asset [29]. By utilizing government spending reports on various project types representative of typical buildings, the study establishes a standardized metric for evaluating property losses that accounts for geographical cost disparities. The RCN, which reflects the cost of constructing a new residence from the ground up (excluding land prices), encompasses material acquisition, labor expenses, overhead costs, developer profits, and entrepreneurial incentives. To enable meaningful comparisons across districts, loss data were normalized by dividing them by the calculated RCN, resulting in weighted loss data. This methodology provided a more accurate and comparable assessment of earthquake-induced losses across different geographical areas, enhancing the validity of the vulnerability analysis. The distribution of weighted loss data with respect to PGA is illustrated in Figure 4, providing a visual representation of the relationship between ground motion amplitude and monetary loss. The plot provides insights into how earthquake-induced ground motion correlates with observed losses across the studied sample of damaged residences.
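As a worked illustration of the normalization step, the sketch below divides each claim by a per-residence replacement cost built from a district RCN rate and a floor area; the district names, rates, and the assumption that per-residence RCN equals the per-square-meter rate times floor area are hypothetical placeholders, not values from the study.

```python
# Hypothetical RCN rates per district, in THB per square meter (illustrative only);
# in the study these are derived from government construction project reports.
RCN_PER_M2 = {"Mae Lao": 8_500.0, "Phan": 9_000.0, "Mueang Chiang Rai": 9_500.0}

def weighted_loss(claim_thb: float, district: str, floor_area_m2: float) -> float:
    """Normalize a compensation claim by the residence's replacement cost new,
    so losses are comparable across districts with different construction costs."""
    rcn = RCN_PER_M2[district] * floor_area_m2
    return claim_thb / rcn

# Example: a 25,000 THB claim for an 80 m2 house in Mae Lao
# -> loss fraction of 25,000 / (8,500 * 80) ≈ 0.037 of the replacement cost.
print(round(weighted_loss(25_000, "Mae Lao", 80.0), 3))
```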
This research analyzed data from 15,031 damaged residences, comprising approximately 6% of the total residential properties in Chiang Rai province. The study focused on average, non-engineered residential structures, including precast concrete and wooden frame buildings with unreinforced masonry infill walls. During data cleaning, records were removed based on three criteria: unidentifiable House IDs, houses subject to compensation limits, and duplicate House IDs. These exclusions ensured data integrity and accuracy in reflecting the full extent of incurred losses. The total loss data are visually represented in Figure 5, categorizing losses into 31% small, 67% medium, and 3% large segments, providing a comprehensive overview of the loss distribution. This methodical approach to data preparation and analysis enhanced the reliability of the study’s findings regarding the earthquake-induced damage to residential structures in the region.
The process of determining compensation for disaster-affected individuals carried out by provisional governments inherently involves various uncertainties that require careful consideration. In the context of loss data, these uncertainties primarily arise from the assessment of damage levels, where the distinctions between small, medium, and large damage categories often lack precise boundaries. The nature and extent of losses significantly affect their classification, as a loss categorized as large must necessarily progress through medium and small damage levels. These characteristics of loss data—namely, the ambiguity in data boundaries and the sequential nature of damage categorization—underscore the need for a robust methodology for developing vulnerability functions [28]. This complexity in damage assessment and classification highlights the importance of employing analytical approaches that can handle multiple uncertainties to ensure accurate loss data for vulnerability function development.
4. Ground Motion Prediction
This study focuses on a seismically active region where the Eurasian and Indian tectonic plates converge, resulting in moderate but impactful earthquakes. The research developed a vulnerability function based on the 5 May 2014 seismic event, which had a moment magnitude (Mw) of 6.2 and originated at a depth of 6 km within the Pha Yao fault zone, one of Thailand’s fourteen active faults. The Pha Yao fault, with estimated dimensions of 10 km in width, 15 km in length, and 10 km in depth, has a maximum magnitude potential of Mw 6.6 [6]. The 2014 seismic event exhibited two distinct mechanisms, characterized by strike, dip, and rake angles of 67°/81°/0° and 337°/90°/171°, respectively. These detailed seismological parameters were utilized to estimate the ground motion, providing a comprehensive basis for the vulnerability analysis in the study area.
This study selected Peak Ground Acceleration (PGA) as the primary measure of ground motion amplitude for analyzing non-engineered, low-rise structures. PGA is directly correlated with the force exerted on structures during seismic events, making it particularly relevant for this context. For small, stiff structures, PGA often serves as a more accurate indicator of potential damage compared to other ground motion amplitude parameters [30,31,32,33,34]. This is primarily because these structures are more responsive to high-frequency ground motions, which PGA effectively captures. The majority of non-engineered housing in the study area consists of rigid or brittle structures, predominantly unreinforced masonry [30,35,36,37,38]. For such constructions, the sudden accelerations quantified by PGA are typically more critical in inducing damage than the sustained motions described by Peak Ground Velocity (PGV). The abrupt force changes represented by PGA align well with the failure mechanisms of brittle materials, which are prone to sudden fracture or collapse under rapid loading conditions. These factors collectively justified the selection of PGA as the primary seismic amplitude measure for this research.
This study sought to identify the most appropriate ground motion prediction model for the study area. Tanapalungkorn et al. [39] conducted a thorough investigation to determine the most effective attenuation model for predicting ground motion in Northern Thailand, utilizing seismic data recorded from various local earthquakes. Their research underscored the accuracy of the Sadigh et al. [40], Boore and Atkinson [41], and Chiou and Youngs [42] models in predicting ground motion. Among these models, the Chiou and Youngs model incorporates VS30, which represents the average shear-wave velocity within the uppermost 30 m of the Earth’s crust. This parameter is critical for accurately capturing ground motion characteristics during seismic events and is a fundamental component in seismic hazard assessments and site-specific ground motion predictions. Accordingly, this study employed the Chiou and Youngs model, utilizing VS30 data from Thamarux et al. [43]. Thamarux et al. provided a VS30 map based on geomorphological classification, with VS30 values derived from microtremor array measurements using the spatial autocorrelation (SPAC) method.
Figure 6 illustrates the ground motion prediction results for the study area, derived from the Chiou and Youngs (2008) model, indicating Peak Ground Acceleration (PGA) values ranging from 0.00 to 0.30 g. These predictions were integrated with RCN-weighted loss data from local government sources to develop an empirical vulnerability function. Validation against measured seismic amplitude data from four stations [44] revealed that the Chiou and Youngs model provided the closest match to observed values, as demonstrated in Figure 7. This figure compares observed peak horizontal acceleration (PHA) against predicted PGA from various GMPE models, with the ideal result following the diagonal dashed line. The superior performance of the Chiou and Youngs model in predicting ground motion for the study site underscores its appropriateness for this research, providing a reliable ground-motion map as the foundation for the vulnerability analysis.
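The model-selection check behind Figure 7 can be sketched as a log-space misfit comparison between observed station PHA and each candidate GMPE’s predicted PGA; the station values below are placeholders, and the predictions would come from the published GMPE implementations rather than from this snippet.

```python
import numpy as np

# Observed peak horizontal acceleration (g) at the validation stations (placeholder values).
observed_pha = np.array([0.10, 0.20, 0.05, 0.15])

# Predicted PGA (g) at the same stations from each candidate GMPE (placeholder values).
predictions = {
    "Sadigh et al. (1997)":    np.array([0.08, 0.15, 0.04, 0.11]),
    "Boore & Atkinson (2008)": np.array([0.12, 0.17, 0.06, 0.13]),
    "Chiou & Youngs (2008)":   np.array([0.10, 0.19, 0.05, 0.14]),
}

def log_misfit(obs, pred):
    """RMS difference in natural-log space, a common way to compare GMPE residuals."""
    return float(np.sqrt(np.mean((np.log(obs) - np.log(pred)) ** 2)))

best = min(predictions, key=lambda name: log_misfit(observed_pha, predictions[name]))
print(best)  # with these placeholder numbers, Chiou & Youngs gives the smallest misfit
```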
5. Empirical Vulnerability Function Development
This study focused on developing vulnerability functions using loss data, specifically repair costs, obtained from the Chiang Rai local government. To standardize the monetary equivalent, the loss data were adjusted by weighting Replacement Cost New (RCN) values. The research was structured around two main hypotheses: firstly, that a cumulative possibilistic-based model would yield higher correlations than a probabilistic-based model when accounting for multiple uncertainties in the loss data [28], and secondly, that controlling for the number of observations would enhance the function correlations. This approach aimed to provide a more accurate and nuanced understanding of vulnerability in the context of seismic risk assessment, taking into account the complexities and uncertainties inherent in loss data analysis.
This study presents results under three scenarios, to address the hypotheses and mitigate potential overestimation in higher seismic amplitude ranges due to disproportionate observation numbers: (i) a probabilistic-based vulnerability function (ProbVF), and (ii) a probabilistic-based vulnerability function with a controlled number of observations (nOC-ProbVF). Additionally, to improve the handling of loss data uncertainties, specifically (1) ambiguity in data boundaries and (2) the sequential-stages property, a third scenario was introduced: (iii) a possibilistic-based vulnerability function (PossVF). The efficacy of these approaches was evaluated using residual plots to assess the quality of the model fit.
5.1. Probabilistic-Based Vulnerability Function
The probabilistic approach, particularly utilizing cumulative lognormal probability theory, remains the standard method for developing vulnerability functions, also known as fragility curves [11,20,21,22,23,24]. This theory posits that the cumulative probability of damage occurring at or above a specified level follows a lognormal distribution, with parameters determined through the least-squares method applied to lognormal probability. The standard normal distribution function, defined by its median and standard deviation, is calculated through the linear relationship between the loss fraction and the Z-score. In this study, loss data were standardized using RCN values, categorized into three damage levels (small, medium, and large) with ground amplitude intervals of 0.01 g, and the cumulative lognormal probability method was applied to derive the ProbVF model.
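A minimal sketch of this fitting step, assuming arrays of binned PGA values and exceedance fractions are already prepared (the example bins are illustrative, not the study’s data): the Z-score of the exceedance fraction is regressed on ln(PGA), and the slope and intercept yield the lognormal standard deviation and median.

```python
import numpy as np
from scipy.stats import norm

def fit_lognormal(pga, exceed_frac):
    """Least-squares fit of a cumulative lognormal curve.

    pga         : array of bin PGA values (g)
    exceed_frac : fraction of residences at or above the loss level in each bin
    Returns (median_g, beta): lognormal median and log standard deviation.
    """
    pga = np.asarray(pga, dtype=float)
    p = np.clip(np.asarray(exceed_frac, dtype=float), 1e-4, 1 - 1e-4)  # avoid infinite Z
    z = norm.ppf(p)                      # probit transform of the exceedance fraction
    slope, intercept = np.polyfit(np.log(pga), z, 1)
    beta = 1.0 / slope                   # lognormal standard deviation
    median = np.exp(-intercept * beta)   # PGA at 50% exceedance
    return median, beta

def lognormal_curve(pga, median, beta):
    """Cumulative lognormal vulnerability/fragility value at the given PGA."""
    return norm.cdf(np.log(np.asarray(pga, dtype=float) / median) / beta)

# Illustrative bins only (not the study's data):
pga_bins = [0.05, 0.10, 0.15, 0.20, 0.25]
frac_medium_or_more = [0.01, 0.05, 0.15, 0.30, 0.45]
median, beta = fit_lognormal(pga_bins, frac_medium_or_more)
```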
To address the significant variability in damage distribution across different seismic amplitude intervals, which could potentially lead to overestimation in higher ground amplitude ranges with fewer observations, this study implemented a refined approach. The method involved controlling the number of observations and implementing non-fixed ground amplitude intervals, thereby improving the accuracy and ensuring a more balanced representation across varying amplitude levels. This strategy mitigated bias and enhanced the reliability of the vulnerability function development process, culminating in the creation of the nOC-ProbVF (number of Observations Controlled Probabilistic Vulnerability Function). By adopting this methodology, the study aimed to provide a more accurate assessment of structural vulnerability across a wide range of ground intensities.
The dataset was divided into batches of PGA based on the largest number of loss observations available within the smallest possible PGA interval. Subsequently, the data were further batched across PGA intervals to reduce the skewness in the distribution of observations and approximate a normal distribution. This was achieved by considering the maximum and minimum number of observations within each PGA interval and grouping adjacent PGA ranges to minimize the standard deviation within each class. This systematic approach enhanced the consistency and reliability of the data distribution across the entire amplitude spectrum, leading to a more balanced representation of loss across varying ground motion intensities.
Figure 8 illustrates the distribution of loss across PGA intervals before and after batching. The figure demonstrates that, following the reorganization of PGA batches, the loss distribution exhibited reduced right skewness.
The PGA interval assembly process for the nOC-ProbVF model followed a systematic approach designed to optimize the data distribution across ground motion intensities. Initially, the dataset was organized in ascending order according to PGA values at the highest available precision, with each increment accompanied by its corresponding number of loss observations. The methodology established a reference benchmark by identifying the maximum number of loss observations within any single PGA interval, which served as the target value for batch optimization. In this study, the reference number was determined to be 302 loss observations, representing the optimal batch size for maintaining statistical reliability, while ensuring adequate sample representation. Adjacent PGA intervals were then systematically combined to achieve a number of loss observations as close as possible to this reference value of 302. This iterative grouping process continued across the entire PGA range, resulting in the formation of 62 distinct batches that maintained more consistent sample sizes compared to the original fixed-interval approach. The rearrangement strategy effectively addressed the inherent variability in loss distribution across the different ground motion intensities, thereby reducing potential bias and improving the overall reliability of the vulnerability function development process.
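One plausible reading of this assembly procedure is the greedy grouping sketched below, assuming the data are available as (PGA value, observation count) pairs at the finest precision; the reference batch size of 302 comes from the text, but the stopping rule (close a batch when adding the next increment would overshoot the reference by more than stopping short) is an interpretation, not the authors’ exact algorithm.

```python
def batch_pga_intervals(pga_counts, reference=302):
    """Greedily merge adjacent PGA increments until each batch holds
    roughly `reference` loss observations.

    pga_counts : list of (pga_value, n_observations), sorted by PGA ascending
    Returns a list of batches, each a dict with the PGA range and total count.
    """
    batches, current, total = [], [], 0
    for pga, n in pga_counts:
        # Close the batch when adding the next increment would move the count
        # further from the reference than stopping here would.
        if current and abs(total + n - reference) > abs(total - reference):
            batches.append({"pga_min": current[0], "pga_max": current[-1], "count": total})
            current, total = [], 0
        current.append(pga)
        total += n
    if current:
        batches.append({"pga_min": current[0], "pga_max": current[-1], "count": total})
    return batches
```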
5.2. Possibilistic-Based Vulnerability Function
Cumulative lognormal probability theory has been predominantly employed in the development of vulnerability functions (also referred to as fragility functions) [11,20,21,22,23,24]. However, this approach cannot fully account for the multiple uncertainties present in the loss data available for this study, particularly those arising from incomplete knowledge and imprecise descriptions [28,45,46,47,48]. To address these limitations, this study incorporated Dempster–Shafer theory (evidence theory), which is grounded in possibility-based theory, as an alternative method to handle these uncertainties [28,45,49,50]. By integrating this complementary approach, the study aimed to provide a more comprehensive analysis that better accounted for the complex nature of uncertainties inherent in the available loss data.
The effectiveness of a possibilistic model is inherently dependent on the number of batches employed, as excessive subdivision can lead to overfitting and reduced model generalizability. Through careful consideration of these factors, this study partitioned the dataset into 23 distinct batches, resulting in an average of 9979 loss observations per batch, thereby balancing the need for detailed analysis with the requirement for robust statistical inference.
Given the number and types of uncertainties inherent in the dataset, the empirical vulnerability function developed using Dempster–Shafer possibility theory was anticipated to improve the accuracy of the results [28]. The methodology for the PossVF model begins by sorting the dataset in ascending order according to ground amplitude. PGA intervals are then grouped into batches based on the first available evidence of loss at each damage level. If a substantial gap exists before the next available evidence, additional batches are created within these intervals, as appropriate. Detailed procedures regarding the batching process are beyond the scope of this paper; further information on using possibility theory for vulnerability function development and batching methodology can be found in Kim (2018) and Shinozuka et al. (2001) [28,51], respectively. The loss fraction–ground amplitude relationship was calculated as the weighted loss over the total number of observations. Subsequently, the degree of belief was determined using evidence derived from Dempster–Shafer theory. Finally, a possibility distribution was constructed based on the certainty measure of loss [28]. This approach provides an alternative framework for addressing the complex uncertainties present in the dataset and may reduce the tendency to overestimate vulnerability assessments.
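For readers unfamiliar with the underlying quantities, the sketch below computes generic Dempster–Shafer belief and plausibility from a basic probability assignment over the three loss levels in a single batch; the masses are placeholders, and the batch construction and certainty measure actually used for the PossVF model follow Kim (2018) [28] rather than this simplified illustration. For a consonant (nested) body of evidence such as this one, the plausibility of a single loss level can be read as its possibility.

```python
def belief(mass, subset):
    """Bel(A): total mass of focal elements fully contained in A (lower bound)."""
    return float(sum(m for focal, m in mass.items() if focal <= subset))

def plausibility(mass, subset):
    """Pl(A): total mass of focal elements intersecting A (upper bound)."""
    return float(sum(m for focal, m in mass.items() if focal & subset))

# Illustrative basic probability assignment for one PGA batch over the loss levels
# S (small), M (medium), L (large); the nested focal elements mirror the sequential
# nature of damage (a large loss passes through the small and medium stages).
mass = {
    frozenset({"S"}): 0.30,            # evidence pointing only to small loss
    frozenset({"S", "M"}): 0.50,       # evidence unable to separate small from medium
    frozenset({"S", "M", "L"}): 0.20,  # fully ambiguous evidence
}

for level in ("S", "M", "L"):
    single = frozenset({level})
    print(level, round(belief(mass, single), 2), round(plausibility(mass, single), 2))
# -> S 0.3 1.0 | M 0.0 0.7 | L 0.0 0.2 (Pl of each level serves as its possibility here)
```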
6. Results
This study employed both probabilistic and possibilistic approaches to analyze loss data obtained from a local government relief funding project, utilizing PGA data from the 2014 earthquake in Chiang Rai, Thailand. The vulnerability function was derived from 232,567 loss data records, with PGA values ranging from 0.00 to 0.30 g. The analysis resulted in the development of three distinct models: ProbVF (Probabilistic Vulnerability Function), nOC-ProbVF (number of Observations Controlled Probabilistic Vulnerability Function), and PossVF (Possibility-based Vulnerability Function). By employing these complementary methodologies, the study aimed to identify the most suitable framework for vulnerability function development given a dataset subject to multiple uncertainties.
Figure 9 presents the results of the ProbVF model, which calculated vulnerability for each PGA value at 0.01 g intervals. Vulnerability probabilities were developed for three loss categories: small, medium, and large.
Figure 10 shows the results of the nOC-ProbVF model, which was adjusted to equalize the number of observations across the dataset. The regression coefficients for both probability-based models are displayed in Table 1. Notably, the nOC-ProbVF results indicate a higher loss probability in the lower ground amplitude range and a lower loss probability in the higher ground amplitude range, suggesting less under- and over-estimation across the amplitude spectrum.
Figure 11 illustrates the PossVF model, which offers an alternative representation of the uncertainty in the dataset. The possibility distribution values for each loss category were calculated based on the degree of belief, resulting in a step-like cumulative pattern. This approach suggests discrete jumps in possibility once the amplitude reaches certain thresholds, aligning with the model’s purpose of minimizing boundary ambiguity between loss sizes. By employing possibility theory, the study aimed to provide a more refined characterization of vulnerability from the direct loss dataset. This model estimated a relatively lower possibility at higher ground amplitudes compared to the probability-based models and showed the least tendency to overestimate the vulnerability function.
7. Discussion
In this section, we discuss the performance and reliability of the proposed vulnerability models based on two key evaluation methods: RMSE (Root Mean Square Error) and residual analysis. By examining both the overall prediction errors and the patterns of residuals, we aim to provide a comprehensive assessment of each model’s strengths and limitations. This approach allows us to identify which models best captured the observed damage data and to highlight areas where further refinement may be needed.
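For reference, the two diagnostics follow their standard definitions, written here with notation introduced only for this summary: $y_i$ is the observed loss fraction for batch or bin $i$, $\hat{y}_i$ the corresponding model estimate, $e_i$ the residual, and $n$ the number of points.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}},
\qquad
e_i = y_i - \hat{y}_i .
```

A positive $e_i$ indicates underestimation by the model and a negative $e_i$ overestimation, consistent with the residual analysis that follows.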
Figure 12 presents a bar chart comparing the RMSE values for each model across the small, medium, and large loss categories. The PossVF model achieved the lowest RMSE in all categories, with values of 0.007 for small, 0.001 for medium, and
for large losses, indicating the best overall fit to the observed data. The nOC-ProbVF model showed moderate RMSE values of 0.073, 0.020, and 0.002 for the small, medium, and large categories, respectively. In contrast, the ProbVF model recorded the highest RMSE values, with 0.215 for small, 0.023 for medium, and 0.004 for large losses. These results highlight that the PossVF model provided more accurate and reliable predictions across all loss categories, while the ProbVF model exhibited the highest prediction errors based on this particular dataset. This comparison underscores the advantage of the PossVF approach, particularly in scenarios where minimizing prediction error is critical for effective disaster management.
Additionally, this study utilized residual analysis to evaluate the quality of model fit. In an ideal scenario, a residual plot displays residuals that are randomly dispersed around zero, indicating that the model adequately captures the underlying relationship in the data. Conversely, the presence of systematic patterns or trends in the residuals suggests that the model may not fully represent the data structure and could require refinement. If the residuals consistently fall above or below zero, this may indicate bias in the model, and observable trends or patterns can point to missing or misspecified relationships. Positive residuals reflect instances where the model underestimates the observed values, while negative residuals indicate overestimation. This section discusses the residuals plotted against the estimated probability or possibility for the three models (ProbVF, nOC-ProbVF, and PossVF) across the small, medium, and large damage categories.
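A minimal plotting sketch of this diagnostic, assuming arrays of model estimates and observed loss fractions are available; the variable names in the commented call are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_plot(estimated, observed, label):
    """Scatter residuals (observed - estimated) against the estimated value;
    a well-fitting model shows residuals scattered randomly around the zero line."""
    estimated = np.asarray(estimated, dtype=float)
    residuals = np.asarray(observed, dtype=float) - estimated
    plt.scatter(estimated, residuals, s=10, label=label)
    plt.axhline(0.0, color="black", linewidth=0.8)
    plt.xlabel("Estimated probability / possibility")
    plt.ylabel("Residual (observed - estimated)")
    plt.legend()

# residual_plot(probvf_small_estimates, observed_small_fractions, "ProbVF (S)")
# plt.show()
```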
The residual plots in Figure 13, Figure 14 and Figure 15 illustrate the distribution of residuals for three loss models—ProbVF, nOC-ProbVF, and PossVF—across small (S), medium (M), and large (L) loss categories. For the ProbVF and nOC-ProbVF models, the residuals for small damage displayed a noticeable systematic pattern, specifically, a downward trend, indicating that these models may not have fully captured the underlying relationship for this category. In contrast, the PossVF model for small damage showed residuals more tightly clustered around zero, though some concentration at specific values suggests potential model limitations. For medium and large damage (middle and bottom rows), all three models exhibited residuals that were more randomly scattered around zero, with less pronounced patterns, suggesting a better fit for these categories. Overall, the PossVF model demonstrated relatively tighter clustering of residuals around zero across all damage levels, which may indicate a more consistent fit compared to the ProbVF and nOC-ProbVF models, especially for small damage. These observations suggest that model performance and fit quality can vary across damage categories, and systematic patterns in the residuals highlight areas where model refinement may be necessary.
Figure 13 presents the residuals for the small loss category across the different vulnerability function models. Notably, the PossVF model demonstrated the smallest gap between the maximum and minimum residuals, indicating a more consistent performance in predicting small losses. In contrast, the ProbVF model showed the largest gap in residuals, suggesting greater variability and less reliability for this loss category. Additionally, the residuals for the small loss category displayed systematic patterns, which may imply that none of the models fully captured the underlying structure of the observed data. This pattern highlights the need for further refinement of the models to improve their accuracy and better represent the true distribution of small losses.
Figure 14 displays the residuals for the medium loss category across the vulnerability models. The PossVF model again demonstrated superior consistency, showing the smallest gap between maximum and minimum residuals. In contrast, the nOC-ProbVF model exhibited the largest residual range, indicating greater prediction variability for medium losses. Notably, a systematic trend emerged when both probability and possibility values exceeded 0.035, suggesting that all models struggled to accurately capture loss patterns in this higher-amplitude range. This pattern implies that the current modeling approaches may require refinement to better represent damage mechanisms associated with medium-loss scenarios, particularly under stronger seismic intensities.
Figure 15 presents the residuals for the large loss category. In this case, all models showed similar gaps between the maximum and minimum residuals, indicating comparable performance in predicting large losses. The residuals are well distributed around zero and do not display any noticeable systematic patterns. This suggests that the models were generally effective at capturing the data structure for large loss events, and no significant bias or trend was observed in their predictions for this category.
The residual patterns observed in this study highlight important areas for future model improvement. Specifically, the presence of systematic trends in the small and medium loss categories suggests that the current models may not fully capture the complexity of damage in these cases. Future research could address this by incorporating additional factors, such as building characteristics or site conditions, and by exploring more advanced modeling techniques, to better represent non-linear relationships in the data. By refining the models in these ways and validating them with broader datasets, future studies could help improve the accuracy and reliability of vulnerability assessments for a wider range of loss scenarios.
Figure 16 presents a comparative analysis of absolute residual ranges for the three empirical vulnerability function models: ProbVF, nOC-ProbVF, and PossVF. The bars represent the different loss categories—P(S) for small, P(M) for medium, and P(L) for large losses. Lower residual values indicate a better model fit to the observed data. The two probability-based models (ProbVF and nOC-ProbVF) demonstrated similar performance patterns, with both showing relatively high residual values for small loss predictions (0.48344 and 0.47801, respectively). The nOC-ProbVF model, which incorporated a rearranged data distribution, showed a slight improvement of 1.123% in modeling the small loss category. The possibility-based model (PossVF) demonstrated a significant improvement in predicting small losses, with residuals reduced by 49.84% compared to the probability-based approaches. This suggests that the PossVF methodology handled the ambiguities in small damage categorization more effectively, while all three models performed comparably well for medium and large loss predictions.
Rearranging the PGA data into batches, as implemented in the nOC-ProbVF model, appeared to mitigate the issue of underestimation in the lower PGA range. This batching approach allows the model to better capture the variability and trends present in lower PGA values, resulting in residuals that are more evenly distributed around zero for this segment. However, in the higher PGA range, the absolute values of the residuals remain substantial, indicating that the model continues to struggle with accurately predicting outcomes in this area. This suggests that while batching improves the model performance for lower PGA values, additional refinement or alternative modeling strategies may be necessary to enhance the predictive accuracy for higher PGA levels.
The residuals from the PossVF model were more tightly clustered around zero than those of the other models, indicating a comparatively superior fit to the observed data. This distribution suggests that the PossVF model more effectively captured the underlying relationships between variables and exhibited less systematic error across its predictions. The concentration of residuals near zero reflects that the model’s predictions were generally accurate, with minimal consistent overestimation or underestimation, thereby enhancing its reliability and validity. However, it is important to note that the predictive capability of this model is inherently limited to the range of available empirical data, which may constrain its applicability to unobserved scenarios. Despite this limitation, the model’s reliance on observed data reduces the risk of overestimation, particularly at higher damage levels. Therefore, for applications where the empirical vulnerability function is grounded in observed ground motion intensities, the Dempster–Shafer possibility theory approach, as implemented in the PossVF model, is recommended as an alternative.
8. Conclusions
This study developed empirical vulnerability functions for seismic risk assessment in Thailand using repair loss data from the economically significant 2014 Chiang Rai earthquake (Mw 6.2). The research analyzed 15,031 damaged residences from government compensation records, representing approximately 6% of the total 232,567 residential properties in Chiang Rai province. To address geographical variations in construction costs, loss data were standardized using RCN, with ground motion intensities ranging from 0.00 to 0.30 g derived from the Chiou and Youngs ground motion prediction model. The damage distribution analysis revealed 31% small damage, 67% medium damage, and 3% large damage cases, reflecting the moderate nature of the earthquake, while highlighting the widespread impact on residential non-engineered structures.
The analysis yielded three empirical vulnerability functions: (a) ProbVF, (b) nOC-ProbVF, and (c) PossVF. Residual analysis was then employed to systematically compare the fit quality of the three models across the damage categories. The analysis revealed that while batching the PGA data in the nOC-ProbVF model helped address underestimation in the lower PGA range, it did not fully resolve prediction inaccuracies at higher PGA values.
Residual analysis demonstrated the superior performance of the possibilistic approach, with the PossVF model achieving a 49.84% reduction in residuals for small loss predictions compared to the probability-based approaches (ProbVF: 0.48344, nOC-ProbVF: 0.47801). The nOC-ProbVF model showed a modest 1.123% improvement over the standard probabilistic approach. These quantitative results demonstrate that possibility theory—capable of addressing multiple uncertainties inherent in loss data—provides significantly better accuracy than conventional probabilistic methods, particularly in handling boundary ambiguities and sequential damage progression.
This study demonstrates that controlling for observation numbers through data batching helps mitigate underestimation in lower PGA ranges, though challenges remain at higher PGA values. The successful application of Dempster–Shafer possibility theory offers a robust alternative to traditional cumulative lognormal probability methods for vulnerability function development, particularly when dealing with datasets characterized by incomplete knowledge and imprecise damage classifications.
The study acknowledges important limitations that define its scope of applicability. Among the models, PossVF consistently exhibited residuals most tightly clustered around zero; however, its predictive capability remains limited to the range of available empirical data, which may restrict its use for prediction beyond observed conditions. In particular, the model is constrained by the ground amplitudes recorded in past events, which may limit its application to more severe earthquake scenarios. Lastly, the analysis focused exclusively on non-engineered residential structures, predominantly unreinforced masonry and wooden frame buildings, which may not represent the full spectrum of Thailand’s building inventory.
In conclusion, while acknowledging the inherent limitations of the PossVF model, the Dempster–Shafer possibility-based empirical vulnerability function offers a compelling alternative for understanding vulnerability in disaster management applications. This methodological advancement is especially valuable in contexts characterized by incomplete knowledge, imprecise damage classifications, and boundary ambiguities between damage categories. The integration of this possibility theory approach into broader disaster risk reduction strategies offers a pathway toward more robust, evidence-based decision-making for earthquake-prone regions, ultimately contributing to enhanced community resilience and more effective resource allocation for future seismic events.