1. Introduction
Hwabyung is a psychosomatic condition characterized by emotional symptoms such as resentment and anger, along with distinctive physical symptoms including chest tightness and heat sensations [1,2]. Unlike depressive and anxiety disorders, which are primarily defined by affective, cognitive, and behavioral disturbances, Hwabyung is marked by the prominent coexistence of somatic distress. According to traditional East Asian medicine, physical and emotional distress arise from disrupted circulation and imbalance of vital energy (qi, 氣) [3]. From this perspective, Hwabyung is understood as a condition in which suppressed anger leads to upward stagnation of Hwa (火) energy.
Although Hwabyung has traditionally been regarded as a culture-bound syndrome in Korea [4], its underlying mechanism, chronic suppression of anger, may operate across cultures, suggesting that similar symptom patterns can emerge in other cultural or social contexts. Anger is a basic emotion experienced across cultures [5] and involves both emotional and interoceptive processes, including subjective feelings of anger and associated bodily sensations such as chest tightness and localized heat sensations in the chest and fists [6]. Accordingly, Hwabyung can be conceptualized as a culturally patterned expression of these shared emotional and interoceptive mechanisms, rather than a purely culture-specific disorder.
Several diagnostic criteria and assessment tools have been developed for Hwabyung. Kim et al. [2] introduced the Hwa-Byung Diagnostic Interview Schedule, and recent clinical guidelines have provided diagnostic recommendations [7]. Based on these criteria, prevalence estimates in Korea range from 4.2% to 13.3% [8,9,10]. Hwabyung shows high comorbidity with depressive disorders (65%) and anxiety disorders (27%), whereas Hwabyung-only cases account for approximately 22–26% [11,12]. This high comorbidity may reflect shared underlying mechanisms, including chronic negative affect and difficulties in emotion regulation. However, Hwabyung is distinguished by the prominent coexistence of somatic distress and suppressed anger.
Several self-report instruments have also been developed to assess the severity of Hwabyung symptoms. Although the State–Trait Anger Expression Inventory measures anger-related cognitive, affective, and behavioral components, it does not capture the physical symptoms central to Hwabyung. The Hwabyung Scale (HS), developed by Kwon et al. [13], was the first self-report measure designed to assess both physical and emotional symptoms of Hwabyung. Nevertheless, the HS is structured as a single-factor scale without differentiating physical and emotional domains, and it was developed primarily to distinguish Hwabyung from depressive disorders. As a result, its diagnostic sensitivity for Hwabyung itself is limited, and the use of culturally dated expressions (e.g., Han) has raised the need for conceptual and linguistic modernization.
To address these limitations, the Hwabyung Comprehensive Test (HCT) was developed [14]. While the HS reflects a unidimensional structure, the HCT was developed as a multidimensional construct that explicitly distinguishes between physical and emotional symptoms. In addition, the HCT replaces culturally dated terminology such as “Han” with more contemporary and clinically accessible expressions (e.g., resentment-related emotional distress), thereby enhancing interpretability in modern clinical contexts. Furthermore, the HCT extends beyond symptom assessment by incorporating Hwabyung-related stressful life events and personality traits associated with vulnerability to Hwabyung. Previous research has demonstrated that the HCT shows superior classification performance compared with the HS and exhibits strong associations with related emotional variables, including depression, anxiety, and anger, supporting its validity [15].
Despite its contributions, prior validation research on the HCT has several limitations. First, the non-clinical comparison group consisted primarily of university students, which limited its representativeness of the non-Hwabyung individuals encountered in clinical settings and resulted in notable age differences between groups. Second, the use of extreme-group comparisons between clinically diagnosed Hwabyung patients and asymptomatic student samples may have inflated estimates of sensitivity and specificity, potentially increasing the risk of misclassification when the derived cut-off scores are applied in practice. Third, the previous study did not account for diagnoses of other psychiatric disorders despite the high comorbidity of Hwabyung with depression and anxiety, so the differential diagnostic utility of the HCT was not sufficiently examined.
The present study aims to evaluate the accuracy of the HCT for the clinical identification of Hwabyung and its performance in differentiating Hwabyung from other psychiatric conditions, and to establish clinically applicable cut-off scores. As a follow-up validation study, it addresses the limitations of prior research in several ways. First, by recruiting patients presenting with Hwabyung-related symptoms or distress in a hospital setting, and classifying them based on a structured clinical interview, this study obtained a non-Hwabyung group that more accurately reflects real-world clinical populations. Second, by enrolling help-seeking individuals whose diagnostic status was not determined a priori, and subsequently classifying them after clinical evaluation, this study minimized spectrum bias and derived cut-off scores that are more applicable to routine clinical practice. Third, by simultaneously assessing diagnoses of other psychiatric disorders, this study compared the Hwabyung group with other psychiatric diagnostic groups, and proposed differentiation-oriented cut-off scores, thereby providing practical evidence to support clinical decision-making related to the differentiation of Hwabyung beyond initial screening.
2. Materials and Methods
2.1. Design
This study was designed as a prospective, single-center, cross-sectional study conducted at a university hospital in Seoul, Republic of Korea. Enrollment took place between 14 February 2025 (first participant enrolled) and 27 June 2025 (last participant enrolled). Each participant was enrolled at the scheduled study visit after providing written informed consent and completing eligibility screening. The study design and reporting followed the STARD guidelines (Supplementary Material S1).
2.2. Participants
The eligibility criteria were as follows. Participants were included if they (i) reported psychosomatic distress related to Hwabyung, such as anger, feelings of unfairness, chest tightness, or a sensation of internal heat, and (ii) were aged 19 years or older, which corresponds to the legal age of adulthood in the Republic of Korea. Participants were excluded if they (i) exhibited hallucinations (e.g., visual or auditory) or delusions; (ii) had an organic brain disorder such as dementia or epilepsy, an intellectual disability, or a personality disorder; or (iii) had a condition that made it difficult to complete the interviews and assessments conducted in this study (e.g., difficulty in reading, writing, listening, speaking, or comprehension).
The minimum sample size for the receiver operating characteristic (ROC) curve analysis was calculated using MedCalc software version 23.0 [16], based on standard methods for estimating the area under the curve (AUC). With a type I error of 0.001, a type II error of 0.005 (i.e., a power of 99.5%), an AUC of 0.75 (considered a medium level), and a 1:1 recruitment ratio of positive to negative cases, the required sample size for each group was 80. To account for an anticipated 20% rate of incomplete participation and missing data, the target sample size was set at 100 per group. These stringent type I and type II error levels were chosen to reduce the risk of false-positive conclusions regarding diagnostic performance and to ensure sufficient statistical power to detect a clinically meaningful level of discrimination.
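The sample-size figure can be checked with a short calculation. This is a minimal sketch, assuming the Hanley and McNeil (1982) variance approximation for the AUC, a two-sided test of AUC = 0.75 against the chance level of 0.5, and equal group sizes; MedCalc's internal implementation may differ in detail, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC estimate (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def n_per_group(auc1, alpha, beta, auc0=0.5):
    """Smallest n per group (1:1) for a two-sided test of H0: AUC = auc0 vs AUC = auc1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = 2
    # Increase n until the null and alternative sampling distributions are
    # separated enough to give the requested type I and type II error rates.
    while (z_alpha * hanley_mcneil_se(auc0, n, n)
           + z_beta * hanley_mcneil_se(auc1, n, n)) > auc1 - auc0:
        n += 1
    return n


def inflate_for_dropout(n, dropout_pct):
    """Enlarge n so that n participants remain after dropout_pct % attrition."""
    return math.ceil(n * 100 / (100 - dropout_pct))


n = n_per_group(auc1=0.75, alpha=0.001, beta=0.005)
print(n, inflate_for_dropout(n, 20))
```

Under these assumptions the script prints `80 100`, matching the 80 participants per group and the inflated target of 100 per group stated above.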
2.3. Measurements
2.3.1. Hwabyung Comprehensive Test (HCT)
The HCT [14] is a self-report scale consisting of three components: Hwabyung symptoms (13 items), Hwabyung-related stressful life events (5 items), and personality traits associated with vulnerability to Hwabyung (21 items). In this study, only the HCT symptom scale was used for analysis. The HCT symptom scale consists of two subscales: physical symptoms (HCT-P; 6 items; e.g., “I feel a sense of chest tightness”) and emotional symptoms (HCT-E; 7 items; e.g., “I feel resentment”). All items were rated on a 5-point Likert scale ranging from 0 to 4. Possible total scores on the HCT symptom scale (HCT-total) range from 0 to 52, with subscale scores ranging from 0 to 24 for HCT-P and from 0 to 28 for HCT-E. Higher scores indicate greater severity of Hwabyung symptoms. In previous research, the HCT demonstrated good internal consistency, with a Cronbach’s alpha of 0.89 [14]. In this study, Cronbach’s alpha was 0.96 for the HCT, 0.93 for HCT-P, and 0.95 for HCT-E.
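Scoring and internal consistency can be illustrated as follows. This is a sketch only: the assignment of items to HCT-P and HCT-E below (first six items physical, last seven emotional) is a hypothetical placeholder because the actual item order is not given here. Cronbach's alpha uses the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), with sample variances.

```python
from statistics import variance

# HYPOTHETICAL item-to-subscale key: the first 6 items are treated as HCT-P and
# the remaining 7 as HCT-E. Replace with the published HCT scoring key.
HCT_P_ITEMS = range(0, 6)
HCT_E_ITEMS = range(6, 13)


def score_hct(responses):
    """Return HCT-P, HCT-E, and HCT-total for 13 item ratings coded 0-4."""
    if len(responses) != 13 or any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("expected 13 ratings, each an integer from 0 to 4")
    p = sum(responses[i] for i in HCT_P_ITEMS)
    e = sum(responses[i] for i in HCT_E_ITEMS)
    return {"HCT-P": p, "HCT-E": e, "HCT-total": p + e}


def cronbach_alpha(rows):
    """Cronbach's alpha; rows = respondents, columns = items (sample variances)."""
    k = len(rows[0])
    item_var_sum = sum(variance([row[j] for row in rows]) for j in range(k))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

The maximum ratings (all items scored 4) give 24 for HCT-P, 28 for HCT-E, and 52 for HCT-total, which reproduces the score ranges described above.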
2.3.2. Korean Version of the Beck Depression Inventory (BDI)
The BDI [17] is a 21-item self-report scale used to assess the severity of depressive symptoms. Each item (e.g., “I feel that the future is hopeless”) was rated on a 4-point Likert scale (0–3), yielding a total score ranging from 0 to 63. Higher scores reflect greater depressive severity. Cronbach’s alpha was 0.85 in a previous study [17] and 0.93 in this study.
2.3.3. Korean Version of the Beck Anxiety Inventory (BAI)
The BAI [18] is a 21-item self-report scale used to assess the severity of anxiety symptoms. Each item (e.g., “I am unable to relax”) was rated on a 4-point Likert scale (0–3), yielding a total score ranging from 0 to 63. Higher scores indicate greater anxiety severity. Cronbach’s alpha was 0.91 in previous research [18] and 0.96 in this study.
2.4. Procedure
Participants were recruited using a convenience sampling approach from patients visiting a university hospital in Seoul, Republic of Korea. Individuals who expressed interest after viewing on-site recruitment notices were invited to participate. Interested individuals scheduled a separate study visit, during which written informed consent was obtained, followed by eligibility screening to determine study inclusion. Recruitment was conducted between 5 February 2025 and 27 June 2025.
Following eligibility screening, participants underwent a structured clinical interview conducted by trained clinicians to diagnose Hwabyung and other psychiatric disorders [2,19]. The interview was conducted as an individual, face-to-face assessment between the clinician and each participant using a structured diagnostic checklist, and included evaluation of core symptoms, functional impairment, and the presence of other medical conditions. Clinicians assessed the presence and clinical significance of symptoms based on predefined diagnostic criteria. This interview served as the reference standard for the diagnosis of Hwabyung, given its established reliability and validity [2].
Hwabyung was diagnosed when individuals exhibited both physical and emotional symptoms, reported identifiable stressors associated with these symptoms, and demonstrated impaired psychosocial functioning, with these conditions not attributable to other medical illnesses [2]. Based on the results of the structured clinical interview, participants diagnosed with Hwabyung were classified into the Hwabyung group (HG); those not diagnosed with Hwabyung but meeting diagnostic criteria for other psychiatric disorders were classified into the non-Hwabyung clinical group (NHCG); and those not receiving any psychiatric diagnosis were classified into the non-clinical group (NCG).
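The diagnostic and grouping rule described above can be expressed as a small decision function. The field and function names are hypothetical and simply mirror the diagnostic elements listed in the text; in the study these judgments were made by clinicians during the structured interview, not computed.

```python
from dataclasses import dataclass


@dataclass
class InterviewResult:
    """Hypothetical record of clinician judgments from the structured interview."""
    physical_symptoms: bool
    emotional_symptoms: bool
    identifiable_stressor: bool
    impaired_functioning: bool
    attributable_to_other_illness: bool
    other_psychiatric_diagnoses: tuple = ()


def meets_hwabyung_criteria(r):
    # Physical and emotional symptoms, an identifiable stressor, and impaired
    # functioning must all be present and not explained by another medical illness.
    return (r.physical_symptoms and r.emotional_symptoms
            and r.identifiable_stressor and r.impaired_functioning
            and not r.attributable_to_other_illness)


def assign_group(r):
    """HG if Hwabyung criteria are met (comorbid diagnoses allowed);
    otherwise NHCG if any other psychiatric diagnosis; otherwise NCG."""
    if meets_hwabyung_criteria(r):
        return "HG"
    if r.other_psychiatric_diagnoses:
        return "NHCG"
    return "NCG"
```

Checking the Hwabyung criteria first means that a participant who meets them is placed in the HG even when other psychiatric diagnoses are also present.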
After the interview, all participants completed a series of self-report questionnaires, including the HCT, BDI, and BAI. The HCT was used as the index test to aid the clinical identification of individuals with Hwabyung. All participants received 50,000 KRW as compensation upon completion of the study. Among the excluded participants (n = 16), none were excluded for failure to meet the inclusion criteria; these participants were diagnosed with Hwabyung after the target sample size for the HG (n = 100) had already been reached and were therefore excluded for administrative reasons. Recruitment was terminated after the non-Hwabyung group (NHG), comprising the NHCG and NCG, reached the minimum required sample size (n = 82). The study flow diagram is presented in Figure 1.
2.5. Blinding
This study was conducted using a double-blind design. Participants were blinded to their diagnostic classification as determined by the reference standard (the structured clinical interview). In addition, the clinicians who conducted the structured interviews were blinded to the results of the index test (HCT), which was completed independently by the participants following the interview. This procedure ensured that the reference standard assessment was not influenced by the index test results.
2.6. Statistical Analysis
Statistical analysis was performed using SPSS version 22.0 (IBM Corp., Armonk, NY, USA) and R version 4.1.3. There were no missing data or indeterminate results. Analysis of variance and chi-square tests were used to compare the demographic and clinical characteristics of the participants. Post hoc comparisons were conducted using the Bonferroni method. Statistical significance was set at p < 0.05.
Diagnostic accuracy was assessed by calculating sensitivity, specificity, and the AUC. Because no cut-off value had been pre-specified before data analysis, the optimal cut-off score for Hwabyung diagnosis was determined exploratorily using ROC curve analysis as the score that jointly maximized sensitivity and specificity. The 95% confidence intervals (CIs) for the optimal cut-off values were estimated using bootstrap resampling (2000 iterations), and sensitivity and specificity were also examined at the lower and upper bounds of these CIs. In addition, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), and negative likelihood ratio (LR−) were calculated to further evaluate the clinical interpretability of each optimal cut-off score. AUC values were interpreted as follows: 0.50–0.60, failure; 0.60–0.70, poor; 0.70–0.80, fair; 0.80–0.90, good; and 0.90–1.00, excellent.
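The cut-off procedure can be sketched in pure Python. The text does not name the criterion used to maximize sensitivity and specificity; this sketch assumes Youden's J (sensitivity + specificity - 1), treats scores at or above the cut-off as test-positive, and assumes a percentile bootstrap that resamples within the Hwabyung and non-Hwabyung groups separately. All function names are hypothetical.

```python
import math
import random


def metrics_at_cutoff(scores, labels, cutoff):
    """Accuracy when score >= cutoff is test-positive; labels: 1 = Hwabyung, 0 = not."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp) if tp + fp else math.nan,
        "NPV": tn / (tn + fn) if tn + fn else math.nan,
        "LR+": sens / (1 - spec) if spec < 1 else math.inf,
        "LR-": (1 - sens) / spec if spec > 0 else math.inf,
    }


def youden_cutoff(scores, labels):
    """Observed score maximizing Youden's J; ties resolved toward the lowest cut-off."""
    best_cut, best_j = None, -math.inf
    for c in sorted(set(scores)):
        m = metrics_at_cutoff(scores, labels, c)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut


def bootstrap_cutoff_ci(scores, labels, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for the Youden cut-off, resampling within each group."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    cuts = []
    for _ in range(n_boot):
        bp = rng.choices(pos, k=len(pos))
        bn = rng.choices(neg, k=len(neg))
        cuts.append(youden_cutoff(bp + bn, [1] * len(bp) + [0] * len(bn)))
    cuts.sort()
    lower = cuts[int(n_boot * (1 - level) / 2)]
    upper = cuts[math.ceil(n_boot * (1 + level) / 2) - 1]
    return lower, upper
```

With study data, one would call `bootstrap_cutoff_ci(scores, labels, n_boot=2000)` and pass each returned bound to `metrics_at_cutoff` to obtain sensitivity and specificity at the lower and upper CI bounds. Resampling within groups keeps the group sizes fixed so that every bootstrap sample contains both Hwabyung and non-Hwabyung cases; the loops are unoptimized but adequate for a few hundred participants.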
Pairwise comparisons of AUC values were conducted using DeLong’s test, and Bonferroni-adjusted p-values were examined for the three pairwise comparisons within each classification task.
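DeLong's test compares two AUCs measured on the same participants by estimating the covariance of the AUC estimates from per-case placement values. The following is a minimal pure-Python sketch of the DeLong et al. (1988) method, together with the Bonferroni adjustment for the three pairwise comparisons; it is not the SPSS or R implementation used in the study, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _psi(x, y):
    """Kernel: 1 if the positive case scores higher, 0.5 for a tie, else 0."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _placements(pos, neg):
    """AUC plus DeLong structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs computed on the same cases."""
    def split(scores):
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        return pos, neg

    auc_a, v10_a, v01_a = _placements(*split(scores_a))
    auc_b, v10_b, v01_b = _placements(*split(scores_b))
    m, n = len(v10_a), len(v01_a)
    # Var(AUC_a - AUC_b) from the positive-case and negative-case components.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    z = (auc_a - auc_b) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc_a, auc_b, z, p


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of comparisons, cap at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]
```

For the three comparisons within a task (e.g., HCT-total vs HCT-P, HCT-total vs HCT-E, HCT-P vs HCT-E), the three raw p-values would be passed together to `bonferroni`. Established implementations, such as `roc.test` in the R package pROC, provide the same test with additional options.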
4. Discussion
This study examined the clinical validity of the HCT for identifying Hwabyung and for supporting its distinction from other psychiatric disorders. The findings indicate that the HCT demonstrates good validity for identifying Hwabyung and may provide useful probabilistic information when distinguishing Hwabyung from other psychiatric conditions.
HCT-total demonstrated stable and clinically meaningful performance across both screening and differential diagnostic decision-making contexts, supporting the use of a single cut-off score for initial identification of Hwabyung. The consistency of the optimal cut-off score suggests that HCT-total may serve as a pragmatic first-line screening tool applicable to heterogeneous hospital populations, including individuals with other psychiatric conditions as well as non-clinical visitors.
Although HCT-P demonstrated higher sensitivity and a lower LR− than HCT-total, indicating a relative advantage in minimizing false negatives, this gain in sensitivity was accompanied by lower specificity and a weaker capacity to increase post-test probability when results were positive. HCT-total, in contrast, showed a more balanced performance profile, with higher specificity and greater overall discriminative stability. Given that Hwabyung is conceptualized as a psychosomatic condition encompassing both physical and emotional symptom domains [1,2], an exclusive focus on either domain may limit the clinical representation of the construct. The relatively lower classification performance of HCT-E is consistent with this view. Unlike depressive and anxiety disorders, which are primarily linked to psychological distress across cognitive, emotional, and behavioral domains, Hwabyung is characterized by the co-occurrence of somatic symptoms, such as chest discomfort and heat sensations, with emotional distress; an integrative consideration of both domains is therefore essential to capture its key pathological features. Accordingly, the selection of HCT-total for initial screening reflects a trade-off in which some sensitivity relative to HCT-P is given up in exchange for conceptual comprehensiveness and interpretive balance within a heterogeneous clinical population.
In the differential diagnostic context, by contrast, HCT-P demonstrated relatively higher specificity in distinguishing Hwabyung from other psychiatric disorders, along with a higher PPV and a moderate LR+, suggesting that elevated HCT-P scores may increase the probability of Hwabyung in clinically ambiguous cases. However, given its limited sensitivity in this context, HCT-P is best used as a complementary second-step measure following initial screening with HCT-total, rather than as a standalone diagnostic tool. This stepwise approach may enhance clinical decision-making by balancing sensitivity in screening with specificity in differentiation. HCT-P is therefore best conceptualized as a probabilistic indicator within a broader clinical assessment framework, rather than as a definitive differential diagnostic test.
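The stepwise use of HCT-total and HCT-P described above can be written as a simple decision rule. The cut-off values are deliberately left as required arguments because they come from the study's ROC results, which are not reproduced here; the function name and output labels are hypothetical, and the output is meant to inform, not replace, clinical evaluation.

```python
def stepwise_hct(hct_total, hct_p, total_cutoff, physical_cutoff):
    """Hypothetical two-step rule: HCT-total screens, HCT-P refines.

    total_cutoff and physical_cutoff must be supplied from validated ROC
    results; no default values are assumed. In every case the final
    diagnosis rests on clinical evaluation.
    """
    if not (0 <= hct_total <= 52 and 0 <= hct_p <= 24):
        raise ValueError("score outside the HCT range")
    # Step 1: sensitivity-oriented screening with the total score.
    if hct_total < total_cutoff:
        return "screen-negative"
    # Step 2: specificity-oriented refinement with the physical subscale.
    if hct_p >= physical_cutoff:
        return "screen-positive; HCT-P elevated"
    return "screen-positive; HCT-P not elevated"
```

Keeping the two thresholds as explicit arguments also makes it straightforward to substitute the lower or upper CI bounds when a setting calls for more sensitivity or more specificity.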
The inclusion of 95% CIs for the optimal cut-off scores provides additional insight into the clinical flexibility of the HCT. Rather than relying on a single fixed threshold, the observed range of plausible cut-off values illustrates how sensitivity and specificity shift across different decision points. At the lower bound of the CI, higher sensitivity and lower specificity suggest potential utility in contexts prioritizing the minimization of false negatives, such as initial screening in heterogeneous hospital populations. In contrast, at the upper bound, reduced sensitivity accompanied by increased specificity indicates greater suitability for contexts in which false positives carry greater clinical cost, such as differential evaluation among patients already presenting with psychiatric symptoms.
This pattern was consistently observed for both HCT-total and HCT-P. For example, HCT-total demonstrated high sensitivity at the lower bound and greater specificity at the upper bound, reflecting a trade-off between case detection and diagnostic precision. Similarly, HCT-P exhibited increases in specificity at higher thresholds, supporting its potential role in refining diagnostic impressions in more ambiguous clinical presentations. These findings suggest that the HCT need not be interpreted as a rigid dichotomous instrument.
Taken together, these findings suggest that the HCT can be applied flexibly according to clinical priorities: HCT-total may be appropriate for initial case identification owing to its broader construct coverage, whereas HCT-P may provide complementary value when greater specificity is desired in differential assessments. Likewise, lower cut-off thresholds may be preferred in screening contexts to maximize sensitivity, whereas higher thresholds may be applied in differential assessment settings where specificity is prioritized. However, threshold selection should be guided by careful consideration of the trade-off between false-positive and false-negative classifications within a probabilistic clinical framework, rather than applied as a fixed diagnostic boundary.
The likelihood ratios observed in this study were in the small-to-moderate range, indicating only modest diagnostic value and a limited ability to substantially alter post-test probability. These findings should be taken into account when interpreting the clinical utility of the HCT. However, this pattern may reflect the nature of the HCT as a self-report instrument assessing subjective symptom patterns. While definitive diagnosis should be supported by comprehensive clinical evaluation, including clinician-administered interviews, the HCT may still provide clinically useful information for both screening and diagnostic decision-making.
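The link between likelihood ratios and post-test probability follows from Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The sketch below converts a pre-test probability and a likelihood ratio into a post-test probability; the function name is hypothetical, and the inputs in the usage note are illustrative values, not estimates from this study.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

For example, with a hypothetical pre-test probability of 0.30 and a hypothetical LR+ of 3, the pre-test odds are 3/7, the post-test odds are 9/7, and the post-test probability is 9/16 = 0.5625. An LR close to 1 leaves the probability nearly unchanged, which is why small-to-moderate likelihood ratios shift diagnostic certainty only modestly.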
The HG exhibited a higher proportion of females and married individuals compared with the NCG. This finding aligns with the traditional view of Hwabyung, which has been predominantly described among married middle-aged women experiencing chronic family-related stress, such as conflicts with spouses and in-laws [20,21]. However, recent studies suggest that Hwabyung is not limited to this demographic group and may also be prevalent among younger males, indicating the need for updated conceptual models of Hwabyung [22]. Moreover, previous studies have reported that young men are more likely to experience problematic anger [23] and that anger- and violence-related behaviors are key mental health risks among adolescent males [24]. These reports suggest that anger-related problems in young men are an increasingly significant public health concern.
The demographic distribution observed may partly reflect the characteristics of patients seeking care in Korean medicine hospitals, where middle-aged and older women constitute the majority of visitors. This pattern may, at least in part, be attributable to potential age and sex biases arising from convenience sampling. Such a demographic imbalance may limit the understanding of Hwabyung in younger adult men, and the cut-off values proposed for screening and differential support in this study may differ in this population. Therefore, caution is warranted when generalizing the present findings beyond the sampled population.
A high level of psychiatric comorbidity was observed in the Hwabyung group, with a substantial proportion of individuals presenting with at least one comorbid psychiatric disorder. These findings suggest the presence of shared transdiagnostic mechanisms underlying Hwabyung and other emotional disorders. In particular, difficulties in emotion regulation and a range of affective experiences, such as anger, sadness, and fear, may contribute to overlapping symptom presentations. Previous research has indicated that basic emotional states (e.g., anger, sadness, fear, disgust, and happiness) are often accompanied by similar bodily sensations, particularly in the chest region [6], supporting the possibility that Hwabyung shares common psychophysiological processes with other emotional disorders. From this perspective, Hwabyung may be understood not as a fully discrete diagnostic entity, but rather as a specific manifestation of broader emotional distress processes.
At the same time, Hwabyung is distinguished by the prominence of characteristic somatic symptoms, such as sensations of heat and increased pressure in the chest. These features are less emphasized in conventional mood and anxiety disorders and may contribute to its clinical distinctiveness. In this context, the HCT, which integrates both emotional and somatic symptom domains, may capture both shared and disorder-specific aspects of psychopathology. Notably, HCT-P demonstrated relatively higher specificity in differentiating Hwabyung from other psychiatric disorders, suggesting that the inclusion of somatic symptom dimensions may enhance discriminative validity.
Overall, these findings indicate that while the HCT may reflect both general psychopathology and Hwabyung-related features, its stepwise application—using HCT-total for initial assessment and HCT-P for further differentiation—may improve Hwabyung-specific classification. However, given the relatively small sample size for differential comparisons between Hwabyung and other psychiatric disorders, these findings should be considered exploratory, and further validation in larger and more diverse samples is warranted.
Limitations
This study has some limitations. First, participants were recruited using a convenience sampling approach from a single hospital, which may introduce selection bias and limit the generalizability of the findings. This sampling method may also have contributed to the demographic imbalance observed in the sample: although the sample adequately reflects patients visiting the hospital, female participants were overrepresented, whereas younger individuals and male participants were underrepresented. This imbalance should be considered when interpreting the findings. Given that anger-related problems, such as interpersonal violence, have been reported more frequently among younger males [22,23,24], further research using more representative sampling strategies, such as stratified sampling, as well as studies focusing on underrepresented groups (e.g., younger male populations), is warranted. Such studies are necessary to better characterize Hwabyung across demographic groups and to develop a comprehensive Hwabyung model that accounts for both age and sex differences.
Second, this study was conducted in a single country, reflecting the East Asian cultural context. Although Hwabyung is often regarded as a culture-bound syndrome specific to Korea, anger is recognized as a basic human emotion, and its association with interoceptive processes is supported by broadly accepted empirical evidence [25,26]. From a broader perspective, Hwabyung may be understood in relation to transdiagnostic constructs of emotional distress, particularly those involving emotion dysregulation and shared psychophysiological responses. Therefore, future research should examine the clinical identification, assessment, and monitoring of Hwabyung across diverse cultural contexts. To establish the international validity of the HCT, repeated validation studies examining its screening and differential diagnostic performance in different cultural settings are needed.
Third, this study is limited by the relatively small sample size of the NHCG. This limitation restricts the precision of estimates for differential performance between Hwabyung and other psychiatric conditions. Consequently, the findings regarding the discriminative ability of the HCT should be interpreted as preliminary. Future studies should seek to replicate and extend these findings using larger, multi-center samples that include a broader range of psychiatric diagnoses, thereby enabling more robust estimation of classification accuracy and improving the generalizability of the results.