Experimental Verification of Performance Improvement in Heat-Related Illness Prediction Using Clinical Significance-Based Binning

Shin, Jeongwoo; Jeong, Hanjo

doi:10.3390/app152312500



by Jeongwoo Shin and Hanjo Jeong *

Department of Software Convergence Engineering, Mokpo National University, Muan 58554, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12500; https://doi.org/10.3390/app152312500

Submission received: 10 October 2025 / Revised: 11 November 2025 / Accepted: 23 November 2025 / Published: 25 November 2025

(This article belongs to the Section Applied Biosciences and Bioengineering)


Abstract

Due to climate change, the frequency and intensity of heat waves are increasing, leading to a rise in heat-related illnesses, particularly affecting outdoor workers. Existing studies have developed predictive models using hospital clinical data and measurable data from wearable devices, but they face limitations in prediction accuracy due to insufficient training data and overfitting. This study proposes a clinical significance-based binning method to address these issues in predicting heat illness occurrence. Through experiments utilizing various machine learning algorithms and comparisons with other binning methods, including data-driven binning methods, the effectiveness of this binning approach was verified. Additionally, by conducting comparative validation using datasets composed solely of features measurable by wearable devices such as smartwatches, this study demonstrates that the proposed binning method can also be applied to real-time heat illness prediction systems for outdoor workers.

Keywords: heat illness; binning; machine learning; deep learning; clinical data; wearable device

1. Introduction

Climate change has led to an increase in the frequency and intensity of heatwaves, resulting in a steady rise in the number of heat-related illnesses. According to statistics from the Korea Disease Control and Prevention Agency, 1582 cases of heat-related illnesses were reported between 15 May and 14 July 2025, with 9 of these cases being estimated fatalities [1]. Of the patients, 76.7% were male, and 35% were aged 60 or older. The majority of these cases (approximately 79.4%) occurred outdoors, indicating that groups exposed to high-temperature environments for extended periods, such as those in agriculture and construction, are at higher risk for heat-related illnesses. These illnesses can manifest in various forms, such as heat stroke, heat exhaustion, and heat cramps, and can lead to death if not detected early. Therefore, it is crucial in public health to swiftly predict heat-related illnesses and identify at-risk groups for early intervention during heatwaves.

Previous studies have typically treated age and key vital signs, such as systolic blood pressure, heart rate, and body temperature, as continuous variables and fed them to models either as raw values or as derived features [2,3,4]. This approach lets machine learning and deep learning algorithms learn fine numerical differences, which can improve prediction accuracy [5,6,7]. However, prediction accuracy may decline when the training data are insufficient. Furthermore, these methods do not reflect the risk group classification systems used in actual medical settings. Clinically, indicators such as blood pressure, pulse, and temperature are categorized against specific thresholds into normal, caution, and danger ranges. For instance, blood pressure is categorized into normal, prehypertension, and hypertension; heart rate into normal, bradycardia, and tachycardia; and temperature into afebrile and febrile. Such categorization is crucial for evaluating patient status and making clinical decisions [8,9].

Research on the prediction and early detection of heat-related illnesses can be broadly divided into prediction models built on clinical data and detection technologies based on wearable devices. Clinical data-based studies have used patients’ vital signs and hospital examination data to predict severity early [10,11,12]. In contrast, studies using wearable devices have employed multi-sensor-equipped helmets or smartwatches for outdoor workers to collect vital signs such as body temperature, heart rate, and sweat secretion, as well as external environmental data, to assess the risk of heat-related illnesses [13,14,15,16]. Although wearable devices capture fewer types of vital signs, and with lower accuracy, than clinical measurements taken in hospitals, they can be more effective in industrial settings because workers can wear them continuously, allowing their condition and risk to be assessed in real time.

This study aims to empirically analyze the effectiveness of a clinical significance-based binning method in predicting heat-related illnesses using machine learning. We apply both raw and preprocessed data through various binning methods, including data-driven binning methods such as K-means clustering [17] and decision tree [18], to different datasets categorized into clinical data and wearable device-measurable data. We compare the performance of various machine learning algorithms, including deep learning. Through this approach, we aim to confirm that preprocessing using the clinical significance-based binning method can enhance prediction performance and interpretability in both clinical data and wearable device-measured data, compared not only to the raw dataset but also to the dataset preprocessed using various data-driven binning methods. This provides practical implications for establishing a public health system capable of swiftly identifying and responding to at-risk groups for heat-related illnesses during heatwaves.

2. Materials and Methods

2.1. Dataset

The dataset used in this study is based on heat illness-related emergency room patient data collected by the Japanese Association for Acute Medicine (JAAM). This dataset was gathered from 103 emergency medical institutions across Japan from 1 June to 31 August 2010 and from 1 July to 30 September 2012, including a total of 3175 patients [19,20,21]. The study subjects were patients transported to the emergency room due to symptoms of heat illness. The dataset includes demographic information, presence of underlying diseases, vital signs before and after transportation to the emergency room, hospital test results, and clinical outcomes (such as admission status, intensive care unit admission, and mortality).

The JAAM-HS dataset can be categorized into dependent and independent variables. The dependent variable is the hospital admission status of the patient, while the independent variables comprise five categories: demographic information, environmental information, underlying diseases, pre-admission vital signs, and hospital-measured vital data. The main features for each category are summarized in Table 1.

2.2. Experimental Strategy and Methods

The overall procedure of this experimental study is summarized in Figure 1. In the initial data cleaning phase, rows with a missing rate of 30% or more were removed. In the feature selection phase, two scenarios were considered:

Utilizing all variables clinically collected in hospitals;
Utilizing only variables measurable in real service environments through wearable devices.

Following the feature selection phase, each feature set was prepared in four versions: a raw dataset and three binning datasets. The raw dataset retained continuous variables in their original form, while the three binning datasets transformed continuous variables into categorical variables by segmenting them according to criteria based on K-means clustering, decision tree, and clinical significance. This process resulted in eight datasets, listed in Table 2. Each dataset is labeled with two numbers. The first number indicates the feature set: 1 denotes all clinical variables, and 2 denotes the variables measurable by wearable devices. The second number indicates the preprocessing: 1 denotes the raw dataset without binning, and 2, 3, and 4 denote binning by K-means clustering, decision tree (DT), and clinical significance, respectively.

In the subsequent phase, all datasets underwent the same processing steps. Missing values were imputed, continuous variables were standardized, and categorical variables were one-hot encoded. The data was then divided into training and test datasets. Models based on KNN, SVM, Random Forest, MLP, CNN, and Transformer were trained. To find the optimal model for each algorithm, a preliminary search was conducted using the training dataset to identify the best combination of hyperparameters. Finally, the models trained on the eight datasets with their best parameters were compared and analyzed to verify the impact of raw vs. the various binning methods and the selection of all variables vs. wearable device variables on predictive performance.
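As a rough illustration of the one-hot encoding step mentioned above, the following sketch maps a nominal variable to indicator columns. The `Location` labels used here are hypothetical placeholders, not the coding of the actual dataset, and the helper name `one_hot` is introduced only for this example.

```python
def one_hot(values, categories=None):
    """Map each categorical value to a 0/1 indicator vector."""
    # If no category order is given, use the sorted set of observed values.
    categories = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical Location labels (assumption: the real dataset may code these differently).
locations = ["outdoor", "indoor", "outdoor", "indoor"]
categories, encoded = one_hot(locations)
print(categories)
print(encoded)
```

Passing an explicit `categories` list keeps the column order identical across the training and test sets, which matters when a category is absent from one of them.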

2.2.1. Data Cleaning

To ensure data quality, patient records with a missing value rate of 30% or more were removed [22]. Out of a total of 3175 patient records, 412 met this criterion and were removed, leaving 2763 records for analysis. Figure 2 shows the distribution of the missing value ratio based on the number of records before and after the removal of those with a missing value rate of 30% or more.
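The row-wise filter described above can be sketched as follows. This is a minimal illustration that assumes missing entries are represented as `None`; the records and their values are invented for the example, with field names borrowed from Table 1.

```python
def missing_rate(record):
    """Fraction of fields in one patient record that are missing (None)."""
    return sum(value is None for value in record.values()) / len(record)


def drop_sparse_records(records, threshold=0.30):
    """Keep records whose missing rate is below the threshold (30% or more is removed)."""
    return [r for r in records if missing_rate(r) < threshold]


# Toy records with hypothetical values.
records = [
    {"Age": 72, "PreSBP": 110, "PreHR": 98, "PreBT": 38.4, "PreGCS": 15},
    {"Age": 45, "PreSBP": None, "PreHR": 120, "PreBT": None, "PreGCS": 14},  # 2 of 5 missing
    {"Age": 63, "PreSBP": 135, "PreHR": None, "PreBT": 37.1, "PreGCS": 15},  # 1 of 5 missing
]
kept = drop_sparse_records(records)
print(len(kept))
```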

2.2.2. Feature Selection

To assess practical applicability in real-world settings, the second scenario used only variables that can be collected through wearable devices, particularly smartwatches. For vital signs, blood test results and specialized clinical indicators that can only be measured in a hospital were excluded, and variables measurable by smartwatches, such as respiration rate (PreRR), body temperature (PreBT), and heart rate (PreHR), were chosen. The remaining variables were retained as in the raw dataset: demographic and environmental characteristics (Sex, Age, and Location) and underlying disease variables such as hypertension (HT), heart disease (HeartDisease), mental disorders (Psycho), and diabetes (DM).

The selected features are shared by the raw dataset and the binning datasets based on wearable device variables. In the raw dataset, continuous variables were entered unchanged, whereas in the binning datasets, they were converted into categorical variables according to either the data-driven algorithms (K-means clustering and decision tree) or the clinical criteria.
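To make the data-driven alternative concrete, the sketch below bins a single continuous variable with a plain one-dimensional K-means (Lloyd's algorithm). The text does not specify the number of clusters or the initialization used in the experiments, so `k=3`, the evenly spread starting centers, and the temperature values are all assumptions chosen for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one variable; returns cluster centers sorted ascending."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Empty clusters keep their previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def assign_bin(value, centers):
    """Bin label is the index of the nearest center."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


# Hypothetical body temperatures in degrees Celsius.
temps = [35.8, 36.2, 36.5, 36.9, 37.4, 38.0, 38.3, 39.6, 40.1, 40.5]
centers = kmeans_1d(temps, k=3)
labels = [assign_bin(t, centers) for t in temps]
print(labels)
```

Unlike the clinical criteria, the resulting cut points depend entirely on the sample at hand, so they can shift when the data change.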

2.2.3. Clinical Significance-Based Binning

In this study, we experimented with both raw and binning datasets for hospital clinical data and wearable device data. The raw dataset used standardized continuous variables, while categorical variables were converted into binary variables using one-hot encoding. In contrast, the binning datasets transformed the major continuous variables (age, blood pressure, heart rate, respiration rate, body temperature, and consciousness level) into ordinal variables by binning them according to either data-driven clusters or clinical criteria. The clinical binning criteria are listed in Table 3. Each feature is categorized as follows:

Age is categorized by 10-year intervals;
Systolic blood pressure is categorized based on the Korea Disease Control and Prevention Agency’s National Health Information Portal [8];
Heart rate is categorized by its percentage of the age-predicted maximum heart rate (220 minus age), following zone training criteria [23];
Respiration rate is categorized based on basic nursing vital sign criteria [9];
Body temperature is categorized based on basic nursing vital sign criteria [9];
Glasgow Coma Scale scores are categorized by the severity levels of the scale [24].

Table 3. The binning criteria based on clinical significance.

Feature	Binning Criteria
Age	0: 0–9; 1: 10–19; 2: 20–29; 3: 30–39; etc.
Systolic Blood Pressure (SBP, PreSBP)	0: Hypotension (<100 mmHg); 1: Normal (100–119 mmHg); 2: Prehypertension Warning (120–129 mmHg); 3: Prehypertension Stage (130–139 mmHg); 4: Mild Hypertension (140–159 mmHg); 5: Moderate or Severe Hypertension (≥160 mmHg)
Heart Rate (HR, PreHR)	0: 50–59%; 1: 60–69%; 2: 70–79%; 3: 80–89%; 4: 90–100% (of maximum heart rate, 220 minus age)
Respiration Rate (PreRR)	0: Bradypnea (≤11 breaths/min); 1: Normal (12–20 breaths/min); 2: Tachypnea (≥21 breaths/min)
Body Temperature (BT, PreBT)	0: Hypothermia (≤36.0 °C); 1: Normal (36.1–37.5 °C); 2: Fever (>37.5 °C)
Glasgow Coma Scale (PreGCS, GCS)	0: Normal (15 points); 1: Mild Consciousness Impairment (13–14 points); 2: Moderate Coma (8–12 points); 3: Severe Coma (4–7 points); 4: Deep Coma (≤3 points)
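The thresholds in Table 3 translate fairly directly into lookup functions. The sketch below is one possible encoding rather than the code used in the study; the function names are hypothetical. Table 3 does not say how heart rates below 50% or above 100% of the maximum are handled, so clamping them to the lowest and highest bins is an assumption made here.

```python
from bisect import bisect_left, bisect_right


def bin_age(age):
    """10-year intervals: 0 for 0-9, 1 for 10-19, and so on."""
    return int(age) // 10


def bin_sbp(sbp):
    """Systolic blood pressure (mmHg): <100, 100-119, 120-129, 130-139, 140-159, >=160."""
    return bisect_right([100, 120, 130, 140, 160], sbp)


def bin_heart_rate(hr, age):
    """Percentage of maximum heart rate (220 minus age), in 10-point zones.

    Assumption: values under 50% fall into bin 0 and values over 100% into bin 4.
    """
    percent = hr / (220 - age) * 100
    return bisect_right([60, 70, 80, 90], percent)


def bin_respiration(rr):
    """Breaths/min: <=11 bradypnea, 12-20 normal, >=21 tachypnea."""
    return bisect_right([12, 21], rr)


def bin_temperature(bt):
    """Degrees Celsius: <=36.0 hypothermia, 36.1-37.5 normal, >37.5 fever."""
    return bisect_left([36.0, 37.5], bt)


def bin_gcs(gcs):
    """Glasgow Coma Scale: 15 -> 0, 13-14 -> 1, 8-12 -> 2, 4-7 -> 3, <=3 -> 4."""
    if gcs >= 15:
        return 0
    if gcs >= 13:
        return 1
    if gcs >= 8:
        return 2
    if gcs >= 4:
        return 3
    return 4


# Hypothetical patient: 72 years old, SBP 110, HR 98, RR 24, BT 38.4, GCS 15.
binned = {
    "Age": bin_age(72),
    "PreSBP": bin_sbp(110),
    "PreHR": bin_heart_rate(98, 72),
    "PreRR": bin_respiration(24),
    "PreBT": bin_temperature(38.4),
    "PreGCS": bin_gcs(15),
}
print(binned)
```

Because the cut points are fixed in advance, the same reading always maps to the same category regardless of the sample, which is the main practical difference from the data-driven binning methods.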

2.2.4. Missing Value Imputation and Data Standardization

Missing values were imputed in the same way for all datasets using an autoencoder. An autoencoder compresses input data into a low-dimensional latent space and then reconstructs it, learning correlations between variables during training [25,26]. Unlike simple mean imputation or multiple imputation, this allows missing values to be imputed in a way that reflects nonlinear relationships between variables [27,28]. Additionally, all continuous variables were standardized based on the normal distribution to enhance the stability and convergence speed of the training process.
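Assuming that standardization here refers to the usual z-score scaling (subtracting the mean and dividing by the standard deviation), the step can be sketched as below. The text does not state which rows the statistics were computed on, so that choice is left open; the sample values are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Return the mean and population standard deviation of one variable."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant column
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply z-score scaling with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical body temperatures in degrees Celsius.
temps = [36.0, 37.0, 38.0, 39.0, 40.0]
mu, sigma = fit_standardizer(temps)
z = standardize(temps, mu, sigma)
print([round(v, 3) for v in z])
```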

2.2.5. Model Training

In this study, we applied six algorithms to compare the predictive performance for heat-related illnesses: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Transformer-based models. For training and evaluating each model, the datasets were split into training and test datasets at a ratio of 8:2. Using only the training data, we employed 5-fold cross-validation with GridSearchCV [29,30], alongside early stopping and dropout techniques, to prevent overfitting and determine the optimal hyperparameters for each model. The parameter search ranges and optimal values for each model, per dataset, are summarized in Table 4.
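The tuning procedure can be illustrated with a hand-rolled stand-in for GridSearchCV: an 8:2 split, followed by a 5-fold cross-validated search over the KNN ranges listed in Table 4, using the training portion only. Everything else in this sketch is an assumption: the data are synthetic, the folds are interleaved rather than stratified, and accuracy is used as the selection score because the scoring metric is not stated in the text.

```python
import random
from itertools import product


def distance(a, b, metric):
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(diffs) if metric == "manhattan" else sum(d * d for d in diffs) ** 0.5


def knn_predict(train_x, train_y, query, k, metric):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    order = sorted(range(len(train_x)), key=lambda i: distance(train_x[i], query, metric))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def cv_accuracy(xs, ys, k, metric, folds=5):
    """Mean validation accuracy over interleaved folds."""
    scores = []
    for f in range(folds):
        val = set(range(f, len(xs), folds))  # every folds-th sample goes to fold f
        tr_x = [x for i, x in enumerate(xs) if i not in val]
        tr_y = [y for i, y in enumerate(ys) if i not in val]
        hits = sum(knn_predict(tr_x, tr_y, xs[i], k, metric) == ys[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / folds


# Synthetic two-feature data (hypothetical, not the JAAM-HS records).
rng = random.Random(0)
data = [([rng.gauss(label, 1.0), rng.gauss(label, 1.0)], label)
        for label in (0, 1) for _ in range(50)]
rng.shuffle(data)
split = int(0.8 * len(data))  # 8:2 train/test split
train, test = data[:split], data[split:]
train_x, train_y = [x for x, _ in train], [y for _, y in train]

# Search ranges for KNN taken from Table 4.
grid = product([3, 5, 7, 9, 11], ["euclidean", "manhattan"])
best_k, best_metric = max(grid, key=lambda p: cv_accuracy(train_x, train_y, *p))

# The held-out test set is touched only once, after the search is finished.
test_hits = sum(knn_predict(train_x, train_y, x, best_k, best_metric) == y for x, y in test)
test_accuracy = test_hits / len(test)
print(best_k, best_metric, test_accuracy)
```

Keeping the test set out of the search is what lets the final score serve as an estimate of performance on unseen patients.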

3. Results

The experimental results for the datasets configured with all variables from hospital clinical data are presented in Table 5 in terms of accuracy, precision, recall, F1-Score, and AUC. Based on these results, Table 6 reports the relative improvement in F1-Score and AUC of each binning dataset over the raw dataset. Dataset 1-4, constructed using the clinical significance-based binning method, achieved the highest AUC for all six models and the highest F1-Score for SVM, Random Forest, and Transformer, whereas its F1-Score fell below that of the raw dataset for KNN, MLP, and CNN. Notably, the Random Forest model, which exhibited the best performance, showed a 10.5% improvement in F1-Score. Conversely, the datasets constructed using data-driven binning methods such as K-means clustering and decision tree showed an overall decrease in performance.
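As a quick consistency check on the reported metrics, the F1-Score can be recomputed as the harmonic mean of precision and recall. The sketch below does this for three dataset 1-4 rows copied from Table 5; small differences are expected because the table values are rounded to one decimal place.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-Score) for dataset 1-4, copied from Table 5.
rows = {
    "SVM": (88.0, 77.9, 82.6),
    "Random Forest": (91.4, 85.5, 88.4),
    "Transformer": (79.3, 84.7, 81.9),
}
recomputed = {model: f1_score(p, r) for model, (p, r, _) in rows.items()}
for model, (_, _, reported) in rows.items():
    print(model, round(recomputed[model], 2), reported)
```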

The experimental results for performance when using only variables measurable by wearable devices are presented in Table 7. Based on the results shown in Table 7, we also analyzed the improvements of the datasets constructed using various binning methods compared to the raw dataset in Table 8 in terms of F1-Score and AUC. In experiments targeting only variables measurable by wearable devices, dataset 2-4, preprocessed using a clinical significance-based binning method, showed the best performance based on both the F1-Score and AUC across all machine learning models. Particularly, the Random Forest and Transformer models, which had the best performance, showed improvements of 11.8% and 13.2% in F1-Score, respectively.

When comprehensively comparing all the datasets, the datasets constructed using the clinical significance-based binning method generally provided superior performance not only compared to the raw dataset but also to other data-driven binning methods. Based on the F1-Score, there was an average performance improvement of 2.1% in the entire hospital clinical dataset and 9.7% in the wearable device dataset in comparison to the raw dataset without using any binning methods. Particularly, the Random Forest model, which showed the best performance, exhibited improvements of 10.5% and 11.8% in the entire dataset and the wearable device dataset, respectively, demonstrating the effectiveness of the clinical significance-based binning preprocessing method. Interestingly, the performance evaluation results of the wearable device dataset were generally superior to both the raw and binning datasets of the entire hospital clinical dataset.
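The improvement figures quoted above are relative changes with respect to the raw dataset. The sketch below recomputes them from the F1-Scores in Tables 5 and 7. Treating the average improvement as the relative change between the mean F1-Scores of the six models is an assumption; it happens to reproduce the reported 2.1% and 9.7%.

```python
from statistics import mean


def relative_improvement(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100


# F1-Scores from Tables 5 and 7, ordered KNN, SVM, Random Forest, MLP, CNN, Transformer.
f1 = {
    "1-1": [70.9, 76.6, 80.0, 78.2, 73.5, 77.5],
    "1-4": [66.8, 82.6, 88.4, 76.0, 70.7, 81.9],
    "2-1": [77.1, 74.3, 81.6, 79.8, 78.2, 78.8],
    "2-4": [81.3, 81.8, 91.2, 86.8, 85.0, 89.2],
}
RF = 2  # position of Random Forest in each list

rf_all = round(relative_improvement(f1["1-1"][RF], f1["1-4"][RF]), 1)
rf_wearable = round(relative_improvement(f1["2-1"][RF], f1["2-4"][RF]), 1)
avg_all = round(relative_improvement(mean(f1["1-1"]), mean(f1["1-4"])), 1)
avg_wearable = round(relative_improvement(mean(f1["2-1"]), mean(f1["2-4"])), 1)
print(rf_all, rf_wearable, avg_all, avg_wearable)  # 10.5 11.8 2.1 9.7
```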

4. Conclusions

This study compared and analyzed the performance of heat illness prediction models using raw datasets without any binning methods, clinical significance-based binning datasets, and other data-driven binning methods such as K-Means clustering and decision tree. Utilizing emergency room patient data collected by the Japanese Association for Acute Medicine (JAAM) [10], two types of datasets were constructed: one with all the variables measurable in hospitals and another with variables measurable via wearable devices. For each dataset, a raw dataset using continuous variables as they are, a dataset preprocessed with the clinical significance-based binning method, and two other datasets with K-Means clustering-based and decision tree-based binning methods were created, resulting in a total of eight datasets. When trained and evaluated using six representative machine learning algorithms—KNN, SVM, Random Forest, MLP, CNN, and Transformer—the datasets constructed using the clinical significance-based binning method generally outperformed the raw datasets and all the other data-driven binning methods. Notably, there was a significant performance improvement in the Random Forest and Transformer models, which already showed good performance with the raw datasets, indicating that clinical significance-based binning contributes greatly not only to interpretability but also to actual classification performance improvement.

Additionally, experiments conducted by separating the entire hospital-measurable clinical data from the wearable device-measurable dataset confirmed that sufficient predictive performance can be achieved using only variables measurable by wearable devices. This demonstrates the high effectiveness of the clinical significance-based binning preprocessing method proposed in this study when implementing an early warning system for heat illnesses using wearable devices in actual industrial settings.

Author Contributions

Conceptualization, J.S. and H.J.; methodology, J.S. and H.J.; software, J.S.; validation, J.S. and H.J.; data curation, J.S.; writing—original draft preparation, J.S. and H.J.; writing—review and editing, H.J.; supervision, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education (MOE) and the Jeollanamdo, Republic of Korea (2025-RISE-14-001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available on the page of a related research article [10], https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0197032 (accessed on 3 October 2025), at https://doi.org/10.1371/journal.pone.0197032.s001 (accessed on 3 October 2025).

Acknowledgments

This research was supported by the Regional Innovation System & Education (RISE) program through the Jeollanamdo RISE center, funded by the Ministry of Education (MOE) and the Jeollanamdo, Republic of Korea (2025-RISE-14-001).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Korea Disease Control and Prevention Agency. Status of the Emergency Room Surveillance System for Heat-Related Illnesses. 2025. Available online: https://www.kdca.go.kr/board/board.es?mid=a20501010000&bid=0015&list_no=728181&act=view (accessed on 3 October 2025).
2. Castineira, D.; Schlosser, K.R.; Geva, A.; Rahmani, A.R.; Fiore, G.; Walsh, B.K.; Smallwood, C.D.; Arnold, J.H.; Santillana, M. Adding continuous vital sign information to static clinical data improves the prediction of length of stay after intubation: A data-driven machine learning approach. Respir. Care 2020, 65, 1367–1377.
3. Mayampurath, A.; Jani, P.; Dai, Y.; Gibbons, R.; Edelson, D.; Churpek, M.M. A vital sign-based model to predict clinical deterioration in hospitalized children. Pediatr. Crit. Care Med. 2020, 21, 820–826.
4. Sundrani, S.; Chen, J.; Jin, B.T.; Abad, Z.S.H.; Rajpurkar, P.; Kim, D. Predicting patient decompensation from continuous physiologic monitoring in the emergency department. NPJ Digit. Med. 2023, 6, 60.
5. Candel, B.G.; Duijzer, R.; Gaakeer, M.I.; Ter Avest, E.; Sir, Ö. The association between vital signs and clinical outcomes in emergency department patients of different age categories. Emerg. Med. J. 2022, 39, 903–911.
6. van Rossum, M.C.; Vlaskamp, L.B.; Posthuma, L.M.; Visscher, M.J.; Breteler, M.J.; Hermens, H.J.; Kalkman, C.J.; Preckel, B. Adaptive threshold-based alarm strategies for continuous vital signs monitoring. J. Clin. Monit. Comput. 2022, 36, 407–417.
7. Haahr-Raunkjaer, C.; Mølgaard, J.; Elvekjaer, M.; Rasmussen, S.M.; Achiam, M.P.; Jorgensen, L.N.; Søgaard, M.I.; Grønbaek, K.K.; Oxbøll, A.B.; Sørensen, H.B.; et al. Continuous monitoring of vital sign abnormalities; association to clinical complications in 500 postoperative patients. Acta Anaesthesiol. Scand. 2022, 66, 552–562.
8. Korea Disease Control and Prevention Agency. Korea National Health Information Portal: Classification of Blood Pressure According to Hypertension Treatment Guidelines. Available online: https://health.kdca.go.kr/healthinfo/biz/health/gnrlzHealthInfo/gnrlzHealthInfo/gnrlzHealthInfoView.do?cntnts_sn=5300 (accessed on 3 October 2025).
9. Song, K.A.; Choi, D.W.; Kang, M.S.; Kang, H.J.; Gong, K.R.; Kwon, M.J.; Kwon, Y.S.; Kwon, J.O.; Kim, K.M.; Kim, D.Y.; et al. Fundamentals of Nursing I, 9th ed.; Soomoonsa: Paju, Republic of Korea, 2021; p. 338.
10. Hayashida, K.; Kondo, Y.; Hifumi, T.; Shimazaki, J.; Oda, Y.; Shiraishi, S.; Fukuda, T.; Sasaki, J.; Shimizu, K. A novel early risk assessment tool for detecting clinical outcomes in patients with heat-related illness (J-ERATO score): Development and validation in independent cohorts in Japan. PLoS ONE 2018, 13, e0197032.
11. Hirano, Y.; Kondo, Y.; Hifumi, T.; Yokobori, S.; Kanda, J.; Shimazaki, J.; Hayashida, K.; Moriya, T.; Yagi, M.; Takauji, S.; et al. Machine learning-based mortality prediction model for heat-related illness. Sci. Rep. 2021, 11, 9501.
12. Kuo, W.Y.; Huang, C.C.; Liu, C.F.; Sung, M.I.; Hsu, C.C.; Lin, H.J.; Su, S.B.; Guo, H.R. Utilizing machine learning for predicting mortality in patients with heat-related illness who visited the emergency department. Int. J. Med. Inform. 2025, 201, 105951.
13. Shakerian, S.; Habibnezhad, M.; Ojha, A.; Lee, G.; Liu, Y.; Jebelli, H.; Lee, S. Assessing occupational risk of heat stress at construction: A worker-centric wearable sensor-based approach. Saf. Sci. 2021, 142, 105395.
14. Jang, J.; Lee, K.H.; Joo, S.; Kwon, O.; Yi, H.; Lee, D. Smart Helmet for Vital Sign-Based Heatstroke Detection Using Support Vector Machine. J. Sens. Sci. Technol. 2022, 31, 433–440.
15. Shimazaki, T.; Anzai, D.; Watanabe, K.; Nakajima, A.; Fukuda, M.; Ata, S. Heat stroke prevention in hot specific occupational environment enhanced by supervised machine learning with personalized vital signs. Sensors 2022, 22, 395.
16. Yaldiz, C.O.; Buller, M.J.; Richardson, K.L.; An, S.; Lin, D.J.; Satish, A. Early prediction of impending Exertional heat stroke with Wearable Multimodal sensing and anomaly detection. IEEE J. Biomed. Health Inform. 2023, 27, 5803–5814.
17. Khanmohammadi, S.; Adibeig, N.; Shanehbandy, S. An improved overlapping k-means clustering method for medical applications. Expert Syst. Appl. 2017, 67, 12–18.
18. Yang, Z.; Zou, W.; Liu, H.; Sharma, R.P.; Zhang, M.; Hu, Z. The Effect of Soil and Topography Factors on Larix gmelinii var. Principis-rupprechtii Forest Mortality and Capability of Decision Tree Binning Method and Generalized Linear Models in Predicting Tree Mortality. Forests 2024, 15, 2060.
19. Takegawa, R.; Kanda, J.; Yaguchi, A.; Yokobori, S.; Hayashida, K. A prehospital risk assessment tool predicts clinical outcomes in hospitalized patients with heat-related illness: A Japanese nationwide prospective observational study. Sci. Rep. 2023, 13, 1189.
20. Japanese Association for Acute Medicine, Heatstroke Surveillance Committee. Heat related illness in Japan: The final report of Heatstroke STUDY 2012. Jpn. J. Acute Med. 2012, 23, 211–229.
21. Japanese Association for Acute Medicine, Heatstroke Surveillance Committee. Characteristics of elderly heat illness patients in Japan—Analysis from Heatstroke STUDY 2010. Jpn. J. Acute Med. 2011, 22, 331–340.
22. Lin, W.C.; Tsai, C.F. Missing value imputation: A review and analysis of the literature (2006–2017). Artif. Intell. Rev. 2020, 53, 1487–1509.
23. McArdle, W.D.; Katch, F.I.; Katch, V.L. Essentials of Exercise Physiology, 5th ed.; Wolters Kluwer Health: Philadelphia, PA, USA, 2015.
24. Korean Neurological Association. Neurology, 5th ed.; Gunja Publishing: Seoul, Republic of Korea, 2017.
25. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
26. Shrivastava, A.; Rameshan, R.; Agnihotri, S. Latent space characterization of autoencoder variants. arXiv 2024, arXiv:2412.04755.
27. Beaulieu-Jones, B.K.; Moore, J.H. Missing data imputation in the electronic health record using deeply learned autoencoders. In Proceedings of the Pacific Symposium on Biocomputing, Kohala Coast, HI, USA, 3–7 January 2017; pp. 207–218.
28. Gondara, L.; Wang, K. Mida: Multiple imputation using denoising autoencoders. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, VIC, Australia, 3–6 June 2018; pp. 260–272.
29. Darmawan, H.; Yuliana, M.; Hadi, M. GRU and XGBoost Performance with Hyperparameter Tuning Using GridSearchCV and Bayesian Optimization on an IoT-Based Weather Prediction System. Int. J. Adv. Sci. Eng. Inf. Technol. 2023, 13, 851–862.
30. Tompra, K.V.; Papageorgiou, G.; Tjortjis, C. Strategic machine learning optimization for cardiovascular disease prediction and high-risk patient identification. Algorithms 2024, 17, 178.

Figure 1. Overall process of experiments.

Figure 2. Distributions of missing value ratio (row-wise) before and after the removal of rows.

Table 1. The main features of the JAAM-HS dataset.

Category	Features	Description
Dependent Features	Admission	Hospital admission status (0: not admitted, 1: admitted)
Demographic Information	Age, Sex	Age and gender of patients
Environmental Information	Location, Weather	Location and weather information of heat illness occurrence
Underlying Diseases	HT, DM, Heart Disease, Dementia, etc.	Underlying diseases of patients, such as hypertension (HT), diabetes (DM), heart disease, and dementia
Pre-admission Vital Signs	PreSBP, PreHR, PreBT, PreGCS, etc.	Vital signs such as systolic blood pressure (PreSBP), heart rate (PreHR), body temperature (PreBT), and consciousness level (PreGCS) before hospital admission
Hospital-measured Vital Data	SBP, HR, BT, WBC, Cre, AST, CK, etc.	Systolic blood pressure (SBP), heart rate (HR), body temperature (BT); function indicators of blood, liver and kidney, such as WBC, Cre, AST, CK, etc.

Table 2. Dataset Configuration.

Dataset	Description
Dataset 1-1	Raw dataset based on all variables
Dataset 1-2	Binning dataset based on all variables using K-Means clustering
Dataset 1-3	Binning dataset based on all variables using decision tree
Dataset 1-4	Binning dataset based on all variables using the clinical significance-based binning method
Dataset 2-1	Raw dataset based on wearable device variables
Dataset 2-2	Binning dataset based on wearable device variables using K-Means clustering
Dataset 2-3	Binning dataset based on wearable device variables using decision tree
Dataset 2-4	Binning dataset based on wearable device variables using the clinical significance-based binning method

Table 4. The parameter search ranges and optimal values for each model, per dataset.

Model	Parameter	Search Range	Optimal Value (Dataset 1)	Optimal Value (Dataset 2)	Optimal Value (Dataset 3)	Optimal Value (Dataset 4)
KNN	Number of neighbors (k)	3, 5, 7, 9, 11	11	11	11	11
	Distance metrics	Euclidean, Manhattan	Manhattan	Manhattan	Manhattan	Manhattan
SVM	Kernel	Linear, rbf	rbf	rbf	rbf	rbf
	Regularization Parameter	0.1, 1, 10, 100	100	1	1	10
	Gamma	0.001, 0.01, 0.1, 1	0.001	0.1	0.1	0.1
Random Forest	Number of trees	100, 200, 500	500	200	500	500
	Maximum depth	None, 10, 20, 30	20	None	20	10
	Minimum samples split	2, 5, 10	2	2	2	2
	Minimum samples of leaf nodes	1, 2, 4	1	1	1	1
MLP	Number of hidden layers	2, 3, 4, 5, 6	2	3	3	5
	Learning rate	1 × 10⁻⁴, 5 × 10⁻⁴, 1 × 10⁻³, 2 × 10⁻³, 5 × 10⁻³	1 × 10⁻³	1 × 10⁻⁴	5 × 10⁻³	2 × 10⁻³
	Batch size	16, 32, 64, 128	128	128	16	16
CNN	Number of filters	32, 64, 128	128	128	64	64
	Kernel size	3 × 3, 5 × 5	5 × 5	5 × 5	3 × 3	3 × 3
	Learning rate	1 × 10⁻⁴, 1 × 10⁻³, 1 × 10⁻²	1 × 10⁻³	1 × 10⁻³	1 × 10⁻³	1 × 10⁻²
	Batch size	32, 64	64	64	64	32
	Hidden dimension	64, 128, 256	256	64	128	64
Transformer	Number of attention heads	2, 4, 8	4	2	2	4
	Number of encoder layers	2, 4, 6	2	6	4	4
	Learning rate	1 × 10⁻⁴, 5 × 10⁻⁴, 1 × 10⁻³, 2 × 10⁻³, 5 × 10⁻³	1 × 10⁻³	1 × 10⁻³	1 × 10⁻³	1 × 10⁻³
	Dropout	0.1, 0.3	0.1	0.1	0.3	0.1

Table 5. Experimental results on the raw and binning datasets of all variables.

Model	Dataset	Accuracy	Precision	Recall	F1-Score	AUC
KNN	Dataset 1-1	81.4	85.2	60.8	70.9	89.6
	Dataset 1-2	80.6	84.9	58.8	69.5	87.9
	Dataset 1-3	81.7	84.9	61.9	71.6	87.8
	Dataset 1-4	80.2	87.6	54.0	66.8	90.2
SVM	Dataset 1-1	84.1	85.1	69.6	76.6	89.5
	Dataset 1-2	83.2	83.3	68.9	75.4	89.0
	Dataset 1-3	82.5	81.1	69.1	74.6	89.6
	Dataset 1-4	87.9	88.0	77.9	82.6	94.4
Random Forest	Dataset 1-1	85.5	82.5	77.6	80.0	92.8
	Dataset 1-2	86.3	84.8	77.3	80.9	92.5
	Dataset 1-3	85.2	82.9	75.9	79.2	92.8
	Dataset 1-4	91.7	91.4	85.5	88.4	96.3
MLP	Dataset 1-1	84.4	81.9	74.7	78.2	88.9
	Dataset 1-2	80.6	74.9	72.7	73.8	88.3
	Dataset 1-3	84.4	83.1	72.9	77.7	89.2
	Dataset 1-4	83.3	81.2	71.5	76.0	90.6
CNN	Dataset 1-1	81.6	82.2	66.4	73.5	88.3
	Dataset 1-2	82.2	82.4	66.8	73.8	88.1
	Dataset 1-3	83.5	82.6	70.3	76.0	88.7
	Dataset 1-4	81.7	87.0	59.6	70.7	90.3
Transformer	Dataset 1-1	84.7	86.1	70.5	77.5	92.1
	Dataset 1-2	84.1	74.6	87.4	80.5	91.7
	Dataset 1-3	82.1	81.1	67.4	73.6	91.0
	Dataset 1-4	86.1	79.3	84.7	81.9	93.4

Table 6. Relative improvement in F1-score and AUC of each binned dataset over the raw dataset (Dataset 1-1).

Model	Dataset 1-2		Dataset 1-3		Dataset 1-4
	F1-Score	AUC	F1-Score	AUC	F1-Score	AUC
KNN	−2.0%	−1.9%	1.0%	−2.0%	−5.8%	0.7%
SVM	−1.6%	−0.6%	−2.6%	0.1%	7.8%	5.5%
Random Forest	1.1%	−0.3%	−1.0%	0.0%	10.5%	3.8%
MLP	−5.6%	−0.7%	−0.6%	0.3%	−2.8%	1.9%
CNN	0.4%	−0.2%	3.4%	0.5%	−3.8%	2.3%
Transformer	3.9%	−0.4%	−5.0%	−1.2%	5.7%	1.4%
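
The entries in Table 6 are consistent with a relative percentage change computed from the F1-score and AUC columns of Table 5. The sketch below shows that calculation for the Random Forest rows; the formula is an assumption inferred from the tabulated values rather than one stated in this section, and the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, value):
    """Percentage change of `value` relative to `baseline`, rounded to one decimal."""
    return round((value - baseline) / baseline * 100, 1)


# (F1-score, AUC) pairs for Random Forest, copied from Table 5.
rf_raw = (80.0, 92.8)     # Dataset 1-1 (raw)
rf_binned = (88.4, 96.3)  # Dataset 1-4

f1_gain = relative_improvement(rf_raw[0], rf_binned[0])
auc_gain = relative_improvement(rf_raw[1], rf_binned[1])

# Matches the Random Forest / Dataset 1-4 entries of Table 6 (10.5% and 3.8%).
print(f1_gain, auc_gain)
```

Because the change is relative to the raw-dataset baseline, a model with a lower baseline score shows a larger percentage for the same absolute gain, which is worth keeping in mind when comparing rows.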

Table 7. Experimental results (%) on the raw and binned datasets using only wearable-device variables.

Model	Dataset	Accuracy	Precision	Recall	F1-Score	AUC
KNN	Dataset 2-1	84.3	84.9	70.6	77.1	90.2
	Dataset 2-2	83.6	82.3	72.0	76.8	90.0
	Dataset 2-3	84.3	84.2	71.4	77.3	91.2
	Dataset 2-4	86.6	85.3	77.7	81.3	93.9
SVM	Dataset 2-1	82.4	81.8	68.1	74.3	83.1
	Dataset 2-2	85.0	84.3	74.1	78.8	88.6
	Dataset 2-3	83.0	81.9	70.2	75.6	86.7
	Dataset 2-4	86.6	83.4	80.3	81.8	91.7
Random Forest	Dataset 2-1	86.8	86.1	77.5	81.6	94.1
	Dataset 2-2	87.2	86.9	77.8	82.1	93.6
	Dataset 2-3	89.1	87.2	83.2	85.2	95.4
	Dataset 2-4	93.4	90.8	91.6	91.2	98.1
MLP	Dataset 2-1	85.7	85.3	75.0	79.8	92.1
	Dataset 2-2	85.2	85.7	72.8	78.7	88.7
	Dataset 2-3	83.9	85.4	68.9	76.3	93.0
	Dataset 2-4	89.8	83.9	89.9	86.8	95.2
CNN	Dataset 2-1	84.7	84.1	73.1	78.2	91.4
	Dataset 2-2	86.0	87.9	72.8	79.6	92.9
	Dataset 2-3	84.7	84.1	73.1	78.2	92.4
	Dataset 2-4	88.5	83.1	86.9	85.0	95.7
Transformer	Dataset 2-1	84.7	82.2	75.6	78.8	92.2
	Dataset 2-2	85.8	82.5	79.1	80.8	93.3
	Dataset 2-3	84.7	89.0	67.7	76.9	92.4
	Dataset 2-4	92.0	89.7	88.6	89.2	97.6

Table 8. Relative improvement in F1-score and AUC of each binned dataset over the raw dataset (Dataset 2-1).

Model	Dataset 2-2		Dataset 2-3		Dataset 2-4
	F1-Score	AUC	F1-Score	AUC	F1-Score	AUC
KNN	−0.4%	−0.2%	0.3%	1.1%	5.4%	4.1%
SVM	6.1%	6.6%	1.7%	4.3%	10.1%	10.3%
Random Forest	0.6%	−0.5%	4.4%	1.4%	11.8%	4.3%
MLP	−1.4%	−3.7%	−4.4%	1.0%	8.8%	3.4%
CNN	1.8%	1.6%	0.0%	1.1%	8.7%	4.7%
Transformer	2.5%	1.2%	−2.4%	0.2%	13.2%	5.9%

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
