A Study on ML-Based Sleep Score Model Using Lifelog Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation and Preprocessing
- Data with a sleep stage value of 0 among the generated sleep stage features.
- Days with less than 3 h of total sleep, since sleep stages are not recorded when sleep is shorter than 3 h.
- Negative SRI (Sleep Regularity Index) values [23].
- Set the sleep habit score derived by the logistic regression model as the target and set the related sleep variable as the explanatory variable. (This is detailed in Section 2.2.1.)
- Impute missing values with the KNN (K-nearest neighbors) machine learning algorithm [24,25]. To derive the optimal k (number of neighbors), support vector regression [26] and random forest [27] models were used to evaluate each candidate k; k was set to the average number of neighbors that yielded the best performance across these evaluations.
- Fill in missing values using the KNN method with the derived optimal k. Specifically, the data set was split into training and test data at a ratio of 8:2, k was varied from 2 to 15, and performance was measured for each value. The final performance for each k was an ensemble of the support vector and random forest results: each model's result was multiplied by 0.5 and the two were summed. Performance was best at k = 3, so imputation was performed with that value. After daily data aggregation, sleep feature generation, outlier processing, and missing-value imputation, the analysis data comprise 16,053 rows and 67 variables, including the user identification ID, date, and sleep characteristics.
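The imputation step above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the distance rule (Euclidean over jointly observed features, rescaled to the full width) is an assumption, and `score_svr` and `score_rf` are hypothetical callables standing in for the support vector and random forest evaluations on the 8:2 split, which are not reproduced here.

```python
import math

def nan_euclidean(a, b):
    """Distance over features observed in both rows, rescaled to the full width."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=3):
    """Fill each missing entry (None) with the mean of that feature
    over the k nearest rows in which the feature is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def select_k(score_svr, score_rf, k_range=range(2, 16), weight=0.5):
    """Pick the k that maximises the 0.5/0.5 weighted sum of two evaluator scores.
    score_svr and score_rf are hypothetical callables: k -> performance."""
    return max(k_range, key=lambda k: weight * score_svr(k) + (1 - weight) * score_rf(k))
```

For example, `knn_impute([[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 5.0], [10.0, 10.0, 50.0]], k=2)` fills the missing cell from the two closest rows and leaves the input list unchanged.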
2.2. Primary Habit Score: Good/Bad Sleep State
2.2.1. Setting Explanatory Variables (Features) and Result Variables (Target)
- Continuous variables: total sleep variability, SRI (Sleep Regularity Index) (2 days, 3 days, 4 days, 5 days, 6 days, 7 days), number and time of naps, sleep midpoint variability, day-of-week information, daily sleep start and end information, information on sleep stages within the first 90 min of sleep, information on steps 2 h before the first start of sleep.
- Categorical variables: sleep onset between 10 p.m. and 12 a.m. FLAG variable, sleep onset variability within 1 h FLAG variable, total sleep time variability within 1 h FLAG variable.
- The first step, fine classing [30,31], is carried out to improve consistency and explanatory power. In this step, representative variables are selected by considering the correlation among the explanatory variables and their information value [31], and bin intervals are derived for each variable.
- The second step is coarse classing [30,31]: starting from the categorization of the first step, new categories are derived by inspecting the data. Specifically, to obtain a linear relationship with the occurrence of good sleep, adjacent categories with similar weight-of-evidence (WoE) values [32,33] are merged so that the WoE value increases or decreases monotonically [34]. In this way, the counts of good-sleep occurrences and non-occurrences in each category are adjusted, and categories are merged based on the WoE value.
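The two classing steps rest on the weight of evidence and the information value of each bin. The sketch below is an illustration under stated assumptions: WoE is taken as ln(share of good sleep / share of bad sleep), empty cells are floored at 0.5 to avoid log(0), and the greedy merge rule (join the adjacent pair with the closest WoE) is one common heuristic, not necessarily the procedure the authors used.

```python
import math

def woe_table(bins):
    """bins: ordered list of (good_count, bad_count). Returns (WoE per bin, IV)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        # Floor empty cells at 0.5 so the logarithm stays finite (assumption).
        p_good = max(good, 0.5) / total_good
        p_bad = max(bad, 0.5) / total_bad
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv

def is_monotonic(values):
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def coarse_class(bins):
    """Merge the adjacent pair with the most similar WoE until WoE is monotonic."""
    bins = [tuple(b) for b in bins]
    while len(bins) > 2:
        woes, _ = woe_table(bins)
        if is_monotonic(woes):
            break
        i = min(range(len(bins) - 1), key=lambda n: abs(woes[n] - woes[n + 1]))
        merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
        bins[i:i + 2] = [merged]
    return bins
```

With the made-up counts `[(5, 95), (20, 80), (15, 85), (60, 40)]`, the second and third bins have the closest WoE and break monotonicity, so they are merged and three bins remain.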
2.2.2. Defining Good Sleep Habit Labels Using a Logistic Regression Model for the Primary Sleep Habit Score
2.2.3. Scoring for the Primary Sleep Habit Score
2.2.4. Primary Sleep Habit Score Results
2.3. Second Step: Intermediate Sleep Score
2.3.1. Data Preparation: Training and Test Data Set
2.3.2. Modeling: Multi-Stacking Ensemble Models Based on Machine Learning and Deep Learning
3. Results
Second Step: Intermediate Sleep Score
- For sleep midpoint between 02:00 a.m. and 04:00 a.m. on weekdays, good sleep was 47.12%, bad sleep 12.25%, and intermediate sleep 21.39%.
- For weekend sleep time between 10:00 p.m. and 12:00 a.m., good sleep was 43.25%, bad sleep 22.50%, and intermediate sleep 29.91%.
- For steps within 2 h before sleep, good sleep averaged 813 steps, bad sleep 35, and intermediate sleep 156.
- For the SRI (Sleep Regularity Index) over 2 days, mean values were 87.21 for good sleep, 73.26 for bad sleep, and 80.58 for intermediate sleep.
- The higher the SRI value, the higher the probability of a good sleep state.
- The higher the step count in the 2 h before sleep, the higher the probability of a good sleep state.
- The greater the weekly total sleep time variability, the higher the probability of a bad sleep state.
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Feature | Meaning | Feature | Meaning |
---|---|---|---|
BED_TIME_VAR_FLAG_1 | Sleep onset fluctuation within 1 h (FLAG) | FIRST_LIGHT_MIN | Total time of light sleep phases in earlier (90 min) sleep cycles |
BED_TIME_VAR_FLAG_2 | Sleep onset fluctuation within 2 h (FLAG) | FIRST_DEEP_MIN | Total time of deep sleep phases in earlier (90 min) sleep cycles |
BED_TIME_10_TO_12_FLAG | Sleep onset time between 10 p.m. and 12 a.m. status (FLAG) | FIRST_REM_MIN | Total time of REM sleep phases in earlier (90 min) sleep cycles |
TST_VAR | Total sleep time variability per day | STAGE_SUM_MIN | Total time of all sleep phases in earlier (90 min) sleep cycles |
SRI_2 | Sleep regularity index observed on 2 days | FIRST_ASR | Percentage of awake sleep phases in earlier (90 min) sleep cycles |
SRI_3 | Sleep regularity index observed on 3 days | FIRST_LSR | Percentage of light sleep phases in earlier (90 min) sleep cycles |
SRI_4 | Sleep regularity index observed on 4 days | FIRST_DSR | Percentage of deep sleep phases in earlier (90 min) sleep cycles |
SRI_5 | Sleep regularity index observed on 5 days | FIRST_RSR | Percentage of REM sleep phases in earlier (90 min) sleep cycles |
SRI_6 | Sleep regularity index observed on 6 days | STEP_INFO_BEFORE_SLEEP_2 | Number of steps taken 2 h before sleep |
SRI_7 | Sleep regularity index observed on 7 days | WEEKDAY | Week information expressed as an integer (0: Sunday) |
SLEEP_START_MIN | Daily sleep onset time (minute info) per day | WEEKEND_FLAG | Weekend status (flag) |
SLEEP_END_MIN | Daily sleep offset time (minute info) per day | HOLIDAY_FLAG | Holiday status (flag) |
SLEEP_MIDPOINT | Midpoint between the onset and offset of sleep |
Feature | Interval Value | Score |
---|---|---|
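DECIMAL_END_HOUR_MINUTE | ~7.5 | 5 |
DECIMAL_END_HOUR_MINUTE | 7.5~ | −4 |
DECIMAL_START_HOUR_MINUTE | ~24.5 | 4 |
DECIMAL_START_HOUR_MINUTE | 24.5~ | −116 |
WEEKLY_MIDPOINT_VAR | 0~4 | 118 |
WEEKLY_MIDPOINT_VAR | 4~5 | −94 |
WEEKLY_MIDPOINT_VAR | 5~5.5 | −59 |
WEEKLY_MIDPOINT_VAR | 5.5~ | −155 |
WEEK_INFO | ~2 | 37 |
WEEK_INFO | 2~6.5 | −10 |
WEEK_INFO | 6.5~11.5 | −49 |
WEEK_INFO | 11.5~16.5 | −12 |
WEEK_INFO | 16.5~24 | −47 |
WEEK_INFO | 24~ | −96 |
HOLI_INFO | ~2 | 0 |
HOLI_INFO | 2~12.5 | 3 |
HOLI_INFO | 12.5~23.5 | 1 |
HOLI_INFO | 23.5~24.5 | −4 |
HOLI_INFO | 24.5~ | −2 |
FIRST_LIGHT_MINUTE | 0~29 | 31 |
FIRST_LIGHT_MINUTE | 29~34 | −28 |
FIRST_LIGHT_MINUTE | 34~37 | 23 |
FIRST_LIGHT_MINUTE | 37~39 | −38 |
FIRST_LIGHT_MINUTE | 39~46 | −5 |
FIRST_LIGHT_MINUTE | 46~ | 9 |
FIRST_AWAKE_MINUTE | ~1 | 10 |
FIRST_AWAKE_MINUTE | 1~3 | 9 |
FIRST_AWAKE_MINUTE | 3~8 | 0 |
FIRST_AWAKE_MINUTE | 8~17 | −1 |
FIRST_AWAKE_MINUTE | 17~ | −12 |
STAGE_SUM_MINUTE | 0~31 | 30 |
STAGE_SUM_MINUTE | 31~48 | −11 |
STAGE_SUM_MINUTE | 48~56 | 2 |
STAGE_SUM_MINUTE | 56~58 | 25 |
STAGE_SUM_MINUTE | 58~60 | −23 |
STAGE_SUM_MINUTE | 60~ | 15 |
FIRST_DSR | ~0.005 | 13 |
FIRST_DSR | 0.005~0.05 | −47 |
FIRST_DSR | 0.05~0.145 | −13 |
FIRST_DSR | 0.145~0.275 | 33 |
FIRST_DSR | 0.275~ | −50 |
FIRST_ASR | ~0.01 | 22 |
FIRST_ASR | 0.01~0.04 | 21 |
FIRST_ASR | 0.04~0.13 | 7 |
FIRST_ASR | 0.13~0.2 | −5 |
FIRST_ASR | 0.2~ | −34 |
FIRST_LSR | ~0.5 | 50 |
FIRST_LSR | 0.5~0.64 | −21 |
FIRST_LSR | 0.64~0.8 | 10 |
FIRST_LSR | 0.8~0.86 | −4 |
FIRST_LSR | 0.86~0.9 | −32 |
FIRST_LSR | 0.9~ | 16 |
FIRST_RSR | ~0.01 | −6 |
FIRST_RSR | 0.01~0.05 | 28 |
FIRST_RSR | 0.05~0.18 | 55 |
FIRST_RSR | 0.18~ | −10 |
TOTAL_SLEEP_TIME_VAR | ~0.4 | 0 |
TOTAL_SLEEP_TIME_VAR | 0.4~0.6 | 34 |
TOTAL_SLEEP_TIME_VAR | 0.6~0.8 | −16 |
TOTAL_SLEEP_TIME_VAR | 0.8~2.2 | 1 |
TOTAL_SLEEP_TIME_VAR | 2.2~2.6 | 31 |
TOTAL_SLEEP_TIME_VAR | 2.6~3.6 | −26 |
TOTAL_SLEEP_TIME_VAR | 3.6~ | 2 |
STEP_INFO_2 | ~50 | −31 |
STEP_INFO_2 | 50~1500 | 184 |
STEP_INFO_2 | 1500~ | 145 |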
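A scorecard of this kind is applied by looking up the interval that contains each feature value and summing the interval scores. The sketch below uses two features copied from the table above; the boundary convention (a value equal to a cut point falls in the upper interval), the omission of any base score, and the names `SCORECARD`, `feature_score`, and `total_score` are assumptions for illustration only.

```python
from bisect import bisect_right

# feature -> (cut points, score per interval), taken from the score table
SCORECARD = {
    "DECIMAL_END_HOUR_MINUTE": ([7.5], [5, -4]),
    "STEP_INFO_2": ([50, 1500], [-31, 184, 145]),
}

def feature_score(feature, value):
    """Score of the interval that contains value for the given feature."""
    cuts, scores = SCORECARD[feature]
    # bisect_right sends a value equal to a cut point to the upper interval (assumed).
    return scores[bisect_right(cuts, value)]

def total_score(record):
    """Sum of interval scores over all features present in the record."""
    return sum(feature_score(name, value) for name, value in record.items())
```

For a day with a wake-up time of 7.0 h and 800 steps in the 2 h before sleep, `total_score({"DECIMAL_END_HOUR_MINUTE": 7.0, "STEP_INFO_2": 800})` adds 5 and 184.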
Model | Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|---|
LightGBM | reg_alpha | 1.5486 | subsample | 0.5 |
LightGBM | reg_lambda | 4.5005 | learning_rate | 0.008 |
LightGBM | colsample_bytree | 0.7 | max_depth | 10 |
LightGBM | num_leaves | 470 | min_child_samples | 47 |
LightGBM | min_data_per_groups | 100 | n_estimators | 2000 |
XGBoost | lambda | 0.008 | alpha | 3.818 |
XGBoost | colsample_bytree | 0.4 | subsample | 0.7 |
XGBoost | learning_rate | 0.02 | min_child_weight | 39 |
XGBoost | n_estimators | 2000 | max_depth | 7 |
CatBoost | bagging_fraction | 0.7723 | l_leaf_reg | 1.629 |
CatBoost | max_bin | 235 | learning_rate | 0.0155 |
CatBoost | min_data_in_leaf | | n_estimators | 2000 |
CatBoost | max_depth | 7 | task_type | GPU |
Tabnet | max_type | Entmax | n_da | 64 |
Tabnet | n_steps | 2 | gamma | 1 |
Tabnet | n_shared | 3 | lambda_sparse | 9.07 × 10^−5 |
Tabnet | patienceScheduler | 9 | epochs | 15 |
ML Model | F1 Score |
---|---|
XGBoost | 89 |
LightGBM | 87 |
CatBoost | 88 |
Tabnet | 85 |
Stacking Method | 90 |
References
- Lee, H.; Kim, J.; Moon, J.; Jung, S.; Jo, Y.; Kim, B.; Ryu, E.; Bahn, S. A study on the changes in life habits, mental health, and sleep quality of college students due to COVID-19. Work 2022, 73, 777–786.
- Heuse, S.; Grebe, J.L.; Esken, F. Sleep Hygiene Behaviour in Students: An Intended Strategy to Cope with Stress. J. Med. Psychol. 2022, 24, 23–28.
- Freeman, D.; Sheaves, B.; Waite, F.; Harvey, A.G.; Harrison, P.J. Sleep disturbance and psychiatric disorders. Lancet Psychiatry 2020, 7, 628–637.
- Bhaskar, S.; Hemavathy, D.; Prasad, S. Prevalence of chronic insomnia in adult patients and its correlation with medical comorbidities. J. Family Med. Prim. Care 2016, 5, 780–784.
- Hafner, M.; Stepanek, M.; Taylor, J.; Troxel, W.M.; Van Stolk, C. Why sleep matters—The economic costs of insufficient sleep: A cross-country comparative analysis. Rand Health Q. 2017, 6, 11.
- Estrada-Galiñanes, V.; Wac, K. Collecting, exploring and sharing personal data: Why, how and where. Data Sci. 2020, 3, 79–106.
- Nyman, J.; Ekbladh, E.; Björk, M.; Johansson, P.; Sandqvist, J. Feasibility of a new homebased ballistocardiographic tool for sleep-assessment in a real-life context among workers. Work 2022.
- Wei, Q.; Lee, J.H.; Park, H.J. Novel design of smart sleep-lighting system for improving the sleep environment of children. Technol. Health Care 2019, 27, 3–13.
- Smyth, C. The Pittsburgh sleep quality index (PSQI). J. Gerontol. Nurs. 1999, 25, 10.
- Carpenter, J.S.; Andrykowski, M.A. Psychometric evaluation of the Pittsburgh sleep quality index. J. Psychosom. Res. 1998, 45, 5–13.
- Buysse, D.J. Sleep health: Can we define it? Does it matter? Sleep 2014, 37, 9–17.
- Morrissey, B.; Taveras, E.; Allender, S.; Strugnell, C. Sleep and obesity among children: A systematic review of multiple sleep dimensions. Pediatr. Obes. 2020, 15, e12619.
- Moore, P.J.; Adler, N.E.; Williams, D.R.; Jackson, J.S. Socioeconomic status and health: The role of sleep. Psychosom. Med. 2002, 64, 337–344.
- Nishino, S. The Stanford Method for Ultimate Sound Sleep; Sunmark Publishing: Tokyo, Japan, 2017.
- Patel, A.K.; Reddy, V.; Araujo, J.F. Physiology, Sleep Stages; StatPearls [Internet]: Florida, FL, USA, 2021.
- Beattie, Z.; Oyang, Y.; Statan, A.; Ghoreyshi, A.; Pantelopoulos, A.; Russell, A.; Heneghan, C.J.P.M. Estimation of sleep stages in a healthy adult population from optical plethysmography and accelerometer signals. Physiol. Meas. 2017, 38, 1968–1979.
- Slyusarenko, K.; Fedorin, I. Smart alarm based on sleep stages prediction. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2020, 2020, 4286–4289.
- Reed, D.L.; Sacco, W.P. Measuring sleep efficiency: What should the denominator be? J. Clin. Sleep Med. 2016, 12, 263–266.
- Phillips, A.J.; Clerx, W.M.; O’Brien, C.S.; Sano, A.; Barger, L.K.; Picard, R.W.; Czeisler, C.A. Irregular sleep/wake patterns are associated with poorer academic performance and delayed circadian and sleep/wake timing. Sci. Rep. 2017, 7, 3216.
- Lunsford-Avery, J.R.; Engelhard, M.M.; Navar, A.M.; Kollins, S.H. Validation of the sleep regularity index in older adults and associations with cardiometabolic risk. Sci. Rep. 2018, 8, 14158.
- Rosenthal, L.; Roehrs, T.A.; Rosen, A.; Roth, T. Level of sleepiness and total sleep time following various time in bed conditions. Sleep 1993, 16, 226–232.
- Randler, C.; Vollmer, C.; Kalb, N.; Itzek-Greulich, H. Breakpoints of time in bed, midpoint of sleep, and social jetlag from infancy to early adulthood. Sleep Med. 2019, 57, 80–86.
- Cohen, S.; Fulcher, B.D.; Rajaratnam, S.M.; Conduit, R.; Sullivan, J.P.; St Hilaire, M.A.; Phillips, A.J.K.; Loddenkemper, T.; Kothare, S.V.; McConnell, K.; et al. Sleep patterns predictive of daytime challenging behavior in individuals with low-functioning autism. Autism Res. 2018, 11, 391–403.
- Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048.
- Rashid, W.; Gupta, M.K. A Perspective of Missing Value Imputation Approaches. In Advances in Computational Intelligence and Communication Technology; Springer: Singapore, 2021; pp. 307–315.
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. 1998, 13, 18–28.
- Biau, G. Analysis of a random forests model. J. Mach. Learn. Res. 2012, 13, 1063–1095.
- Dong, L.; Martinez, A.J.; Buysse, D.J.; Harvey, A.G. A composite measure of sleep health predicts concurrent mental and physical health outcomes in adolescents prone to eveningness. Sleep Health 2019, 5, 166–174.
- Brindle, R.C.; Yu, L.; Buysse, D.J.; Hall, M.H. Empirical derivation of cutoff values for the sleep health metric and its relationship to cardiometabolic morbidity: Results from the Midlife in the United States (MIDUS) study. Sleep 2019, 42, zsz116.
- Leung, K.; Cheong, F.; Cheong, C.; O’Farrell, S.; Tissington, R. Building a Scorecard in Practice. In Proceedings of the 7th International Conference on Computational Intelligence in Economics and Finance, Taoyuan, Taiwan, 5–7 December 2008.
- Vejkanchana, N.; Kuacharoen, P. Continuous Variable Binning Algorithm to Maximize Information Value Using Genetic Algorithm. In International Conference on Applied Informatics; Springer: Cham, Switzerland, 2019; pp. 158–172.
- Finlay, S. Data Pre-Processing. In Credit Scoring, Response Modelling and Insurance Rating; Palgrave Macmillan: London, UK, 2010; pp. 144–159.
- Zdravevski, E.; Lameski, P.; Kulakov, A. Weight of evidence as a tool for attribute transformation in the preprocessing stage of supervised learning algorithms. IJCNN 2011, 181–188.
- Vanneschi, L.; Horn, D.M.; Castelli, M.; Popovič, A. An artificial intelligence system for predicting customer default in e-commerce. Expert Syst. Appl. 2018, 104, 1–21.
- Dastile, X.; Celik, T.; Potsane, M. Statistical and machine learning models in credit scoring: A systematic literature survey. Appl. Soft Comput. 2020, 91, 106263.
- Peng, C.Y.J.; Lee, K.L.; Ingersoll, G.M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 2002, 96, 3–14.
- Obuchowski, N.A. Receiver operating characteristic curves and their use in radiology. Radiology 2003, 229, 3–8.
- Zeng, G. A comparison study of computational methods of Kolmogorov–Smirnov statistic in credit scoring. Commun. Stat. Simul. Comput. 2017, 46, 7744–7760.
- Abdou, H.A.; Pointon, J. Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intell. Syst. Account. Financ. Manag. 2011, 18, 59–88.
- Woo, H.S.; Lee, S.H.; Cho, H. Building credit scoring models with various types of target variables. J. Korean Data Inf. Sci. Soc. 2013, 24, 85–94.
- Park, I. Developing the osteoporosis risk scorecard model in Korean adult women. J. Health Inform. Stat. 2021, 46, 44–53.
- Han, J.T.; Park, I.S.; Kang, S.B.; Seo, B.G. Developing the High-Risk Drinking Scorecard Model in Korea. Osong Public Health Res. Perspect. 2018, 9, 231–239.
- Siddiqi, N. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring; Wiley & Sons.: Hoboken, NJ, USA, 2012; Volume 3.
- Divina, F.; Gilson, A.; Goméz-Vela, F.; García Torres, M.; Torres, J.F. Stacking ensemble learning for short-term electricity consumption forecasting. Energies 2018, 11, 949.
- Wang, T.; Zhang, K.; Thé, J.; Yu, H. Accurate prediction of band gap of materials using stacking machine learning model. Comput. Mater. Sci. 2022, 201, 110899.
- Chen, T.; Guestrin, C. Xgboost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
- Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
- Arik, S.Ö.; Pfister, T. Tabnet: Attentive interpretable tabular learning. AAAI 2021, 35, 6679–6687.
- Rasifaghihi, N.; Li, S.S.; Haghighat, F. Forecast of urban water consumption under the impact of climate change. Sustain. Cities Soc. 2020, 52, 101848.
- Hans, C. Elastic net regression modeling with the orthant normal prior. JASA 2011, 106, 1383–1393.
- Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
- Gunst, R.F.; Mason, R.L. Biased estimation in regression: An evaluation using mean squared error. JASA 1977, 72, 616–628.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019.
- Halson, S.L.; Johnston, R.D.; Piromalli, L.; Lalor, B.J.; Cormack, S.; Roach, G.D.; Sargent, C. Sleep Regularity and Predictors of Sleep Efficiency and Sleep Duration in Elite Team Sport Athletes. Sport. Med. Open 2022, 8, 79.
- Windred, D.P.; Jones, S.E.; Russell, A.; Burns, A.C.; Chan, P.; Weedon, M.N.; Rutter, M.K.; Olivier, P.; Vetter, C.; Saxena, R.; et al. Objective assessment of sleep regularity in 60 000 UK Biobank participants using an open-source package. Sleep 2021, 44, zsab254.
- Makarem, N.; Zuraikat, F.M.; Aggarwal, B.; Jelic, S.; St-Onge, M.P. Variability in sleep patterns: An emerging risk factor for hypertension. Curr. Hypertens. Rep. 2020, 22, 19.
- Baron, K.G.; Reid, K.J.; Malkani, R.G.; Kang, J.; Zee, P.C. Sleep variability among older adults with insomnia: Associations with sleep quality and cardiometabolic disease risk. Behav. Sleep Med. 2017, 15, 144–157.
- Buman, M.P.; Phillips, B.A.; Youngstedt, S.D.; Kline, C.E.; Hirshkowitz, M. Does nighttime exercise really disturb sleep? Results from the 2013 National Sleep Foundation Sleep in America Poll. Sleep Med. 2014, 15, 755–761.
- Stutz, J.; Eiholzer, R.; Spengler, C.M. Effects of evening exercise on sleep in healthy participants: A systematic review and meta-analysis. Sport. Med. 2019, 49, 269–287.
- Frimpong, E.; Mograss, M.; Zvionow, T.; Dang-Vu, T.T. The effects of evening high-intensity exercise on sleep in healthy adults: A systematic review and meta-analysis. Sleep Med. Rev. 2021, 60, 101535.
- Kim, J.; Lee, J.; Park, M. Identification of Smartwatch-Collected Lifelog Variables Affecting Body Mass Index in Middle-Aged People Using Regression Machine Learning Algorithms and SHapley Additive Explanations. Appl. Sci. 2022, 12, 3819.
- Liang, Z.; Chapa-Martell, M.A. Predicting Medical-Grade Sleep-Wake Classification from Fitbit Data Using Tree-Based Machine Learning. Rep. Number IPSJ SIG Tech. Rep. 2019, 2019, 14.
- Jiang, M.; Liu, J.; Zhang, L.; Liu, C. An improved Stacking framework for stock index prediction by leveraging tree-based ensemble models and deep learning algorithms. Phys. A Stat. Mech. Appl. 2020, 541, 122272.
- Pavlyshenko, B. Using Stacking Approaches for Machine Learning Models. In Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 255–258.
- Yu, W.; Li, S.; Ye, T.; Xu, R.; Song, J.; Guo, Y. Deep ensemble machine learning framework for the estimation of PM 2.5 concentrations. Environ. Health Perspect. 2022, 130, 037004.
Category | Value | Description |
---|---|---|
Quantity of sleep data collected by day | 67,180 rows | Sleep data set collected by day with Samsung Galaxy Watch 4 or 5 |
Quantity of sleep data collected by minute | 2,494,862 rows | Sleep data set collected by minute with Samsung Galaxy Watch 4 or 5 |
Quantity of step (gait) data collected by day | 78,643 rows | Step data set collected by day with Samsung Galaxy Watch 4 or 5 |
Quantity of step data collected by minute | 18,710,423 rows | Step data set collected by minute with Samsung Galaxy Watch 4 or 5 |
Quantity of user information data | 918 rows | User information such as height and age |
Number of users | 714 | |
Period of data collection | 26 November 2020 to 1 January 2022 |
Feature | Meaning | Feature | Meaning |
---|---|---|---|
USER_CODE | User identification code | NAP_FLAG | Daily nap occurrence status |
DATE | Data collection date | NAP_HOUR | Total sleep time from 12 noon to 3 p.m. (less than 3 h) |
SLEEP_EFFICIENCY | Ratio of sleep time excluding awake time to total sleep time | WEEKLY_MEAN_SLEEP_MIDPOINT | The average time of the midpoint of sleep during the weekdays |
DSR | Percentage of deep sleep phases per day | WEEKLY_MEAN_SLEEP_START_TIME | The average time of the sleep onset during the weekdays |
RSR | Percentage of REM sleep phases per day | WEEKEND_MEAN_SLEEP_START_TIME | The average time of the sleep onset during the weekends |
LSR | Percentage of light sleep phases per day | DIFF_WEEK_HOLI | Difference between average weekday onset sleep and average sleep onset on weekends |
ASR | Percentage of awake sleep phases per day | WEEKLY_MEAN_TST | The average time of the total sleep time per day during the weekdays |
TST | Total sleep time per day | DIFF_SLEEP_START_WEEKLY | The difference between the average weekly sleep onset time and daily average sleep onset time |
SLEEP_START_H | Sleep onset time (hours) per day | DIFF_SLEEP_END_WEEKLY | The difference between the average weekly sleep offset time and daily average sleep offset time |
SLEEP_END_H | Daily sleep offset time (hours) per day | WEEKLY_MIDPOINT_VAR | The variation of the midpoint of sleep during the weekdays |
AWAKE_T | Total awake time per day | WEEKLY_TST_VAR | The variation of total sleep time during the weekdays |
DEEP_T | Total time of deep sleep phases per day | GOOD_SLEEP_FLAG | Sleep quality status based on various sleep dimensions |
REM_T | Total time of REM sleep phases per day | GENDER | User’s gender |
LIGHT_T | Total time of light sleep phases per day | AGE | User’s age |
SLEEP_EFFICIENCY_CAT | 85% cutoff criterion flag for Sleep Efficiency Index | AGE_CATEGORY | User’s age category |
BED_TIME_VAR | Sleep onset variability | FIRST_AWAKE_MIN | Total time of awake sleep phases in earlier (90 min) sleep cycles |
Feature | Calculation of Bin Interval Excluding Missing Values | Feature | Calculation of Bin Interval Excluding Missing Values |
---|---|---|---|
SRI_2 (Sleep regularity index observed over 2 days) | [−inf, 52], [52, 72], [72, 86], [86, inf] | SRI_3 (Sleep regularity index observed over 3 days) | [−inf, 56], [56, 70], [70, 82], [82, 90], [90, inf] |
SRI_4 (Sleep regularity index observed over 4 days) | [−inf, 52], [52, 62], [62, 70], [70, 82], [82, inf] | SRI_5 (Sleep regularity index observed over 5 days) | [−inf, 52], [52, 58], [58, 82], [82, 88], [88, inf] |
SRI_6 (Sleep regularity index observed over 6 days) | [−inf, 56], [56, 80], [80, 86], [86, inf] | SRI_7 (Sleep regularity index observed over 7 days) | [−inf, 56], [56, 78], [78, 84], [84, inf] |
Daily Sleep offset time information (hour) | [−inf, 7.5], [7.5, inf] | Average weekend sleep onset information | [−inf, 2], [2, 12.5], [12.5, 23.5], [23.5, 24.5], [24.5, inf] |
Daily Sleep onset time information (hour) | [−inf, 24.5], [24.5, inf] | Sleep midpoint variability | [−inf, 3], [3, 4], [4, 4.5], [4.5, inf] |
Average weekly sleep onset information | [−inf, 2], [2, 6.5], [6.5, 11.5], [11.5, 16.5], [16.5, 24], [24, inf] | Daily total sleep time variance (HOUR) | [−inf, 0.4], [0.4, 0.6], [0.6, 0.8], [0.8, 2.2], [2.2, 2.6], [2.6, 3.6], [3.6, inf] |
REM sleep rate (%) in Initial 90 min | [−inf, 0.01], [0.01, 0.08], [0.08, 0.18], [0.18, inf] | LIGHT sleep rate (%) in Initial 90 min | [−inf, 0.54], [0.54, 0.62], [0.62, 0.84], [0.84, 0.9], [0.9, 0.96], [0.96, inf] |
DEEP sleep rate (%) in Initial 90 min | [−inf, 0.01], [0.01, 0.05], [0.05, 0.09], [0.09, 0.15], [0.15, 0.23], [0.23, 0.32], [0.32, inf] | AWAKE sleep rate (%) in Initial 90 min | [−inf, 0.01], [0.01, 0.04], [0.04, 0.08], [0.08, 0.09], [0.09, 0.14], [0.14, 0.17], [0.17, 0.2], [0.2, inf] |
Total REM sleep time (MINUTE) in initial 90 min | [−inf, 1], [1, 6], [6, 11], [11, inf] | Total LIGHT sleep time (MINUTE) in initial 90 min | [−inf, 29], [29, 34], [34, 37], [37, 39], [39, 50], [50, inf] |
Total DEEP sleep time (MINUTE) in initial 90 min | [−inf, 1], [1, 7], [7, 12], [12, 16], [16, 22], [22, inf] | Total AWAKE sleep time (MINUTE) in initial 90 min | [−inf, 1], [1, 2], [2, 6], [6, 8], [8, 12], [12, 16], [16, inf] |
Weekly total sleep time variance | [−inf, 0.8], [0.8, 1.2], [1.2, 2.8], [2.8, 3.7], [3.7, inf] | Total steps taken 2 h before sleep | [−inf, 10], [10, 720], [720, inf] |
Total sleep stage time in initial 90 min | [−inf, 40], [40, 48], [48, 50], [50, 52], [52, 55], [55, 58], [58, 60], [60, inf] |
Category | AUROC | K–S | Gini Coefficient |
---|---|---|---|
Train data | 0.9847 | 0.8912 | 0.9694 |
Validation data | 0.9845 | 0.8882 | 0.969 |
Reference | >0.7 | >0.5 | >0.6 |
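The three validation metrics in this table are closely related: the Gini coefficient is 2 × AUROC − 1, which matches the reported values (2 × 0.9847 − 1 = 0.9694). The following is a small stdlib-only sketch of how such metrics can be computed from scores and binary labels; it is an illustration on made-up data, not the authors' evaluation code, and the function names are chosen here for readability.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

def gini(auc):
    """Gini coefficient derived from the area under the ROC curve."""
    return 2 * auc - 1
```

The quadratic loops keep the sketch short; for data sets of the size used in the paper, a sort-based implementation would be the practical choice.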
Count | Mean | Standard Deviation | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|
5494 | 960.889516 | 285.114076 | 163 | 748 | 1162 | 1857 |
Feature | Interval Value | Score |
---|---|---|
SRI_2 | ~52 | −2 |
SRI_2 | 52~72 | −1 |
SRI_2 | 72~86 | 11 |
SRI_2 | 86~ | 29 |
SRI_3 | ~56 | −50 |
SRI_3 | 56~70 | −34 |
SRI_3 | 70~82 | 6 |
SRI_3 | 82~90 | 87 |
SRI_3 | 90~ | 180 |
SRI_4 | ~52 | −45 |
SRI_4 | 52~62 | −19 |
SRI_4 | 62~70 | 2 |
SRI_4 | 70~82 | 15 |
SRI_4 | 82~ | 29 |
SRI_5 | ~52 | −17 |
SRI_5 | 52~58 | −14 |
SRI_5 | 58~82 | 5 |
SRI_5 | 82~88 | 34 |
SRI_5 | 88~ | 67 |
SRI_6 | ~56 | −19 |
SRI_6 | 56~80 | −13 |
SRI_6 | 80~86 | 9 |
SRI_6 | 86~ | 13 |
SRI_7 | ~56 | −17 |
SRI_7 | 56~78 | −8 |
SRI_7 | 78~84 | 0 |
SRI_7 | 84~ | 6 |
Sleep Score | Number of Data Points | Proportion of Data | Frequency of Good Sleep Status | Good Sleep Status Ratio |
---|---|---|---|---|
50 < ss ≤ 100 | 6 | 0.037% | 0 | 0.00% |
100 < ss ≤ 150 | 21 | 0.131% | 0 | 0.00% |
150 < ss ≤ 200 | 89 | 0.554% | 0 | 0.00% |
200 < ss ≤ 250 | 140 | 0.872% | 0 | 0.00% |
250 < ss ≤ 300 | 214 | 1.333% | 0 | 0.00% |
300 < ss ≤ 350 | 186 | 1.159% | 0 | 0.00% |
350 < ss ≤ 400 | 255 | 1.588% | 0 | 0.00% |
400 < ss ≤ 450 | 321 | 2.000% | 0 | 0.00% |
450 < ss ≤ 500 | 423 | 2.635% | 0 | 0.00% |
500 < ss ≤ 550 | 583 | 3.632% | 0 | 0.00% |
550 < ss ≤ 600 | 739 | 4.604% | 0 | 0.00% |
600 < ss ≤ 650 | 742 | 4.622% | 0 | 0.00% |
650 < ss ≤ 700 | 704 | 4.385% | 0 | 0.00% |
700 < ss ≤ 750 | 819 | 5.102% | 0 | 0.00% |
750 < ss ≤ 800 | 933 | 5.812% | 0 | 0.00% |
800 < ss ≤ 850 | 1099 | 6.846% | 1 | 0.31% |
850 < ss ≤ 900 | 1085 | 6.759% | 0 | 0.31% |
900 < ss ≤ 950 | 848 | 5.283% | 0 | 0.31% |
950 < ss ≤ 1000 | 901 | 5.613% | 0 | 0.31% |
1000 < ss ≤ 1050 | 872 | 5.432% | 1 | 0.61% |
1050 < ss ≤ 1100 | 938 | 5.843% | 2 | 1.23% |
1100 < ss ≤ 1150 | 788 | 4.909% | 1 | 1.53% |
1150 < ss ≤ 1200 | 676 | 4.211% | 7 | 3.68% |
1200 < ss ≤ 1250 | 544 | 3.389% | 12 | 7.36% |
1250 < ss ≤ 1300 | 432 | 2.691% | 13 | 11.35% |
1300 < ss ≤ 1350 | 448 | 2.791% | 14 | 15.64% |
1350 < ss ≤ 1400 | 477 | 2.971% | 41 | 28.22% |
1400 < ss ≤ 1450 | 319 | 1.987% | 44 | 41.72% |
1450 < ss ≤ 1500 | 167 | 1.040% | 57 | 59.20% |
1500 < ss ≤ 1550 | 160 | 0.997% | 68 | 80.06% |
1550 < ss ≤ 1600 | 117 | 0.729% | 59 | 98.16% |
1600 < ss ≤ 1650 | 7 | 0.044% | 6 | 100.00% |
Total | 16,053 | 100.00% | 326 | 2.03% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, J.; Park, M. A Study on ML-Based Sleep Score Model Using Lifelog Data. Appl. Sci. 2023, 13, 1043. https://doi.org/10.3390/app13021043