Sound-Based Detection of Slip and Trip Incidents Among Construction Workers Using Machine and Deep Learning
Abstract
1. Introduction
2. Literature Review
2.1. Acceleration-Based Workers’ Behavior Classification
2.2. Computer Vision-Based Workers’ Behavior Classification
2.3. Sound-Based Applications in the Construction Industry
2.4. Research Gaps and Hypotheses
1. Current slip and trip detection methods rely on physical responses and computer vision, both of which have limitations.
2. No study has explored the use of sound-based classification for construction worker safety assessment and detection of unsafe events.
3. Methodology
3.1. Data Collection and Participants
3.2. Data Labeling, Augmentation, and Segmentation
3.2.1. Data Labeling
3.2.2. Data Augmentation
3.2.3. Data Segmentation
3.3. One-Dimensional (1D) Feature Extraction and Sound Classification Using Machine Learning
3.4. Image-Based (2D) Feature Extraction and Sound Classification Using Machine Learning
4. Results
4.1. Results of Slip and Trip Classification Based on 1D Features Using Machine Learning
4.1.1. Performance of Machine Learning Algorithms in Slip and Trip Classification Based on 1D Features
4.1.2. Performance of Various 1D Feature Categories in Slip and Trip Classification
4.2. Results of Slip and Trip Classification Based on 2D Features Using Machine Learning
4.2.1. Performance of CNN and CNN-LSTM Models in 2D Image-Based Slip and Trip Classification
4.2.2. Performance of Various Pre-Trained Models in 2D Image-Based Slip and Trip Classification
5. Discussion
5.1. Comparison of Machine and Deep Learning 1D Classification Performance
5.2. Comparison Between Custom-Trained Deep Learning Models and Pre-Trained Neural Networks
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Selected Hyperparameters for Various Classifiers
Classifier | Selected Hyperparameters
---|---
Categorical Boosting (CatBoost) | {'eval_metric': 'Logloss', 'subsample': 0.6, 'depth': 8, 'learning_rate': 0.1, 'leaf_estimation_iterations': 20, 'bootstrap_type': 'MVS', 'max_leaves': 64}
Light Gradient Boosting Machine (LightGBM) | {'boosting_type': 'gbdt', 'colsample_bytree': 0.5, 'learning_rate': 0.1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'n_estimators': 150, 'num_leaves': 40, 'max_depth': 13}
Gradient Boosting Classifier (GBC) | {'ccp_alpha': 0.0, 'criterion': 'friedman_mse', 'learning_rate': 0.1, 'loss': 'deviance', 'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 4, 'n_estimators': 100, 'subsample': 1.0, 'tol': 0.0001}
Random Forest (RF) | {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'gini', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 250, 'max_features': 'sqrt'}
Extreme Gradient Boosting (XGBoost) | {'objective': 'binary:logistic', 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'gamma': 0, 'learning_rate': 0.1, 'max_depth': 7, 'min_child_weight': 1, 'n_estimators': 120, 'num_parallel_tree': 1, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1}
Extra Trees (ET) | {'bootstrap': False, 'ccp_alpha': 0.0, 'criterion': 'gini', 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 2, 'n_estimators': 100}
Decision Tree (DT) | {'criterion': 'gini', 'max_depth': 14, 'min_samples_leaf': 7, 'min_samples_split': 2}
Quadratic Discriminant Analysis (QDA) | {'priors': None, 'reg_param': 0.0001, 'tol': 0.0001}
Adaptive Boosting (AdaBoost) | {'algorithm': 'SAMME.R', 'n_estimators': 350, 'learning_rate': 0.78}
Ridge Classifier (Ridge) | {'alpha': 2.0, 'copy_X': True, 'fit_intercept': True, 'solver': 'auto', 'tol': 0.001}
Linear Discriminant Analysis (LDA) | {'solver': 'eigen', 'shrinkage': 'auto', 'n_components': 1}
Logistic Regression (LR) | {'C': 1.5, 'penalty': 'l2', 'solver': 'lbfgs', 'tol': 0.0002, 'class_weight': 'balanced'}
k-Nearest Neighbor (kNN) | {'leaf_size': 50, 'metric': 'manhattan', 'n_neighbors': 3}
Support Vector Machine (SVM), Polynomial Kernel | {'tol': 0.0001, 'max_iter': 2000, 'gamma': 'scale', 'C': 1000.0}
Naïve Bayes | {'priors': None, 'var_smoothing': 1e-9}
Bidirectional LSTM (Bi-LSTM) | {'dropout_rate': 0.30970, 'learning_rate': 0.002938, 'batch_size': 32, 'epochs': 35}
Long Short-Term Memory (LSTM) | {'dropout_rate': 0.34889, 'learning_rate': 0.002844, 'batch_size': 32, 'epochs': 43}
One-Dimensional Convolutional Neural Network (1D-CNN) | {'filters': 48, 'kernel_size': 3, 'dropout_rate': 0.17919, 'learning_rate': 0.001974, 'batch_size': 32, 'epochs': 47}
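As a minimal, hypothetical sketch (not taken from the paper's code), the tuned hyperparameters above can be kept as plain Python dicts and unpacked into the matching classifier constructors. The `build` helper and the commented scikit-learn calls are assumptions for illustration only:

```python
# Hypothetical sketch: store Appendix A hyperparameters as dicts, then unpack
# them into classifier constructors. The paper's actual training pipeline is
# not shown here; the scikit-learn calls in the docstring are assumptions.
rf_params = {"bootstrap": True, "ccp_alpha": 0.0, "criterion": "gini",
             "min_samples_leaf": 1, "min_samples_split": 2,
             "n_estimators": 250, "max_features": "sqrt"}
dt_params = {"criterion": "gini", "max_depth": 14,
             "min_samples_leaf": 7, "min_samples_split": 2}

def build(name, params):
    """Return (name, params). With scikit-learn installed this would instead
    return e.g. RandomForestClassifier(**params) or
    DecisionTreeClassifier(**params), then be fit on the extracted features."""
    return name, dict(params)

models = [build("RandomForest", rf_params), build("DecisionTree", dt_params)]
```

Keeping the hyperparameters in one place like this makes it easy to reproduce each row of the table with a single `**params` unpacking.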
No. of Audio Samples per Window | 2048 (2¹¹) | 4096 (2¹²) | 8192 (2¹³) | 16,384 (2¹⁴) | 32,768 (2¹⁵)
---|---|---|---|---|---
Window size (s) | 0.046 | 0.093 | 0.186 | 0.372 | 0.743
No. of window segments: Walking | 86,898 | 43,026 | 21,156 | 10,170 | 4698
No. of window segments: Slip | 41,658 | 20,868 | 10,404 | 5208 | 2580
No. of window segments: Trip | 46,008 | 22,998 | 11,460 | 5772 | 2856
Total | 174,564 | 86,892 | 43,020 | 21,150 | 10,134
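The window durations in the table are consistent with audio sampled at 44.1 kHz (a rate inferred from the numbers themselves; it is not stated in this excerpt), since duration = samples / sample rate. A quick check:

```python
# Assumed sampling rate, inferred from the table (2048 samples ≈ 0.046 s),
# not stated explicitly in this excerpt.
SAMPLE_RATE = 44_100  # Hz

def window_duration(n_samples, sr=SAMPLE_RATE):
    """Duration in seconds of a window holding n_samples audio samples."""
    return n_samples / sr

for n in (2048, 4096, 8192, 16_384, 32_768):
    print(n, round(window_duration(n), 3))
# 2048 -> 0.046, 4096 -> 0.093, 8192 -> 0.186, 16384 -> 0.372, 32768 -> 0.743
```

Each doubling of the window length doubles the duration, which matches the table column for column.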
Category | Features |
---|---|
Time domain | Mean, standard deviation, maximum, minimum, variance, kurtosis, entropy, zero crossing rate, energy, energy entropy |
Frequency domain | Frequency centroid 1, frequency centroid 2, frequency entropy, frequency flux, frequency roll-off |
MFCC | The first 13 MFCC coefficients |
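A few of the time-domain features listed above can be sketched with the standard library alone; this is an illustrative implementation, not the paper's code, and the frequency-domain and MFCC features would in practice come from an audio library such as librosa (an assumption, since the excerpt does not name the toolchain):

```python
import math

def time_domain_features(frame):
    """Compute a subset of the table's time-domain features for one window:
    mean, standard deviation, variance, energy, and zero-crossing rate."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((x - mean) ** 2 for x in frame) / n
    std = math.sqrt(var)
    energy = sum(x * x for x in frame)             # sum of squared amplitudes
    zcr = sum(1 for a, b in zip(frame, frame[1:])  # sign changes per transition
              if a * b < 0) / (n - 1)
    return {"mean": mean, "std": std, "variance": var,
            "energy": energy, "zcr": zcr}

# Toy window: every consecutive pair changes sign, so the ZCR is 1.0.
feats = time_domain_features([0.1, -0.2, 0.3, -0.1, 0.2, -0.3])
```

One feature vector per window segment is produced this way, so the number of training examples per class equals the segment counts in the window-size table.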
No. | Classifier | Accuracy | F1 Score | Recall | Precision
---|---|---|---|---|---
1 | GMM-HMM | 0.918 | 0.920 | 0.912 | 0.918
3 | CatBoost | 0.907 | 0.891 | 0.891 | 0.891
5 | LightGBM | 0.907 | 0.889 | 0.890 | 0.889
4 | Extreme Gradient Boosting | 0.905 | 0.890 | 0.890 | 0.890
2 | Gradient Boosting | 0.902 | 0.891 | 0.891 | 0.891
6 | Extra Trees | 0.891 | 0.883 | 0.884 | 0.883
7 | SVM | 0.883 | 0.878 | 0.879 | 0.878
8 | Random Forest | 0.879 | 0.873 | 0.873 | 0.873
11 | kNN | 0.874 | 0.860 | 0.862 | 0.862
10 | Logistic Regression | 0.868 | 0.864 | 0.865 | 0.864
9 | AdaBoost | 0.866 | 0.866 | 0.867 | 0.866
12 | Linear Discriminant Analysis | 0.865 | 0.857 | 0.859 | 0.858
13 | Decision Tree | 0.860 | 0.859 | 0.859 | 0.859
14 | Ridge Classifier | 0.860 | 0.850 | 0.852 | 0.852
16 | Quadratic Discriminant Analysis | 0.791 | 0.776 | 0.785 | 0.800
15 | Naïve Bayes | 0.787 | 0.785 | 0.792 | 0.796
Average | 0.873 | 0.865 | 0.866 | 0.866
No. | Classifier | 0.046 s | 0.093 s | 0.186 s | 0.372 s | 0.743 s | Average
---|---|---|---|---|---|---|---
1 | LightGBM | 0.907 | 0.912 | 0.904 | 0.901 | 0.907 | 0.906
2 | Extreme Gradient Boosting | 0.905 | 0.910 | 0.904 | 0.886 | 0.907 | 0.902
3 | Gradient Boosting | 0.902 | 0.909 | 0.902 | 0.889 | 0.903 | 0.901
4 | CatBoost | 0.907 | 0.910 | 0.904 | 0.882 | 0.901 | 0.900
5 | Extra Trees | 0.891 | 0.895 | 0.891 | 0.883 | 0.893 | 0.890
6 | Random Forest | 0.879 | 0.875 | 0.869 | 0.860 | 0.852 | 0.867
7 | SVM | 0.883 | 0.884 | 0.868 | 0.854 | 0.834 | 0.865
8 | Linear Discriminant Analysis | 0.865 | 0.860 | 0.862 | 0.851 | 0.864 | 0.860
9 | AdaBoost | 0.866 | 0.861 | 0.867 | 0.852 | 0.842 | 0.858
10 | GMM-HMM | 0.918 | 0.873 | 0.858 | 0.825 | 0.806 | 0.856
11 | Ridge Classifier | 0.860 | 0.854 | 0.860 | 0.847 | 0.860 | 0.856
12 | Logistic Regression | 0.868 | 0.864 | 0.855 | 0.843 | 0.848 | 0.856
13 | Decision Tree | 0.860 | 0.857 | 0.842 | 0.836 | 0.856 | 0.850
14 | kNN | 0.874 | 0.865 | 0.851 | 0.819 | 0.789 | 0.840
15 | QDA | 0.791 | 0.795 | 0.814 | 0.844 | 0.860 | 0.820
16 | Naïve Bayes | 0.787 | 0.788 | 0.799 | 0.800 | 0.811 | 0.797
Average | 0.873 | 0.870 | 0.866 | 0.855 | 0.858
Classifier | Time Domain | Frequency Domain | MFCC | All Features
---|---|---|---|---
Gradient Boosting | 0.830 | 0.825 | 0.883 | 0.902
XGBoost | 0.829 | 0.860 | 0.888 | 0.905
CatBoost | 0.831 | 0.861 | 0.888 | 0.907
LightGBM | 0.832 | 0.858 | 0.886 | 0.907
Extra Trees | 0.832 | 0.860 | 0.878 | 0.891
kNN | 0.756 | 0.761 | 0.873 | 0.874
SVM | 0.809 | 0.738 | 0.874 | 0.883
Random Forest | 0.827 | 0.818 | 0.872 | 0.879
Logistic Regression | 0.793 | 0.710 | 0.859 | 0.868
QDA | 0.723 | 0.710 | 0.849 | 0.791
LDA | 0.791 | 0.715 | 0.854 | 0.865
Decision Tree | 0.814 | 0.800 | 0.853 | 0.860
Ridge Classifier | 0.787 | 0.686 | 0.851 | 0.860
Naïve Bayes | 0.662 | 0.701 | 0.846 | 0.787
AdaBoost | 0.814 | 0.793 | 0.851 | 0.866
Average | 0.795 | 0.779 | 0.867 | 0.873
Algorithm—Feature | 0.046 s | 0.093 s | 0.186 s | 0.372 s | 0.743 s | Avg. | Std. Dev.
---|---|---|---|---|---|---|---
CNN—Time Series | 0.801 | 0.804 | 0.771 | 0.775 | 0.807 | 0.792 | 0.017
CNN—Spectrogram | 0.807 | 0.815 | 0.837 | 0.845 | 0.862 | 0.833 | 0.022
CNN—MFCC | 0.764 | 0.785 | 0.785 | 0.866 | 0.828 | 0.806 | 0.041
CNN-LSTM—Time Series | 0.957 | 0.943 | 0.909 | 0.926 | 0.887 | 0.924 | 0.028
CNN-LSTM—Spectrogram | 0.928 | 0.929 | 0.894 | 0.893 | 0.886 | 0.906 | 0.021
CNN-LSTM—MFCC | 0.966 | 0.964 | 0.943 | 0.952 | 0.923 | 0.949 | 0.018
Average | 0.871 | 0.873 | 0.857 | 0.876 | 0.866 | |
Std. dev. | 0.090 | 0.080 | 0.070 | 0.063 | 0.042 | |
Algorithm—Feature | 0.046 s | 0.093 s | 0.186 s | 0.372 s | 0.743 s | Avg. | Std. Dev.
---|---|---|---|---|---|---|---
InceptionV3—Time Series | 0.844 | 0.847 | 0.833 | 0.830 | 0.821 | 0.835 | 0.011
InceptionV3—Spectrogram | 0.861 | 0.862 | 0.866 | 0.854 | 0.860 | 0.861 | 0.004
InceptionV3—MFCC | 0.854 | 0.851 | 0.844 | 0.837 | 0.833 | 0.844 | 0.009
ResNet50—Time Series | 0.830 | 0.828 | 0.820 | 0.804 | 0.744 | 0.805 | 0.036
ResNet50—Spectrogram | 0.779 | 0.774 | 0.725 | 0.722 | 0.773 | 0.755 | 0.029
ResNet50—MFCC | 0.851 | 0.839 | 0.828 | 0.751 | 0.719 | 0.798 | 0.059
InceptionResNetV2—Time Series | 0.824 | 0.822 | 0.819 | 0.825 | 0.810 | 0.820 | 0.006
InceptionResNetV2—Spectrogram | 0.888 | 0.877 | 0.872 | 0.859 | 0.851 | 0.869 | 0.015
InceptionResNetV2—MFCC | 0.857 | 0.854 | 0.851 | 0.855 | 0.839 | 0.851 | 0.007
DenseNet169—Time Series | 0.850 | 0.849 | 0.845 | 0.846 | 0.846 | 0.847 | 0.002
DenseNet169—Spectrogram | 0.881 | 0.878 | 0.879 | 0.874 | 0.876 | 0.878 | 0.003
DenseNet169—MFCC | 0.869 | 0.866 | 0.862 | 0.869 | 0.859 | 0.865 | 0.004
Average | 0.849 | 0.846 | 0.837 | 0.827 | 0.819 | |
Std. dev. | 0.029 | 0.028 | 0.040 | 0.047 | 0.049 | |
Architecture | 0.046 s | 0.093 s | 0.186 s | 0.372 s | 0.743 s | Average
---|---|---|---|---|---|---
Bi-LSTM | 0.896 | 0.904 | 0.911 | 0.896 | 0.863 | 0.894
LSTM | 0.896 | 0.909 | 0.912 | 0.891 | 0.857 | 0.893
1D-CNN | 0.889 | 0.901 | 0.887 | 0.895 | 0.852 | 0.884
Average | 0.893 | 0.904 | 0.903 | 0.894 | 0.857
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, F.; Duorinaah, F.X.; Kim, M.-K.; Thedja, J.; Seo, J.; Lee, D.-E. Sound-Based Detection of Slip and Trip Incidents Among Construction Workers Using Machine and Deep Learning. Buildings 2025, 15, 3136. https://doi.org/10.3390/buildings15173136