Machine Learning-Based Approach to Identifying Fall Risk in Seafarers Using Wearable Sensors

Abstract: Falls on a ship cause severe injuries, and falling off board, referred to as "man overboard" (MOB), can lead to death. Thus, it is crucial to detect the risk of falling accurately and in a timely manner. Wearable sensors, unlike camera and radar sensors, are affordable and easily accessible regardless of weather conditions. This study aimed to identify the fall risk level (i.e., high or low risk) among individuals on board using wearable sensors. We collected walking data from accelerometers during an experiment that simulated a ship's rolling motions using a computer-assisted rehabilitation environment (CAREN). With the best features selected by LASSO, eight machine learning (ML) models were implemented with a synthetic minority oversampling technique (SMOTE) and the best-tuned hyperparameters. All ML models showed overall good performance in classifying fall risk: accuracy (0.7778 to 0.8519), sensitivity (0.7556 to 0.8667), specificity (0.7778 to 0.8889), and AUC (0.7673 to 0.9204). Logistic regression showed the best performance in terms of the AUC for both training (0.9483) and testing (0.9204). We anticipate that this study will help identify the risk of falls on ships and aid in developing a monitoring system capable of averting falls and detecting MOB situations.


Introduction
Human falls cause serious injuries. In particular, falls are even more likely to occur on ships due to their motion. The loss of balance caused by a ship's movements at sea leads to falls, and because of the iron structure of ships, falls on board can result in much more severe injuries than falls on land. In the worst-case scenario, an accident that involves falling off a ship, called "man overboard" (MOB), can lead to death. In fact, 22 people fall off board each year on cruise ships, of which 79 percent either go missing or do not survive [1]. From a broader perspective, an estimated 1000 or more people are involved in MOB incidents, and the survival rate is meager [2]. This low survival rate can be attributed to the unnoticed nature of MOB situations, in which the occurrence (e.g., time and location) is not promptly identified, leading to a drastic drop in the faller's body temperature (i.e., hypothermia) [3]. An MOB can be handled more effectively if the situation is immediately recognized by another crew member on the ship [4]. However, because of the complex structure of a ship's deck, the coverage of closed-circuit television (CCTV) surveillance cameras is limited, and as the size of the ship increases, blind spots also increase. Directly observing falls is difficult because the number of crew members is limited. Therefore, it is essential to detect human falls on ships accurately and in a timely manner. To do this, it is necessary to identify the risk of falling while walking on a vessel in motion.
Traditional techniques for fall risk classification often rely on clinical assessments and scoring systems based on observations, questionnaires, and physical examinations conducted by healthcare professionals. Healthcare providers use various assessment tools, such as the Timed Up and Go (TUG) test, Berg Balance Scale (BBS), Tinetti Performance-Oriented Mobility Assessment (POMA), and Functional Reach Test (FRT), to evaluate balance, gait, mobility, and other factors related to fall risk [5][6][7][8]. Patients may be asked to complete questionnaires, such as the Falls Efficacy Scale-International (FES-I), the Falls Risk Assessment Tool (FRAT), or other self-reported surveys, to assess their perception of their own fall risk and related factors [9,10]. However, these traditional techniques rely heavily on subjective assessments and observations by healthcare professionals and may have limitations in predicting fall risk accurately, especially in individuals with complex medical conditions or those at higher risk of falls. Therefore, integrating machine learning (ML) techniques into fall risk classification can provide additional insight and improve the accuracy of predictive models.
In recent years, ML techniques have been widely applied in many research domains. Specifically, ML techniques such as classification and clustering tackle many automated recognition or prediction problems. Many researchers have utilized ML to predict or detect falls and to identify the risk of falls [11][12][13][14]. Thakur and Han [11] proposed an optimal ML approach to improve fall detection in assisted living by comparing 19 different ML methods. Usmani et al. [12] explored the latest research trends in fall detection and prevention using ML algorithms. Noh et al. [13] developed an extreme gradient boosting (XGB) model to predict the fall risk level in older adults by identifying the optimal gait features. Chakraborty and Sorwar [14] discriminated between fallers and non-fallers based on the long-term monitoring of natural fall data using three different ML approaches. These studies primarily examined older adults in the biomedical or healthcare domains. There are fewer studies on the prediction or detection of seafarers' falls in the maritime field. Only a few previous studies have attempted to detect MOBs through ML technologies [15][16][17][18][19]. Tsekenis et al. [15] implemented an ML-based system to detect MOBs using radar sensors. They achieved high accuracy (97.24%) with a random forest algorithm. Bakalos et al. investigated identifying MOB events using simple RGB streams and thermal imagery through convolutional spatiotemporal autoencoders, which can detect a fall as an anomaly [16,17]. Armeniakos et al. [18] built a human fall detection system using multiple long-range millimeter-wave band radar sensors. They emphasized that the system can detect and track real human fall scenarios. Gürüler et al. [19] designed an MOB detection system module using GPS, radio frequency, and a mobile ad hoc network to warn of an MOB situation, including the location and information of the individual involved. However, these studies require expensive devices, such as radar sensors and video cameras, and some challenges still remain. Video cameras can be affected by environmental conditions (e.g., fog, rain, and snow), and radar sensors can be interrupted by obstacles or reflective objects. To tackle these limitations, in this study we used a wearable sensor to identify fall risk levels through ML algorithms, since a wearable sensor is cheap, light, convenient to use in both laboratory and real-world settings, and not influenced by weather conditions. Recent advancements in wearable sensor technology have led to its versatile application in various research domains, particularly in healthcare. The flexible nature of these sensors has facilitated their widespread adoption, contributing to innovative solutions and improvements in diverse fields, including fall detection methods [20][21][22].
The purpose of this study was to classify the risk of human falls in a ship's rolling situation. Our specific aims for this study were:

• To see whether an ML approach can be applied to identify fall risks with a wearable sensor;
• To identify the best gait features for the prediction of the fall risk level (high or low) during a ship's rolling conditions;
• To examine which ML models perform best for fall risk classifications under a ship's roll motions.
To achieve these goals, a computer-assisted rehabilitation environment (CAREN) was used to systematically simulate the rolling motions of a ship. In this study, we simulated ship roll motions of up to 20 degrees. We also implemented eight ML classification models with the best feature set and hyperparameters. The detailed experimental design is described in Section 2.
The main contributions of this study are summarized as follows:
• To the best of our knowledge, this study marks the initial endeavor to detect fall risks in the maritime field using wearable sensors, as the majority of previous studies used video cameras or radar sensors, often focusing on older adults in biomedical and healthcare fields;
• We comprehensively analyzed eight ML models for fall risk classification, implemented with a synthetic minority oversampling technique (SMOTE) and hyperparameter tuning;
• The findings of this study can be applied to prevent seafarers or passengers from falls and MOBs by determining the risk of falls during a ship's rolling motions.
The remaining paper is structured as follows: Section 2 describes this study's experimental design and methodology, including data collection, data preprocessing, ML techniques, and overall implementation. The results of the study are presented in Section 3. In Section 4, we discuss the findings of this study. Finally, in Section 5, we provide conclusions and future research directions.

Related Work
Fall detection is a critical area of research in healthcare and assistive technology, aiming to prevent fall-related injuries among vulnerable populations. Over the years, researchers have explored various methodologies and technologies to develop effective fall detection systems. In this section, we review the existing literature on fall detection, focusing on recent advancements and key findings in the field.

Traditional Sensor-Based Approaches to Fall Detection
Early efforts in fall detection primarily relied on traditional approaches, including rule-based systems, wearable sensors, and ambient sensors [23][24][25]. These systems often involved threshold-based algorithms to detect abrupt changes in motion patterns indicative of a fall. While effective to some extent, traditional approaches face challenges, such as high false alarm rates and limited adaptability to diverse environments and user behaviors. In particular, ships are affected by continuous movements, including rolling, pitching, and heaving, which can cause significant variability and noise in sensor data. However, many existing fall detection studies using sensors have been conducted in static environments [11,21,26,27]. Therefore, distinguishing between normal ship movement and fall events requires sophisticated algorithms that can robustly detect falls under dynamic movement patterns. The proposed study considered ship movement, to some extent, by applying rolling.

Hidden Markov Model (HMM) for Fall Detection
Many studies have reported high recognition rates for Hidden Markov Models (HMMs) in human activity recognition (HAR), particularly in fall detection using wearable sensors [26,[28][29][30]. Moreover, HMMs exhibit superior efficiency, interpretability, and scalability owing to their innate and robust modeling capabilities for time series data [29,31]. However, HMMs require a predefined number of states, which can make it difficult for these models to handle complex situations, and they are mainly used for sequential data, especially time series. Thus, their application to other types of data and problems is difficult. On the other hand, ML can provide more flexible models because it can handle different types of data, such as images, text, and speech, and it can model complex relationships among different variables. For these reasons, the proposed study evaluated ML models.

Data Collection
We recruited 30 healthy participants for this study. The participants' demographics are summarized in Table 1. All participants read and signed a consent form approved by the Institutional Review Board at the University of Nebraska Medical Center (IRB 141-21-EP). A general inclusion criterion was that participants should be between 19 and 55 years old. Participants were excluded if they had:

• A major lower extremity injury or surgery;
• Known cardiovascular conditions that make it unsafe for them to exercise;
• A history of dizziness due to vestibular disorders, such as Meniere's disease and vertigo;
• Any difficulty in walking in unstable, moving environments.

We recorded the subjects' movement at 100 Hz with ten cameras using a 3D motion capture system (Vicon Motion System Ltd., Oxford, UK). Anatomical landmarks were marked with 37 reflective markers using the Plug-In Gait full-body model [32]; four markers were applied to the head, five to the torso, 12 to the upper limbs, four to the pelvis, and 12 to the lower limbs. In addition, we placed seven accelerometers (Xsens, Enschede, The Netherlands) on the pelvis, feet, shanks, and thighs to obtain three-dimensional accelerations. The sampling frequency of the accelerometers was set to 100 Hz. This study analyzed acceleration data from the pelvis because upper body motion is more appropriate for measuring balance [33]. The reflective markers and accelerometers were placed as shown in Figure 2a. The ship's roll motion was simulated up to 20 degrees using the CAREN system (Motek, Amsterdam, The Netherlands). Participants walked for two minutes at their own pace on the CAREN system's split-belt treadmill, as shown in Figure 2b. All participants wore safety harnesses to prevent accidents on the moving platform. Nine different conditions were applied: no rolling and 5-, 10-, 15-, and 20-degree rolling with slow (12 s) and fast (6 s) rolling cycles. Previous studies used incline angles of 5, 10, 15, and 20 degrees to determine evacuation walking times in an emergency at sea [34][35][36]; we therefore chose the same rolling angles for our experiments. We picked a 12 s rolling cycle of a passenger ship and a 6 s rolling cycle of a general cargo ship for the slow and fast rolling cycles, respectively [37]. The nine walking trials were conducted in random order to prevent learning effects.

Data Preprocessing
Using the collected accelerations from the pelvis for the nine different walking trials, we first labeled the data in terms of the fall risk as high or low. Choi et al. found significant balance and stability variations in rolling above 15 degrees [38]. Thus, the data from the walking trials at 15 and 20 degrees of rolling for both the slow and fast cycles were labeled as "high risk", and the remaining data (i.e., no rolling and 5 and 10 degrees in slow and fast cycles) were labeled as "low risk". We also randomly divided the data into training (70%) and test (30%) datasets, as shown in Table 2.
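As a sketch, the labeling rule and the random 70/30 split described above can be expressed as follows (the condition encoding and seed are illustrative assumptions; the study's actual samples are per-trial acceleration features, not condition tuples):

```python
import random

# Nine walking conditions: no rolling, plus 5/10/15/20 degrees at slow and fast cycles.
conditions = [(0, "none")] + [(deg, cyc) for deg in (5, 10, 15, 20)
                              for cyc in ("slow", "fast")]

def risk_label(roll_deg):
    # Rolling at 15 degrees or more is labeled "high" risk, following [38].
    return "high" if roll_deg >= 15 else "low"

# One sample per participant per condition (30 participants x 9 conditions = 270).
samples = [(deg, cyc, risk_label(deg))
           for deg, cyc in conditions
           for _ in range(30)]

rng = random.Random(0)             # fixed seed for a reproducible split
rng.shuffle(samples)
cut = len(samples) * 7 // 10       # 70/30 train/test split
train, test = samples[:cut], samples[cut:]
```

This reproduces the sample counts reported later in the paper: 270 samples overall, split into 189 training and 81 testing samples.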

In [38], we calculated the center of mass excursion (COME) and the variability in the margin of stability (vMOS) from the data collected with the motion capture system, since these two variables represent balance or stability during walking and have been proven to be reasonable predictors of falls in many studies [39][40][41][42][43][44][45][46][47]. To verify the division of the data into high and low risk, we compared the two groups using independent-samples t-tests for four variables: the mediolateral and anterior-posterior directional COMEs and vMOSs, denoted ML-COME, AP-COME, ML-vMOS, and AP-vMOS, respectively. The t-tests revealed a statistically significant difference between the two groups (p < 0.001), as shown in Table 3 and Figure 3.
To extract gait features from the pelvis accelerations, the initial step involved identifying each step event. The methods employed for detecting step events and extracting features are consistent with the peak detection method in our previous works [48,49]. Table 4 lists the twenty gait features extracted from the pelvis. Each feature was also calculated as an average value (denoted by a lowercase "a"), a symmetry value (denoted by a lowercase "s"), and a variability value (denoted by a lowercase "v"). This study therefore used 60 features; detailed methods for step detection and feature extraction can be found in [48,49], respectively. We normalized the features to have a zero mean and scaled them to unit variance.

Feature Selection Using LASSO
Feature selection is a key part of this work. Feature engineering in HAR encompasses various techniques, including feature stacking, feature space reduction, and the design of high-level HAR features [51][52][53]. We applied the least absolute shrinkage and selection operator (LASSO) to select a subset of relevant features, since LASSO was the best feature selection method for identical data in a previous study [50]. In LASSO, the residual sum of squares of a vector of regression coefficients is minimized subject to a constraint on the L1-norm [54]. A sparse model is obtained by shrinking the coefficients of less important variables to zero. LASSO is defined as:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\},$$

where y_i and x_ij represent the outcome and variables of the i-th subject, respectively; λ is a non-negative hyperparameter; and β is a vector of regression coefficients. The best λ was chosen to minimize the mean squared error (MSE) based on 10-fold cross-validation (CV). We repeated this step 100 times with the training data. Using a cut-off threshold (selected at least 50 times), we derived the most frequently selected features after 100 iterations.
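To make the shrinkage behavior concrete, here is a minimal coordinate-descent LASSO sketch (the study used R's glmnet; this standalone Python version, on made-up toy data, only illustrates how the L1 penalty sets weak coefficients exactly to zero via soft-thresholding):

```python
import random

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate-descent LASSO on (roughly) standardized features."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residuals with feature j's contribution removed
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # Soft-thresholding: coefficients with |rho| <= lam become exactly 0,
            # which is what makes LASSO a feature selector
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

# Toy data: y depends on feature 0 only; feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * x0 + rng.gauss(0, 0.1) for x0, _ in X]
beta = lasso_cd(X, y, lam=0.2)
# beta[0] stays large; beta[1] is shrunk exactly to zero.
```

The paper's selection-frequency scheme then amounts to repeating such a fit 100 times on resampled training data (with λ tuned per run by 10-fold CV) and keeping the features whose coefficients are non-zero in at least 50 runs.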

SMOTE Resampling
A majority class is the class with a larger number of samples, and a minority class is the class with a smaller number of samples. Based on the label distribution, our datasets are slightly skewed, as high-risk labels are the minority class. With an imbalanced dataset, it is more difficult for a model to learn the high-risk group, which is the class of interest. The accuracy of a traditional classifier is biased toward the majority class when there are not enough samples in the minority class, and there is no guarantee that minority class samples are classified correctly even if the overall accuracy is high. Resampling techniques can mitigate the problem of imbalanced classes by increasing the representation of the minority class in the training data. In general, there are two resampling methods, namely oversampling and undersampling.
For this study, we selected a popular resampling method called the synthetic minority oversampling technique (SMOTE), proposed by Chawla et al. [55] in 2002. With SMOTE, synthetic examples of the minority class are created to balance the classes [55]. The SMOTE method creates new samples by linearly interpolating between two minority samples. Doing so alleviates the overfitting problems caused by random oversampling, makes class distributions more balanced, and improves the generalization capability of the classifier.
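The interpolation step can be sketched as follows (a minimal pure-Python version on made-up feature tuples, assuming Euclidean neighborhoods; the study used an R implementation):

```python
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted((p for p in minority if p != x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class of 6 samples; generate 4 synthetic points to rebalance.
minority = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (1.1, 1.3), (0.9, 0.8), (1.3, 1.2)]
new_points = smote_oversample(minority, n_new=4)
```

Because each synthetic point lies on a line segment between two real minority samples, it always falls inside the minority class's convex hull, which is what distinguishes SMOTE from simple duplication.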
For binary classification, logistic regression (LR) is one of the most popular methods [63]. LR uses maximum likelihood to find the regression coefficients for each feature so that the predicted probability of each class is as close to the actual class as possible. The estimated coefficients can be used to calculate the probability of a given observation falling into each class [64].
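Given fitted coefficients, the class probability of an observation follows directly from the logistic function (the coefficient values below are made up purely for illustration):

```python
import math

def lr_probability(intercept, coefs, x):
    # P(class = 1 | x) from a fitted logistic regression model:
    # linear predictor passed through the logistic (sigmoid) function
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-feature model applied to one observation
p = lr_probability(-0.5, [1.2, -0.8], [0.6, 0.3])
```

A probability above a chosen threshold (typically 0.5) assigns the observation to the positive (here, high-risk) class.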
A decision tree (DT) performs classification using recursive binary splitting. A tree is constructed from a root node, and splitting occurs at each node until it reaches the minimum size of a class subgroup or a stop condition. During the construction of a tree, the Gini index or entropy is used to assess the quality of each split [65].
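The Gini criterion mentioned above can be sketched in a few lines (labels are illustrative):

```python
def gini(labels):
    # Gini impurity of a node: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    # Weighted impurity of a candidate binary split (lower is better)
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A perfectly mixed node has impurity 0.5; a pure split drops it to 0.0.
mixed = ["high", "high", "low", "low"]
```

At each node, the tree builder evaluates candidate splits with `split_gini` and keeps the one with the lowest weighted impurity.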
A K-nearest neighbors (KNN) classifier constructs a decision boundary by identifying the k samples closest to a given observation [66]. The KNN assigns a class to the observation based on the simple majority vote of its k nearest neighbors [65]. We used the KNN classifier's default distance metric, the Minkowski distance.
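The Minkowski metric reduces to the familiar Euclidean distance at p = 2 and the Manhattan distance at p = 1, and the classification itself is a majority vote:

```python
def minkowski(a, b, p=2):
    # Minkowski distance between two feature vectors; p=2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_vote(neighbour_labels):
    # Simple majority vote among the k nearest neighbours
    return max(set(neighbour_labels), key=neighbour_labels.count)
```

For example, the (3, 4) offset gives the classic Euclidean distance of 5 at p = 2 but 7 at p = 1.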
A random forest (RF) solves classification tasks by building many decision trees. Unlike a DT, an RF is composed of a large number of individual trees [59,67,68]. Because of the sensitivity of each tree to its training data, each tree's structure changes when given slightly different data. Each tree is constructed using a subset of the training data, with each node split according to the best feature from a randomly selected feature set. The final classification is determined by the majority vote of the decision trees. The RF is less prone to overfitting [64].
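The two ingredients described above, bootstrap sampling and majority voting, can be sketched as follows (the "trees" here are stub callables standing in for fitted decision trees, purely for illustration):

```python
import random

def bootstrap_sample(data, rng):
    # Each tree is trained on n points drawn with replacement from the training set
    return [rng.choice(data) for _ in data]

def forest_predict(trees, x):
    # Final class = majority vote across the ensemble's trees
    votes = [tree(x) for tree in trees]
    return max(set(votes), key=votes.count)

# Stub "trees" standing in for fitted decision trees (illustrative only).
trees = [lambda x: "high", lambda x: "low", lambda x: "high"]
```

Randomizing both the sample (bootstrap) and the candidate features at each split decorrelates the trees, which is why averaging their votes reduces overfitting relative to a single tree.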
XGB is an ensemble learning algorithm developed by Chen in 2016 [60]. The XGB algorithm has been applied in many fields because it is fast, accurate, and robust. XGB optimizes and enhances the gradient-boosted DT algorithm; it can parallelize computations, build approximate trees, and process sparse data efficiently. In addition, XGB optimizes CPU and memory usage, making it well suited to recognizing and classifying multidimensional data features [69].
A support vector machine (SVM) maps the features onto a high-dimensional space using kernels to accommodate nonlinear class boundaries and then constructs a hyperplane that effectively separates the observations [61,70,71]. As new observations are provided, they are mapped into the high-dimensional space and assigned a class based on the hyperplane [64]. We built SVM classifiers with three kernels: a linear kernel (SVM-L), a radial basis function kernel (SVM-RBF), and a polynomial kernel (SVM-Poly).
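As an example of such a kernel, the RBF kernel scores similarity as a decaying function of squared distance (the `sigma` parameterization below mirrors kernlab's `exp(-sigma * ||a-b||^2)` form, which caret wraps; treat the exact parameterization as an assumption):

```python
import math

def rbf_kernel(a, b, sigma=0.1):
    # k(a, b) = exp(-sigma * ||a - b||^2): equals 1 when a == b,
    # and decays toward 0 as the points move apart
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sigma * sq_dist)
```

Larger `sigma` values make the kernel more local, producing a more flexible (and more overfitting-prone) decision boundary.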
ML models require optimized hyperparameters to achieve robust performance. Default hyperparameter settings cannot fully optimize ML techniques, and this crucial step requires additional attention [72,73]. The hyperparameters of each model were tuned during the training phase to construct a model that performed relatively well. We adjusted the hyperparameters of each method as specified in Table 5 using a grid search with a 10-fold CV. The best hyperparameters were chosen according to the area under the receiver operating characteristic (ROC) curve (AUC).
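The tuning loop can be sketched as a grid search that scores each candidate value by cross-validated AUC (the study used caret's grid search; this toy version uses a simple KNN scorer on synthetic two-class data, all of which is made up for illustration):

```python
import random

def auc(labels, scores):
    # Mann-Whitney formulation: probability a positive outranks a negative
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return 0.5  # degenerate fold: nothing to rank
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k):
    # Score = fraction of the k nearest training points labelled 1
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return sum(train_y[i] for i in order[:k]) / k

def cv_auc(X, y, k, folds=10, seed=0):
    # Mean AUC over the cross-validation folds for one hyperparameter value
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        test = idx[f::folds]
        tr = [i for i in idx if i not in test]
        scores = [knn_score([X[i] for i in tr], [y[i] for i in tr], X[j], k)
                  for j in test]
        total += auc([y[j] for j in test], scores)
    return total / folds

# Synthetic, well-separated two-class data
rng = random.Random(2)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + \
    [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
y = [0] * 30 + [1] * 30
grid = [1, 3, 5, 7]
best_k = max(grid, key=lambda k: cv_auc(X, y, k))  # grid search by CV AUC
```

Each hyperparameter value is scored only on held-out folds, so the selected value is the one that generalizes best rather than the one that fits the training folds best.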

Evaluation Metrics
The testing dataset was used to evaluate the predictive performance of the ML classification models during the model evaluation phase. Performance comparisons were made using the accuracy, sensitivity, specificity, and AUC metrics. Accuracy refers to the proportion of true positives and true negatives among all cases. Sensitivity (also known as the true positive rate) is the probability of a positive test given that the case is genuinely positive. Specificity (also known as the true negative rate) is the probability of a negative test given that the case is truly negative. The accuracy, sensitivity, and specificity are defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

where TP is a positive prediction for an originally positive case; TN is a negative prediction for an originally negative case; FP is a positive prediction for an originally negative case; and FN is a negative prediction for an originally positive case. This study mainly used the AUC to evaluate the ML classifiers, as the ROC curve provides valuable insight into a classifier's performance across the entire range of possible operating points, helping us understand its strengths and limitations. The AUC is also less affected by class imbalance than metrics like accuracy [74]. The ROC curve captures the trade-off between the true positive rate and the false positive rate for all decision boundaries [55] and offers a straightforward interpretation. The AUC summarizes the ROC curve into a single value between 0 and 1, with 1 representing the best result.
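These definitions translate directly into code (the confusion-matrix counts below are hypothetical, not the study's):

```python
def classification_metrics(tp, tn, fp, fn):
    # Accuracy, sensitivity (true positive rate), and specificity
    # (true negative rate) from the confusion-matrix counts
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical confusion matrix for an 81-sample test set
m = classification_metrics(tp=39, tn=30, fp=6, fn=6)
```

Note how sensitivity depends only on the positive cases and specificity only on the negative ones, which is why the pair is more informative than accuracy alone on imbalanced data.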

Software
All analyses were performed using R statistical software (version 4.2.1). The glmnet package [75] was used to perform the LASSO for feature selection. All ML classification models, including the hyperparameter tuning process, were implemented and tested using the caret package [62].

Feature Selection Results
In this study, the LASSO method was employed for feature selection. We repeated the LASSO 100 times with randomly picked training data to find the most frequently selected features. For each iteration, the best λ of the LASSO was determined by the smallest MSE with a 10-fold CV. Figure 4 shows an example of the tuning process for the best parameter λ for the LASSO.
On the basis of 100 iterations, we found the most frequently selected features for fall risk classification, as presented in Table 6. We chose only the features selected more than 50 times to build the ML classifiers. The initial step-event-relevant features (i.e., aLHM, sLHS, sAHS, aAHS, and vAHS) and double-limb-support-relevant features (i.e., sAMD and aAMD) were mainly selected. We also found that most selected features were lateral and anterior directional features.

Hyperparameter Tuning Results
Using a grid search method with a 10-fold CV, we tuned the hyperparameters for the ML models based on the highest AUCs to achieve the best performance. Figure 5 illustrates examples of the hyperparameter tuning process. The hyperparameters for DT, KNN, and RF were set as cp = 0.0001, k = 5, and mtry = 2, respectively. For the XGB, the hyperparameters were determined as nround = 90, max_depth = 3, and eta = 0.1. For the SVM, we set the hyperparameters for three different kernels: (1) C = 0.1 for SVM-L; (2) C = 1 and sigma = 0.1 for SVM-RBF; and (3) C = 0.561, scale = 0.135, and degree = 2 for SVM-Poly. Table 7 summarizes the best hyperparameters for all of the ML classification models.


Classification Results
The performance results of the fall risk classifications for each ML model are shown in Table 8, with the four metrics (i.e., accuracy, sensitivity, specificity, and AUC) used in this study. The results indicate that the XGB and SVM-Poly had the highest accuracy (0.8519) among all models. The SVM-RBF performed best in specificity (0.8889). The LR performed best in terms of sensitivity (0.8667) and AUC (0.9204). Figure 6 shows the binary classification confusion matrices for the eight models to illustrate how the classifiers predict fall risk. The ROC curves, including the AUCs for all methods, are shown in Figure 7. The LR outperformed the other classifiers for both the training (AUC = 0.9483) and testing (AUC = 0.9204) datasets. Overall, the results show that the LR was the best classification model for identifying fall risk in this study.

Discussion
Many researchers have studied the assessment of fall risk [11][12][13][14], but most examined older adults, and little research has been conducted on fall risk identification for seafarers on a ship. In addition, existing studies related to classifying fall risks have used devices such as radars and cameras [15][16][17][18][19]. However, wearable sensor-based fall risk evaluation studies have been relatively scarce. To the best of our knowledge, this study represents the first use of machine learning approaches to identify fall risks using wearable sensors in the context of a ship's rolling situations. Various ML models (LR, DT, KNN, RF, XGB, SVM-L, SVM-RBF, and SVM-Poly) were applied and evaluated for fall risk classification. The performance of each model was compared based on the evaluation metrics (i.e., accuracy, sensitivity, specificity, and AUC). The results show that the overall accuracies for all ML models were greater than 0.7778, and the XGB and SVM-Poly had the highest accuracies (0.8519) among all models. Regarding the AUC metric, the LR performed best, with the highest AUCs for training (0.9483) and testing (0.9204). These results demonstrate that an ML approach can be applied to identify fall risks with a wearable sensor. The performance of the proposed models in classifying fall risk levels (accuracy: 0.7778~0.8519; sensitivity: 0.7556~0.8667; specificity: 0.7778~0.8889; and AUC: 0.7673~0.9204) outperformed that of a previous study on older adults (accuracy: 0.67~0.70; sensitivity: 0.43~0.53; specificity: 0.77~0.84; and AUC: 0.71~0.72) [13]. The evaluation of eight ML models provides researchers with a comprehensive understanding of the performance and characteristics of various approaches to fall risk classification. By assessing multiple models, researchers gain insight into the strengths, weaknesses, and suitability of each method for their specific datasets and requirements. This approach empowers researchers to make informed decisions about which models to explore further and potentially adopt for their own fall risk classification tasks. This study also contributed to building strong ML classification models with advanced techniques such as SMOTE and tuning of the best hyperparameters using novel frameworks for ML-based fall risk classification.
In addition, this study identified the best features for predicting the risk of falls. The LASSO selected the best features, as shown in Table 6. We found that the initial step-event features (i.e., aLHM, sLHS, sAHS, aAHS, and vAHS) and double limb support features (i.e., sAMD and aAMD) were primarily selected. During walking, the body is mechanically translated, with the center of mass (COM) moving forward, and dynamic balance is recovered by moving the other foot forward to avoid falls [76]. Since the initial contact and double limb support features may be associated with these balance-recovery mechanics, they were most often selected for detecting fall risks. Furthermore, the ship's rolling motion may alter the COM motion and reduce dynamic stability during walking [77]. In a previous study [50], these gait features successfully predicted COM motion, which suggests that they are effective for detecting fall risks and closely related to dynamic stability. We also found that the mediolateral and anterior-posterior directional features were mainly selected, because the rolling motion can affect dynamic stability by shifting the body forward and backward or from side to side [38].
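The LASSO selection step above can be sketched as follows. This is a hedged illustration on synthetic data: the actual inputs would be the standardized gait features (e.g., aLHM, sAMD), and the penalty λ would be tuned by 10-fold cross-validation as in the paper.

```python
# Sketch: feature selection with cross-validated LASSO (hypothetical data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV

X, y = make_classification(n_samples=270, n_features=20, n_informative=5,
                           random_state=0)
X_std = StandardScaler().fit_transform(X)  # LASSO is scale-sensitive

# 10-fold CV selects the penalty lambda (called alpha in scikit-learn).
lasso = LassoCV(cv=10, random_state=0).fit(X_std, y)

# Features whose coefficients are shrunk exactly to zero are discarded;
# the non-zero coefficients mark the selected feature subset.
selected = np.flatnonzero(lasso.coef_)
print("best alpha:", lasso.alpha_)
print("selected feature indices:", selected)
```

Repeating this selection over resampled training sets and counting how often each feature survives would yield the selection frequencies reported in Table 6.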
There are several limitations to this study. First, the sample size was small. Machine learning can produce better models when provided with a larger volume of training data, and it was difficult to confirm that the number of samples was sufficient, because the system used a total of 270 data samples split into 189 training and 81 testing samples. However, we mitigated this limitation through SMOTE oversampling, feature engineering and regularization using LASSO, and 10-fold CV. Second, because the participants were not seafarers, the dataset may not fully capture the walking characteristics of seafarers. While some crew members have extensive experience, there may also be trainees or novice sailors, and on passenger ships there are more passengers on board than crew members. Thus, our proposed model can be used to assess fall risks among inexperienced seafarers or to ensure the safety of on-board passengers. Third, while the actual movement of a ship at sea encompasses six degrees of freedom, including rolling and pitching, the experiment in this study focused solely on the ship's rolling motion, which could have influenced the selected features; we focused on roll motions because they are the primary motion of a ship. In addition, although a vessel can roll more than 20 degrees in rough seas, only up to 20 degrees of rolling was tested in our experiment, because the CAREN system supports a maximum of 20 degrees. Finally, environmental risk factors, such as weather conditions that can affect a ship's movement (e.g., wind, waves, swell, rain, or snow), the type of footwear worn by the seafarers, and the friction of the deck (e.g., floor material or a wet floor), were not considered. Since this study was conducted through simulations, these external risk factors could not be incorporated; moreover, such scenarios could pose a risk of injury to the subjects, so we conducted the experiment with the utmost priority given to participant safety. To address these limitations, future studies should develop a robust fall risk evaluation model through experiments conducted on an actual ship and involving more experienced seafarers.

Conclusions
The objective of this study was to assess whether a wearable sensor could detect fall risk under simulations of a ship's rolling movement. This study demonstrated that the proposed ML models effectively classified high- and low-fall-risk settings. Using LASSO, we also investigated the best feature set for fall risk classification; the results showed that the mediolateral and anterior-posterior directional features influence the identification of fall risks under the ship's rolling conditions. Through this study, we developed a model that reliably detects seafarers' fall risks and opened the possibility of developing an effective monitoring system for protecting a ship's crew and passengers from falls and MOB accidents. Future research should consider reducing the computational time needed to develop a real-time fall prediction or detection system in a natural marine environment.

Figure 1. Framework for the fall risk classification model based on machine learning.

Figure 2. Experimental settings: (a) location of the reflective markers and accelerometers; (b) example of the rolling simulations using the CAREN.

Figure 4. Example of the best parameter λ tuning process: cross-validation plot for LASSO.

Figure 7. AUC values of the eight classification models: (a) training data; (b) testing data. The LR model performed best on both the training (AUC = 0.9483) and testing (AUC = 0.9204) data.

Table 1. Summary of the participants' demographics.

Table 2. Distribution of the training and test datasets for classification.

Table 3. The results of the independent samples t-test between low and high risks for each variable (*** p < 0.001).

Table 5. List of hyperparameters for each model.

Table 6. The most selected features by LASSO. The cut-off value for frequency was 50.

Table 7. Best hyperparameters for each model.

Table 8. Results of the performance in classification for each model.