Simulated Fall Detection Using a Semi-Supervised Machine Learning Method

Arcilla, Julius John C.; Palaruan, Ildreen D.; Padilla, Dionis A.

doi:10.3390/engproc2026134082

by Julius John C. Arcilla, Ildreen D. Palaruan and Dionis A. Padilla *

School of Electrical, Electronics, and Computer Engineering, Mapua University, Manila 1002, Philippines

* Author to whom correspondence should be addressed.

Presented at the 7th Eurasia Conference on IoT, Communication and Engineering 2025 (ECICE 2025), Yunlin, Taiwan, 14–16 November 2025.

Eng. Proc. 2026, 134(1), 82; https://doi.org/10.3390/engproc2026134082

Published: 24 April 2026

(This article belongs to the Proceedings of The 7th Eurasia Conference on IoT, Communication and Engineering 2025 (ECICE 2025))


Abstract

This study develops a multimodal strategy for fall detection within the broader domain of human activity recognition. A fine-tuned Inflated 3D Convolutional Network model, trained on optical flow data derived from video inputs, achieves 92.70% accuracy in classifying fall-related events. In parallel, a Convolutional Neural Network–Bidirectional Long Short-Term Memory model with attention mechanisms processes time-series sensor data, and the two models together reach an ensemble accuracy of 97.87%. The integration of visual and sensor modalities illustrates a promising direction for developing reliable, real-time fall detection systems applicable in healthcare and assisted living environments.

Keywords:

fall detection; human activity recognition; multimodality; optical flow; sensor data

1. Introduction

Falls are a critical public health concern, causing 42,114 deaths in the US in 2020, 86% of them among adults aged 65 and older [1]. One in three elderly individuals experiences at least one fall annually [2], while 17.7% of older Filipinos report fall-related injuries and reduced grip strength [3]. Traditional fall detection relies on clinical assessments such as the Timed Up and Go (TUG) test and the Berg Balance Scale [4,5], whereas recent studies increasingly employ AI-based systems that use wearables (inertial measurement units (IMUs) and accelerometers) or non-wearables (cameras and Kinect) with algorithms such as Support Vector Machine, Artificial Neural Network, and k-Nearest Neighbors [6]. One IMU-based Bidirectional Long Short-Term Memory (BiLSTM) model achieved 99.99% accuracy [7].

However, existing systems rarely combine wearable sensors with vision-based approaches, limiting contextual awareness and potentially increasing false alarms. This research addresses this gap by developing a multi-device fall detection prototype that integrates a wearable IMU and a camera system, applies transfer learning to adapt an Inflated 3D Convolutional Network (I3D) model for fall detection, and evaluates ensemble accuracy against individual device performance using confusion matrices.

The system uses a Raspberry Pi paired with a camera module for visual classification, similar to embedded systems used for activity recognition [8]. The I3D architecture processes temporal information from preprocessed video frames, while a BiLSTM-based model handles the IMU sensor data. This sensor-camera fusion approach improves detection accuracy and reduces false positives by drawing on two complementary data streams.

2. Methodology

Figure 1 shows the block diagram of the system, in which a Raspberry Pi camera is connected to the Raspberry Pi via a ribbon cable, while the IMU device is connected directly to the laptop via Bluetooth. The Raspberry Pi is connected to the laptop via an Ethernet link.

Figure 2 illustrates the system workflow, which consists of three steps in the fall detection logic. In the first step, input data are collected from the IMU accelerometer and gyroscope, recorded along the x, y, and z axes, and stored in .csv format together with camera recordings stored in .mp4 format. In the second step, data preparation is performed, wherein optical flow features are extracted from the grayscale .mp4 files. Both sensor and video data are then normalized and converted into tensors. In the final step, these tensors are processed by the prediction algorithm, which generates the fall detection output and issues an application notification if a fall event is identified.
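The normalization step for the sensor stream can be illustrated with a minimal sketch. The source does not state which normalization was used, so per-channel z-score scaling is assumed here, and the function name `zscore_channels` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_channels(samples):
    """Scale each IMU channel to zero mean and unit standard deviation.

    `samples` is a list of rows, one per timestep, each holding the six
    channels (ax, ay, az, gx, gy, gz). Z-score scaling is an assumption;
    the paper only states that the data were normalized.
    """
    columns = list(zip(*samples))  # transpose rows into per-channel tuples
    normalized_columns = []
    for column in columns:
        mu = mean(column)
        sigma = pstdev(column) or 1.0  # constant channels would divide by zero
        normalized_columns.append([(value - mu) / sigma for value in column])
    return [list(row) for row in zip(*normalized_columns)]  # back to rows


# Hypothetical three-timestep recording, for illustration only.
recording = [
    [0.0, 1.0, 9.8, 0.1, 0.0, 0.0],
    [1.0, 1.0, 9.6, 0.2, 0.0, 0.0],
    [2.0, 1.0, 9.7, 0.3, 0.0, 0.0],
]
normalized = zscore_channels(recording)
```

The nested list returned here has the shape (timesteps, channels) and could then be converted into a tensor by whichever deep learning framework is in use.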

2.1. Dataset

The dataset comprises 420 IMU sensor samples and 417 camera video samples collected from simulated fall scenarios. Data were gathered from 2 subjects aged 23–24 years performing falls (falling from the edge of the bed, sliding, falling when changing position, falling while sitting on the bed, and falling while rolling over) as well as daily activities (sitting at the edge, sitting in the center, lying down, and rolling over). Near-fall samples were excluded to focus on binary classification between fall and non-fall events, leaving 280 sensor samples and 278 camera samples. Camera recordings were captured at 15 Hz with 3280 × 2464 resolution and resized to 112 × 112 to fit the I3D input requirements. IMU data were sampled at 20 Hz, recording 6-axis motion data (3-axis accelerometer and 3-axis gyroscope). Video augmentation techniques were applied to increase the number of camera samples from 278 to 1215.
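The sample counts above imply how many samples were set aside and how strongly the video data were expanded. The short calculation below uses only the figures reported in this section; treating the difference between the full and binary sets as near-fall samples follows the exclusion described above.

```python
# Counts taken from the dataset description.
sensor_total, sensor_binary = 420, 280
camera_total, camera_binary = 417, 278
camera_augmented = 1215

sensor_excluded = sensor_total - sensor_binary  # samples outside the binary subset
camera_excluded = camera_total - camera_binary
augmentation_factor = camera_augmented / camera_binary

print(sensor_excluded, camera_excluded, round(augmentation_factor, 2))
```

The result is 140 excluded sensor samples, 139 excluded camera samples, and an expansion of roughly 4.37 times for the camera set.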

2.2. Training

2.2.1. Camera Model (I3D)

The I3D model was initialized with weights pretrained on the Kinetics-400 action recognition dataset and fine-tuned on optical flow data. Videos were processed into 32-frame clips at 112 × 112 pixel resolution. The model was trained using the Adam optimizer with a learning rate of 0.00005 for 40 epochs with a batch size of 16. To prevent the model from memorizing the training data too quickly, a dropout rate of 0.7 was applied, and the early convolutional layers were frozen, allowing only the deeper layers to adapt to fall detection. MixUp augmentation, which blends pairs of training samples, was used to improve the model’s ability to generalize to new scenarios. Training utilized mixed-precision computation to reduce memory usage and accelerate processing.
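MixUp can be illustrated with a minimal, framework-free sketch that blends one pair of samples and their one-hot labels. The Beta parameter `alpha=0.4`, the function name `mixup_pair`, and the toy feature vectors are assumptions for illustration; the source does not report the MixUp settings that were used.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with a shared weight.

    The weight is drawn from Beta(alpha, alpha); alpha=0.4 is an assumed
    value, not one reported in the paper.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mixed, y_mixed, lam


# Toy flattened "clips" with labels ordered as [not fall, fall].
rng = random.Random(0)
fall_clip, fall_label = [0.9, 0.8, 0.7], [0.0, 1.0]
other_clip, other_label = [0.1, 0.2, 0.3], [1.0, 0.0]
x_mixed, y_mixed, lam = mixup_pair(fall_clip, fall_label, other_clip, other_label, rng=rng)
```

The mixed label stays a valid probability distribution, so the model is trained toward soft targets rather than hard 0/1 labels.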

2.2.2. Sensor Model (Convolutional Neural Network-BiLSTM with Attention)

The sensor model processes 200-timestep windows of 6-channel IMU data (3-axis accelerometer and 3-axis gyroscope). The architecture combines convolutional layers for feature extraction, a 3-layer bidirectional LSTM (hidden size: 192) for temporal pattern recognition, and attention mechanisms to emphasize critical time periods. To address class imbalance, weighted sampling and data augmentation techniques, including scaling, rotation, noise injection, and time warping, were applied. Training used focal loss with label smoothing and the AdamW optimizer (learning rate: 0.002, batch size: 64) for 100 epochs with early stopping (patience: 25 epochs). MixUp augmentation was applied with decreasing probability as training progressed.
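One way focal loss and label smoothing can be combined for a binary fall/not-fall output is sketched below. The values `gamma=2.0` and `smoothing=0.1`, the function name, and this particular way of combining the two techniques are assumptions; the source does not give the loss hyperparameters or its exact formulation.

```python
import math


def focal_loss_smoothed(p_fall, label, gamma=2.0, smoothing=0.1):
    """Binary focal loss computed against a smoothed target.

    p_fall: predicted probability of the fall class.
    label:  1 for fall, 0 for not fall.
    gamma and smoothing are assumed values, not taken from the paper.
    """
    p = min(max(p_fall, 1e-7), 1.0 - 1e-7)  # clamp to keep log() finite
    target = label * (1.0 - smoothing) + 0.5 * smoothing  # soften the 0/1 label
    positive_term = target * (1.0 - p) ** gamma * math.log(p)
    negative_term = (1.0 - target) * p ** gamma * math.log(1.0 - p)
    return -(positive_term + negative_term)


easy = focal_loss_smoothed(0.95, 1)  # confident, correct prediction
hard = focal_loss_smoothed(0.30, 1)  # fall predicted with low probability
```

The modulating factor down-weights well-classified samples, so `hard` contributes more to the loss than `easy`. With `gamma=0` and `smoothing=0` the expression reduces to ordinary binary cross-entropy.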

2.2.3. Ensemble Strategy

The system employs a two-stage verification approach: the sensor model continuously monitors IMU data and triggers an alert when detecting a potential fall. Upon triggering, the camera module activates to record the event, and the I3D model analyzes the footage to verify whether the event is truly a fall. This sequential pipeline reduces computational load by avoiding continuous video processing while maintaining high accuracy through camera-based verification of sensor-triggered events.
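The sequential logic described above can be sketched as a small decision function. The 0.5 thresholds, the function name `two_stage_decision`, and the stub camera model are hypothetical; the source does not specify decision thresholds or the interface between the two models.

```python
def two_stage_decision(sensor_prob, verify_with_camera,
                       sensor_threshold=0.5, camera_threshold=0.5):
    """Return (is_fall, camera_used) for one sensor window.

    sensor_prob:        fall probability from the sensor model.
    verify_with_camera: zero-argument callable that records a clip and
                        returns the camera model's fall probability.
    Both thresholds are assumed values for illustration.
    """
    if sensor_prob < sensor_threshold:
        return False, False  # no trigger, so the camera stays idle
    camera_prob = verify_with_camera()  # only runs after a sensor trigger
    return camera_prob >= camera_threshold, True


camera_calls = []


def stub_camera_model():
    """Stand-in for recording a clip and running I3D inference."""
    camera_calls.append("clip")
    return 0.9


quiet = two_stage_decision(0.2, stub_camera_model)
alert = two_stage_decision(0.8, stub_camera_model)
```

Passing the camera step as a callable keeps video capture and inference out of the normal monitoring loop, which mirrors the reduction in computational load described above.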

2.3. System Components

Figure 3 shows the system components: the Raspberry Pi Camera connects via a ribbon cable to the Raspberry Pi 5 (manufactured by Sony UK Technology Centre, Wales, UK), which serves as an Ethernet-linked server to the laptop client. The IMU device streams data via Bluetooth to the laptop, which handles the graphical user interface (GUI) and classification logic.

2.4. Experimental Setup

Figure 4 shows the experimental setup for the developed system. The Raspberry Pi with its camera was mounted on the ceiling directly above the bed, positioned to capture the full bed area. The subject wore the IMU device on the left wrist, while the laptop client was positioned nearby. Figure 5 shows the GUI: the left panel displays the camera preview, while the right panel shows the IMU status and tabs for notifications, predictions, and logs.

3. Results and Discussion

Statistical Treatment

Figure 6 shows the normalized confusion matrix for the sensor model; normalization was used so that per-class rates remain comparable despite the class imbalance. The sensor model achieved 83.86% accuracy, 84.75% specificity, 94.77% precision, 83.59% recall, and 88.80% F1 score. The high precision indicates that few of the detected falls were false alarms, at the cost of lower recall and overall accuracy. The metrics were calculated using the following equations.

Accuracy = (TP + TN)/(TP + FP + TN + FN)    (1)

Precision = TP/(TP + FP)    (2)

Recall = TP/(TP + FN)    (3)

Specificity = TN/(TN + FP)    (4)

F1 Score = 2·(Precision·Recall)/(Precision + Recall)    (5)

In these equations:
- Accuracy (Equation (1)) measures the overall correctness of the model by calculating the proportion of correct predictions (both true positives and true negatives) out of all predictions made.
- Precision (Equation (2)) measures the proportion of positive predictions that were actually correct.
- Recall (Equation (3)), also called sensitivity or true positive rate, measures the proportion of actual positive cases that were correctly identified by the model.
- Specificity (Equation (4)) measures the proportion of actual negative cases that were correctly identified as negative.
- F1 Score (Equation (5)) is the harmonic mean of precision and recall, providing a single metric that balances both measures.

Terms:
- TP (True Positive): Cases correctly predicted as positive.
- TN (True Negative): Cases correctly predicted as negative.
- FP (False Positive): Cases incorrectly predicted as positive (Type I error).
- FN (False Negative): Cases incorrectly predicted as negative (Type II error).
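Equations (1) to (5) translate directly into code. The sketch below applies them to a hypothetical set of confusion-matrix counts; the counts are invented for illustration and are not the counts behind Figures 6 to 8.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (1) to (5) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper's confusion matrices.
metrics = classification_metrics(tp=90, tn=40, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With a fall treated as the positive class, a false positive corresponds to a false alarm and a false negative to a missed fall, which is why specificity and recall are reported alongside accuracy.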

Figure 7 shows the normalized confusion matrix for the camera model. The camera model achieved 92.70% accuracy, 90.91% specificity, 95.65% precision, 93.62% recall, and 94.62% F1 score, outperforming the sensor model on every reported metric.

Figure 8 shows the confusion matrix for the ensemble, which achieved an overall accuracy of 97.87%, a specificity of 97.06%, a precision of 97.87%, a recall of 99.02%, and an F1 score of 98.44%. These scores indicate that the sensor and camera models complement each other, raising every metric above that of either model alone.

Table 1 shows the summary of the performance of each model and the ensemble performance.

4. Conclusions

We developed a multimodal ensemble system achieving 97.87% accuracy, outperforming the individual sensor (83.86%) and camera (92.70%) models while reducing false alarms. However, the limited dataset (280 sensor samples and 1215 video samples, including augmentation) and the controlled environment necessitate validation with larger, more diverse datasets across varied settings. Future work should also miniaturize the wearable sensor for improved user comfort.

Author Contributions

Software Development, J.J.C.A.; Paper Development, J.J.C.A. and I.D.P.; Hardware Development, I.D.P.; Topic Development, D.A.P.; Paper Copyediting and Proofreading, D.A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in https://kaggle.com/datasets/b97b8ec10e91ff13658f56ef3a5d1b8ef4e09c014243e3df2c6cc0b0fae5e452 (accessed on 1 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Santos-Lozada, A.R. Trends in Deaths From Falls Among Adults Aged 65 Years or Older in the US, 1999–2020. JAMA 2023, 329, 1605.
2. Ang, G.; Low, S.; How, C. Approach to falls among the elderly in the community. Singap. Med. J. 2020, 61, 116–121.
3. Mgabhi, P.S.; Chen, T.-Y.; Cruz, G.; Vu, N.C.; Saito, Y. Falls among community-dwelling older adults in the Philippines and Viet Nam: Results from nationally representative samples. Injury 2024, 55, 111336.
4. Gafner, S.C.; Allet, L.; Hilfiker, R.; Bastiaenen, C.H.G. Reliability and Diagnostic Accuracy of Commonly Used Performance Tests Relative to Fall History in Older Persons: A Systematic Review. Clin. Interv. Aging 2021, 16, 1591–1616.
5. Appeadu, M.K.; Bordoni, B. Falls and Fall Prevention in Older Adults. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2025. Available online: https://www.ncbi.nlm.nih.gov/books/NBK560761/ (accessed on 4 June 2023).
6. Usmani, S.; Saboor, A.; Haris, M.; Khan, M.A.; Park, H. Latest Research Trends in Fall Detection and Prevention Using Machine Learning: A Systematic Review. Sensors 2021, 21, 5134.
7. Mubibya, G.S.; Almhana, J.; Liu, Z. Efficient Fall Detection using Bidirectional Long Short-Term Memory. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 983–988.
8. Caya, M.V.C.; Yumang, A.N.; Arai, J.V.; Ninofranco, J.D.A.; Yap, K.A.S. Human Activity Recognition Based on Accelerometer Vibrations Using Artificial Neural Network. In Proceedings of the 2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Laoag, Philippines, 29 November–1 December 2019; pp. 1–5.

Figure 1. Block diagram of the system.

Figure 2. System workflow of fall classification using BiLSTM and I3D models.

Figure 3. Hardware Setup.

Figure 4. Experimental setup. (A) Bed. (B) Raspberry Pi Camera. (C) IMU Device. (D) Monitor.

Figure 5. GUI setup. (A) Bed. (B) IMU Device. (C) IMU Status Indicator. (D) Notification Tab. (E) Prediction Tab. (F) Logs Tab.

Figure 6. Confusion matrix for the sensor model.

Figure 7. Confusion matrix for the camera model.

Figure 8. Confusion matrix for ensemble performance.

Table 1. Model performance summary.

Model	Accuracy	Specificity	Precision	Recall	F1 Score
Sensor Model	83.86%	84.75%	94.77%	83.59%	88.80%
Camera Model	92.70%	90.91%	95.65%	93.62%	94.62%
Ensemble Performance	97.87%	97.06%	97.87%	99.02%	98.44%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
