1. Introduction
Work environments such as construction sites present significant safety challenges related to hazards such as falls, scaffolding accidents, heavy machinery operations, and the handling of hazardous materials [1]. Traditional safety approaches rely heavily on human observation and intervention, which are often inefficient, time-consuming, and prone to errors. There remains a need for a cohesive, real-time safety alerting system that can operate reliably in dynamic environments while maintaining low latency and high localization accuracy. Various technologies have been explored to ensure workers’ safety within designated safe zones. Wearable devices, drones, Internet of Things (IoT) devices, and GPS technology combined with geofencing have been proposed [2], which can improve construction site safety by monitoring conditions, providing real-time alerts, and ensuring workers stay in safe zones.
A comprehensive review provided in [3] underscores the critical need for effective hazard management and accident prevention strategies. Recent advances include the adoption of sensor-based systems, which have gained considerable attention for their potential to enhance construction site safety. Improved safety management has multifaceted impacts, including increased worker productivity, adherence to project timelines, reduced insurance costs, and a general improvement of the company’s reputation [1]. The integration of multi-method modeling approaches into safety risk management, particularly those incorporating machine learning, has shown notable effectiveness in dynamically improving safety performance [4]. Recent research emphasizes the importance of implementing Smart Personal Protective Equipment (PPE) and using machine learning to predict injury types and develop effective safety controls [5,6].
The robustness and adaptability of IoT systems, coupled with advanced machine learning techniques, highlight their potential for versatile applications in different domains, as evidenced by their successful implementation in various industrial settings [7,8,9].
Effective zoning practices play a crucial role in safety management [1]. Technologies such as machine learning and indoor positioning systems (IPS) play an essential role in strengthening safety practices, providing dynamic risk assessment, and enabling real-time monitoring of worker compliance with safety zones [4]. For example, the use of Gaussian process regression in localization [10] exemplifies the application of machine learning techniques in this domain.
This paper presents a proof-of-concept system that integrates Bluetooth Low Energy (BLE)-based indoor localization with real-time tracking, dynamic safety zoning, and machine learning-based hazard detection. The system architecture comprises a mobile application for user interaction, a server-side inference engine for real-time data processing, and a web-based dashboard for centralized monitoring and visualization. To assess the system’s performance, we carried out extensive testing in a controlled laboratory environment designed as a preliminary step toward deployment in real-world settings such as construction sites.
1.1. Related Works
Indoor localization techniques that leverage WiFi, Bluetooth, and RFID have been developed to overcome the limitations of global positioning systems (GPS) in indoor environments [11]. Although GPS is a foundational technology for outdoor localization, its performance decreases significantly indoors due to signal attenuation caused by building materials, resulting in considerable positioning inaccuracies [12]. A comparative study on fingerprinting and range-based methods using the Received Signal Strength Indicator (RSSI) reveals the complexities of wireless indoor localization [13]. This research highlights the challenges associated with various approaches, emphasizing the need for advanced techniques to improve accuracy in indoor positioning. WiFi positioning leverages signals from multiple access points and can be further enhanced by machine learning techniques to improve accuracy [14]. The work presented in [15] introduced a hybrid indoor positioning system that uses a single WiFi Fine Timing Measurement (FTM) access point. Bluetooth-based positioning, particularly Bluetooth Low Energy (BLE), strikes a balance between accuracy and power efficiency, making it suitable for diverse indoor environments [16]. Furthermore, RFID technology offers high positioning accuracy through integrated systems [12].
Combining different localization methods can significantly improve accuracy and reliability. For example, combining trilateration with dead reckoning and employing algorithms such as Kalman filtering can enhance positioning precision [17]. In [18], a comprehensive comparison of various indoor positioning technologies in terms of accuracy, implementation cost, and power consumption was made. The findings indicate that BLE, pedestrian dead reckoning (PDR), and WiFi are relatively low-cost options (with BLE being the least expensive), offering accuracy in the range of 1–5 m. In contrast, Ultra-Wideband (UWB) and RFID systems are classified as medium- to high-cost solutions, achieving accuracies below 2 m.
The advancements in Bluetooth Low Energy (BLE) technology, particularly the introduction of Angle of Arrival (AoA) and Angle of Departure (AoD) capabilities in BLE 5.1 and later versions, represent a significant leap forward in localization accuracy. Using in-phase and quadrature (I/Q) samples, these technologies have demonstrated a marked improvement in positioning precision [19,20]. Recent studies have investigated the effectiveness of combining AoA and AoD with other localization methods, demonstrating improved tracking accuracy in dynamic environments [21,22,23].
In examining the current landscape of BLE-based localization systems, several studies have demonstrated diverse approaches and their effectiveness in various environments. Carrasco et al. [24] present a case study focused on identifying the nearest machine to the user rather than determining the user’s exact location. Their setup involved four BLE beacons distributed among eight machines, with localization restricted to specific areas near each machine. Utilizing techniques such as fingerprinting, nearest neighbor, and Bayesian inference, they achieved an average detection accuracy of 92%. Similarly, Zhao et al. [25] aimed to enhance operator safety through real-time detection of motionless behavior and precise localization within a defined 3 m × 3 m cell. Their approach involved nine anchors within a smaller section of the warehouse (54 m × 12 m). Utilizing trilateration, fingerprinting, and a self-learning genetic algorithm, they achieved localization accuracies of 78%, 91%, and 95%, respectively. Further comparisons can be drawn with Wang et al. [26], who implemented an RSSI-based indoor localization system across a 1000 m² area using fingerprinting techniques. Their approach proved less susceptible to obstacles compared with traditional propagation models and multilateration methods. Using the k-nearest neighbor algorithm, they achieved an average localization error of 1.83 m, with beacons spaced approximately 5 m apart. In a different context, Bloch and Pastell [27] investigated the localization of dairy cows within a 10 m × 40 m barn using ten anchors. Their study utilized RSSI measurements for both localization and tracking to enhance farm management, resulting in an average accuracy of 3.27 m.
The implementation of advanced localization techniques, particularly those leveraging deep learning and robust filtering methods, shows great promise in enhancing the precision and security of indoor positioning systems. For example, the application of a robust Kalman filter for position estimation under cyber attacks [28] exemplifies how robust statistical methods can safeguard Automated Guided Vehicles (AGVs) from cyber threats. Similarly, the use of deep convolutional neural networks for indoor positioning demonstrates the potential of deep learning techniques to effectively manage the instability of signal strengths in challenging indoor environments [28]. These innovations signify a substantial shift toward more secure and reliable localization techniques, which are essential for ensuring safety in environments vulnerable to both physical and cyber disruptions.
The above studies illustrate the varying degrees of success and the distinct challenges encountered in different settings, ranging from industrial workshops to expansive warehouses and agricultural environments. The integration of diverse technologies, such as the Internet of Things (IoT) and wearable devices, with varying communication protocols and data formats, poses significant challenges related to architectural frameworks, security, privacy, and interoperability [29]. Managing large volumes of heterogeneous data from multiple sources requires robust storage and processing capabilities. Security and privacy concerns require the implementation of stringent protective measures, such as anonymization techniques to protect user identities, encrypted storage, and rigorous access control mechanisms. Using Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols can ensure confidentiality and integrity during transmission (e.g., [30]). Additionally, the growing number of connected devices requires more stringent maintenance and updates to ensure ongoing compatibility and functionality [31].
1.2. Summary of Contributions and Scope of This Study
In this paper, we investigate and compare the effectiveness of various approaches to indoor localization, focusing on their applicability in contexts such as construction sites. The contributions of this study are as follows:
1. The performance of several machine learning models was evaluated for both localization and hazard detection tasks. Random Forest, RBF-SVM, and Decision Trees demonstrated high accuracy and robust performance, while Neural Networks performed less effectively due to limitations imposed by categorical input constraints.
2. A modular server-side platform was developed using FIND and BLE integration, enabling efficient handling of localization data and zoning-based safety management.
The rest of this paper is organized as follows: Section 2 discusses the intricacies of indoor localization using fingerprinting, laying the foundation for understanding the system’s core technology. Section 3 presents the architecture and machine learning algorithms. Section 4 presents details regarding implementation and testing, highlighting the real-time processing and data collection techniques. Experimental results, including the efficacy of the system, are examined in Section 5. The paper concludes with Section 6, summarizing the study’s key findings and prospective future developments and enhancements.
3. Results
3.1. Data Collection and Ground Truth Validation
In this study, we conducted a series of data collection sessions that resulted in a dataset consisting of 562 training data points and 222 test data points. The primary data collection took place in an indoor environment measuring 5 m × 2 m. To evaluate the system’s consistency and robustness, the same setup was replicated in a second indoor location. For testing, we utilized K5P BLE beacons, which are waterproof (IP67-rated) and compact in size (40 mm × 15 mm) (www.kkmcn.com/long-range-beacon-k5p, accessed on 30 August 2025). Ten beacons were deployed, each positioned 1.2 m apart around the perimeter of the testing area for optimal coverage (see Figure 8).
For precise ground truth data collection, we employed an Android device, which was manually navigated through predefined reference points within the test environment. Each reference point corresponded to a 1.2 m × 1.2 m cell of a square grid. The Android device was systematically moved between these reference points, with RSSI data recorded alongside the corresponding coordinates. This method enabled the creation of a dataset that accurately reflected the physical movements within the space. The collected data points were labeled according to their specific reference positions, providing ground truth for validating the system’s localization accuracy. The setup further included supplementary BLE beacons attached to various pieces of equipment within the testing environment. This allowed us to detect the proximity of an individual to specific equipment, helping to verify whether a person was wearing or interacting with the equipment, further enhancing the system’s tracking and localization capabilities.
BLE beacons facilitate precise tracking on the construction site. Ten beacons were strategically placed in a 5 m × 2 m room to ensure comprehensive coverage and accurate location triangulation, considering factors like coverage, obstructions, and signal redundancy. Key characteristics of the beacons include waterproofness (IP67) for durability in diverse conditions, compact size (40 mm (diameter) × 15 mm) for unobtrusive placement, long battery life (up to 4 years), compatibility with BLE 5.0, and a range of 300 m. Details of the size of the beacon and its transmission range are illustrated in Figure 2 and Figure 3, respectively.
3.2. Android Application
The Android application operates in two distinct modes. In Learning Mode, it collects Received Signal Strength Indicator (RSSI) values at various locations to facilitate offline fingerprinting. This initial phase establishes a reference database of signal strengths associated with different points in the environment. Once the fingerprinting process is complete, the application transitions to Tracking Mode, where it continuously monitors and tracks worker locations in real time, ensuring effective and accurate localization throughout the workspace. Initially, the FIND framework was primarily WiFi-centric. Challenges in development, such as adaptability to Android updates, led to a shift to Bluetooth signals. As we transitioned to BLE, we confronted the challenge of an outdated framework codebase that was not compliant with the latest Android standards. To address this, we updated the application’s code to ensure compatibility with modern Android versions while focusing on BLE signals for localization. Customizations, including RSSI cutoff values and safety notifications, were specifically tailored to meet the requirements of construction sites.
The application’s core functionality revolves around Bluetooth fingerprinting. A LinkedHashMap maintains Bluetooth results, averaging new RSSI values or adding new beacons as needed. In the Learning Mode, data is sent to the server with every new RSSI value, while in Tracking Mode, data transmission occurs every four new values, balancing real-time tracking with server load management.
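The buffering behaviour described above can be sketched as follows. This is a minimal Python illustration, not the application's Android code: `OrderedDict` stands in for Java's `LinkedHashMap`, the class and function names are hypothetical, the simple two-value mean is an assumed reading of "averaging", and the ten-entry capacity is taken from the description of the buffer later in the paper.

```python
from collections import OrderedDict

CAPACITY = 10    # ten most recent beacons, per the paper's buffer description
SEND_EVERY = 4   # Tracking Mode transmits after every four new values


class BeaconBuffer:
    """Insertion-ordered map of beacon id -> smoothed RSSI in dBm (hypothetical name)."""

    def __init__(self, capacity=CAPACITY):
        self.capacity = capacity
        self.readings = OrderedDict()

    def update(self, beacon_id, rssi):
        if beacon_id in self.readings:
            # Known beacon: average the stored value with the new one (assumed rule).
            self.readings[beacon_id] = (self.readings[beacon_id] + rssi) / 2.0
        else:
            if len(self.readings) >= self.capacity:
                # Buffer full: drop the oldest entry to make room.
                self.readings.popitem(last=False)
            self.readings[beacon_id] = float(rssi)


def should_send(mode, new_values_since_last_send):
    """Learning Mode sends on every new value; Tracking Mode on every fourth."""
    if mode == "learning":
        return new_values_since_last_send >= 1
    return new_values_since_last_send >= SEND_EVERY
```

Averaging on update damps single-sample spikes, while the lower send rate in Tracking Mode trades a little latency for reduced server load.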
In the following, the system architecture is presented, emphasizing the BLE device integration and specialized Android application.
3.3. Server-Side Components
The server-side architecture is based on an augmented FIND framework, which can be programmed to incorporate key enhancements for construction site applications. It includes a dynamic mapping feature to visualize worker locations on floor plans and robust data management for historical tracking and safety audits.
The chosen framework consists of a website template with functionalities including managing site names, construction sites, workers, and reference point nomenclature. It can process API calls for executing Python code, which is crucial for customizing machine learning algorithms for zoning-based safety management. Integration of floor plans as dynamic maps enables visual representation of workers’ locations in real-time. This feature updates worker positions on the floor plan every 30 s, providing near real-time tracking for enhanced situational awareness.
The key features of website and data management include data logging for comprehensive historical analysis and routine safety audits. The website was designed to handle various functions essential for site management, providing a complete record of workers’ movements.
3.4. Zone Division and Safety Requirements
Zoning within a construction site is a critical safety strategy. Different zones, color-coded for clarity, dictate specific safety protocols and equipment requirements. Zones are categorized as follows: (i) Common Working Zone (Green Zone), which is accessible to all workers with minimal risk; (ii) Controlled Working at Height Zone (Purple Zone), which requires PPE and is marked by anchor points for safety gear; and (iii) Danger Working at Height Zone (Red Zone), which covers high-risk areas requiring strict safety measures. In addition, personal protective equipment (PPE) beacons are attached to safety gear to ensure compliance with zone-specific safety requirements.
Figure 4 illustrates the zoning concept within a typical construction site.
3.5. Real-Time Monitoring and Compliance and Alarm System
The system’s real-time monitoring capability assesses compliance with safety protocols based on workers’ locations and associated zone requirements. It links real-time locations of workers to safety zone requirements, triggering alerts for any safety protocol breach. Alerts are delivered to workers via their mobile devices and to supervisors through the website dashboard for immediate action on safety breaches. An example of a visual notification on the Android device, indicating a worker is in a safe zone, is illustrated in Figure 5. The proposed architecture thus integrates tracking, zoning, and compliance and can therefore enhance the safety and efficiency of operations.
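The link between a worker's zone and the required equipment can be illustrated with a small sketch. The zone names follow the color coding above, but the specific PPE items per zone, the function names, and the alert format are hypothetical placeholders rather than the system's actual configuration.

```python
# Hypothetical zone table: zone -> PPE items whose beacons must be detected
# near the worker. The item names are illustrative assumptions.
ZONE_REQUIREMENTS = {
    "green": set(),
    "purple": {"harness"},
    "red": {"harness", "helmet"},
}


def check_compliance(zone, detected_ppe):
    """Return the sorted list of PPE items missing for the worker's zone."""
    required = ZONE_REQUIREMENTS.get(zone)
    if required is None:
        # Unknown zone: report it rather than silently passing.
        raise ValueError(f"unknown zone: {zone!r}")
    return sorted(required - set(detected_ppe))


def build_alert(worker_id, zone, detected_ppe):
    """Return an alert record, or None when the worker is compliant."""
    missing = check_compliance(zone, detected_ppe)
    if not missing:
        return None
    return {"worker": worker_id, "zone": zone, "missing_ppe": missing}
```

In such a design the same record could be pushed to the worker's device and to the supervisor dashboard, so both see an identical description of the breach.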
3.6. Machine Learning Algorithms for Hazard Detection
In this research, we have adapted the FIND framework for our server-side application. This enhancement specifically focuses on managing the real-time data flow from Bluetooth Low Energy (BLE) devices. By integrating advanced machine learning techniques, the server-side framework is now capable of processing incoming data more efficiently and providing precise location predictions. This adaptation not only leverages the strengths of the FIND framework but also tailors it to suit the specific needs of real-time indoor localization in construction site environments.
In the machine learning structure, we utilize two key components: the Naive Bayes method and an AI endpoint. The AI endpoint is a crucial element that operates through an API call. This endpoint houses machine-learning models that process sensor data for location estimation.
However, to ensure system reliability, especially in instances where the API call to the AI endpoint might not respond as expected, we employ the Naive Bayes algorithm as a fallback mechanism. This algorithm, based on Bayes’ theorem [37], provides a backup method for location prediction, safeguarding against potential interruptions in data processing from the AI endpoint. Such redundancy is vital in maintaining consistent tracking accuracy within the dynamic and unpredictable environment of a construction site.
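The fallback pattern can be sketched in a few lines. Both predictors are passed in as plain callables; the names `ai_endpoint` and `naive_bayes` are placeholders for illustration and do not correspond to functions in the FIND codebase.

```python
def predict_location(rssi, ai_endpoint, naive_bayes):
    """Try the AI endpoint first; fall back to Naive Bayes on any failure.

    Both arguments are hypothetical callables that map an RSSI dictionary
    to a location label. Returns (location, source).
    """
    try:
        location = ai_endpoint(rssi)
        if location is not None:
            return location, "ai_endpoint"
    except Exception:
        # Endpoint unreachable, timed out, or returned an error: use the fallback.
        pass
    return naive_bayes(rssi), "naive_bayes"
```

Returning the source alongside the location makes it easy to log how often the fallback path is taken.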
Extending the traditional Naive Bayes algorithm, the system incorporates several optimizations as follows:
Dynamic Data Handling: Efficiently processes incoming sensor data.
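Enhanced Probability Estimation: Uses Gaussian filters for smoothing RSSI values, represented by the following: $$G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$ where $x$ is an RSSI value, $\mu$ is the mean, and $\sigma$ is the standard deviation.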
Probability Normalization: Ensures consistent predictions across all locations.
Efficient Data Management: Stores and retrieves learned data for accurate predictions.
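The optimizations listed above can be illustrated with a compact sketch. It interprets the Gaussian smoothing as a per-beacon Gaussian likelihood and assumes a uniform prior over locations; the function names, the 1 dB floor on the standard deviation, and the small constant added before taking logarithms are assumptions made for illustration, not details taken from the system.

```python
import math


def gaussian_pdf(x, mean, std):
    """Gaussian density used as a smooth likelihood for one RSSI reading."""
    std = max(std, 1.0)  # assumed floor (dB) to avoid division by zero
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit(fingerprints):
    """fingerprints: {location: [{beacon: rssi}, ...]} -> per-beacon (mean, std)."""
    model = {}
    for loc, samples in fingerprints.items():
        stats = {}
        for beacon in {b for s in samples for b in s}:
            vals = [s[beacon] for s in samples if beacon in s]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            stats[beacon] = (mean, math.sqrt(var))
        model[loc] = stats
    return model


def posterior(model, reading):
    """Return probabilities over locations that sum to one (uniform prior assumed)."""
    log_scores = {}
    for loc, stats in model.items():
        log_p = 0.0
        for beacon, rssi in reading.items():
            if beacon in stats:  # beacons never seen at this location are skipped
                mean, std = stats[beacon]
                log_p += math.log(gaussian_pdf(rssi, mean, std) + 1e-12)
        log_scores[loc] = log_p
    # Normalize in a numerically stable way so predictions are comparable.
    top = max(log_scores.values())
    exp_scores = {loc: math.exp(s - top) for loc, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {loc: v / total for loc, v in exp_scores.items()}
```

Working in log space before normalizing avoids underflow when many beacons contribute small likelihoods.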
An AI Endpoint component is used to handle an array of machine learning algorithms, including k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), Random Forest, and Neural Networks. These algorithms play a pivotal role in processing sensor data and enhancing location prediction accuracy.
3.7. k-NN, SVM, Random Forest, and Other Algorithms
Multiple algorithms, including k-NN, SVM, and Random Forest, were utilized for sensor data classification. Each algorithm undergoes hyperparameter tuning for optimal performance. A review of the algorithms is presented in the following.
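The k-NN algorithm uses the Euclidean distance metric, defined as
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2},$$
where $\mathbf{x}$ and $\mathbf{y}$ are feature vectors and $m$ is the number of features.
The SVM hyperplane decision function is
$$f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right),$$
where $\mathbf{w}$ is the weight vector and $b$ is the bias term.
Random Forests use the Gini impurity, given by
$$G = 1 - \sum_{k=1}^{K} p_k^2,$$
where $p_k$ is the proportion of samples of class $k$ at a node and $K$ is the number of classes.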
A rigorous hyperparameter tuning process was implemented using RandomizedSearchCV with StratifiedKFold cross-validation. This approach optimizes each algorithm’s performance based on the dataset’s characteristics.
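A minimal sketch of this tuning setup with scikit-learn is shown here. The synthetic RSSI data, the parameter ranges, the number of sampled configurations, and the choice of Random Forest as the tuned model are illustrative assumptions and do not reproduce the search space used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

rng = np.random.default_rng(0)

# Synthetic stand-in for fingerprints (assumption): 4 reference points,
# 10 beacons, 30 samples each; each reference point is close to one beacon.
n_points, n_beacons, n_per_point = 4, 10, 30
X_parts, y_parts = [], []
for rp in range(n_points):
    centre = np.full(n_beacons, -85.0)
    centre[rp] = -55.0
    X_parts.append(centre + rng.normal(0.0, 3.0, size=(n_per_point, n_beacons)))
    y_parts.append(np.full(n_per_point, rp))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)

# Illustrative search space; the values are not taken from the paper.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled configurations
    cv=cv,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
best_params, best_score = search.best_params_, search.best_score_
```

Random sampling evaluates a fixed number of configurations regardless of how large the grid is, which keeps tuning time predictable as more hyperparameters are added.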
The system includes several enhancements to improve prediction accuracy. A sliding window is used to fill missing RSSI values, ensuring data consistency. An expanded hyperparameter dictionary allows fine-tuning of the various classifiers, and RandomizedSearchCV is used to sample hyperparameter combinations efficiently. A comprehensive logging mechanism was implemented to facilitate monitoring and debugging. Finally, confusion matrices were utilized for a detailed evaluation of model performance.
The consensus prediction mechanism is a critical aspect of the system, enhancing the reliability and accuracy of the location predictions. This mechanism harmonizes the outputs of diverse machine learning algorithms, with each algorithm contributing its prediction to inform a unified decision. The process encompasses the following steps: Each algorithm’s prediction is assigned a weight based on its historical performance and current prediction confidence. The consensus prediction is formulated as a weighted average, mathematically expressed as
$$P_c = \frac{\sum_{i=1}^{n} w_i P_i}{\sum_{i=1}^{n} w_i},$$
where $P_c$ is the consensus prediction, $w_i$ denotes the weight assigned to each algorithm’s prediction $P_i$, and $n$ represents the number of algorithms.
Weights ($w_i$) for each algorithm are calculated from two factors: the prediction probability provided by the algorithm and an ‘informedness’ score. The ‘informedness’ score is a statistical measure derived from the algorithm’s past accuracy and reliability in similar scenarios. It encapsulates the historical performance, reflecting the algorithm’s efficacy in making accurate location predictions. To ensure a balanced contribution from each algorithm, the weights are normalized. This normalization process adjusts the weights so that their sum equals one, maintaining a consistent probability distribution across all predictions. The weighted predictions from each algorithm are aggregated to form the final consensus prediction. This aggregation process ensures that the final prediction reflects a blend of the most confident and historically reliable predictions from the participating algorithms.
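A small sketch of the weighting and aggregation steps is given here. It treats each algorithm's output as a vote for one location and assumes the raw weight is the product of the prediction probability and the informedness score; the text names these two factors but not how they are combined, so the product rule and the function name are assumptions.

```python
def consensus(predictions):
    """Combine per-algorithm votes into one location.

    predictions: list of (location, probability, informedness), one per algorithm.
    Returns (best_location, normalized_scores).
    """
    # Assumed combination rule: raw weight = probability * informedness,
    # clipped at zero so a negative informedness cannot add support.
    raw = [max(p * j, 0.0) for _, p, j in predictions]
    total = sum(raw)
    if total == 0.0:
        return None, {}
    scores = {}
    for (loc, _, _), weight in zip(predictions, raw):
        # Normalized weights sum to one across all algorithms.
        scores[loc] = scores.get(loc, 0.0) + weight / total
    best = max(scores, key=scores.get)
    return best, scores
```

For example, three algorithms voting `("RP1", 0.9, 0.8)`, `("RP2", 0.6, 0.5)`, and `("RP1", 0.7, 0.6)` produce raw weights 0.72, 0.30, and 0.42, so RP1 receives 1.14/1.44 of the normalized score and is selected.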
Historical performance metrics provide a data-driven basis for assigning weights and ensuring that algorithms with a track record of accuracy have a greater influence on the final prediction. This approach not only leverages the strengths of each algorithm but also compensates for any individual weaknesses, leading to a more robust and accurate consensus prediction.
The consensus mechanism combines predictions from multiple algorithms, allowing the system to capitalize on the strengths of each, leading to higher overall accuracy. The system redundancy ensures that it remains effective even if one algorithm underperforms under certain conditions. Furthermore, the mechanism can adapt to varying scenarios by dynamically adjusting the weights based on current and historical performance.
4. Implementation and Testing
4.1. Indoor Localization and Fingerprinting
Fingerprinting operates in two phases: offline (training) and online (tracking). The offline phase involves constructing a radio map by collecting RSSI values at predefined reference points (RPs), whereas the online phase involves real-time device location estimation by comparing current RSSI readings with the radio map.
Figure 6 represents the two distinct phases of fingerprinting, enhancing the clarity of the process.
4.2. Data Collection and Storage
Critical to real-time worker localization is the systematic collection and storage of RSSI values from BLE beacons. Variables impacting RSSI readings include distance, obstructions, beacon transmission power, and hardware inconsistencies. The system employs a cutoff value for RSSI readings to minimize noise, with data collected using an Android application. This data is then structured and stored in a database, facilitating future retrievals and queries for location prediction.
4.3. Real-Time Location Estimation Process
The process involves collecting current RSSI values from a worker’s device, analyzing data against the radio map using machine learning techniques, predicting the worker’s location based on the closest match in the radio map, and updating the database with new location predictions or marking as unknown if uncertain. The challenges in real-time implementation include managing speed, data volume, accuracy, server load, and data reliability.
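The matching step can be illustrated with a nearest-fingerprint sketch. It assumes one averaged fingerprint per reference point, a placeholder value for beacons that were not heard, and a distance threshold above which the location is reported as unknown; all three choices, and the function names, are hypothetical.

```python
import math

MISSING_RSSI = -100.0  # assumed placeholder (dBm) for beacons not heard


def distance(a, b):
    """Euclidean distance between two RSSI dictionaries over all beacons."""
    beacons = set(a) | set(b)
    return math.sqrt(sum(
        (a.get(k, MISSING_RSSI) - b.get(k, MISSING_RSSI)) ** 2 for k in beacons
    ))


def estimate_location(radio_map, reading, max_distance=15.0):
    """Return the closest reference point, or 'unknown' if no match is close.

    max_distance is an assumed uncertainty threshold, not a value from the paper.
    """
    best_rp, best_d = None, float("inf")
    for rp, fingerprint in radio_map.items():
        d = distance(fingerprint, reading)
        if d < best_d:
            best_rp, best_d = rp, d
    if best_rp is None or best_d > max_distance:
        return "unknown"
    return best_rp
```

Reporting an explicit unknown state keeps a weak match from being stored as a confident location.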
4.4. Building a Reliable Fingerprinting Database
The initial setup of the fingerprinting database involves defining RPs strategically to balance accuracy and practicality and collecting RSSI data at each RP to establish a baseline for the fingerprinting process.
To ensure the system remains accurate and adapts to environmental changes, periodic re-calibration is an integral feature. Users are empowered to add new data for each reference point at their discretion, enhancing the system’s adaptability. Following any data addition, the server automatically updates the weights in the database to reflect these changes. Additionally, users have the option to re-run the AI endpoint as needed, providing a method for re-calibrating the machine learning models and refining the system’s overall accuracy.
Figure 7 visually depicts how RSSI data from beacons is utilized in both the learning and tracking phases of the fingerprinting process.
4.5. Challenges and Mitigation Strategies
The dynamic nature of construction sites and the inherent instability of RSSI values pose significant challenges for indoor localization systems. To address these challenges and ensure the robustness and accuracy of the system, we have implemented several strategic measures as follows:
Machine Learning Robustness: During the training phase, models such as Random Forest and Support Vector Machines (SVM) were selected for their proven efficacy in handling high-dimensional and noisy datasets. Additionally, RSSI filtering and smoothing techniques were implemented, particularly for the Naive Bayes algorithm, to counteract fluctuations caused by environmental factors such as physical obstructions and multi-path interference.
Redundancy and Consensus Mechanisms: The system features a consensus mechanism among various machine learning models to determine the most probable location based on multiple predictions. This approach not only leverages the strengths of each model but also mitigates any individual model’s weaknesses, enhancing the overall reliability and accuracy of the location estimations.
Dynamic RSSI Cutoff Values: To further combat the variability of RSSI data, dynamic RSSI cutoff values were employed. By disregarding RSSI readings below a specified threshold (e.g., −70 dBm), which are likely affected by distance and obstructions, we enhance the data’s fidelity, focusing only on signals strong enough to provide reliable information.
Data Management with LinkedHashMap: A LinkedHashMap was utilized to efficiently manage BLE signals during the online and offline phases. This data structure holds the ten most recent—and likely closest—beacons’ signals. If the LinkedHashMap reaches capacity, the oldest entry is replaced with the newest data. When an existing beacon’s new data arrives, it is averaged with the old data, smoothing out signal strength fluctuations and thus reducing the impact of instantaneous noise.
Sliding Window Technique: A sliding window technique was employed to further mitigate the impact of noisy data during real-time processing. This method involved averaging the RSSI values within a predefined window size as the device moves through the environment, which helps in stabilizing the data input to our models, ensuring consistent and reliable predictions.
Continuous Learning and Adaptation: The system operates in both a learning mode and a tracking mode. In the learning mode, RSSI values were continuously collected and analyzed to update and refine the system’s understanding of the environment. This ongoing adaptation helps the system maintain high accuracy even as the physical layout or environmental conditions of the construction site change.
The above strategies collectively ensure that our system can effectively manage the inherent uncertainties and environmental variability characteristic of construction sites, thereby enhancing the reliability and accuracy of our localization solutions.
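Two of the measures above, the RSSI cutoff and the sliding window, can be sketched as follows. The −70 dBm threshold is the example value quoted in the text; the window size and the names used are assumptions.

```python
from collections import deque

RSSI_CUTOFF_DBM = -70  # example threshold quoted in the text


def apply_cutoff(scan, cutoff=RSSI_CUTOFF_DBM):
    """Keep only beacons whose RSSI is at or above the cutoff."""
    return {beacon: rssi for beacon, rssi in scan.items() if rssi >= cutoff}


class SlidingWindow:
    """Moving average of the last `size` RSSI values for one beacon."""

    def __init__(self, size=5):  # window size is an assumed value
        self.values = deque(maxlen=size)

    def add(self, rssi):
        # deque(maxlen=...) discards the oldest value automatically.
        self.values.append(rssi)
        return sum(self.values) / len(self.values)
```

A larger window gives smoother input at the cost of slower reaction when the worker moves, so the window size is a tuning parameter rather than a fixed constant.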
4.6. Handling Overfitting
Addressing overfitting is paramount to ensuring the reliability of proposed machine learning models, especially given the complex and high-dimensional nature of construction site data. We have implemented several strategies to mitigate overfitting, ensuring our models perform well not only on training data but also in practical, real-world applications as follows.
Stratified K-Fold Cross-Validation: To enhance model generalization and prevent overfitting, we utilized Stratified K-Fold Cross-Validation in the model training process. This method involved dividing the dataset into five different subsets (or folds) and conducting training and validation iteratively across these folds. Each fold serves as the validation set once and as part of the training set four times. Stratification keeps the class distribution even across folds, which helps prevent the model from learning to predict outcomes based solely on distribution biases.
Regularization and Hyperparameter Tuning: Through RandomizedSearchCV, we performed hyperparameter tuning to find the optimal settings for our models. For instance, in the SVM models, the regularization parameter ‘C’ and the kernel coefficient ‘gamma’ were adjusted to find a balance between minimizing training errors and avoiding high generalization errors. Similarly, for tree-based models such as Random Forest and Gradient Boosting, parameters like ‘max_depth’ and ‘min_samples_split’ were tuned to control model complexity and prevent overfitting.
Early Stopping in Neural Networks: We incorporated early stopping mechanisms in the neural network configurations, specifically using the MLPClassifier. This approach halts training once the validation score stops improving, preventing the models from overfitting.
Pruning Decision Trees and Random Forests: To avoid overly complex trees, we applied pruning techniques that remove sections of the tree that provide little power in predicting the target variable. This helps in simplifying the model, enhancing its performance on unseen data.
Feature Importance and Model Evaluation: Understanding which features significantly influence the predictions helps reduce model complexity and focus on the most relevant predictors. Regular checks of the confusion matrix and cross-validation results were performed, which were helpful in assessing and refining the models’ performance across various data segments.
The above combined efforts helped to ensure that the utilized models were capable of delivering reliable predictions across the diverse and dynamic environments typically encountered on construction sites.
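To make the stratified five-fold procedure concrete, here is a dependency-free sketch. It deals the samples of each class round-robin across the folds without shuffling; this is a simplification of what a library implementation does, and the function names and toy labels are illustrative.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5):
    """Assign each sample index to a fold, spreading every class evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            # Deal the samples of each class round-robin across the folds.
            folds[position % n_splits].append(idx)
    return folds


def splits(labels, n_splits=5):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, n_splits)
    for k in range(n_splits):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# Toy labels (assumption): three reference points with unequal sample counts.
labels = ["RP1"] * 10 + ["RP2"] * 10 + ["RP3"] * 5
fold_counts = [Counter(labels[i] for i in fold) for fold in stratified_folds(labels)]
```

With these toy labels every fold holds two RP1, two RP2, and one RP3 sample, so each validation set mirrors the overall class proportions.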
4.7. Deployment Constraints in Construction Sites
Deploying indoor localization systems within construction sites presents unique challenges that significantly impact performance and feasibility. One primary constraint is the large scale of construction sites, necessitating a dense network of BLE beacons to ensure comprehensive coverage. However, increasing the density of beacons places a significant strain on system resources, including data processing capabilities and network bandwidth. Effectively managing this balance is crucial to maintaining system responsiveness without compromising localization accuracy.
Additionally, the dynamic nature of construction sites, characterized by frequent structural changes, poses further challenges. As new elements are added and existing ones are modified or removed, the environment constantly evolves. This requires not only the frequent repositioning of beacons on newly constructed walls or areas but also the removal of beacons from demolished sections. Each change necessitates an update to the radio map to accurately reflect the new site layout. Maintaining an up-to-date radio map is essential for the system to function accurately, requiring robust computational algorithms and dynamic system parameters that can adapt quickly to changing signal patterns and environmental conditions.
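The bookkeeping implied by these radio map updates can be illustrated with a small sketch. The `RadioMap` class, its field names, and the idea of flagging affected reference points as stale are hypothetical and are not part of the system described here; they only show one way to track which fingerprints need re-surveying after a beacon is removed.

```python
from dataclasses import dataclass, field


@dataclass
class RadioMap:
    """Hypothetical radio map: reference point -> {beacon id -> mean RSSI in dBm}."""
    fingerprints: dict = field(default_factory=dict)
    stale_points: set = field(default_factory=set)

    def add_fingerprint(self, point, readings):
        """Store (or refresh) the fingerprint of a reference point."""
        self.fingerprints[point] = dict(readings)
        self.stale_points.discard(point)

    def remove_beacon(self, beacon_id):
        """Drop a beacon and flag every reference point that relied on it."""
        for point, readings in self.fingerprints.items():
            if readings.pop(beacon_id, None) is not None:
                self.stale_points.add(point)


radio_map = RadioMap()
radio_map.add_fingerprint("RP1", {"b1": -60, "b2": -75})
radio_map.add_fingerprint("RP2", {"b2": -58})

# Beacon b1 sat on a wall that was demolished.
radio_map.remove_beacon("b1")
print(radio_map.stale_points)  # reference points that need a new survey
```

A relocated beacon could be handled the same way: remove it, then re-survey the flagged points and store fresh fingerprints with `add_fingerprint`.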
4.8. Testing and Evaluation
The system was evaluated primarily on its ability to correctly match estimated locations against the reference points. In parallel, the Android application and the server-side components were checked for correct operation. These tests were intended to establish the system’s readiness for deployment on a real-world construction site, where it is meant to enhance the safety and efficiency of operations.
5. Results and Discussion
The Real-time Safety Alerting System underwent rigorous testing in an office space mimicking a construction site, equipped with 10 beacons in a 5 m × 2 m area. The placement of beacons in our test setup is depicted in Figure 8. The testing included both static and dynamic evaluations to determine the system’s location prediction accuracy.
Training and Validation Methodology: To train the proposed machine learning model, we utilized a dataset comprising 784 data points, with 28% of the data allocated for testing. Given the nearly balanced distribution of data points across the various reference points, we employed stratified K-fold cross-validation to ensure a robust evaluation of the model. Specifically, a 5-fold stratified cross-validation approach was used to preserve the proportional representation of each class (i.e., reference point) across all folds, enhancing the reliability of the performance assessment.
This stratified approach was integrated with the RandomizedSearchCV function, which allowed for efficient hyperparameter optimization while maintaining a balanced class distribution throughout the evaluation process. Keeping the class proportions the same in every fold reduces the bias and variance of the performance estimate, which supports the selection of a more generalizable and accurate model.
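The procedure described above can be sketched as follows. The dataset size (784 points), the 28% test split, the 10 beacons, and the 5-fold stratified cross-validation come from the text. The synthetic RSSI values, the number of reference points (8), the search ranges, the feature scaling step, and the number of search iterations are assumptions made only so the example runs.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_points, n_beacons, n_ref_points = 784, 10, 8  # 8 reference points is an assumption

# Synthetic fingerprints: each reference point has a mean RSSI per beacon plus noise.
y = np.arange(n_points) % n_ref_points
centers = rng.uniform(-90, -50, size=(n_ref_points, n_beacons))
X = centers[y] + rng.normal(0, 3, size=(n_points, n_beacons))

# Hold out 28% of the data for testing, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.28, stratify=y, random_state=0
)

# 5-fold stratified cross-validation drives the randomized search.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_distributions={
        "svc__C": loguniform(1e-2, 1e2),      # assumed search range
        "svc__gamma": loguniform(1e-3, 1e1),  # assumed search range
    },
    n_iter=10, cv=cv, scoring="accuracy", random_state=0,
)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Passing the `StratifiedKFold` object as `cv` is what keeps each reference point proportionally represented in every fold during the search.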
The system’s performance was primarily evaluated by its accuracy in real-time location tracking. This involved a thorough examination of the system’s ability to match the estimated location with the actual reference points. The output from the machine learning algorithm provided probability scores for each location prediction. It should be noted that when evaluating machine learning methods for indoor localization, it is crucial to consider a range of other metrics [38]. In this respect, precision and recall are essential for assessing predictive performance, while Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) provide insights into prediction errors. The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) are valuable for evaluating class distinction. Additionally, latency is critical for real-time applications, and both robustness and scalability are important for ensuring adaptability to larger datasets. Together, these metrics form a comprehensive evaluation framework that is vital for effectively applying machine learning techniques in indoor localization.
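For illustration, the sketch below computes per-class precision and recall as well as MAE and RMSE using their standard definitions. The labels and the per-prediction position errors (in metres) are made-up values, and the function names are hypothetical; they are not results or code from this study.

```python
import math


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one reference point, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mae_rmse(errors_m):
    """Mean absolute error and root mean squared error of position errors (metres)."""
    n = len(errors_m)
    mae = sum(abs(e) for e in errors_m) / n
    rmse = math.sqrt(sum(e * e for e in errors_m) / n)
    return mae, rmse


# Made-up example: four predictions over two reference points.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
print(precision_recall(y_true, y_pred, "B"))
print(mae_rmse([0.0, 0.0, 1.0, 1.0]))
```

Because RMSE squares each error before averaging, it penalizes occasional large position errors more heavily than MAE does.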
To evaluate the system’s performance, we conducted a structured test in which a subject walked through predefined reference points while recording the system’s location estimations at each point. Additionally, we collected data on the system’s confidence levels at these points and measured the average response time. During the training phase, distinct algorithms exhibited varying performance levels. Traditional machine learning methods such as Support Vector Machines (SVM), Decision Trees, and ensemble techniques significantly outperformed the Neural Net model. Specifically, the RBF SVM, Decision Tree, and Random Forest algorithms achieved near-perfect scores, underscoring their effectiveness in the categorical classification challenges associated with fingerprinting. In contrast, the Neural Net algorithm recorded a lower performance score of 0.908. This can be attributed to the sensitivity of neural networks to the categorical nature of the data and their need for larger datasets to optimize their complex architectures effectively. Furthermore, the AdaBoost and Gradient Boosting algorithms also performed exceptionally well, demonstrating robust handling of fingerprinting data.
The performance score referenced throughout this paper, particularly regarding the effectiveness of the machine learning algorithms, is defined as the accuracy score. This score, calculated as the ratio of correctly predicted instances to the total number of instances in the dataset, is a standard metric in machine learning evaluations. It provides a straightforward measure of an algorithm’s predictive performance, which is especially relevant in our context of precise indoor localization. This accuracy score is calculated as follows:
\[ \text{Accuracy} = \frac{\text{Number of correctly predicted instances}}{\text{Total number of instances}} \]
The accuracy scores of different algorithms during the training phase were obtained as follows:
Linear SVM: ;
RBF SVM, Decision Tree, Random Forest: .
Neural Net: 0.908. This lower performance underscores the challenges neural networks face in categorical classification tasks like fingerprinting, where the precision of the input data and the complexity of the model play significant roles.
AdaBoost: .
Gradient Boosting: .
The overall accuracy in the testing phase was , slightly lower than in the training phase, highlighting the challenges of real-world application versus controlled training environments.
The overall confidence level of the system is a measure of how certain the system is about its location predictions. It was obtained by analyzing the probability scores provided by the machine learning algorithms during the classification process. Each algorithm predicts the likelihood of the device being at a specific location and assigns a probability score to that prediction. The overall confidence level is then calculated as an average of these probability scores. The overall confidence level of the system was approximately , suggesting that the system is generally reliable in its real-time location predictions despite challenges.
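One plausible reading of this calculation is sketched below: for each prediction, take the probability assigned to the predicted (most likely) location, then average these values over all predictions. The function name and the example probabilities are hypothetical, and the exact aggregation used in the study may differ.

```python
def overall_confidence(probability_rows):
    """Average, over all predictions, of the probability of the predicted location.

    Each row maps reference point -> probability for one prediction.
    """
    top_probabilities = [max(row.values()) for row in probability_rows]
    return sum(top_probabilities) / len(top_probabilities)


# Made-up probability scores for two consecutive predictions.
rows = [
    {"RP1": 0.9, "RP2": 0.1},
    {"RP1": 0.4, "RP2": 0.6},
]
confidence = overall_confidence(rows)
print(round(confidence, 2))  # 0.75
```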
The system’s confidence over time, as depicted in Figure 9, fluctuates based on the subject’s movement and transition between reference points. The red vertical lines in the figure indicate the moments when the subject moved from one reference point to another, during which a drop in confidence level is observed. This drop is expected and illustrates the system’s recalibration of predictions in response to changing location data.
The system’s response time, characterized as the duration between the physical movement of the phone from one reference point to another and the system’s ability to detect and precisely display this movement, was measured at s. This measure was obtained during testing with a Samsung A13 smartphone. This observation underlines the impact of device capabilities on the overall system’s performance.
The spatial integrity analysis revealed a maximum error of
m, which matches the spacing between adjacent reference points. This indicates that the system, when erring, typically predicts the immediate neighboring reference point rather than a distant one. An illustration of this scenario is provided in Figure 10, which shows a simple example of a reference point that can be mistaken for its adjacent reference points.
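The spatial error of a classification-based localizer can be computed as the distance between the true and the predicted reference points. The sketch below assumes hypothetical reference point coordinates spaced 1 m apart and made-up predictions; the actual layout and spacing are those of the test setup, not these values.

```python
import math


def max_spatial_error(coords, y_true, y_pred):
    """Largest Euclidean distance (metres) between true and predicted reference points."""
    return max(math.dist(coords[t], coords[p]) for t, p in zip(y_true, y_pred))


# Hypothetical layout: three reference points on a line, 1 m apart.
coords = {"RP1": (0.0, 0.0), "RP2": (1.0, 0.0), "RP3": (2.0, 0.0)}
y_true = ["RP1", "RP2", "RP3"]
y_pred = ["RP1", "RP1", "RP3"]  # RP2 is confused with its neighbour RP1

print(max_spatial_error(coords, y_true, y_pred))  # 1.0
```

When every misclassification lands on a neighbouring reference point, the maximum error equals the spacing between adjacent points, as in this example.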
The results of this proof-of-concept study show that the system tracks simulated worker locations accurately, with minimal delay and high confidence. Its ability to follow movement between reference points, together with its high accuracy, positions it as a promising solution for enhancing safety in a construction environment. Two considerations are critical for real-world deployment: hardware capabilities influence response time, and spatial accuracy is tied to the spacing of the reference points. Both were measured in a controlled test environment and may vary on an actual construction site due to differing conditions and hardware capabilities.
In large structures such as construction sites, the density and placement of BLE beacons become critical factors. Increasing beacon density for larger spaces could significantly strain system resources, including data processing capabilities and network bandwidth. Such setups demand robust computational resources to handle the increased volume of data effectively without compromising system responsiveness. Furthermore, the dynamic nature of construction sites introduces additional challenges in maintaining an accurate and up-to-date radio map for indoor localization. Construction sites are characterized by frequent structural changes, with new obstructions appearing and existing ones being removed as work progresses. This constant flux can significantly alter the indoor environment, thereby affecting the propagation of BLE signals. Creating a comprehensive radio map that reflects these changes is not only challenging but also requires frequent updates to ensure the system remains accurate.
Deploying the proposed real-time safety alert system in construction sites involves addressing several key issues, including affordability, environmental challenges, signal accuracy, coverage, and scalability. Integrating IoT devices, wearables, and supporting infrastructure can be costly due to initial setup costs, ongoing operational expenses, and the need for scalable solutions. These costs can be managed through phased implementation, leveraging scalable cloud services, and utilizing open-source options.
Furthermore, construction sites often present challenges such as noise, interference, and physical obstructions, which can impact communication and data collection. Robust communication protocols like Zigbee, LoRaWAN, and LTE-M, along with a mesh network topology, can help mitigate these issues by enabling devices to communicate through multiple pathways and reroute data around obstacles. Advanced signal processing and noise-cancellation techniques enhance data accuracy, while redundancy in sensor placements and edge computing capabilities ensure reliable data collection. Using high-gain antennas and strategically placed repeaters or gateways further improves signal coverage and reduces the impact of obstructions. By employing these strategies, the system can achieve reliable and cost-effective operation in diverse and challenging construction environments.
Collecting and processing large amounts of data for fingerprinting can put significant pressure on system resources. To manage the associated overhead, one can employ data reduction techniques at the edge computing stage and leverage cloud infrastructure, distributed computing, and parallel processing mechanisms to optimize resource utilization. Notable approaches in the literature include a transfer learning (TL)-based framework [39], which alleviates offline training overhead and enhances system scalability. These strategies collectively improve system performance and reliability during intensive data fingerprinting tasks.
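As one simple example of data reduction at the edge, a device could collapse each window of raw RSSI samples into a single median value per beacon before sending it to the server. This is an illustrative assumption, not the technique used in this study, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def reduce_window(samples):
    """Collapse raw (beacon_id, rssi_dbm) samples into one median RSSI per beacon."""
    by_beacon = defaultdict(list)
    for beacon_id, rssi in samples:
        by_beacon[beacon_id].append(rssi)
    return {beacon_id: median(values) for beacon_id, values in by_beacon.items()}


# Made-up window: three samples from beacon b1 (one outlier) and one from b2.
window = [("b1", -60), ("b1", -62), ("b1", -90), ("b2", -70)]
reduced = reduce_window(window)
print(reduced)  # {'b1': -62, 'b2': -70}
```

The median is used here because it is less sensitive to isolated outliers than the mean, and sending one value per beacon per window reduces the data volume the server has to process.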
6. Conclusions
The proposed system utilizes the FIND Framework for BLE technology, enhancing an Android app and server-side functionalities to achieve precise indoor localization. Key features include real-time alerting, RSSI filtering, and safety zones. Machine learning algorithms like Radial Basis Function Support Vector Machine (RBF SVM), Decision Tree, and Random Forest demonstrated strong performance, while Neural Networks showed limitations in categorical classification tasks. The system reliably tracks personnel locations with minimal latency, highlighting the importance of hardware capabilities on response time and spatial accuracy.
While the system serves as an initial investigation into real-time safety alerting on construction sites, further refinement is needed for broader implementation. Scalability can be achieved through a modular architecture, scalable communication protocols, and cloud-based infrastructure, facilitating seamless expansion. Future enhancements could incorporate advanced BLE technologies such as Angle of Arrival (AoA) and Angle of Departure (AoD). Research, including studies on dynamic BLE tracking datasets and AoA in challenging conditions, offers insights for improving accuracy in stable environments. Furthermore, a reinforcement learning-based information fusion framework could enhance location precision in dynamic settings. Other extensions include expanding the system’s application to optimize precision, cost-effectiveness, and adaptability, as well as enhancing response times for fast-moving or emergency scenarios and improving hazard detection accuracy through refined machine learning algorithms.
The selected algorithms, including k-NN and SVM, were chosen for their established effectiveness in RSSI-based localization under limited data conditions. They provided reliable baselines for initial evaluation. The relatively poor performance of neural networks is attributed to limited training data and model complexity, which may lead to overfitting in small-scale environments.
This study serves as a proof of concept to demonstrate baseline detection capability in a controlled environment. While current validation is limited in scale and complexity, future work will extend to larger, dynamic, and safety-critical indoor environments. We also plan to adopt a broader set of evaluation metrics, such as those used in [40], to support comprehensive performance assessment and real-time deployment.