1. Introduction
Vertical elevators are indispensable vertical transportation devices in modern urban buildings, and their stability, safety, and maintenance quality constitute an essential part of the urban public safety system. Particularly with the rapid development of high-rise and super high-rise buildings, the frequency of elevator operation has increased significantly, imposing higher requirements for real-time and accurate load state perception [1]. However, existing elevator load monitoring methods are predominantly based on mechanical measurement or single-sensor schemes, which suffer from limitations such as insufficient measurement accuracy, poor environmental adaptability, and high maintenance costs. In particular, after prolonged use, the aging of rubber buffers and the influence of complex external environmental factors lead to a marked decline in the stability and accuracy of traditional weighing systems, making it difficult to effectively address the safety risks caused by missed overload alarms and false empty status detections. Therefore, there is an urgent need to explore more reliable and intelligent load monitoring technologies. To address the above challenges, this study proposes a multimodal elevator weighing method that integrates displacement sensing with visual perception. By employing an embedded intelligent terminal, the system enables the real-time and accurate analysis of the load state, achieving dynamic self-calibration and overload identification. This approach effectively improves the measurement accuracy and operational stability and demonstrates strong application potential in enhancing elevator safety management, energy efficiency, and maintenance performance.
2. Related Works
Elevator load monitoring technology has long relied on single-sensor solutions such as pressure sensors and strain gauges. However, in complex operational environments and under long-term service conditions, these systems face significant challenges, including degraded monitoring accuracy and insufficient real-time performance [2,3,4]. Recent research efforts have primarily focused on three directions: multi-sensor data fusion, intelligent algorithm optimization, and remote monitoring.
In China, several advancements have been made. Hangzhou Ambida Elevator Co., Ltd. developed a load-weighing device based on the mechanical deformation detection of steel wire ropes, which indirectly measures the load through a spring structure and provides overload warnings. Nevertheless, this approach is susceptible to spring fatigue failure and delayed dynamic response [2]. Shanghai Mitsubishi Elevator Co., Ltd. employs absolute position sensors to estimate the car load by measuring the vertical displacement, simplifying the installation but requiring precise calibration of multiple elastic coefficients [3]. Shandong Fuji Control Electric Co., Ltd. integrates pressure and displacement sensors to monitor the rope-head load. However, its averaging mechanism across multiple sensors may compromise the system’s robustness [4]. Toshiba Elevator (China) Co., Ltd. introduced a CAN bus-based hierarchical load monitoring scheme that improves communication efficiency but suffers from increased integration complexity, which limits the dynamic response speed [5]. Schindler China Elevator Co., Ltd. utilizes a central pressure sensor to directly measure the load in belt-type elevators, though the use of a single sensing unit makes the system vulnerable to load eccentricity and reduced stability [6]. Haomen Electronic Technology (Xiamen) Co., Ltd. proposed a novel integration of dynamic weighing units with vision-based detection, but the structural complexity increases the maintenance cost and environmental sensitivity [7].
In other countries, research has focused on enhancing the accuracy and system intelligence. The Otis Elevator Company deployed a distributed pressure sensor array to achieve high-precision load measurements, but the multi-sensor layout increases the system complexity [8]. KONE established a sensor network to support predictive maintenance, though its reliance on algorithms and vulnerability to environmental interference limit real-time performance [9]. Thyssenkrupp Elevator developed a compact indirect measurement system based on rope tension sensing, which is sensitive to rope wear conditions [10]. Mitsubishi Electric integrated pressure sensors with optical detection to improve the load distribution identification; however, the limited environmental adaptability of optical units remains a major implementation bottleneck [11]. Hitachi introduced accelerometers to enhance the dynamic load response, though vibration interference poses a significant source of measurement error [12].
In recent years, multi-sensor fusion technology has demonstrated significant potential in elevator monitoring, opening up new research directions for intelligent solutions. Guo et al. [13] proposed a real-time elevator fault monitoring system that combines vibration and displacement data, enabling cloud-based data access. However, the self-calibration and dynamic error compensation issues of load weighing have not yet been addressed. Kullu and Cinar [14] developed a deep learning method that fuses vibration and current data, significantly improving the accuracy of industrial equipment fault detection. However, this method is not optimized for elevator load monitoring scenarios. These studies have demonstrated the effectiveness of multimodal fusion under complex operating conditions but remain limited in addressing accuracy drift under dynamic loads. Khatir et al. [15] systematically reviewed the applications of machine learning and deep learning in structural health monitoring, highlighting their advantages in processing complex data and achieving real-time monitoring, providing theoretical support for the design of multimodal fusion monitoring systems. Garcia-Perez et al. [16] explored the performance of edge computing devices in embedded AI tasks, demonstrating their efficiency in resource-constrained environments. This is highly consistent with the design goal of this study, which aims to achieve low-power real-time monitoring based on the Rockchip RK3568 platform. Dilmi et al. [17] compared the performance of YOLOv5 and YOLOv8 on embedded platforms. The results showed that YOLOv8 demonstrated a better balance between accuracy and real-time performance in embedded environments, providing strong support for the selection of visual detection models in this study.
Overall, the current research suffers from three critical limitations: (1) single-sensor schemes are unable to effectively resist the accuracy drift caused by environmental disturbances and mechanical aging; (2) multi-sensor systems often sacrifice reliability due to their structural complexity; (3) real-time error compensation mechanisms under dynamic load scenarios remain underdeveloped. These gaps provide opportunities for innovations that integrate multi-source sensing with online self-calibration techniques. It is noteworthy that the use of car-base buffer rubber is a common practice in the elevator industry. However, for the large installed base and for newly installed equipment, no effective self-calibration method has been reported. To the best of the authors’ knowledge, the self-calibration approach proposed in this paper is the first of its kind in the industry, and its effectiveness is validated in Section 3.
4. Experimental Evaluation and Application Validation
4.1. Hardware Deployment and Testing Platform
As illustrated in Figure 3, this section presents the application test deployment of the elevator car weighing system and its integrated human–machine interface.
In Figure 3a,b, the draw-wire displacement sensor is vertically mounted at the central rigid support point of the car base with bolts. The sensor’s axis is adjusted to align vertically with the car’s movement trajectory, thereby ensuring the accuracy of the displacement data acquisition.
Figure 3c shows the vision perception module fixed to the car ceiling at a 30° downward angle. It connects to a PoE switch via a shielded CAT6 cable, establishing a Local Area Network (LAN) with the embedded development board. The same-subnet design minimizes the communication latency, enabling reliable real-time video transmission. The embedded development board acquires data from the draw-wire displacement sensor. An opto-isolated relay implements control functions with galvanic isolation between low-voltage and high-voltage circuits. As shown in Figure 3d,e, the core modules are integrated within a structural enclosure, preventing direct contact with the high-voltage components and enhancing the operational safety.
Figure 3g–j demonstrate the intelligent module developed on the embedded Qt platform, which supports remote access and Over-The-Air (OTA) updates. The graphical interface integrates functionalities for real-time monitoring, system status feedback, data interaction, and historical records, dynamically displaying key parameters including the passenger count, car load, and calibration status. Maintenance personnel can perform cross-platform remote monitoring and program upgrades via multiple terminals (Figure 3f), significantly improving the quality, efficiency, and safety of intelligent operation and maintenance.
4.2. Load Estimation Model Initialization
First, the absolute height value returned by the draw-wire sensor when the car is empty is recorded as the baseline value ($h_0$) for that particular round of data collection. Subsequently, standard 20 kg weights are incrementally added to the empty car until reaching the full load of 1000 kg. After each load step stabilizes for 3 s, the absolute height value ($h_i$) is recorded. This process collects 51 data points sequentially. Following this, all weights are removed, the baseline is reset, and this procedure is repeated to collect 5 rounds of raw data, resulting in a total of 255 raw data pairs. For each round of raw data, the relative compression ($\Delta h_i$) corresponding to each load level is calculated relative to that round’s baseline value $h_0$ using Equation (3). Subsequently, the arithmetic mean of the $\Delta h_i$ for the same load level across different rounds is computed to reduce the noise, forming a set of 51 calibration nodes:
$$\left\{ \left( \overline{\Delta h}_i,\; F_i \right) \right\}, \quad i = 0, 1, \ldots, 50$$
where $\overline{\Delta h}_i$ represents the corresponding compressive displacement (mm), which strictly monotonically increases from 0 mm to 3.08 mm; $F_i$ denotes the load magnitude (kg).
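The node-building step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_calibration_nodes`, the data layout (one list of absolute heights per round, with index 0 holding the empty-car baseline), and the use of an absolute difference for the relative compression are hypothetical choices, and the toy readings are not measured data.

```python
from statistics import mean


def build_calibration_nodes(rounds, loads_kg):
    """Average per-round relative compression into calibration nodes.

    rounds   : list of rounds; each round is a list of absolute heights (mm),
               where index 0 is that round's empty-car baseline (assumed layout).
    loads_kg : load applied at each index (kg), same length as each round.
    Returns a list of (mean_compression_mm, load_kg) tuples.
    """
    nodes = []
    for i, load in enumerate(loads_kg):
        # Relative compression of each round against its own baseline.
        # The absolute value is an assumption about the sign convention.
        compressions = [abs(r[i] - r[0]) for r in rounds]
        nodes.append((mean(compressions), load))
    return nodes


# Toy example: 2 rounds, 3 load levels (hypothetical readings, not measured data).
loads = [0, 20, 40]
rounds = [
    [100.00, 99.94, 99.88],
    [100.10, 100.04, 99.97],
]
nodes = build_calibration_nodes(rounds, loads)
for compression, load in nodes:
    print(f"{load:4d} kg -> {compression:.3f} mm")
```

Referencing each round to its own baseline before averaging keeps a slow drift of the absolute height between rounds from leaking into the nodes.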
As indicated by these calibration nodes, the load exhibits an approximately linear relationship with the relative compression (as shown in Figure 4). Considering the limited computational resources of the embedded controller, this study employs piecewise linear interpolation to establish the load estimation function. For any measured relative compression $\Delta h$, the corresponding load $F$ can be estimated using Equation (4).
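As a rough illustration of piecewise linear interpolation over calibration nodes, the sketch below locates the bracketing pair of nodes with a binary search and interpolates between them. The function name `estimate_load`, the clamping behaviour outside the calibrated range, and the intermediate node values are assumptions made for this example; only the 3.08 mm / 1000 kg endpoint comes from the text.

```python
from bisect import bisect_right


def estimate_load(delta_h, nodes):
    """Estimate load (kg) from relative compression (mm).

    nodes: list of (compression_mm, load_kg), sorted by strictly
           increasing compression.
    Values outside the calibrated range are clamped to the end nodes
    (an assumption; the paper does not state the out-of-range policy).
    """
    xs = [x for x, _ in nodes]
    if delta_h <= xs[0]:
        return float(nodes[0][1])
    if delta_h >= xs[-1]:
        return float(nodes[-1][1])
    # Index of the node at or just below delta_h.
    k = bisect_right(xs, delta_h) - 1
    (x0, f0), (x1, f1) = nodes[k], nodes[k + 1]
    # Linear interpolation inside the bracketing segment.
    return f0 + (f1 - f0) * (delta_h - x0) / (x1 - x0)


# Hypothetical nodes for illustration (the real table has 51 entries).
nodes = [(0.0, 0), (0.5, 160), (1.0, 330), (3.08, 1000)]
print(estimate_load(0.25, nodes))  # midway in the first segment
print(estimate_load(0.75, nodes))  # midway in the second segment
```

A lookup of this kind needs only a sorted table and one multiply-divide per query, which fits the stated goal of keeping the computation light on the embedded controller.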
4.3. Dataset Construction
To address the challenge of passenger detection in complex elevator car environments, characterized by illumination variations, viewpoint changes, dense target occlusion, and overlap, this study employed a fixed ceiling-mounted camera (1920 × 1080 resolution @ 30 fps). Diverse images were captured under varied lighting conditions and with different passenger groups, thereby constructing an evaluation dataset comprising 4568 images. As illustrated in Figure 5, this dataset mitigates the critical issue of data scarcity for samples inside vertical elevator cars, representing a significant contribution to the research domain of elevator intelligence.
The operational condition distribution of the dataset was meticulously designed to reflect the diversity of real-world elevator scenarios: 50% of the images (2284) capture multi-passenger load scenarios, 40% (1827) correspond to empty cabin states, and the remaining 10% (457) encompass complex conditions, including dense occlusion, rapid passenger entry/exit, dynamic scenes during door opening/closing, and interference from specular reflections and adverse lighting. This distribution ensures robust model performance and generalization across diverse operational contexts.
To ensure annotation accuracy and consistency, all images were manually annotated using the LabelMe tool, with cross validation by multiple annotators to minimize errors. The dataset was subsequently partitioned into a training set (3654 images) and a validation set (914 images) in an 8:2 ratio and converted into the COCO format, facilitating the training and performance evaluation of the YOLOv8 object detection model.
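The 8:2 partition can be reproduced with a few lines. The sketch below assumes a seeded random shuffle before splitting; the paper does not say how images were assigned to each subset, and the file names and the helper `split_dataset` are hypothetical.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle items reproducibly and split them into train/validation lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle (assumed strategy)
    n_train = int(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# Hypothetical file names standing in for the 4568 annotated images.
names = [f"img_{i:04d}.jpg" for i in range(4568)]
train, val = split_dataset(names)
print(len(train), len(val))
```

With 4568 images and a ratio of 0.8, the integer truncation yields the 3654/914 split reported above.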
To tackle noise and data quality challenges in complex environments, systematic preprocessing and augmentation were applied to the raw images prior to training. The preprocessing steps included resizing images to 640 × 640 pixels to align with YOLOv8’s default input size, converting the color space from BGR to RGB, and normalizing the pixel values to the [0,1] range by dividing by 255 to meet standardized model input requirements. Data augmentation adopted YOLOv8’s default strategies, including random horizontal flipping, brightness and saturation jittering in HSV color space, and Mosaic multi-image stitching to simulate dense target scenarios, thereby enhancing the model robustness against viewpoint changes and occlusions while mitigating overfitting risks. Furthermore, YOLOv8’s training pipeline automatically incorporates affine transformations (e.g., random scaling, translation, and rotation) and additional color perturbations, bolstering model generalization without requiring further configuration.
4.4. Effectiveness Evaluation of the Visual Module
To evaluate the accuracy of the visual module in identifying the empty state and counting passengers within the elevator car during actual operation, this study designed experiments specifically for empty status detection and passenger counting. The detailed test procedure is as follows: During routine elevator operation, video data of the car interior were randomly captured, resulting in a total of 1500 representative experimental sample frames. Among these samples, 804 frames depict the empty state, while the remainder represent loaded states with the number of passengers ranging from one to five persons. All collected samples were subsequently input into the visual module deployed on the LubanCat 2 edge computing platform, which is equipped with a Rockchip RK3568 processor featuring a quad-core ARM Cortex-A55 CPU, an integrated 0.8 TOPS NPU, 4 GB LPDDR4 memory, and support for USB 3.0, GPIO sensor expansion, and Mini-PCIe interfaces. The system automatically outputs, for each frame, the empty status detection result and the passenger count value. Using manual annotation as the reference standard, the system’s determinations were compared against the ground truth. The number of correct identifications was then separately tallied for the empty state and for passenger counts. The evaluation metrics are described as follows:
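Empty State Recognition Accuracy
$$Acc_{\text{empty}} = \frac{N_{\text{ec}}}{N_{\text{e}}} \times 100\%$$
where $N_{\text{ec}}$ is the number of frames in which the empty state was correctly identified, and $N_{\text{e}}$ is the total number of empty sample frames.
Passenger Counting Accuracy
$$Acc_{\text{count}} = \frac{N_{\text{c}}}{N} \times 100\%$$
where $N_{\text{c}}$ is the number of frames with correctly counted passengers, and $N$ is the total number of sample frames.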
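The two metrics can be computed from per-frame counts as in the following sketch. It assumes that a frame is "empty" when its passenger count is zero; the function names and the toy count lists are hypothetical and do not reproduce the experimental data.

```python
def empty_state_accuracy(true_counts, pred_counts):
    """Percentage of ground-truth empty frames that were predicted empty."""
    empty = [(t, p) for t, p in zip(true_counts, pred_counts) if t == 0]
    correct = sum(1 for _, p in empty if p == 0)
    return 100.0 * correct / len(empty)


def counting_accuracy(true_counts, pred_counts):
    """Percentage of all frames whose predicted count matches the annotation."""
    correct = sum(1 for t, p in zip(true_counts, pred_counts) if t == p)
    return 100.0 * correct / len(true_counts)


# Toy frames (hypothetical): annotated vs. predicted passenger counts.
true_counts = [0, 0, 1, 2, 3, 0]
pred_counts = [0, 1, 1, 2, 2, 0]
empty_acc = empty_state_accuracy(true_counts, pred_counts)
count_acc = counting_accuracy(true_counts, pred_counts)
print(f"empty: {empty_acc:.1f}%  counting: {count_acc:.1f}%")
```

Note that the two metrics use different denominators: the first is normalized by the empty frames only, the second by all sample frames.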
Based on experimental validation under diverse real-world conditions, the visual module achieves a high recognition accuracy of 98.9% in detecting the empty state of the elevator car (as illustrated in Table 1). This result indicates that the probability of misclassifying an empty cabin is extremely low in practical applications, thereby ensuring the reliability of the self-calibration trigger condition. For the passenger counting task inside the elevator cabin, the visual module attains an accuracy of 93.5%. Although this is slightly lower than the empty-state recognition accuracy, it remains at a high level considering the complex and dynamic environments. This reflects the robustness and engineering viability of the module. Benchmark tests show that the vision model exhibits strong real-time performance on the LubanCat 2 platform. For a 640 × 640 resolution input image, the average inference time is 45 ms per frame, with end-to-end processing (including preprocessing and postprocessing) achieving 22 FPS. Under varying load conditions, the inference latency ranges from 30 to 60 ms, with power consumption remaining stable at 6–8 W during extended operation. Compared to the CPU-only mode (latency exceeding 200 ms), the INT8 quantized model on the NPU achieves approximately 4–5 times acceleration. This performance validates the model’s efficiency on embedded hardware and ensures its seamless integration into the proposed multimodal fusion and self-calibration framework, meeting the real-time requirements for elevator safety monitoring.
In addition, approximately 5.2% of the total samples exhibited misclassification. Further analysis reveals that these misidentifications are caused by three categories of challenging scenarios (as illustrated in Figure 6):
Ghosting effects caused by reflections in the car’s mirrors.
Target missed detection due to severe occlusion by overcrowded passengers.
Extraneous pedestrians outside the elevator door being captured into the frame during door openings.
These specific misclassification scenarios account for a substantial proportion of the total errors and represent the primary factors limiting further improvements in the recognition accuracy. It is worth noting that the self-calibration function is only activated when the elevator is stationary, empty, and the doors are closed. Therefore, the misclassification types shown in Figure 6 do not have any substantive impact on the core functionality of the proposed system.
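The activation condition stated above (stationary, empty, doors closed) amounts to a simple conjunction of three signals. The sketch below is a hypothetical illustration of such a gate; the `CarState` structure and its field names are assumptions and are not taken from the system's actual software.

```python
from dataclasses import dataclass


@dataclass
class CarState:
    """Snapshot of the signals that gate self-calibration (hypothetical fields)."""
    passenger_count: int   # from the visual module
    doors_closed: bool     # from the door status signal
    is_stationary: bool    # from the elevator controller


def may_self_calibrate(state: CarState) -> bool:
    """Allow self-calibration only when the car is stationary, empty, and closed."""
    return (
        state.is_stationary
        and state.doors_closed
        and state.passenger_count == 0
    )


# A person seen through an open door cannot trigger calibration,
# because the doors-closed condition fails regardless of the count.
print(may_self_calibrate(CarState(0, True, True)))    # all conditions met
print(may_self_calibrate(CarState(0, False, True)))   # doors open
print(may_self_calibrate(CarState(2, True, True)))    # car occupied
```

Because door-opening scenes and crowded cars both fail at least one condition, the error categories listed above fall outside the states in which calibration can run.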
4.5. Effectiveness Evaluation of the Load Estimation Module
To assess the load estimation accuracy and stability of the system under varying operating conditions, this study designed simulation experiments for two typical scenarios. The process of passengers entering the elevator car was simulated by quantitatively adding and removing weights, thereby creating environments for “Stable empty loading experiments” and “Unstable empty loading experiments” to evaluate the system’s weight detection error. The specific descriptions are as follows.
4.5.1. Stable Empty Loading Experiment
This experiment simulates the typical condition where passengers enter after the elevator has remained empty and stationary for an extended period. It tests the system’s load estimation accuracy under the condition that the car-base rubber has fully rebounded, and the deformation response is stable. The specific test procedure is as follows: (1) Confirm that the elevator is in an empty and stationary state, and allow the system to remain idle for more than 10 min. (2) Record the current encoder value of the sensor as the empty-state baseline value ($h_0$) for the current test group. (3) Load standard weights equivalent to $F_{\text{true}}$ kg into the car within 5 s. (4) Wait until the system’s estimated output value stabilizes, then record the estimated load ($F_{\text{est}}$). (5) Unload the weights and wait for the sensor data to return to $h_0$. (6) Sequentially increase $F_{\text{true}}$ to 50 kg, 100 kg, 150 kg, …, 1000 kg; for each load level, record both the true load value ($F_{\text{true}}$) and the model’s estimated value ($F_{\text{est}}$). (7) Repeat the above procedure to obtain eight sets of data.
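This scenario can be used to verify the accuracy baseline of the model under relatively stable operating conditions. The piecewise linear interpolation model was employed for load estimation. The Mean Absolute Error (Mean AE) and Maximum Absolute Error (Max AE) metrics were used to evaluate the experimental results:
$$AE_j = \left| F_{\text{est},j} - F_{\text{true},j} \right|, \qquad \text{Mean AE} = \frac{1}{n} \sum_{j=1}^{n} AE_j, \qquad \text{Max AE} = \max_{1 \le j \le n} AE_j \tag{15}$$
where $F_{\text{true},j}$ and $F_{\text{est},j}$ are the true and estimated loads (kg) at the $j$-th load level, and $n$ is the number of load levels in a test group.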
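For reference, the two error metrics reduce to a few lines of code. The sketch below uses hypothetical load values purely to show the calculation; the function names are illustrative.

```python
def absolute_errors(true_kg, est_kg):
    """Per-level absolute error between estimated and true load (kg)."""
    return [abs(e - t) for t, e in zip(true_kg, est_kg)]


def mean_ae(true_kg, est_kg):
    """Mean Absolute Error over all load levels of one test group."""
    errors = absolute_errors(true_kg, est_kg)
    return sum(errors) / len(errors)


def max_ae(true_kg, est_kg):
    """Maximum Absolute Error over all load levels of one test group."""
    return max(absolute_errors(true_kg, est_kg))


# Hypothetical test group with three load levels.
true_kg = [50.0, 100.0, 150.0]
est_kg = [52.0, 97.0, 160.0]
print(mean_ae(true_kg, est_kg), max_ae(true_kg, est_kg))
```

Reporting both values is useful because the mean describes typical behaviour while the maximum bounds the worst single estimate, which is the quantity that matters for an overload decision.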
As shown in Figure 7, the experimental data indicate that in the low-load range (0–400 kg), the estimated values from all test groups closely match the actual load values, demonstrating that the deformation behavior of the car-base rubber within this range is approximately linear and that the load estimation model achieves a high degree of fitting accuracy. In contrast, in the high-load range (600–1000 kg), deviation in the estimated values is observed, which is primarily caused by the nonlinear physical deformation characteristics of the rubber material, as well as by compounded environmental disturbances.
As summarized in Table 2, the Mean AE across all test groups falls within the range of 9.11 to 16.9 kg, while the Max AE ranges from 30.3 to 47.45 kg, predominantly occurring in the high-load interval of 800–1000 kg. It is worth emphasizing that even under conditions where estimation errors are more pronounced, the maximum observed error remains below the typical single-passenger weight standard defined in the elevator industry. This result confirms the strong engineering applicability and safety redundancy of the proposed load estimation model.
4.5.2. Unstable Empty Loading Experiment
This experiment simulates scenarios where passengers re-enter the elevator immediately after short unloading periods. Under such conditions, the car-base rubber lacks sufficient rebound time, causing hysteresis in the displacement sensor response that may introduce estimation deviations. The specific test procedure is as follows: (1) Confirm that the elevator is in an empty stationary state, and record the draw-wire sensor reading as the initial height ($h_0$). (2) Load a preload mass $M_p$ (kg) into the car, and maintain the preload for a dwell time $t_p$ (min). (3) Unload completely; then, reload a test mass $F_{\text{true}}$ (kg) within 5 s, and record the system’s estimated output. (4) Repeat the above procedure to obtain 40 sets of data.
This scenario evaluates the impact of elastic hysteresis on the estimation accuracy. The evaluation uses the absolute error (AE) and Max AE metrics defined in Equation (15).
As shown in Figure 8, the experimental data reveal that the larger the preload mass and the longer the preload duration, the more pronounced the deviations in the estimated values across different test groups. The AE for each group is presented in Figure 9, where the maximum error reaches 46.1 kg under the extreme condition (a large preload mass combined with an extended preload duration).
The results indicate that when the preload mass is small and the preload duration is short, the load estimation exhibits higher accuracy and remains consistently stable. Under more extreme conditions—specifically large preload masses combined with extended durations—the estimation error increases slightly. This phenomenon reflects the elastic hysteresis effect of rubber materials, wherein the deformation is not fully recovered immediately after unloading, leading to a shift in sensor readings and a subsequent reduction in estimation accuracy during reloading. Nevertheless, the overall magnitude of the estimation errors remains within acceptable industry limits, consistently falling below the standard body weight of a single adult. In summary, this section confirms both the effectiveness and robustness of the load estimation module under dynamic and complex loading fluctuations. Moreover, when applied to overload warning scenarios, the residual estimation error can be further mitigated through appropriate adjustment of the alarm threshold, enabling the system to maintain a relatively accurate and reliable overload alerting performance.
4.6. Effectiveness Evaluation of Self-Calibration Module
To quantitatively assess the impact of the car-base rubber aging on overload warning thresholds and verify the mitigation effect of self-calibration on premature false alarms, an experiment was designed with the following procedure: (1) Load a calibration mass $F_0$ (kg) into the empty and stable car to calibrate the absolute height as the original overload threshold ($h_{\text{th}}$); then, unload the mass. (2) Compress the car-base rubber under a preload $M_p$ (kg) for 17 continuous hours per load to induce material fatigue. (3) After complete unloading, reload the car until reaching height $h_{\text{th}}$; then, record the actual load as $F_1$, and calculate the error $\Delta F_1 = F_0 - F_1$. (4) Activate the self-calibration protocol to generate a compensated threshold $h'_{\text{th}}$. (5) Reload the car to height $h'_{\text{th}}$, record the calibrated load $F_2$, and calculate the residual error $\Delta F_2 = F_0 - F_2$. The evaluation metrics are described as follows:
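Premature Alarm Rate (PAR)
$$PAR = \frac{\Delta F_1}{F_0} \times 100\%$$
where $\Delta F_1$ is the load error at the original overload threshold before calibration, and $F_0$ is the load at which that threshold was originally calibrated. This quantifies severity of premature warnings (higher values indicate worse performance).
Calibrated Residual Alarm Rate (CRAR)
$$CRAR = \frac{\Delta F_2}{F_0} \times 100\%$$
where $\Delta F_2$ is the residual load error at the compensated threshold. This measures the residual error after calibration.
Calibration Improvement Rate (CIR)
$$CIR = \frac{\Delta F_1 - \Delta F_2}{\Delta F_1} \times 100\%$$
This quantifies the mitigation efficacy (higher values indicate better calibration).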
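The three rates can be illustrated with a short calculation. The sketch assumes that PAR and CRAR normalize the pre- and post-calibration load errors by the load at which the overload threshold was originally set, and that CIR is the relative reduction in error; these forms and all numbers below are hypothetical illustrations, not values from Table 3 or Table 4.

```python
def alarm_rates(f0_kg, err_before_kg, err_after_kg):
    """Return (PAR, CRAR, CIR) in percent.

    f0_kg         : load at which the original overload threshold was set (kg)
    err_before_kg : load error at the original threshold after aging (kg)
    err_after_kg  : residual load error at the compensated threshold (kg)
    """
    par = 100.0 * err_before_kg / f0_kg
    crar = 100.0 * err_after_kg / f0_kg
    cir = 100.0 * (err_before_kg - err_after_kg) / err_before_kg
    return par, crar, cir


# Hypothetical case: threshold set at 1000 kg, 80 kg error before
# calibration, 20 kg residual error afterwards.
par, crar, cir = alarm_rates(1000.0, 80.0, 20.0)
print(f"PAR={par:.1f}%  CRAR={crar:.1f}%  CIR={cir:.1f}%")
```

Under these definitions CIR can equivalently be written as (PAR − CRAR) / PAR, so it does not depend on the normalizing load.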
As evidenced by the data in Table 3, a higher preload is associated with more severe car-base rubber aging, resulting in a larger pre-calibration error (reaching up to 99 kg). This trend suggests that sustained high-load conditions significantly accelerate the elastic degradation of the rubber material, causing a substantial drift in the originally set overload threshold. This drift increases the likelihood of early triggering of the overload warning mechanism, potentially resulting in frequent false alarms during normal operation. Following self-calibration, these errors decreased significantly by 31 kg, 38 kg, and 59 kg, respectively, demonstrating more pronounced improvement under severe aging conditions.
As shown in Table 4, the PAR rises significantly with increasing aging severity, peaking at 12.38%. In contrast, the CRAR is successfully constrained below 5% across all cases. Furthermore, the CIR consistently exceeds 50%, reaching a maximum of 75.6%, highlighting the mechanism’s sensitivity and corrective capability even under mild fatigue conditions.
Overall, the experimental results confirm that the self-calibration function provides stable and substantial mitigation of premature overload warnings caused by car-base rubber fatigue. It effectively reduces the risk of overload misjudgments due to material aging, thereby validating the reliability, technical soundness, and engineering applicability of the proposed self-calibration module.
5. Conclusions and Outlook
This study presents an intelligent elevator weighing and warning system based on multimodal sensing, integrating draw-wire displacement sensing with visual perception. The core innovations encompass a load estimation algorithm, vision-based passenger recognition, and a self-calibration mechanism for mitigating overload false alarms. The comprehensive experimental validation demonstrates the following:
98.9% accuracy in identifying empty states;
93.5% accuracy for passenger counting;
<5% maximum load estimation error relative to the rated capacity;
<5% residual false alarm rate after self-calibration implementation.
This system achieves the real-time intelligent perception of cabin load status and passenger occupancy, effectively resolving the industry-wide challenge of premature overload warnings caused by car-base rubber aging.
Despite these advancements, certain limitations warrant further investigation:
The visual recognition module exhibits reduced robustness under strong specular reflections and extreme occlusion scenarios, necessitating enhanced generalization capabilities.
The load estimation accuracy experiences minor degradation during rapid passenger ingress/egress cycles, suggesting potential optimization in dynamic modeling.
These findings establish a critical technological pathway for multimodal data fusion and real-time inference on resource-constrained embedded platforms. The proposed methodology significantly elevates elevator operational safety and transport efficiency while providing a practical foundation for intelligent safety maintenance within the vertical transportation industry, thereby propelling the evolution of smart elevator systems.