Article

A Computer Vision-Based Pedestrian Flow Management System for Footbridges and Its Applications

College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310027, China
*
Author to whom correspondence should be addressed.
Infrastructures 2025, 10(9), 247; https://doi.org/10.3390/infrastructures10090247
Submission received: 19 June 2025 / Revised: 12 September 2025 / Accepted: 15 September 2025 / Published: 17 September 2025

Abstract

Urban footbridges are critical infrastructure increasingly challenged by vibration issues induced by crowd activity. Real-time monitoring of pedestrian dynamics is essential for evaluating structural safety, ensuring pedestrian comfort, and enabling proactive management. This paper proposes a lightweight, fully automated computer vision system for real-time monitoring of crowd dynamics on footbridges. The system integrates object detection, multi-target tracking, and monocular depth estimation to precisely quantify key crowd metrics: pedestrian flow rate, density, and velocity. Experimental validation demonstrated high performance: Flow rate estimation achieved 92.7% accuracy; density estimation yielded a 2.05% average relative error; and velocity estimation showed an 8.7% average relative error. Furthermore, the system demonstrates practical utility by successfully categorizing pedestrian behaviors using velocity data and triggering timely warnings. Crucially, field tests confirmed a minimum error of 5.56% between bridge vibration simulations driven by the system’s captured crowd data and physically measured acceleration data. This high agreement validates the system’s capability to provide reliable inputs for structural assessment. The proposed system establishes a practical technological foundation for intelligent footbridge management, focusing on safety, comfort, and operational efficiency through real-time crowd insights and automated alerts.

1. Introduction

Modern footbridges, while embodying aesthetically pleasing slender and lightweight forms, are increasingly susceptible to vibration issues induced by pedestrian activity. Excessive vibrations not only compromise pedestrian comfort [1,2] but can also threaten structural serviceability. The inherent variability of pedestrian loads, influenced by factors such as behavior (e.g., walking, running, jumping) and crowd density [3,4], poses significant challenges for real-time monitoring and prediction using traditional manual inspections or conventional sensors. Particularly during high-density events or instances of vigorous activity like running or jumping, the limitations of these approaches can lead to inadequate safety management and potential incidents [5]. Consequently, real-time monitoring and predictive management of pedestrian flows are paramount for ensuring the safety and serviceability of footbridges during normal operation.
However, significant gaps persist in applying computer vision (CV) specifically to vibration-sensitive footbridge management. While CV has made substantial progress in related areas, existing solutions exhibit limitations for this critical application:
  • Crowd Monitoring Systems: Most existing CV-based crowd monitoring and early warning systems are primarily designed for large-scale events or scenic area management, exhibiting insufficient accuracy for bridge environments [6,7].
  • Pedestrian Behavior Analysis: Research on pedestrian behavior analysis using CV has predominantly been applied in autonomous driving [8,9], which differs substantially from footbridge application scenarios.
  • Structural Response Extraction: Within the footbridge research field, CV has demonstrated substantial progress in structural response extraction [10], including acceleration measurement by Dong et al. [11], displacement influence line identification by Martini et al. [12], and 3D displacement measurement by Sun et al. [13]. While foundational, this work often addresses the symptom (vibration) rather than the root cause (pedestrian loading dynamics) in real time.
Generally, current pedestrian monitoring solutions for vibration-sensitive footbridges face persistent challenges including suboptimal operational efficiency, limited real-time processing capabilities, prohibitive implementation costs, and delayed response characteristics. This technological gap necessitates the development of intelligent monitoring solutions that balance cost-effectiveness with operational reliability to ensure structural integrity and vibration comfort compliance.
Therefore, this study aims to bridge this gap by developing a novel, lightweight, and fully automated computer vision framework specifically designed for real-time pedestrian flow management on vibration-sensitive footbridges. The key advancements and novelty of our approach lie in:
  • Integrated Multi-Algorithm Pipeline: The system integrates an occlusion-optimized YOLOv8 detector, the efficient ByteTrack tracker, and a robust monocular depth estimation module based on bridge geometry into a unified, real-time system for simultaneous and precise quantification of the three critical pedestrian flow metrics (flow rate, density, and speed) directly on the bridge deck.
  • Real-Time Structural Risk Assessment: Moving beyond mere parameter estimation, the system directly links the extracted crowd dynamics (density, speed) to structural safety assessment. It employs rule-based classification to categorize density levels and locomotion modes (stationary, walking, running), triggering timely visual and data-logged warnings for potential risks like overcrowding or excessive running.
  • Validation for Structural Input: A core novelty is the demonstrated capability to provide reliable inputs for structural vibration models. Through rigorous field testing, it shows a strong correlation between bridge vibration simulations driven solely by our system’s captured crowd data and physically measured accelerations, validating the system’s output as a trustworthy source for predictive structural assessment.
  • Practical Implementation Focus: The entire system is designed for practicality, utilizing lightweight models (YOLOv8n) and demonstrating low latency on cost-effective hardware. The deployment only needs standard surveillance cameras with stable internet connectivity, making it suitable for quick and easy deployment and upgrades on operational bridges.

2. Literature Review and Related Works

The management of vibration-sensitive footbridges necessitates the precise monitoring of pedestrian dynamics, as the core quantitative metrics—flow rate, density, and speed [14,15]—directly influence structural dynamic performance. Elevated flow rates or densities can amplify bridge vibrations, while variations in pedestrian velocity directly correlate with structural acceleration responses, making these parameters crucial inputs for predictive vibration models within safety management frameworks [16]. This section reviews the state-of-the-art in pedestrian monitoring, computer vision (CV) techniques for object detection and tracking, and their applications in infrastructure management, thereby identifying the research gap this study aims to address.
The classical hydrodynamic relation among pedestrian flow parameters (flow rate = density × speed) is well-established but can lead to inaccuracies and data loss under high-density traffic conditions [17,18]. Traditional monitoring methods, such as manual inspections or conventional sensors, struggle with the inherent variability of pedestrian loads and often exhibit suboptimal efficiency, limited real-time capabilities, and delayed response characteristics [5]. These limitations pose significant challenges for ensuring the safety and serviceability of footbridges, particularly during events inducing vigorous activities like running or jumping.
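As a point of reference for the metrics discussed later, the classical relation (flow rate = density × speed) can be sketched in a few lines of Python; the function name and the example numbers here are illustrative, not values from the field test.

```python
def flow_rate(density_p_per_m2: float, speed_m_s: float, width_m: float) -> float:
    """Classical hydrodynamic relation: flow rate (persons/s) across a
    section of given width, from density (persons/m^2) and speed (m/s)."""
    return density_p_per_m2 * speed_m_s * width_m

# Hypothetical example: 0.5 persons/m^2 walking at 1.2 m/s on a 3 m wide deck
q = flow_rate(0.5, 1.2, 3.0)  # 1.8 persons/s
```

Under high-density conditions this relation breaks down (speed is no longer independent of density), which is one reason direct measurement of all three quantities is preferred.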
Computer vision technology has emerged as a transformative solution, offering advantages in operational cost-effectiveness, continuous real-time processing, and minimized human intervention [19]. Consequently, it has been increasingly adopted for footbridge monitoring [20], enabling automated detection of pedestrian dynamics and crowd flow regulation through video analytics. Significant progress has been made in algorithms for complex scenarios, with deep learning becoming the dominant paradigm in Multi-Object Tracking (MOT) [21]. Current deep learning-based MOT methods are primarily categorized into Detection-Based Tracking (DBT) and the emerging Joint Detection and Tracking (JDT) paradigm.
The Detection-Based Tracking (DBT) framework, or Tracking-by-Detection, remains prevalent. It involves sequential stages: object detectors (e.g., CNN or R-CNN families) identify targets in each frame [22], followed by feature extraction, similarity computation, and data association across frames. Early DBT methods like SORT [23] combined Kalman filters with the Hungarian algorithm for high efficiency but suffered from identity switches during occlusions. DeepSORT [24] improved upon this by incorporating deep appearance descriptors (Re-Identification), significantly enhancing identity preservation at the cost of increased computational overhead. Subsequent works like the Deep Affinity Network (DAN) [25] explored end-to-end learning of appearance similarity for more robust association. Wei Chen et al. [26] systematically analyzed challenges in occlusion and multi-scale pedestrian detection, highlighting the effectiveness of feature extraction methods in the YOLO series. Y. Zhang et al. [27] proposed the BYTE tracking algorithm, which demonstrates superior performance in associating detection boxes in cluttered and occluded environments.
Beyond tracking, CV has been applied to broader crowd analysis challenges. Sohn et al. [28] developed methods for real-time prediction of numerous crowd behavior scenarios in complex environments, addressing the computational inefficiency of traditional simulations. Zhang et al. [29] proposed AMS-Net, an attention-guided multi-scale fusion network for dense crowd counting. The integration of CV into construction and infrastructure management is also evident [30]. For instance, Chou et al. [31] proposed a UAV-based bridge inspection system using an optimized YOLOv7 for automated deterioration detection. Li et al. [32] and Wu et al. [33] developed frameworks combining CV with ontology and semantic reasoning for on-site safety hazard identification and management. These studies demonstrate the established technical foundation and notable advantages of computer vision in intelligent monitoring applications.
However, a significant gap persists in the application of CV specifically for vibration-sensitive footbridge management. Existing solutions often address isolated aspects: general crowd analysis, structural vibration measurement (the symptom), or pedestrian behavior recognition for unrelated applications. There is a pronounced lack of integrated, real-time systems that directly link high-fidelity, real-time pedestrian dynamics data (flow, density, speed) to structural performance assessment for operational footbridges. This critical gap, where the root cause (pedestrian loading) is not seamlessly connected to the structural effect (vibration) in a practical management system, motivates the development of the lightweight, multi-algorithm pipeline proposed in this study.

3. Research Methodology

3.1. Pedestrian Management Objectives

Flow rate, density, and speed are the critical physical parameters in pedestrian flow management.
Pedestrian flow rate is typically defined as the number of individuals passing through a counting line per unit time. This parameter can be calculated by computer vision detection and recognition of bidirectional pedestrian counts within a specified timeframe, enabling the system to make safety management decisions based on real-time flow rate analysis.
Pedestrian density refers to the number of persons per unit area. Excessive density may lead to bridge congestion, stampede risks, and structural safety hazards [34]. Following the German standard EN03 for pedestrian traffic classification, the system categorizes density levels into three tiers as shown in Table 1 to implement risk management and early warnings.
Pedestrian speed serves as the basis for classifying locomotion modes. Different behaviors induce distinct structural responses. In this study, pedestrian behaviors are categorized into three types: stationary, walking, and running. Referencing the Transportation Engineering Manual and considering actual footbridge conditions, speeds < 0.5 m/s are classified as stationary. Speeds of 0.5–2.5 m/s are identified as walking and speeds > 2.5 m/s are defined as running.
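The three-tier locomotion classification described above can be expressed as a minimal sketch; the thresholds are those given in this section, while the function name is ours:

```python
def classify_locomotion(speed_m_s: float) -> str:
    """Classify a pedestrian's locomotion mode from speed (m/s).
    Thresholds per Section 3.1: <0.5 stationary, 0.5-2.5 walking, >2.5 running."""
    if speed_m_s < 0.5:
        return "stationary"
    if speed_m_s <= 2.5:
        return "walking"
    return "running"
```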

3.2. Pedestrian Monitoring System Design

The system architecture of the pedestrian flow analysis platform, as depicted in Figure 1, is structured with three specialized components.

3.2.1. Visual Perception Layer

Cameras continuously capture bridge deck footage, transmitting video streams to lightweight YOLOv8 networks for real-time preprocessing before routing processed metadata to the analytics tier.
YOLO [35] is a deep learning-based object detection architecture that analyzes the full image in a single pass to achieve real-time object detection. The YOLOv8 iteration improves object recognition accuracy and system robustness through a backbone network incorporating Cross Stage Partial (CSP) modules [36], as shown in Figure 2. ByteTrack improves upon the conventional SORT and DeepSORT methods by significantly enhancing tracking precision and efficiency [37,38]. Integrated with YOLOv8, ByteTrack receives pedestrian detection boxes and matches corresponding targets across consecutive frames, enabling real-time pedestrian tracking on footbridges.
To improve occlusion robustness in YOLOv8-based detection, this study incorporates the CUHK Occlusion Dataset [39] with cross-domain surveillance data through systematic dataset fusion. The combined dataset, restructured into YOLO-compatible format, comprises diverse occlusion scenarios and is randomly partitioned into training, validation, and test sets at a 7:2:1 ratio. A 300-epoch data-augmented training regimen is implemented.
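The 7:2:1 random partition can be sketched as follows; this is an illustrative stand-in, not the authors' actual data pipeline, and the seed value is arbitrary:

```python
import random

def split_dataset(items, seed=42, ratios=(0.7, 0.2, 0.1)):
    """Randomly partition a list of image paths into train/val/test
    subsets at the 7:2:1 ratio used for the combined occlusion dataset."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```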
Pedestrian spatial coordinates are reconstructed through camera calibration and monocular ranging, enabling velocity analysis and abnormal behavior detection (e.g., running, crowding). Assuming the bridge deck is flat and pedestrians always maintain contact with the bridge surface during movement, with the contact point representing the pedestrian’s coordinates on the bridge deck, the impact of the bridge’s pre-camber on the calculation is neglected. Additionally, the fluctuation in the center of gravity caused by walking or running is ignored, simplifying the position estimation. It is assumed that the lowest point of the YOLO detection box aligns with the pedestrian’s foot position, and the pedestrian’s spatial coordinates can be represented by the midpoint of the lowest point of the detection box.
After applying the above basic assumptions, solving the pedestrian’s spatial position is transformed into determining the position of any given point on the bridge deck. Since the position of the surveillance camera is relatively fixed with respect to the bridge deck, and the width of the bridge deck remains consistent, a monocular distance estimation algorithm based on geometric relationships can be constructed as shown in Figure 3.
Zhang’s camera calibration method [40] is employed to derive the intrinsic matrix K and the extrinsic parameters [R | t], where R denotes rotation and t represents translation. Upon this basis, a monocular ranging algorithm is developed using bridge geometry. By establishing the proportional relationship between the physical bridge width D and its pixel width d in images, the pedestrian depth z is calculated via Equation (1):
z = f_x · D / d(v)
In this formula, z is the depth information (m), f_x is the focal length of the camera in the x-direction (px), D is the bridge deck width (m), and d(v) is the bridge deck pixel width at the pixel coordinates (u, v) (px).
The pixel coordinates (u, v) combined with depth z are substituted into Equation (2) to derive camera coordinates (X_c, Y_c, Z_c). These coordinates are then transformed into bridge spatial coordinates (X_w, Y_w, Z_w) via Equation (3):
[X_c, Y_c, Z_c]^T = z · K^(−1) · [u, v, 1]^T
[X_w, Y_w, Z_w]^T = R · [X_c, Y_c, Z_c]^T + t
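Equations (1)–(3) can be illustrated with a small Python sketch. For a standard pinhole intrinsic matrix K, the product z · K⁻¹ · [u, v, 1]ᵀ reduces to the closed form used below; all numeric values in the usage example are hypothetical, not calibration results from the field test.

```python
def depth_from_bridge_width(fx, D, d_v):
    """Eq. (1): depth z (m) from focal length fx (px), physical deck
    width D (m), and the deck's pixel width d(v) at the target's row."""
    return fx * D / d_v

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Eq. (2): back-project pixel (u, v) at depth z using pinhole
    intrinsics; equivalent to z * K^-1 * [u, v, 1]^T for a standard K."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def camera_to_world(p_cam, R, t):
    """Eq. (3): map camera coordinates into bridge coordinates with
    rotation matrix R (3x3 nested lists) and translation t (3-tuple)."""
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i]
                 for i in range(3))

# Hypothetical usage: fx = 1000 px, 3 m deck spanning 300 px -> z = 10 m
z = depth_from_bridge_width(1000.0, 3.0, 300.0)
p = pixel_to_camera(100.0, 50.0, z, 1000.0, 1000.0, 0.0, 0.0)
```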

3.2.2. Data Analytics Layer

This tier ingests preprocessed sensory data through message queues, transforming raw inputs into pedestrian flow metrics (including flow rate, density, and velocity) using multi-algorithm pipelines. A rule-based classification engine applies manually configured threshold parameters to these metrics, categorizing flow states into operational levels.
Pedestrian flow rate estimation monitors and manages the crowd flow in real time by counting the number of pedestrians passing through a given area within a unit of time. In this algorithm, after manually selecting the determination line, the relative position between the pedestrian and the determination line is calculated using the cross product of the pedestrian position vector and the determination line vector. By comparing the change in the pedestrian’s position after crossing the determination line, the counter is dynamically updated, as shown in Figure 4. The number of pedestrians moving upward or downward can be accurately calculated, allowing for real-time recording of the number of pedestrians crossing the decision line.
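The cross-product test for line crossing can be sketched as below. The labels "up" and "down" simply denote the two sides of the counting line, and all names are illustrative rather than the system's actual implementation:

```python
def side_of_line(p, a, b):
    """2-D cross product (b-a) x (p-a): its sign tells which side of
    the counting line a-b the point p lies on (0 means on the line)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def update_counts(prev, curr, a, b, counts):
    """Increment a directional counter when a tracked pedestrian's
    position moves from one side of the line a-b to the other."""
    s0, s1 = side_of_line(prev, a, b), side_of_line(curr, a, b)
    if s0 < 0 <= s1:
        counts["up"] += 1      # crossed from negative to positive side
    elif s0 > 0 >= s1:
        counts["down"] += 1    # crossed from positive to negative side
```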
Pedestrian density estimation is a key component of pedestrian flow management, playing an important role in assessing congestion, ensuring safety, and other aspects. By combining pedestrian behavior recognition and trajectory prediction, the system can assess the pedestrian density within a given area in real time and establish a risk warning mechanism that integrates pedestrian behavior and density levels to avoid potential safety hazards. Using the pedestrian spatial coordinates determined earlier, the number of pedestrians N within a self-defined detection area is counted. Once the world coordinates are obtained, the area of the polygon A can be calculated using Gauss’s area formula, as shown in Equation (4) below.
A = (1/2) · |Σ_{i=1}^{n} (x_i · y_{i+1} − x_{i+1} · y_i)|,  with (x_{n+1}, y_{n+1}) = (x_1, y_1)
where (x_i, y_i) are the Cartesian coordinates of the polygon vertices.
This calculation method ensures that even if the shape of the monitoring area is complex, the algorithm can still accurately compute its area. Finally, the pedestrian density Dp is computed with Equation (5) as follows:
D_p = N / A
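Equations (4) and (5) can be checked with a short sketch; the shoelace form below is the same Gauss area formula written out, and the function names are ours:

```python
def polygon_area(pts):
    """Gauss (shoelace) area formula, Eq. (4).
    pts: list of (x, y) vertices in order around the polygon."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def pedestrian_density(n_pedestrians, pts):
    """Eq. (5): density D_p = N / A (persons/m^2)."""
    return n_pedestrians / polygon_area(pts)

# Hypothetical 5 m x 7 m zone (35 m^2) holding 15 people -> ~0.43 persons/m^2
d = pedestrian_density(15, [(0, 0), (5, 0), (5, 7), (0, 7)])
```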
Pedestrian speed is calculated from pedestrian tracking data and spatial coordinates: the system analyzes the changes in pedestrian coordinates between consecutive frames to obtain the displacement vector.
To improve the stability of the calculations, the algorithm ultimately uses a linear Kalman filter (LKF) to estimate the target’s motion state. The LKF is preferred over nonlinear alternatives because pedestrians on footbridges typically exhibit near-constant velocity between consecutive frames (Δt = 0.04 s at 25 fps) and because of its computational efficiency, which is critical for real-time operation. A projection method is used for velocity constraint handling. In this algorithm, speed can be classified into three levels: “stationary,” “walking,” and “running.” Based on these levels, the system not only displays the real-time speed on the image but also intuitively shows the pedestrian’s current movement state through text prompts (e.g., “walking” or “running”). When the predicted velocity v_{k|k−1} exceeds the physical limits, constraints are applied, as shown in Equation (6).
v_k^con = v_max · v_{k|k−1} / ‖v_{k|k−1}‖,  if ‖v_{k|k−1}‖ > v_max
v_k^con = 0,                                if ‖v_{k|k−1}‖ < v_min
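A minimal sketch of the projection constraint in Equation (6), assuming a 2-D velocity state; the threshold values in the test case are hypothetical, not the system's configured limits:

```python
import math

def constrain_velocity(vx, vy, v_max, v_min):
    """Projection constraint per Eq. (6): a predicted velocity whose
    magnitude exceeds v_max is rescaled onto the limit; one below
    v_min is zeroed (treated as stationary); otherwise it is kept."""
    speed = math.hypot(vx, vy)
    if speed > v_max:
        k = v_max / speed
        return (vx * k, vy * k)
    if speed < v_min:
        return (0.0, 0.0)
    return (vx, vy)
```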

3.2.3. Decision Support Layer

This tier aggregates multi-source inputs from analytics modules and generates composite visualizations through frame-overlay techniques, superimposing real-time metrics (tracking trajectories and warning markers) onto surveillance footage. Storage archives structured alert records with timestamped annotations in relational databases. The integrated decision support interface features interactive dashboard visualization and incident report generation capabilities, enabling rapid emergency response coordination while maintaining continuous status monitoring.
To visually present the pedestrian flow counting results, the algorithm instantly displays the counting line and pedestrian count text on the image frame as shown in Figure 5a. Subsequently, it renders pedestrian density estimation results in Figure 5b. Pedestrian speeds will be marked next to the tracking boxes, showing both the calculated speed results and the locomotion modes as shown in Figure 5c.

3.3. System Field Test Overview

The field test is located on a footbridge across a canal, with a span configuration of (15.9 + 150 + 15.9) meters. The test uses a total of 2 monitoring cameras, with one camera and one acceleration sensor installed on each of the main towers on the east and west sides, at a total cost of around $150. These cameras record video at 1920 × 1080 resolution and 25 fps, with integrated loudspeakers enabling remote verbal intervention during anomalies to maintain deck order; technical details are given in Table 2. Ten triaxial accelerometers were pre-installed on the bridge to validate experimental data. The sensors are distributed across 5 measurement locations, with one sensor mounted symmetrically on each side of the bridge at every location. Technical specifications of the accelerometers are provided in Table 3. The sensor layout and camera placement are illustrated in Figure 6.
The cloud-based data processing equipment can be any computer. In this field test, a computer equipped with an RTX-4060Ti (8 GB) image processor, an Intel Core i5-13600KF processor, and 32 GB of RAM is used, running the Windows 11 64-bit operating system. The programming language used is Python 3.11, and the computational platform is CUDA 11.8. The main image processing and data analysis are performed in VS Code 1.88, ensuring efficient real-time monitoring and data processing capabilities.
The system streams surveillance video via the RTMP protocol to enable real-time pedestrian flow monitoring and analysis on the bridge. During initial system deployment, pedestrian density monitoring zones are manually configured. Thereafter, the system operates autonomously, continuously detecting pedestrian movement on the bridge deck while visually presenting flow metrics (rate, velocity, density). Based on these inputs, it evaluates pedestrian flow and behavioral risks, generating corresponding alerts.

4. Results

4.1. Visual Perception Test

4.1.1. YOLOv8n Training Result

The model performance is quantified using four standard object detection metrics.
  • Precision (P): Proportion of true positives (TP) among all detected objects, measuring detection accuracy.
  • Recall (R): Proportion of actual pedestrians correctly identified, evaluating coverage capability.
  • mAP@50: Mean Average Precision at IoU threshold 0.5, assessing localization accuracy under moderate overlap requirements.
  • mAP@50–95: Average mAP across IoU thresholds 0.5 to 0.95 (step 0.05), evaluating robustness to strict localization criteria.
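For concreteness, the Intersection-over-Union (IoU) quantity underlying these metrics can be computed as follows; this is a generic sketch, not the evaluation code used in this study:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as
    (x1, y1, x2, y2). A detection counts toward mAP@50 when its IoU
    with a ground-truth box is at least 0.5."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```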
After 300-epoch training with 18,200 images, the performance of YOLOv8n is summarized below; detailed training curves are shown in Figure 7.
As demonstrated in Table 4, the YOLOv8n model trained on combined datasets achieves enhanced validation performance with precision = 0.911 (+2.6% vs. CUHK), recall = 0.857 (+11.7%), mAP@50 = 0.937 (+10.4%), and mAP@50–95 = 0.643 (+4.9%) compared to the CUHK-only baseline. This multi-dataset training strategy demonstrates great performance gains in occlusion handling, particularly evident in the 10.4% mAP@50 improvement, which directly enhances pedestrian detection reliability in crowded bridge scenarios.
The proposed model was evaluated on the CrowdHuman dataset [41], a standard benchmark for crowded pedestrian detection. As detailed in Table 5, the Combined-T-Model achieved a mAP@50 of 0.626, outperforming YOLOv8n by 23.0% and surpassing the CUHK-only model by 3.3%, while being 36% faster (10.81 it/s vs. 7.94 it/s). These results confirm that the multi-dataset training strategy improves both accuracy and efficiency, providing an optimal balance for real-time bridge monitoring.

4.1.2. Pedestrian Localization Test

The localization test is conducted using a laser rangefinder to measure ground-truth distances (Actual Displacement, AcD) between the target and the camera. Simultaneously, the system calculated the corresponding distances Z_xy. Forty measurement points are sampled across four critical columns: (1) the eastern walkway left edge, (2) the eastern walkway right edge, (3) the western walkway left edge, and (4) the western walkway right edge.
After excluding four laser measurement outliers (Points 10, 11, 39, 40), 36 valid data points are retained, as shown in Figure 8. Subsequent analysis yielded a root mean square error (RMSE) of 1.945 and a coefficient of determination (R²) of 0.945 between Z_xy and the measured AcD. This indicates that the monocular distance estimation algorithm can effectively predict the target distance, although a residual error of 5.5% remains.
Preliminary analysis indicates that measurement errors primarily stem from two linear sources: residual optical-axis miscalibration and height variations introduced by operator-held rangefinder positioning. After applying the least squares method to fit the corrected calculated distance Z_xy^cor, the RMSE between Z_xy^cor and the measured AcD is 0.452, and R² is 0.997. This demonstrates the correction model’s efficacy in eliminating systematic bias, achieving high accuracy that meets operational standards.
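A least-squares linear correction of this kind can be reproduced with a closed-form ordinary least squares fit; the sketch below is generic, not the authors' exact correction model, and the test data are synthetic:

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit y = a*x + b in closed form,
    suitable for removing a linear systematic bias from ranging data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Corrected distance: Z_cor = a * Z + b, with (a, b) fitted against AcD
```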

4.2. Pedestrian Analytics Test

4.2.1. Pedestrian Flow Count Test

The experimental validation comprised three pedestrian flow scenarios: 15-person queue formation, 30-person queue formation, and 30-person random movement. The three experimental scenarios are strategically chosen to:
  • Cover critical density thresholds defined by EN03 standards: In our 35 m2 monitoring area, 15-person formations achieve 0.43 pers/m2 aligning with EN03 Level 1 Sparse Traffic (<0.5 pers/m2). 30-person configurations reach 0.86 pers/m2, matching EN03 Level 2 Busy Traffic (0.5–1.5 pers/m2) thresholds.
  • Evaluate occlusion robustness: Queue formations test linear occlusion patterns. Random movement induces complex multi-directional occlusions.
  • Simulate real-world conditions observed on the monitored footbridge.
Each scenario underwent bidirectional crossing trials across the detection line. Algorithm-generated counts are compared against manual ground-truth data, with comparative results detailed in Table 6.
During validation, the algorithm maintained precise pedestrian counts across the detection line, achieving 92.7% overall accuracy against manual ground-truth data. Descending flows (top-to-bottom) demonstrated minimal error (e.g., Experiments 2, 4, 6), while ascending flows (bottom-to-top) exhibited greater deviation (e.g., Experiments 1, 3, 5), with accuracies ranging from 86.7% to 93.3%.

4.2.2. Pedestrian Density Estimation Test

After the tester delineates the target area on the pedestrian bridge, the following actions are taken: (1) simulate busy traffic, with a pedestrian density exceeding 0.5 persons/m2; (2) simulate crowded traffic, with a pedestrian density exceeding 1.5 persons/m2. Experimental results are quantified in Table 7.
Experimental results indicate a 2.05% mean relative error in density estimation. The algorithm demonstrates robust performance across varying density regimes, providing real-time density classification, visual status feedback, and automated warning triggers.
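The density classification driving these warnings can be sketched as a simple rule-based classifier; the tier boundaries follow the EN03 thresholds quoted in Section 4.2.1 (<0.5 sparse, 0.5–1.5 busy, above that crowded), while the labels and function name are illustrative:

```python
def density_level(d_persons_per_m2: float):
    """Map a measured pedestrian density (persons/m^2) to an EN03-style
    tier; boundary values assumed from Section 4.2.1."""
    if d_persons_per_m2 < 0.5:
        return (1, "sparse")
    if d_persons_per_m2 <= 1.5:
        return (2, "busy")
    return (3, "crowded")
```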

4.2.3. Pedestrian Speed Estimation Test

The pedestrian speed validation protocol comprises the following steps: a test subject traverses a 10 m bridge section at constant walking/running speeds under operator guidance. A timer records the movement duration to calculate the actual speed v_r, while the system simultaneously outputs the calculated speed v_c. A total of 30 sets of data are collected, as shown in Figure 9.
According to the test results, the mean absolute error of the walking speed v_c relative to the actual measured speed v_r is 0.20 m/s, with a mean relative error of 15.8%. During running tests, the mean absolute error is 0.44 m/s, with a mean relative error of 14.1%. These results meet the requirements for practical applications.

4.3. Full System Field Test

During operation, the system exhibits an end-to-end video processing latency of 10–20 ms, satisfying real-time monitoring requirements. Figure 10 shows two typical scenarios in field tests.

4.3.1. Acceleration Sensor Result

Acceleration data acquired from sensors deployed at quarter-span locations on both bridge girders are analyzed over a continuous 24 h monitoring period (09:30–09:30). The maximum acceleration in each 10 min interval is recorded.
As depicted in Figure 11, two distinct acceleration peaks are observed corresponding to pedestrian activity patterns: evening (19:00–22:00), and early morning (05:00–08:00). The maximum vibration response occurred during the evening peak, with recorded accelerations of 0.0904 m/s2 and 0.0817 m/s2 around 8:00 p.m., indicating temporal alignment with pedestrian flow intensity variations.

4.3.2. Pedestrian Flow Rate Data

Pedestrian flow data from 9:30 a.m. on one day to 9:30 a.m. the next day is selected for analysis. This time period corresponds to a summer weekend with clear weather. After processing by the pedestrian flow management system, the detailed pedestrian flow results for this period are shown in Figure 12.

4.3.3. Pedestrian Density Data

Analysis of pedestrian density during peak hours (19:00–20:30) focused on a representative 10 m2 zone. By monitoring the changes during this period, significant fluctuations in pedestrian density can be observed. Although the pedestrian density is generally low for most of the time, occasional high-density occurrences of 0.387 persons/m2 are noted. The pedestrian density variation in the target area during this period is shown in Figure 13.

4.3.4. Pedestrian Speed Data

Further analysis of pedestrian speed data during the 7:00 p.m. to 8:30 p.m. peak period reveals distinct velocity distributions, with statistical metrics detailed in Table 8. It can be observed that most pedestrians have speeds concentrated between 0.42 m/s and 0.99 m/s. A small number of pedestrians have speeds near 0 m/s, while only a few exceed the 2.5 m/s running threshold.

5. Discussion

This section discusses the field test data by incorporating system-collected pedestrian metrics into finite element models for vibration analysis. Comparative evaluation between measured acceleration values and finite-element-derived acceleration data facilitates discussion of the congruence between system-generated pedestrian flow data and physical reality.

5.1. Field Test Discussion

The period from 7:00 p.m. to 7:50 p.m., when the pedestrian flow reached its maximum peak, is selected as a representative interval for system accuracy verification. A comparison between manual counting and system counting is conducted to verify the system’s accuracy in estimating pedestrian flow. The comparison is shown in Table 9.
The detection accuracy for upward traffic (94.7%) is slightly lower than that for downward traffic (98.3%). This difference is primarily due to the perspective principle, whereby objects appear larger when closer to the camera and smaller when farther away. As the target moves upward, it is closer to the camera, resulting in greater occlusion and reduced system reaction time, ultimately leading to detection errors. Moreover, this directional discrepancy also stems from the camera-facing orientation of pedestrians during descent, where facial features enhance target localization accuracy by mitigating occlusion and positional uncertainty errors. The overall bidirectional counting accuracy is 96.3%, indicating that the system accuracy is sufficient for practical application requirements.
Analysis of the pedestrian density distribution reveals an inverse relationship between density level and temporal occurrence frequency. The zero-density state (no pedestrians) is the most frequent, accounting for 85.79% of observed periods. Frequencies progressively decline with increasing density: 0.097 persons/m² occurs 9.52% of the time and 0.194 persons/m² 4.11%, while peak-density instances (0.388 persons/m²) register only 0.08%. This distribution pattern confirms a low pedestrian congregation density within the monitored bridge area.
The probability density graph of speeds for all pedestrians passing during this period is shown in Figure 14. Excluding the low-speed zone near 0 m/s, pedestrian speeds follow a concentrated, approximately normal distribution. For pedestrians meeting the running identification criterion of an average speed greater than 2.5 m/s, there are 10 instances, accounting for 0.90% of total pedestrian traffic during this period.
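A minimal sketch of the running identification criterion referenced above follows; the 2.5 m/s running threshold is stated in the text, while the standing cutoff and function names are assumed for illustration:

```python
# Sketch: classify a tracked pedestrian's locomotion mode from the mean
# speed of its trajectory. The 2.5 m/s running threshold is from the text;
# the standing cutoff is an assumed illustrative value.

RUNNING_MIN = 2.5    # m/s, running identification threshold
STANDING_MAX = 0.2   # m/s, assumed cutoff for a near-stationary pedestrian

def locomotion_mode(trajectory_speeds: list[float]) -> str:
    mean_speed = sum(trajectory_speeds) / len(trajectory_speeds)
    if mean_speed < STANDING_MAX:
        return "standing"
    if mean_speed > RUNNING_MIN:
        return "running"  # would trigger a real-time running warning
    return "walking"
```

Averaging over the whole trajectory, rather than using instantaneous speed, suppresses spurious warnings from single-frame tracking noise.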
Correspondingly, the system triggered running warnings 13 times. Manual verification revealed that three of these were false running alarms triggered not by running pedestrians but by individuals cycling rapidly through the monitoring area; the misdetection results are shown in Figure 15. This indicates a persistent limitation of the running behavior identification, namely distinguishing running pedestrians from fast-moving cyclists. From the perspective of abnormal behavior alerting, however, both running and cycling constitute elevated-risk activities, and since cycling is prohibited on this pedestrian bridge, such instances should be retained as valid warning events. Cycling occurrences, which are especially frequent during evening peak hours, therefore warrant system alerts to prompt managerial attention.
Overall, the field test on a real bridge project demonstrates that the pedestrian flow management system integrates successfully with the monitoring interface. The system achieves precise pedestrian tracking, enabling traffic counting, movement prediction, and scheduling support. During peak periods it uses monocular ranging to estimate density and speed, identifies crowding and running anomalies, and issues real-time warnings, effectively helping to prevent congestion and accidents.

5.2. FEA Analysis Discussion

The peak pedestrian density obtained from the measurements is 0.388 persons/m², with the most probable pedestrian speed being 0.77 m/s, corresponding to a step frequency of approximately 1.39 Hz. Referencing the EN03 German pedestrian bridge design guideline, these values are converted into a pedestrian load and input into the Midas finite element model for structural dynamic analysis using the Lanczos method. The finite element model and vertical load application are shown in Figure 16.
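The conversion step above can be sketched as follows. The 280 N vertical amplitude and the harmonic form p(t) = P·cos(2π·fs·t)·n′·ψ follow an EN03/HiVoSS-style load model, while the stride length and the equivalent-pedestrian and reduction coefficients (n′, ψ) below are placeholders that would have to be taken from the guideline's charts for the actual bridge:

```python
# Sketch of the measured-speed -> step-frequency -> harmonic-load conversion.
# The 280 N amplitude and harmonic form follow an EN03/HiVoSS-style model;
# stride length and the n_eq, psi arguments are illustrative placeholders.
import math

P_VERTICAL = 280.0      # N, vertical single-pedestrian harmonic amplitude
speed = 0.77            # m/s, most probable pedestrian speed (field data)
stride = 0.554          # m, assumed stride length, giving fs ~ 1.39 Hz
fs = speed / stride     # step frequency in Hz

def harmonic_load(t: float, n_eq: float, psi: float) -> float:
    """Distributed harmonic pedestrian load p(t) in N/m^2 (sketch):
    p(t) = P * cos(2*pi*fs*t) * n_eq * psi."""
    return P_VERTICAL * math.cos(2.0 * math.pi * fs * t) * n_eq * psi
```

The resulting time-varying distributed load is what gets applied to the finite element model nodes in the antisymmetric pattern described below.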
Under vertical pedestrian loading, the analysis focuses on the first-order antisymmetric vertical bending mode. After the finite element analysis, vibration acceleration time history curves are extracted at key sections located at 1/8, 1/4, 3/8, and 1/2 of the span. The acceleration responses at each section are shown in Figure 17.
As can be seen in Table 10, the maximum acceleration occurs at the 1/4 section, with a peak value of 0.0952 m/s². The location of the peak acceleration matches expectations, and the overall structural response is within a reasonable range. The relative errors between the theoretical and measured data are 17.48% for the first dataset (0.081 m/s²) and 5.56% for the second (0.090 m/s²); both values remain under the 0.15 m/s² comfort limit for footbridges specified in EN03.
The differences in relative error between the two datasets can be attributed to the actual positions of pedestrians during the field test. The two measurement points are located on opposite sides of the same section, approximately 4 m apart. The actual pedestrian excitation was likely closer to the second measurement point (Data Point 2), which explains why the peak acceleration there is higher and closer to the theoretical value. The first measurement point (Data Point 1) was farther from the pedestrians' actual path, resulting in a lower measured acceleration and hence the higher relative error. Other sources of error may include differences in pedestrian load distribution, deviations in the assumed load model, and environmental effects.
The agreement between the measured acceleration data and the finite element simulation results confirms the consistency of the system's pedestrian flow data with the structural response, indicating that the system can accurately identify pedestrian flow characteristics and has practical value for further integration with finite element analysis in bridge health monitoring.

6. Conclusions

This study presents a computer vision-based crowd analytics framework demonstrating three principal technical advantages over conventional approaches:
For pedestrian detection, the optimized YOLOv8n framework is deployed to enable real-time pedestrian localization and multi-target tracking capabilities. It achieves superior performance metrics with 0.911 precision, 0.857 recall, and 0.937 mAP@50, outperforming standard implementations by 2.6–10.4% across critical detection parameters. This enhanced architecture generates reliable trajectory data outputs for subsequent crowd flow analytics.
For pedestrian flow metrics, a series of algorithms is designed for flow rate, density, and speed estimation. Pedestrian flow is counted using a vector cross-product rule that eliminates duplicate counts, reaching 92.7% test accuracy and demonstrating high robustness. The density algorithm combines a customizable, adaptive region-of-interest configuration with Gaussian kernel density estimation, enabling dynamic threshold-based safety alerts through continuous spatial distribution monitoring, with an average relative error of only 2.05%. Speed estimation integrates Kalman-filtered trajectory smoothing, reducing the mean relative velocity error to 8.7% while supporting real-time abnormal behavior detection through velocity profile analysis.
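The vector cross-product counting rule can be sketched as follows (a minimal illustration with assumed names and line endpoints, not the deployed implementation): the sign of the 2-D cross product between the counting-line vector and a tracked point identifies which side of the line the pedestrian is on, a sign flip between consecutive frames registers one crossing, and counting each track ID only once eliminates duplicates.

```python
# Sketch of vector cross-product line-crossing counting with per-track-ID
# deduplication. Counting-line endpoints and helper names are assumed.

def side(a, b, p):
    """Sign of cross(b - a, p - a): +1 / -1 for the two half-planes, 0 on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

class LineCounter:
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.counted = set()            # track IDs already counted
        self.up = self.down = 0

    def update(self, track_id, prev_pt, curr_pt):
        s0 = side(self.a, self.b, prev_pt)
        s1 = side(self.a, self.b, curr_pt)
        if s0 != 0 and s1 != 0 and s0 != s1 and track_id not in self.counted:
            self.counted.add(track_id)  # count each pedestrian only once
            if s1 > 0:
                self.up += 1
            else:
                self.down += 1
```

For example, with a horizontal counting line from (0, 5) to (10, 5), a track moving from (3, 4) to (3, 6) increments the upward count once, and later crossings by the same track ID are ignored.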
For practical engineering applications, an integrated pedestrian flow management system is constructed and deployed. Finite element analysis verified the system's accuracy in identifying pedestrian flow characteristics, achieving a minimum error of 5.56% in acceleration results. The system is cost-effective, with total deployment costs of approximately $200, including $150 for camera purchase and installation and $50 for network equipment. This is significantly more affordable than alternatives such as LiDAR, which typically costs more than $2000, or the traditional sensors used in this case, which cost around $500.
Despite the promising results, the system has several limitations. First, its accuracy may be affected by environmental factors such as lighting conditions and visibility. For example, the system's performance in adverse weather (e.g., rain or fog) and low-light environments has not been fully explored, and these factors could impact detection reliability. The relative error of pedestrian speed estimation, which averaged 8.7%, may increase under such conditions. Additionally, the field test revealed that counting accuracy is slightly lower for upward traffic (94.7%) than for downward traffic (98.3%), which can be attributed to occlusion effects and camera angles.
Future work should focus on improving the system's robustness in adverse conditions by enhancing its ability to handle occlusions and low-visibility scenarios. Linking multiple cameras for secondary recognition and cross-validation would improve adaptability and accuracy in complex environments. Integrating data from other sensors (e.g., radar or thermal cameras) could provide more comprehensive monitoring and improve overall reliability. Another promising direction is developing an interface that integrates the system with finite element analysis (FEA): a seamless computational framework that automatically converts vision-extracted pedestrian parameters into finite element model inputs, enabling dynamic load application for structural assessment. Such integration would support real-time comparison of operational data against historical benchmarks, enhancing predictive maintenance through continuous evaluation of structural response and safety performance. Extending the framework to vehicular bridge monitoring is also promising, although further research is needed for weigh-in-motion (WIM) applications because visual methods have inherent limitations for accurate vehicle weight estimation. Future work will also include more comprehensive benchmarking on standard datasets.
Overall, the proposed methodology demonstrates enhanced pedestrian safety management for bridge infrastructure through predictive risk assessment capabilities, establishing a technical foundation for crowd-induced vibration monitoring aligned with structural safety thresholds. Field validation tests demonstrated the system’s capability to process pedestrian flow metrics and detect abnormal behaviors with rapid response characteristics, providing operational decision support for bridge safety management.

Author Contributions

Conceptualization, C.Z., J.W.; Methodology, Investigation, Visualization, Writing—original draft preparation: C.Z.; Experimental work, Software: C.Z., Y.J.; Supervision: J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions imposed by the owner of the bridge.

Acknowledgments

The authors acknowledge the following contributors from the College of Civil Engineering and Architecture, Zhejiang University: Rongqiao Xu’s research group for experimental assistance; Jinfeng Wang’s research team for supporting field testing; and Guannan Wang’s group members for algorithm development support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLO: You Only Look Once
CV: Computer Vision
RMSE: Root Mean Square Error
ROI: Region of Interest
FEA: Finite Element Analysis
AcD: Actual Distance
KDE: Kernel Density Estimation

Figure 1. The pedestrian management system architecture. Vision Perception (capturing and preprocessing data), Data Analysis (calculating pedestrian flow, density, speed), and Decision Support (providing alarm visualizations and timestamped records for further actions). The colored circles in Vision Perception schematically represent different conceptual layers of the neural network processing pipeline.
Figure 2. YOLOv8 detect model structure. The backbone performs feature extraction through convolutional layers (Conv), while the head handles concatenation (Concat) and Upsample to refine the features.
Figure 3. Monocular Distance Measurement Diagram. (a) shows the key components such as the bridge's boundary pixel width d and the coordinates (u, v) of the detected pedestrian within the image. (b) shows the relationship between the bridge's pixel width (d) and the actual bridge width (D). The camera focal length (fx) and depth (z) are used to calculate the real-world distance from the captured image.
Figure 4. Pedestrian counting using the vector cross product. In this setup, the pedestrian's movement direction is represented by a black vector, while the red vector represents the counting line.
Figure 5. Visualization of pedestrian flow counting, density estimation, speed calculation, and traffic alarms. (a) depicts the bidirectional counting system with the counting line marked in red; it tracks pedestrians moving in both directions. (b) shows a "crowded" density level triggering a "Crowded Traffic Alarm"; the green box marks the designated region of interest (ROI) for density measurement. (c) shows a running tester being monitored by the system, with the calculated speed and locomotion mode. The Chinese text in (b) is the original timestamp from the experimental data and is integral to the raw dataset.
Figure 6. Camera and Sensor Layout Diagram of the Pedestrian Bridge. The labels 1#–8# correspond to the bridge column numbers, from left to right.
Figure 7. Training results of the YOLOv8n model, displaying four performance curves: Precision, Recall, mAP@50, and mAP@50–95 over 300 epochs.
Figure 8. (a) Pedestrian position test with a laser distance meter; the scatter plot shows data points against the ideal line (red dashed). The calculated distances are all slightly above the actual distances, revealing a linear error in the distance measurement. (b) Corrected pedestrian position test results; the corrected data points align closely with the ideal line, indicating the distance measurements have been effectively adjusted for accuracy.
Figure 9. Speed test results: a scatter plot comparing measured versus calculated speed. The data points generally follow the ideal line (red dashed).
Figure 10. Two scenarios within the pedestrian flow management system, highlighting its early warning capabilities. (a) shows a "Busy in ROI" alarm triggered by the pedestrian density within the ROI. (b) shows a "Fast Running" alarm triggered by a pedestrian moving through at a speed above 2.5 m/s. The Chinese text in the figures is the original timestamp from the experimental data and is integral to the raw dataset.
Figure 11. Measured acceleration at the 1/4 section of the footbridge from 9:30 a.m. to 9:30 a.m. the next day. Acceleration peaks reach up to 0.0904 m/s² around 8:00 p.m.
Figure 12. Pedestrian flow rate from 9:30 a.m. to 9:30 a.m. the next day, showing "Down Flow" (blue bars) and "Up Flow" (red bars) in persons per hour. The maximum of 485 persons per hour occurred at 19:01, as indicated by the green dashed line.
Figure 13. Pedestrian density from 7:00 p.m. to 8:30 p.m. during bridge operational hours. The red dashed line marks the peak density of 0.387 persons/m², an observed instantaneous occurrence with four individuals simultaneously present within the representative 10 m² target area, while the green line indicates the average density of 0.019 persons/m².
Figure 14. Probability density graph of pedestrian speeds. The green shaded area represents the kernel density estimate (KDE), indicating that most pedestrians move at around 0.77 m/s, with a peak probability density value of 1.02.
Figure 15. (a,b) exemplify cases where cyclists are misidentified as runners. These three cycling instances resulted in three false alarms.
Figure 16. The FEA model of the bridge, along with the measured pedestrian vertical loading method. The pedestrian load is converted from a uniformly distributed load to node loads, which are then proportionally distributed. Based on the mode shapes, the loads are applied in an anti-symmetric manner to the regions corresponding to the measured pedestrian density. Arrows show the direction and points where the external load is applied.
Figure 17. Vertical vibration acceleration time history curves at the 1/8, 1/4, 3/8, and 1/2 cross sections. Each line represents the acceleration at these points over the 50 s period.
Table 1. Pedestrian Density Level Classification.
Traffic Level | Description | Pedestrian Density (persons/m²)
Level 1 | Sparse traffic | d < 0.5
Level 2 | Busy traffic | 0.5 ≤ d ≤ 1.5
Level 3 | Crowded traffic | d > 1.5
Table 2. Technical Specifications of Video Surveillance Camera.
Category | Specification
Sensor Type | 1/3″ Progressive Scan CMOS
Resolution | 1920 × 1080 (Full HD)
Frame Rate | 25 fps
Video Encoding | H.265/H.264
Network Protocols | IPv4/IPv6, HTTP, DNS, NTP, RTP
Audio Input | Built-in microphone
Audio Output | Built-in speaker
Table 3. Technical Specifications of Triaxial Accelerometer.
Category | Specification
Measurement | XYZ triaxial
Range | ±2 g
Accuracy | ±1 mg
Operating Temperature | −20 to +65 °C
Frequency Response | 0–200 Hz
Protection Rating | IP68
Protective Cover Material | Steel
Signal Cable | RS485 digital interface
Table 4. Training results after 300 epochs for different datasets.
Training Dataset | P | R | mAP@50 | mAP@50–95
Combined dataset | 0.911 | 0.857 | 0.937 | 0.643
CUHK | 0.888 | 0.767 | 0.849 | 0.613
Table 5. Performance Comparison of Different Models on the CrowdHuman Validation Dataset.

| Model | P | R | mAP@50 | mAP@50–95 | Speed (it/s) |
|---|---|---|---|---|---|
| YOLOv8n | 0.652 | 0.438 | 0.509 | 0.251 | 10.06 |
| YOLOv8s | 0.656 | 0.469 | 0.528 | 0.264 | 7.21 |
| Combined-T-Model | 0.735 | 0.504 | 0.626 | 0.287 | 10.81 |
| CUHK-T-model | 0.762 | 0.490 | 0.606 | 0.268 | 7.94 |
Table 6. Comparative Algorithmic vs. Manual Pedestrian Counts Across Bidirectional Scenarios.

| Number | Group | Direction | Actual Count (Persons) | Estimated Count (Persons) | Relative Accuracy |
|---|---|---|---|---|---|
| 1 | 15-person queue | Bottom to Top | 15 | 14 | 93.3% |
| 2 | 15-person queue | Top to Bottom | 15 | 15 | 100.0% |
| 3 | 30-person queue | Bottom to Top | 30 | 26 | 86.7% |
| 4 | 30-person queue | Top to Bottom | 30 | 29 | 96.7% |
| 5 | 30-person random | Bottom to Top | 30 | 27 | 90.0% |
| 6 | 30-person random | Top to Bottom | 30 | 29 | 96.7% |
Table 7. Pedestrian density test under busy and crowded scenarios.

| Number | Area (m²) | Actual Density (persons/m²) | Calculated Density (persons/m²) | Relative Error | Congestion Level |
|---|---|---|---|---|---|
| 1 | 4 | 0.75 | 0.7278 | 3.0% | Busy |
| 2 | 4 | 0.75 | 0.7536 | 0.5% | Busy |
| 3 | 4 | 1 | 0.9814 | 1.9% | Busy |
| 4 | 4 | 1 | 0.9724 | 2.8% | Busy |
| 5 | 2 | 1.5 | 1.503 | 0.2% | Crowded |
| 6 | 2 | 1.5 | 1.558 | 3.9% | Crowded |
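The relative errors in Table 7 are the absolute deviation of the calculated density from the ground-truth density, expressed as a percentage of the ground truth. A minimal sketch (function name is ours) that reproduces row 1:

```python
def relative_error(actual: float, calculated: float) -> float:
    """Relative error of a density estimate, as a percentage of the actual value."""
    return abs(calculated - actual) / actual * 100.0

# Row 1 of Table 7: actual 0.75, calculated 0.7278 persons/m^2
print(round(relative_error(0.75, 0.7278), 1))  # 3.0
```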
Table 8. Pedestrian speed data from 7:00 p.m. to 8:30 p.m.

| Statistical Factor | Value |
|---|---|
| Sample Size | 180,264 |
| Mean | 0.72 m/s |
| Standard Deviation | 0.42 m/s |
| Maximum Value | 3.68 m/s |
| 25% Percentile | 0.42 m/s |
| Median | 0.73 m/s |
| 75% Percentile | 0.99 m/s |
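Summary statistics of the kind reported in Table 8 can be computed directly from the per-sample speed estimates. The sketch below uses Python's standard library on a small hypothetical sample (the real dataset has 180,264 samples):

```python
import statistics

def speed_summary(speeds):
    """Summary statistics in the style of Table 8 (values in m/s)."""
    qs = statistics.quantiles(speeds, n=4, method="inclusive")
    return {
        "mean": statistics.mean(speeds),
        "stdev": statistics.pstdev(speeds),
        "max": max(speeds),
        "p25": qs[0],
        "median": statistics.median(speeds),
        "p75": qs[2],
    }

# hypothetical per-frame pedestrian speeds, m/s
sample = [0.4, 0.6, 0.8, 1.0]
print(speed_summary(sample))
```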
Table 9. Pedestrian flow count result comparison.

| Direction | System Count (Persons) | Manual Count (Persons) | Accuracy |
|---|---|---|---|
| Pedestrian Flow (Up) | 216 | 228 | 94.7% |
| Pedestrian Flow (Down) | 172 | 175 | 98.3% |
| Total Pedestrian Flow | 388 | 403 | 96.3% |
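The accuracies in Table 9 treat the manual count as ground truth. A short sketch (function name is ours) that reproduces all three rows:

```python
def count_accuracy(system: int, manual: int) -> float:
    """Counting accuracy as a percentage, taking the manual count as ground truth."""
    return (1 - abs(system - manual) / manual) * 100.0

# Table 9: up 216 vs 228, down 172 vs 175, total 388 vs 403
for sys_n, man_n in [(216, 228), (172, 175), (388, 403)]:
    print(f"{count_accuracy(sys_n, man_n):.1f}%")  # 94.7%, 98.3%, 96.3%
```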
Table 10. Finite Element Simulation vs. Field Measurement Comparison of Peak Acceleration.

| Section Location | Data Source | Peak Acceleration (m/s²) |
|---|---|---|
| 1/2 Section | FEM Simulation | 0.002 |
| 3/8 Section | FEM Simulation | 0.032 |
| 1/8 Section | FEM Simulation | 0.048 |
| 1/4 Section | FEM Simulation | 0.095 |
| 1/4 Section | Field Measurement #1 | 0.081 |
| 1/4 Section | Field Measurement #2 | 0.090 |
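The minimum error of 5.56% quoted in the abstract follows from comparing the 1/4-section FEM peak (0.095 m/s²) against the closer field measurement (0.090 m/s²). A minimal check (function name is ours):

```python
def peak_error(fem: float, measured: float) -> float:
    """Relative difference between simulated and measured peak acceleration (%)."""
    return abs(fem - measured) / measured * 100.0

fem_peak = 0.095  # 1/4-section FEM result from Table 10, m/s^2
print(round(peak_error(fem_peak, 0.090), 2))  # 5.56, the minimum error
print(round(peak_error(fem_peak, 0.081), 2))  # error vs. the other measurement
```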
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhao, C.; Jiang, Y.; Wang, J. A Computer Vision-Based Pedestrian Flow Management System for Footbridges and Its Applications. Infrastructures 2025, 10, 247. https://doi.org/10.3390/infrastructures10090247