Tracking and Simulating Pedestrian Movements at Intersections Using Unmanned Aerial Vehicles

Abstract: A livable and walkable city is the ultimate goal of future urban development. However, conflicts among pedestrians, vehicles, and cyclists at traffic intersections are becoming severe in high-density urban transportation areas, especially in China. Correspondingly, the transit time at intersections is becoming prolonged, and pedestrian safety is becoming endangered. Simulating pedestrian movements at complex traffic intersections is necessary to optimize the traffic organization. We propose an unmanned aerial vehicle (UAV)-based method for tracking and simulating pedestrian movements at intersections. Specifically, high-resolution videos acquired by a UAV are used to recognize and position moving targets, including pedestrians, cyclists, and vehicles, using a convolutional neural network. An improved social force-based motion model is proposed, considering the conflicts among pedestrians, cyclists, and vehicles. In addition, maximum likelihood estimation is performed to calibrate the improved social force model. UAV videos of intersections in Shenzhen are analyzed to demonstrate the performance of the presented approach. The results demonstrate that the proposed social force-based motion model can effectively simulate the movement of pedestrians and cyclists at road intersections. The presented approach provides an alternative method to track and simulate pedestrian movements, thus benefitting the organization of pedestrian flow and the traffic signals controlling the intersections.


Introduction
With rapid economic development, the use of automobiles has greatly increased in developing countries, especially in China, India, and Vietnam, where vehicles are replacing bicycles as the dominant transportation mode [1,2]. In response, the space allocated to automobiles has been expanded to alleviate traffic congestion, which encroaches on the space for cyclists and pedestrians and constrains bicycling and walking. Consequently, potential conflicts among vehicles, bicycles, and pedestrians not only exacerbate travel delays but also increase the randomness of pedestrian movements, substantially threatening pedestrian safety. According to a recent traffic safety report released by the World Health Organization (WHO), road collisions are the world's leading cause of preventable death; over 1.25 million people die annually on the roads (especially at intersections) [...] of leading pedestrians during an evacuation process, and established a social force evacuation model. Zeng et al. [40,41] analyzed the peculiarities of pedestrian movements at signalized traffic intersections and proposed the control of traffic signals for pedestrian movements on crosswalks. They improved the structure and parameters of the social force model and established a microscopic pedestrian model for traffic intersections. Liu et al. [42] explored various interactions of pedestrians on crosswalks by considering the collision avoidance behaviors of pedestrians moving backward and the follow-up behaviors of leading pedestrians, and established a microscopic model to incorporate the interactions between pedestrians and surrounding pedestrians. These studies provided valuable insights into microscopic pedestrian traffic simulation. However, the impacts of cyclists and vehicles on pedestrians have not been well investigated or integrated into microscopic traffic simulation.
Additionally, high-resolution trajectories of pedestrians and other objects, especially cyclists and vehicles, at complex intersections are lacking.
A UAV is an effective tool for monitoring geographical contexts, offering simple deployment and low cost [43-46]. This study employs a UAV to automatically identify and track moving objects, including pedestrians, cyclists, and vehicles, at complex traffic intersections and to simulate pedestrian movements. In particular, we consider interactions between pedestrians and the surrounding environment and simulate pedestrian movements at intersections affected by cyclists and vehicles. The traditional social force model is modified to integrate the boundary effects of zebra crossings and the mutual interactions among pedestrians, cyclists, and vehicles. In addition, the model parameters are calibrated using the maximum likelihood estimation (MLE) method. An experiment in Shenzhen City was conducted to evaluate the performance of the presented approach. The results demonstrate that the presented approach can accurately simulate pedestrian movements at traffic intersections. The improved social force model describes the complex interactions within an intersection environment and outperforms the traditional social force model.
The main contributions of this study are summarized as follows: (1) High-resolution movements of pedestrians, cyclists, and vehicles at intersections are accurately extracted using a UAV and a convolutional neural network. (2) The classic social force model (SFM) is improved to integrate the interactions among cyclists, vehicles, and pedestrians, and the inherent regularities of pedestrian movement are revealed and verified. (3) The MLE method is introduced to calibrate the model parameters and quantify the range and extent of the impacts of surrounding pedestrians, cyclists, right-turning vehicles, and boundaries, which provides a useful reference for subsequent research on further calibration.
The remainder of this article is organized as follows: Section 2 introduces the study area and the presented methodology. Section 3 describes the experiment and analyzes the results. Section 4 concludes the study and outlines future research.

Study Area and Methodology
The study was conducted in Shenzhen City, the first special economic zone of China, covering 1996 km². Since its foundation in 1979, Shenzhen has experienced fast urban growth. The population increased from 0.6 million to 18 million in 2018. Shenzhen has become one of the highest-density cities in China. During rush hour, the pedestrian density at some complex traffic intersections may reach 2-5 persons/m². Therefore, there is a high potential for conflicts involving pedestrians at these intersections, which highlights the importance of pedestrian monitoring. Here, we present a UAV-based approach to monitor pedestrians, cyclists, and vehicles at complex traffic intersections and simulate pedestrian movements with an improved social force model. The workflow of the presented approach is displayed in Figure 1. First, UAVs are used to capture pedestrians, cyclists, and vehicles. Objects are recognized and localized using the state-of-the-art PFPNet. High-resolution trajectories are produced for further microscopic traffic simulation. Considering cyclists and vehicles, an improved social force-based motion model is developed to simulate pedestrian movements at intersections.

Pedestrian, Cyclist, and Vehicle Detection Using a UAV
To obtain high-quality data and reduce noisy information contained in traffic videos, we use a UAV to capture pedestrians, cyclists, and vehicles at road intersections. Compared with a traditional camera installed along the roadside or at intersections [45], the UAV surveillance approach has the following advantages: the hovering location and flying height of the UAV can be conveniently set and changed; the camera scope is substantially greater than that of traditional cameras; and high-resolution UAV videos can simultaneously capture richer information about pedestrians, cyclists, and vehicles.
Using high-resolution UAV videos, we extract pedestrian and cyclist trajectories. The PFPNet [30] is used to detect pedestrian and cyclist locations. Compared with current object detection methods, PFPNet constructs a feature pyramid by widening the network instead of increasing the depth, which aims to predict the locations of "hard-to-detect" objects, such as small (e.g., pedestrians are considerably smaller than vehicles), occluded (e.g., pedestrians can be in close proximity to each other), and blurred (e.g., the camera can shake) objects. Therefore, PFPNet is suitable for pedestrian, cyclist, and vehicle detection.
The architecture of PFPNet is illustrated in Figure 2. First, the base network produces a W × H output feature map with C channels. Second, spatial pyramid pooling (SPP) [23] is employed to generate a wide feature pyramid (FP) pool with feature maps of various sizes. An additional feature abstraction strategy is applied to these feature maps in a parallel manner to balance the semantic abstraction levels. The multiscale context aggregation (MSCA) module rescales the feature maps to a uniform size and aggregates their contents to produce different levels of the final FP. Each MSCA module is followed by a prediction subnet, which is used to classify and localize objects, such as pedestrians and cyclists.

The base network is important for object detection and localization. We employ the prevalent VGGNet-16 [47] as the base network. The fully-connected layers are replaced with newly designed convolutional layers with downsampling. The modified VGGNet is pre-trained on the ILSVRC dataset [47]. A set of bottleneck layers [48] are employed in PFPNet for the feature transformation. In the bottleneck layer, a 1 × 1 convolution is used to reduce the channel number to half of the original count. Batch normalization [49] without shift and the rectified linear unit (ReLU) [50] are used for normalization and activation.
The workflow to detect pedestrians, cyclists, and vehicles is displayed in Figure 2. Given a UAV image (a), the modified VGGNet-16 is employed as the base network to generate the input feature map. The high-dimensional FP pool (b) is formed via the SPP module, and the low-dimensional FP pool (c) is obtained by feature transformation with the bottleneck layer. Using these feature maps, these MSCA modules produce the final FP for multiscale object detection. The FP is fed into the prediction subnets to obtain the detected objects (e). Non-maximum suppression [51] is used to guarantee that each prediction corresponds to a single object.
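The non-maximum suppression step mentioned above can be illustrated with a minimal sketch. It is a generic greedy NMS over axis-aligned boxes, not the implementation used with PFPNet; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes on one pedestrian, one separate box.
boxes = [(10, 10, 30, 50), (12, 11, 31, 52), (100, 40, 120, 80)]
scores = [0.9, 0.75, 0.8]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first by an IoU of about 0.8 and is therefore suppressed, so each remaining prediction corresponds to a single object.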
After detection, we track the movements of pedestrians, cyclists, and vehicles at the crosswalks in videos. With the PFPNet results, the kernelized correlation filter (KCF) [52] is used to track a single object. The objective of a KCF tracker is to teach a classifier to distinguish the objects from their surrounding environment. Unlike other trackers that focus on the objects of interest, the KCF tracker develops circulant matrices to obtain additional environment samples (e.g., locations and scales of image patches) to train the classifier and produce the diagonal matrix with a discrete Fourier transformation, thus reducing the computational complexity. We select a single target from the PFPNet results, send its bounding box to the KCF, track pedestrian, cyclist, and vehicle movements, and collect its location in the UAV video to generate a two-dimensional point set. When the objects of interest are no longer on the crosswalk, the tracking is finished. These generated trajectories are reported in the final frame for further pedestrian movement simulation.
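The trajectory-collection step described above can be sketched as follows. The sketch does not implement the KCF tracker itself; it assumes that a tracker has already produced one bounding box per frame, and it approximates the crosswalk by an axis-aligned rectangle. The function names, box format, and coordinates are hypothetical.

```python
def box_center(box):
    """Centre of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def inside(point, region):
    """True if the point lies within the rectangle (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2


def build_trajectory(tracked_boxes, crosswalk):
    """Collect (frame, x, y) points until the object leaves the crosswalk."""
    trajectory = []
    for frame_index, box in enumerate(tracked_boxes):
        center = box_center(box)
        if not inside(center, crosswalk):
            break  # tracking ends once the object is no longer on the crosswalk
        trajectory.append((frame_index, center[0], center[1]))
    return trajectory


# Hypothetical per-frame tracker output (pixels) and crosswalk rectangle.
crosswalk = (0, 0, 100, 40)
tracked = [(10, 10, 14, 18), (30, 11, 34, 19), (60, 12, 64, 20), (104, 12, 108, 20)]
trajectory = build_trajectory(tracked, crosswalk)
print(trajectory)
```

The result is the two-dimensional point set for one object; repeating the procedure per detected object yields the trajectories used by the simulation.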

Pedestrian Movement Modeling
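Using the acquired trajectories of pedestrians, cyclists, and vehicles, the SFM is improved to simulate pedestrian movements at traffic intersections. Regarding pedestrians as particles satisfying the laws of mechanics, the classic SFM models the movements of a pedestrian α as being derived from the combined effects of the self-driving force ($\vec{f}^{0}_{\alpha}$), the boundary (B) force ($\vec{f}_{\alpha B}$), and the repulsive force from the surrounding pedestrians ($\vec{f}_{\alpha\beta}$). Given the strong influences of cyclists and vehicles on the movement of pedestrians at intersections, the improved SFM simulates two additional forces: the repulsive force of a cyclist ($\vec{f}_{\alpha\gamma}$) and the disturbing force of a right-turning vehicle ($\vec{f}_{\alpha\omega}$). Hence, the joint force on a pedestrian, $\vec{F}_{\alpha}$, is defined as Equation (1):
$$\vec{F}_{\alpha} = \vec{f}^{0}_{\alpha} + \vec{f}_{\alpha B} + \vec{f}_{\alpha\beta} + \vec{f}_{\alpha\gamma} + \vec{f}_{\alpha\omega} + \xi \tag{1}$$
where ξ is the random fluctuation term of the joint force, which indicates the movement of a pedestrian that accidentally deviates from the normal movement. Figure 3 gives an example of the impacts on pedestrian movements considered.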
Remote Sens. 2019, 9, x FOR PEER REVIEW 6 of 18
When pedestrians are moving toward their destinations at an expected speed, they are inevitably influenced by their surrounding environments. Therefore, a deviation emerges between the actual velocity and the expected velocity. In this context, following Helbing et al. [38], the self-driving force $\vec{f}^{0}_{\alpha}$ tends to restore the actual velocity $\vec{v}_{\alpha}$ to the expected velocity $\vec{v}^{0}_{\alpha}$. Assuming that the time needed to restore the current speed to the expected speed is $\tau_{\alpha}$, the self-driving force of a pedestrian can be expressed as Equation (2), where M is the mass of the pedestrian.
$$\vec{f}^{0}_{\alpha} = M \, \frac{\vec{v}^{0}_{\alpha} - \vec{v}_{\alpha}}{\tau_{\alpha}} \tag{2}$$

Boundary Force
In general, pedestrians walk within a crosswalk boundary. When an outward object exists at the crosswalk boundary, the boundary will exert a repulsive force to maintain a certain safe distance between the pedestrian and the boundary (Figure 4a). However, when the pedestrian density is high, for example, 2-5 persons/m² at some intersections in Shenzhen, pedestrians may walk out of the crosswalk to avoid serious conflict with other pedestrians. In that case, the boundary force becomes attractive rather than repulsive (Figure 4b) and draws the pedestrian back to the crosswalk [40]. Therefore, the force exerted by the boundary on pedestrians can be expressed as an exponentially decreasing function of the distance, as follows:
$$\vec{f}_{\alpha B} = A_{B} \exp\!\left(-\frac{\lVert \vec{P}_{\alpha} - \vec{P}_{B} \rVert}{B_{B}}\right) \vec{n}$$
where $\vec{P}_{\alpha}$ is the current position of pedestrian α, $\vec{P}_{B}$ is the closest position on the crosswalk boundary to pedestrian α, $\lVert \vec{P}_{\alpha} - \vec{P}_{B} \rVert$ is the Euclidean distance between them, $A_{B}$ is the boundary force strength, $B_{B}$ is the boundary force extent, and $\vec{n}$ is the unit vector between the pedestrian and the boundary. When the pedestrian is within the boundary, $\vec{n}$ becomes $\vec{n}_{B\alpha}$, directed from the boundary B to the pedestrian α. Conversely, when the pedestrian steps out of the boundary, $\vec{n}$ becomes $\vec{n}_{\alpha B}$, directed from the pedestrian α to the boundary B.
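The self-driving and boundary forces can be illustrated with a short sketch. It assumes the relaxation form M(v⁰ − v)/τ for the self-driving force and an exponential decay A·exp(−distance/B) for the boundary force, with the direction flipped when the pedestrian is outside the crosswalk. The function names and all numeric values (mass, speeds, strength, extent) are hypothetical and are not calibrated parameters from this study.

```python
import math


def self_driving_force(mass, v_desired, v_actual, tau):
    """Relaxation toward the expected velocity: M * (v0 - v) / tau."""
    return tuple(mass * (vd - va) / tau for vd, va in zip(v_desired, v_actual))


def boundary_force(p, p_boundary, is_inside, strength, extent):
    """Exponentially decaying boundary force on a pedestrian at position p.

    Repulsive (boundary -> pedestrian) when the pedestrian is inside the
    crosswalk, attractive (pedestrian -> boundary) when outside.
    """
    dx, dy = p[0] - p_boundary[0], p[1] - p_boundary[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined exactly on the boundary
    magnitude = strength * math.exp(-dist / extent)
    sign = 1.0 if is_inside else -1.0
    return (sign * magnitude * dx / dist, sign * magnitude * dy / dist)


# Hypothetical values: a 60 kg pedestrian walking at 1.0 m/s who prefers 1.3 m/s.
f_drive = self_driving_force(60.0, (1.3, 0.0), (1.0, 0.0), 0.5)
# Boundary along y = 0; the crosswalk is assumed to occupy y > 0.
f_inside = boundary_force((0.0, 0.5), (0.0, 0.0), True, 10.0, 0.5)
f_outside = boundary_force((0.0, -0.5), (0.0, 0.0), False, 10.0, 0.5)
print(f_drive, f_inside, f_outside)
```

In both boundary cases the force points toward the interior of the crosswalk (positive y here): it pushes an inside pedestrian away from the edge and pulls an outside pedestrian back.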

Repulsive Force Exerted by Other Pedestrians
Pedestrians at traffic intersections tend to repel each other to create a comfortable walking space. The elliptical potential field of human interaction is generally employed to describe the interaction among pedestrians in the traditional SFM, which disregards crowding and bumping. Specifically, following Johansson et al. [38], the repulsive force between pedestrians α and β can be expressed as follows:
$$\vec{f}_{\alpha\beta} = -\nabla_{\vec{d}_{\alpha\beta}} V_{\alpha\beta}(b_{\alpha\beta})$$
where ∇ denotes the gradient operator, $\vec{d}_{\alpha\beta}$ is the vector from pedestrian β to pedestrian α, and $V_{\alpha\beta}(b_{\alpha\beta})$ is the potential field.
Assuming that elliptical equipotential lines exist in this potential field I, an exponentially decreasing function should exist as Equation (6), depending on the short semi-axis of the ellipse ($b_{\alpha\beta}$):
$$V_{\alpha\beta}(b_{\alpha\beta}) = A_{\alpha\beta} B_{\alpha\beta} \, e^{-b_{\alpha\beta}/B_{\alpha\beta}} \tag{6}$$
where $A_{\alpha\beta}$ is the strength of the repulsive force between two pedestrians α and β, $B_{\alpha\beta}$ is the extent of the repulsive force between the two pedestrians, and $b_{\alpha\beta}$ is the short semi-axis of the elliptical potential field, defined as:
$$2 b_{\alpha\beta} = \sqrt{\left(\lVert \vec{d}_{\alpha\beta} \rVert + \lVert \vec{d}_{\alpha\beta} - \vec{v}_{\beta}\Delta t \rVert\right)^{2} - \lVert \vec{v}_{\beta}\Delta t \rVert^{2}}$$
where $\vec{v}_{\beta}$ is the walking velocity of pedestrian β and ∆t is the simulation time step. According to the relationship between the potential field and the force, we obtain:
$$\vec{f}_{\alpha\beta} = -\frac{dV_{\alpha\beta}(b_{\alpha\beta})}{db_{\alpha\beta}} \, \nabla_{\vec{d}_{\alpha\beta}} b_{\alpha\beta} \tag{7}$$
Combining $\lVert \vec{z} \rVert = \sqrt{\vec{z}^{\,2}}$ and $\nabla_{\vec{z}} \lVert \vec{z} \rVert = \vec{z}/\sqrt{\vec{z}^{\,2}} = \vec{z}/\lVert \vec{z} \rVert$ to simplify the operation (7), we obtain:
$$\vec{f}_{\alpha\beta} = A_{\alpha\beta} \, e^{-b_{\alpha\beta}/B_{\alpha\beta}} \cdot \frac{\lVert \vec{d}_{\alpha\beta} \rVert + \lVert \vec{d}_{\alpha\beta} - \vec{v}_{\beta}\Delta t \rVert}{2 b_{\alpha\beta}} \cdot \frac{1}{2}\left(\frac{\vec{d}_{\alpha\beta}}{\lVert \vec{d}_{\alpha\beta} \rVert} + \frac{\vec{d}_{\alpha\beta} - \vec{v}_{\beta}\Delta t}{\lVert \vec{d}_{\alpha\beta} - \vec{v}_{\beta}\Delta t \rVert}\right)$$
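As an illustration of the pedestrian-pedestrian repulsion, the sketch below evaluates the closed-form force of the elliptical specification in Johansson et al., assuming the potential V(b) = A·B·exp(−b/B) and 2b = sqrt((|d| + |d − y|)² − |y|²) with y = v_β·∆t. The function names and the parameter values are hypothetical, chosen only to make the example easy to check by hand.

```python
import math


def _norm(v):
    return math.hypot(v[0], v[1])


def semi_axis(d, y):
    """Short semi-axis b, where 2b = sqrt((|d| + |d - y|)^2 - |y|^2)."""
    s = _norm(d) + _norm((d[0] - y[0], d[1] - y[1]))
    return 0.5 * math.sqrt(max(s * s - _norm(y) ** 2, 0.0))


def repulsive_force(d, y, strength, extent):
    """Force on pedestrian alpha; d points from beta to alpha, y = v_beta * dt."""
    d_minus_y = (d[0] - y[0], d[1] - y[1])
    nd, ndy = _norm(d), _norm(d_minus_y)
    b = semi_axis(d, y)
    if b == 0.0 or nd == 0.0 or ndy == 0.0:
        return (0.0, 0.0)  # degenerate geometry, no defined direction
    scale = strength * math.exp(-b / extent) * (nd + ndy) / (2.0 * b) * 0.5
    return (scale * (d[0] / nd + d_minus_y[0] / ndy),
            scale * (d[1] / nd + d_minus_y[1] / ndy))


# Hypothetical check: a stationary neighbour 2 m away (y = 0) reduces the
# ellipse to a circle, so the force is A * exp(-|d| / B) along d.
force = repulsive_force((2.0, 0.0), (0.0, 0.0), strength=5.0, extent=1.0)
print(force)
```

For a stationary neighbour the short semi-axis equals the separation distance, which gives a convenient sanity check of the formula before moving neighbours are introduced.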

Repulsive Force Exerted by Cyclists on Pedestrians
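According to Chinese traffic regulations, cyclists should use the crosswalk at traffic intersections and follow the traffic lights. Cyclists have the same right of way as pedestrians. Consequently, cyclists have an important impact on pedestrian movements at complex traffic intersections. Given the potential conflict between pedestrian movements and cyclists, we improve the classic SFM and assume another potential field II for the repulsive force of a cyclist γ, $\vec{f}_{\alpha\gamma}$, as follows:
$$\vec{f}_{\alpha\gamma} = -\nabla_{\vec{d}_{\alpha\gamma}} V_{\alpha\gamma}(b_{\alpha\gamma})$$
where $\vec{d}_{\alpha\gamma}$ is the vector from the cyclist γ to pedestrian α, and $V_{\alpha\gamma}(b_{\alpha\gamma})$ is the potential field following the exponentially decreasing function as follows:
$$V_{\alpha\gamma}(b_{\alpha\gamma}) = A_{\alpha\gamma} B_{\alpha\gamma} \, e^{-b_{\alpha\gamma}/B_{\alpha\gamma}}$$
where $A_{\alpha\gamma}$ and $B_{\alpha\gamma}$ are the strength and extent of the cyclist force. Considering the difference in speed between cyclists and pedestrians, the short semi-axis of the elliptical potential field II ($b_{\alpha\gamma}$) is assumed to be:
$$2 b_{\alpha\gamma} = \sqrt{\left(\lVert \vec{d}_{\alpha\gamma} \rVert + \lVert \vec{d}_{\alpha\gamma} - (\vec{v}_{\gamma} - \vec{v}_{\alpha})\Delta t \rVert\right)^{2} - \lVert (\vec{v}_{\gamma} - \vec{v}_{\alpha})\Delta t \rVert^{2}} \tag{11}$$
where $\vec{v}_{\gamma}$ is the velocity of the cyclist, and $\vec{v}_{\alpha}$ is the walking velocity of the pedestrian.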
To verify the superiority of elliptical potential field II over elliptical potential field I in describing the cyclist impact mode, two scenarios are simulated to compare the two potential fields. Assuming that the cyclist is stationary, $\vec{v}_{\gamma} = 0$, the short semi-axis $b_{\alpha\gamma}$ is obtained according to Equation (11) in elliptical potential field II:
$$2 b_{\alpha\gamma} = \sqrt{\left(\lVert \vec{d}_{\alpha\gamma} \rVert + \lVert \vec{d}_{\alpha\gamma} + \vec{v}_{\alpha}\Delta t \rVert\right)^{2} - \lVert \vec{v}_{\alpha}\Delta t \rVert^{2}}$$
According to Equation (6) in the elliptical potential field I, we can obtain:
$$b_{\alpha\gamma} = \lVert \vec{d}_{\alpha\gamma} \rVert$$
(1) Scenario 1: When the pedestrian α moves in the same or opposite direction as a cyclist γ with a common speed of [...] same direction ($f_{2}$), which is consistent with the real-world experience of the pedestrian. In elliptical potential field I, $b_{\alpha\gamma} = \lVert \vec{d}_{\alpha\gamma} \rVert$ is a constant value, which indicates equal disturbing forces in both scenarios and no influence of the relative movement direction on the disturbing force exerted by the cyclist.
(2) Scenario 2: When pedestrian α moves toward cyclists with different speeds of $v^{1}_{\alpha}$ and $v^{2}_{\alpha}$ ($v^{1}_{\alpha} < v^{2}_{\alpha}$) (Figure 5b), the pedestrian will respond to the larger repulsive force exerted by the cyclist moving at a higher speed, which is also consistent with real-world experiences. Conversely, the elliptical potential field I is not sensitive to speed, and the repulsive force exerted by a cyclist is not related to the speed of pedestrian α and only related to the distance between the cyclist and the pedestrian.
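The difference between the two potential fields in these scenarios can be checked numerically. The sketch below compares the short semi-axis b under field I (step y = v_γ·∆t) and field II (step y = (v_γ − v_α)·∆t) for a stationary cyclist; a smaller b means a larger potential and hence a stronger repulsion. The distances, speeds, and time step are hypothetical values chosen for illustration.

```python
import math


def semi_axis(d, step):
    """Short semi-axis b, where 2b = sqrt((|d| + |d - step|)^2 - |step|^2)."""
    nd = math.hypot(d[0], d[1])
    nds = math.hypot(d[0] - step[0], d[1] - step[1])
    ns = math.hypot(step[0], step[1])
    return 0.5 * math.sqrt(max((nd + nds) ** 2 - ns ** 2, 0.0))


def b_field_one(d, v_cyclist, dt):
    """Field I: only the cyclist velocity enters the ellipse."""
    return semi_axis(d, (v_cyclist[0] * dt, v_cyclist[1] * dt))


def b_field_two(d, v_cyclist, v_ped, dt):
    """Field II: the relative velocity (cyclist minus pedestrian) enters."""
    return semi_axis(d, ((v_cyclist[0] - v_ped[0]) * dt,
                         (v_cyclist[1] - v_ped[1]) * dt))


DT = 0.5              # hypothetical time step (s)
d = (3.0, 0.0)        # vector from the cyclist (at the origin) to the pedestrian
still = (0.0, 0.0)    # stationary cyclist

b1 = b_field_one(d, still, DT)                     # independent of the pedestrian
b_toward = b_field_two(d, still, (-1.2, 0.0), DT)  # pedestrian approaching
b_away = b_field_two(d, still, (1.2, 0.0), DT)     # pedestrian moving away
b_slow = b_field_two(d, still, (-0.8, 0.0), DT)    # slower approach
b_fast = b_field_two(d, still, (-1.6, 0.0), DT)    # faster approach
print(b1, b_toward, b_away, b_slow, b_fast)
```

Under these assumptions field I always returns b = |d|, whereas field II yields a smaller b when the pedestrian approaches the cyclist and an even smaller b at higher approach speed, which matches the qualitative behaviour described in the two scenarios.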

Vehicle Force
According to Chinese traffic law, vehicles in China are allowed to turn right at a traffic signal even if it is red. However, drivers do not always yield to pedestrians. Instead, they tend to take advantage of short pedestrian clearance intervals to pass through intersections; therefore, they exert repulsive forces on pedestrians and force pedestrians to decelerate and avoid the vehicles. Given the different speeds of vehicles and pedestrians, the force of a turning vehicle ω on a pedestrian α in elliptical potential field II can similarly be expressed as follows:
$$\vec{f}_{\alpha\omega} = -\nabla_{\vec{d}_{\alpha\omega}} V_{\alpha\omega}(b_{\alpha\omega})$$
where $\vec{d}_{\alpha\omega}$ is the vector from the vehicle ω to pedestrian α, and $V_{\alpha\omega}(b_{\alpha\omega})$ is the potential field following the exponentially decreasing function below:
$$V_{\alpha\omega}(b_{\alpha\omega}) = A_{\alpha\omega} B_{\alpha\omega} \, e^{-b_{\alpha\omega}/B_{\alpha\omega}}$$
where $A_{\alpha\omega}$ is the strength of the repulsive force of a vehicle ω on pedestrian α, and $B_{\alpha\omega}$ is the extent of the vehicle force. The short semi-axis $b_{\alpha\omega}$ of the ellipse of $\vec{f}_{\alpha\omega}$ can be expressed as follows:
$$2 b_{\alpha\omega} = \sqrt{\left(\lVert \vec{d}_{\alpha\omega} \rVert + \lVert \vec{d}_{\alpha\omega} - (\vec{v}_{\omega} - \vec{v}_{\alpha})\Delta t \rVert\right)^{2} - \lVert (\vec{v}_{\omega} - \vec{v}_{\alpha})\Delta t \rVert^{2}}$$
where $\vec{v}_{\omega}$ is the velocity of the vehicle, and $\vec{v}_{\alpha}$ is the walking velocity of the pedestrian. As shown in Figure 6, right-turning vehicles are assumed to be stationary at different positions. When the pedestrian moves toward the opposite exit at the speed $\vec{v}_{\alpha}$, a shorter distance between the pedestrian and the vehicle corresponds to a smaller elliptical short semi-axis $b_{\alpha\omega}$ of the force potential field and to a larger force experienced by the pedestrian. The repulsive force exerted by the right-turning vehicle therefore decreases as the distance between the vehicle and the pedestrian increases.

Simulation of Pedestrian Movements at Complex Traffic Intersections
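According to the aforementioned influential factors of pedestrian movements, the classic SFM is improved to generate an acceleration vector of pedestrians to simulate the movements at complex traffic intersections. To evaluate the simulation results, the Verlet algorithm is employed to estimate the pedestrian trajectory. The purpose of the Verlet algorithm is to update the position of a pedestrian x(t + ∆h) at time t + ∆h using the position x(t) and the acceleration a(t) of the pedestrian at time t and the sample interval ∆h. First, we perform the Taylor expansion on x(t + ∆h) and x(t − ∆h), where v(t) is the velocity and b(t) is the third derivative of the position:
$$x(t + \Delta h) = x(t) + v(t)\Delta h + \tfrac{1}{2} a(t)\Delta h^{2} + \tfrac{1}{6} b(t)\Delta h^{3} + O(\Delta h^{4})$$
$$x(t - \Delta h) = x(t) - v(t)\Delta h + \tfrac{1}{2} a(t)\Delta h^{2} - \tfrac{1}{6} b(t)\Delta h^{3} + O(\Delta h^{4})$$
By adding these two expressions, we can obtain the following positional expression:
$$x(t + \Delta h) = 2x(t) - x(t - \Delta h) + a(t)\Delta h^{2} + O(\Delta h^{4})$$
By differentiating these two expressions, we obtain the speed and acceleration:
$$v(t + \Delta h) = v(t) + a(t)\Delta h + \tfrac{1}{2} b(t)\Delta h^{2} + O(\Delta h^{3}) \tag{20}$$
$$a(t + \Delta h) = a(t) + b(t)\Delta h + O(\Delta h^{2}) \tag{22}$$
By substituting b(t) in Equation (22) into Equation (20), we can obtain updated equations for pedestrian speed and position:
$$v(t + \Delta h) = v(t) + \tfrac{1}{2}\left[a(t) + a(t + \Delta h)\right]\Delta h$$
$$x(t + \Delta h) = x(t) + v(t)\Delta h + \tfrac{1}{2} a(t)\Delta h^{2}$$
where ∆h is a fixed time interval and the accelerations a(t) and a(t + ∆h) are calculated by the improved SFM.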
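The update scheme can be sketched as a velocity Verlet step. This is a minimal illustration and not the simulation code of this study: `accel_fn` is a hypothetical stand-in for the acceleration produced by the improved SFM, and, because social forces depend on velocity, the sketch passes a predicted velocity v + a·∆h to `accel_fn`, which is an assumption made here for simplicity.

```python
def velocity_verlet_step(x, v, a, accel_fn, h):
    """Advance position and velocity by one step of size h.

    x, v, a are 2-D tuples; accel_fn(x, v) returns the acceleration tuple.
    """
    # Position update: x(t+h) = x(t) + v(t) h + a(t) h^2 / 2
    x_new = tuple(xi + vi * h + 0.5 * ai * h * h for xi, vi, ai in zip(x, v, a))
    # Predicted velocity, used only to evaluate velocity-dependent forces.
    v_pred = tuple(vi + ai * h for vi, ai in zip(v, a))
    a_new = accel_fn(x_new, v_pred)
    # Velocity update: v(t+h) = v(t) + [a(t) + a(t+h)] h / 2
    v_new = tuple(vi + 0.5 * (ai + an) * h for vi, ai, an in zip(v, a, a_new))
    return x_new, v_new, a_new


def constant_acceleration(x, v):
    """Toy acceleration field used only to verify the integrator."""
    return (0.5, 0.0)


x, v = (0.0, 0.0), (0.0, 0.0)
a = constant_acceleration(x, v)
for _ in range(10):                 # 10 steps of 0.1 s = 1 s of simulated time
    x, v, a = velocity_verlet_step(x, v, a, constant_acceleration, 0.1)
print(x, v)
```

With a constant acceleration of 0.5 m/s² the scheme reproduces the analytic result after 1 s (position 0.25 m, speed 0.5 m/s) up to floating-point error, which is a simple way to validate the integrator before plugging in the force model.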

Calibration of the Pedestrian-Cyclist Conflict Model
The improved SFM involves a variety of parameters, including the free speed of a pedestrian and the strength and range of forces exerted by the boundary, cyclists, other pedestrians, and vehicles (see Table 1). The MLE method, a widely used parameter calibration method based on statistical principles, is employed, using the extracted trajectories of pedestrians, cyclists, and vehicles.
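The position of a pedestrian in the next simulation time step, $\vec{P}_{\alpha}(t_{k+1})$, is assumed to be predicted by the model parameter θ. The moving distance between the points $\vec{P}_{\alpha}(t_{k})$ and $\vec{P}_{\alpha}(t_{k+1})$ obeys the normal distribution with a mean µ and a standard deviation σ. According to the observed trajectories, the mean µ and the standard deviation σ of the single-step distance $\Delta d_{\alpha}(\theta)$ are estimated. Writing $\Delta d^{k}_{\alpha}(\theta)$ for the single-step distance at step k, the likelihood function concerning θ is obtained:
$$L(\theta) = \prod_{k} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\left(\Delta d^{k}_{\alpha}(\theta) - \mu\right)^{2}}{2\sigma^{2}}\right) \tag{24}$$
For simplicity, the logarithm of both sides of Equation (24) is taken. Therefore, the value of θ that corresponds to the maximum of L(θ) can be obtained:
$$\hat{\theta} = \arg\max_{\theta} \ln L(\theta) = \arg\max_{\theta} \sum_{k} \left[-\ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{\left(\Delta d^{k}_{\alpha}(\theta) - \mu\right)^{2}}{2\sigma^{2}}\right]$$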
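The calibration idea can be sketched with a toy example. The code assumes a Gaussian log-likelihood over single-step distances and a plain grid search over candidate parameter values; the one-parameter "simulator" (step distance = speed × 0.5 s), the observed distances, and the candidate grid are all hypothetical and stand in for the improved SFM and the UAV trajectories.

```python
import math
import statistics


def log_likelihood(simulated_steps, mu, sigma):
    """Gaussian log-likelihood of simulated step distances given mu and sigma."""
    n = len(simulated_steps)
    squared = sum((d - mu) ** 2 for d in simulated_steps)
    return -n * math.log(math.sqrt(2.0 * math.pi) * sigma) - squared / (2.0 * sigma ** 2)


def calibrate(candidates, simulate_steps, observed_steps):
    """Return the candidate theta with the highest log-likelihood."""
    mu = statistics.mean(observed_steps)
    sigma = statistics.pstdev(observed_steps)  # must be > 0 for the likelihood
    return max(candidates,
               key=lambda theta: log_likelihood(simulate_steps(theta), mu, sigma))


# Hypothetical observed single-step distances (metres per 0.5 s step).
observed = [0.62, 0.66, 0.64, 0.68, 0.60]


def toy_simulator(theta):
    """Stand-in for the SFM: theta is a walking speed, each step lasts 0.5 s."""
    return [theta * 0.5] * len(observed)


candidates = [1.0 + 0.1 * i for i in range(6)]   # 1.0 ... 1.5 m/s
best_theta = calibrate(candidates, toy_simulator, observed)
print(best_theta)
```

A grid search is used only because it is easy to follow; with several interacting parameters, a numerical optimizer over the same log-likelihood would be the more practical choice.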

Experimental Configuration
To evaluate the performance of the proposed approach, this study used a DJI Inspire 1 Pro to conduct experiments in the high-tech development zone in Nanshan District, Shenzhen. The DJI Inspire 1 Pro is equipped with a GPS receiver and a built-in inertial measurement unit (IMU), which incorporates both a 6-axis gyroscope and an accelerometer for movement compensation. The camera mounted on the UAV is capable of stably recording road traffic at 4K (3840 × 2160) resolution at 30 fps. The experimental area is located in the center of the Science and Technology Park, which is a key intersectional area given the massive traffic around the commercial and industrial parks. Data were acquired by experienced drone pilots during peak hours (on- and off-duty hours and lunch breaks), when the pedestrian flow is large, to ensure sufficient numbers of pedestrians, cyclists, and vehicles. The hovering height of the UAV was set to 50 m above the ground. Videos of five road intersections were captured and processed. Flying permission was granted by the local transportation administration. Extreme weather (e.g., rain and strong winds) was avoided to ensure safety.

Pedestrian and Cyclist Detection and Localization
To build the training set, we subsampled the raw UAV videos every 60 frames to generate the image set and manually annotated categories and locations of ground objects, such as pedestrians, cyclists, and vehicles. We trained PFPNet using these annotated images. Then, testing images were fed into the trained detector to produce sets of predicted boxes with class confidence scores, which were used to generate final trajectories.
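The subsampling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `subsample_indices` is hypothetical, and the example assumes the 30 fps frame rate quoted in the experimental configuration, so that one image is kept for every 2 s of video.

```python
def subsample_indices(n_frames, step=60):
    """Return the indices of the frames kept when taking every `step`-th frame."""
    return list(range(0, n_frames, step))

FPS = 30                      # frame rate quoted in the experimental configuration
n_frames = 10 * FPS           # hypothetical 10 s clip
kept = subsample_indices(n_frames)
seconds_between_images = 60 / FPS
print(kept, seconds_between_images)
```

Keeping only every 60th frame reduces the annotation effort and avoids near-duplicate training images from consecutive frames.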
To evaluate the performance of the proposed detection and tracking algorithm, we counted the number of pedestrians, cyclists, and vehicles in a sample of the obtained UAV videos. More specifically, we used PFPNet to detect the ground objects (203 in total) and employed the KCF algorithm to keep track of them. We used the bounding box coordinates to mark the type and location of the tracked objects, so counting was done by simply tallying the number of bounding boxes. We quantitatively evaluated the counting result via the correctness (Cor), completeness (Com), and quality (Qua), which are defined in [45] as:
$$\mathrm{Cor} = \frac{TP}{TP + FP}, \qquad \mathrm{Com} = \frac{TP}{TP + FN}, \qquad \mathrm{Qua} = \frac{TP}{TP + FP + FN}$$
Here, the true positives value (TP) denotes the number of correctly detected ground objects, the false positives value (FP) represents the number of invalid detections, and the false negatives value (FN) denotes the number of missed objects. Among the three evaluation criteria, quality is the most important, since it accounts for both the correctness and the completeness of the detection algorithm.
We report the counting results for all types of ground objects in Table 1. The detection and tracking algorithm performs well: nearly all ground objects are accurately detected and tracked, especially vehicles, for which the quality reaches 100%. A few pedestrians and cyclists were not correctly recognized; the quality values for pedestrians and cyclists are 94.3% and 87.9%, respectively. These errors occurred because some bicycles were largely occluded by their riders, making the cyclists look very similar to pedestrians in the bird's-eye view of the high-resolution UAV videos.
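The three counting metrics can be computed from TP, FP, and FN as in the sketch below. It assumes the usual definitions of correctness, completeness, and quality as ratios of these counts; the function name `detection_scores` and the example counts are hypothetical and are not taken from Table 1.

```python
def detection_scores(tp, fp, fn):
    """Return (correctness, completeness, quality) from detection counts.

    Assumed definitions:
      correctness  = TP / (TP + FP)
      completeness = TP / (TP + FN)
      quality      = TP / (TP + FP + FN)
    """
    cor = tp / (tp + fp)
    com = tp / (tp + fn)
    qua = tp / (tp + fp + fn)
    return cor, com, qua

# Made-up counts for illustration only.
cor, com, qua = detection_scores(tp=90, fp=5, fn=5)
print(f"Cor={cor:.3f}  Com={com:.3f}  Qua={qua:.3f}")
```

Because quality penalizes both false positives and false negatives, it can never exceed either correctness or completeness.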
After processing the UAV videos, we extracted 2134 trajectories of pedestrians, cyclists, and vehicles. A total of 203 trajectories were of pedestrians and were impacted by cyclists, vehicles, or both.

Performance of the Improved Social Force Model
To assess the performance of the improved SFM, 80% of the pedestrian trajectories were selected to calibrate the SFM. The remaining 20% of the pedestrian trajectories were employed to simulate pedestrian movements at complex traffic intersections to evaluate the SFM's accuracy.
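The 80/20 division can be sketched as below. The text does not say how the trajectories were selected, so the random shuffle, the fixed seed, and the function name `split_trajectories` are assumptions made for illustration.

```python
import random

def split_trajectories(trajectories, calib_fraction=0.8, seed=0):
    """Shuffle the trajectories and split them into calibration and hold-out sets."""
    items = list(trajectories)
    random.Random(seed).shuffle(items)   # seeded for reproducibility
    n_calib = int(len(items) * calib_fraction)
    return items[:n_calib], items[n_calib:]

# Integer IDs stand in for the 203 pedestrian trajectories mentioned in the text.
calibration, held_out = split_trajectories(range(203))
print(len(calibration), len(held_out))
```

Holding out trajectories that were never used in calibration keeps the reported simulation accuracy from being inflated by overfitting.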
The improved SFM contains a set of parameters, including the free speed of a pedestrian and the strength and range of the forces exerted by the boundary, cyclists, other pedestrians, and vehicles. Parameters that are measurable but difficult to derive from the observed dataset were set by referring to related studies. Other parameters, which have no concrete physical meaning but can be derived indirectly from the pedestrian trajectories, were calibrated by MLE in MATLAB. Following related studies, the free speed was set to 1.5 m/s, and the relaxation time, i.e., the time a pedestrian needs to recover from the actual speed to the expected speed, was set to 0.5 s. The p-values of the strength and range of each force are less than 0.05, so all calibrated parameters in the improved SFM are significant at the 5% level.
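The calibration idea can be illustrated with a small grid search. This is a sketch in Python rather than the MATLAB routine used in the study: `toy_steps` is a hypothetical stand-in for the SFM in which the single parameter acts as a constant walking speed, the observed step distances are made up, and a Gaussian log-likelihood with mean and standard deviation estimated from the observations is assumed.

```python
import math
from statistics import mean, pstdev

DT = 0.2  # simulation time step in seconds (value quoted in the text)

def gaussian_log_likelihood(steps, mu, sigma):
    """Log-likelihood of the step distances under a normal density N(mu, sigma^2)."""
    n = len(steps)
    squared = sum((d - mu) ** 2 for d in steps)
    return -n * math.log(math.sqrt(2 * math.pi) * sigma) - squared / (2 * sigma ** 2)

def calibrate(candidates, simulate_steps, observed_steps):
    """Return the candidate parameter whose simulated steps are most likely."""
    mu = mean(observed_steps)        # estimated from the observed trajectories
    sigma = pstdev(observed_steps)   # population standard deviation
    return max(
        candidates,
        key=lambda theta: gaussian_log_likelihood(simulate_steps(theta), mu, sigma),
    )

def toy_steps(theta):
    """Hypothetical model: five steps walked at constant speed theta (m/s)."""
    return [theta * DT] * 5

observed = [0.28, 0.30, 0.32, 0.29, 0.31]  # made-up step distances in metres
best = calibrate([1.0, 1.25, 1.5, 1.75, 2.0], toy_steps, observed)
print(best)
```

A real calibration would replace the grid with a numerical optimizer over all force strengths and ranges, but the objective being maximized has the same structure.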
Both the classic and improved SFM were used to simulate pedestrian positions at complex traffic intersections, assuming a pedestrian mass ranging from 45 to 75 kg and a simulation time step $\Delta t$ of 0.2 s. The simulation results were compared with the corresponding pedestrian trajectories acquired by the UAV in terms of absolute positioning accuracy and mean absolute percentage error (MAPE). The results are reported in Table 2. The classic SFM achieves a positioning accuracy of 0.33 m with a MAPE of 12.43%. By additionally considering the influences of cyclists and vehicles, the improved SFM performs better, with a positioning accuracy of 0.25 m and a MAPE of 9.04%.
Figure 8 provides an example for evaluating the performance of the improved SFM. Figure 8a shows the recognition results for the pedestrian movements influenced by the boundary and cyclists. Figure 8b illustrates the forces that the pedestrian experiences in the improved SFM. The boundary force always exists while the pedestrian is crossing an intersection. It behaves as an attractive force during the first 17 s, before the pedestrian has entered the crosswalk, with a strength proportional to the distance between the pedestrian and the boundary. Once the pedestrian enters the crosswalk, the boundary force becomes a repulsive force with a strength that is inversely proportional to the distance between the pedestrian and the boundary. The initial repulsive force exerted by cyclists is 0. During the interval between 19 and 32 s, conflicts between the pedestrian and cyclists $\gamma_1$ and $\gamma_2$ emerge. Because the distance between them is small, the repulsive forces increase suddenly. Once the pedestrian passes a cyclist, the force exerted by that cyclist gradually decreases to zero. Figure 8c shows the estimated position and trajectory of pedestrian $\alpha_1$ in the different models.
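To show how the free speed, relaxation time, and time step enter one simulation step, the sketch below advances a single pedestrian using only the driving term of the classic social force model. It is not the improved SFM of this study: the boundary, cyclist, pedestrian, and vehicle forces are represented by a placeholder argument `extra_force`, and the function name, the 60 kg mass, and the goal position are assumptions.

```python
import math

V0 = 1.5    # free speed in m/s (value quoted in the text)
TAU = 0.5   # relaxation time in s (value quoted in the text)
DT = 0.2    # simulation time step in s (value quoted in the text)

def driving_step(pos, vel, goal, extra_force=(0.0, 0.0), mass=60.0):
    """Advance one pedestrian by one semi-implicit Euler step.

    extra_force (in newtons) is a placeholder for the interaction forces;
    mass is an assumed value within the 45-75 kg range used in the text.
    """
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy) or 1.0          # avoid division by zero at the goal
    ex, ey = dx / dist, dy / dist             # unit vector toward the goal
    # Driving term relaxes the velocity toward V0 along the desired direction.
    ax = (V0 * ex - vel[0]) / TAU + extra_force[0] / mass
    ay = (V0 * ey - vel[1]) / TAU + extra_force[1] / mass
    new_vel = (vel[0] + ax * DT, vel[1] + ay * DT)
    # Position is updated with the new velocity (semi-implicit Euler).
    new_pos = (pos[0] + new_vel[0] * DT, pos[1] + new_vel[1] * DT)
    return new_pos, new_vel

pos, vel = (0.0, 0.0), (0.0, 0.0)
for _ in range(25):                           # 25 steps of 0.2 s = 5 s
    pos, vel = driving_step(pos, vel, goal=(20.0, 0.0))
print(pos, vel)
```

With no interaction forces, the speed relaxes toward the free speed within a few steps, which is the behavior the relaxation time is meant to capture.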
To analyze the model performance visually, the MAPEs of the estimated trajectories in the x and y directions are calculated. The maximum error of the improved SFM is 0.21 m, smaller than the 0.47 m of the classic SFM. The difference between the two models is observed primarily in the conflict area between pedestrians and cyclists. The improved model estimates the pedestrian position and trajectory better than the traditional model.
Figure 9 shows another pedestrian's movement, affected by the boundary, cyclists, and nearby pedestrians. Figure 9a illustrates the recognized pedestrians and cyclists. Figure 9b shows that the cyclist force has the largest influence on the pedestrian's movement; this force is determined by the speeds of the cyclist and the pedestrian and the distance between them. The force exerted by surrounding pedestrians has the second-largest impact on the pedestrian's movement. It occurs primarily during the first 40 s, when conflicts among pedestrians emerge, and its strength is inversely proportional to the distance between two pedestrians. The boundary force always exists as long as the pedestrian remains within the crosswalk. Figure 9c displays the extracted and simulated trajectories. The maximum error of the improved SFM is 0.58 m, smaller than the 0.88 m of the classic SFM, which demonstrates the better performance of the improved SFM in describing pedestrian movement under complex disturbances. In addition, the MAPE, which reflects the average error of the estimated position at each step of the simulation, is evenly distributed in the conflict area throughout the simulation process, which makes the difference in simulation accuracy between the two models less distinct.
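The error measures used in this comparison can be sketched as follows. The text does not give the formulas, so this is one plausible reading: positioning error is taken as the Euclidean distance between simulated and observed positions at each step, and MAPE is computed per coordinate. The function names and sample coordinates are hypothetical.

```python
import math

def position_errors(simulated, observed):
    """Euclidean distance between simulated and observed positions at each step."""
    return [math.dist(s, o) for s, o in zip(simulated, observed)]

def mape(simulated, observed):
    """Mean absolute percentage error of one coordinate series, in percent.

    Steps with an observed value of zero are skipped to avoid division by zero.
    """
    terms = [abs(s - o) / abs(o) for s, o in zip(simulated, observed) if o != 0]
    return 100.0 * sum(terms) / len(terms)

# Made-up trajectories (metres) for illustration only.
observed = [(1.0, 2.0), (1.3, 2.0), (1.6, 2.1)]
simulated = [(1.0, 2.0), (1.2, 2.0), (1.6, 2.4)]

errors = position_errors(simulated, observed)
mean_error = sum(errors) / len(errors)
max_error = max(errors)
mape_x = mape([p[0] for p in simulated], [p[0] for p in observed])
mape_y = mape([p[1] for p in simulated], [p[1] for p in observed])
print(mean_error, max_error, mape_x, mape_y)
```

The maximum error highlights the worst step, typically inside a conflict area, whereas the MAPE averages over all steps and can therefore mask localized differences between models.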

Conclusions
Tracking and simulating pedestrian movements is important for optimizing traffic organization at intersections. Accordingly, a UAV-based method is proposed to track and simulate pedestrian movements at complex traffic intersections. High-resolution UAV videos of the intersections are employed to extract the movements of pedestrians, cyclists, and vehicles. Given the potential conflicts among pedestrians, cyclists, and vehicles, an improved social force model that considers the surrounding pedestrians, boundaries, cyclists, and right-turning vehicles is proposed and calibrated for pedestrian movement simulation. Videos acquired at intersections in Shenzhen are used for high-precision pedestrian movement simulation. The results demonstrate that the improved social force model simulates pedestrian movements more accurately than the classic SFM. In this study, a UAV is employed as a pedestrian monitoring platform to provide high-precision pedestrian trajectories at complex traffic intersections for the improved social force model, which significantly improves the simulation accuracy. The following aspects require additional attention in future research: (1) The UAV is subject to slight drift during flight, which reduces the accuracy of trajectory data acquisition; ground control points can assist in UAV image correction and improve absolute positioning accuracy. (2) Additional factors associated with pedestrian movements, such as pedestrian psychology and pedestrian density, should be included in the presented SFM. (3) Following the connected UAV approach [53,54], the experiment at one traffic intersection will be extended to monitoring and simulating pedestrian movements at a set of intersections simultaneously, using connected UAVs [55] and cloud services [43].
Author Contributions: J.Z. and S.C. conceived and designed the experiments; S.C. and W.T. performed the experiments; S.C. and K.S. analyzed the data; S.C. and W.T. wrote the paper.