Application of Crowd Simulations in the Evaluation of Tracking Algorithms

Tracking and action-recognition algorithms are now widely used in video surveillance, monitoring of urban activities, and many other areas. Their development relies heavily on benchmarking scenarios, which enable reliable evaluation and improvement of their efficiency. At present, benchmarking methods for tracking and action-recognition algorithms rely on manual annotation of video databases, which is prone to human error, limited in size, and time-consuming. Here, drawing on the authors' experience, an alternative benchmarking solution is presented that employs methods and tools from the computer-game domain to create simulated video data with automatic annotations. The presented approach considerably outperforms existing solutions in the size of the data and the variety of annotations that can be created. With the proposed system, a potential user can generate a sequence of random images involving different times of day, weather conditions, and scenes for use in tracking evaluation. The design of the proposed tool uses and develops the concept of crowd simulation. The system is validated by comparison with existing methods.


Introduction
Person tracking and action-recognition algorithms for video streaming data have recently attracted considerable interest, motivated by a variety of possible applications. There are numerous approaches and achievements, often multistage and complicated, that extract different types of information from the data. Their further improvement and evolution require reliable evaluation and benchmarking scenarios. Existing benchmarking methods test the reliability of tracking and action-recognition algorithms on public datasets manually annotated by their authors. PETS [1] was the first well-known dataset created primarily for surveillance applications. The original release consisted of three subsets of benchmark data: the first designed for pedestrian count and density analysis, the second for pedestrian tracking, and the last for analysis of traffic flow and event recognition. For the evaluation of tracking methods, the most popular dataset is the one included in MOTChallenge [2][3][4][5], which provides datasets with ground truth, MATLAB validation scripts, and the possibility of uploading results so that methods can be ranked with respect to quality and accuracy.
[...] address this issue. Kristinsson [23] describes a system based on Unity 3D that, along with a properly set animation controller component and adequate animation clips, can employ motion-capture technology to record video clips and high-resolution 3D models, thereby creating a realistic crowd simulation system. As one approach to this problem, Forbus et al. [24] describe the video game "The Sims", created by Maxis Studios, which gives a player the ability to create and control a virtual family over many generations. Agents, or Sims, can perform various actions on objects in the world or with one another.
Actions performed by agents, or indirectly controlled Sims, occur when a user creates a temporal-interaction object with a proper animation clip or animation offset in order to impart the feel of a real-world situation. To achieve such an effect, every action needs to be manually defined by the user. The Sims comes with a virtual machine, The Edith, designed specifically for this purpose. Musee et al. [25] propose a different approach in which a crowd simulator reproduces the motion of synthetic pedestrians from samples gathered from real-world video sequences using object tracking techniques: the trajectories of pedestrians extracted from the video sequences drive the pedestrians moving within the simulated environment. In [26], the authors report the creation of an interactive data-driven crowd simulator that combines the high realism of data-driven methods with the interactivity of synthetic techniques.
As previously mentioned, crowd simulators already have numerous applications, including not only gaming software but also city and environment planning, organization of commercial spaces, and design and verification of evacuation paths for buildings or terrain. All these applications drive rapid advances in crowd simulators. This work presents a new application of crowd simulators. CrowdSim (crowd simulation system) was designed and employed as a validation tool for tracking in video recordings. To the best of the authors' knowledge, CrowdSim is the first system of this type that allows generation of random test data with high variability and controlled complexity.
A graphical system is designed that creates simulated records together with matching annotations. With this system, a potential user can generate a sequence of random images incorporating differing times of day, weather conditions, and scenes for the purpose of tracking evaluation.
Published algorithms can thus be tested with respect to the influence of crowd density or the impacts of different light conditions related to weather, both of which are difficult to capture when real-world videos are employed. The proposed solution was tested and validated on different algorithms for tracking multiple objects and these results were compared with those obtained from MOTChallenge. The presented work has the following features:
• random generation of images that can be used to evaluate different algorithms for tracking multiple objects by means of random starting position of pedestrians, their unpredictable interactions during simulation process and choice of different models
• application of the Unity game engine for crowd simulation
• automatic annotation of object detections and names of actions according to the MOT format
• application of prepared models, generated scenes, and changes in time of day and weather conditions during a simulation

Proposed Solution-CrowdSim
The purpose of the proposed crowd simulator (CrowdSim) is to test multi-object tracking algorithms. It was also intended to be a worthy replacement for datasets of existing video footage and to introduce a new level of testing for tracking algorithms by generating simulations with a variety of factors that could not be obtained from real-world camera footage. CrowdSim is based on the very popular Unity 3D engine, which offers excellent optimization and ease of development. Figure 1 (and additionally Figure A1), produced with the proposed simulator, shows three scenes representing views from three cameras focused on a closed city square. The presented solution is equipped with three different views (static cameras), but the number of possible scenarios is very high: a user can change (1) the number of pedestrians in the scene, (2) the frame rate in frames per second (FPS) and the resolution, and (3) the intensity of different weather conditions (rain, fog, and snow). Simulations are compatible with the MOTChallenge format and are generated along with ground truth (the exact results of crowd identification), detections (the information on detected objects that are not yet marked as individual objects), and the generated images in PNG format. The presented approach not only offers the user many combinations of video environments but also allows testing of algorithms capable of tracking objects via footage from multiple cameras simultaneously: the user can track an object that leaves one camera frame but appears in the field of view of another. The main challenges in the implementation of CrowdSim lie in the generation of 3D models along with animations and in the formulation of assumptions concerning agents' movement.
In addition, CrowdSim offers a variety of models, 50 in total (25 male and 25 female characters), created with the Adobe Fuse CC tool. This variety makes the simulation more reliable when testing algorithms based on agents' appearances. The Mixamo tool emulates the walk animation of human figures, resulting in very natural-looking agent movement. To achieve natural pedestrian distribution and walking paths, the authors introduced random initial positions of pedestrians across a scene. In addition, Crowd Simulator with the Unity engine was employed to create a net of points of interest for agents to ensure that they move around the simulated city square in a natural and predictable manner. Because of the many interactions between agents during the simulation (e.g., occlusions and path intersections in crowded places), which slow down or stop the agents, their movement is not uniform, which makes the tracking process more difficult.
The primary features of Unity, including its use of an A-star [27] algorithm for global path finding and of reciprocal velocity obstacles for collision prediction and avoidance, are sufficient to create realistic movement among agents. Other useful functions include constant access to each agent's current and desired velocity and the ability to prioritize an agent's movement to define its deviation from a planned route. What makes this approach special is a mechanism that saves information about an object's position in every frame in a format already popularized by MOTChallenge, which makes the simulator straightforward to use as a test environment for tracking algorithms. A further useful addition is the ability to simulate various weather conditions and their intensity. A possible workflow of CrowdSim is presented in Figure 2.
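The per-frame saving of object positions in a MOTChallenge-style file can be illustrated with a minimal Python sketch. It assumes the ten-column comma-separated layout of the 2D MOT 2015 files (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z); the exact columns written by CrowdSim are not stated in the text, and the names `Box`, `write_mot_ground_truth`, and `to_mot_text` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Box:
    """One pedestrian bounding box in a single frame (pixel units)."""
    frame: int      # 1-based frame index
    track_id: int   # identity of the pedestrian
    left: float
    top: float
    width: float
    height: float


def write_mot_ground_truth(boxes, stream):
    """Write boxes as MOT-style CSV rows, sorted by frame and then identity."""
    writer = csv.writer(stream, lineterminator="\n")
    for b in sorted(boxes, key=lambda b: (b.frame, b.track_id)):
        # Assumed layout: conf = 1 marks a valid entry; the world
        # coordinates x, y, z are unused in 2D and written as -1.
        writer.writerow([b.frame, b.track_id, b.left, b.top,
                         b.width, b.height, 1, -1, -1, -1])


def to_mot_text(boxes):
    """Return the MOT-style rows as a single string."""
    buffer = io.StringIO()
    write_mot_ground_truth(boxes, buffer)
    return buffer.getvalue()
```

Because the simulator knows every agent's identity and screen position, such rows can be produced without any manual annotation; a tracker's output written in the same layout can then be compared against them directly.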

Evaluation Scheme
In the verification of CrowdSim, the MATLAB computational environment was used as the tool for organizing and ordering experiments as well as for statistics. The material for analysis came from 6 different tracking algorithms that are available along with open-source code. Analysis of the system evaluation was performed in MATLAB (The MathWorks, Natick, MA, USA) on two different workstations, a 2.6-GHz Intel Core i7 with 16 GB RAM and a 3.5-GHz 6-Core Intel Xeon E5 with 64 GB RAM but without GPU acceleration, using standard MATLAB libraries and up to six workers. The tracking algorithms themselves were executed separately, based on the available source code, on a computer equipped with a 3.6-GHz Intel Core i7, an NVidia GeForce GTX 1070, and 32 GB RAM. The sample images along with source code are provided online at (https://crowdsim.aei.polsl.pl/). To automate the work related to testing multi-object tracking algorithms, a test environment was created. Its main tasks included the following:
• adaptation of the input data to suit the input format of each algorithm
• conversion of algorithm outputs to the form standardized by MOTChallenge
• automation of triggering algorithms in the form of a pipeline
• collection of standardized outputs and performance ratings with use of the MOTChallenge workspace
• workspace clean-up and preparation for another run
The dataset was split into two categories: one to measure crowd density, variety, and performance and the other to evaluate the impact of weather conditions. In both cases, data were acquired from three camera angles. Performance according to crowd density is given in Table 1 and Figure 3. The frequent occlusion effect caused by weather conditions increased the randomness of results in the weather-related test.
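The tasks of the test environment listed in this section can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the authors' environment was built around MATLAB and the MOTChallenge workspace, whereas here each tracker is represented by three hypothetical callables (`adapt_input`, `run`, `to_mot`) and scoring is delegated to a hypothetical `evaluate` callable.

```python
import shutil
import tempfile
from pathlib import Path


def run_pipeline(trackers, detections, evaluate):
    """Run every tracker on the same detections and collect its scores.

    trackers:   dict mapping a name to (adapt_input, run, to_mot) callables
                (hypothetical wrappers around each algorithm)
    detections: MOT-style detection rows shared by all trackers
    evaluate:   callable scoring MOT-formatted rows, returning a dict
    """
    results = {}
    for name, (adapt_input, run, to_mot) in trackers.items():
        workspace = Path(tempfile.mkdtemp(prefix=name + "_"))
        try:
            native_input = adapt_input(detections)        # adapt input format
            native_output = run(native_input, workspace)  # trigger algorithm
            mot_rows = to_mot(native_output)              # convert to MOT form
            results[name] = evaluate(mot_rows)            # collect ratings
        finally:
            # Clean the workspace so the next run starts from scratch.
            shutil.rmtree(workspace, ignore_errors=True)
    return results
```

Keeping the format adapters separate from the run step means that a new algorithm only needs its own pair of converters, while the evaluation and clean-up stages stay unchanged.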
To make full use of the crowd simulator chosen for this task, data were generated with every weather condition possible according to the parameters given in Table 2 and Figure 4. Both rain and snow were applied at 100% of the possible intensity; fog, however, was set to 75%, because this weather condition has a very strong impact on visibility for algorithms relying primarily on data gathered from video frames. In this analysis, tracking methods were evaluated based on the MOTChallenge parameters given in Table 3.
The most complex parameter for comparison is MOTA (multiple-object tracking accuracy), which is listed in Table 3 and describes algorithm accuracy. MOTA can be less than zero when the sum of errors exceeds the total number of objects that can be tracked in a scene. Although this parameter describes the overall performance of an algorithm, it is not by itself a sufficient measure of that performance. Therefore, MOTP (multiple-object tracking precision) was introduced to calculate the overall precision of an algorithm. This parameter showed little variation across the different algorithms and usually lay between 69.6% and 71.6%.
Table 3. Detailed information on MOTChallenge parameters employed in comparison.

Parameter: Details
PT (partially tracked): The number of objects tracked for more than 20% but less than 80% of their presence in a scene
MT (mostly tracked): The number of objects tracked for more than 80% of their presence in a scene
ML (mostly lost): The number of objects tracked for less than 20% of their presence in a scene
FM (fragments): The number of trajectories with one or more gaps in their tracklets
IDs (ID switches): The number of times the tracking algorithm continues tracking the wrong object, for example after one object occludes another
FP (false positive): An instance where an algorithm identified a trajectory but could not relate it to an existing object
FN (false negative): The number of objects not tracked by an algorithm
GT (ground truth): The number of objects described in the reference file
Recall: Ratio of the objects actually tracked to those that were possible to track (GT)
Precision: Ratio of correct trajectories to the total number of possible trajectories
FAR: Number of false positives relative to the total number of frames comprising a tested sequence
MOTA: Accuracy measure combining three sources of error: false negatives, false positives, and ID switches
MOTP: Average difference between true positives and corresponding ground truth targets
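Several of the Table 3 quantities can be written down in a few lines of Python. The sketch below assumes the standard CLEAR MOT definition MOTA = 1 - (FN + FP + IDs) / GT, with all counts summed over the frames of a sequence, and the 20% and 80% thresholds from the table; the function names are hypothetical, and counting shares of exactly 20% or 80% as partially tracked is an assumption made here.

```python
from collections import Counter


def mota(false_negatives, false_positives, id_switches, ground_truth):
    """MOTA = 1 - (FN + FP + IDs) / GT, with counts summed over all frames."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth


def false_alarm_rate(false_positives, frames):
    """FAR: false positives per frame of the tested sequence."""
    return false_positives / frames


def classify_trajectory(tracked_frames, present_frames):
    """Label one ground-truth object as MT, PT, or ML by its tracked share."""
    share = tracked_frames / present_frames
    if share > 0.8:
        return "MT"
    if share < 0.2:
        return "ML"
    # Assumption: shares of exactly 20% or 80% count as partially tracked.
    return "PT"


def coverage_counts(trajectories):
    """Count MT/PT/ML over (tracked_frames, present_frames) pairs."""
    return Counter(classify_trajectory(t, p) for t, p in trajectories)
```

For example, `mota(150, 80, 10, 200)` is negative because the 240 errors exceed the 200 ground-truth objects, which is the situation in which MOTA drops below zero.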

Tracking Algorithms
The reliability of the approach described herein was tested by applying several tracking algorithms. The primary requirements for including a method were the availability of its results in MOTChallenge and of its source code. The tracking algorithms evaluated were as follows:
• High-speed tracking by detection based on intersection over union (IOU) by Erik Bochinski et al. [28] and its further extension [29], in which the authors base their tracking algorithm on the overlap of detections. The method was tested on the DETRAC dataset [30].

• Tracking by detection (TBD) by Andreas Geiger [6], in which multiple-object tracking is performed in three stages. All detections are correlated with detections from consecutive frames using bounding box overlap and appearance. A Kalman filter is used for prediction, and detections are then matched between frames using the Hungarian method for bipartite matching. To reduce the number of missed detections caused by gaps or occlusions, the author of TBD also employed associated tracklets. TBD was originally tested on the author's own dataset.

• Tracklet Confidence (TCF) by Seung-Hwan Bae and Kuk-Jin Yoon [31], who based the method on tracklet confidence so as to handle track fragments arising from unreliable detections and occlusions and used online discriminative appearance learning to avoid errors. For the performance evaluation, the authors used the following datasets: CAVIAR [32], PETS09 [1], and ETH Mobile scene [33].

• Enhancing linear programming (ELP) with Motion Modeling for Multi-target Tracking [34]. During the detection phase, this algorithm employs the authors' pedestrian detector, creating a group of detections in the form of bounding boxes. After that, all previously detected objects are gathered to create tracklets, which include every object and form a network. The method was originally evaluated on the Oxford town center [35] and PETS09 [1] datasets.

• High density homogeneous (HDH) [36]. Unlike most object detectors, this one requires no object texture or image learning. The detection algorithm localizes targets based on local maxima search, and tracking is based on a greedy, graph-based method, which matches objects with short tracks and performs backward validation in time windows. Method was verified on ETH [33], PETS09 [1] and TUD [37].
• Discrete continuous energy (DCT) [38]. This approach incorporates both data associations and trajectory estimations in one objective function. The biggest benefit from this approach is that the continuous factor allows many trajectory properties to be modeled that a regular discrete formulation would not be capable of capturing. This method gathers all unlabeled detections and then creates possible trajectory hypotheses to re-estimate those trajectories using discrete-continuous optimization. Method was evaluated on ETH [33], PETS09 [1] and TUD [37].
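To make the overlap-based association used by the first method in the list concrete, the following Python sketch links detections across frames by greedy intersection-over-union matching. It is a simplified illustration rather than the implementation from [28]: the detection-score and minimum-track-length filters are omitted, boxes are assumed to be (left, top, width, height) tuples, and the threshold of 0.5 is an arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    a_right, a_bottom = a[0] + a[2], a[1] + a[3]
    b_right, b_bottom = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(a_right, b_right) - max(a[0], b[0]))
    inter_h = max(0.0, min(a_bottom, b_bottom) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def track_by_iou(frames, threshold=0.5):
    """Link per-frame detections into tracks by greedy IoU matching.

    frames: list with one list of boxes per frame.
    Returns a list of tracks; each track is a list of (frame_index, box).
    """
    active, finished = [], []
    for index, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for track in active:
            last_box = track[-1][1]
            # Pick the detection that overlaps most with the track's last box.
            best = max(unmatched, key=lambda d: iou(last_box, d), default=None)
            if best is not None and iou(last_box, best) >= threshold:
                track.append((index, best))
                unmatched.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # no sufficient overlap: track ends
        # Every detection left unmatched starts a new track.
        still_active.extend([(index, d)] for d in unmatched)
        active = still_active
    return finished + active
```

In this simplified form, an object that is occluded for even one frame ends its track and starts a new one when it reappears, which is one way the fragments (FM) counted in Table 3 arise in crowded scenes.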

Crowd Density
The first part of the tests sought to verify the impact of crowd density. The tracking algorithms were evaluated as crowd density changed, beginning with 10 pedestrians and increasing to 100 pedestrians in increments of 10. Lighting conditions were not affected by additional post-processing methods. Results of the testing are shown in Figure 5 for four different parameters: MOTA, MOTP, FN, and FM. (All remaining parameters can be found in Appendix A, Figure A2.) The values of almost all parameters (FM, IDs, FAR, FN, FP, ML, MT, PT) exhibited near-linear increases as the number of pedestrians increased, as was expected. In this situation, the most efficient methods have the lowest parameter values. In contrast, the parameters Prcn (precision), Rcll (recall), MOTA, and MOTP take on higher values for more efficient methods. These parameters are influenced by the number of ground-truth objects, and their relations are more constant. A useful comparison can be made between MT and ML: for mostly tracked, a higher value is desired (IOU had the highest overall score), whereas for mostly lost, a lower value is desired (IOU had the lowest overall value). Among the tested tracking methods, the IOU algorithm achieved the best performance and the DCT algorithm the worst. The reason for the latter was that the search for optimal values of DCT's internal parameters was out of scope. The other algorithms did not require additional optimization. Because of the extended computation time of some algorithms, full analysis for a larger crowd was possible only for the three methods reported in Figure 6. The other methods did not scale well with increasing crowd density: they required up to several hours to analyze a single video stream, so they were omitted.
The calculations show that as the crowd in the simulation grows, the effectiveness of all compared methods decreases, but the order of their effectiveness does not change compared to the previous analyses. As seen in Figure 6, the values of the MOTA parameter fall to 85% (IOU), 40% (HDH), and 25% (ELP), and those of the MOTP parameter to 94% (IOU), 82% (ELP), and 72% (HDH).

Weather Conditions
The primary benefit of performing crowd simulations is the ability to validate algorithms under different weather conditions, lighting conditions, and other artifacts. In validating the algorithms, three weather conditions were employed: (a) rain with clouds (first column), (b) snow with clouds (second column), and (c) fog (steam) with 75% intensity (third column). Each tracking algorithm was tested one hundred times to ensure repeatability of the results. Figure 7 presents the MOTA and MOTP parameter results, in the form of boxplots, for each of the three weather conditions. (The remaining parameters are given in Appendix A, Figure A3.) As can be seen, the rain and fog/steam conditions show the primary differences. The results for snowy conditions (second column) show smaller variability and a smaller influence on efficiency. The most efficient method is IOU and the least efficient is DCT, for the same reason as with crowd density. Additionally, in Figure 8

Comparison to MOTChallenge
The last step in evaluating the algorithms was to compare the results with those of MOTChallenge [3], a benchmark that revolutionized the field of multi-object tracking benchmarks. The authors of MOTChallenge introduced video datasets of unprecedented size. The dataset encompasses a wide variety of lighting conditions, crowd densities, and viewpoints, and it was augmented by data from previously introduced datasets such as PETS. The idea of the benchmark is to test not how well an algorithm performs on a given video sequence but how it performs on a diverse set of sequences. For each sequence available in the MOTChallenge dataset, the benchmark provides a detection file and a ground truth file that were created by the community in order to avoid mistakes that would make the benchmark unreliable. Anyone wishing to test an algorithm against the "challenge" prepared by the benchmark's authors is free to download the training and benchmark data and submit the results thereafter. These results are then placed in the ranking, which is reset every year, usually with some additional sequences added to the benchmark. Every participant can also submit the algorithm's code together with a link to any related publications. In recent years, the MOTChallenge authors have also released a modified version of the benchmark that allows the testing of algorithms capable of object detection only. The researchers behind MOTChallenge continue to release newer versions of the benchmark. Since its inception, they have released the following: 2D MOT 2015, 3D MOT 2015, MOT16, MOT17Det, MOT17, CVPR 2019 Tracking Challenge, and CVPR 2019 Detection Challenge. As every new version contains datasets from previous releases and contributes new ones, the dataset is constantly growing.
Data on the MOT ranking from different years were collected and the evaluated methods were verified. Specifically, results from MOTChallenge were compared to the mean values of different parameters of crowd simulations from the crowd density, rain, snow, and fog/steam tests. The main purpose was to compare the different algorithms' results from the simulator with the actual MOT ranking. To compare the approaches to data validation, the MOTA (Table 4) and MOTP (Table 5) parameters were calculated. In general, the IOU method provided the best results for the MOTA parameter, with the TBD method yielding the second-best results. (HDH was not classified in MOTChallenge and, under snow conditions, TCF, ELP, and TBD exhibited slight differences.) In MOTChallenge, the DCT method holds first place in the ranking; in crowd simulations, however, with no modification of the parameters in the source code, that method fails and so was placed in last position. The main differences and variations in method results can be observed in rain and fog/steam conditions. Those weather conditions gave the different methods the greatest difficulties; only small deviations were apparent in snowy conditions (i.e., snow did not disturb the final images to a satisfactory degree). A potential user could also test the behavior of an algorithm with respect to crowd density by changing the number of pedestrians.

Discussion
Experience with the automated, crowd-simulation-based approach to video tracking evaluation demonstrates its flexibility and ease of implementation. Evaluation scenarios can be planned very easily, and data are readily presented to video tracking algorithms, leading to reliable scoring and ordering of algorithms. The CrowdSim tool allows comprehensive evaluation and prioritization of any available state-of-the-art video tracking method. Here, 6 methods developed over 2013-2017 were evaluated. That choice was dictated by the availability and functionality of the tested tools. The authors of some of the methods available in the latest MOT have not released the source code [39][40][41][42][43][44][45][46][47][48]. Some of the available methods were compiled but unfortunately required additional code modifications or input format adjustments, or generated erroneous results [49][50][51][52].
An important aspect of assessing the presented method was comparison with existing approaches, mainly with the MOTChallenge database, which is at present the most commonly applied. For easy situations, the obtained ordering of video tracking algorithms largely coincides with the MOT ranking. However, scenarios that are at present poorly covered by manually prepared data, such as rainy, foggy, or snowy conditions, can strongly influence evaluations. The IOU method, which achieves top MOT scores, is also the most efficient on data generated by CrowdSim. Increasing the number of pedestrians or adding a rain or snow effect decreases the scores of all compared methods below 90. The CrowdSim tool for benchmarking tracking algorithms generates video streams that are less realistic than real images, e.g., those obtained from monitoring systems. However, in contrast to real-world images, simulated images are much more repeatable and have all parameters under control. Therefore, despite the gap in realism, they allow more robust conclusions to be drawn about the quality of tracking algorithms, as has been shown in the experiments. Clearly, by advancing the proposed technology of generating artificial images, future versions can bring simulated scenes closer to reality.
The performed analyses imply that the MOTChallenge ranking cannot be treated as reference data for all situations. The collections of video streams prepared for this ranking do not allow control of important parameters affecting the efficiency of tracking, such as lighting conditions, number of pedestrians, and weather conditions. The data available in MOTChallenge are too sparse to study differences in tracking efficiency as functions of these parameters. The MOTChallenge scores showed lower values of the MOTA and MOTP parameters than the crowd simulations. For MOTP, the variability in results is much lower than for the MOTA parameter.

Application of the Concept of Crowd Simulations
The proposed system for automatic generation of different scenes ensures that designers of tracking methods cannot overfit their parameters by targeting only limited sets of real videos. The crowd simulator examples show that it can be applied to verify and validate tracking algorithms. The main benefit of such an approach is the wider possibility of comparison and of generating different situations for a scene. CrowdSim enables verification of the impact of crowd density (i.e., the number of pedestrians in a scene) on tracking algorithms. In general, tracking methods encountered more problems as the number of pedestrians increased. The influence of three different weather conditions (and the associated lighting) on the quality of tracking was also checked, and the main differences were observed under the rain and fog conditions. The obtained results demonstrated the main trends with respect to the parameters, and the idea of crowd simulations in tracking evaluation can be developed and applied under conditions of increasing sophistication. Although the given approach was compared with the results from MOTChallenge, the main goal was not to obtain exactly the same parameter values, which would be impossible. MOTChallenge contains videos created under different conditions, which are presented in the rankings in averaged form. Moreover, not all methods appear in each ranking, and only a few working open-source codes were available (even among the methods that topped the rankings). In method verification, the authors focused primarily on the MOTA and MOTP parameters, which are the most complex. It was possible to identify the methods that yielded repeatable results and the conditions under which algorithms gave different results, and finally to verify the efficiency of the tracking methods included in the analysis.
To reach a compromise between the quality of simulations and the computation time of tracking methods, 10 FPS and a resolution of 800 × 600 were chosen. The main intention was to verify the influence of crowd density and weather conditions, so the number of repetitions is much higher than for single videos in MOT. Current video surveillance systems operate at 10-25 FPS, and many popular benchmarks have similar parameters (Kitti [53]: 10 FPS, PETS09 [1]: 7 FPS). Lower FPS values are still dictated by video storage and data transmission. The IPVM report [54] shows that the average frame rate in video surveillance increased to 15 FPS in 2019 (40% at 11-15 FPS and 28% at 6-10 FPS) from 10 FPS in 2016 (51% at 6-10 FPS and 32% at 11-15 FPS). Furthermore, a group from University College London showed that the minimum frame rate needed to detect events in similar video systems is 8 FPS [55]. The authors of [56] treat 10 FPS as real-time efficiency. Nevertheless, CrowdSim allows either the frame rate or the image resolution to be increased or decreased so that a potential user can adjust the images to their needs.

Future Works
The proposed solution solves many complex problems but nonetheless requires additional development. Human movement was created with built-in tools available in Unity and with external tools. Nevertheless, the simulation must be equipped with an additional mechanism capable of making human movement more natural with limited possibilities of occlusions. For action recognition and more advanced simulation of the movement of individual agents, it is planned to use an additional animation library and to record custom animations in motion-capture systems. Replacing the navigation mesh with planned movement, in which all behavior is planned a priori and simulated by the software, should also be considered. An additional element that would make the simulations more realistic is a background in the form of other moving objects that do not participate in the tracking process. There are also plans to prepare first-person images that could move with the crowd. A scene should consist of more realistic elements, for example traffic jams formed by moving cars. The crowd simulations themselves should be more complex, consist of more people with different appearances, and generate more problems for tracking algorithms. To improve the realism of scenarios such as weather conditions, future versions of the simulator should be equipped with different kinds of post-processing methods as well as shaders implemented for that purpose.

Conclusions
In this paper, a novel approach to the validation of video tracking algorithms was proposed. The crowd simulation system for video tracking benchmarking was prepared in a game engine environment. Previous approaches used real-world collections of video streams, catalogued and annotated manually. The proposed methodology was inspired by problems that follow from weaknesses of manual preparation schemes, which are mentioned in several literature references and commonly encountered in the preparation and validation of real-world surveillance systems: limited volume and resolution of data streams, insufficient replicability, and problems in planning and implementing desired scenarios.
In the proposed algorithmic solution and its implementation, "CrowdSim", the resulting images are automatically annotated in the commonly used convention and can be readily applied to the evaluation of tracking algorithms. The output of "CrowdSim" reproduces formats and standards, in particular those of MOTChallenge, already in use for benchmarking video surveillance systems. All output indexes for quality of tracking are reliably reproduced. Simulation scenarios can represent a variety of situations and can be controlled by parameters such as time of day, crowd level, weather conditions, and background, each of which can be set at different levels. Studies of the sensitivity of algorithms to different variants and to different levels of disturbing parameters can therefore be conducted easily.
To confirm the usefulness of the proposed approach, several tests were conducted in which different tracking algorithms (IOU, TCF, ELP, TBD, HDH, DCT) were applied to the generated data. Significant improvements were demonstrated by comparison of the described approach with existing benchmarking systems based on manual annotations. CrowdSim made possible exhaustive comparisons of the studied tracking algorithms with respect to several important indexes: crowd density, weather conditions (rain, snow, fog/steam), lighting conditions, and time of day. Significant differences between video tracking systems in their responses to changes in these indexes were observed, spanning one order of magnitude for the MOTA parameter. In contrast, manually annotated systems are likely to flatten the ranges of quality indexes of the compared algorithms because of insufficient variability of parameters.
Several directions of future development of the system are possible: improving the realism of generated images, increasing the variability ranges of parameters, and combining different indexes in video streams. Developing new versions of crowd simulators will contribute to improvements in the efficiency of the constructed surveillance systems. CrowdSim is freely available at the dedicated webpage (https://crowdsim.aei.polsl.pl) along with several simulated benchmark data sets.

Funding: This work has been supported by the National Centre for Research and Development (project UOD-DEM-1-183/001 "Intelligent video analysis system for behavior and event recognition in surveillance networks") and by the research project (RAU-6, 2020) and projects for young scientists of the Silesian University of Technology (Gliwice, Poland).

Acknowledgments:
The authors would like to thank all students who were involved in the development of the crowd simulator, especially Tomasz Mysliwiec (the author of CrowdSim), Maciej Blady, Daniel Fernes, Łukasz Kandziora, Aleksandra Kotula, Zbigniew Pamuła, and Mikołaj Stankiewicz.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

The different examples of weather conditions influencing the results of crowd simulations are visible in Figure A1.

Appendix A.2. Crowd Density
The remaining parameters showing the influence of crowd density on accuracy of evaluated algorithms are presented in Figure A2.