Camera-Based System for Drafting Detection While Cycling

Drafting involves cycling so close behind another person that wind resistance is significantly reduced, which is illegal during most long distance and several short distance triathlon and duathlon events. In this paper, a proof of concept for a drafting detection system based on computer vision is proposed. After detecting and tracking a bicycle through the various scenes, the distance to this object is estimated through computational geometry. The probability of drafting is then determined through statistical analysis of subsequent measurements over an extended period of time. These algorithms are tested using a static recording and a recording that simulates a race situation with ground truth distances obtained from a Light Detection And Ranging (LiDAR) system. The most accurate developed distance estimation method yields an average error of 0.46 m in our test scenario. When sampling the distances at periods of 1 or 2 s, simulations demonstrate that a drafting violation is detected quickly for cyclists riding at 2 m or more below the limit, while generally avoiding false positives during the race-like test set-up and five hour race simulations.


Introduction
As in most endurance sports, the goal of triathlon races is to reach the finish line as quickly as possible. In addition to optimizing training, rest, and nutrition, triathletes can greatly improve their performance by working on a streamlined posture. A huge advantage can also be gained by cycling closely behind another competitor during a race. This other competitor then acts as a wind shield. This technique is called drafting, stayering, or slipstreaming. Recent studies suggest using this technique by riding in a peloton can reduce aerodynamic drag by 90-95% compared to that of an isolated cyclist [1].
The International Triathlon Union (ITU) makes a distinction between draft-legal and draft-illegal races [2]. Most long or middle distance races and many standard or sprint distance races fall under the latter category, where it is forbidden to draft behind another athlete or a motor vehicle, i.e., to enter the bicycle or vehicle drafting zone for an extended period of time. For standard and shorter distances the bicycle draft zone is 10 m long measured from the leading edge of the front wheel. An athlete is allowed to enter another athlete's draft zone but must be progressing through that zone. A maximum of 20 s is allowed to pass through another athlete's drafting zone. The official regulations regarding middle and long distance events are slightly different, enforcing 12 m distance and 25 s maximal duration. Note that some race organizers enforce even other distances or durations. The experiments in this paper, always assume a 10 m drafting zone and 20 s maximal duration, unless stated otherwise.
Today, drafting regulations are monitored by referees driving a motorbike ( Figure 1). However, monitoring these rules in practice is difficult, hence unpunished drafting violations do occur frequently. In a typical triathlon race, there are not enough referees to check all triathletes all the time. In addition, the referees' decisions are still subjective as they can only roughly estimate the distances between the bicycles. Furthermore, athletes can typically hear a motorbike approaching them, so they can adjust their behavior momentarily. Figure 1. Typical drafting rule check in a triathlon race. A motorbike rides next to the athletes, and the referee performs a visual estimate of the distance between them. If the estimated distance is too small (e.g., smaller than 10 m) for an extended period (e.g., longer than 20 s), the drafting athlete receives a penalty.
The lack of reliable detection of illegal drafting can lead to arbitrariness in the best case, and possibly even to corruption, such as favoring some individuals in the worst case. A drafting detection system based on video taken from a camera mounted under the saddle of a bicycle, equipped with computer vision techniques can offer a solution to these problems. Moreover, when an athlete challenges a referee's decision, a video based system and its recorded distances can provide supportive data. In summary, by continuously checking compliance with the drafting rules, a camera based detection system will lead to less drafting and consequently ensure fairer triathlon or duathlon races.
In a previous implementation of a drafting detection system, GPS and Web Services are used, yielding an average accuracy of 1.32 m for the absolute position of each athlete [3]. The error can increase in cases of urban canyons and tree lines, because of the limitations of the GPS device, which is a significant deviation, given the drafting distance limit. Two more accessible solutions for a drafting detection system are: Light Detection And Ranging (LiDAR) and RAdio Detection And Ranging (RADAR). The former is a very accurate but expensive and non-compact technique. The latter is more compact and cheaper. However, for object detection with a RADAR system, a sufficient number of reflection points are required at known positions, making this solution potentially less reliable.
Monocular vision based detection and distance estimation algorithms have been demonstrated to successfully estimate the distance between cars in literature for traffic applications. These techniques typically combine object detection networks with geometric techniques but also employ extra information to accurately estimate the distances. These features include the presence of lane markings [4], or the specific geometry of cars [5]. However, in the application of detecting bicycles in triathlon races, neither of these additional features can be used.
In this paper, we present a video based solution that automates the task of assessing potential drafting violations and extend the work of Van den Bossche et al. [6]. A single camera which looks backwards is mounted under the saddle of the bicycle. We do not consider a stereo camera set-up with two or more cameras on one bicycle because this would mitigate the benefit of compactness, and add to the cost.
To be used in a triathlon race context, the mechanical robustness of the system is very important, because road conditions may often cause problems. It must be able to deal with vibrations in the image due to rough road surfaces, such as cobble stones or speed bumps. Furthermore, installing the system on a race bike should be a simple and straightforward procedure. In the proposed system, the mounting of the camera only involves the camera height and the tilt angle, which are easy to control.
The focal point of this paper is the detection of drafting by the camera, whose operation can be summarized as follows. Bicycles are detected in the individual video images with a Convolutional Neural Network (CNN) architecture. The bicycle closest behind the camera is tracked and the distance to the bicycle with the camera is estimated from the apparent height and position of the bicycle. The distance is monitored over an extended time period, which enables an estimate of the probability that a drafting violation (e.g., longer than 20 s in the draft zone) has occurred.
The main contributions of this paper can be summarized as follows: 1.
We trained, applied, and analyzed the performance of real-time CNN-based object detector, specifically for detecting race, time trial, and triathlon bicycles.

2.
We describe two methods for estimating the distance from the camera to the cyclists behind. We performed sensitivity analysis and investigated the systematic errors that can occur which are caused by making simplifying assumptions. The accuracy of the distance estimators is also verified in a realistic scenario using a Light Detection And Ranging (LiDAR) scanner.

3.
We developed an efficient method which determines the probability of violating the drafting rule, based on successive distance estimations and a model of the measurement error. The behavior of this method is rigorously tested in a realistic scenario and through the use of simulations.

Drafting Detection
In this section, we discuss all processing steps in our proposed approach. Note that (as a preprocessing step), we assume the camera is intrinsically calibrated and the lens distortion was removed, e.g., with the method of Zhang [7].

Bicycle Detection
The first step in the drafting detection system is to detect cyclists in real-time. The aim is to have a robust, real-time detection at least up to 20 m from the camera. To achieve this, a Tiny YOLOv3 (You Only Look Once, version 3) [8] network has been trained for specifically detecting triathlon bikes, using 4 different triathlon race recordings of about one hour long each. Our training set for the object detector is very diverse, also including lighting changes (e.g., in tunnels) and poor weather conditions, such as rain. The detector was trained with approximately 60,000 manually annotated ground truth bounding boxes, which overlay the bicycle from the ground (bottom of the front wheel) to the handlebars. An example frame with manually annotated boxes is shown in Figure 2. This renders the size and position of the bounding boxes useful for distance estimation, as will be shown in Section 2.3. The training versus test set ratio was 80 to 20%. The tiny YOLOv3 network obtained an Average Precision (AP) of 77.19% on our dataset, by comparing the detections with manually annotated ground truth bounding boxes. On our hardware (1080p input video and desktop PC with a GeForce GTX 1060 GPU), the detection speed on the test set is 42.57 fps. We note that other object detectors, such as YOLOv3 [8], Faster RCNN (Region Convolutional Neural Network) [9], and Mask RCNN [10], also have the potential to yield high detection rates but are slower, taking 24.58, 7.85, and 6.64 fps, respectively, on the same hardware set-up.
The Single Shot multibox Detector (SSD) [11] does provide an interesting alternative. It is based on the MobileNet CNN architecture, was specifically designed for constrained devices (e.g., smart phones), and yields a similar processing speed as Tiny YOLOv3 (42.25 fps). Depending on the final application, and whether the processing needs to be executed on the device (i.e., attached to the bicycle) or off-line after the race, a more heavyweight but also more accurate and slower network architecture could be used.
The performance of a re-trained version of this network remains to be investigated in future work. Note that the AP of the detector is further improved by applying object tracking on the detected bicycles, which is discussed in the following subsection.

Bicycle Tracking
Because the regulations also incorporate a maximal drafting duration, the detected bicycles need to be tracked as long as they are visible throughout the video. We are only interested in tracking a bicycle as long as it stays (visible) behind the camera bicycle. In addition, note that, in order to be robustly detected by Tiny YOLOv3 and tracked, the bicycle should be reasonable close. In our experiments, Tiny YOLOv3 was able to accurately detect bicycles which were further than 20 m away, which is still well above the typical drafting limit.
In our method, the bounding box of the bicycle closest to the camera is not only detected but also tracked and its trajectory is recorded. This results in reliable samples to estimate the distance, which will be discussed in the next section. We note that other potential cyclists (riding further from the camera) are discarded in our current method. To be used in a realistic application, the system should be able to detect multiple cyclists behind. However, due to the increased complexity of the track management of such a system, we have not yet investigated this in our current proposed method.
Tracking increases the robustness of the draft detection considerably. The detection miss rate is lowered by using the predicted position of the tracker in case the detector misses a detection for a certain frame. If a bicycle is detected in the previous frame but not in the current frame, its current position can still be estimated from a tracker's prediction step. Thus, the effect of missed detections (False Negatives) is mitigated and bicycles in the scene can be better monitored continuously. More specifically, bicycles are linked when they overlap in successive frames. This strategy is simple but accurate (as will be demonstrated further in this subsection), because of the high frame rate and low relative speeds. When no overlapping bounding box is detected at a given point in time, the tracker uses the last detected bounding box as input and then tries to locate it in the current frame. When a new bounding box is detected, a new track is initialized. In this way, the same bicycle can be uniquely identified by its 'bicycle id'.
In order to make an informed decision with regard to the object tracker, eight object trackers from the literature are benchmarked on our dataset. The success AUC is defined as the Area Under the Curve (AUC) of the success plot, which demonstrates the percentage of the number of frames where the Intersection Over Union (IOU) of the estimated bounding box and the ground truth bounding box is larger than the considered threshold. We evaluated state-of-the-art trackers according to detection speed and success rate as shown in Figure 3: CSRT (Channel and Spatial Reliability Tracker) [12], KCF (Kernelized Correlation Filters, Copyright (c) 2012, Piotr Dollar All rights reserved; Copyright (c) 2014, Tomáš Vojíř.) [13], Boosting [14], MIL (Multiple Instance Learning) [15], TLD (Tracking, Learning, and Detection) [16], Medianflow [17], MOSSE (Minimum Output Sum of Squared Error) [18], and DSST (Discriminative Scale Space Tracking) [19].  . Tracking benchmark for CSRT (Channel and Spatial Reliability Tracker) [12], KCF (Kernelized Correlation Filters) [13], Boosting [14], MIL (Multiple Instance Learning) [15], TLD (Tracking, Learning, and Detection) [16], Medianflow [17], MOSSE (Minimum Output Sum of Squared Error) [18], and DSST (Discriminative Scale Space Tracking) [19]. (a) Success plot: success rate versus Intersection Over Union (IoU) threshold. (b) Area Under Curve (AUC) of (a) versus execution speed.
According to this benchmark, the Discriminative Scale Space Tracking (DSST) object tracker is the most appropriate solution. It clearly stands out with respect to the AUC of the success plot (0.73 versus 0.44 for the second highest scoring method) and has the third highest processing speed (still well above real time). The DSST tracker builds on the MOSSE tracker (described in Bolme et al. [18]) and extends it with a multi-scale pyramid to estimate the scale of an object. Note that rotation invariance is less important in our application, since the orientation of the cyclist only typically changes in a road bend.
The Tiny YOLOv3 object detector is tested in combination with the DSST object tracker with regard to improvement in the detection rate. An initial AP of 77.19% for Tiny YOLOv3 was obtained, but, after applying object tracking, an increase of the recall yields an average precision of 88.27% for the bounding boxes. The success plot of this combination is shown in Figure 4. Note that this combination is notably very reliable when the IoU threshold is allowed to be low. However, in some cases, a higher IoU threshold is preferred, e.g., if the height of the bounding box needs to be very accurate. Figure 3 suggests that relying more on the DSST tracker (as opposed to using it mainly for missed detections) could increase the combined detection/tracking performance further. Other configurations could be investigated in future work.

Distance Estimation
In this section, we discuss two alternatives to estimate the distance between the camera bicycle and the bicycle behind: the Wheel Position-Based method (WPm) and the Handlebar Height-Based method (HHm). Each method has a benefit with respect to the other, which is discussed in detail further in this section. For both methods, we assume a flat road and known geometry of the bicycles, as shown in Figure 5. At the end of the section, we perform a sensitivity analysis w.r.t. potential violations of the assumptions made and describe systematic errors related to these assumptions.
In both methods, we derive x from the position of the bounding box (with bottom at y w and height y h ) of a detected cyclist. The camera tilt angle θ, the focal length f and the camera height h 1 have known values. The camera is mounted at the rear of the saddle and, according to the rules the drafting distance, d is calculated from front wheel to front wheel. Thus, we must add (b 1 − b 2 ) to x to obtain the distance between the two front wheels.
To distinguish between the two models, we will denote the distances estimated by the two methods as x h and x w , respectively. In the ideal case, we should have The distance d between the two athletes is measured between the leading edges of their bicycles' front wheels. In the proposed methods, we estimate the distance x = x w = x h , which differs from d by a fixed constant distance (b 1 − b 2 ). This figure was modified from References [20,21]. (a) Wheel Position-Based method (WPm), estimated from y w ; (b) Handlebar Height-Based method (HHm), estimated from y h .

Wheel Position-Based Method (WPm)
The first method estimates the location of the bottom of the wheel on the ground plane, starting from a known tilt angle θ and camera height h 1 (Figure 5a). Note that the WPm is closely related to the camera-pose-based trigonometric vehicle distance estimation method described in Reference [22]. Let α denote the angle between the bottom of the bounding box and the camera center line; thus, where y w is the vertical distance from the image center to the bottom of the bounding box in the image, and f is the focal length of the camera. The distance x w thus equals The advantage of this method is that is independent of the height of the bicycle behind, which typically can only be estimated. A possible disadvantage of this method is the strong dependence of Equation (4) on the tilt angle of the camera, which will be demonstrated in the sensitivity analysis in Section 2.3.3.

Handlebar Height-Based Method (HHm)
The second distance estimation method is based on the height of the handlebars above the ground, which is obtained from the height of the detected and tracked bounding box. The expression for the distance of to the object w.r.t. the height of the bounding box is complex and depends on more parameters than the WPm (see Appendix A). However, this expression can be simplified if we assume that h 1 ≈ h 2 . From Equation (4), where y h = f tan θ − y w is the height of the bounding box. For realistic recordings a small tilt angle is expected and y h f when the detected object is far enough from the camera. From Figure 5, it can be deduced that The vertical FOV of commercial cameras is rarely larger than 90 • ; thus, | tan α| ≤ 1 when the detected object is entirely visible. Hence, for a small tilt angle θ, y h ≈ f h x . In a typical set-up, h is more than ten times smaller than the drafting distance limit, so y h f for the most crucial situations. Thus, The fact that the tilt angle θ can be ignored is a significant advantage of this method. The initial tilt angle must not be measured when setting up the system. Furthermore, due to vibrations, the camera position might change during the recording. Finally, there are situations where the road itself is not perfectly flat, as shown in Figure 6, which influences the distance estimation in a similar fashion to that of camera rotation. An in-depth sensitivity analysis is performed in the next section.

Sensitivity Analysis
The parameters in the distance calculation formulae Equations (4)-(6) either depend on the scene geometry (h 1 and θ), the intrinsic camera properties ( f ) or the position or height of the detected and tracked object in the image (y w and y h ). The bicycle's speed is not included in the distance or probability calculations, so it does not directly contribute the error. However, it might contribute indirectly, e.g., by causing more jitter on an uneven road surface, or when the relative speed between the two bicycles is high, which might (albeit slightly) influence the performance of our tracker.
In practice, all parameters can be prone to errors, each having different potential causes. Hence, we performed sensitivity analysis of x h and x w w.r.t. these parameters, which is discussed below. The full analysis is performed in Appendix B. An overview of the results for a small tilt angle θ is demonstrated in Table 1. For example, when there is a small error in the measurements of the height h 1 , the relative error on the distance calculated by the WPm satisfies ∆x w/x w ≈ ∆h 1/h 1 . Table 1. Approximate relative errors with respect to the different parameters in the distance estimation formulae (4)-(5) derived from sensitivity analysis. We assume that the tilt angle θ is small and ∂u /∂v ≈ ∆u /∆v for all parameters u and v. The sensitivity analysis demonstrates that measured errors w.r.t. camera height h 1 , the focal length f , the bottom position bounding box y w or the height of the bounding box y h are all (approximately) proportionally propagated to the estimated distances x h and x w , i.e., an error of 5% on one of these parameters yields a 5% error on the estimated distance. This also indicates that the absolute error increases linearly with the distance to the cyclist behind.

Relative errors
The partial derivative for x h , estimated by the HHm, w.r.t. θ is relatively insensitive to tilt angle estimation errors when α (see Figure 5) is small. This justifies the relaxation from Equation (5) to Equation (6). For the WPm, however, even a small estimation error for θ can potentially lead to a significant error in the estimation of x w , notably when α is small.
If the camera cannot be very tightly fixed in its original position, the tilt angle is the most error-prone parameter. In this scenario, this angle can change (undesirably) throughout a recording, e.g., due to vibrations as demonstrated in Figure 7. Thus, when using the WPm, this parameter needs to be updated at runtime. This can be realized by optical flow analysis or by utilizing information from an accelerometer and/or a gyroscope.

Systematic Errors
The proposed HHm makes two assumptions; h 1 ≈ h 2 and θ ≈ 0 • . In this section, the effect of the systematic error introduced by these simplifications is investigated.
In practice, there can be a height difference between h 1 and h 2 of the order of a few centimeters. This issue could be resolved, e.g., by adding a distinctive marker on all bicycles at a predetermined height, which can be detected by the software, and internally adjusting the value of h 2 in the detection step accordingly.
The tilt angle θ typically differs from 0 • when the camera position is either redirected during set-up, or when the angle changes due to vibrations (Figure 7) during the bike ride. A typical tilted camera has a magnitude of θ of up to 10 • .
The difference between the exact solution (see Appendix A, Equation (A3)) and the approximated solution in Equation (6)  The systematic errors vary linearly w.r.t. the distance from the camera. A camera tilt (|θ| > 0 • ) or h 1 < h 2 contributes to an underestimation of the estimated distance, notably for objects further away from the camera. When h 1 > h 2 , the true distance is similarly overestimated.
The WPm is not dependent on h 2 , so it is expected this method performs better when the height of the handlebar of the cyclist behind cannot be accurately estimated. In Section 3, the performance of both methods is evaluated through simulations and in a realistic scenario.

Drafting Probability
In the previous sections, methods for determining the distance between two bicycles at a given point in time were explored, based on object detection and tracking. Since the drafting rule also has a temporal component, it is necessary to robustly combine multiple measurements of the distance over a given time period. Given a set of measurements and a certain measurement error probability density function (pdf), our end result is an estimated probability that the cyclist behind violated the drafting rule.

Theoretical Probability Determination
The probability of drafting for a given period of time is closely related to the known theory of 'success runs' [23]. However, the main difference in this application is that the probability of 'success' depends on the measured distance, which changes between measurements.
Let q 1:n be a binary vector of length n. When the distance between the two bicycles d is smaller than the drafting distance limit d L (e.g., 10 m) at t, q t = 1; otherwise, q t = 0.
We assume that the likelihood function of the real distance d between two bicycles can be modeled as a distance-dependent normal distribution, with standard deviation σ x at x t . We assume here that the measured distance . Let x t be a distance measurement at time instance t, with fixed offset b 1 − b 2 (see Figure 5). The probability that the bicycle behind is actually closer than d L at one time instance can thus be expressed as Similarly, the probability that the bicycle is not closer than d L is P(q t = 0|x t ) = 1 − P(q t = 1|x t ).
To compute the probability of q 1:n given a set of measurements x 1:n , we define the index sets I 1 = {t ∈ [1, n] |q t = 1} and I 0 = {t ∈ [1, n] |q t = 0}. We also assume that all measurements are independent and that the probability that q occurs, given successive measurements x 1:n = (x 1 , x 2 . . . x n ), can be computed as P(q 1:n |x 1:n ) = ∏ t∈I 1 P(q t = 1|x t ) ∏ t∈I 0 P(q t = 0|x t ). (8) q 1:n is defined as a 'valid drafting pattern' (q 1:n ∈ Q v ) if it contains at least one instance of at least k = T L/ f s + 1 successive samples with value 1, where f s is the sampling rate and T L the maximal time an athlete is allowed to stay in the drafting zone (e.g., 20 s). Since all patterns are mutually exclusive, the probability of drafting over a given period of time (n samples) is thus the sum of the probabilities of the occurrence of all valid drafting patterns. Let v 1:n be the event of a drafting rule violation for t ∈ [1, n]. The probability that such an event has occurred, given distance measurements x 1:n is P(v 1:n |x 1:n ) = ∑ q 1:n ∈Q v P(q 1:n |x 1:n ) .
In our application, the drafting violation probability should approach 1 as soon as the cyclist behind spends longer than the drafting time limit T L in the drafting zone. On the other hand, when a cyclist stays outside of the drafting zone, or is only inside for less than T L , the probability should stay close to 0. Three factors determine how well P(v 1:n |x 1:n ) follows these considerations:

1.
Systematic errors: when one or more parameters in the distance estimation formulae Equations (4)-(6) are erroneously set, this leads to a systematic over-or underestimation of the distance. 2.
The distance estimation noise: less noise means more certainty about the estimated distance; thus, the individual probabilities in Equation (7) are closer to either 0 or 1. In Section 3.2, we will investigate the noise level for our test set-up and through simulations. 3.
The sampling rate: for independent measurements and a given noise level and distance, there is always a higher probability that at least one of the measurements is far off from the real distance, which can significantly influence the calculated probability in Equation (9). Conversely, when the sampling rate is low, there is a higher chance of sampling bias. Hence, a trade-off exists, which will be investigated in Section 3.2.

Efficient Probability Calculation
A naive method of calculating the drafting probability would consist of an exhaustive summation of all probabilities for any given pattern (time complexity O(2 n )), which in turn require n multiplications of individual likelihoods, each calculated from Equation (7). However, it is also possible to calculate the drafting probability with time complexity O(n) and constant space complexity by re-using previous results. Assume that all earlier probabilities of drafting P(v 1:n−i |x 1:n ) for t ∈ [1, n − 1] are known. The updated probability of drafting is now the sum of the probability that drafting had already occurred and the probability that it is the first time the drafting limit has been exceeded for longer than k successive samples. The drafting limit can only be exceeded for the first time if the last k samples of q 1:n (i.e., q n−k+1:n ) are 1, the one before that is equal to 0 and no other valid drafting patterns can be found in q 1:n−k−1 . Hence, the probabilities can be calculated by the following recursive expression: P(v 1:n |x 1:n ) = P(v 1:n−1 |x 1:n−1 ) + (1 − P(v 1:n−k−1 |x 1:n−l−1 ))(P(q n−k = 0)|x n−k )P(v n−k+1:n |x n−k+1:n ), (12) = P(v 1:n−1 |x 1:n−1 ) + (1 − P(v 1:n−k−1 |x 1:n−k−1 ))(P(q n−k = 0|x n−k )) n ∏ t=n−k+1 This equation demonstrates that only the last k measured distances (or probabilities of momentary drafting with Equation (7)) and the last k + 1 calculated drafting rule violation probabilities need to be kept in memory and a constant number of calculations is done for every new sample.

Evaluation and Results
In the previous section, we proposed two strategies to estimate the distance between two bicycles: the HHm and the WPm. Both methods are analyzed and benchmarked in order to make a well-founded choice concerning robustness and accuracy. To achieve this, two types of test recordings have been made and are discussed: a static and a dynamic situation test. Both tests were executed in sunny weather conditions. The probability of drafting that is inferred from these measurements is evaluated in the final part of this section.

Static Test
To determine the accuracy of the calculated distance, a static situation for well-known reference distances is considered as shown in Figure 9. The bicycle with the camera is fixed at the 0 m mark. Another bicycle is placed at distances ranging from 1 m to 20 m to the camera, at regular intervals of 1 m each. For this test, an average error of the measured distances over 100 frames is plotted for each ground truth distance. The results of the HHm and the WPm are shown in Figure 10. In this scenario, the WPm obtains better overall results for distances less than or equal to 6 m. For larger distances, the HHm produces a lower overall absolute distance error. Note that these measurements have been realized with a known tilt angle; therefore, the tilt angle deviation is assumed to be close to zero for the wheel-based method. The limitation of this static situation test is that it is considerably different from what happens during a triathlon race, since only one, static athlete is visible at any given time. Moreover, the athlete supports the bicycle by standing on the ground with one or two legs, which provides a slightly different context for the trained object detector and might thus lead to inferior results. Hence, a dynamic situation test is discussed in the following paragraph.

Dynamic Test
To test the drafting detection algorithm in a realistic environment, a LiDAR system is used on a cargo bike to perform accurate distance measurements in a triathlon race-like situation. A flat road with no gradients or bends is considered in this test set-up. The LiDAR of the Velodyne VLP 16 type provides a 3D point cloud of the environment through which the measured distance of the drafting detection system can be compared with the LiDAR distance with an accuracy of 0.1 m. The arrangement is shown in Figure 11. The measurements from our distance estimation methods and the LiDAR ground truth from the sequence are shown in Figure 12.  The LiDAR data is considered ground truth. Figure 13 shows the signed and absolute distance errors calculated for 416 ground truth LiDAR distances for both the handlebar height based method and the WPm. The LiDAR ground truth distances vary from approximately 3 to 19 m. The absolute error graph demonstrates that the HHm is more accurate than the WPm, with average absolute errors of 0.47 and 1.16 m, respectively. The signed error graphs also indicate that both distance estimation methods are prone to systematic, distance dependent errors. For the WPm, this bias is most likely caused by a significant camera tilt change between the measurement of this angle and the beginning of the test. The bias for the HHm indicates a downward tilt and/or h 2 > h 1 .
To further improve the overall distance measurement results in future work, the neural network of the detector could be retrained with a larger training set, notably with cyclists far from the camera bicycle. Other network architectures can also be considered, keeping in mind that the processing speed still needs to be adequate. However, the largest benefit could be obtained by avoiding the systematic errors as much as possible, as explained in Section 2.3.4.

Estimation of Measurement Noise Level
The distance estimation accuracy depends on the accuracy of the parameters in formulae Equations (4) and (A3) and the approximations used to obtain Equation (6). We modeled the distributions of these parameter measurements as normal distributions, for set-ups similar to the one used in our dynamic test. The modeled mean and standard deviation of each parameter can be found in Table 2.
We assume all errors are uncorrelated. Thus, these errors propagate to the estimated distances as follows [24]: where σ x w and σ x h are the estimated standard deviations of the errors on the WPm and the HHm, respectively. These standard deviations are distance dependent and plotted in Figure 14. Table 2. Estimated mean and standard deviations of the normally distributed parameters in our simulations. These estimations were obtained through observation of realistic measurements. Note that the standard deviation for h 2 is significantly bigger than the standard deviation for h 1 because the latter can be measured much easier in a real application. Furthermore, we assume that the standard deviation for the error on θ is much smaller when it is actually measured (e.g., in Equation (4)) than when we assume it is always 0 (e.g., in Equation (6)). Finally, we note that the means of y w and y h are both distance dependent, so we did not include them in this table. Note that the HHm is again clearly more accurate than the WPm in the error simulations. This observation agrees with the sensitivity w.r.t. θ (Section 2.3.3) and our experiments in Section 3.1. With the obtained standard deviations, the drafting probability can be calculated according to the method explained in Section 2.4. We will further analyze the drafting probability with distance estimations obtained from the HHm.

Drafting Violation Probability
We calculated the drafting violation probability for two subsequences of real data from our dynamic test, with video input similar to Figure 11a. We assume the probabilities are reset in between, such that multiple violations can be detected. The results are demonstrated in Figure 15. For the dynamic test, the HHm notably overestimates the distance to the cyclist behind when the distance is large due to systematic errors. However, these errors barely influence the probabilities, since the measurement error for distances close to or below the drafting limit is much smaller. The drafting violation is detected rapidly and correctly in both subsequences.
We further analyzed the behavior of the drafting probability estimator by means of simulated distances with added measurement noise. In every simulation, the cyclist behind is set to start at the edge of the drafting zone, at a distance of 10 m. The cyclist then either moves inside of or away from the drafting zone at a constant relative velocity, such that he arrives at a new distance x d after 10 s. Then, the cyclist stays at this new position indefinitely, and the calculated drafting violation probability is calculated with Equation (13) with σ x = σ x t obtained as explained in Section 3.2.1. Distances x d from 7 m to 11 m are considered, for sampling periods T s of 0.5 s, 1 s, 2 s and 4 s. The simulated measurement noise was derived from the distributions in Table 2. The simulated noisy values for h 1 , h 2 , f and θ (all mostly caused by initial measurements) were kept constant during 1 iteration of the experiment. Only the noise for y h was randomly resampled from the distribution for every measurement. 1000 iterations of each considered combination of x d and T s were performed.
Note that 18,000 s = 5 h is the typical time a participant spends on the cycling part of a full distance triathlon, so times longer than that are not investigated. Examples of these simulations are demonstrated in Figure 16. Each simulation yields a certain time until drafting is detected, whether this detection is desired or not. Distances below the drafting limit should be detected soon after the time limit. An overview with a histogram of the detection times from our simulations can be found in Figure 17.  The results demonstrate that, for distances above the drafting limit, a high sampling rate (low sampling period) is preferred, such that no drafting penalties are wrongfully awarded. Conversely, a lower sampling rate typically awards a drafting penalty faster and is thus the preferred option for distances below the drafting limit.
Theoretically, the resulting drafting violation probabilities should be similar when the underlying date is the same, regardless of the sampling rate. However, we made the assumption that all measurements are independent, which is typically not true when measuring the distance to the same object for an extended period of time. Furthermore, we assume implicitly that an athlete that is drafting at two successive samples, never exits the drafting zone in-between. This assumption might be too strict when the sampling period T s is large and can cause false positives. To avoid false positives, we recommend a sampling period of at most 2 s.
When riding just below the drafting limit (e.g., at 9 m), the drafting probability is clearly sensitive to potential distance estimation errors, and the time before drafting is detected is thus highly variable. The system would have less false negatives when σ x is small close to the drafting limit. To be fairly used in a race context, the rule enforcers need to take this into account. An alternative is to consider only awarding penalties for athletes riding at a distance significantly below the limit.

Current Limitations and Future Work
The proposed approach still has a few limitations in its current form. We conclude this section with a brief discussion of these limitations and potential strategies to overcome them in future work.

1.
In its current form, the proposed system is only able to track one cyclist behind at a time. A more complex track management system should be added such that multiple cyclists can potentially be followed throughout the sequence.

2.
The values for σ x w and σ x h were now only estimated with the use of simulations. To more accurately estimate the parameters, real training data is required, with different cameras, bicycles, camera heights, tilt angles, as well as in different environments, with accurate ground truth, like we obtained in our dynamic test.

3.
We have assumed all distance measurements are independent. However, in practice, the measured distance between two cyclists rarely varies significantly between successive frames. This can be observed by comparing the real data extracted from the dynamic test ( Figure 15) with the simulations (Figure 16). Thus, low pass filtering or the incorporation of a temporal model (e.g., a Kalman filter) with transitional probabilities could further increase the accuracy of the measurement error model. An additional potential benefit of a Kalman filter is that it automatically estimates the measurement error, which can further optimize the estimation of the drafting violation probability.

4.
As mentioned before and verified in this section, the estimated probability depends on the sampling rate. In addition, the exact moments the first sample is selected (just at the beginning of drafting or not) might influence the results, notably at low sampling rates and when the actual drafting time is close to the limit. The rule enforcers should be well aware of these characteristics and standardize the sampling settings and/or allow for enough buffer for questionable cases. This can, e.g., be realized by allowing slightly shorter distances and longer drafting times, or by imposing a drafting probability threshold that differs from 50%.

Conclusions
In this paper, a proof of concept for a drafting detection system in triathlon was proposed. The system is composed of four important building blocks: object detection, object tracking, distance determination, and drafting violation probability estimation. Detecting and then following the closest cyclist through the different scenes ensures a continuous monitoring over time. An average precision of 88.27% for detected bicycles was obtained on the test set after tracking. The Handlebar Height-Based method (HHm) method appears to be the most accurate one for distance determination, as the Wheel Position-Based method (WPm) is too sensitive to potential tilt angle changes or poor tilt angle estimation. In the static situation test, the average absolute error over all ground truth distances was 0.89 and 1.46 m for the HHm and the WPm, respectively. Furthermore, a dynamic test was conducted with LiDAR distances as the ground truth. Over all ground truth distances, this experiment shows that the HHm has an average absolute error of 0.47 m. For the WPm, this error is 1.16 m.
In addition, a drafting violation probability estimation was developed, which checks how likely it is that the drafting rule has been broken over a period of time. When using an appropriate sampling period of at most or 2 s, simulations demonstrate that the calculated probabilities are very useful to detect drafting at 8 m or below. The time before drafting is detected just below the drafting limit is highly variable, however. Nevertheless, with these settings, no unjustified penalties are awarded within a realistic time window. The proposed system shows promise to enter a triathlon and duathlon in order to obtain fairer races when further developed.

•
When either h 1 = h 2 or θ = 0, there is always one zero solution.

•
When h 1 > h 2 , there is exactly one negative solution.

•
When h 2 > h 1 , both solutions have the same sign, but one solution will have a much smaller value than the other, e.g., let h 2 = 0.8 m, h 1 = 0.85 m, and θ = 10 • . From Equation (A5), an object at (real distance) 10 m can theoretically be confused with an object at 132 µm, which in practice cannot even be fully captured on the image plane.
Thus, when only considering objects in front of the camera which are fully visible, the largest value of x h can generally be used as the estimated distance.