On the Noise Complexity in an Optical Motion Capture Facility

Optical motion capture systems are state-of-the-art in motion acquisition; however, like any measurement system they are not error-free: noise is their intrinsic feature. The works so far mostly employ a simple noise model, expressing the uncertainty as a simple variance. In the work, we demonstrate that it might be not sufficient and we prove the existence of several types of noise and demonstrate how to quantify them using Allan variance. Such a knowledge is especially important for using optical motion capture to calibrate other techniques, and for applications requiring very fine quality of recording. For the automated readout of the noise coefficients, we solve the multidimensional regression problem using sophisticated metaheuristics in the exploration-exploitation scheme. We identified in the laboratory the notable contribution to the overall noise from white noise and random walk, and a minor contribution from blue noise and flicker, whereas the violet noise is absent. Besides classic types of noise we identified the presence of the correlated noises and periodic distortion. We analyzed also how the noise types scale with an increasing number of cameras. We had also the opportunity to observe the influence of camera failure on the overall performance.


Introduction
Synthesis and analysis of human motion is an active research area with a plurality of applications in biomechanics and entertainment [1]. Contemporary technologies allow capturing and processing movement (Mocap) with high realism and accuracy; however, they are not error-proof. Various methods were proposed for motion acquisition, yet the optical motion capture (OMC) technique, based on tracking of retro-reflective markers in IR images is considered the gold standard in this field of research. It outperforms other techniques and it has been used for verification of the other technologies: inertial [2,3] or optical [4]. OMC is also considered to be a reference motion acquisition for the applications of other Mocap technologies in such demanding areas as medical [5][6][7] or space research [8].
The uncertainty in optical motion capture systems depends on numerous factors, such as type and amount of used cameras, their physical setup, and mounting, marker size, environmental conditions such as air temperature or humidity, camera noise, and quality of the calibration of the motion camera in the motion capture system. Although almost all these factors can be controlled by re-calibration of the system or ensuring constant environmental conditions, the noise present in the cameras is an inevitable factor that cannot be easily neglected or removed.
In this article we characterize the types and levels of noise in three types of Vicon Motion Capture Camera: MX-T40, Bonita10 and Vantage 5. We use Allan variance (AVAR) [9] which is a handy tool for identification and evaluation of noise types. We propose how to address the non-trivial regression problem of ADEV curve, by matching it with the component functions using metaheuristics: simulated annealing (SA) and ant colony optimization (ACO). Moreover, thanks to the observed malfunction of one of the devices we were able to demonstrate that the proposed approach can be used for quite complex cases of correlated and periodic distortions.
The proposed usage of Allan variance has several advantages over the classical statistical approach or spectral analysis. It works when the simple statistics fails, it is adequate for the noise quantification when the measurements are correlated. It has clear graphical interpretation, which allows for identification and quantifying several different categories of noise at once. Last but not least it does not require a template: it is based on the assumption that the signal is constant and any variation is noise actually.
The need for obtaining cautious characteristics of the OMC systems is severalfold. The key rationale is that OMC systems are employed as a reference for the other Mocap techniques. Requirements of the reference systems for calibration of measurement devices are high, demanding the reference precision to outperform a system under tests significantly. Moreover, the presence of the non-standard types of noise can falsify conventional measures of uncertainty in such a system. Finally, the knowledge of occurrence and amplitude of certain classes of noise can be useful for diagnosing the facility and reducing the noise sources (e.g., long term environmental influence).
The article is organized as follows: in Section 2 we provide theoretical background and we demonstrate simple variance-based noise quantification with the precaution that it might be not satisfactory; we introduce Allan variance as an alternative. In Section 3 we describe the experiment-laboratory setup and procedures-followed by an algorithm for parameters estimation description and comments on unexpected phenomena observed during the experiment. Section 4 discloses the experimental results and their analysis followed by a discussion of results. Section 5 summarizes the article and provides ideas for future work.

Previous Works
The accuracy and precision in different OMCs were subject to analysis in several works [10][11][12][13]. In those works, the most frequently studied OMCs are one of Vicon System (MX, Bonita, and V-series), or the OptiTrack system. Regardless of the system used, the authors of these studies agree that the most important factor that influences the data is camera calibration. Camera calibration originates from photogrammetry [14], it relies on positioning the cameras in a virtual 3D space so that they correspond to the cameras positions in the laboratory. This position and several (minimum two) 2D camera projections of markers are used to reconstruct markers in 3D space [15]. The calibration quality is determined using average re-projection error. This is the mean distance between the 2D image of the markers on camera and 3D reconstructions of those markers projected back to the camera's sensor in pixels.
Early works modeled the noise in frequency-based fashion for signal processing needs. Values of a high frequency were considered to be noise that needed to be removed. They were identified or as residuals from the 'right' motion modeled by slowly varying curves (low-order polynomials, splines, or Fourier series of low order) [16,17], or by conventional Butterworth filtering [18,19], where the cut-off frequency was identified by the lack of autocorrelation in the filtered-out residuals.
Windolf et al. [11] reported that performance of OMC strongly depends on their individual setup and that accuracy and precision should be determined for an individual laboratory installation. They tested both accuracy as a root-mean-square (RMS) error from ground truth and precision as a standard deviation of measured positions in a four camera Vicon 460 system. As a ground truth they employed a custom-built robot mounted L-shaped template. They verified the influence of changing the camera setup, calibration volume, marker size and lens filter application. In the best case they report 63 ± 5 µm accuracy and 15 µm precision.
In another study, Jensenius et al. [13] tested two OMC systems: Optitrack and Qualisys. They used constancy of position as a quality criterion and identified marker position drifting over the time. They measured drifting velocity (in mm/s) and drifting range (in mm) that identifies volume of uncertainty for marker position. They also emphasize role of proper calibration for the performance of OMC, and coverage of the area within the calibration procedure.
In the work of Carse et al. [12], three optical 3D motion analysis systems were compared, one of which was a new low-cost system (Optitrack), and two which were considerably more expensive (Vicon 612 and Vicon MX). They used a rigid cluster of markers and measured inter-marker distance and its standard deviation (SD) as a quality criterion for a walking task in an unknown, but adequately large volume. They reached SD values between 0.11-3.7 mm depending on the OMC system.
Results confirming high quality position measurement, using Vicon MX with 5 Vicon F-40 cameras, were obtained in the work by Yang et al. [20]. They considered whether the OMC could be used for the subtle bone deformation during exercises; the task required accuracy better than 20 µm. As a test template they used markers mounted on the computer numerical controlled (CNC) milling machine with 1 µm spatial resolution. They tested influence of marker size for cameras located very close to the observed, quite small, volume (0.4 × 0.3 × 0.3 m). They confirmed that it is possible to achieve the RMSE accuracy and precision to be 1.2-1.8 µm and 1.5-2.5 µm respectively.
Eichelberger et al. [10] investigated the influence of various recording parameters on the accuracy using Vicon Bonita cameras. These are the number of cameras (6, 8 and 10), measurement height (foot, knee and hip) and movement (static and dynamic). All these affected the system accuracy significantly.
Another notable work was conducted by Merriaux et al. [21]. They performed two experimental error estimations in 8 Vicon T40 camera OMC in moderate volume 2 × 1.5 × 1 m. They used two sophisticated robotic templates for static and dynamic (fast rotating blade) cases. In the static case, the estimated errors are mean absolute error (MAE) 0.15 mm for accuracy and RMSE of 0.015 mm for precision. In the dynamic case, the observed accuracy was larger, yet still satisfying, it achieved values between 0.3 mm to <2 mm. They demonstrated also that it depends on the object velocity and sampling frequency.
Slightly different, yet interesting study on noise [22] involved aquatic OMC based on Vicon T40 cameras, where the scene was a water-filled tank, cameras are located externally in dry locations and the markers made of dedicated reflective tape (SOLAS) are submerged. They demonstrated no significant difference in accuracy and precision due to various mediums in the optical path.
The technological side of an OMC is not the only source of distortions in the system. In the work of Capozzo et al. [23], the mechanics of markers placed on the skin was emphasized as a source of distortions in OMC. Further, it has been practically considered in the work of Alexander and Andriacchi [24], where skin motion-based distortions were suppressed. Clusters of marker positions were observed in a non-disclosed OMC. Bone orientation was estimated and evaluated, but marker positions were also analyzed; however, they were not the main quality criterion. They were able to reduce marker location error from 0.025 to 0.008 cm and the average bone orientation error from 0.370 to 0.083 degrees.
Various requirements towards the system uncertainty were specified in the literature. They depend on the applications and recorded tasks [1]. Some applications may require very high accuracy and precision achievable in a small volume, whereas most of the motion capture labs need larger observed volume at the expense of quality for practical purposes [25]. Moreover, due to various process dynamics and to avoid motion blur, some tasks may require much higher sampling frequency than 100-120 Hz, which is typically used. Detailed handwriting analysis for subtle symptoms of cognitive issues requires small volume but high accuracy and frequent sampling [26], whereas for some sports activities the volumes might be huge but the accuracy of meters is enough [27]. Even the same area of applications may require very different parameters of motion acquisition. Exemplarily, in medical applications, the aforementioned [20] bone deformation acquisition needs high accuracy in small volume; whereas the behavioral study of the surgery staff [28], which took place in 6 × 6 m operating room, resulted in 10% of recordings, which were inconclusive for identification of a subject.

Simple Preliminary Gaussian Model
Locating markers in a scene is a continuous process occurring frame-by-frame at the requested sampling frequency. The measurement of the location of a marker can be presumed to be an actual location signal plus additive Gaussian white noise, consequently, locating of each marker location is an independent statistical process. One dimensional case, as depicted in Figure 1, can be described with normal probability density function: where: loc(M k ) denotes actual location of k th marker in a scene. N(·) denotes normal (Gaussian) distribution, which for real location at x k is estimated as a mean µ k , and standard deviation σ k , that (at best) should be common for all the same markers (of a same size).
The typical uncertainty analysis in measurements employs two factors accuracy and precision [29]-the accuracy that describes how close the estimate µ k is to actual location x k and describes the systematic error, whereas σ k reflects the precision of measurement and describes random part of the error.
Extending the estimation of a marker model to the estimation of a length (L) of a bone, it yields a difference of double marker location measurements, hence its probability density function is described: where: µ e = |x 2 − x 1 | -expected (in common sense) mean value, σ e -expected standard deviation, which might take different forms, depending on the case: , independent variances (covariance σ 12 = 0), B. σ B = σ 2 1 + σ 2 1 -for two different (σ 1 = σ 2 ), independent variances (σ 12 = 0), C. σ C = σ 2 1 + σ 2 2 − 2σ 12 -for two different (σ 1 = σ 2 ), correlated variances (σ 12 = 0). The number of cameras used for position reconstruction is another factor that has a significant influence on the uncertainty of measured position. In the system which takes multiple samples of the measured value (such as position) and results in a mean value of these, the perceived precision can be described as standard error (SE) [30]. In our case, the increasing number of cameras used for reconstruction could be considered to be taking successive samples of the position. With the increasing number of samples, the SE value falls as the variability of measured value reduces. SE calculation is based on standard deviation: where: N is several observations, σ-standard deviation. This theoretic, quasi-hyperbolic relationship is depicted in Figure 2. Such a description is just a kind of approximation of the uncertainty variation in the multicamera triangulation process, since we do not exactly know all the nuances of implementation by OMC vendor; furthermore it does not take into consideration spatial location of cameras. The other issue of the error quantification is the lack of reliable ground truth. The Vicon systems return their results with 1/100 mm resolution, though it is known (see Section 2) that the actual accuracy of OMCs in real installations is lower. Yet still, the uncertainty of reference has to be much better than the tested system. According to the meteorology standards, reference uncertainty should be smaller between 4 and 10 times than the system under test [30]. It is difficult to obtain necessary physical template (like the T-frame) manufactured with precision and accuracy sufficient to reliably calibrate OMCs. For this reason, it is hardly feasible to evaluate the accuracy (bias) of the length estimation with mean values without sophisticated equipment. Fortunately, this aspect is of lesser concern as it describes the systematic error, which is easy to compensate. However, all above considerations would not be enough if the input location measurements are correlated. According to metrology guidelines [29] simple experimental mean or standard deviation are not adequate to describe the uncertainty in the system with correlated noises. In such a situation a dedicated tool, namely Allan variance, is recommended.

Allan Variance
Allan variance (AVAR) a two-sample variance and its square root -Allan deviation (ADEV) are statistical descriptors that were developed for the evaluation of the stability of the time and oscillation in clocks. A notable advantage of this approach that there is no need to provide reference value or ground truth.
Presently, the measure is effectively used for quantifying the noises in the measurement of other quantities [31,32], but it is particularly useful for evaluation of inertial motion capture sensors [33,34]. Allan variance [9] is defined as: where τ is the time intersample spacing, · denotes expected value.
The AVAR analysis consists of identifying the linear parts of certain slopes of the log-log plot of τ steps versus ADEV (square root of AVAR). It is demonstrated in the schematic ADEV plot in Figure 3. It is a highly beneficial advantage of the AVAR noise quantification over the power spectral density (PSD), which has the capability not to clutter different noise processes and to precisely discriminate several types at once. However, there are also disadvantages. AVAR is sensitive to the outliers and requires considering outlier cleaning to obtain reliable results. The second issue is a necessity to record quite a long sequence for the analysis of a longer term processes.
The conventional types of noise can be identified by their PSD distribution with the power law and respective ADEV slopes [35]. The 'color' is given as power relation with respect to frequency (S( f ) ∝ 1/ f α ). Therefore, overall noise characteristics, comprising different basic noise types are: It corresponds to: which for a conventional set of noises yields: Conventional (color) noise types are gathered in Table 1, Table 1. Power-law noise types and their Allan variance representation [35].

Noise Name
where f h is bandwidth limit for the measurement system. A..E respective scaling factors K α . Additionally, two complex distortions, exponentially correlated (Markovian) and sinusoidal, can be identified using Allan variance [36]. The Markovian noise is visible in the Allan deviation plot as a single 'bump' with slopes ± 1 2 . Periodic (sinusoidal) distortion is represented in respective plot as a decaying series of bumps with left-sided slope 1 and right side bump series with constant envelope of a slope −1; however, it is the only case that is more convenient to be observed and to analyze the distortion in the Fourier spectral domain.
Correlated noise PSD is given as: and corresponding Allan variance has a form: where: q c is the noise amplitude, T c is the correlation time.
Sinusoidal noise PSD has a form of two peaks, modeled with Dirac delta: and respective Allan variance form: where: A s is the amplitude, f 0 is the frequency, δ(·) is Dirac delta peak.

Environment
The experimental setup was employed in the Human Motion Laboratory (HML) at the Research and Development Centre of Polish-Japanese Academy of Information Technology in Bytom (http://bytom.pja.edu.pl/laboratorium/laboratorium-hml-analizy-ruchu-czlowieka/). The motion system used in this laboratory consists of a total of 30 Vicon Motion cameras of three different types: • 10 Vicon MX-T40, • 10 Vicon Bonita10, • 10 Vicon Vantage V5.
These cameras can record data independently or can be integrated into one larger system with capture volume 9 m × 5 m × 3 m. To minimize the impact of external interference like infrared interference from sunlight or vibrations, all windows are permanently darkened and cameras are mounted on scaffolding instead of tripods (as is shown in Figure 4) The basic information and main differences between used cameras are shown in Table 2.

Data Capture
For the noise analysis needs, a special, nine-hour recording of the two 14 mm markers from the calibration T-frame template (wand) was made. The duration was chosen to reveal the influence of longer term processes, such as a random walk, and to be on par with the length of a normal daily working hours. The wand (demonstrated in Figure 5) was placed in the center of motion capture volume. The three other markers were removed. Data was recorded simultaneously by all the 30 cameras at 120 Hz in standard Vicon software (Vicon Blade version 3.3.1). The XYZ coordinate system was by default oriented according to the T-frame as it is depicted in Figure 5. Camera calibration was made once with all thirty cameras according to the standard Vicon procedure. The reprojection error for this session, for all the cameras was less than 0.2 pix-mean error for Bonita −0.1946 pix; Vantage −0.1891 pix; T40 −0.1535 pix as reported by the software after the calibration procedure. Additionally, in order to minimize the environmental noise, laboratory technicians were not present in the room during this recording. After the system calibration, all the necessary operations and supervision (start and stop record, system status verification, etc.) were done remotely.

Data Processing
In the post-processing stage in Vicon Blade (Version 3.3.1) software, markers were reconstructed and labeled only, no other filtering or processing was used. This stage was done several times, separately for each camera configuration (including different numbers of cameras of each type). Reconstruction settings were set to the default, for each camera type except the initial set of 2 cameras of each type, where it was required to override the demand of marker visibility by three cameras at least. In this trial, the parameter 'Minimum Cameras to Start Trajectory' had to be set to 2. All those data were used to create the few datasets, containing several realizations of the same sequence: • Data set 1: based on all cameras • Data set 2: based on T40 cameras • Data set 3: based on Bonita cameras • Data set 4: based on Vantage cameras The data sets 2-4 consists of 7 trials, in which a different number of cameras used for 3D markers reconstruction ( Table 3). The location of each camera is shown in Figure 4. To characterize the noise in different camera type, in all datasets the x,y,z trajectories of both markers and their Euclidean (Equation (12)) distance were analyzed. Processing operations in Vicon Blade were limited to 3D reconstruction of marker trajectories and exporting the data to the .c3d file format. Further filtering, analysis, and processing of data were done with Matlab (Version R2016b).

Noise Parameters Estimation
For the computing of AVAR from the experimental data we used an implementation by Czerwinski [32]. It implements various AVAR versions, including overlapping estimator, that we chose to use as it is more stable and boundary error prone than conventional one.
The notable advantage of Allan deviation plots is their simple visual interpretation. Moreover, identification of complex-sinusoidal or correlated-distortions is possible just by visual inspection [33] for the presence of bumps in the plot. Another beneficial feature is the ability to estimate the parameters by simple line or poly-line matching to log-log plot [34]. However, straightforward distinguishing between blue and violet noises is not possible in such a case-to obtain these phase dependant noises it would be necessary to employ a much slower variant-modified AVAR estimation.
The method for the noise parameters readout from the ADEV curve, used in this research was proposed by Vernotte et al. in [37]. The model employs minimization of weighted least squares (WLS). As it was demonstrated in [31], such an LS model even allows to identify blue and violet noises which are represented jointly by τ −2 component.
The weighted LS is represented as following minimization problem that reduces the weighted error between measured AVAR valuesσ 2 y (τ) and a sum of estimated components σ 2 i (τ)-s : Obtaining reasonable results for such a complexand multideimensional non-linear model is a challenging issue. Therefore, we followed roughly a multi-start hybrid algorithm proposed in [38]-multi-start simulated annealing followed by local minimum search, where multiple starts prevent dependence on the initialization. It follows exploration-exploitation scheme, in the first stage simulated annealing (SA), known for avoiding of getting stuck in local minimum, finds solution close to global optimum, which is then refined by local pattern search-for the latter we propose to use ant colony optimization (ACO), specifically the ACO R variant for continuous domains [39]. Our additional modification is in the initialization stage of the ACO solver, which is at start populated with values jittered around the solution returned by the SA stage. Exemplary regression results are visually demonstrated in Figure 6, compared with ordinary (non-weighted) LS obtained with Levenberg-Marquardt algorithm.  As it was mentioned in Section 2.3 it is necessary to remove the outliers before the AVAR estimation. For that purpose, the Hampel filter [40] was employed. It checks the signal whether it is larger than the 3 sigma rule threshold computed robustly on the median absolute deviation (MAD) within a sliding window (in our case 1 second of the past and future values) and replaces outlying values with the local median.

Remarks on the Results
During the data analysis, two issues emerged, it is worth mentioning them in advance as they could contaminate the results or cause confusion during the interpretation. First, that while recording there occurred a slight seismic crump. The second issue was the failure of one of the cameras (IR LED emitter) during the recording.
The crump can be observed in the trajectories of the markers (Figure 7a) as a heavy outlier. It has a significant effect on the Allan variance results (see Figure 7b). It is worth mentioning that the occurrence of seismic crump was not detected immediately. After the recording, none of the cameras' sensors signaled a bump, which could indicate a system decalibration, as in the case of, for example, a direct impact on the scaffolding or a gunshot in the room, which we already tested in the lab. The crump itself was relatively small-similar disturbance would be probably caused by a person running next to markers. Moreover, the basic variance descriptors, which we checked for 5 min periods before and after crump do not indicate any change in the system performance. If the cameras were mounted on tripods, then such an event would probably have a much larger impact. However, that fact draws attention to the need for the careful screening of the measurements for the outliers and proper filtering if necessary-such as the aforementioned Hampel filter. The second issue was identified because of the noise levels for data set 4 (Vantage cameras). They were aberrant, for some camera combination the noise levels were increasing when taking more cameras into the reconstruction process. It appeared that one of the cameras was out of order and it would have broken soon after our recordings. Therefore, we excluded it from our analyses and we used up to nine cameras in the reconstruction for the Vantage data set. Figure 8 illustrates how such a failing device, increases the noise in the results for x dimension-the most contributing to the length (L). It is visible in the figure that we observe larger ADEV values for 10 cameras than for 9, another noteworthy fact is that markers positions are affected to different extents, and the distance is therefore affected to an intermediate extent. The reason for such an observation remains unclear as the internal details of triangulation in Vicon software are kind of a black box. We suspect the inner quality control procedure, that could, for example, select just some subset of cameras.
There were also indirect consequences of the failing camera. We could observe slight cross-talk of distortions to the other cameras mounted at similar heights (Vantage and T40), resulting in the presence of short term correlated noise (verified in Section 4.6). For better or worse, it resulted finally in much more complex ADEV curves we had to analyze, proving that the proposed method is capable of adapting to all the noise types known from the Allan variance literature. The exemplary ADEV curve, obtained for the data after replacing the failing camera is included in Appendix B for comparison. It demonstrates ADEV of the whole system in the facility free of the distortions caused the failing IR LEDs.
This unexpected failure in the system is a premise for the conclusion, that statistical analysis of noise in the system could be a useful diagnostic procedure for early failure diagnostics as the LED flickering was imperceptible to the human observer, but was observable in the data. However, a profound analysis of errors and their sources is probably feasible at the manufacturer's laboratory only. System vendor is probably the sole, who has knowledge of the appearing malfunctions, access to replaced devices such as ours, and who can induce misbehavior of the equipment on demand. Nevertheless, such a procedure, especially if coded into the system software, could become a part of system diagnostics and maintenance.

Overview
Using the single long sequence recorded with a regime as described in Sections 3.1-3.3, we have obtained a pool of sequences obtained with different camera sets. Initially (Section 4.2), we have analyzed the results using conventional Gaussian noise model as described in in Section 2.2. It appeared, as expected, to be insufficient to describe the noise present in the system. Therefore, the in-depth analysis was performed, and is described in successive steps.
The main analysis of results involved overlapping Allan variance estimator. First (Section 4.3) overall ADEV characteristics were obtained. These results are grouped by camera type and presented as families of Allan deviation plots in Figure 12 with a varying number of cameras used in the process.
Next, Allan variance noise component parameters were estimated with the procedure described in Section 3.4. The presence and number of correlated and periodic components were examined visually, the verification, whether they are not coming from the periodic distortion, was done by examination of the PSD estimator. The noise parameter estimation results are demonstrated in Figures 14 and 15, in logarithmic and linear scales to adapt to large range of values. Numerical values are attached in Appendix A in Tables A1-A3. Finally, the less conventional noises were considered in Section 4.6. These are multiple occurrences of correlated noise and periodic distortion.

Simple Gaussian Model
The brief results -markers location and their distance are gathered in Table 4. It contains estimated parameters for locations and lengths, we provide also theoretically calculated values for length. These are: locations mean values and their standard deviations, covariance, and correlation (ρ 12 ) as well, furthermore it contains mean value(µ L ) and standard deviation (σ L ) for length as it was reported by the Vicon software. Calculated statistical descriptors are the length (µ e ) and standard deviation in four variants-σ A..C as listed in Section 2.2, with two A variants assuming either markers as a potential source of variance value. Figure 9 demonstrates exemplary kernel estimates of location PDFs for one of the camera sets.

Measured [mm]
Theoretic 166 Generally, the measurement results conform to the theoretic considerations for correlated random variables -obviously the length measurement results confirmed (not included in the paper) in Chi-squared statistical tests their origin in Gaussian distribution with σ C . In the considered measurements it is visible in the dispersion of measurements, which is considered to be noise. For the low-cost Bonita cameras, the location variance is relatively large, moreover, it is non-correlated with each other (low correlation and covariance), therefore it can be considered to be noise. On the other hand, overall dispersion for the high-end T40 cameras is small but highly correlated. The high correlation coefficients mean that the measurement of precision with simple statistical descriptors needs to be extended in a more sophisticated way.
Concerning the precision, each of the camera sets reports slightly different mean value (see Figure 10), though the discrepancies between the camera sets are on the level rather satisfactory for the most applications: tenth part of a millimeter. Regarding the number of cameras in the generic noise mode, the results are demonstrated in Figure 11, which demonstrates how the variances and covariances scales with increasing number of cameras used in the measurement. One can denote that the variance results adhere (with minor fluctuations) to the theoretical relationship given with Equation (3), it is clearly visible similarity between the theoretic characteristics in Figure 2 and the real observed decrease in variations for increasing number of cameras shown in Figure 11a.
Covariance, on the other hand, is rather constant regardless of the number of cameras (except very low numbers of cameras), as it is depicted in Figure 11b. It suggests the presence of a process of unknown origin that is rather common to the markers and affects their registration rather than physical devices, e.g., it could be either signal processing or a common mechanic micro trembling of cameras.

Overall ADEV Characteristics
Overall ADEV characteristics for all camera types-T40, Bonita and Vantage-are shown in Figure 12. They reveal that the σ y values gradually decrease with the increasing number of cameras; however, there are some 'drops' along the camera number axis. That suggests that certain camera setup is notably better suited for the recorded object. These setups are: 3 cameras for T40, 5 cameras for Bonita, and 4 cameras for Vantage.
Another interesting observation in the overall plots can be denoted when analyzing ADEV plots for measured values along the temporal axis. The σ y values for the distance L are larger than for the contributing locations x 1 and x 2 within the range of short-time noises τ ≈ 10 −2 . . . 10 1 , although for the longer time ranges these values are on par or even ADEV values are smaller for the distance than for contributing locations. Apparently, flicker and random walk noises do not add in the system. We can also observe all the slopes (−1, −1/2, 0, 1/2) from the Table 1 in the characteristics plots. The presence of random walk could be a bit confusing at first, if we omit the results of Jensenius [13], one could expect such distortion not to appear. The triangulation process is done frame-by-frame in the system, therefore the position of a marker should be steady, with flicker, white and higher frequency color noises present. However, there might be implicit denoising present in the system done by low pass filtering, such filters act as integrators, therefore they could introduce some low-frequency noises from the higher noise components.

Comparing the Camera Types
The logical corollary of results demonstrated in the previous paragraph is a direct comparison of camera types. In Figure 13 we demonstrate the ADEV plots for each of the camera types at its maximum performance configuration (all cameras). It shows length (L) and marker positions in x as the most important dimension. Comparing the camera types by visual plot inspection confirms the expected outcome, low-cost Bonita cameras have much higher ADEV values (are more noise affected) than two other camera types. However, ADEV values of up-to-date Vantage cameras indicate that they are more noise affected than relatively old T40s.

Estimation of Basic Noise Coefficients
Several remarks on the noise colors present in the system that are based on the noise estimated coefficients (Figures 14 and 15): • Random walk and white noise diminishing with the increasing number of cameras, roughly follow quasi-hyperbolic characteristics described in Equation (3) We could also observe intense peak fluctuations in plots of h coefficient characteristics. They could originate from two potential sources-numerical errors in the optimization process, and/or very specific camera geometric configuration-that could improve or degrade the results. However, the general rule (Equation (3)) of decreasing uncertainty with an increasing number of measurements could be observed to some extent for all the noise coefficients (h), but violet one, where it rather seems to be random fluctuations.

Unconventional Distortions
Regarding the less conventional, correlated noises, a series of interesting observations relates to their presence. In Figure 16a one can observe that they are mostly independent of the number of cameras; T c and q c parameters remain relatively constant. Moreover, one could note from Figure 16b that for correlated noises in the system the longer the time constant the lower amplitude. We could also identify three correlated noise 'classes' of a different correlation time constants. Their sources remain unknown; however, at least we could speculate about their origins based on T c . Removing the failed camera made the first two of them (see Appendix B) disappear. Hence our guesses are: • 10 −2 -10 0 s-due to failed camera, though we considered signal processing-based first, • 10 0 -10 1 s-due to failed camera, at first we suspected mechanical-based microtrembling of camera support, • 10 2 -10 3 s-environmental-based such as changes in room temperature.  Finally, the occurrences of periodic noise of f 0 ≈ 15 Hz frequency had to be checked as it was an unexpected outcome. The verification was done using Welch estimator of PSD (see Figure 17), and it is present in each of the reconstructions using T40 and Vantage cameras. However, it is usually negligibly small phenomenon, barely observable in most of ADEV plots. Its origin cannot be connected with a recording of any specific camera, but since all the cameras were recording concurrently, it is probably due to some environmental source. Cameras appeared sensitive to a different extent, surprisingly vertical dimensions obtained from the T40 cameras were the most sensitive to this distortion. In case of the Vantage or Bonita cameras it is negligibly small. It requires very large zoom to observe appearance of respective bumps in either PSD or ADEV plots.

Summary
In the article, we demonstrated how to evaluate with Allan variance a compound structure of noise present in the optical Mocap system. The proposed tool was invented for situations when the reliable ground truth is inaccessible. We demonstrated that it provides outcomes convenient for visual inspection to identify qualitatively noise types actually present in the system or to compare two systems. We have also demonstrated how to employ sophisticated solvers to read the noise parameters from the characteristic ADEV curve, even when it gets quite complicated form.
For our facility, we proved that the main contribution to the imprecision comes from the random walk and white noise with coefficients being h ≈ 10 −5 , whereas flicker noise and blue noise contributions are several orders of magnitude smaller of a h ≈ 10 −10 . The influence of violet noise is negligibly small h ≈ 10 −300 ...10 −20 . We have identified also the presence of quite a long term (tens of minutes) correlated noise, probably due to environmental influence and periodic distortion with time constant of a minute order(T c ≈ 10 2 ...10 3 ) and notable amplitude amplitude (q c = 10 −3 ...10 −13 ). Additionally, we identified a periodic distortion of 15 Hz frequency, which is visible to the cameras reversely to their overall quality, with an amplitude of A s ≈ 10 −3 .
The registration of the noise connected with camera failure was an additional and unforeseen outcome; however this might be seminal for establishing statistical based diagnostics procedure for the motion capture laboratories.
It should be noted that the conditions during the experiment significantly differed from the conditions during the traditional MoCap session. The noise resulting from the behavior of actors (running, jumping, screaming, etc.), as well as people responsible for the session (technical staff, art director, etc.), are much greater than those recorded in ideal conditions. The data during the post-processing stage are also filtered, using Butterworth or Woltring filter. Of course, some attempts can be made to reduce the impact of certain types of noses.
Some recommendations can be made for noise suppression in the OMC system. Part of the observed distortion is very simple to suppress with low pass filtering: white noise and above noises are very convenient cases. Also, periodic noise can be easily addressed with band-stop filters, at least as long we are aware of its presence and its frequency is not too low. However, longer-term distortions, such as flicker or random walk are quite inconvenient to remove with simple filters, but the multiresolution approach seems to be promising to identify and suppress noises of longer duration than single samples. Finally, regarding the environmental-based distortion, modifying overall lab conditions could help to reduce such noises, such modifications as employing continuous air-conditioning, low heat light sources, door insulation, and the like are worth further testing.
Future works might include extending the basic approach by analysis of AVAR for bone orientation, though it would require preparing quaternionic AVAR. Other interesting aspects are the variability of AVAR depending on the location in the scene, different marker sizes, or how much the results are affected by the presence and activities of staff in the lab. Prospective dynamic tests are possible to some extent only, since AVAR is based on assumption of static signal, because computing of AVAR is not possible for moving markers. However, if we assume T-frame or any other template is stiff enough to be considered to be a rigid body, then we can consider computing AVAR for the inter-marker distance regardless it is steady or in motion. The experiments could be also repeated in different Mocap facilities or laboratories of a different kind, but including Mocap as a feature such as interactive rehabilitation platform Motek CAREN (The Computer Assisted Rehabilitation ENvironment)-https://www.motekmedical.com/product/caren/.
Author Contributions: P.S. conceived the idea, provided the theory, and prepared the analysis tools; M.P. conducted the experiments and performed the computations; both authors analyzed and discussed the concepts and results; both authors wrote the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix B. Supplementary Measurement Allan Deviation
For the verification, because of the camera failure, we repeated the testing steady sequence capture after replacing the failing camera. In Figure A1 we demonstrate 'z' dimension of M1 marker for maximal numbers (10) of each camera type and all the cameras together. We could observe that short term correlated noises disappeared, whereas the bumps representing periodic and long-term correlated noises remain.