How to Assess Repeatability and Reproducibility of a Mechanical Test? An Example for Sports Engineers

Abstract: Several sources of variation can affect the performance of a mechanical test. Hence, the measurement system performance should be assessed. The gage repeatability and reproducibility study is a method used to assess and quantify the variation of a mechanical test. Since it seems that this method has not yet found its way into the field of sports engineering, this paper promotes its application by demonstrating a practical example based on a current problem in sports shoe development. In detail, a novel mechanical simulation to determine the forefoot bending stiffness of athletic footwear during plantar flexion movement was developed and its quality assessed. The ANOVA Gage R&R study was performed based on 64 randomized trials of eight footwear samples assessed by two appraisers. The mechanical test was evaluated as acceptable for the desired application and the resolution was quantified to be 0.04 Nm/°.


Introduction
Sports equipment development is important for enhancing athletic performance [1]. Common tools for sports engineers in the development and validation of sports equipment include commercial and/or customized mechanical tests. Results from mechanical tests are regarded as superior to subjective and biomechanical test results because they show less variation [2]. Nevertheless, mechanical tests are also subject to several sources of variation (e.g., calibration) that can affect the performance of a measurement system [3]. Hence, there is a need to assess and quantify the variation of a mechanical test, especially of novel customized tests. One way to address this task is the gage repeatability and reproducibility (GRR) study [3]. The GRR study is a method that provides a statistical approximation of the variation and percentage of process variation for a test measurement system. Although this method is commonly used in the automotive industry, it seems that it has not yet found its way into the field of sports engineering. In an effort to promote and demonstrate the application of the GRR study by sports engineers, we elaborated and conducted a practical example based on a current problem in sports shoe development: the lack of a standardized methodology for measuring forefoot bending stiffness, especially during plantar flexion. Plantar flexion of the foot is considered a crucial movement for several sports [4]. However, even though it is known that individual athletes have differing needs in terms of forefoot bending stiffness [4], information on the forefoot bending stiffness of many existing shoe models is currently not available [5]. Hence, a novel mechanical simulation to determine the forefoot bending stiffness of athletic footwear during plantar flexion was developed and its quality assessed by means of a GRR study.

Mechanical Simulation
A mechanical simulation is a mechanical or technical test device which captures the reality of human-technology interactions and translates it into a mechanical model [6]. The mechanical simulation used to determine the forefoot bending stiffness of athletic footwear during plantar flexion movement was composed of a hydraulic testing machine (HC10, ZwickRoell GmbH and Co. KG, Ulm, Germany), a bending unit, a two-part fixing unit (Figure 1) and a machine control that implements a biomechanically evaluated deflection-time spectrum. The actuator of the testing machine was connected to the bending unit to enable plantar flexion of the footwear samples. The bending unit consisted of an aluminum frame, two linear guides and a sliding unit with a rotating shaft with one degree of freedom. The lower part of the fixing unit consisted of two parts. The first part was composed of several stacked wooden boards that provided a height-adjustable support for the footwear samples. The second part was a small shaft fixed to a revolute joint to enable the measurement of the plantar flexion angles of the bent footwear samples. The upper part of the fixing unit consisted of an aluminum frame, a spindle and a last of size UK 8, which was trimmed at the metatarsal line while keeping the proximal part [7]. A load cell with a relative measurement uncertainty of ±0.5%, a linear variable differential transformer (LVDT) with a linearity of ±0.25% and a conductive plastic potentiometer (MP10, MEGATRON Elektronik GmbH and Co. KG, Putzbrunn, Germany) with a linearity of ±2% were used to measure the variables needed to determine the forefoot bending stiffness. Forefoot bending stiffness was defined as the ratio between applied torque and bending angle [7].

Testing Procedure
Eight footwear samples were each tested twice, in randomized order, by two testers. In the field of measurement systems analysis, the testers are called appraisers, and this term is used from here on. The samples were assumed to represent a broad range from low to high bending stiffness. The selection comprised one trail running shoe, two conventional running shoes, two minimalistic shoes, two soccer shoes and one bike and hike shoe (Figure 2). Biomechanical gait patterns of walking at a speed of 1.8 m/s were simulated by bending (plantar flexion) the footwear samples from 0° to 9.7°. After reaching the maximum bending angle, the load was released and the sample bent back into the rest position, where it remained for a predefined time to complete the test cycle. In total, 21 cycles were performed per test. Raw data of force, time and stroke were collected for each sample at a sampling frequency of 1 kHz. Raw data of the 21st cycle were processed using MATLAB (R2018a, The MathWorks, Inc., Natick, MA, USA). In detail, strokes were converted into bending angles using trigonometric functions and forces were converted into torques to obtain torque-angle profiles. Finally, forefoot bending stiffness was calculated as the slope of the torque-angle profile [7].
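The processing chain described above (stroke to angle, force to torque, slope of the torque-angle profile) can be illustrated with a minimal Python sketch. It is not the MATLAB routine used in the study. The lever-arm geometry, the constant `LEVER_ARM_MM`, the helper names and the sample values are all assumptions made for illustration, and the slope is taken here as an ordinary least-squares fit, which the text does not specify.

```python
import math

LEVER_ARM_MM = 70.0  # hypothetical lever arm; the real rig geometry is not given


def bending_angle_deg(stroke_mm, lever_arm_mm):
    """Stroke to bending angle, assuming the stroke acts perpendicular
    to a lever of fixed length (assumed geometry)."""
    return math.degrees(math.atan2(stroke_mm, lever_arm_mm))


def torque_nm(force_n, lever_arm_mm):
    """Force to torque about the bending axis, assuming a constant lever arm."""
    return force_n * lever_arm_mm / 1000.0


def slope(x, y):
    """Ordinary least-squares slope of y over x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up loading-phase samples of one cycle (not measured data)
strokes_mm = [0.0, 3.0, 6.0, 9.0, 12.0]
forces_n = [0.0, 7.0, 14.5, 21.0, 28.5]

angles_deg = [bending_angle_deg(s, LEVER_ARM_MM) for s in strokes_mm]
torques = [torque_nm(f, LEVER_ARM_MM) for f in forces_n]
stiffness = slope(angles_deg, torques)  # Nm per degree
print(f"forefoot bending stiffness: {stiffness:.3f} Nm/deg")
```

With a 12 mm stroke and the assumed 70 mm lever arm, the maximum angle in this example comes out close to the 9.7° used in the test, but that agreement is by construction of the made-up values.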

ANOVA Gage R&R
To assess the quality of the recently developed mechanical simulation, an ANOVA Gage R&R study [3] was performed. This included the data collection as well as the graphical and numerical evaluation of 64 randomized tests performed on eight samples/parts (n) by two appraisers (k) in two repetitions/trials (r). The gage repeatability and reproducibility (GRR) was determined using the analysis of variance (ANOVA) method. The GRR is an estimate of the combined variation of repeatability (i.e., within-system variation) and reproducibility (i.e., between-system variation) [3]. By means of MATLAB, the forefoot bending stiffness values were converted into an ANOVA table. The ANOVA table is composed of five columns representing the source (i.e., cause of variation), the degrees of freedom (DF) associated with the source, the sum of squares (SS) (i.e., the deviation around the mean of the source), the mean square (MS) (i.e., the SS divided by DF) and the F-ratio (F) (i.e., the test statistic for the significance of the source). This information was then used to calculate the measurement system characteristics, namely the repeatability/equipment variation (EV), the reproducibility/appraiser variation (AV), the interaction between parts and appraisers (INT), the GRR, the part variation (PV), the total variation (TV) and the number of distinct data categories (ndc). The equations are as follows:
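In their commonly used form for a crossed parts-by-appraisers design with interaction, these characteristics can be sketched as below. The mean-square symbols MS_P (parts), MS_A (appraisers), MS_AP (interaction) and MS_E (error) are notation assumed here. All quantities are written as standard deviations, since a constant spread multiplier, if one is applied, cancels in the percentage figures. Setting negative variance estimates to zero is likewise an assumption of this sketch.

```latex
\begin{align}
EV  &= \sqrt{MS_{E}} \\
AV  &= \sqrt{\max\!\left(0,\ \frac{MS_{A} - MS_{AP}}{n\,r}\right)} \\
INT &= \sqrt{\max\!\left(0,\ \frac{MS_{AP} - MS_{E}}{r}\right)} \\
GRR &= \sqrt{EV^{2} + AV^{2} + INT^{2}} \\
PV  &= \sqrt{\max\!\left(0,\ \frac{MS_{P} - MS_{AP}}{k\,r}\right)} \\
TV  &= \sqrt{GRR^{2} + PV^{2}} \\
ndc &= 1.41\,\frac{PV}{GRR}
\end{align}
```

The ndc value is usually truncated to an integer before it is reported.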

Results
The ANOVA Gage R&R analysis of the mechanical simulation resulted in a mean of the averages ($\bar{\bar{X}}$) of 0.187 Nm/° for the bending stiffness (Table 1). The mean of the average ranges ($\bar{R}$) was 0.018 Nm/°. Appraiser B showed less variation between the two trials than appraiser A (Figure 3). The ANOVA table showed only one significant source of variation (Table 2), namely the parts (p < 0.001). The GRR was 13.9% (Table 3). Based on the GRR criteria [3], the mechanical simulation could be classified as acceptable for the application.

Table 3. Gage repeatability and reproducibility (GRR) ANOVA method report. The % total variation was calculated as the ratio between the standard deviation (σ) of the source (e.g., EV) and the standard deviation of total variation (TV) multiplied by 100. The % contribution to total variance was calculated as the ratio between the variance (σ²) of the source and the variance of TV multiplied by 100.
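The two percentage measures in the Table 3 caption, together with the variance components behind them, can be sketched in Python as follows. This is an illustration of the general crossed ANOVA Gage R&R calculation, not the authors' MATLAB code. The function name `gage_rr_anova`, the nested-list layout `data[part][appraiser][trial]` and the numbers in `data` are assumptions. The sketch always keeps the interaction term rather than pooling it with the error when it is not significant.

```python
import math


def gage_rr_anova(data):
    """Crossed ANOVA Gage R&R with interaction.

    data[i][j][t] is the reading for part i, appraiser j, trial t.
    Assumes a balanced design and non-zero total variation.
    """
    n, k, r = len(data), len(data[0]), len(data[0][0])
    readings = [x for part in data for cell in part for x in cell]
    grand = sum(readings) / (n * k * r)
    part_means = [sum(sum(cell) for cell in part) / (k * r) for part in data]
    appr_means = [sum(sum(data[i][j]) for i in range(n)) / (n * r)
                  for j in range(k)]
    cell_means = [[sum(cell) / r for cell in part] for part in data]

    # Sums of squares for parts, appraisers, error, and interaction
    ss_p = k * r * sum((m - grand) ** 2 for m in part_means)
    ss_a = n * r * sum((m - grand) ** 2 for m in appr_means)
    ss_e = sum((x - cell_means[i][j]) ** 2
               for i in range(n) for j in range(k) for x in data[i][j])
    ss_t = sum((x - grand) ** 2 for x in readings)
    ss_ap = ss_t - ss_p - ss_a - ss_e

    # Mean squares = sum of squares / degrees of freedom
    ms_p = ss_p / (n - 1)
    ms_a = ss_a / (k - 1)
    ms_ap = ss_ap / ((n - 1) * (k - 1))
    ms_e = ss_e / (n * k * (r - 1))

    # Variance components; negative estimates are set to zero
    var = {
        "EV": ms_e,
        "AV": max(0.0, (ms_a - ms_ap) / (n * r)),
        "INT": max(0.0, (ms_ap - ms_e) / r),
        "PV": max(0.0, (ms_p - ms_ap) / (k * r)),
    }
    var["GRR"] = var["EV"] + var["AV"] + var["INT"]
    var["TV"] = var["GRR"] + var["PV"]

    sd = {name: math.sqrt(v) for name, v in var.items()}
    pct_tv = {name: 100.0 * sd[name] / sd["TV"] for name in sd}
    pct_contrib = {name: 100.0 * var[name] / var["TV"] for name in var}
    ndc = 1.41 * sd["PV"] / sd["GRR"] if sd["GRR"] > 0 else math.inf
    return {"sd": sd, "pct_total_variation": pct_tv,
            "pct_contribution": pct_contrib, "ndc": ndc}


# Made-up readings in Nm/deg: 3 parts x 2 appraisers x 2 trials
data = [
    [[0.10, 0.11], [0.11, 0.10]],
    [[0.20, 0.21], [0.19, 0.21]],
    [[0.30, 0.29], [0.31, 0.30]],
]
result = gage_rr_anova(data)
print(result["pct_total_variation"]["GRR"], result["ndc"])
```

Because the variances add, the % contribution values of GRR and PV sum to 100, whereas the % total variation values do not.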

Discussion
A novel mechanical simulation to determine the forefoot bending stiffness of athletic footwear during plantar flexion was introduced. The ANOVA Gage R&R study, which satisfied the recommended minimum of n·k·r ≥ 30, resulted in a GRR of 13.9%. This value falls into the second category (i.e., 10% to 30%) of the GRR criteria [3]. According to that category, the mechanical simulation can be considered acceptable for the desired application. Given a range of part averages (Rp) of 0.395 Nm/° and ten distinct categories, the minimal discernible difference between two samples, based on non-overlapping 97% confidence intervals, is 0.04 Nm/°. Differences in stiffness between footwear samples below that level cannot be resolved by this mechanical simulation. If a higher resolution of the mechanical test were required, the whole process would need to be revised. To that end, a closer look at the percentage contributions of appraisers (AV) and equipment (EV) to the total variability is recommended.
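The resolution figure can be retraced with a few lines of Python. The inputs 13.9% and 0.395 Nm/° are the values reported in the text. Dividing Rp by the number of distinct categories is an assumption about how the 0.04 Nm/° was obtained, although it appears consistent with the reported numbers. The variable names are illustrative.

```python
import math

grr_pct = 13.9   # GRR as % of total variation, as reported
rp = 0.395       # range of part averages in Nm/deg, as reported

# TV^2 = GRR^2 + PV^2, so PV as % of total variation follows from GRR
pv_pct = math.sqrt(100.0 ** 2 - grr_pct ** 2)

# Number of distinct categories, truncated to an integer
ndc = math.floor(1.41 * pv_pct / grr_pct)

# Assumed: the part range is split into ndc distinguishable bands
resolution = rp / ndc

print(ndc, round(resolution, 2))  # prints: 10 0.04
```

Since the input is the rounded 13.9%, the untruncated ndc of about 10.05 sits close to the threshold, so the unrounded study data would be needed to confirm the category count independently.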
The percentage contribution of repeatability (EV; 1.6%) to the total variation was four times that of reproducibility (AV; 0.4%). To reduce equipment variation in mechanical test results, the measurement system analysis (MSA) guidelines [3] propose a number of action items that need further consideration. By means of a self-assessment, we identified clamping and wear as two causes whose improvement might result in a better GRR (Table 4). We also identified the resolution of the load cell as a possible cause, since the 1 N increments of the 10 kN load cell might be too coarse relative to the observed peak forces (29 N to 294 N).

Table 4. Proposed action items of the measurement system analysis (MSA) guidelines to reduce equipment variation in mechanical tests and corresponding self-assessment.

MSA: If Repeatability is Large Compared to Reproducibility, the Reasons May Be: | Self-Assessment
The instrument needs maintenance. | Not applicable. Bending unit was maintained beforehand. Servo-hydraulic machine is calibrated. Stroke-time curves of all trials were identical.
The gage may need to be redesigned to be more rigid. | Not applicable. Rigidity of the test rig (20 kN machine frame) including load cell (10 kN) was sufficient compared to peak loads.
The clamping or location for gaging needs to be improved. | Applicable. Clamping of samples needs to be revised.
There is excessive within-part variation. | Applicable. Possibly wear of the samples in repeated trials.

Conclusions
In this paper, a well-established Six Sigma quality management method has been transferred to the field of sports engineering. Even though mechanical test results are regarded as superior to subjective or biomechanical test results because they show less variation [2], an MSA should be performed before any research data obtained by a mechanical simulation are published.
An important part of an MSA is the estimation of the reproducibility and repeatability of the measurement process. However, an ANOVA Gage R&R addresses solely the precision of the testing procedure. It does not reveal information on the accuracy.

Conflicts of Interest:
The authors declare no conflict of interest.