Table 5 shows the anthropometric measurements obtained with the different methods. The values are the mean and standard deviation (SD) over a group of twelve participants. The values obtained by the homogeneous and non-homogeneous MVS methods are very similar to each other and to those obtained by the MAM method. In contrast, the results of the Kinovea method exhibit larger deviations than those of the homogeneous and non-homogeneous methods. It is important to mention that the RFT, RFLE, and RFAX landmarks were difficult to identify in two subjects with a body mass index (BMI) greater than 25 kg/m². This difficulty is consistent with [25,26], where it is reported that anatomical landmarks may be harder to identify in people with high body fat.
To estimate the performance of each MVS with respect to the MAM method, a deviation value was calculated according to Equation (10), where Mean_MVS is the length value obtained using one of the MVS approaches and Mean_MAM is the corresponding length value obtained using the conventional MAM approach. The resulting deviation values are shown in Table 6, where it can be observed that the non-homogeneous and homogeneous MVS methods have smaller deviations than the Kinovea method. The longest anthropometric lengths, BS1 and BS2, calculated by the linear vision methods, had the smallest mean deviation. The largest mean deviation for all the MVS methods corresponds to the shortest anthropometric length, BS3. This large deviation may be because BS3 requires the RLCA landmark, which was one of the most difficult to locate and may therefore have shifted during walking. In addition, the small value of BS3 in the denominator of Equation (10) amplifies the relative deviation.
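As a rough illustration of the deviation described above, the following Python sketch computes a relative deviation between an MVS length and the corresponding MAM length. The exact form of Equation (10) is an assumption here (absolute difference divided by the MAM value, expressed as a percentage), and the function name and sample lengths are hypothetical, not values from Table 5 or Table 6.

```python
def percent_deviation(mean_mvs: float, mean_mam: float) -> float:
    """Relative deviation of an MVS length from the MAM reference, in percent.

    Assumed form (not confirmed by the source):
    |mean_mvs - mean_mam| / mean_mam * 100
    """
    if mean_mam <= 0:
        raise ValueError("reference length must be positive")
    return abs(mean_mvs - mean_mam) / mean_mam * 100.0


# Hypothetical lengths in millimetres (not data from the study).
long_segment = percent_deviation(mean_mvs=402.0, mean_mam=400.0)
short_segment = percent_deviation(mean_mvs=42.0, mean_mam=40.0)

# The same 2 mm absolute difference yields a larger relative deviation
# for the short segment because its denominator is smaller.
print(f"long: {long_segment:.2f} %, short: {short_segment:.2f} %")
```

Under this assumed form, a fixed absolute error weighs ten times more on a segment that is ten times shorter, which matches the explanation given for BS3.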
3.1. Results of the 3D MVS Methods
Figure 6 illustrates the 3D reconstruction process using the homogeneous MVS. First, the volunteers were placed in the test area (Figure 6a). Next, an RGB image was taken under black light (Figure 6b) and converted to grayscale (Figure 6c). The histogram of the image was then obtained as part of the image binarization process (Figure 6d). Next, the pixel coordinates of the markers’ centroids were calculated (Figure 6e). Finally, the 3D coordinates of the markers were computed, as shown in Figure 6f.
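The 2D part of this pipeline (grayscale conversion, binarization, and centroid extraction) can be sketched in a few lines of Python. This is only an illustration and not the authors' MATLAB implementation: the fixed threshold, the 4-connected labeling, the BT.601 grayscale weights, and all function names are assumptions.

```python
from collections import deque


def to_gray(rgb):
    """Convert rows of (R, G, B) pixels to grayscale using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def binarize(gray, threshold):
    """Return a mask that is True where a pixel is brighter than the threshold."""
    return [[value > threshold for value in row] for row in gray]


def marker_centroids(mask):
    """Label 4-connected bright blobs and return their centroids as (x, y) pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected blob.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            sum_x = sum_y = count = 0
            while queue:
                x, y = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            centroids.append((sum_x / count, sum_y / count))
    return centroids


# Tiny synthetic image with two bright "markers" on a dark background.
DARK, BRIGHT = (10, 10, 10), (250, 250, 250)
image = [[DARK] * 6 for _ in range(5)]
for x, y in [(1, 1), (2, 1), (1, 2), (2, 2)]:  # 2 x 2 marker
    image[y][x] = BRIGHT
for x, y in [(4, 3), (5, 3)]:  # 2 x 1 marker
    image[y][x] = BRIGHT

centroids = marker_centroids(binarize(to_gray(image), threshold=128))
print(centroids)
```

In practice the threshold would be chosen from the image histogram rather than fixed by hand, and each centroid would then be passed, together with the matching centroid from the second camera, to the 3D reconstruction step.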
The precision of the camera calibration process was fundamental to achieving good accuracy with the homogeneous and non-homogeneous MVS. A good calibration pattern, in terms of perpendicularity and correspondence among its planes, and a correct setting of the optical characteristics of the cameras were equally important. Although mathematically only 6 and 5.5 2D-3D correspondences on the calibration pattern are required to calculate the calibration matrix P, the results revealed that using nine or more 2D-3D correspondences led to smaller errors in the 3D reconstruction. Therefore, either method can be used interchangeably as long as at least nine 2D-3D correspondences on the calibration pattern are used. It is important to mention that the calibration process needs to be conducted only once for all the participants. Once the calibration is done, many participants can be measured, and the process can be automated.
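The minimum number of correspondences quoted above can be motivated by a standard counting argument. It is sketched here under the assumptions that P is the usual 3 × 4 projection matrix and that each 2D-3D correspondence contributes two independent linear equations; the symbol n is introduced only for this illustration.

```latex
\begin{aligned}
&\text{Unknowns in } P \in \mathbb{R}^{3\times 4}: && 12 \text{ entries, defined up to scale} \;\Rightarrow\; 11 \text{ degrees of freedom},\\
&\text{Equations from } n \text{ correspondences}: && 2n,\\
&\text{Solvability}: && 2n \ge 11 \;\Rightarrow\; n \ge 5.5, \quad \text{i.e. } n = 6 \text{ whole points}.
\end{aligned}
```

With n ≥ 9 the system has at least 18 equations for 11 unknowns, so it is overdetermined and would typically be solved in a least-squares sense, which may help explain the smaller reconstruction errors observed with nine or more correspondences.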
In addition, different image sizes were evaluated during the camera calibration process. For this purpose, three test points on the calibration pattern were reconstructed in 3D using different image sizes. The 3D reconstruction error was determined from the 3D coordinates of each test point on the calibration pattern, the corresponding 3D coordinates estimated by the MVS, and the number of test points, N. Since three test points (the farthest in each plane) were used for every trial, the resulting error corresponds to the average error. It should be noted that, for comparative purposes, this absolute error in millimeters is normalized with respect to the distance from one of the test points to the origin of the calibration pattern. The results are shown in Figure 7, where it can be observed that the image size of 2816 × 2112 pixels led to the best precision; however, processing images of this size has a high computational cost. Therefore, an image size of 1280 × 1024 pixels, which ranks second in terms of precision, was used.
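A minimal sketch of such an error measure is given below. It assumes that the error is the mean Euclidean distance between the known and estimated 3D points, divided by the distance from one test point to the pattern origin; the exact expression used in the study may differ, and the function names and coordinates are hypothetical.

```python
import math


def mean_reconstruction_error(true_points, estimated_points):
    """Average Euclidean distance (in the input units) over N test points."""
    if len(true_points) != len(estimated_points) or not true_points:
        raise ValueError("need two non-empty point lists of equal length")
    distances = [math.dist(p, q) for p, q in zip(true_points, estimated_points)]
    return sum(distances) / len(distances)


def normalized_error(true_points, estimated_points, reference_point):
    """Mean error divided by the distance from a reference point to the origin."""
    reference_distance = math.hypot(*reference_point)
    return mean_reconstruction_error(true_points, estimated_points) / reference_distance


# Hypothetical test points in millimetres, one on each plane of the pattern.
true_pts = [(300.0, 0.0, 400.0), (0.0, 300.0, 400.0), (300.0, 400.0, 0.0)]
est_pts = [(303.0, 0.0, 404.0), (0.0, 300.0, 402.0), (300.0, 400.0, 0.0)]

abs_error_mm = mean_reconstruction_error(true_pts, est_pts)
rel_error = normalized_error(true_pts, est_pts, reference_point=true_pts[0])
print(f"mean error: {abs_error_mm:.3f} mm, normalized: {rel_error:.5f}")
```

Normalizing by a fixed reference distance makes the errors obtained with different image sizes directly comparable as dimensionless ratios.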
Regarding camera selection, the results also revealed that it is important to choose cameras with the least possible distortion at the periphery of the image, to reduce errors in the 3D reconstruction at the extremes of the field of view. In addition, some sports cameras with wide-angle lenses should be avoided, since their distortion is considerable.
3.3. Performance Comparison
To assess and compare the performance of each MVS, a performance index was proposed based on the Analytic Hierarchy Process (AHP) [27] and Quality Function Deployment (QFD) [27,28] tools. AHP is a structured technique for making complex decisions, which helps to find the solution that best suits the needs and understanding of the problem, whereas QFD is a method for determining which features should be included when designing a product or service. Both tools provide an objective view of what users are looking for in a product and the requirements it should meet, as well as a prioritization of the most important features to consider. To determine the most important criteria and their corresponding weightings, the AHP and QFD tools were applied to a group of 30 potential customers of the proposed MVS approach for measuring BSs. Based on the results, the most relevant criteria were selected, assigning a value of 1 to the most important criterion. The resulting criteria were: (1) precision; (2) speed of the MVS; and (3) equipment requirements, with weights w_1 = 1.00, w_2 = 0.78, and w_3 = 0.72, respectively. The proposed performance index, p_index, is defined from these weights and three criteria: c_precision, the precision criterion related to the deviation between the MVS results and the MAM results; c_time, the criterion related to the total time required by each MVS to obtain the anthropometric measurements of the twelve participants; and c_equipment, which corresponds to the equipment requirements of each method. The c_precision value was obtained by normalizing the mean deviation values shown in Table 6, giving a value of 1 to the method that exhibited the best precision. Table 7 presents the resulting values of c_precision for each MVS.
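The sketch below shows one way such an index could be computed. Two points are assumptions rather than statements from the source: the index is taken to be a plain weighted sum of the three criteria, and a "smaller is better" quantity is normalized as best value divided by own value so that the best method scores 1. The weights are those reported above; the method names and all other numbers are hypothetical.

```python
# Weights reported in the text for precision, time and equipment.
WEIGHTS = {"precision": 1.00, "time": 0.78, "equipment": 0.72}


def normalize_smaller_is_better(values):
    """Scale raw values so the smallest scores 1 and larger ones score less."""
    best = min(values.values())
    return {name: best / value for name, value in values.items()}


def performance_index(criteria, weights=WEIGHTS):
    """Assumed form: weighted sum of the normalized criteria."""
    return sum(weights[key] * criteria[key] for key in weights)


# Hypothetical mean deviations in percent (not the values of Table 6).
mean_deviation = {"method_a": 2.0, "method_b": 2.5, "method_c": 8.0}
c_precision = normalize_smaller_is_better(mean_deviation)

# Hypothetical time and equipment criteria for method_a.
index_a = performance_index(
    {"precision": c_precision["method_a"], "time": 1.0, "equipment": 0.5}
)
print(c_precision, round(index_a, 2))
```

With this assumed form, a method that is best on every criterion reaches the maximum index of w_1 + w_2 + w_3 = 2.50.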
The average total time required by each method to measure a subject and the criterion c_time are shown in Table 8. Note that, for the MVS methods, the preparation and calibration times are averaged per individual, i.e., the total time to prepare and calibrate the system is divided by the number of participants. The value of c_time was estimated similarly to the previous criterion, assigning a value of 1 to the MVS with the shortest total time. The preparation time covers mounting the cameras and other devices, adjusting the cameras’ optical parameters, and placing the markers on the participants. For the MAM method, the preparation time includes the verification and cleaning of the anthropometric instruments, and the measurement time corresponds to the time needed to measure the body segments of each participant three times.
The results show that the non-homogeneous and homogeneous MVS are faster than the conventional MAM and Kinovea methods. In comparison with the MAM method, the homogeneous method reduced the time by 48.9%, the non-homogeneous method by 48.2%, and the Kinovea program by 33%. The non-homogeneous and homogeneous MVSs had almost the same time performance, with the homogeneous MVS achieving the best c_time value. For the MVS methods, the equipment preparation and camera calibration are needed only once, regardless of the number of participants to be measured. Therefore, if only the placement of the markers and the DIP and 2D/3D reconstruction processes are considered, the non-homogeneous, homogeneous, and Kinovea MVSs require an average of 4.16 min, 4.15 min, and 5.88 min per participant, respectively. This represents an improvement in the efficiency of the data collection process.
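The effect of performing preparation and calibration only once can be illustrated with a short calculation. In the sketch below, the 4.15 min per-participant figure is the one quoted above for the homogeneous MVS, whereas the one-off setup time and the reference time are hypothetical placeholders and are not taken from Table 8.

```python
def time_per_participant(setup_min, per_participant_min, n_participants):
    """Average total time per participant when setup is done once per group."""
    if n_participants <= 0:
        raise ValueError("number of participants must be positive")
    return setup_min / n_participants + per_participant_min


def time_reduction_percent(method_min, reference_min):
    """Percentage of time saved relative to a reference method."""
    return (reference_min - method_min) / reference_min * 100.0


SETUP_MIN = 30.0            # hypothetical one-off preparation + calibration
PER_PARTICIPANT_MIN = 4.15  # homogeneous MVS figure quoted in the text
REFERENCE_MIN = 12.0        # hypothetical manual time per participant

t_12 = time_per_participant(SETUP_MIN, PER_PARTICIPANT_MIN, 12)
t_60 = time_per_participant(SETUP_MIN, PER_PARTICIPANT_MIN, 60)
reduction_12 = time_reduction_percent(t_12, REFERENCE_MIN)

print(f"12 participants: {t_12:.2f} min each, 60 participants: {t_60:.2f} min each")
print(f"reduction vs reference (12 participants): {reduction_12:.1f} %")
```

As the group grows, the amortized setup share shrinks and the average time approaches the per-participant figure, so the relative advantage over a manual procedure increases with the number of participants.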
Table 9 shows the equipment required by each MVS and the resulting values of the equipment criterion, c_equipment. The values were computed using a normalization process, assigning a value of 1 to the MVS that requires the least equipment. It is important to mention that the material and equipment used in this study were inexpensive and unsophisticated in comparison with the devices and equipment reported in the literature. For instance, the Mantis Vision 3iosk® is priced at USD 250k [21]. The cost of the materials and equipment used in this investigation does not exceed USD 500, which represents a major competitive advantage. The DIP and 3D reconstruction algorithms were implemented in MATLAB®, but they could also be implemented in an open-source environment.
Finally, the performance index results are shown in Table 10, where it can be observed that the non-homogeneous and homogeneous MVS methods had the best performance indices, with the homogeneous method achieving the best value. The Kinovea MVS approach had the worst performance value, despite using only one camera. Moreover, it provides only 2D information, which is a disadvantage compared with the other MVSs, which provide 3D results.