Video-Based Deep Learning Approach for 3D Human Movement Analysis in Institutional Hallways: A Smart Hallway

New artificial intelligence (AI)-based marker-less motion-capture models provide a basis for quantitative movement analysis within healthcare and eldercare institutions, increasing clinician access to quantitative movement data and improving decision making. This research modelled, simulated, designed, and implemented a novel marker-less AI motion-analysis approach for institutional hallways, a Smart Hallway. Computer simulations were used to develop a system configuration with four ceiling-mounted cameras. After implementing camera synchronization and calibration methods, OpenPose was used to generate body keypoints for each frame. OpenPose BODY25 generated 2D keypoints, and 3D keypoints were calculated and postprocessed to extract outcome measures. The system was validated by comparing ground-truth body-segment length measurements to calculated body-segment lengths and ground-truth foot events to foot events detected using the system. Body-segment length measurements were within 1.56 (SD = 2.77) cm, and foot-event detection was within four frames (67 ms), with an absolute error of three frames (50 ms) from ground-truth foot-event labels. This Smart Hallway delivers stride parameters, limb angles, and limb measurements to aid in clinical decision making, providing relevant information without user intervention for data extraction, thereby increasing access to high-quality gait analysis for healthcare and eldercare institutions.


Introduction
Motion analysis provides information and insights into the quality of movement for rehabilitation, performance analysis of professional athletes, and animation for video games or computer-generated imagery in movies. Of particular interest is improving the quality of information provided to healthcare professionals. Stride analysis and gait information are used in clinical decision making to optimally care for patients. Human motion analyses can aid in understanding rehabilitation progress [1], fall risk [1,2], progression of neurodegenerative diseases [3], and classifying gait patterns [4][5][6]. However, equipment, access, space, and human-resource requirements limit quantitative movement assessment within healthcare and eldercare environments. A Smart Hallway implementation could automatically record movement as a person walks through a hallway within an institution so that a therapist or physician can review walking parameters before their appointment. In an eldercare residence, movement data could be collected multiple times a day, thereby providing data to track changes in movement quality. Big data models could also be implemented to provide indicators for fall risk or dementia progression. The Smart Hallway design enables non-invasive data collection that would not interfere with existing hospital processes.
The goal of this research is to design, develop, and evaluate a marker-less motion-analysis system that provides movement-outcome measures for institutional hallway settings. The system must be non-invasive in its inherent design and provide a modular approach that is optimized and can be deployed in any institutional hallway setting. The system should also improve on past marker-less systems by providing a sufficient capture rate, a robust synchronization and calibration approach, and a method to automatically detect foot events and return common outcome measures. By implementing the system in an institutional hallway, the system would be accessible and enable movement analysis to be integrated into daily schedules. The ultimate goal for a "Smart Hallway" is to accurately and non-invasively assess and report a person's human-movement status in an institutional setting, with minimal or no human intervention.

Materials and Methods
Motion-capture systems require a variety of components to work optimally in tandem. This research includes computer simulations to determine the optimal camera layout, temporospatial synchronization validation, calibration validation, and evaluation with various walking scenarios. Appendices A-C detail the camera-layout simulations, temporospatial synchronization, calibration, and validation methods used to design the Smart Hallway.
Based on preliminary research [16], the open-source OpenPose BODY25 model was used for all body keypoint inferences. The OpenPose model was trained on a combination of the COCO and MPII pose datasets. OpenPose BODY25 produced accurate keypoint results in preliminary testing on clinically relevant movements [16]. These 2-dimensional (2D) points combine to create a skeleton model of the person of interest (Figure 1).
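OpenPose can write per-frame keypoints to JSON files. A minimal sketch of loading BODY25 keypoints into a NumPy array is shown below, assuming the standard --write_json output format; file paths are illustrative:

```python
import json
import numpy as np

def load_body25_keypoints(json_path):
    """Load OpenPose BODY25 keypoints for the first detected person.

    Returns a (25, 3) array of (x, y, confidence) per keypoint,
    or None if no person was detected in the frame.
    """
    with open(json_path) as f:
        frame = json.load(f)
    people = frame.get("people", [])
    if not people:
        return None
    flat = np.array(people[0]["pose_keypoints_2d"], dtype=float)
    return flat.reshape(-1, 3)  # rows: keypoints; columns: x, y, confidence
```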

System Design Requirements
The Smart Hallway's goal is to provide a non-invasive approach to extract gait outcome measures without human intervention. To effectively incorporate the system into hospital processes, the system must not interfere with individuals moving through the hallway [17]. Thus, typical hospital hallway dimensions (hallway length × 2.4 m wide × 2.8 m high) were considered when determining the placement of system components. The system components (cameras, cables, high-performance computing unit) should be mountable on the ceiling or high enough from the ground to not interfere with carts or people passing through. To extract data for 3D reconstruction using triangulation, at least two cameras are needed [18][19][20]. Increasing the number of cameras improves 3D reconstruction accuracy; this is a function of the camera-view overlap and the number of detections of the point of interest that can be passed to the optimization method. Based on simulation results, four cameras were used [8,21]. Other factors relating to 3D reconstruction accuracy include the camera resolution and synchronization. For the most accurate keypoint placement from OpenPose BODY25, the target must be at least 300 pixels tall in the camera frame [16,22]. Camera resolution requirements also depend on the target capture volume and lens specifications.
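The 300 pixel guideline can be checked with simple pinhole-camera geometry. The sketch below is a back-of-the-envelope illustration only; the person height, distance, field of view, and image height used here are assumptions for the example, not measured system values:

```python
import math

def projected_height_px(person_height_m, distance_m, vertical_fov_deg, image_height_px):
    """Approximate pixel height of a person under a pinhole camera model."""
    # focal length in pixels, derived from the vertical field of view
    f_px = (image_height_px / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    return f_px * person_height_m / distance_m

# Example: 1.7 m tall person, 5 m from the camera, 52.5 degree vertical FoV, 1080 px image height
print(round(projected_height_px(1.7, 5.0, 52.5, 1080)))  # ~372 px, above the 300 px guideline
```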
Camera synchronization is paramount for accurate reconstruction; having all cameras capture images at the same time reduces 3D reconstruction error. For this level of synchronization, a hardware approach with a stable sync signal is desirable. The system framerate is dependent on the type of motion being analyzed. For normal walking, a framerate of 60 fps is sufficient to reconstruct movement and extract useful outcome measures [23][24][25].
An accurate calibration routine for the camera array is required to extract accurate 3D information from the marker-less keypoints. Each camera's projection matrix relative to the world origin contains variables for lens distortion, intrinsic parameters, and extrinsic parameters. These parameters are normally determined using a patterned calibration object and techniques such as Zhang's method with random sample consensus (RANSAC) and bundle adjustment for camera extrinsic parameters [26][27][28]. These parameters should be calibrated such that the reprojection error is less than 1.0 pixel; however, this is dependent on camera resolution.

System Design
3D simulations were performed to determine the volumetric coverage achievable with four and eight cameras, respectively, within a hallway scenario. The simulated hallway was modelled as 5 m × 2.4 m × 2.8 m based on measurements from a typical hospital hallway. Selected components were modelled using Blender's (Blender Foundation, Blender 2.91) camera object, and several iterations were performed while varying camera pose and placement relative to the world origin. The various configurations were compared based on parameters relating to the capture volume that each setup produced (i.e., total capture volume, ground-area coverage, and view overlap). An array of four FLIR BlackFly® S USB3 (BFS-U3-16S2C-CS) machine-vision cameras with Fujinon 3 MP Varifocal Lenses (YV4.3X2.8SA-2) was selected based on the simulations. Figure 2 shows the virtual Smart Hallway camera layout, providing a 5 m × 2.4 m × 2.8 m (29 m³) capture volume with four cameras in an arc layout. Appendix A provides a detailed explanation of the simulation methods used.
System components were selected based on geometric and data-transfer constraints. Geometric constraints were based on the institutional hallway simulations and the maximum cable lengths for each communication standard (USB, GigE, etc.). Data-transfer constraints were based on the desired multi-camera system performance in terms of resolution (minimum 960 × 720), pixel format (minimum 8 bit colour depth), and frame capture rate (minimum 60 fps). The selected components that best addressed the Smart Hallway requirements are detailed in Table 2.
A hardware synchronization cable was designed and created to ensure reliable image capture for the multi-camera system. A primary camera sends a sync signal at the beginning of exposure to the other cameras in the array, and the secondary cameras begin exposure once the sync signal has been received. This synchronization approach was validated by capturing 10,000 images and comparing the timestamps produced by each FLIR camera. The cameras remained synchronized within 5 µs of the primary camera when capturing at 60 fps (16,667 µs/image). This solution provides a repeatable synchronization method without the need for synchronization after data capture. Detailed cable design and validation methods are given in Appendix B for reproducibility.
Spatial synchronization for the multi-camera system was accomplished with a ChArUCo calibration pattern. Calibration was performed by capturing several views of the ChArUCo board and implementing the OpenCV and Ceres libraries for robust camera-parameter calculation. The final reprojection error from the distortion, intrinsic, and extrinsic parameters was less than 1.0 pixel. The multi-camera system calibration was tested by comparing system output to measured dimensions along the length of the capture volume (5 markers on the floor spaced at 1 m intervals). X-axis error was 1.7 (SD = 1.2) cm, Y-axis error was 2.4 (SD = 1.5) cm, and Z-axis error was 1.9 (SD = 1.4) cm. This solution allows for one-time system calibration that does not need to be performed prior to each data-collection session. The hardware pipeline is highlighted in Figure 3. Details of the calibration approach are given in Appendix C.

Signal Processing
Videos of participants were recorded and stored on the NVIDIA Jetson AGX's solid-state drive. The videos were then passed to the OpenPose BODY25 model to perform inference and create a set of 2D keypoints, locating participant joint centres for every video frame. For every video, the 2D keypoint data contain confidence scores that describe the likelihood of correct marker location. Data from each video were preprocessed by removing points below 10% confidence and using a cubic spline to interpolate gaps in the dataset of five frames (0.083 s) or less. The dataset was then filtered using a zero-phase low-pass 12 Hz Butterworth filter. Figure 4 shows an example of the 2D keypoint data after preprocessing.
2D keypoint data from each camera were passed to the triangulation pipeline, along with the intrinsic and extrinsic parameters. Point correspondences from the 2D keypoints were used in a non-linear optimization RANSAC triangulation method to determine an optimal set of 3D keypoints describing the body at each timestep in the video. For each trial's 3D data, regions where the 3D keypoint reprojection error exceeded two standard deviations from the mean were removed to reduce outlier effects.
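A minimal sketch of the 2D preprocessing described above for a single keypoint trajectory is shown below, assuming NumPy/SciPy; the confidence threshold, gap length, and cut-off frequency are taken from the text, while the function structure is illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt

def preprocess_keypoint(xy, conf, fps=60, conf_thresh=0.10, max_gap=5, cutoff_hz=12.0):
    """Drop low-confidence samples, spline-fill gaps of <= max_gap frames,
    then apply a zero-phase low-pass Butterworth filter.

    xy:   (N, 2) pixel coordinates of one keypoint per frame.
    conf: (N,)   OpenPose confidence scores.
    """
    xy = xy.astype(float).copy()
    xy[conf < conf_thresh] = np.nan                      # remove unreliable detections

    frames = np.arange(len(xy))
    missing = np.isnan(xy[:, 0])
    # identify contiguous runs of missing frames
    run_starts = np.flatnonzero(missing & ~np.roll(missing, 1))
    run_ends = np.flatnonzero(missing & ~np.roll(missing, -1))
    for start, end in zip(run_starts, run_ends):
        if end - start + 1 <= max_gap and start > 0 and end < len(xy) - 1:
            for axis in range(2):
                valid = ~np.isnan(xy[:, axis])
                spline = CubicSpline(frames[valid], xy[valid, axis])
                xy[start:end + 1, axis] = spline(frames[start:end + 1])

    # frames in gaps longer than max_gap remain NaN and should be excluded before filtering
    b, a = butter(N=4, Wn=cutoff_hz / (fps / 2), btype="low")
    return filtfilt(b, a, xy, axis=0)                    # zero-phase filtering
```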
Software was written in Python 3.7 to calculate body-segment lengths, stride parameters, and hip, knee, and ankle angles. 3D data were filtered using a zero-phase low-pass 5 Hz Butterworth filter, based on findings from other research involving OpenPose keypoint inferences and marker-based approaches [25,29].
Body-segment lengths were calculated using the Euclidean distance between limb endpoints. Body-segment lengths were measured at each timestep in the video. Measurements outside two standard deviations from the mean were identified as outliers and removed. For evaluation, body-segment length deltas were calculated as the calculated limb length subtracted from the measured limb length.
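As an illustration of this segment-length computation, a short NumPy sketch is given below; the keypoint indices follow the BODY25 convention (e.g., 9 = right hip, 10 = right knee), and this particular pairing is an assumption for the example:

```python
import numpy as np

def segment_length(points3d, joint_a, joint_b):
    """Per-frame Euclidean length of a body segment, with 2-SD outlier rejection.

    points3d: (frames, 25, 3) array of 3D BODY25 keypoints in metres.
    joint_a, joint_b: BODY25 indices of the segment endpoints.
    Returns the mean length and the per-frame lengths with outliers removed.
    """
    lengths = np.linalg.norm(points3d[:, joint_a] - points3d[:, joint_b], axis=1)
    mu, sigma = np.nanmean(lengths), np.nanstd(lengths)
    kept = lengths[np.abs(lengths - mu) <= 2 * sigma]   # drop samples > 2 SD from the mean
    return kept.mean(), kept

# Example: right thigh length (right hip = 9, right knee = 10 in BODY25)
# thigh_mean, _ = segment_length(points3d, 9, 10)
```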
Ground-truth stride parameters were calculated from ground-truth foot events obtained by manually labelling foot offs and foot strikes in each video. Detected foot events were obtained using the Zeni et al. algorithm [30]. The set of detected foot events was improved by implementing an algorithm to recover gait initiation and gait termination (initiation/termination recovery, IT recovery). Figure 5 shows an example of the detected foot events prior to the initiation and termination recovery algorithm.
Figure 5. Foot-event detection. Solid green lines are ground-truth foot events, and dashed red lines are detected foot events. The solid green line circled in red shows an event that was missed but subsequently recovered using the IT recovery algorithm.
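A sketch of Zeni-style event detection is shown below, under the assumption that foot strikes and foot offs correspond to extrema of the heel position relative to the pelvis along the walking direction; the peak-separation parameter is illustrative:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_foot_events(heel_y, pelvis_y, min_separation_frames=30):
    """Coordinate-based foot-event detection (after Zeni et al.).

    heel_y, pelvis_y: (N,) positions along the walking direction.
    Foot strikes are taken at maxima of (heel - pelvis); foot offs at minima.
    Returns (foot_strike_frames, foot_off_frames).
    """
    rel = heel_y - pelvis_y
    strikes, _ = find_peaks(rel, distance=min_separation_frames)
    offs, _ = find_peaks(-rel, distance=min_separation_frames)
    return strikes, offs
```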
Regions such as the one highlighted by the red ellipse in Figure 5 were recovered by searching the window between the stop region (red area) and the next detected foot event. The algorithm determined whether an initiation occurred by analyzing the linear fit of the curve in a calculated search region. The search region was assessed by detecting a potential foot event, using SciPy's signal.find_peaks function, and fitting a line between the potential foot event and the next detected foot event [31]. Linear fits above an R-squared value of 0.85 were selected as gait initiations and added to the list of detected foot events. Foot events that were missed or not recovered by Zeni's algorithm were backfilled using the algorithm proposed by Capela, Lemaire, and Baddour [32,33]. Algorithm 1 and Algorithm 2 describe the methods implemented by this research to detect foot events.

Algorithm 1. Detect Stops: method for detecting stops in a trial, given an array of chest keypoint positions per frame. (Only the first step was recoverable from the source: take the first derivative of the chest keypoint array A_C and smooth it with a low-pass filter.)

Algorithm 2. Foot-event detection with initiation/termination recovery (steps as recovered from the source; earlier steps were lost in extraction):
4. smooth E with a low-pass filter
5. fine detect initial peaks in E as foot strikes F_S
6. pass E, F_S, and the list of stop windows W_L to the initiation/termination recovery method
7. for W in W_L
8.   coarse detect peaks in E from index 0 to the start index of W
9.   select the last detected peak as a potential gait-termination strike P_S
10.  create a window between the last detected strike in F_S and P_S
11.  if the last detected strike is equal to the potential gait termination
12.    return is_termination = False
13.  else
14.    check the concavity of the displacement data inside the newly constructed window W
15.    construct a line L from the start to the end of W
16.    determine the linear fit between the data inside W and L
17.    if the linear fit is less than the threshold
18.      return is_termination = False
19.    return is_termination = True → P_S inserted into F_S

Processing time for foot-event detection and stride-parameter measurement was calculated using Python 3's built-in nanosecond clock. Foot-event detection took, on average, 18 ms for a 1500 frame trial. Stride-parameter measurement took an average of 35 ms for calculating 30 parameters across a 1500 frame trial.
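The linear-fit test at the core of the recovery step can be sketched as follows; this is a simplified illustration of the R-squared ≥ 0.85 criterion described above, with window selection and thresholds treated as assumptions:

```python
import numpy as np

def is_gait_initiation(displacement, start, end, r2_threshold=0.85):
    """Accept a candidate foot event if the displacement between the candidate
    and the next detected event is well described by a straight line."""
    window = displacement[start:end + 1]
    frames = np.arange(len(window))
    slope, intercept = np.polyfit(frames, window, 1)   # linear fit over the search window
    fitted = slope * frames + intercept
    ss_res = np.sum((window - fitted) ** 2)
    ss_tot = np.sum((window - window.mean()) ** 2)
    if ss_tot == 0:
        return False                                   # flat window: no initiation
    r_squared = 1 - ss_res / ss_tot
    return r_squared >= r2_threshold
```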
Stride parameters included stride length, stride time, stride speed, step length, step width, step time, cadence, stance time, swing time, stance swing ratio, and double support time. Results for comparison included mean (µ) and standard deviation (σ) across all trials of the same walking condition for each participant.
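For example, three of these parameters can be derived directly from successive ipsilateral foot-strike events and heel positions. The sketch below is illustrative; the frame rate and the assumption that heel positions are expressed in floor-plane metres are not specified by the source:

```python
import numpy as np

def basic_stride_parameters(strike_frames, heel_xy, fps=60):
    """Stride time, stride length, and cadence from successive ipsilateral foot strikes.

    strike_frames: frame indices of foot strikes for one foot.
    heel_xy: (N, 2) floor-plane heel positions in metres.
    """
    strike_frames = np.asarray(strike_frames)
    stride_times = np.diff(strike_frames) / fps                        # seconds
    strides = heel_xy[strike_frames[1:]] - heel_xy[strike_frames[:-1]]
    stride_lengths = np.linalg.norm(strides, axis=1)                   # metres
    cadence = 120.0 / stride_times.mean()                              # steps/min (two steps per stride)
    return stride_times, stride_lengths, cadence
```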
Hip, knee, and ankle 3D angles for the left and right legs were calculated for each stride, defined by ground truth and detected foot events. Figure 6 shows how the vectors used in the angle calculations were defined. Hip angle was the angle between the torso vector and thigh vector, knee angle was the inner angle between the thigh and shank vector, and ankle angle was the inner angle between the shank and foot vector.
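The inner angle between two segment vectors can be computed as in the minimal sketch below, consistent with the vector definitions above; the example keypoint names are placeholders:

```python
import numpy as np

def inner_angle_deg(v1, v2):
    """Angle in degrees between two 3D segment vectors (e.g., thigh and shank)."""
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: knee angle from hip, knee, and ankle keypoints
# knee_angle = inner_angle_deg(hip_xyz - knee_xyz, ankle_xyz - knee_xyz)
```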

Validation
The Smart Hallway system was evaluated by testing two male participants (age: 28; height: 180 cm; weight: 64 kg and age: 25; height: 178 cm; weight: 90 kg). Each participant provided informed consent (University of Ottawa Ethics Board, H-01-21-5819). Data collection was completed in one testing session.
Prior to testing, each participant's body-segment lengths were measured using an anthropometric tape. The segments matched OpenPose's BODY25 model (Figure 1) and were measured by palpating the joint centres of interest for each measurement. Each participant completed five separate trials of five walking conditions (Table 3). Participants were recorded with the four-camera array at 60 fps. A total of 50 videos were recorded, containing approximately 500 foot events.

Table 3. Testing protocol for the Smart Hallway validation.

Walking straight
Start one meter outside the capture volume and walk straight. Once through the capture volume, turn around and walk back to the initial position. The turn occurs outside of the camera field of view.

Walking and turning
Start at the edge of the capture volume and walk towards a marker positioned 50 cm from the end of the capture volume. Turn around the marker and walk back to the initial position. The turn occurs within the camera field of view.

Walking in a curved path
Start at the edge of the capture volume and walk in a curved path around the capture volume. The test ends once the participant reaches their initial position. The participant performs each test in the same direction.
Walking with a cane

Follow the "walking straight" protocol while using a cane as a walking aid. The cane was held in the same hand for all trials. Participants were instructed on how to properly use a cane.

Walking with a walker

Follow the "walking straight" protocol while using a wheeled walker as a walking aid. Participants were instructed on how to properly use a walker.

Results
From Tables 4 and 5, mean differences between calculated and ground-truth values were small for the majority of body-segment lengths. In general, the calculated body-segment lengths were less than the ground-truth values. The average difference across all test conditions was 1.56 (SD = 2.77) cm. Since the delta results for participant one and participant two were similar, only the results for participant one were included in this manuscript. Results for participant two are located in Appendix D.

Table 4. Body-segment lengths for participant one: walking straight, walking turn, and walking curve test conditions (mean and standard deviation in brackets). Smart Hallway (SH) values were calculated using the 3D reconstructed data, and Delta is the difference between the Smart Hallway and ground-truth segment lengths.

Table 6 shows the mean absolute error of the foot-event detection algorithms compared to ground-truth values obtained from manual labelling. The error in ground-truth labelling was three frames, and the average error in detected events across all trials was four frames. From Tables 7-11, calculated stride parameters were comparable to ground-truth results.

Table 6. Detected foot events frame offset from ground-truth values (mean and standard deviation in brackets) across both participants. The percentage of events detected using the Zeni [30] and IT recovery algorithms in combination is shown alongside the percentage of events detected using all the proposed foot-event detection methods.

Figure 8 shows the ensemble averaged leg angles during gait, from foot strike to foot off. The shape of the ankle, knee, and hip angle curves was similar to able-bodied joint angles from the literature [34].

Figure 8. Ensemble averaged leg angles measured for participant one. Row A is with the person facing the camera array. Row B is with the person facing away from the camera array. Row C provides normative reference data from a typical 3D motion-analysis system [34]. Grey dotted lines are the one-standard-deviation upper and lower bounds.

Discussion
Based on the simulation, design, and evaluations from this research, marker-less human movement analysis is a viable option for outcome measurement of people moving within institutional hallways. The system configuration can lead to automated video capture and fully automated processing that enables outcome measurement with minimal or no human intervention. Unlike past research that has only tested the efficacy of marker-less human motion analysis, this work provides a fully implemented prototype that could be deployed in existing institutional environments and brings closer the adoption of marker-less motion analysis for use in practice.
Body-segment length measurements were within 1.56 (SD = 2.77) cm of ground-truth values. This is comparable to leg-length measurements used for clinical decision making, where the difference between Vicon limb lengths and X-ray bone measurements was 0.98 (SD = 0.55) cm [35]. Stride parameters and joint angles were analyzed to determine the Smart Hallway's capability for human motion analysis. Keypoint-based stride parameters were similar to ground-truth results. Across all test conditions, stride-parameter distances were 3.16 (SD = 3.26) cm from the ground truth. In all cases, the standard deviation of the delta was within the standard deviation of the calculated stride parameter. Stride-parameter times were 0.047 (SD = 0.037) s from the ground truth. The timing differences were very small and equivalent to the ground-truth foot-event timings. Stride-parameter velocities were 0.74 (SD = 0.75) cm/s from the ground truth. In particular, stride times for walking straight were 1.18 (SD = 0.05) s; this corresponds with findings from the literature, showing that the Smart Hallway's standard deviation is within a similar range to existing marker-based gait datasets (1.02 (SD = 0.06) s) [36].
Other studies analyzing physician ability for visual gait assessment concluded that raters had an average of 50% accuracy when compared to 3D marker data [37,38]. Even with such low accuracy, good clinical decision making is still possible, though some abnormal gait features are not detected during visual assessment [37][38][39]. Thus, the outcome measures calculated from the Smart Hallway can provide useful information for clinical decision making when compared to current visual assessment methods.
Stride parameters were affected by the walking aids; however, results were still similar to measures from the walking-straight condition. The ensemble average curves obtained from the leg-angle calculations showed similar shapes compared to leg angles from the literature [34].
Stride parameters that were only reliant on a single type of foot event (e.g., only left foot strikes) were highly accurate. Increased error and standard deviation in stride-parameter measurements were seen in parameters that relied on multiple types of foot events, including step length, step width, and stance-swing ratio. This is likely due to compounding errors by combining either contralateral foot events, foot strikes and foot offs, or foot events and 3D keypoint locations on the floor. Foot-event detection was within four frames (67 ms) of the ground-truth foot events. Stride parameters obtained using detected foot events were similar to stride parameters calculated using ground-truth foot events. More work is needed to improve foot-event detection accuracy when calculating stride parameters that rely on multiple data types, such as stance-swing ratio or contralateral step parameters.
Joint-angle standard deviations were greater when occlusions occurred or large variance existed in the Y depth coordinate between the keypoints of interest. Greater error in the global Y-axis is expected since this axis is related to scene depth and is sensitive to triangulation method accuracy. These occlusions and points of greater variance generally occurred at foot strike and foot off, where the leg is either at the maximum distance in front of the body or at the maximum distance behind the body, causing a greater variation in the Y-axis.
Smart Hallway accuracy was lower when keypoints were occluded by walking aids or the pose of the participant in the scene. Improvements can be made by increasing the number of cameras and implementing kinematic constraints on the BODY25 model to ensure that only realistic movements are produced in the 3D reconstruction. The current OpenPose BODY25 AI model does not account for physical and kinematic constraints such as consistent joint-to-joint segment lengths and range of motion of certain joints. Some recent approaches to keypoint-pose inference models, such as MotioNet, have included encodings for bone lengths and 3D joint rotation [40]. These models use a scaled estimation of the depth coordinate, which is learned through AI training processes that are also seen in Google ML Kit Pose Detection [41]. However, OpenPose BODY25 has better keypoint quality compared to these models.
The Smart Hallway produced viable results across all outcome measures, with low variance. For this prototype system, only two participants were recruited for testing; however, approximately 500 strides were analyzed in total. Improvements to the OpenPose BODY25 model or new, more advanced models should aid in more accurately detecting the feet and handling body-part occlusion. Currently, OpenPose processing is a system bottleneck, with results from a 10 s trial returned after approximately 120 s (NVIDIA Jetson AGX). Other processing stages, such as the triangulation step, could be further optimized to reduce data registration time. The current implementation is limited to handling one person in frame at a time; however, with upgrades to outcome-measurement software, groups of people could be processed since OpenPose BODY25 provides keypoints for all people in frame.
The Smart Hallway was successfully deployed in a manner that was non-invasive to the hallway environment. This implies that a system could be set up and gather data on multiple individuals that walk throughout the capture volume on a daily basis without obstructing existing institutional processes. The Smart Hallway system successfully captured data from individuals in a non-invasive manner that did not require markers or a data-collection device being affixed to the participants.

Conclusions
A Smart Hallway setup for marker-less 3D human motion analysis in institutional hallways was viable when using an array of four temporally and spatially synchronized cameras and OpenPose BODY25. Temporal synchronization was achieved for the multi-camera array, and spatial synchronization was achieved through a rigorous calibration procedure using RANSAC and bundle-adjustment techniques. 3D joint keypoints were successfully calculated from 50 videos (approximately 500 strides) that included straight walking, walking with turns, walking a curved path, using a walker, and using a cane. Body-segment lengths, foot events, and stride parameters from each condition were similar to manually identified and calculated ground-truth values. Ensemble averaged leg angles corresponded well with kinematic data from the literature [34]. The prototype system validated in this research allows for fully automated human motion analysis without the need for post-processing techniques for calibration, synchronization, or foot-event and stride-parameter analysis. This research helps to move human motion analysis from the lab to the point of patient contact by providing a full system design that is implementable in institutional settings but does not require extensive human resources for operation.
Future research could apply kinematic constraints to the reconstructed 3D results, such as consistent joint-to-joint segment lengths and constrained joint range of motion, in order to reconstruct only physically possible body positions. Furthermore, new training data or transfer learning could be applied to make better inferences when movement aids are being used. Research into applying the Smart Hallway design to other areas of interest, such as gait-classification applications, could be performed to assess fall risk or neurodegenerative disease progression.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Not applicable.
Acknowledgments: Graduate-student support was provided by the CREATE-READi program. The authors would like to thank The Ottawa Hospital Rehabilitation Centre and the University of Ottawa for providing resources for development and testing. They would also like to thank the volunteers who participated in this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Simulation and Modelling
Appendix A.1. Methods
To simulate camera layout in order to achieve an optimal capture volume, with cameras located on the ceiling to avoid disrupting hallway access and people hitting the cameras, a method using Blender (Blender Foundation, Blender 2.91) to model the cameras was implemented [7]. Camera field of view (FoV) during simulation was 70° × 52.5°, based on lens specifications [42]. In this simulation, an arc layout and a corner layout were evaluated. Setup 1 had four cameras, with one camera placed at each corner of the simulated hallway capture volume. Setup 2 had four cameras placed in an arc at one end of the simulated hallway. For each setup, three iterations were performed by adjusting camera poses. Both setup 1 and setup 2 aimed to capture a 5 m × 2.4 m × 2.5 m volume in the simulated hallway. Setup 3 had eight cameras placed equidistantly around a 10 m × 2.4 m × 2.5 m capture volume to test how the system could scale to a greater number of cameras. For each setup, camera poses were varied to maximize FoV overlap and the percentage of useable volume. Useable volume was defined as the desired walking-distance length (5 m or 10 m), the width of the institutional hallway (2.4 m), and a conservative estimate of typical participant height (2.5 m). Figure A1 displays an example of the four-camera corner, four-camera arc, and eight-camera layouts.
Properties of the useable capture volume were calculated by determining the intersection of each camera's respective FoV within the simulated hallway. A volume mesh describing the space in which all cameras have a view of the scene was formed from these intersections. Using Blender's Boolean intersection method, several 3D meshes were produced and compared. The capture-volume meshes generated from each layout were compared to determine a camera layout that provided desirable features in the context of an institutional hallway setting. Individual camera capture volumes are defined in Equation (A1), with CV as the camera capture volume and i indicating the camera number in a multi-camera system; the vectors defining each capture volume are shown in Figure A2. The intersection of the camera volumes is defined in Equation (A2), where CV_1 defines the capture volume of camera 1, CV_{i+1} defines the capture volumes of the other cameras in the multi-camera system, and v defines the set of vectors that define each camera capture volume.
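The equations themselves were not recoverable from the extracted text. A reconstruction consistent with the surrounding description is given below; this form is an editorial assumption, not the published notation:

```latex
% Editorial reconstruction (assumption): individual and intersected capture volumes
\begin{align}
  CV_i &= f\!\left(\vec{v}_{i,1}, \vec{v}_{i,2}, \ldots, \vec{v}_{i,k}\right),
          \qquad i = 1, \ldots, N \tag{A1} \\
  CV_{\mathrm{useable}} &= CV_1 \cap CV_2 \cap \cdots \cap CV_N
          = \bigcap_{i=1}^{N} CV_i \tag{A2}
\end{align}
```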

Appendix A.2. Validation
The total capture volume for each setup and the total ground-coverage areas are presented in Table A1. The capture-volume meshes (Figure A3) are shown for each layout with varying camera pitch angles. The four-camera arc layout provided more coverage of the simulated institutional hallway compared to the four-camera corner layout. For the four-camera corner layout, some stereo-camera pairs were too far apart to reliably achieve an accurate calibration. Additionally, 3D reconstruction accuracy for a given stereo-camera pair is dependent on the incidence angle between the two cameras [43]. The four-camera corner layout requires some stereo-camera pairs to exceed the desirable incidence angle, which could negatively affect the final 3D reconstruction accuracy. The eight-camera perimeter layout provided the best coverage overall (i.e., total capture volume, ground-coverage area). Increasing the number of cameras also improves the likelihood of more accurate 3D reconstruction. An optimization procedure could be performed to determine camera poses and placement based on capture volume, ground coverage, and the number of cameras to further improve these results.

Table A1. Capture volume and ground-coverage area from the simulated camera layouts. R is the range of camera poses in the X, Y, and Z axes for each camera layout.

Meshes obtained using the intersection procedure provided visual confirmation of the expected capture volume in the simulated institutional hallway. Geometric features of the capture-volume mesh could be used to obtain a desirable layout of the cameras and aid in camera positioning if certain features are desired. This may include maximizing the total volume, maximizing the camera capture-volume overlap, or minimizing the angle of incidence between cameras. Figure A3 shows the meshes obtained from the four-camera corner, four-camera arc, and eight-camera perimeter layouts.

Appendix B. Camera Synchronization
Appendix B.1. Methods
FLIR BlackFly S USB3 cameras have a general-purpose input and output (GPIO) port that allows access to the camera auxiliary power input, auxiliary power ground, non-isolated input, and opto-isolated input. Camera software can be used to specify how GPIO will be used [44]. The hardware synchronization cable provided external power to the cameras while transferring the trigger signal produced by the primary camera. Cameras were powered externally to improve overall reliability and reduce the load on the NVIDIA Jetson AGX Xavier. The primary camera was formatted through software commands so that the opto-isolated output produced a square wave as a function of the internal exposure time and selected frame rate. The secondary cameras were formatted similarly to the primary camera, except that the opto-isolated input was enabled to receive the primary camera's trigger signal. To produce the desired trigger signal at the primary camera's opto-isolated output, an external connection to a power source was required since the camera's 3.3 V input was occupied by the external camera power supply. Figure A4 displays connections to the primary and secondary camera GPIO ports.

Figure A4. Hardware synchronization cable GPIO pin connections for primary and secondary cameras.
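For reproducibility, the primary/secondary formatting described above might be scripted with FLIR's Spinnaker Python bindings roughly as follows. This is a sketch assuming the PySpin QuickSpin API; the exact node, enum, and GPIO line names vary with camera model, wiring, and firmware and should be checked against the Spinnaker documentation:

```python
import PySpin

def configure_primary(cam):
    """Output the exposure-active signal on the opto-isolated output line (sketch)."""
    cam.Init()
    cam.LineSelector.SetValue(PySpin.LineSelector_Line1)   # opto-isolated output (wiring-dependent)
    cam.LineMode.SetValue(PySpin.LineMode_Output)
    cam.LineSource.SetValue(PySpin.LineSource_ExposureActive)
    cam.AcquisitionFrameRateEnable.SetValue(True)
    cam.AcquisitionFrameRate.SetValue(60.0)

def configure_secondary(cam):
    """Start each exposure when the primary's trigger pulse arrives (sketch)."""
    cam.Init()
    cam.TriggerMode.SetValue(PySpin.TriggerMode_Off)        # must be off while reconfiguring
    cam.TriggerSelector.SetValue(PySpin.TriggerSelector_FrameStart)
    cam.TriggerSource.SetValue(PySpin.TriggerSource_Line0)  # opto-isolated input (wiring-dependent)
    cam.TriggerActivation.SetValue(PySpin.TriggerActivation_RisingEdge)
    cam.TriggerMode.SetValue(PySpin.TriggerMode_On)
```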
The cable was built longer than necessary to accommodate the distance between cameras and the variety of camera-array layouts tested. The distance between camera nodes was 7 m, and the connection to each camera was 0.75 m to allow for variability in placement. Due to the size of the cable and manufacturing capabilities, the external power connections were not consolidated into one connection and were not run alongside the trigger-signal wiring. Thus, camera power supplies were spliced into the cable near each camera GPIO connector. This is detailed in Figure A5, where the overall cable layout is shown, along with all connections.

Figure A5. Multi-camera synchronization cable expanded to an eight-camera setup.

Appendix B.2. Validation
Preliminary validation of the synchronization cable was performed by using a multimeter to measure current while running the cameras. Based on the inner circuitry of the opto-isolated GPIO, a 1.5 mA activation current was required at the LED to enable triggering [44] (the opto-isolated output allows for a maximum current draw of 25 mA). The values were measured using three different pull-up resistors at the primary camera. To ensure cameras were being triggered properly, an oscilloscope was used to measure voltage changes at the output of the primary and the input of the secondary cameras. Ideally, the signal produced by the primary camera should be identical to the signal received by each of the secondary cameras, without lag, to ensure that cameras are triggered at the same instant. Figure A6 shows the trigger wave, including the trigger and exposure portion of the signal.

Figure A6. Trigger signal produced by the primary (yellow) and received by the secondary (blue) cameras.
Once the trigger signal was validated, the lag between primary and secondary cameras was measured by capturing images (60 fps) of a millisecond clock (monitor refresh rate, 60 fps). A test using four cameras was performed, and after 167 s of image capture (10,000 frames), the final frame from each camera was compared. Figure A7 displays the setup and an example of the camera synchronization.

Figure A7. Synchronization test layout and validation images for a four-camera setup capturing at 60 fps. All cameras show the same frame of the millisecond clock after 10,000 images.
To further validate camera synchronization, image timestamps were converted to a standardized CPU time on the NVIDIA Jetson AGX Xavier. Three tests using four cameras capturing 10,000 synchronized images were performed to determine the robustness of the trigger-synchronization cable. Table A2 displays the average difference between the measured timestamp and the target 60 fps (16,667 µs/frame). Table A3 shows the stability of the camera synchronization over 10,000 images and how closely in time the individual images are captured (on average). Camera positions were P: 20023229, S1: 20010192, S2: 20010189, and S3: 20010190.

Appendix C. Camera Calibration

Appendix C.1. Methods
Calibration of individual camera-intrinsic parameters and the multi-camera system's extrinsic parameters was accomplished using a pattern calibration approach. The calibration pattern was an 8 × 7 ChArUCo board with a 4 × 4 (16 bit) dictionary of ArUCo random generator markers [45,46]. Chessboard squares were 110 mm, and ArUCo markers were 80 mm on each side.
Calibration of individual camera-intrinsic parameters and the multi-camera system's extrinsic parameters was accomplished using a pattern-calibration approach. The calibration pattern was an 8 × 7 ChArUCo board with a 4 × 4 (16 bit) dictionary of ArUCo random-generator markers [45,46]. Chessboard squares were 110 mm, and ArUCo markers were 80 mm on each side.

For the intrinsic and extrinsic calibration process, a minimum of 200 images with a successfully detected calibration board were captured at a resolution of 1440 × 1080 pixels. The desired capture volume was outlined with markers to guide calibration and ensure that the entire volume was covered. During calibration, cameras were set to capture at 10 fps to reduce the total number of images passed to the ChArUCo board detection and calibration pipeline.
For intrinsic camera calibration, the set of images contained a variety of calibration-board poses that spanned a range of distances in the camera FoV. A RANSAC approach was used to determine each camera matrix and set of distortion coefficients [47]. After a set of images was captured for each camera, intrinsic calibration was performed using ChArUCo detection to obtain points on the calibration board and OpenCV's extended library for access to the ChArUCo calibration functions. Images with poor reprojection error or too few detected ChArUCo markers were ignored during calibration. Calibration results were only accepted when a sufficiently low reprojection error (less than 1 pixel) was obtained from a given set of images.
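A minimal sketch of such a ChArUCo intrinsic calibration, using OpenCV's aruco contrib module, is shown below. The dictionary choice, image paths, and marker-count thresholds are assumptions, the function names follow the legacy (pre-4.7) aruco API, and the RANSAC image-selection step described above is omitted for brevity.

```python
import glob
import cv2

# Board geometry from the text: 8 x 7 squares, 110 mm squares, 80 mm markers.
# DICT_4X4_50 is an assumed dictionary; the paper's 4 x 4 dictionary may differ.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.CharucoBoard_create(8, 7, 0.110, 0.080, dictionary)

all_corners, all_ids, image_size = [], [], None
for path in glob.glob("intrinsic_images/*.png"):           # illustrative path
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None or len(ids) < 4:
        continue                                           # too few markers; skip frame
    n, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, gray, board)
    if n is not None and n > 10:
        all_corners.append(ch_corners)
        all_ids.append(ch_ids)

# Intrinsic calibration; results were accepted only below 1 px reprojection error.
rms, camera_matrix, dist_coeffs, _, _ = cv2.aruco.calibrateCameraCharuco(
    all_corners, all_ids, board, image_size, None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```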
For extrinsic calibration, calibration-board images were captured from a variety of distances and angles relative to the cameras. For the extrinsic calibration process to successfully determine a valid rotation and translation between the camera pairs, the calibration board must be clearly visible in all images. Ideally, a quarter of the calibration board's ChArUCo markers should be detected in an image to accurately determine calibration-board points. If too few markers are detected, the calibration-board points may have a poor reprojection error. For all frames with a partially detected ChArUCo calibration board, an image-point recovery algorithm was implemented to greatly increase the number of usable frames.
With a set of at least 200 successfully detected calibration boards, the calculated intrinsic-parameter values were used to obtain an initial rotation matrix and translation vector connecting each secondary camera with the primary reference-frame camera. Initial extrinsic-calibration results were only accepted when a reprojection error of less than 1 pixel was achieved. The set of extrinsic parameters was further refined using bundle adjustment, implemented as a modified version of OpenPose's calibration pipeline built with the Google Ceres library [48]. The modified bundle-adjustment approach takes advantage of the ChArUCo calibration board to recover calibration points from images where only partial detections were obtained. Calibration was performed iteratively until the extrinsic-parameter pixel-reprojection error was less than 0.5 pixels.
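The sketch below illustrates one way to obtain the initial secondary-to-primary rotation and translation with OpenCV's stereoCalibrate while holding the intrinsics fixed; it is a simplified stand-in for the ChArUCo-based procedure and Ceres bundle adjustment described above, and the variable names are illustrative.

```python
import cv2

def initial_extrinsics(obj_points, img_pts_primary, img_pts_secondary,
                       K_p, d_p, K_s, d_s, image_size):
    """Initial rotation/translation from a secondary camera to the primary.

    obj_points:  list of (N, 3) ChArUCo corner coordinates per frame (board frame)
    img_pts_*:   matching lists of (N, 2) pixel detections in each camera
    K_*, d_*:    fixed intrinsics from the per-camera ChArUCo calibration
    """
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-6)
    rms, _, _, _, _, R, t, _, _ = cv2.stereoCalibrate(
        obj_points, img_pts_primary, img_pts_secondary,
        K_p, d_p, K_s, d_s, image_size,
        criteria=criteria, flags=cv2.CALIB_FIX_INTRINSIC)
    return rms, R, t     # rms accepted below 1 px before bundle adjustment
```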
For output-parameter calculations, the global coordinate system was transformed to the lab floor. An image of the calibration board on the ground was captured, and the rotation and translation needed to transform the coordinate system to the board plane were calculated. This was performed multiple times with the board in the desired location to ensure that the new coordinate system's X-axis lined up with the virtual capture-volume width and that the Y-axis aligned with the length.
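A hedged sketch of this floor-plane transform is given below: the board pose is estimated with solvePnP and inverted so that triangulated keypoints can be re-expressed in the board (floor) coordinate system. The function and variable names are illustrative, not the system's actual implementation.

```python
import cv2
import numpy as np

def floor_transform(board_obj_pts, board_img_pts, K, dist):
    """Rotation/translation that re-expresses camera-frame points in the board (floor) frame."""
    ok, rvec, tvec = cv2.solvePnP(board_obj_pts, board_img_pts, K, dist)
    if not ok:
        raise RuntimeError("board pose could not be estimated")
    R_board, _ = cv2.Rodrigues(rvec)       # board -> camera rotation
    R_floor = R_board.T                    # invert the pose: camera -> board
    t_floor = -R_floor @ tvec
    return R_floor, t_floor

def to_floor_frame(points_cam, R_floor, t_floor):
    """Apply the transform to (N, 3) triangulated keypoints in the camera frame."""
    return (R_floor @ points_cam.T + t_floor.reshape(3, 1)).T
```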

Appendix C.2. Validation
Multi-camera system calibration was validated qualitatively and quantitatively. Qualitative assessment was performed by analyzing images from each camera to determine the effectiveness of the distortion model and by analyzing stereo-pairs of images to assess epipolar-geometry characteristics. Removal of image distortions was verified by checking images for pin-cushioning or barrel distortion (Figure A8). Figure A8 shows the removal of curvature caused by camera-lens distortion. Features such as the square calibration-board edges and ceiling-tile supports become straight in the undistorted view, as opposed to the distorted view.
Figure A8. Barrel-distortion removal by camera-distortion model.

Images from stereo-pairs of cameras were rectified and stitched together to determine whether epipolar constraints were violated. The epipolar constraint ensures that the projection of a given point from one image must lie on the epipolar line defined by the projected point and the imaging-plane epipole of the other image. Points in the left camera view were tracked along the corresponding epiline in the right camera to ensure that the same point was found, with Figure A9 showing an example of a corner point in the left image being tracked along its epiline to the corresponding point in the right image.
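The epipolar check can be scripted by building the fundamental matrix from the calibrated parameters and measuring each tracked point's distance to its epiline; the sketch below assumes undistorted pixel coordinates and uses illustrative names.

```python
import numpy as np

def fundamental_from_extrinsics(K1, K2, R, t):
    """F = K2^-T [t]_x R K1^-1 for a calibrated camera pair."""
    t = np.asarray(t, dtype=float).ravel()
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    return np.linalg.inv(K2).T @ tx @ R @ np.linalg.inv(K1)

def epipolar_distance(F, pt_left, pt_right):
    """Pixel distance of the right-image point from the epiline of the left-image point."""
    x1 = np.array([pt_left[0], pt_left[1], 1.0])
    x2 = np.array([pt_right[0], pt_right[1], 1.0])
    line = F @ x1                          # epiline in the right image: ax + by + c = 0
    return abs(x2 @ line) / np.hypot(line[0], line[1])
```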
Quantitative validation was performed using the pixel-reprojection error obtained at each calibration stage. Final intrinsic-parameter calibration results are documented in Table A4. A pixel-reprojection error of less than 1 pixel is generally desirable, although this can vary with image-sensor resolution. The average error after intrinsic-parameter calibration was less than 1 pixel for all cameras, and the overall average reprojection error calculated during the extrinsic-parameter bundle adjustment was 0.21 pixels. The depth accuracy of the calibrated multi-camera system was tested by measuring the capture-volume length, width, and height. These values were obtained by selecting corresponding points in each camera view, using markers placed on the ground or ceiling, and passing the point correspondences to the triangulation pipeline to determine the capture-volume dimensions.
The triangulation pipeline was written in Python 3.7 and used code from the AniposeLib GitHub repository as a structure for the triangulation procedure [49,50]. The triangulation procedure used for the Smart Hallway applied a RANSAC approach to remove outliers in the detected points. The selected points were then triangulated using a bundle-adjustment approach, where the final 3D keypoint was iteratively adjusted to reduce the overall reprojection error in each camera view.
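As a simplified sketch of this triangulation step (the actual pipeline follows aniposelib), a point can be triangulated by DLT from all camera views, a RANSAC-style search over view subsets can reject outliers, and a non-linear refinement can minimize the reprojection error. The threshold and the subset-search strategy below are illustrative assumptions, not the published implementation.

```python
import itertools
import numpy as np
from scipy.optimize import least_squares

def triangulate_dlt(proj_mats, pts_2d):
    """Linear (DLT) triangulation of one point seen in several cameras."""
    rows = []
    for P, (u, v) in zip(proj_mats, pts_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]

def reproj_residuals(X, proj_mats, pts_2d):
    """Reprojection residuals (pixels) of a 3D point in every camera view."""
    res = []
    for P, (u, v) in zip(proj_mats, pts_2d):
        x = P @ np.append(X, 1.0)
        res.extend([x[0] / x[2] - u, x[1] / x[2] - v])
    return np.asarray(res)

def triangulate_ransac(proj_mats, pts_2d, thresh_px=10.0):
    """Try camera subsets (largest first), keep the lowest-error estimate, then refine."""
    best_X, best_err = None, np.inf
    for k in range(len(proj_mats), 1, -1):
        for idx in itertools.combinations(range(len(proj_mats)), k):
            P_sub = [proj_mats[i] for i in idx]
            p_sub = [pts_2d[i] for i in idx]
            X = triangulate_dlt(P_sub, p_sub)
            err = np.abs(reproj_residuals(X, P_sub, p_sub)).mean()
            if err < best_err:
                best_X, best_err = X, err
        if best_err < thresh_px:
            break
    # Refine the point against all views; a fuller implementation would keep only inliers.
    refined = least_squares(reproj_residuals, best_X, args=(proj_mats, pts_2d))
    return refined.x, best_err
```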
After verifying the final calibration, the measured capture-volume length was 5.04 ± 0.015 m, the width was 2.42 ± 0.012 m, and the height was 3.02 ± 0.014 m.
Table A5. Body-segment lengths for participant two: walking-straight, walking-turn, and walker-curve conditions (mean and standard deviation in brackets). Smart Hallway (SH) is calculated using the 3D reconstructed data, and Delta is the difference between the Smart Hallway and ground-truth segment lengths.

Figure A10. Ensemble-averaged leg angles measured by the Smart Hallway towards the camera array (row (A)) and away from the camera array (row (B)). Comparator data [34] (row (C)) show a similar shape and range of motion. Grey dotted lines are the one-standard-deviation upper and lower bounds.