We aimed to validate the effectiveness of bullet-time video for surgical education by having participants perform simulated suturing after viewing either a bullet-time video or a normal instructional video. The type of surgery varies with site and purpose, but suturing is the universal finishing stage of open surgery.
4.3.1. Subgroups and Rationale
In order to maximize the accuracy of the results, we used a controlled-variable grouping approach, whereby the subjects were divided into two groups according to the ease of suturing [29] and their level of skill.
From a total of 20 participants, we selected four pairs with the most similar proficiency in the use of surgical instruments to serve as matched controls. In the experiment, the two members of each pair were shown different types of video (a general video and a bullet-time video) and were then asked to reproduce the operative process shown in the video as faithfully as possible, allowing us to compare the advantages of bullet-time video over traditional single-view video.
On the basis of the above description, we classified the subjects according to their skill and experience in using surgical instruments as follows:
Almost all of the medically trained participants could already perform a simple suture. To focus on the effect of bullet-time video for participants who already have basic skills to build on, we prepared two additional suture operations [30]:
Simple Suture;
Mattress Suture;
Figure-of-8 Suture.
Of these, the latter two require more steps and a higher degree of skill than the simple suture. The arrangement of participants is shown in Table 4 and Table 5.
4.3.3. Quantitative Results
We recorded the time taken by each subject to complete the suturing operation and used it as one indicator for a rough assessment of proficiency, as shown in Table 7.
Medical participants performed two separate suture operations, and the time the two groups took for the different sutures was compared. Viewing the bullet-time video produced a non-significant reduction in operation time, as shown in Table 8.
In minimally invasive procedures, the precision and effectiveness of surgical instrument trajectories constitute critical metrics for evaluating operator proficiency [31]. This study employs a trajectory-matching evaluation framework based on the procedural characteristics of the different suturing techniques. Specifically, temporal keyframes corresponding to critical operative phases were systematically identified, and quantitative comparison was implemented by calculating the three-dimensional spatial displacement between instrument tip positions in expert-derived Ground Truth trajectories and trainee-generated trajectories [32]. The Root Mean Square Error (RMSE) was used to measure positional deviations across corresponding keyframe intervals. This approach objectively quantifies operator–expert discrepancies in instrument control accuracy and motion economy, thereby establishing a standardized metric for surgical skill assessment.
First, we need to determine how to obtain corresponding points on the two trajectories. Both the reference (Ground Truth) point set and the corresponding trainee point set consist of the 3D coordinates of the tip of the key surgical instrument in keyframes manually extracted at specific critical operations, or aligned to the same time points by interpolation. If the timestamps of the two trajectories do not match, time alignment or interpolation must be performed first.
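Where interpolation is needed, a minimal sketch of this alignment step is given below, assuming the trajectories are stored as timestamp arrays and (M, 3) position arrays; the function and variable names are ours, and the actual pipeline may use a different interpolation scheme.

```python
import numpy as np

def align_trajectory(ref_t, traj_t, traj_xyz):
    """Resample a trajectory onto the reference keyframe timestamps.

    ref_t:    (N,) reference (Ground Truth) keyframe timestamps
    traj_t:   (M,) trainee trajectory timestamps (monotonically increasing)
    traj_xyz: (M, 3) trainee instrument-tip positions (x, y, z)
    Returns an (N, 3) array of positions linearly interpolated at ref_t.
    """
    # Interpolate each spatial axis independently onto the reference time base.
    return np.stack(
        [np.interp(ref_t, traj_t, traj_xyz[:, k]) for k in range(3)],
        axis=1,
    )
```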
Next, for each pair of corresponding points, we compute the Euclidean distance $d_i$ in 3D space:

$d_i = \sqrt{(x_i^{GT} - x_i)^2 + (y_i^{GT} - y_i)^2 + (z_i^{GT} - z_i)^2}$

Subsequently, we take the mean of the squared distances over all corresponding points:

$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} d_i^2$

where N represents the total number of corresponding key points in the trajectory. The actual value of the RMSE is then obtained by taking the square root:

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2}$
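A compact sketch of this RMSE computation, assuming the two trajectories have already been aligned into matched (N, 3) arrays of keyframe tip positions (the function name is ours):

```python
import numpy as np

def trajectory_rmse(gt_xyz, trainee_xyz):
    """RMSE between corresponding instrument-tip positions.

    gt_xyz, trainee_xyz: (N, 3) arrays of matched keyframe positions in mm.
    """
    # Per-keyframe Euclidean distances d_i in 3D space.
    d = np.linalg.norm(gt_xyz - trainee_xyz, axis=1)
    # Square root of the mean squared distance over the N corresponding points.
    return float(np.sqrt(np.mean(d ** 2)))
```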
As demonstrated in Figure 10a–c, the three-dimensional trajectory comparison among the Ground Truth (expert-derived), novice participants, and medical trainees during simple suturing reveals distinct operational patterns. Medical trainees with prior surgical experience exhibited significantly higher trajectory congruence with the Ground Truth (mean RMSE = 1.8 ± 0.3 mm) compared to novice participants (mean RMSE = 3.4 ± 0.5 mm, p < 0.01 via paired t-test). This inverse correlation between RMSE values and surgical experience aligns with the skill quantification framework, confirming that trajectory fidelity effectively reflects subjects’ baseline operative competence. Furthermore, the systematic deviation patterns observed in novice trajectories (e.g., excessive instrument rotation at needle entry points) provide actionable metrics for structured surgical training interventions.
As illustrated in Figure 10, the left panel depicts the 3D trajectory of the needle-holder tip during an expert-level simple suturing procedure. This trajectory was reconstructed from 80 keyframes predicted by our proposed processing pipeline, with chromatic progression (blue → purple) encoding the temporal dynamics of instrument movement. The central and right panels visualize trainee trajectories under the two contrasting instructional conditions. The corresponding RMSE calculation used these keyframes as the positional ground truth.
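For readers who want to reproduce this style of visualization, the sketch below plots a 3D tip trajectory with a colormap encoding normalized time; the synthetic trajectory and the colormap choice are placeholders rather than the actual code used to render Figure 10.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder trajectory: 80 keyframe tip positions (x, y, z) in mm.
t = np.linspace(0.0, 1.0, 80)
xyz = np.stack([np.cos(4 * np.pi * t), np.sin(4 * np.pi * t), 10.0 * t], axis=1)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Color each keyframe by its normalized time to encode temporal progression.
sc = ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], c=t, cmap="cool", s=12)
ax.plot(xyz[:, 0], xyz[:, 1], xyz[:, 2], color="gray", linewidth=0.5)
fig.colorbar(sc, ax=ax, label="normalized time")
ax.set_xlabel("x (mm)")
ax.set_ylabel("y (mm)")
ax.set_zlabel("z (mm)")
plt.show()
```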
A chromatic error mapping schema was implemented to enhance interpretability (a minimal thresholding sketch follows the list):
Red: Critical deviations (RMSE > 5 mm);
Yellow: Moderate errors (3 mm ≤ RMSE ≤ 5 mm);
Green: Expert-like precision (RMSE < 3 mm).
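As an illustration of how this schema can be applied per keyframe, the snippet below maps an RMSE value to the three color bands; the function name and its use are ours, not part of the paper's pipeline.

```python
def rmse_color(rmse_mm: float) -> str:
    """Map a per-keyframe RMSE value (in mm) to the report's color bands."""
    if rmse_mm > 5.0:
        return "red"      # critical deviation
    if rmse_mm >= 3.0:
        return "yellow"   # moderate error
    return "green"        # expert-like precision
```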
The trajectory output of the bullet-time group was better than that of the control group, which watched only the normal video: the experimental group produced a total of 3 Critical deviations, whereas the control group produced 17. It is worth noting that, in order to minimize individual differences, we selected these two participants because they performed almost identically after watching the first normal video (25 occurrences of Critical deviations in the same step).
When we extended this rating method to the full dataset, we obtained the data shown in Table 9.
As shown in Figure 11, both metrics consistently favored the bullet-time modality, with Expert-like scores approaching statistical significance (p < 0.07, t(3) = 2.002) and Critical deviations showing a consistent directional trend (t(3) = 1.248). The large effect size for Expert-like scores (d = 1.00) and the medium effect for Critical deviations (d = 0.62) indicate that bullet-time video provides measurable performance advantages.
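For reference, the paired comparison and effect size reported here can be computed as in the sketch below; the per-pair scores are placeholders, and d is computed as the paired-samples effect size (mean difference over the standard deviation of the differences), which is one common convention and may differ from the paper's exact definition.

```python
import numpy as np
from scipy import stats

# Placeholder paired scores for the four matched pairs (bullet-time vs. normal video);
# the actual per-pair values come from the study's data.
bullet = np.array([0.82, 0.75, 0.88, 0.79])
normal = np.array([0.70, 0.77, 0.72, 0.68])

t_stat, p_value = stats.ttest_rel(bullet, normal)  # paired t-test, df = n - 1 = 3
diff = bullet - normal
cohens_d = diff.mean() / diff.std(ddof=1)          # paired-samples Cohen's d (d_z)
print(t_stat, p_value, cohens_d)
```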
With the exception of subject B, who instead showed an increase in the rate of critical errors after watching the bullet-time video, all groups showed an improvement in educational outcomes compared to the regular-video control group.
We also calculated, for each of the eight groups of subjects participating in the trial, the mean score (transformed from the RMSE calculations), the time consumed, and the state of the suture wound at completion, as shown in Figure 12.
In addition, we found that the expert score and the trajectory fit were consistent in direction, as shown in Figure 13. This also confirms that our trajectory-fit metric can explain the educational outcomes, and it indicates that the bullet-time teaching method helps novice participants master basic instrument control and helps medical participants with higher-level procedures.
Because RMSE decreases as performance improves, we introduced a derived value, RMSE_ref, that expresses the inverse of the RMSE trend, making the results more intuitive (higher is better).
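The paper does not state the exact transform, so the snippet below shows one plausible, purely hypothetical way to rescale RMSE into a higher-is-better RMSE_ref score; it is an illustration under our own assumption, not the authors' formula.

```python
import numpy as np

def rmse_ref(rmse_values):
    """Hypothetical higher-is-better rescaling of RMSE (not the paper's exact formula).

    Maps the worst RMSE in the batch to 0 and the best to 1.
    """
    r = np.asarray(rmse_values, dtype=float)
    return (r.max() - r) / (r.max() - r.min())
```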
A strong positive correlation was observed between expert ratings and automated RMSE_ref scores (r = 0.780, p = 0.022, 95% CI [0.18, 0.95]), indicating significant alignment in their evaluation trends. In 5 of 8 cases (62.5%), the expert rating exceeded the calculated RMSE_ref. The Wilcoxon signed-rank test showed a consistent directional advantage for the system (T+ = 16.5, Z = −1.26, p = 0.207), with a moderate effect size (r = −0.44). These results indicate that the trends of the two assessments are highly consistent and that, apart from a few extreme samples, the system rating (RMSE_ref) can be used as a basis for judging completion quality.
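These agreement statistics can be reproduced with standard library routines, as sketched below; the paired scores are placeholders for the study's eight groups.

```python
import numpy as np
from scipy import stats

# Placeholder paired scores for the eight groups (expert rating vs. RMSE_ref).
expert   = np.array([0.81, 0.64, 0.90, 0.55, 0.73, 0.68, 0.86, 0.60])
rmse_ref = np.array([0.78, 0.60, 0.85, 0.58, 0.70, 0.71, 0.80, 0.52])

r, p_corr = stats.pearsonr(expert, rmse_ref)            # agreement of evaluation trends
w_stat, p_wilcoxon = stats.wilcoxon(expert, rmse_ref)   # paired, non-parametric comparison
print(r, p_corr, w_stat, p_wilcoxon)
```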