Multi-View Surgical Camera Calibration with None-Feature-Rich Video Frames: Toward 3D Surgery Playback

: Mounting multi-view cameras within a surgical light is a practical choice since some cameras are expected to observe surgery with few occlusions. Such multi-view videos must be reassembled for easy reference. A typical way is to reconstruct the surgery in 3D. However, the geometrical relationship among cameras is changed because each camera independently moves every time the lighting is reconﬁgured (i.e., every time surgeons touch the surgical light). Moreover, feature matching between surgical images is potentially challenging because of missing rich features. To address the challenge, we propose a feature-matching strategy that enables robust calibration of the multi-view camera system by collecting a set of a small number of matches over time while the cameras stay stationary. Our approach would enable conversion from multi-view videos to a 3D video. However, surgical videos are long and, thus, the cost of the conversion rapidly grows. Therefore, we implement a video player where only selected frames are converted to minimize time and data until playbacks. We demonstrate that sufﬁcient calibration quality with real surgical videos can lead to a promising 3D mesh and a recently emerged 3D multi-layer representation. We reviewed comments from surgeons to discuss the differences between those 3D representations on an autostereoscopic display with respect to medical usage.


Introduction 1.Motivation
Recording surgeries has a potential demand for educational and archiving purposes.Deeper analyses with video examples encourage better understanding and avoid future mistakes.Affordable, small form-factor cameras make recordings more casual and multiview.Aiming for solid capturing of patients under surgery, mounting multi-view cameras within a surgical light (aka.shadowless lamp, hereafter referred to as a multi-camera shadowless lamp: McSL) is a reasonable choice.A shadowless lamp has multiple light sources verged at a point.At least one of the light sources lights an area; hence, fewer shadows appear, even with interferences by surgeons' heads, hands, and tools.The analogy applies to viewpoints.At least one of the cameras prevents uncollected data capture.
Once the surgery is recorded with a McSL, separated videos must be reassembled into an easy-to-refer-to format.Shimizu et al. proposed a video analysis approach that can select the most non-occluded view over frames [1].Reconstructing the surgery in 3D would be another option, given that a 3D scene allows viewers to navigate the scene, avoiding occlusions.However, one McSL video tends to be long (in several tens of minutes to hours), making frame-to-3D model conversion time consuming.Moreover, a major challenge lies in frame alignment because of two reasons.First, the geometrical relationship among the cameras is changed because each camera independently moves every time when the lighting is reconfigured (i.e., every time surgeons touch the surgical light).Second, surgery videos may contain highly reflective tissues and tools, uniform blue covers, and significantly occluding substances.Therefore, finding feature correspondences over cameras becomes difficult and 3D point localization becomes fragile.Manual calibration is always accessible [1], but the video post-processing becomes further cumbersome.
We address the above issues with a video player that allows a user to access a McSL video hierarchically.First, the user highlights important frames in a McSL video and selects shorter videos around the frames.Then, our system performs feature matching and frame alignment algorithms that utilize the shorter video frames to find feature correspondences robustly.Therefore, only the highlighted frames are converted on-demand into a 3D mesh or multi-layer scene representation, which renders the frame in 3D and thus in an autostereoscopic display.

Background and Related Work
Here, we provide an overview of surgical video recording and 3D view synthesis approaches and the challenges in their application to non-feature-rich frames of surgeries.
Surgical Video Recording.Recording surgery is needed for various purposes, such as preserving case studies, educating trainees, and passing on skills [1][2][3][4].Contrary to laparoscopic surgery [5,6], recording in open surgeries, in which physicians directly view the affected areas, is challenging because of possible interference with the surgery, spatial limitations, and occlusions by surgeons.
Displaying non-aligned multi-view images makes the analysis difficult as it requires mental image warping.Mathematically aligning such images requires calibrating the cameras.However, objects in surgical videos are often featureless, which makes calibration challenging.Calibration before video recording is useless as cameras in McSL move when surgeons manipulate it.To address these issues, we implement a feature-matching scheme that utilizes multiple frames over time to collect sufficient feature points for calibration.
3D View Synthesis.To seek system requirements, we reviewed comments from two surgeons who have experience using a McSL recording system at Keio University School of Medicine.One is a Prof.Dr. of the Department of Plastic and Reconstructive Surgery and the other is a Dr. of the Department of Anatomy.The surgeons requested to see and compare several frames in 3D each at the beginning and end of the surgery, especially for plastic surgery.Therefore, we implement a video player that allows medical doctors to seek and select frames so that the system can instantly reconstruct the view in 3D at the selected frames only.From this background, we did not use view synthesis approaches that require per-frame extensive training [12,13].
Another approach is to convert all frames into a 3D video, such as neural videos [14-17] and multi-layer mesh and texture atlas videos [18,19].However, neural videos are typically in low resolution and are computationally demanding.A multi-layer mesh provides faster rendering at higher resolutions, although conversion is nonetheless computationally expensive [18].We instead provide on-demand 3D views only at selected frames.Disocclusion Rendering.Real-time disocclusion rendering is often referred to as diminished reality [20].Multi-view approaches bring observed background pixels from cameras to the current view to disocclude foreground objects [21][22][23].These approaches either rely on plane geometry proxy [21] or range sensors [22,23] for millisecond-order rendering, which prevents us from applying them to our application.Another approach relies on image inpainting, in which the recovered area is completely synthetic (i.e., pixels are hallucinated by a collection of pixels within the field of view) [24][25][26].Therefore, inpainting approaches are not suitable for our application.
Rather than relying on the above approaches, we utilize a recently developed offline multi-view approach [27] and soft 3D representation [28,29] (i.e., multi-layer scene representation) that is known to be more robust than explicit 3D representations.Although 3D rendering allows users to peep at occluded backgrounds, we further extend the soft 3D representation for disocclusion rendering.

Contributions
In summary, we contribute to medical imaging and display in the following ways:

•
We propose a video player that allows the user to selectively convert a portion of a long-shot McSL video into free viewpoint images to minimize time and data until playbacks.

•
We propose a multi-frame multi-view feature matching strategy to estimate intrinsic and extrinsic camera parameters from a set of McSL video frames.This process is always required after the independent movement of internal cameras (e.g., when surgeons touch the surgical light during the operation).We also analyze the number of frames to achieve stable calibration results using real surgical videos.

•
With the robustly estimated camera parameters, we demonstrate a 3D mesh and a recently emerged 3D multi-layer reconstruction.The latter enables disocclusion rendering to remove foregrounds in the generated 3D scene representation for better surgical field visibility.

•
We reviewed comments from surgeons to discuss the differences between those 3D representations with respect to medical usage.

Materials and Methods
The proposed workflow consists of the following four steps (Figure 1).The view synthesis is available only after a successful calibration, which is challenging with non-feature-rich multi-view images in surgical videos.We, therefore, implement a calibration strategy that utilizes multiple frames over time.
by a collection of pixels within the field of view) [? ??].Therefore, inpainting approaches are not suitable for our application.Rather than relying on the above approaches, we utilize a recently developed offline multi-view approach [? ] and soft 3D representation [? ?] (i.e., multi-layer scene representation) that is known to be more robust than explicit 3D representations.Although 3D rendering allows users to peep at occluded backgrounds, we further extend the soft 3D representation for disocclusion rendering.

Contributions
In summary, we contribute to medical imaging and display in the following ways:

•
We propose a video player that allows the user to selectively convert a portion of a long-shot McSL video into free viewpoint images to minimize time and data until playbacks.The view synthesis is available only after a successful calibration, which is challenging with non-feature-rich multi-view images in surgical videos.We, therefore, implement a calibration strategy that utilizes multiple frames over time. 1.
On-site McSL video recording: We record a surgery case with a McSL (Figure 1a).At this point, no additional effort is required (e.g., no calibration pattern recording is necessary).

2.
Playback with our McSL video player: The user (e.g., medical trainees) plays the video and selects a frame to be reconstructed in 3D (Figure 1b, Section 2.3).

3.
McSL calibration: Upon frame selection, we run our calibration algorithm to calculate the cameras' intrinsic and extrinsic parameters (Figure 1c, Section 2.1).

4.
View synthesis: The 3D frame data are generated and saved (Section 2.2), and the selected frame is highlighted in the player.Therefore, the user can switch between the 2D and 3D viewers.We provide ways to avoid occlusions in surgical fields.Given that the frame is in 3D representation, we can display the frame on an autostereoscopic display (Figure 1d).
Our McSL consists of the following components.

Calibration with a McSL Video
The relative poses of multi-cameras in McSL change every time a surgeon manipulates the McSL position.This mechanism prevents photographing a calibration target and calibrating the cameras before the surgery.Instead, we propose to use surgical video frames for calibration.However, surgical video frames are not feature-rich (see an example in Figure 1a).Therefore, to collect more features, we utilize feature points and their matching results that appear on every available time frame.
The key idea is to treat the scene geometry that changes over time as a whole structure observed at a time since such structure changes are irrelevant from the inter-camera poses.Assuming that M consecutive frames are recorded under a fixed McSL location, we randomly select N (≤M) multi-camera frames.At a time frame, we extract feature points [30] at every camera image and run matching [31] between McSL cameras.We obtain N such sets of inter-camera feature correspondences and use all correspondences as if all features were detected at a time frame (Figure 2).Finally, we perform structure from motion (SfM), followed by bundle adjustment, for which we rely on the COLMAP framework [32,33].

Three-Dimensional Frame Generation
Users can choose to generate either a 3D mesh or a recently emerged multi-plane image (MPI) for an in-depth investigation of the surgery.We discuss the advantages and disadvantages of those different types of 3D frames through a comparison on an autostereoscopic display (see Section 3).
Mesh Generation.We create a 3D mesh from a selected multi-view frame.Given the intrinsic and extrinsic parameters from our calibration procedure, a 3D mesh is constructed from a 3D point cloud from COLMAP using a multi-view stereo library, OpenMVS [27].

MPI Generation and Disocclusion
Rendering.Given the same multi-camera data for the mesh generation, we can generate an MPI using local light field fusion (LLFF) [29].LLFF is a 3D convolutional neural network that can infer multi-layer scene representation (aka.MPI) [34] or soft 3D representation [28] from calibrated multi-view images.MPI consists of multiple RGB+α layers spaced along the camera forward direction in inverse depth for efficiency.LLFF blends multi-view MPI depending on per-camera weights for quality: where w C i represents a blending weight (scalar) of i −th rendered MPI (among MPIs of individual McSL cameras).K means the total number of cameras (i.e., K = 5 in our case of McSL.See Figure 1a).We calculate Euclid distances between the rendered view and the five McSL cameras so that we blend closer cameras with more weights and vice versa.c i is a pixel color (RGB values) of the i th rendered MPI and α i is the corresponding opacity.t is a small value to avoid zero division.
We apply the LLFF inference to a selected McSL frame to render the frame in 3D.To avoid occlusion, users can control the viewpoint until the rendered view disoccludes the obstacle.Furthermore, given a camera occlusion metric, we can down-weight cameras under occlusion: where w O i is a per-camera value that suggests disocclusion significance in the camera view (e.g., the camera score [1]).

McSL Video Player
Our video player allows the user to switch between the 2D video and 3D scene viewers (Figure 1b,d).The 2D video player performs similarly to a conventional video player of a selected camera view or arranged multi-camera views in a grid.Upon a request from the user (a button click), MPI or 3D mesh at the time frame is generated in the background.The user can choose which 3D representation to generate.Once the 3D representation generation is completed, a highlight appears over the sequence bar.Clicking the highlight toggles the 3D scene viewer.The 3D scene is displayed in an autostereoscopic display for a glasses-free experience.Users can adjust the position, orientation, and scale of the 3D representation with mouse dragging and scrolling.
MPI takes several seconds for generation (with our setup using NVIDIA GeForce RTX 3080 GPU, MPI generation for five viewpoints takes approximately 20 s) and increases the data amount, per camera, by a factor of L (=32 by default) MPI layers, each of which adds an additional α channel.The 3D mesh likewise requires an amount of time and data to generate.As such, we do not generate 3D representation at every frame, minimizing the time and storage.

Results
We evaluate our calibration system in two experiments using real surgical videos.We first present a quantitative evaluation of the McSL camera calibration (Section 2.1).Then, we demonstrate disocclusion view synthesis using MPI (Section 2.2).
Note that other 3D surgical video reconstruction works than this study are for laparoscopic surgery and these approaches assume a single-view moving video input with fewer significant occlusions in the input [5,6].In our McSL setups, one or a few cameras are significantly or entirely occluded.Namely, [5,6] and we explore distinct problems and, thus, direct comparison is difficult.

McSL Calibration Accuracy
We prepared McSL videos with five cameras of three different types of surgeries: polysyndactyly surgery, cleft lip surgery, and accessory auricle surgery (Figure 3).To investigate the impact of frame counts used in calibration, we randomly selected N = 1, 10, 20, . .., 100 frames 10 times in each video and performed the calibration.We calculated the mean and standard deviation values of reprojection errors for each number of frames and the success ratios.For the success ratios, we calculated the number of fails of COLMAP bundle adjustment over all trials.

Results
We evaluate our calibration system in two experiments using real surgical videos.We first present a quantitative evaluation of the McSL camera calibration (Section 2.1).Then, we demonstrate disocclusion view synthesis using MPI (Section 2.2).
Other 3D surgical video reconstruction works except this study are for laparoscopic surgery, and these approaches assume a single-view moving video input with less significant occlusions in the input [? ?].In our McSL setups, one or a few cameras are significantly or entirely occluded (See typical examples in Figure 7a, b).Namely, [? ?] and we explore distinct problems, and thus, direct comparison is difficult.
In 3D reconstruction, in general, COLMAP [? ?] is the de facto standard and state-ofthe-art method.Although the approach is extended with a graph-based neural network approach (i.e., SuperPoint [? ] and SuperGlue [? ]) instead of SIFT features and heuristic matching, the result is summarized as 1-frame (the leftmost result) in Figure 4 in our experiment.

McSL Calibration Accuracy
We prepared McSL videos with five cameras of three different types of surgeries: polysyndactyly surgery, cleft lip surgery, and accessory auricle surgery (Figure 3).To investigate the impact of frame counts used in calibration, we randomly selected N = 1, 10, 20, ..., 100 frames 10 times in each video and performed the calibration.We calculated the mean and standard deviation values of reprojection errors for each number of frames and the success ratios.For the success ratios, we calculated the number of fails of COLMAP bundle adjustment over all trials.
Figure 4 summarizes the results.A 1-frame input is not enough for the calibration because of the lack of features in the frames.Increasing the number of inputs increases the success ratio.However, no improvement is observed over the 20-frame input.The use of more than 70 frames shows a slight decrease in success ratio, perhaps due to the increased interference of erratic feature matching.The reprojection errors stay very similar regardless of the number of images.
Figure 5 presents a qualitative result of our calibration method with the 20-frame input.The red cones represent the camera view frustum, and their heights indicate the focal lengths.The direction and position of a cone represent the extrinsic parameters.The points below the red cones are a 3D point cloud reconstruction.These results show our method provides qualitatively reasonable performance.

3D Mesh Rendering
We show the 3D mesh generation results.As in Section 3.3, we used 20 randomly   A 1-frame input is apparently not enough for calibrating the system.Almost identical mean errors and standard deviations are observed.However, the success ratio reaches 1.0 in a 20-frame input.Notably, with more than 70 frames, the success ratio is slightly lower, which suggests that including a significantly larger number of images would introduce erratic and less beneficial images for calibration.A 1-frame input is apparently not enough for calibrating the system.Almost identical mean errors and standard deviations are observed.However, the success ratio reaches 1.0 in the 20-frame input.Notably, with more than 70 frames, the success ratio is slightly lower, which suggests that including a significantly larger number of images would introduce erratic and less beneficial images for calibration.
In 3D reconstruction, in general, COLMAP [32,33] is the de facto standard and stateof-the-art method.Although the approach is extended with a graph-based neural network approach (i.e., SuperPoint [30] and SuperGlue [31]) instead of SIFT features [35] and heuristic matching, the result is summarized as 1-frame (the leftmost result) in Figure 4 in our experiment.

Three-Dimensional Mesh Rendering
We show 3D mesh generation results.From the results in Section 3.1, we used 20 randomly selected frames for calibration in all results.Figure 6 (2nd row) shows the 3D meshes at the beginning and ending in the same surgical video and frames to compare the differences.

Occlusion-Free View Synthesis Using MPI
We show view synthesis results.As in Section 3.2, we used 20 randomly selected frames for the calibration.
Figure 6 (3rd row) shows the free-viewpoint images at the beginning and ending of a polysyndactyly surgery to compare the differences.We emphasize that such a comparison in 3D rendering is achieved because of our successful calibration.
Figure 7 shows the free-viewpoint images generated with two different disocclusion rendering strategies, as discussed in Section 2.2.Navigating the viewpoint to where no occlusion occurs simply clarifies the view (Figure 7a).With our disocclusion rendering (Equation ( 2)), the occlusion is virtually diminished at any viewpoint.We prioritized the second MPI by setting weights manually depending on the interruption degrees of the surgeon's head to demonstrate our disocclusion rendering conceptually (Figure 7b).

Experts' Comments
Using the camera's intrinsic and extrinsic parameters calculated by the calibration method in this paper, we performed 3D scene recovery in two ways: 3D mesh and MPI.Our main contributions are the pipeline and calibration system (the first and second item in Section 1.3) that enable 3D rendering (the third item in Section 1.3) in the medical imaging context.As such, we consider that asking medical doctors to provide their comments is the best way to investigate how well and in what way the generated 3D rendering performs in medical usage.We asked two medical doctors from Keio University School of Medicine to observe and provide comments on each rendering result displayed on the spatial reality display (SRD) from Sony Corporation [36], an autostereoscopic display that reproduces 3D spatial images using real-time gaze recognition and dedicated image generation algorithms.One of the medical doctors is a Prof.Dr. of the Department of Plastic and Reconstructive Surgery and the other is a Dr. of the Department of Anatomy.
Figure 8 shows the 3D mesh models and MPIs on SRD shown to the doctors.The following summarizes the positives (P) and negatives (N) from the medical doctors. in McSL videos are obvious and easy to detect visually, automation of the process is a reasonable future extension.In the previous approach [? ], experts need to align images from McSL by hand, although the resultant alignment is image-to-image registration in 2D.Once a frame chunk is specified, our approach can automate the process, and the resultant alignment is 3D registration and 3D view synthesis.
Blurry artifacts.We observe blurry artifacts in MPI rendering results, especially when there are significant occlusions.The LLFF network, which we rely on for MPI generation, implicitly finds color matches between images.Therefore, ambiguous mismatches introduced by occlusion pixels can lead to blurred pixels.Denser camera arrangements can provide appropriate plenoptic sampling [? ].However, further quality improvements in MPI inference using non-feature-rich images exceeds our focus and remains for future work.

Scale ambiguity.
As pointed out by the medical doctors, scale ambiguity needs to be manually resolved in our system.Camera baselines or scene objects with a known scale can potentially resolve this issue.Whereas the former varies depending on McSL video frames, the latter can be estimated from known faces, hands, and tools in the field of view.

MPI.
• (P1) Feeling of being there, however, worse than 3D mesh; Although MPI enables new disocclusion rendering, as demonstrated in Section 3.3, there are clear limitations.MPI has more disadvantages and one of the main reasons would be wider baselines between the cameras in McSL.One medical doctor commented that "it is difficult to list advantages from a clinical point of view" due to the limitations.Nonetheless, we should note that MPI can be extended to videos rather more straightforwardly than the 3D mesh [18].As a common issue, the medical doctors must have adjusted the scale manually for individual scenes, given that the scene scale is unknown and thus different in each scene.

Discussion
Our approach successfully estimates the intrinsic and extrinsic parameters of multiview cameras in McSL using non-feature-rich multi-view image sequences.The generated multi-layer scene enables disoccluding visual disturbances, such as a surgeon's head, for better surgical field visibility.However, we found several limitations to be addressed that indicate future research directions.
Shorter video extraction.The current calibration procedure assumes that the user extracts a shorter video, in which McSL does not move, from a long video.Although such changes in McSL videos are obvious and easy to detect visually, automation of the process is a reasonable future extension.In the previous approach [1], experts need to align images from McSL by hand, although the resultant alignment is image-to-image registration in 2D.Once a frame chunk is specified, our approach can automate the process, and the resultant alignment is 3D registration and 3D view synthesis.
Blurry artifacts.We observe blurry artifacts in MPI rendering results, especially when there are significant occlusions.The LLFF network, which we rely on for MPI generation, implicitly finds color matches between images.Therefore, ambiguous matches introduced by occlusion pixels can lead to blurred pixels.Denser camera arrangements can provide appropriate plenoptic sampling [29].However, further quality improvements in MPI inference using non-feature-rich images exceeds our focus and remains for future work.
Scale ambiguity.As pointed out by the medical doctors, scale ambiguity needs to be manually corrected in our system.Camera baselines or scene objects with a known scale can potentially resolve this issue.Whereas the former varies depending on McSL video frames, the latter can be estimated from known faces, hands, and tools in the field of view.

Conclusions
This paper presents a system to calibrate McSL cameras using non-feature-rich multiview frames and recover a 3D scene representation.Given the long-shot surgical videos from McSL, the system provides a way to seek video frames where the user wishes to recover the 3D structure.Given that McSL camera configurations can change every time surgeons move McSL, calibration must be performed with a selected shorter clip, in which McSL is static.Within the shorter clip, our calibration algorithm randomly selects multiple frames to robustly estimate the intrinsic and extrinsic parameters.Owing to the robust calibration, the system can generate a 3D mesh or a multi-layer scene representation to provide 3D frames.The evaluation results using real surgical videos revealed the advantages and the reasonable number of frames for the calibration, and the future challenges in automation and rendering quality improvements.

Figure 1 .
Figure 1.Pipeline of our system.Our system consists of four steps, including (a) on-site McSL video recording, (b) playback with our McSL video player, (c) McSL calibration, and (d) view synthesis on an autostereoscopic display.The view synthesis is available only after a successful calibration, which is challenging with non-feature-rich multi-view images in surgical videos.We, therefore, implement a calibration strategy that utilizes multiple frames over time.

Figure 1 .
Figure 1.Pipeline of our system.Our system consists of four steps, including (a) on-site McSL video recording, (b) playback with our McSL video player, (c) McSL calibration, and (d) view synthesis on an autostereoscopic display.The view synthesis is available only after a successful calibration, which is challenging with non-feature-rich multi-view images in surgical videos.We, therefore, implement a calibration strategy that utilizes multiple frames over time.

Figure 2 .
Figure 2. Our multi-frame multi-view feature matching strategy.We randomly select N (≤M) frames from M consecutive frames with a fixed McSL location.We then extract feature points from the camera images in each frame and run matching between the McSL cameras.The resulting correspondences between all N sets of inter-camera features are used to determine the geometric relationship between the cameras.

Figure 3 .
Figure 3. Examples of none-feature-rich frames.Each shows one of five view images in McSL in three different surgeries: Polysyndactyly, cleft lip, and accessory auricle surgery (from left to right).

Figure 3 .
Figure 3. Examples of none-feature-rich frames.Each shows one of five view images in McSL in three different surgeries: polysyndactyly, cleft lip, and accessory auricle surgery (from left to right).

Figure 4
Figure 4 summarizes the results.A one-frame input is not enough for the calibration because of the lack of features in the frames.Increasing the number of inputs increases the success ratio.However, no improvement is observed over the 20-frame input.The use of more than 70 frames shows a slight decrease in success ratio, perhaps due to the increased interference of erratic feature matching.The reprojection errors stay very similar regardless of the number of images.2023, 1, 0 7 of 11

Figure 4 .
Figure 4. Reprojection errors and success ratios over various numbers of images for McSL calibration.A 1-frame input is apparently not enough for calibrating the system.Almost identical mean errors and standard deviations are observed.However, the success ratio reaches 1.0 in a 20-frame input.Notably, with more than 70 frames, the success ratio is slightly lower, which suggests that including a significantly larger number of images would introduce erratic and less beneficial images for calibration.

Figure 4 .
Figure 4. Reprojection errors and success ratios over various numbers of images for McSL calibration.A 1-frame input is apparently not enough for calibrating the system.Almost identical mean errors and standard deviations are observed.However, the success ratio reaches 1.0 in the 20-frame input.Notably, with more than 70 frames, the success ratio is slightly lower, which suggests that including a significantly larger number of images would introduce erratic and less beneficial images for calibration.

Figure 5
Figure 5 presents a qualitative result of our calibration method with a 20-frame input.The red cones represent the camera view frustum and their heights indicate the focal lengths.The camera views are posed with extrinsic parameters.The colored points represent a 3D point cloud reconstruction.These results show our method provides qualitatively reasonable performance.

Figure 5 .
Figure 5. Qualitative results of our calibration method with a 20-frame input.The red cones represent the camera view frustum and their heights indicate the focal lengths.The cone directions and positions represent the extrinsic parameters.The colored points are a 3D point cloud.The results are qualitatively convincing, compared to the camera poses attached to the shadowless lamp (Figure 1a).

Figure 6 .
Figure 6.Comparing the (a) beginning and (b) ending of a polysyndactyly surgery in 3D mesh and MPI at the selected McSL frames.Notice the difference in the suture of the little toe.

Figure 6 .
Figure 6.Comparing (a) beginning and (b) ending of a polysyndactyly surgery in 3D mesh and MPI at the selected McSL frames.Notice the difference in the suture of the little toe.

( a )Figure 6 .Figure 7 .
Figure 6.Comparing the (a) beginning and (b) ending of a polysyndactyly surgery in 3D mesh and MPI at the selected McSL frames.Notice the difference in the suture of the little toe.

Figure 7 .
Figure 7. Occlusion-free rendering using two techniques.These frames contain a viewpoint where the operation field is occluded by the surgeon's head (first row).Our free-viewpoint rendering techniques (a) by changing the viewpoint in 3D or (b) by disocclusion rendering (Equation (2)) can reveal the occluded area.

Figure 8 .
Figure 8. 3D mesh and MPI on Sony SRD.We showed polysyndactyly, cleft lip, and accessory auricle surgery images to medical doctors to review their comments.

Figure 8 .
Figure 8. 3D mesh and MPI on Sony SRD.We showed polysyndactyly, cleft lip, and accessory auricle surgery images to medical doctors to review their comments.
Very close impression to what we see in actual surgeries regarding appearance and shape; • (N1) The blind spots on the sides are sharply vertical, unlike the actual shape; • (N2) The scale should be adjusted.