Three-Dimensional Stitching of Binocular Endoscopic Images Based on Feature Points

There are shortcomings of binocular endoscope three-dimensional (3D) reconstruction in conventional algorithms, such as low accuracy, a small field of view, and loss of scale information. To address these problems, aiming at the specific scene of stomach organs, a method of 3D endoscopic image stitching based on feature points is proposed. The left and right images are acquired by moving the endoscope and converted into point clouds by binocular matching. They are then preprocessed to compensate for the errors caused by scene characteristics such as uneven illumination and weak texture. The camera pose changes are estimated by detecting and matching the feature points of adjacent left images. Finally, based on the calculated transformation matrix, point cloud registration is carried out by the iterative closest point (ICP) algorithm, and the dense 3D reconstruction of the whole gastric organ is realized. The results show that the root mean square error is 2.07 mm, and the endoscopic field of view is expanded by 2.20 times, increasing the observation range. Compared with conventional methods, it not only preserves the organ scale information but also makes the scene much denser, which is convenient for doctors to measure target areas, such as lesions, in 3D. These improvements will help improve the accuracy and efficiency of diagnosis.


Introduction
Endoscopy has advantages such as high resolution and minimal trauma and is used in a wide range of applications in clinical diagnosis and treatment. Compared with traditional endoscopes, binocular endoscopes have three-dimensional (3D) imaging functions, which can provide surgical depth information. This helps doctors operate the endoscope accurately, efficiently, and safely, and facilitates the 3D reconstruction of human organs [1,2]. The 3D reconstruction of organs, especially real-time dense 3D reconstruction, makes the picture more consistent with the real scene. It helps doctors accurately judge important anatomical structures and their spatial locations, significantly improves the speed and safety of surgery, and reduces the pain of patients, which is of great significance in clinical surgery [3,4].
Currently, the 3D reconstruction of binocular endoscopes mainly relies on binocular matching algorithms [5]. The basic principle is to imitate the binocular stereo vision of the human eyes, obtain the parallax value of an object point between the left and right views by feature matching, and finally build the model by combining the parallax with the binocular camera parameters. For example, Zhou et al. [6] extracted the blood vessels of fundus images based on binocular vision for stereo matching and realized the 3D reconstruction of retinal blood vessel images. However, such methods are all based on single-view 3D reconstruction, and the resulting limitations of the field of view affect the accuracy and safety of doctors' diagnoses and surgical operations.
The 3D reconstruction of a complete scene mainly relies on 3D stitching technology; common methods include multi-view geometry and point cloud registration [7,8]. At present, many research teams have developed 3D reconstruction systems for endoscopic images based on multi-view geometry. For example, Mahmoud et al. [9] used the simultaneous localization and mapping (SLAM) method to map the abdominal cavity dynamically from monocular images. Widya et al. [10] used structure from motion (SFM) to reconstruct gastric organs from a gastroscopy video. This type of method generally yields sparse reconstructions, with low accuracy and loss of scale information, which is not conducive to the later observation and measurement of the reconstruction results. Moreover, the method requires observation from different viewing angles with a moving camera, which is difficult to achieve in a narrow space such as a human organ. In terms of point cloud registration, random sample consensus (RANSAC) is a matching algorithm based on the extraction of 3D features of the point cloud [11]. Under normal conditions, the surface of organs such as the gastrointestinal tract is relatively smooth, and their texture features are insufficient for feature matching by RANSAC. The iterative closest point (ICP) algorithm, the most widely used point cloud registration method, obtains the optimal transformation matrix through repeated searching. Although it is easy to understand and yields desirable results, it depends heavily on the initial matrix, which means it not only risks falling into a locally optimal solution but also requires substantial computing resources [12].
In this study, we propose the 3D stitching method based on feature points. Through the detection and matching of the feature points of the endoscopic image and the registration of multiple point clouds, a complete dense 3D reconstruction of the stomach model is realized, which expands the observation range of the doctors and assists in the operation. At the same time, it can lay the foundation for later surgical navigation.

Methods
The specific process of the proposed feature-point-based 3D stitching method for endoscopic images is demonstrated in Figure 1. First, the binocular endoscope is operated to obtain the left and right image sequences of the stomach model at different sites. Next, the semi-global block matching algorithm (SGBM) is applied to generate the disparity maps, from which the point cloud at each site can be obtained. The point clouds are then preprocessed for outlier culling and down-sampling. At the same time, feature extraction and matching are performed on the adjacent left image sequences; the offsets of the key points along the X-axis and Y-axis are obtained, from which the corresponding initial matrix is calculated. Finally, the point clouds are registered and stitched through the improved ICP algorithm to achieve the 3D reconstruction of the entire stomach organ.

Binocular Endoscope Calibration
The internal and external parameters of the binocular camera were calculated by using Zhang Zhengyou's checkerboard calibration method [13]. The black and white checkerboard used as the calibration board was shot in different poses within the working distance of the binocular endoscope, and the OpenCV library was used to calibrate the captured binocular images.

After the binocular calibration, 3D information can be calculated from two-dimensional images based on binocular stereo vision and the triangulation method. The principle of binocular stereo is shown in Figure 2. OL and OR are the optical centers of the left and right cameras, located on the same horizontal line; their distance is the binocular baseline length b. The world coordinate system O-XYZ is established with the optical center of the left camera as the origin, and the pixel coordinate system o-uv is established with the upper left corner of each image plane as the origin. The coordinates of the object point P in the world coordinate system are (X, Y, Z). The vertical coordinates of the image points on the left and right image planes are equal, and the difference between the horizontal coordinates uL and uR is the disparity d = uL − uR. According to the triangle similarity relationship, the distance Z from the object point P to the camera can be calculated by the triangulation formula Z = (f·b)/d, and the X and Y coordinates of P can be calculated from the camera parameters as follows:

X = b(uL − cx)/d, Y = b(v − cy)/d, Z = f·b/d, (1)

where (cx, cy) is the principal point and v is the common vertical coordinate. Under the condition of epipolar rectification, the image planes of the binocular camera system are coplanar and share the same focal length. Thus, the reprojection matrix Q of the binocular system represents the internal and external parameters of the binocular camera and is defined as:

[X, Y, Z, W]^T = Q [x, y, d, 1]^T, Q = [[1, 0, 0, −cx], [0, 1, 0, −cy], [0, 0, 0, f], [0, 0, 1/b, 0]], (2)

where d is the disparity obtained by binocular matching; x and y are the two-dimensional coordinates in the pixel coordinate system; X, Y, Z and W are the corresponding unnormalized 3D coordinates and the normalization coefficient, respectively.
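The triangulation and reprojection described above can be sketched in a few lines of numpy. The camera parameters are the calibrated values reported later in this paper; the Q-matrix layout follows the standard OpenCV convention and is our reconstruction, not the paper's own code:

```python
import numpy as np

# Calibrated parameters reported in this paper (pixels / mm).
f, b = 1059.6, 5.9          # focal length (px), baseline (mm)
cx, cy = 633.6, 367.1       # principal point (px)

# Reprojection matrix Q, OpenCV convention:
# [X, Y, Z, W]^T = Q @ [x, y, d, 1]^T
Q = np.array([
    [1.0, 0.0, 0.0,   -cx],
    [0.0, 1.0, 0.0,   -cy],
    [0.0, 0.0, 0.0,     f],
    [0.0, 0.0, 1.0 / b, 0.0],
])

def reproject(x, y, d):
    """Map a pixel (x, y) with disparity d to metric 3D coordinates."""
    X, Y, Z, W = Q @ np.array([x, y, d, 1.0])
    return X / W, Y / W, Z / W

# A point at the principal point with d = f*b/Z must come back at depth Z.
print(reproject(cx, cy, f * b / 50.0))   # ≈ (0.0, 0.0, 50.0)
```

Dividing by the homogeneous coordinate W = d/b recovers exactly the triangulation formula Z = f·b/d of Equation (1).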


Binocular Image Acquisition and Matching
After the calibration, the binocular endoscope was operated to photograph the stomach model, obtaining a total of 16 pairs of left and right images (1280 × 800) of the upper and lower halves of the stomach model. The captured images covered the main area of the stomach model. Through the binocular matching algorithm, the disparity map was obtained to generate the corresponding point cloud. Among the many matching algorithms, SGBM improves the cost calculation of the original semi-global matching (SGM) and adds a post-processing stage; it is faster than the global matching method (GC) and more accurate than the local matching algorithm (SAD). Thus, the SGBM algorithm was selected for binocular matching [14]. The procedure of SGBM is as follows. First, the left image is filtered along the horizontal direction. Then the Birchfield-Tomasi algorithm is used to calculate the matching cost between the left and right images. Using the multipath constraint aggregation method to minimize the global energy function, the pixels with the minimum aggregated cost in the left and right images are taken as matching points. Finally, disparity refinement is carried out, including confidence detection, subpixel interpolation, and left-right consistency checking.

Point Cloud Preprocessing
Due to texture-less regions, weak details, and other unfavorable areas in the endoscopic image, the disparity maps obtained by the SGBM matching algorithm still contain mismatches, resulting in many outliers in the generated point cloud [15]. These noise points not only affect the visualization but also interfere with the point cloud registration process. Therefore, it is necessary to eliminate outliers from the point cloud.
We used radius filtering to eliminate outliers. This method assumes that each valid point in the original point cloud has at least a certain number of neighboring points within a specified radius. Points meeting this assumption were kept as normal points; otherwise, they were regarded as noise points and removed. Since the deviation of the noise caused by mismatches was very large, the method removed outliers effectively.
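A minimal numpy version of this radius filter (brute force, for illustration; the thresholds below are arbitrary, not the ones used in the study):

```python
import numpy as np

def radius_filter(points, radius, min_neighbors):
    """Keep points that have at least `min_neighbors` other points
    within `radius`; drop the rest as outliers. Brute force O(n^2),
    fine as a sketch; KD-tree implementations (e.g., PCL's) are
    used on real million-point clouds."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Count neighbors, excluding the point itself.
    neighbor_counts = (dist <= radius).sum(axis=1) - 1
    return points[neighbor_counts >= min_neighbors]

# A tight cluster plus one far-away outlier.
cloud = np.array([[0.0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
                  [0.1, 0.1, 0], [50.0, 50, 50]])
filtered = radius_filter(cloud, radius=1.0, min_neighbors=2)
print(len(filtered))   # 4 -- the outlier is removed
```

Because mismatch noise lies far from the organ surface, even a loose radius/count threshold separates it cleanly from valid points.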
A large amount of noise with a fixed disparity value was generated due to non-uniform illumination and other reasons; when converted into a point cloud, it concentrated in a specific area. Thus, the final point cloud used for stitching was obtained by segmenting the point cloud to keep only the areas within the correct depth range.
The resolution of the endoscope images used in the experiment was 1280 × 800, so the point cloud generated from a disparity map contained millions of points. To improve the speed of registration while maintaining the shape characteristics of the point cloud, voxel grid down-sampling was applied. By setting the voxel parameter, all points inside each grid cell of the set edge length were represented by their centroid, reducing the stitching complexity.
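Voxel grid down-sampling can be sketched as follows; this is a hypothetical minimal implementation (pipelines like this one typically use PCL's VoxelGrid filter):

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points falling into the same voxel by their centroid."""
    # Integer voxel index for every point.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel and average each group.
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, 3))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]

cloud = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2],   # same voxel
                  [5.1, 5.1, 5.1]])                   # different voxel
down = voxel_downsample(cloud, voxel_size=5.0)
print(len(down))   # 2
```

Using the centroid (rather than, say, the voxel center) preserves the local surface shape while shrinking the cloud by roughly an order of magnitude, which is what makes the subsequent ICP registration tractable.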

Feature Detection and Matching
The feature point method is a common method for camera motion estimation. Feature points consist of two parts: key points and descriptors. A key point is the position of the feature point in the image, sometimes with additional information such as orientation and scale. The descriptor is usually a vector that describes the pixels around the key point. Commonly used feature detection algorithms include SURF, SIFT, and ORB [16]. ORB is not scale-invariant and can only be applied to scenes photographed head-on. The advantage of the SIFT algorithm is that its features are stable and invariant to rotation, scale transformation, and brightness changes, with a certain degree of robustness to viewpoint changes and noise. Its disadvantages are unsatisfactory real-time performance and a weak ability to extract feature points along smooth edges. SURF is an improvement of SIFT, which completes feature extraction and description in a more efficient way to improve calculation speed and robustness [17].
The point cloud of each site was generated from the disparity map corresponding to the left image. Therefore, for the endoscopic images of two adjacent sites, feature detection and matching were performed on the left images to estimate the movement of the camera. Considering accuracy, robustness, calculation speed, and the specific scene of the endoscope, we used the SURF feature method; the specific process is as follows:
• Feature extraction: construct the Hessian matrix to generate all the points of interest.
• Feature point matching: the matching degree is determined by calculating the Euclidean distance between two feature descriptors; the shorter the Euclidean distance, the better the match.
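The Euclidean-distance matching step can be illustrated with toy descriptors; real SURF descriptors are 64-dimensional, and the distance threshold below is arbitrary:

```python
import numpy as np

def match_descriptors(desc1, desc2, max_dist=0.5):
    """Nearest-neighbor matching by Euclidean distance: each
    descriptor in desc1 is paired with its closest descriptor in
    desc2, and pairs whose distance exceeds max_dist are rejected."""
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    matches = [(i, int(j)) for i, j in enumerate(nearest)
               if dists[i, j] <= max_dist]
    return matches

# Toy 4-D "descriptors".
d1 = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
d2 = np.array([[0, 1.01, 0, 0], [1.02, 0, 0, 0]])
print(match_descriptors(d1, d2))   # [(0, 1), (1, 0)]
```

The pixel coordinates of the matched key-point pairs are what feed the initial-matrix estimation described in the next subsection.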

Multiple Point Cloud Registration
In this study, the multiple point cloud registration method was used to register the point cloud sequence for 3D stitching. The former point cloud of two adjacent sites was taken as the source point cloud, the latter as the target point cloud, and ICP was used for registration. The basic idea of the ICP algorithm is to treat the two closest points from the two clouds as the same point. For two given point clouds P and Q, the initial positional relationship is [R0|t0]. Select any point pi of the point cloud P under the initial pose and its nearest point qi in the point cloud Q as a matching point pair to establish an error function, and obtain the best values of R and t by minimizing this error function. The above process is one round of iteration. Combine the obtained R and t into a new positional relationship [R1|t1], update the pose of the point cloud, and repeat the above process until either the error function converges or the upper limit of the number of iterations is reached. For the i-th pair of points, the error can be expressed as ei = qi − (R pi + t). Therefore, using the least-squares method, the error function can be expressed as:

E(R, t) = (1/n) Σi ||qi − (R pi + t)||². (3)

The initial transformation matrix greatly affects the speed and accuracy of ICP registration. Generally, the epipolar geometry method is used to recover the camera motion between two frames through the correspondence between two-dimensional image points. In this process, t is normalized, which directly leads to scale uncertainty, making the final reconstruction unable to support 3D measurement. In this study, the initial transformation matrix was instead generated from the result of feature matching.
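One round of the ICP iteration described above can be sketched with numpy: a point-to-point variant with a brute-force nearest-neighbor search and the closed-form SVD (Kabsch) solution of the least-squares problem. The study itself relies on PCL's implementation; this is only an illustrative sketch:

```python
import numpy as np

def icp_step(P, Q):
    """One ICP iteration: match each point of P to its nearest
    neighbor in Q, then solve for the rigid transform (R, t)
    minimizing the least-squares error, via SVD."""
    # 1. Closest-point correspondences (brute force).
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    Qm = Q[d.argmin(axis=1)]
    # 2. Closed-form least-squares rigid transform.
    cP, cQ = P.mean(axis=0), Qm.mean(axis=0)
    H = (P - cP).T @ (Qm - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:       # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

# A 3x3x3 grid of points and a translated copy as the target.
P = np.array([[i, j, k] for i in range(3)
              for j in range(3) for k in range(3)], dtype=float)
Q = P + np.array([0.2, -0.1, 0.15])
R, t = icp_step(P, Q)
print(np.round(t, 3))   # ≈ [ 0.2  -0.1   0.15]
```

Because the grid spacing is much larger than the shift, every nearest-neighbor correspondence is correct and a single iteration already recovers the exact translation; with a poor initial pose, many iterations (or a local optimum) result instead, which is why the initial matrix matters.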
According to the position changes of the matching points in the left views of two adjacent fields of view, the average offsets along the X-axis and Y-axis were calculated and used as the translation part of the initial 3D transformation matrix. Between two adjacent sites, the translation of the point clouds along the X-axis and Y-axis was much larger than the change along the Z-axis and the rotation.
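Building the initial matrix from the mean key-point offsets might look as follows. The conversion of pixel offsets to metric units via the mean scene depth is a simplifying assumption of this sketch, not a detail given in the paper:

```python
import numpy as np

def initial_transform(kp_prev, kp_next, depth, f):
    """Build a 4x4 initial matrix for ICP from matched key points.
    Only the X/Y translation is estimated; the rotation is left as
    the identity, since motion between adjacent sites is small.
    Pixel offsets are scaled to metric units with the mean scene
    depth -- an assumption of this sketch."""
    mean_offset_px = (kp_next - kp_prev).mean(axis=0)   # (dx, dy)
    mean_offset_mm = mean_offset_px * depth / f
    T = np.eye(4)
    T[0, 3], T[1, 3] = mean_offset_mm
    return T

kp_prev = np.array([[100.0, 200], [400, 250], [700, 300]])
kp_next = kp_prev + np.array([42.0, -10.0])   # uniform image shift
T = initial_transform(kp_prev, kp_next, depth=50.0, f=1059.6)
print(np.round(T[:2, 3], 2))   # metric X/Y translation guess
```

Seeding ICP with this translation keeps the metric scale (unlike an epipolar-geometry estimate with normalized t) and moves the iteration close enough to the optimum to avoid local minima.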
For each pair of point clouds, two distance thresholds were set to perform coarse and fine registration, improving calculation efficiency while ensuring a good registration result. The rotation part of the transformation matrix is non-linear, so the ICP solution is essentially a non-linear least-squares problem. Considering that the movement within a short time was very small and the rotation angle was approximately 0, the ICP solution was approximately converted to a linear least-squares problem.
After obtaining the pose graph, that is, the graph whose nodes are the point clouds and whose edges carry the transformation matrices from point cloud registration, the graph was optimized through the bundle adjustment algorithm [18] to reduce the accumulation of pose estimation errors during the registration process. Finally, the 3D stitching of the endoscopic images was completed, and the 3D reconstruction of the entire stomach model was realized.

Results of System Construction and Calibration
In this paper, we established a 3D stitching experiment system for binocular endoscopes, and the structure of the system is shown in Figure 3a. The prototype system mainly includes the binocular gastroscope body, the LED light source, the signal acquisition circuit, and the image processing workstation. The scope probe (Figure 3c) contains the binocular camera system, the water and air channel, the surgical instrument channel, and the lighting system. The camera parameters are as follows: the focal length is 1059.6 pixels, the baseline distance is 5.9 mm, the CMOS pixel size is 1.75 µm × 1.75 µm, and the camera principal point coordinates cx and cy are 633.6 pixels and 367.1 pixels, respectively. A computer with an Intel Core i5-8265U processor (1.6-1.8 GHz) and 8 GB RAM is used in this research. The algorithm is implemented with PCL 1.9.1 [19] in Microsoft Visual Studio 2017 on Windows 10.


When calibrating the binocular camera, a black-and-white checkerboard of 9 × 6 squares with a 12 mm edge length was used as the calibration board. Within the working distance of the binocular endoscope, 15 sets of checkerboard images at different positions were taken, from which the reprojection matrix Q was obtained. According to the reprojection matrix, the calibration error was calculated to be 0.35 pixels, which was within the allowable error range. Based on the above results, the range of endoscopic observation was 20-100 mm.

Results of Point Cloud Preprocessing
In the process of removing outliers with radius filtering, according to the current image size, points with fewer than 16 neighbors within a sphere of radius 15 were defined as outliers to be removed. Figure 4 displays the result of point cloud preprocessing after outlier removal: Figure 4a is the original point cloud, Figure 4b is the point cloud processed by the radius outlier filter, and Figure 4c is the point cloud segmented from Figure 4b to remove the void parts with depth values within a certain range. It can be seen in Figure 4 that after outlier removal, a large number of holes and mismatched points are eliminated, and the processed point cloud reflects the 3D structure of the endoscopic image. In voxel down-sampling, the voxel parameter was set to 5, and the average point cloud size was reduced from about 1,000,000 to 100,000 points. Table 1 shows the speed comparison before and after point cloud preprocessing. Time of reading refers to the reading time of the two point clouds at adjacent sites, and time of registration refers to the registration time of these two point clouds. It shows that after preprocessing operations such as hole removal and down-sampling, especially down-sampling, the time for point cloud registration is reduced significantly, improving the real-time performance of stitching. Figure 5 demonstrates the registration results of adjacent point clouds before and after point cloud preprocessing: Figure 5a-c correspond to the registration results of the original point cloud, the point cloud after outlier removal, and the point cloud after down-sampling, respectively. There are obvious mismatches in the red area, and the registration accuracy improves after eliminating the mismatches and holes.
Figure 6 indicates the result of feature matching in the left views of two adjacent sites, where the red dots are the detected feature points and the blue lines represent the matched feature point pairs. The blue lines are mostly parallel and of equal length, indicating that the movement between two adjacent frames is mainly along the X-axis and Y-axis. The average time of feature detection and matching is 0.689 s.

Results of 3D Stitching
After the point cloud sequence is registered, the 3D reconstruction result of the binocular endoscopic images of the complete stomach model is shown in Figure 7. Figure 7a is a plan view of the stomach model, and Figure 7b is the result of the 3D stitching of its point cloud. The stitched result displays the characteristics of the main regions of the stomach model, and textures such as stomach wall folds are displayed clearly, which is conducive to diagnosis. Figure 7c shows the change in the field of view: the red box is the field of view of a single image, which is 29.05°. After 16 images are stitched, the field of view reaches 63.91°, 2.20 times the original, which greatly increases the observation range and is convenient for doctors to operate.


To evaluate the accuracy of point cloud stitching, the result was registered with the scanned point cloud of the stomach model. The white part in Figure 7d is the point cloud scanned by the scanner, which can be regarded as the ground truth; the two are well matched. The accuracy is quantified by the root mean square error (RMS) and the relative error (RE), computed via Equations (4) and (5):

RMS = sqrt( (1/n) Σi (doi − dsi)² ), (4)

RE = RMS / dm × 100%, (5)

where doi and dsi are the depths of the i-th point in the reconstructed point cloud and the scanned point cloud, respectively, and dm is the mean depth of the model surface. The accuracy provided by the binocular system for the model is RMS = 2.07 mm, and the relative error is 3.18%.

To evaluate the method further, the SFM method was used to perform a 3D reconstruction of the same endoscopic image sequence; the comparison of the obtained results is shown in Figure 8. The SFM reconstruction contains 27,568 points, while our method produces 344,201 points, which shows its advantage in observing the details of organs. When a certain area of the organ model is enlarged, our method can still reflect its 3D characteristics and retains the scale information, allowing 3D measurement. The SFM method not only loses scale information but also makes it difficult to identify organ features due to its sparseness.

The reconstruction effect in real scenes may be affected by many factors, for example, the reflection of the gastric mucosa. To verify the applicability of the method, we tried 3D stitching on the fresh stomach of a pig. A total of 10 pairs of left and right images were captured by moving the binocular endoscope. After SGBM matching and point cloud preprocessing, a complete point cloud of the pig stomach containing 413,076 points was stitched by ICP registration, as shown in Figure 9.
The results show that this method also works well in real scenes.
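Since Equations (4) and (5) are not reproduced in this excerpt, the error metrics can be sketched as follows, assuming the standard RMS definition over corresponding depths and RE taken as the RMS normalized by the mean model depth d_m (this normalization is an assumption consistent with the symbol definitions above, not the authors' exact formula):

```python
import numpy as np

def rms_and_relative_error(d_o, d_s):
    """Depth error between reconstructed and scanned (ground-truth) clouds.

    d_o: depths d_{o,i} of corresponding points in the reconstructed cloud (mm)
    d_s: depths d_{s,i} of the same points in the scanned cloud (mm)
    Returns (RMS in mm, relative error as a fraction).
    """
    d_o = np.asarray(d_o, dtype=float)
    d_s = np.asarray(d_s, dtype=float)
    rms = np.sqrt(np.mean((d_o - d_s) ** 2))  # assumed form of Equation (4)
    d_m = np.mean(d_s)                        # mean depth of the model surface
    re = rms / d_m                            # assumed form of Equation (5)
    return rms, re
```

With the reported RMS of 2.07 mm and RE of 3.18%, this normalization would imply a mean model depth of roughly 65 mm, which is plausible for a stomach model.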

Discussion
A 3D stitching method of the binocular endoscope is proposed in this study, which shows many advantages when compared with other conventional algorithms.
The binocular endoscope mainly realizes 3D reconstruction through binocular matching algorithms such as SGM and SGBM, which are based only on a single field of view [20]. In this research, ICP is used to splice 3D point clouds from different positions into a complete organ model, thereby expanding the endoscopic field of view. For example, our method can reconstruct the whole model in Figure 7c, whereas conventional algorithms can only reconstruct the region in the red box. At the same time, extensive preprocessing is performed to reduce the computation, including outlier removal, point cloud down-sampling, and transformation matrix calculation by feature point matching.
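The splicing pipeline described above can be sketched roughly as voxel down-sampling followed by point-to-point ICP that refines an initial pose obtained from feature matching. The NumPy/SciPy implementation and function names below are illustrative, not the authors' code:

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(pts, voxel):
    """Down-sample an (n, 3) cloud by keeping the centroid of each occupied voxel."""
    keys = np.floor(pts / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inv)
    out = np.zeros((inv.max() + 1, 3))
    for k in range(3):
        out[:, k] = np.bincount(inv, weights=pts[:, k]) / counts
    return out

def icp(src, dst, init_R=np.eye(3), init_t=np.zeros(3), iters=30):
    """Point-to-point ICP aligning src onto dst, starting from an initial
    pose (e.g. the transformation matrix estimated from feature matching)."""
    R, t = init_R, init_t
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = tree.query(moved)            # closest-point correspondences
        p, q = src, dst[idx]
        pc, qc = p.mean(0), q.mean(0)
        H = (p - pc).T @ (q - qc)             # cross-covariance (Kabsch)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                    # best rotation, reflection-safe
        t = qc - R @ pc
    return R, t
```

Seeding ICP with the feature-based pose matters in practice: plain ICP only converges to the nearest local minimum, so a good initial transformation both reduces iterations and avoids misregistration on the weakly textured gastric surface.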
Compared with the conventional 3D reconstruction algorithm SFM [21], the point clouds generated by our method are much denser and retain scale information. Point clouds with scale information show advantages in 3D measurement [22]. Therefore, our method has great potential for clinical application, for example, the measurement of polyps, which is of great significance for diagnosis.
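Because the stitched cloud is metric, measuring a lesion reduces to plain Euclidean geometry on the selected points; a minimal illustration (the function name is hypothetical):

```python
import numpy as np

def lesion_extent_mm(points):
    """Largest pairwise distance within a selected lesion region.

    The stitched point cloud keeps real-world scale, so Euclidean
    distances are directly in millimetres; no scale recovery is needed
    (unlike an SFM cloud, which is only defined up to scale).
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]   # all pairwise displacement vectors
    return np.linalg.norm(diff, axis=-1).max()
```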
There are also some studies on the 3D reconstruction of organs based on structured light [5]. However, structured light devices are complex and difficult to use in a clinical setting.
Currently, this method has only been applied to models and pig stomachs. Due to the complexity of the human body structure, the clinical application of this method still needs further verification.

Conclusions
In this study, a novel approach for the 3D stitching of binocular endoscopic images was presented. The point clouds of different sites are obtained through SGBM binocular matching, and the transformation matrix is calculated through feature detection and matching between adjacent sites; this initializes the multi-point-cloud ICP registration, reducing the amount of calculation and improving accuracy. The stitched 3D endoscopic image not only expands the field of view but also retains scale information, which helps reflect the real scene of the diseased part and facilitates the doctors' operation. The 3D stitching method was also applied to the reconstruction of a pig stomach to verify its applicability. Further investigation is needed to fully understand the capabilities and limitations of the new approach.