A Very Fast Image Stitching Algorithm for PET Bottle Caps

In the beverage, food and drug industries, more and more machine vision systems are being used for the defect detection of Polyethylene Terephthalate (PET) bottle caps. In this paper, in order to address the cylindrical distortion that arises in the imaging process and influences subsequent defect detection, a very fast image stitching algorithm is proposed to generate a panoramic planar image of the surface of PET bottle caps. Firstly, the three-dimensional model of the bottle cap is established. Secondly, the relative poses among the four cameras and the bottle cap in three-dimensional space are calculated to obtain the mapping relationship between three-dimensional points on the side surface of the bottle cap and the image pixels captured by each camera. Finally, the side images of the bottle cap are unfolded and stitched to generate a planar image. The experimental results demonstrate that the proposed algorithm unfolds the side images of the bottle cap correctly and rapidly. The average unfolding and stitching time for a 1.6-megapixel color cap image is about 123.6 ms.


Introduction
Polyethylene Terephthalate (PET) bottle caps are widely used in the medical, beverage and food industries. During bottle cap production, surface defects such as scratches or deformations are unavoidable. In order to ensure product quality, surface defect detection is essential. Traditional defect detection methods are mainly based on manual work, whose disadvantages include low efficiency, high labor intensity and low accuracy. With the development of computers and image processing algorithms, defect detection methods using machine vision technology instead of human eyes have improved efficiency and accuracy [1,2].
Nevertheless, it is difficult to obtain a whole image of the bottle cap through one camera at once. Therefore, obtaining a complete cylindrical bottle cap image plays an important role in the bottle cap quality inspection process. Several works have applied multiple cameras placed around a bottle cap to capture images of the cap, and defect detection is performed directly on the captured images. However, due to the non-planar surface of the bottle cap, the cylindrical label is distorted and compressed during the projection imaging process. In addition, there is too much overlap between the captured images when a complete and clear side view of the bottle cap must be obtained. The former can affect the results of the inspection, and the latter can increase the computational cost of defect detection. The collected real images of the bottle cap need to be stitched into a two-dimensional planar 360° panoramic view, which can be accomplished by using image stitching technology [3,4]. Image stitching technology is the registration and fusion of several adjacent images or photos with overlapping areas to form a 360° or wide-view panoramic image.
Many scholars have conducted a lot of work on image stitching [5][6][7]. Image stitching algorithms are basically divided into area-based methods and feature-based methods. Generally, area-based methods establish the transformation relationship between the image to be registered and the reference image by determining similarity measures.

In the proposed method, the size information of the bottle cap is utilized. According to this information, a four-camera coordinate system and a cylindrical bottle cap model can be established. In addition, the mapping relationship between the three-dimensional points on the side surface of the bottle cap and the image pixel points is determined. Next, the best view of the cameras for the bottle cap needs to be solved. The cylindrical back-projection and interpolation operations are carried out in the regions of the best camera views. Finally, the flattened side images are stitched together.
The experimental results show that the stitching algorithm can unfold the full side images rapidly and correctly, and the algorithm execution speed meets real application demands. In particular, the contributions of this paper are as follows:
• This paper proposes a FISA model based on projective geometry for PET bottle caps. The method can quickly unfold the full side image of the bottle cap, which lays the foundation for subsequent defect detection.
• This paper provides several settings with different image quality and different computational times. In actual applications, the settings can be flexibly chosen to meet actual needs.

Algorithm Framework
The structure of the hardware system is shown in Figure 1. The system is mainly composed of four sets of industrial cameras, lenses and LED light sources, and one PC. The camera model is Hikvision MV-CA016-10UC (Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou, China), which is a 1.6-megapixel color camera. The focal length of the lens is 6 mm. Four cameras are mounted surrounding the bottle cap at 90° intervals horizontally, and the side images of the bottle cap are collected and transmitted to the PC.
The flowchart of the proposed algorithm is shown in Figure 2. Through calibration, the intrinsic and extrinsic parameters of the camera can be obtained. Then, a four-camera coordinate system and the three-dimensional (3D) bottle cap model are built. After that, the mapping relationship between the cap's 3D points and corresponding image pixels is established. Finally, the cylindrical bottle cap images are projected onto a rectangular plane.


The imaging principle of the camera is the basis of the method in this paper. The geometric model of the camera imaging system [19] is shown in Figure 3. There are four coordinate systems: the world coordinate system X w Y w Z w , the camera coordinate system X c Y c Z c , the image coordinate system x img o img y img and the pixel coordinate system u pix o pix v pix . When the light reflected from an object's surface is converged through the lens to a point (the focal point), the object's image is formed on the imaging plane. For the convenience of observation, the imaging plane is assumed to be located between the pinhole and the object, so that the imaging direction of the subject is consistent with the actual direction.
o c is the origin of the camera coordinate system X c Y c Z c . The plane X c O c Y c is parallel to the imaging plane x img o img y img . Z c is the optical axis. The distance between the optical center and the imaging plane is the focal length f. Both the image coordinate system x img o img y img and the pixel coordinate system u pix o pix v pix lie on the imaging plane. o img is the origin of the image coordinate system, whose pixel coordinates are (u 0 , v 0 ). The relationship between these coordinate systems is defined as:

z c [u, v, 1] T = K M [x w , y w , z w , 1] T , with K = [α 0 u 0 ; 0 β v 0 ; 0 0 1] and M = [R | t],

where α and β are the scale factors of the length and pixel value along the horizontal and vertical axes, respectively. R represents the 3 × 3 rotation matrix, and t represents the 3 × 1 translation vector. K is the camera intrinsic parameter matrix, and M is the camera extrinsic parameter matrix.
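As an illustration of the relation above, the following NumPy sketch projects a world point to pixel coordinates; every parameter value here is hypothetical, chosen only to make the mapping concrete.

```python
import numpy as np

# Hypothetical intrinsic parameters: scale factors alpha, beta and
# principal point (u0, v0); these are not the paper's calibrated values.
alpha, beta = 800.0, 800.0
u0, v0 = 640.0, 512.0
K = np.array([[alpha, 0.0, u0],
              [0.0, beta, v0],
              [0.0, 0.0, 1.0]])            # intrinsic parameter matrix

R = np.eye(3)                              # 3x3 rotation (identity for the sketch)
t = np.array([[0.0], [0.0], [100.0]])      # 3x1 translation
M = np.hstack([R, t])                      # extrinsic parameter matrix [R | t]

P_w = np.array([[10.0], [5.0], [0.0], [1.0]])  # homogeneous world point

p = K @ M @ P_w                            # equals z_c * [u, v, 1]^T
u, v = p[0, 0] / p[2, 0], p[1, 0] / p[2, 0]    # divide out the depth z_c
```

Dividing by the third (depth) component recovers the pixel coordinates (u, v) of the projected point.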

Solving the Four-Camera Coordinate System
Firstly, Zhang's [20] calibration method is used to calibrate the four cameras respectively, and thus the intrinsic parameters of the four cameras are obtained. The intrinsic parameters are composed of the focal length f, the distortion coefficients, the principal point coordinates (u 0 , v 0 ), etc., which establish the mapping relationship from the pixel coordinate system to the camera coordinate system. Then, the extrinsic parameters of the four cameras need to be solved. The extrinsic parameter matrix is composed of the rotation matrix R and the translation vector t.

As is shown in Figure 1, the four cameras are mounted surrounding the bottle cap at intervals of approximately 90°. In fact, because it is complicated to set the four cameras apart at precise degree intervals, we instead obtain the precise position and pose relationship of each camera by calibrating the extrinsic parameters. In Figure 4, cameras 1 and 2 are used to shoot the same calibration plate.
It is assumed that there is a 3D point on the calibration plate, expressed as P(x w , y w , z w ) in the world coordinate system. The 3D point is projected to a pixel point p 1 (u 1 , v 1 ) in the image of the calibration plate captured by camera 1. The relationship of the 3D point and the pixel point can be expressed by:

z c1 [u 1 , v 1 , 1] T = K 1 M 1 [x w , y w , z w , 1] T

where K 1 is the intrinsic parameter matrix of camera 1, M 1 is the extrinsic parameter matrix, and R 1 and t 1 refer to the rotation matrix and the translation vector, respectively, which together form the extrinsic parameters of camera 1.
In the same way, the 3D point is projected to a pixel point p 2 (u 2 , v 2 ) in the image of the calibration plate captured by camera 2. The relationship of the 3D point and the pixel point can be expressed by:

z c2 [u 2 , v 2 , 1] T = K 2 M 2 [x w , y w , z w , 1] T

where K 2 is the intrinsic parameter matrix of camera 2, M 2 is the extrinsic parameter matrix, and R 2 and t 2 refer to the rotation matrix and the translation vector, respectively, which together form the extrinsic parameters of camera 2. Since K 1 and K 2 are known, the points (x c1 , y c1 , z c1 ) and (x c2 , y c2 , z c2 ), to which the 3D point P(x w , y w , z w ) is projected in the camera coordinate systems of cameras 1 and 2, can be obtained, respectively. The points can be written as:

(x c1 , y c1 , z c1 ) T = R 1 (x w , y w , z w ) T + t 1 (4)
(x c2 , y c2 , z c2 ) T = R 2 (x w , y w , z w ) T + t 2 (5)

Thus, Equation (6) can be obtained from Equations (4) and (5), which defines the pose relationship between cameras 1 and 2:

(x c2 , y c2 , z c2 ) T = R 2 R 1 −1 (x c1 , y c1 , z c1 ) T + t 2 − R 2 R 1 −1 t 1 (6)
This is also the rotation and translation relationship between cameras 1 and 2. Similarly, the rotation and translation relationships between cameras 2 and 3, and between cameras 3 and 4, can be obtained.
Finally, the camera coordinate systems of four cameras are transformed into one coordinate system, where camera 1 serves as the origin (i.e., the four-camera coordinate system).
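The composition of extrinsic parameters into a single coordinate system can be sketched as follows; the rotations and translations below are hypothetical stand-ins for calibrated values.

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Pose of camera 2 relative to camera 1, given world-to-camera
    extrinsics (R_i, t_i), so that x_c2 = R12 @ x_c1 + t12."""
    R12 = R2 @ R1.T          # rotation from camera 1's frame to camera 2's
    t12 = t2 - R12 @ t1      # matching translation
    return R12, t12

# Hypothetical extrinsics for two cameras
R1, t1 = np.eye(3), np.array([0.0, 0.0, 1.0])
R2 = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])       # 90 degree rotation about z
t2 = np.array([0.0, 0.0, 2.0])

R12, t12 = relative_pose(R1, t1, R2, t2)

# Sanity check on one world point: mapping via camera 1 and the relative
# pose must agree with mapping directly via camera 2.
x_w = np.array([1.0, 2.0, 3.0])
x_c1 = R1 @ x_w + t1
x_c2 = R2 @ x_w + t2
```

Chaining such pairwise poses (1 to 2, 2 to 3, 3 to 4) expresses all four cameras in camera 1's frame.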

Building the Cap Model and Solving the Ideal Cap Pose
Firstly, the 3D point cloud of the bottle cap can be expressed by the following Equations (7) and (8).
Here, R and H are the radius and height of the bottle cap, respectively. N pr means the number of pixels in each row after the cap side image is unfolded, which equals the perimeter of the cap in pixels. p n represents the arc length of the cap surface along the horizontal direction. θ is the central angle corresponding to the arc length p n . The next step is to solve the ideal pose of the cap. As shown in Figure 5, there are four camera coordinate systems: x 1 y 1 z 1 , x 2 y 2 z 2 , x 3 y 3 z 3 and x 4 y 4 z 4 . In the four-camera coordinate system, the coordinate origins of the four cameras are o c1 (0, 0, 0), o c2 (x o2 , y o2 , z o2 ), o c3 (x o3 , y o3 , z o3 ) and o c4 (x o4 , y o4 , z o4 ), respectively. The 3D point o cyl (x m , y m , z m ), the center of the four-camera coordinate system, is obtained from Equation (9).
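Under the definitions above, the cap's 3D point cloud (Equations (7) and (8)) can be sketched as follows; R, H and N pr take illustrative values rather than those of a real cap.

```python
import numpy as np

# Illustrative cap dimensions (not a real cap's): radius, height, and the
# number of pixels per unfolded row (the cap perimeter in pixels).
R_cap, H_cap = 15.0, 10.0
N_pr, N_rows = 360, 100

n = np.arange(N_pr)
theta = 2.0 * np.pi * n / N_pr           # central angle for each arc length p_n
h = np.linspace(0.0, H_cap, N_rows)      # height samples along the cap side

# 3D points on the side surface, one per pixel of the unfolded image
X = R_cap * np.cos(theta)[None, :].repeat(N_rows, axis=0)
Y = R_cap * np.sin(theta)[None, :].repeat(N_rows, axis=0)
Z = h[:, None].repeat(N_pr, axis=1)
points = np.stack([X, Y, Z], axis=-1)    # shape (N_rows, N_pr, 3)
```

Each column of the unfolded image thus corresponds to one central angle θ, and each row to one height step.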
The direction cosines of the space vector o c1 o cyl in the x 1 , y 1 and z 1 directions are cos δ, cos η and cos γ, respectively, which can be obtained by:

cos δ = x m / ‖o c1 o cyl ‖, cos η = y m / ‖o c1 o cyl ‖, cos γ = z m / ‖o c1 o cyl ‖, where ‖o c1 o cyl ‖ = (x m 2 + y m 2 + z m 2 ) 1/2 .

The corresponding direction angles are δ, η and γ, which determine the X-axis direction of the ideal bottle cap model's pose. Next, a plane is constructed that passes through the point o cyl (x m , y m , z m ) and is closest to the origins of the four camera coordinate systems; the normal of this plane is the Z-axis direction of the ideal cap model's pose. The cross product of the X-axis and Z-axis is then the Y-axis direction of the ideal cap model's pose. Thus, the ideal pose of the cap is obtained. This coordinate system is defined as the bottle cap coordinate system. Finally, five coordinate systems are established: those of cameras 1, 2, 3 and 4, and the bottle cap coordinate system.
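The ideal-pose construction can be sketched with synthetic camera origins (the coordinates below are hypothetical); the plane through o cyl closest to the origins is obtained here via a least-squares (SVD) plane fit.

```python
import numpy as np

# Hypothetical origins of the four camera coordinate systems
origins = np.array([[0.0, 0.0, 0.0],
                    [60.0, 0.0, 60.0],
                    [0.0, 0.0, 120.0],
                    [-60.0, 0.0, 60.0]])

o_cyl = origins.mean(axis=0)             # center of the four-camera system

x_axis = o_cyl - origins[0]              # vector from o_c1 to o_cyl
x_axis /= np.linalg.norm(x_axis)         # components are the direction cosines

# Least-squares plane through o_cyl: the right singular vector with the
# smallest singular value of the centered origins is the plane normal.
_, _, Vt = np.linalg.svd(origins - o_cyl)
z_axis = Vt[-1]                          # Z-axis of the ideal cap pose

y_axis = np.cross(x_axis, z_axis)        # Y-axis of the ideal cap pose
```

With the synthetic origins above, the cameras lie in the y = 0 plane, so the fitted normal points along y.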

Extracting the Bottle Cap Edge
In order to obtain the actual pose information of the bottle cap in a 3D space, it is necessary to solve the relationship between the ideal and actual poses of the bottle cap. In this paper, the edge information of the cap image is extracted first. Then, the solved ideal pose of the cap is used to fit the cap edge in the image to determine the actual position of the cap and this part is in the next subsection.
The details of the edge extraction are as follows. Firstly, distortion correction is applied to the image. Then, the pixel coordinates of the edge of the bottle cap are obtained by using edge detection algorithms such as Canny or Marr-Hildreth [21]. In order to improve the efficiency of edge extraction, the Canny edge detection algorithm combined with a fuzzy rule is used. This allows one to define a fuzzy membership function [22], which describes the features of good edges. The advantage of this approach is its flexibility in dealing with extra edges. It can flexibly restrict the range of edge extraction (the blue rectangle in Figure 6) by the fuzzy membership function, where [w min , w max ] represents the range of edge extraction, [w min − a, w min ) and (w max , w max + a] represent the flexible (i.e., fuzzy) range, and a is set to 10.
Moreover, a sliding window (the red rectangle in Figure 6) is applied to extract a straight edge perpendicular to the red rectangle.
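The membership function itself is not reproduced above; a plausible trapezoidal form consistent with the range [w min, w max] and the fuzzy margins of width a can be sketched as:

```python
def membership(w, w_min, w_max, a=10):
    """Trapezoidal fuzzy membership (an assumed form, not the paper's):
    1 inside [w_min, w_max], a linear ramp of width a on each side,
    and 0 outside the extended range."""
    if w_min <= w <= w_max:
        return 1.0
    if w_min - a <= w < w_min:
        return (w - (w_min - a)) / a    # rising ramp on the left margin
    if w_max < w <= w_max + a:
        return ((w_max + a) - w) / a    # falling ramp on the right margin
    return 0.0
```

Edge candidates scoring 0 are rejected outright, while those in the ramps can be kept or discarded by a threshold, which is what gives the rule its flexibility.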

Fitting the Actual Cap Pose
Next, the extracted edge information and the solved ideal pose of the cap are used to fit the actual pose of the cap, as shown in Figure 7. The pixel coordinates of the edge points A i are transformed to the camera coordinate system of camera 1. Then, the distances d 1i , d 2i , d 3i and d 4i from the transformed edge points to the principal axis of the cap are computed. The error E can be obtained by subtracting the actual bottle cap radius R from these distances. The least squares method is used to minimize the error value E; therefore, the principal axis o cyl z of the actual cap is obtained, as shown in Figure 8. Then, the perpendicular line from the coordinate of camera 1 to the principal axis o cyl z is drawn to obtain the X-axis of the actual cap pose. The cross product of the X-axis and the principal axis o cyl z is the Y-axis of the actual cap pose. So far, the pose of the actual bottle cap has been solved.
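A sketch of the objective being minimized (the sum of squared differences between point-to-axis distances and the known radius R) is given below; the edge points and candidate axes are synthetic.

```python
import numpy as np

def axis_error(points, axis_point, axis_dir, R):
    """Sum of squared deviations (d_i - R)^2, where d_i is the distance
    of each point from the candidate cap axis (a 3D line)."""
    u = axis_dir / np.linalg.norm(axis_dir)
    d = points - axis_point
    dist = np.linalg.norm(np.cross(d, u), axis=1)  # point-to-line distances
    return np.sum((dist - R) ** 2)

# Synthetic edge points lying exactly on a radius-5 cylinder about the z-axis
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
pts = np.stack([5.0 * np.cos(theta),
                5.0 * np.sin(theta),
                np.linspace(0.0, 10.0, 50)], axis=1)

e_true = axis_error(pts, np.zeros(3), np.array([0.0, 0.0, 1.0]), 5.0)   # correct axis
e_tilted = axis_error(pts, np.zeros(3), np.array([0.2, 0.0, 1.0]), 5.0) # tilted axis
```

A standard nonlinear least-squares solver over the axis parameters drives this error toward zero, yielding the actual principal axis.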

Determining the Best View of Cameras for the Bottle Cap
In this section, it is important to solve the best view of the cameras for the bottle cap, i.e., to determine which regions of the cap are seen best from which camera. An observation angle ∠OA i O c1 is able to determine the best view of the cameras, as shown in Figure 9. It can be seen that, when the observation angle is larger, the camera's view of the cap is smaller, and the observation regions of the cameras for the cap are smaller. This method allows the observation regions of each camera to be stitched together without overlaps or gaps. Equation (15) is used to solve the best observation angle for each camera. The specific details of the process are as follows. In the bottle cap coordinate system, the 3D coordinates of the origin o c1 of camera 1 are subtracted from the 3D points A i (x i , y i , z i ) on the cap surface (to reduce the calculation, let z i = 0) to obtain the vector in Equation (14). The corresponding observation angle of the observation value is ∠OA i O c1 , as shown in Figure 9.
The observation values of the four cameras β 1 , β 2 , β 3 and β 4 are calculated, respectively. The four values are compared, and when β j is the largest, its corresponding observation angle is the best observation range of camera j.
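Equation (15) is not reproduced above, so the sketch below substitutes a common observation measure (the cosine between the outward surface normal at A i and the direction toward each camera origin) to illustrate assigning each surface point to the camera with the largest observation value.

```python
import numpy as np

def best_camera(A, origins):
    """Assign a cap surface point A (with z = 0) to the camera that sees it
    most frontally; the cosine used here is an assumed stand-in for the
    paper's observation value, not Equation (15) itself."""
    normal = A / np.linalg.norm(A)       # outward normal of the cylinder at A
    betas = []
    for o in origins:
        v = o - A                        # direction from the point to the camera
        betas.append(float(np.dot(normal, v) / np.linalg.norm(v)))
    return int(np.argmax(betas))         # index of the best-viewing camera

# Four hypothetical camera origins at 90 degree intervals around the cap
origins = np.array([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0],
                    [-100.0, 0.0, 0.0], [0.0, -100.0, 0.0]])
```

Sweeping this assignment over all central angles partitions the cap side into four contiguous best-view regions, one per camera.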

Image Unfolding and Stitching
The calculated 3D points of the best-view areas of the bottle cap (Figure 10b) are cylindrically back-projected onto the rectangular plane (Figure 10c) and are stitched to generate a full unfolded image of the cap side (Figure 10d), as shown in Figure 11. Image fusion techniques can be used to overcome the shortcomings of an unnatural appearance after image stitching. They include the weighted fusion technique, the pyramid fusion technique, the gradient domain fusion technique, etc. [23]. In this paper, the simple fading-in and fading-out fusion algorithm is chosen to fuse the images. To summarize, the steps of the new stitching strategy for cylindrical bottle cap surfaces are given as follows:
Step 1: The intrinsic parameters of the four cameras are calibrated by Zhang's calibration method, respectively. Then, the extrinsic parameters of the four cameras are calibrated by the approach designed in this paper. Finally, a four-camera coordinate system is established.
Step 2: The center of the four-camera coordinate system is found, and a new coordinate that represents the ideal position of the cap in the four-camera coordinate system is established with the center as the origin.
Step 3: A 3D point cloud model of the cap with this new coordinate origin as its center is established.
Step 4: A set of images of the cap side is captured by the four-camera system.
Step 5: Edge feature extraction is performed for bottle cap side images after image preprocessing.
Step 6: The actual position of the cap is determined by exploiting the ideal position and the edge feature information of the cap.
Step 7: The best view of cameras for the bottle cap is solved to determine which regions of the cap are seen best from which camera. The best observation regions of cameras for the cap can be obtained.
Step 8: According to the best observation regions of cameras, the images belonging to the regions (i.e., region of interest) are cylindrical back-projected and are stitched to generate a full unwrapping image of the cap side.
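The fading-in and fading-out fusion used at the seams amounts to a linear cross-fade over the overlap strip; a minimal sketch, with hypothetical strip values, is:

```python
import numpy as np

def fade_blend(left, right):
    """Linearly cross-fade two equally sized overlap strips (H x W):
    the left image's weight falls from 1 to 0 across the strip width."""
    w = left.shape[1]
    alpha = np.linspace(1.0, 0.0, w)[None, :]   # per-column weight for 'left'
    return alpha * left + (1.0 - alpha) * right

# Hypothetical grayscale overlap strips of constant intensity
left = np.full((2, 3), 100.0)
right = np.full((2, 3), 200.0)
out = fade_blend(left, right)
```

The blended strip transitions smoothly between the two source images, hiding the seam without the cost of pyramid or gradient-domain fusion.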

Experiments
In order to evaluate the performance of our proposed method, we implemented the algorithms proposed in this paper. The test machine used in our experiments was equipped with an Intel(R) Core(TM) i5-9300H CPU at 2.40 GHz (with four cores and eight threads), an NVIDIA GeForce GTX 1660 Ti GPU and 6 GB of physical memory. The operating system of our test machine was Windows 10. The experimental system is shown in Figure 11. All four cameras were first calibrated to obtain the intrinsic and extrinsic parameters, which were used to rectify the images and to build the four-camera spatial coordinate system. The images were acquired from different angles of the cap side.
In order to prove the universality of the proposed algorithm, several different kinds of caps were used in the experiments, and three of them are shown in Figure 12.

Results Analysis
The ideal spatial pose of the cap should be in the center of the four-camera coordinate system; however, the actual spatial pose of the cap may have deviated. Therefore, we utilized the edge feature information of the cap images to determine the actual spatial pose of the cap. A good edge extraction result helped locate the actual bottle cap pose more accurately.

In addition to the Canny edge detection algorithm used for edge feature extraction, a sliding window and a fuzzy rule were used to extract the straight edge and to restrict the range of edge extraction. The effect of this can be seen at the bottom of Figure 13. The top of Figure 13 shows the effect when only the Canny edge detection algorithm is used: there are some outliers, and the extracted edges are not straight.

The Unfolding Images of the Caps
Since the four-camera coordinate system and the cylindrical coordinate system of the actual bottle cap were established, the mapping relationship between the spatial 3D points of the cap surface and the pixel points of the cap images could be obtained. Next, the images of the bottle caps belonging to the best observation regions of the cameras were used with the cylindrical back-projection to generate flattened images of the caps, as shown in the left of Figure 14. Since we did not perform the cylindrical back-projection on the full side images of the caps [11,19] (shown in the right of Figure 14), the computation cost was reduced significantly. Finally, the flattened images of the caps were stitched to generate a full unfolded image of the bottle cap, and the results for the three samples are shown in Figure 15.

In addition, in order to explore the relationship between image quality and computation cost, we conducted a set of experiments that projected the images of bottle caps onto rectangular planes with several settings: performing the projection transformation with the original pixel area, two times the pixel area and three times the pixel area (equivalent to performing downsampling in the projection transformation). The settings are denoted by 1× scale, 2× scale and 3× scale, respectively. The effect can be seen clearly in Figure 15.
We drew a continuous curve on the side of sample 1's cap with a blue marker to test the stitching quality. As shown in Figure 15a, the curve segments coincide properly with each other in the stitched result image. It can also be seen that the radius of the bottom part of the cap is slightly larger than that of the main (middle) part, so there is a small splicing error at the bottom of the cap. Moreover, as illustrated in Figure 15b, the vertical texture at the joint in sample 3's stitching result is slightly inclined. This is because sample 3's cap is frustum-cone-like rather than a regular cylinder, resulting in minor joint defects. However, these had almost no impact on the subsequent defect detection of the bottle caps.

Application
Cap defect detection results: Existing image segmentation methods fall mainly into the following categories: threshold-based, edge-based [24,25] and methods based on specific theories. Since the captured image usually contains spot-like Gaussian noise and may have an uneven surface and inhomogeneous illumination, the contrast between the defects and the background is usually not high. If threshold segmentation is performed directly in the spatial domain, it results in incomplete or even erroneous extraction of the defect information. Therefore, Gaussian filtering was first used in this paper to suppress the background noise of the image. Then, a Sobel-based algorithm was adopted [26]; the advantages of the Sobel operator include good noise resistance and a small computational load. After Sobel edge detection, the contrast between the bottle cap defects and the neighboring background increased. Finally, the precise detection and location of bottle cap defects could be completed with morphological processing and feature extraction operations.
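The Gaussian-then-Sobel stage of this pipeline can be sketched in plain NumPy as follows. The kernel sizes, the threshold value and the omission of the morphological and feature extraction steps are simplifying assumptions for illustration; the paper's actual parameters are not specified.

```python
import numpy as np

def _conv2(img, k):
    """Valid-mode 2D convolution via explicit shifts (small kernels only)."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * img[i:i + H - kh + 1, j:j + W - kw + 1]
    return out

def detect_defects(gray, blur_sigma=1.0, grad_thresh=0.5):
    """Gaussian smoothing followed by Sobel gradient thresholding."""
    # Separable 5x5 Gaussian kernel to suppress spot-like background noise.
    x = np.arange(-2, 3)
    g = np.exp(-x**2 / (2 * blur_sigma**2))
    g /= g.sum()
    smoothed = _conv2(_conv2(gray, g[None, :]), g[:, None])
    # Sobel kernels for horizontal / vertical gradients.
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gx = _conv2(smoothed, sx)
    gy = _conv2(smoothed, sx.T)
    mag = np.hypot(gx, gy)
    return mag > grad_thresh          # binary defect-candidate mask
```

In a full pipeline the resulting mask would then be cleaned with morphological opening and the surviving connected components measured to locate and classify the defects.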
Sobel edge detection, morphological processing and feature extraction methods were used to detect defects such as scratches and oil stains in the stitched image, and the results for the three samples are shown in Figure 16.

Performance Analysis
In order to evaluate the stitching speed for bottle cap images at different image qualities, the experiments were conducted 100 times on each of the three samples. The average unfolding and stitching times for the different samples at the different scales are shown in Table 1. It can be seen from Table 1 that the times of the proposed algorithm were 172.2, 159.5 and 123.6 ms on the 1× scale. By comparison, the time spent under the 2× scale and 3× scale settings was greatly reduced, although the corresponding image quality was also reduced; the time required to complete the unfolding and stitching on the 3× scale was almost 53% lower than on the 1× scale. In actual applications, the settings can be chosen flexibly to meet the actual needs. In other words, by using our algorithm, the time required to complete the unfolding and stitching was reduced by almost 40%. Overall, we used the known geometric information, including the camera pose relationships obtained by camera calibration and the cylindrical bottle cap model. The mapping relationship between the 3D points on the bottle cap surface and the 2D pixels of the camera images was established without the time-consuming feature point searching and matching that traditional feature-based stitching methods usually require.
In fact, the most time-consuming part of this process was likely fitting the actual bottle cap pose when the cap edge was not extracted accurately. The other parts were matrix operations, similar to the operations of other stitching methods after the transformation matrix has been obtained. In addition, the images of the bottle caps belonging to the best observation regions of the cameras, rather than the full side images of the caps, were used with the cylindrical back-projection to generate the flattened images of the caps, which also reduced the computation cost significantly.
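A minimal sketch of why the per-frame cost stays low: once the geometry is fixed, the cylinder-to-pixel mapping can be precomputed as a lookup table and applied to every captured frame with pure array indexing. The helper name is hypothetical, and nearest-neighbour sampling is a simplification of whatever interpolation the paper actually uses.

```python
import numpy as np

def unfold_with_lut(image, uv_map):
    """Remap a captured cap image into its flattened strip using a
    precomputed (u, v) lookup table (nearest-neighbour for brevity).

    image  : captured camera image, indexed as image[row, col]
    uv_map : (H, W, 2) array of pixel coordinates, [..., 0] = u (col),
             [..., 1] = v (row), e.g. from a one-off calibration step
    """
    u = np.clip(np.round(uv_map[..., 0]).astype(int), 0, image.shape[1] - 1)
    v = np.clip(np.round(uv_map[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[v, u]    # one fancy-indexing gather per frame, no searching
```

Since `uv_map` is built once when the line is set up, each subsequent frame costs only a single vectorized gather, which is consistent with the millisecond-level timings reported in Table 1.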
Finally, defect detection was performed on the stitched side images of the bottle caps. Defect detection was performed 100 times for each sample; the average detection times for the three samples were 7.74 ms, 7.28 ms and 6.97 ms, respectively, as shown in Table 3.

Conclusions
This paper proposes a stitching method for the images of bottle caps, in which the surface of a bottle cap is reconstructed to generate an unwrapped planar image of the bottle cap's sides. Firstly, the four-camera coordinate system is established through calibration, and the cylindrical bottle cap model is solved. Then, the position and pose relationship between the four cameras and the bottle cap is established in 3D space to obtain the mapping relationship between the 3D points of the bottle cap and the pixels of the images taken by the cameras. Next, the best views of the cameras on the bottle cap are solved, and the unfolding and interpolation are carried out only in the regions of the best views. Finally, the pixels of the bottle cap images are rearranged to form a complete side image of the bottle cap, resulting in a good imaging effect and a fast execution speed.
In order to evaluate the performance of the proposed approach in terms of the unfolding speed, several experiments were conducted on three samples of bottle caps. The experimental results show that, for the bottle cap images captured by a 1.6-megapixel color camera, the fastest average unfolding and stitching time was about 61.6 ms, achieved on the 3× scale. In addition, several settings with different trade-offs between image quality and computation time are provided, so that in actual applications the settings can be chosen flexibly to meet the actual needs. Extending the method to tubes with different radii will be our future work.