Needle Segmentation in Volumetric Optical Coherence Tomography Images for Ophthalmic Microsurgery

Abstract: Needle segmentation is a fundamental step for needle reconstruction and image-guided surgery. Although there have been success stories in needle segmentation for non-microsurgeries, these methods cannot be directly extended to ophthalmic surgery because of the spatial resolution it requires. As ophthalmic surgery is performed with finer and smaller surgical instruments in micro-structural anatomies, specifically in the retinal domain, it demands delicate operation and sensitive perception. To address these challenges, in this paper we investigate needle segmentation in ophthalmic operations on 60 Optical Coherence Tomography (OCT) cubes captured during needle injection surgeries on ex-vivo pig eyes. We developed two different approaches, a conventional method based on morphological features (MF) and a specifically designed fully convolutional neural network (FCN) method, and we evaluate them on this benchmark for needle segmentation in volumetric OCT images. The experimental results show that the FCN method achieves better segmentation performance on four evaluation metrics, while the MF method has a shorter inference time, which provides a valuable reference for future work.


Introduction
Recent research shows that eye pathologies account for more than 280 million cases of visual impairment [1], which creates an increasing demand for ophthalmic surgery. Vitreoretinal surgery is a conventional ophthalmic operation consisting of complex manual tasks, as shown in Figure 1. Incisions are created with a keratome and trocar in the sclera, arranged in a circle 3.5 mm away from the limbus [2], to provide entrances for three tools: the light source, the surgical tool, and the irrigation cannula [3,4]. The irrigation cannula is used for liquid injection to maintain appropriate intraocular pressure (IOP). The light source illuminates the intended area of the retina, allowing surgeons to obtain and analyze a planar view of that area through the microscope. The surgical procedure thus demands delicate operation and sensitive perception from surgeons. Surgical instrument segmentation is the first step in estimating the needle pose and position, which is extremely important to enhance surgeons' concentration under low-illumination intraocular conditions. Among the variety of surgical tools used in ophthalmic operations, the beveled needle is widely used for delivering drugs into micro-structural anatomies of the eye such as retinal blood vessels and sub-retinal areas. Many studies have made significant progress in needle segmentation in microscopic images [5][6][7][8]. These works achieved satisfactory results using either color-based or geometry-based features. Nevertheless, because microscopic images are two-dimensional (2D), detection results in the en-face plane view cannot provide enough information to locate the needle pose and position in three-dimensional (3D) space.
Many widely used 3D medical imaging technologies, such as computed tomography (CT), fluoroscopy, magnetic resonance (MR) imaging, and ultrasound, are already applied in brain, thoracic, and cardiac surgeries, not only for diagnostic procedures but also as real-time surgical guides [9][10][11][12][13]. However, these imaging modalities cannot achieve sufficient resolution for ophthalmic interventions. For instance, MRI-guided breast and prostate biopsies, which offer millimeter resolution, use 18 gauge needles with a diameter of 1.27 mm. Ophthalmic surgery, in contrast, uses 30 gauge needles with a diameter of 0.31 mm, which require submillimeter resolution [14].
Optical Coherence Tomography (OCT) was originally used in ophthalmic diagnosis for its micron-level resolution [15]. Recently, OCT applications have been extended to provide real-time information on intra-operative interactions between the surgical instrument and intraocular tissue [16]. The microscope-mounted intra-operative OCT (iOCT) developed by Carl Zeiss Meditec (RESCAN 700) was first described in clinical use in 2014 [17]. The iOCT integrated into this microscope shares the optical path of the microscopic view and provides real-time cross-sectional information, which makes it an ideal imaging modality for ophthalmic surgeries. The iOCT device can also acquire volumetric image cubes composed of multiple B-scans. The scan area can be adjusted via a region of interest (ROI) shown in the microscopic image in the CALLISTO eye assistance computer system.
When using iOCT to estimate the needle pose and position, the first step is to segment the needle voxels in the volumetric OCT images and thereby obtain the needle point cloud. Apart from the difficulties caused by image speckle, low contrast, signal loss due to shadowing, and refraction artifacts, the needle tip appears in much more detail in OCT images (the needle may show up as several fragments in a B-scan image), and illusory needle reflections occur [18]. All these challenges make needle segmentation in OCT images different from that in other imaging modalities such as 3D ultrasound. To the best of our knowledge, there is no systematic empirical research addressing needle segmentation in volumetric OCT images. Previous studies [19,20] focused on the visualization of volumetric OCT. Rieke et al. [7,21] studied surgical instrument segmentation in cross-sectional B-scan images. In their work, the position of the cross-section is localized from the microscope image, and the pattern of the needle in the B-scan image is simplified. Therefore, their methods cannot be directly applied to segmenting the needle in volumetric OCT images.
This study focuses on developing approaches based on two popular mechanisms for needle segmentation in volumetric OCT images. We propose two methods, corresponding to manual feature extraction and automatic feature extraction, to tackle the difficulties of needle segmentation in OCT images: (a) following the needle shadow principle [22], a conventional method based on morphological features (MF); (b) an approach based on the recently developed fully convolutional neural network (FCN) [23], which has been applied to MRI medical image analysis. The MF method is straightforward, but its features and parameters have to be defined manually after analyzing all situations. The FCN is a more complicated network that learns features automatically when the model is properly trained on a dataset. We extend the FCN method to identify the needle in OCT images. The main contributions of this paper are: (a) a specific FCN method is developed and compared with the conventional method for needle segmentation under different poses and rotations; (b) a benchmark of 60 OCT cubes with 7680 images is set up using ex-vivo pig eyes for the evaluation of both methods. The rest of the paper is organized as follows: Section 2 gives the basic configuration of OCT and the typical patterns of the needle in OCT B-scan images, and then presents the two needle segmentation methods in detail. We carry out the experiments and describe the results in Section 3. Section 4 gives the discussion and Section 5 concludes the presented work.

Method
The schematic diagram of needle segmentation in an OCT cube is shown in Figure 2. To preserve as much information as possible, the highest OCT scan resolution of the RESCAN 700 is selected, which is 128 B-scans, each with 512 A-scans, over an area of 3 × 3 mm. Each A-scan has 1024 pixels covering 2 mm of depth (see Figure 2a). Figure 2b shows the needle in an ex-vivo pig eye experiment as an en-face plane view obtained by the ophthalmic microscope. The scan area is defined by the rectangle and can be adjusted by a foot panel connected to the RESCAN 700. Afterwards, the volumetric OCT images are generated (see Figure 2c). Figure 2d shows the needle segmentation obtained by processing the OCT image cube. Taking into account the different needle rotations and positions in the OCT cube, the needle appears in a B-scan in the patterns shown in Figure 3. Most of the needle fragments cast a clear shadow, except for the needle tip part. The refraction fragment usually contains more pixels than normally imaged parts. A qualified method should be able to segment as many needle pixels as possible under these various conditions.
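As a small worked example of the scan geometry described above, the nominal sample spacing along each axis can be computed by dividing the scanned extent by the number of samples. This is only arithmetic on the figures given in the text; the resulting values are sampling intervals, not the optical resolution of the device, and the variable names are illustrative.

```python
# Scan geometry as stated in the text (RESCAN 700, highest resolution setting).
B_SCANS_PER_CUBE = 128      # B-scans across one 3 mm side of the scan area
A_SCANS_PER_B_SCAN = 512    # A-scans across the other 3 mm side
PIXELS_PER_A_SCAN = 1024    # samples along the 2 mm depth axis
LATERAL_EXTENT_MM = 3.0
DEPTH_EXTENT_MM = 2.0


def spacing_um(extent_mm, samples):
    """Nominal sample spacing in micrometres (extent divided by sample count)."""
    return extent_mm * 1000.0 / samples


a_scan_spacing_um = spacing_um(LATERAL_EXTENT_MM, A_SCANS_PER_B_SCAN)
b_scan_spacing_um = spacing_um(LATERAL_EXTENT_MM, B_SCANS_PER_CUBE)
depth_spacing_um = spacing_um(DEPTH_EXTENT_MM, PIXELS_PER_A_SCAN)

print(f"between adjacent A-scans: {a_scan_spacing_um:.2f} um")
print(f"between adjacent B-scans: {b_scan_spacing_um:.2f} um")
print(f"along the depth axis:     {depth_spacing_um:.2f} um")
```

Under these numbers, a 30 gauge needle with a diameter of 0.31 mm spans roughly 53 A-scan spacings or 13 B-scan spacings, depending on its orientation in the cube.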

Morphological Features Based Method
Because most of the needle parts in each B-scan are located outside the tissue and create shadows, a morphological feature based method is proposed. The gray-scale B-scan image is transformed into a binary image by thresholding. The threshold value is adaptively defined based on statistical measurements of each B-scan. This simple and effective method has been used and evaluated for the automatic segmentation of structures in OCT images [24]. A median filter is then applied to eliminate noise. Next, the topmost surface is segmented and considered as the tissue surface. Scanning from left to right, any vertical jump or drop in this surface layer is detected and considered as the beginning or the end of an instrument reflection, respectively. To avoid mistaking anatomical features of the eye tissue for a reflection caused by the needle, only reflections with invisible intra-tissue structures are confirmed to be needle reflections. A bounding box is used to cover the region of each detected needle part (see Figure 2b-e); the width of the bounding box reflects the width of the needle cross-section and can be used to analyze the diameter of the needle. To handle the case of several needle tip parts, bounding boxes in one B-scan image whose mutual distance is less than a threshold d_b are merged into one bounding box. The needle refraction fragment is then removed from the detected results since it does not represent the real needle position. Removal is performed when either of the following conditions is satisfied: (1) the top edge of the needle fragment bounding box is close to the upper edge of the image; (2) the number of needle pixels exceeds a threshold. In the MF method, the features are thus obtained by observing and summarizing the needle patterns in the B-scan image. Some of the parameters are chosen manually, which may influence the segmentation accuracy.
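The surface scanning and box merging steps described above can be illustrated with a minimal sketch. It is not the C++/OpenCV implementation used in this work: thresholding, median filtering, and the shadow check are omitted, the B-scan is assumed to be already binarized and stored as a list of rows, the distance between two boxes is taken to be their horizontal gap, and the function names and the parameter min_jump are hypothetical.

```python
def topmost_surface(binary):
    """For each column, return the row index of the first foreground pixel (or None)."""
    n_rows, n_cols = len(binary), len(binary[0])
    surface = []
    for col in range(n_cols):
        row = next((r for r in range(n_rows) if binary[r][col]), None)
        surface.append(row)
    return surface


def find_reflection_spans(surface, min_jump):
    """Scan left to right: a jump up of at least min_jump rows opens a span,
    a drop of at least min_jump rows closes it. Returns (first_col, last_col) pairs."""
    spans, start = [], None
    for col in range(1, len(surface)):
        prev, cur = surface[col - 1], surface[col]
        if prev is None or cur is None:
            continue  # no surface found in one of the two columns
        if start is None and prev - cur >= min_jump:
            start = col                      # surface jumps towards the image top
        elif start is not None and cur - prev >= min_jump:
            spans.append((start, col - 1))   # surface drops back to the tissue
            start = None
    return spans


def merge_spans(spans, d_b):
    """Merge column spans whose horizontal gap is smaller than the threshold d_b."""
    merged = []
    for left, right in sorted(spans):
        if merged and left - merged[-1][1] < d_b:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


# Toy surface profile: tissue at row 10, two elevated fragments above it.
surface = [10, 10, 10, 4, 4, 4, 10, 10, 3, 3, 10]
spans = find_reflection_spans(surface, min_jump=3)
print(spans)                       # [(3, 5), (8, 9)]
print(merge_spans(spans, d_b=4))   # [(3, 9)]
```

With a smaller d_b the two fragments stay separate, which shows how this manually chosen parameter changes the reported bounding box width.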
In the next section, we will introduce another method that can learn the features by itself with the training dataset.

Network Description
In this section, we present a specifically designed FCN method inspired by the work of Long et al. [23]. Figure 4 shows the schematic representation of our network. We perform convolutions aimed at both extracting features from the data and segmenting the needle out of the image. The left part of the network consists of a compression path, while the right part decompresses the signal until the original size is reached. To reduce the size of the network and to take into account that the needle appears continuously across the B-scan image sequence, we take three adjacent B-scan images of the OCT cube as input and resize the images by a factor of χ. The left side of the network is divided into stages operating at different resolutions. Similar to the approach demonstrated in [23], each stage comprises one to two layers. The convolution layer in each stage uses volumetric kernels of size 7 × 7 × 3 voxels with stride 1. Pooling uses the max-pooling operation with 7 × 7 × 3 voxels and stride 2, so the size of the resulting feature maps is halved. PReLU nonlinearities are applied throughout the network. Downsampling reduces the size of the input information and increases the receptive field of the features computed in subsequent network layers. Each pair of convolutional and pooling layers computes twice as many features as the pair in the previous stage.
After three convolutional layers, at an image size of 4 × 8 × 3, the network upsamples the low-resolution input by de-convolution, combining it with the pooling results from previous layers [23]. The last convolutional layer has a 1 × 1 × 1 kernel and computes two feature maps of the same size as the input images, which are converted into probabilistic segmentations of the foreground and background regions by a soft-max operation. To recover the needle foreground at the original resolution, the FCN segmentation result is fused with the original image after binarization and 4-connected component labeling. A labeled area that receives a certain number of pixel votes from the FCN foreground output is considered a needle fragment. All needle fragments are covered by one bounding box to indicate the ROI.
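The fusion step can be sketched as follows, under stated assumptions: the FCN foreground mask is assumed to be already resized to the resolution of the binarized original image, both are stored as lists of rows with 0/1 entries, and the vote threshold min_votes as well as the function names are hypothetical. The sketch labels 4-connected components with a breadth-first search, keeps those that collect enough votes, and returns one bounding box around all kept fragments.

```python
from collections import deque


def label_components(binary):
    """Return the 4-connected foreground regions as lists of (row, col) pixels."""
    n_rows, n_cols = len(binary), len(binary[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    components = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if inside and binary[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            components.append(pixels)
    return components


def fuse(binary, fcn_mask, min_votes):
    """Keep components overlapped by at least min_votes FCN foreground pixels.

    Returns (kept_components, bounding_box) where the box is
    (top, left, bottom, right) around all kept pixels, or None if nothing is kept.
    """
    kept = [pixels for pixels in label_components(binary)
            if sum(fcn_mask[r][c] for r, c in pixels) >= min_votes]
    if not kept:
        return kept, None
    rows = [r for pixels in kept for r, _ in pixels]
    cols = [c for pixels in kept for _, c in pixels]
    return kept, (min(rows), min(cols), max(rows), max(cols))


binary = [[1, 1, 0, 0, 0],
          [0, 1, 0, 0, 1],
          [0, 0, 0, 0, 1],
          [1, 0, 0, 0, 0]]
fcn_mask = [[1, 1, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0]]
kept, box = fuse(binary, fcn_mask, min_votes=1)
print(len(kept), box)   # 2 (0, 0, 2, 4)
```

The isolated pixel in the bottom-left corner receives no vote and is discarded, which is how the coarse FCN output can suppress noise in the full-resolution binarized image.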

Training
The needle fragment always occupies only a small region of the B-scan image compared to the background, which gives the network a strong bias towards the background. To prevent the learning process from getting trapped in local minima, we use an objective function based on the Dice coefficient, which increases the weight of the foreground during the training phase. The Dice coefficient of two binary images is defined as in [25], where p_i ∈ P denotes the pixels of the predicted binary segmentation image, g_i ∈ G those of the ground truth binary volume, and n the number of pixels. This expression can be differentiated to obtain the gradient with respect to the j-th pixel of the prediction.
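To make the objective concrete, the following sketch evaluates one common differentiable form of the Dice coefficient, D = 2 Σ p_i g_i / (Σ p_i² + Σ g_i²), together with its gradient with respect to each predicted pixel. The squared-denominator variant is an assumption here, since the exact expression from [25] is not reproduced in this text; the function names are illustrative, and images are passed as flattened lists of n values.

```python
def dice(pred, truth):
    """Dice coefficient D = 2*sum(p*g) / (sum(p^2) + sum(g^2)).

    Assumes at least one non-zero value in pred or truth (non-zero denominator).
    """
    inter = sum(p * g for p, g in zip(pred, truth))
    denom = sum(p * p for p in pred) + sum(g * g for g in truth)
    return 2.0 * inter / denom


def dice_gradient(pred, truth):
    """Analytic gradient dD/dp_j for every pixel j of the prediction.

    From the quotient rule: dD/dp_j = 2 * (g_j * S - 2 * p_j * I) / S^2,
    with I = sum(p*g) and S = sum(p^2) + sum(g^2).
    """
    inter = sum(p * g for p, g in zip(pred, truth))
    denom = sum(p * p for p in pred) + sum(g * g for g in truth)
    return [2.0 * (g * denom - 2.0 * p * inter) / denom ** 2
            for p, g in zip(pred, truth)]


pred = [0.9, 0.2, 0.1, 0.7]    # soft predictions for four pixels
truth = [1.0, 0.0, 0.0, 1.0]   # ground truth labels
print(dice(pred, truth))
print(dice_gradient(pred, truth))
```

Because the score is normalized by the total foreground mass rather than by the image size, a prediction that labels everything as background scores 0 regardless of how few needle pixels the image contains, which is the property that counteracts the background bias.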
After the soft-max operator, we obtain a map of probability for the needle and the background. The pixel with a probability of more than 0.5 will be treated as the needle part.
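A minimal sketch of this decision rule is given below, assuming the network outputs one background score and one needle score per pixel; the function names are illustrative and the score maps are stored as lists of rows.

```python
import math


def needle_probability(score_bg, score_needle):
    """Two-class soft-max: probability of the needle class from the two scores."""
    m = max(score_bg, score_needle)   # subtract the maximum for numerical stability
    e_bg = math.exp(score_bg - m)
    e_needle = math.exp(score_needle - m)
    return e_needle / (e_bg + e_needle)


def segment(bg_map, needle_map, threshold=0.5):
    """Binary mask: 1 where the needle probability exceeds the threshold, else 0."""
    return [[1 if needle_probability(b, n) > threshold else 0
             for b, n in zip(bg_row, needle_row)]
            for bg_row, needle_row in zip(bg_map, needle_map)]


print(segment([[0.0, 2.0]], [[1.0, 1.0]]))   # [[1, 0]]
```

With only two classes, a needle probability above 0.5 is equivalent to the needle score being larger than the background score, so the threshold simply selects the more likely class for each pixel.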

Experimental Setup and Evaluation Metrics
The experiments were carried out on ex-vivo pig eyes. A micro-manipulator was used to grip the needle. The CALLISTO eye assistance computer system was set up to show the microscopic image and to display a preview of the OCT scans (see Figure 5). A foot pedal connected to the RESCAN 700 was used to relocate the scan area. We captured the needle with different poses and positions in the OCT scan area on ex-vivo pig eyes. Each OCT cube has a resolution of 512 × 128 × 1024 voxels and covers a corresponding spatial volume of 3 × 3 × 2 mm. In total, 60 OCT cubes with 7680 B-scan images were manually segmented, and the marked needle pixels serve as the ground truth data set.

Figure 5. The experimental setup of ophthalmic microsurgery on ex-vivo pig eyes, with the CALLISTO eye assistance computer system, the ex-vivo pig eye, and the RESCAN 700 labeled. The micro-manipulator is designed to grasp the syringe and place the needle close to the eye tissue. The CALLISTO eye assistance computer system is set up to display the en-face microscopic image and cross-sectional images of the OCT cube.
Both of the aforementioned methods were implemented on an Intel(R) Xeon(R) CPU E5-2620 v3 at 2.40 GHz with a GeForce GTX 980 Ti and 64 GB of memory, running the Ubuntu 16.04 operating system. The MF based method was implemented with OpenCV 2.4 in C++. The FCN based method used the Caffe [26] framework to design the network in Python, and the remaining image processing parts were also implemented with OpenCV 2.4 in C++. We used four metrics to evaluate the performance of the two methods. Let n_p be the number of needle pixels in the prediction, and n'_p the number of needle pixels predicted correctly. The four metrics are as follows: (1) the pixel error number, which counts false positive (FP) and false negative (FN) needle pixels in the predicted result; (2) the pixel accuracy rate, which equals n'_p/n_p; (3) the average bounding box absence rate, which indicates the degree to which needle segments are missed in the B-scan images; (4) the width error of the bounding box, which influences the subsequent analysis of needle dimensions.
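Three of these metrics can be illustrated per B-scan with the short sketch below: the FP/FN pixel counts, whether the bounding box is absent, and the bounding box width error. It assumes that prediction and ground truth are binary masks of equal size stored as lists of rows, that the bounding box is the tightest box around all foreground pixels, and that the width error is the absolute difference in box width; the function names are illustrative, and averaging over B-scans is left out.

```python
def pixel_errors(pred, truth):
    """Return (false_positives, false_negatives) for two binary masks."""
    fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, g in zip(pred_row, truth_row):
            if p and not g:
                fp += 1   # background pixel predicted as needle
            elif g and not p:
                fn += 1   # needle pixel predicted as background
    return fp, fn


def box_width(mask):
    """Width in pixels of the tightest box around the foreground, 0 if empty."""
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return max(cols) - min(cols) + 1 if cols else 0


def box_missing(pred, truth):
    """True if the ground truth contains a needle but the prediction has no box."""
    return box_width(truth) > 0 and box_width(pred) == 0


def width_error(pred, truth):
    """Absolute difference between predicted and ground truth box widths."""
    return abs(box_width(pred) - box_width(truth))


pred = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
truth = [[0, 1, 1, 1],
         [0, 1, 1, 0]]
print(pixel_errors(pred, truth))   # (0, 2)
print(box_missing(pred, truth))    # False
print(width_error(pred, truth))    # 1
```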

Results
Among the 60 OCT cubes, 40 cubes with 5120 images are used to train the proposed network, while the remaining 20 cubes are used to test the network and to compare it with the MF based method. The needle in each cube has a different pose and position, but all B-scan image sequences follow a pattern from no needle appearance to needle appearance, since the needle appearance is always continuous. To make the segmentation results of the cubes comparable, we mapped all B-scan indexes of a cube, in increasing order, into three piecewise intervals: (1) no needle above the tissue, (2) needle appearance, and (3) needle above the tissue but outside the OCT image range, so that a reflection exists. The indexes were mapped such that 0-25% is the first interval, 25-75% is the second interval, and 75-100% is the third interval. For example, if in an OCT cube the index of the first needle appearance image (taken from the ground truth) is 50, the index of the needle end image is 100, and the currently evaluated image has an index of 60, its metrics are placed at 25% + 50% × (60 − 50)/(100 − 50) = 35%, so that the needle end image falls at 75%. The metrics of this image are then sorted into buckets of 2%, and the values are averaged with the metrics of the other images in the same bucket.
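The index normalization and bucketing can be sketched as follows. The sketch assumes that each of the three intervals is mapped linearly onto its percentage range, so that the first needle image lands at 25% and the needle end image at 75%, that B-scan indexes start at 0, and that the needle end index is larger than the first needle index; the function names are illustrative.

```python
def normalized_position(index, first_needle, last_needle, n_images):
    """Map a B-scan index to a percentage on the common 0-100 axis."""
    if index < first_needle:
        # Interval 1: no needle above the tissue, mapped onto 0-25%.
        return 25.0 * index / first_needle
    if index <= last_needle:
        # Interval 2: needle appearance, mapped onto 25-75%.
        return 25.0 + 50.0 * (index - first_needle) / (last_needle - first_needle)
    # Interval 3: needle out of image range (reflection only), mapped onto 75-100%.
    return 75.0 + 25.0 * (index - last_needle) / (n_images - 1 - last_needle)


def bucket_means(samples, width=2.0):
    """Average metric values per bucket of the given width.

    samples holds (position_percent, value) pairs; the result maps the lower
    edge of each non-empty bucket to the mean value inside it.
    """
    last_bucket = int(100 // width) - 1
    sums = {}
    for position, value in samples:
        key = min(int(position // width), last_bucket)   # 100% joins the last bucket
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + value, count + 1)
    return {key * width: total / count for key, (total, count) in sorted(sums.items())}


position = normalized_position(60, first_needle=50, last_needle=100, n_images=128)
print(position)
print(bucket_means([(position, 1.0), (35.9, 3.0), (80.0, 5.0)]))
```

Normalizing in this way aligns the needle tip and needle end of every cube at the same positions on the horizontal axis, so metrics from cubes with different needle poses can be averaged bucket by bucket.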
The evaluation of the two methods under the four metrics is shown in Figure 6. Figure 6a compares the average number of FN and FP needle pixels for the two methods. The average number of FP pixels is almost 0 for both MF and FCN, which means that few background pixels are classified as needle pixels. Regarding the average number of FN pixels, both methods have problems with recognition at the beginning of the needle. This is probably caused by the unclear shadow of the small needle tip segment. However, the FCN performs better than the MF, which indicates that fewer needle pixels are incorrectly classified as background. The overall average FN pixel numbers for MF and FCN are 187.2 and 34.1, respectively. In particular, the FCN segments almost all of the needle pixels in the needle body part. Figure 6b shows the needle pixel accuracy rate of the two methods, which further indicates that the beginning of the needle tip part has a low accuracy rate, 0.19 for MF and 0.33 for FCN. Both methods give an acceptable accuracy rate along the needle body, with FCN performing better. The comparison of the bounding box detection accuracy rate is shown in Figure 6c. The rate of missing bounding boxes for the MF method is very high at the beginning, at around 80%, but rapidly drops to a steady 10% until around the middle. From the middle onwards, almost all bounding boxes are calculated correctly. The FCN method shows a similar pattern with better results: its missing bounding box rate is at most 66.6% at the beginning of the needle tip, and it misses almost no bounding boxes in the needle body part. Figure 6d gives the average width error of the bounding box detection; both methods are at the same level, with the maximum average width error under 1 pixel.
We also analyze the runtime of the two methods for each processing stage per B-scan image (see Tables 1 and 2). The FCN method has an average inference time of 121.83 ms, while the MF method has a shorter inference time of 25.6 ms, indicating that the MF method is better suited to time-sensitive use on the current platform. The variance of the FCN inference time is lower than that of the MF method because the number of operations in the FCN method is always the same, which suggests that the FCN method is more robust to the dataset and more general in application.
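To put the two inference times into perspective, the short calculation below converts them into a B-scan rate and into the time needed for one cube of 128 B-scans. It assumes that B-scans are processed one after another at the reported average time, which ignores any batching or pipelining.

```python
MF_MS_PER_BSCAN = 25.6      # average inference time reported for the MF method
FCN_MS_PER_BSCAN = 121.83   # average inference time reported for the FCN method
B_SCANS_PER_CUBE = 128


def throughput(ms_per_bscan):
    """Return (B-scans per second, seconds per cube) for sequential processing."""
    return 1000.0 / ms_per_bscan, ms_per_bscan * B_SCANS_PER_CUBE / 1000.0


for name, ms in (("MF", MF_MS_PER_BSCAN), ("FCN", FCN_MS_PER_BSCAN)):
    rate, cube_seconds = throughput(ms)
    print(f"{name}: {rate:.1f} B-scans/s, {cube_seconds:.2f} s per cube")

time_ratio = FCN_MS_PER_BSCAN / MF_MS_PER_BSCAN
print(f"FCN/MF time ratio: {time_ratio:.2f}")
```

Under this assumption the MF method processes about 39 B-scans per second and the FCN method about 8, a ratio of roughly 4.8.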

Discussion
This study modified and improved two methods based on previous work to tackle the challenges of needle segmentation in OCT images: the MF method and the FCN method. The MF based method mainly relies on the needle shadow feature, which is hand-crafted by researchers. The FCN method learns the features by itself, which is more efficient and less time-consuming to develop because it avoids the cumbersome hand-design phase. The two methods are evaluated on 60 OCT cubes comprising 7680 B-scan images captured on ex-vivo pig eyes. The experimental results show that the FCN method has better segmentation performance than the MF method in terms of the four evaluation metrics. Specifically, for the MF method and the FCN method, respectively, the overall average FN needle pixel number is 187.2 and 34.1; the average needle pixel accuracy rate is 90.0% and 94.7%; the average bounding box accuracy rate is 92.5% and 97.6%; and the average pixel error of the bounding box is 0.09 and 0.07. Although this study does not aim at the highest possible performance by tuning the FCN network, our results show that the deep learning method indeed yields a powerful model on our small dataset. Future work could try different deep learning methods and record more training data. Regarding the runtime, the average inference time of the MF method is more than four times shorter than that of the FCN method, 25.6 ms versus 121.83 ms. The MF method is a good choice when real-time information is required. The inference time of the FCN method could also be improved using parallel programming on a more advanced GPU platform. It may be possible to balance efficiency and complexity by fusing the two methods. It is also worth noting that the variance of the total inference time is lower for the FCN method than for the MF method, because the number of operations in the FCN stays the same for different input images.
Although both methods achieve an average needle pixel accuracy rate of 90.0% or higher, they have problems segmenting the needle tip, especially its very beginning. A potential reason is that the tiny needle segments in a B-scan, the image noise, and the shadows caused by light diffraction are easily confused with each other. The FCN method shows a clear improvement on the needle body part, with almost perfect segmentation results. This also indicates that our future work should focus on improving the segmentation performance on the tiny needle tips.

Conclusions
We studied the first step of obtaining the needle point cloud for needle pose and position estimation: needle segmentation in OCT images. We proposed two methods: a conventional needle segmentation method based on morphological features and a fully convolutional neural network method. They represent two typical approaches to image segmentation, one based on manually designed features (MF method) and one based on automatically learned features (FCN method). We analyzed our evaluation results in terms of segmentation performance and runtime requirements. Two important insights from our experiments are that the deep learning method produces a highly discriminative model and that the conventional method can easily achieve real-time performance. We believe that our work and insights will be helpful and can serve as a reference for future needle segmentation research in OCT images. It is also worth noting that the FCN method can segment not only the surgical tools but also the retinal tissue, which provides additional information to guide the positioning of the surgical tools, such as an alarm warning on the distance between the tool and the underlying retinal tissue.
In the future, we would like to collect more data and apply state-of-the-art deep learning models to them. Moreover, we will build a dedicated recognition model for the tiny needle tip in order to overcome the shortcomings of the current performance. A future design could integrate the two methods to speed up processing while improving performance: the conventional method could quickly pre-select the indexes of the B-scan images that contain the needle, and the deep learning method could then segment these B-scan images accurately.