Automatic Needle Route Proposal in Preoperative Neck CT for Injection Laryngoplasty

: Transcutaneous injection laryngoplasty (TIL) is a commonly used method to treat vocal fold paresis, where the affected vocal folds are augmented through injection. Determining the injection site and route is a major step during the preprocedural planning of TIL. In this communication, we propose and investigate an automatic method for needle route computation in preoperative neck CT. Recently, deep reinforcement learning (RL) agents showed noteworthy results for localizing the vocal folds. In this work, we focus on ﬁnding the optimal needle trajectory from the neck skin to the vocal folds localized by such RL agents. Identifying critical structures and constraints in the medical routine, we propose a minimal cost-based search to ﬁnd the optimal path. Furthermore, we evaluate the proposed method with neck CT volumes from 136 patients, where it is shown that our computed needle paths have high accuracy.


Introduction
Vocal fold paresis is a prevalent condition that causes a glottal gap leading to problems ranging from voice discomfort to breathing difficulty and other respiratory issues [1].Transcutaneous injection laryngoplasty (TIL) is an office-based vocal fold augmentation method for treating vocal fold paresis [2].In this method, augmentation material is usually injected through the cricothyroid membrane into the affected vocal fold to reduce the glottal gap [3].One of the major challenges of this procedure is to determine the optimal site and route of the injecting needle [4].This is because laryngeal anatomy varies across patients and the needle tip is not visible during the procedure under endoscopic guidance.Generally, neck CT scans are shown to be helpful for preoperative planning in injection laryngoplasty by providing a superior perception of the laryngeal anatomy [5].However, identifying the optimal needle route in the 2D view of CT slices is time-consuming and requires a substantial amount of prior experience.Thus, an automatic method for identifying the optimal needle route in preoperative CT can be useful for preprocedural planning.In addition, training residents in performing TIL is difficult due to the nature of this procedure being typically carried out in an office setting.Hence, having neck CT volumes with automated needle trajectory annotations can prove to be beneficial in this regard as well.
Existing works related to the optimal needle route for TIL are strictly limited to manual annotation approaches.Nasir et al. [3] presented a comparative study on needle routes based on neck CT volumes of different patients, where the needle routes were manually annotated.Hamdan et al. [4] and Lee et al. [6] suggested 3D printing of the larynx to improve the perception of the injection route before the TIL procedure.A comprehensive medical guide for determining the injection route was presented by Chhetri et al. [2].In our earlier work [7], we trained a reinforcement learning (RL) agent to automatically localize the right and left vocal folds in neck CT.To the best of our knowledge, no previous work has attempted to computationally identify the needle route in neck CT scans.
In this communication, we propose an automatic method to compute the optimal needle trajectory for TIL.Utilizing the vocal fold locations detected by the RL agent in our previous work [7], we determine the optimal needle route by scanning through all possible linear trajectories from the neck skin to the vocal folds.By carefully investigating the medical routine of TIL, we identify the critical regions and constraints that should be considered while defining the search space.To segment such regions consistently across different patients, we propose a reformulation of the geodesic distance-based segmentation [8].Finally, we define an appropriate cost function to find the optimal route from the trajectory search space.Furthermore, we experimentally evaluate the proposed method on 136 subjects, where the resultant needle routes are evaluated by an expert laryngologist.Our contribution can be summarized into the following points:

•
We propose the first automatic method to compute the optimal needle route in neck CT for TIL based on critical structure segmentation and minimal cost route search.

•
We perform a comprehensive evaluation on neck CT scans of 136 patients, which is carried out by an expert laryngologist.
We organize the rest of the paper as follows.In Section 2, we describe the proposed method in detail.We present and discuss the experimental evaluation of our method in Section 3. Finally, we present our concluding remarks in Section 4.

Methodology
We compute the optimal needle route based on the vocal fold locations detected by the RL agent [7].The input neck CT can be viewed as a 3D volume, where we denote the detected locations of the right and left folds by q r and q l , respectively.Our goal is to find the optimal needle paths p r q r and p l q l to the vocal folds, where p r and p l represents the optimal needle insertion points in the neck skin for the right and left folds.
Our needle route computation method consists of four major stages.First, we obtain the vocal fold locations in the neck CT using the previously proposed RL agent [7].Second, we identify and segment the critical regions for injection.Third, we define the search space containing all possible routes.Finally, we define a cost function to find the optimal needle route.Figure 1 illustrates the proposed method.In the following, we describe each stage of our method in detail.

Vocal Fold Localization
We obtain the vocal fold locations using the RL agent in our previous work [7].In standard deep learning approaches, either the coordinates of the target location [9] or a heatmap around it [10] is usually regressed.Coordinates are difficult to regress because the outputs do not have direct spatial correspondence.Spatial input-output correspondence is utilized in heatmap regression.However, this often suffers from sample bias [11] because the heatmap voxels are usually negligible compared to the huge background voxels.
RL adopts a pixel-to-pixel navigation framework coupled with deep feature learning, where an agent tries to gradually navigate to the target location through a sequential decision process.Recently, RL-based localization has recently shown significant outcomes in several studies [11][12][13] compared with standard deep learning methods.The application of RL for our vocal fold localization is also extensively studied in our previous work [7].
In the RL-based approach, the agent is initialized at a random position inside the CT volume and takes a number of steps to reach the vocal fold.At any position, it observes an ROI (region-of-interest) of 45 × 45 × 45 voxels around and decides an optimal move (i.e., left, right, up, down, superior, inferior to move one voxel in any axis).Relating to the standard terminologies of RL, the observed ROI is defined as the state and the six moves are the actions.We denote state by s and action by a.The optimal action is determined by the policy network π (i.e., a convolutional neural network) that takes the ROI as input and produces optimal action probabilities π(s, a) as output.Eventually after conducting a number of such moves, the agent converges to the target vocal fold location.
During policy training, the agent takes episodes of steps in different CT volumes.For each step, an action is rewarded +1 if it takes the agent closer to the target or −1, otherwise.Thus, a set of transitions is gathered where each transition (s, a, r) includes state s, action a, and the corresponding reward r.The goal of training is to maximize the expected cumulative reward.
The policy network used in the current work is exactly the same as the network used in our previous study [7].We have reused the same network without changing any parameters and tested with the current test data.The network has three convolutional blocks followed by two fully connected layers.Each convolutional block consists of two convolutional layers (kernel size: 3 × 3) with a trailing max pooling layer (pool size: 2 × 2; stride: 2).The final fully connected layer has six units to output the optimal probabilities of six actions.
Note that the vocal fold localization is not the focus of the current work.Moreover, we have only reused the same architecture from our previous study [7].Therefore, we do not include the elaborated training process here.Interested readers may refer to [7] for more details.

Critical Region Segmentation
In the usual TIL method, the needle is inserted through the cricothyroid membrane.The needle route takes a sub-mucosal way to reach the vocal folds while strictly avoiding the airway.If the needle pierces through the paraglottic membrane and enters the airway, it can cause bleeding and coughing during the procedure.Therefore, we first segment the airway in the preoperative neck CT. Figure 2 illustrates the segmentation process.

Initial segmentation:
The airway in neck CT can easily be obtained by intensity-based thresholding.Note that intensity in CT is encoded in the Hounsfield unit (HU) and air has an intensity of −1000 HU [14].The nearest material in the HU spectrum (i.e., fat) lies in −70 to −30 HU.Therefore, considering the HU tolerance for scanning air and fat-like material, we can safely segment the air by marking the voxels where the intensity is less than −200 HU.However, such segmentation would also include the air outside the body or neck.Airway segmentation: To separate the desired airway from the outer air region, we perform a geodesic distance-based segmentation [8].In such approach, the geodesic distance of the voxels with respect to a foreground seed is thresholded to determine the foreground voxels.Such segmentation is useful when the target region has even and continuous intensity.The intensity of the air in neck CT is also usually evenly distributed.However, there are two major challenges to incorporating this method for segmenting the airway, which are as follows:

Geodesic distance map
i Automatic localization of the foreground seed is difficult in our task because the airway location is inconsistent across patients due to anatomic and scan environment variation.ii Optimal geodesic distance threshold for airway segmentation can vary because of the patient-specific anatomic variation.
To overcome the seed localization challenge, we suggest segmenting the outer air first and then subtracting it from the initial segmentation result to obtain the airway.By doing so, we can ensure an accurate and automatic seed localization because outer air consistently resides in the anterior region of the CT volume, especially taking a large portion of the inferior-anterior area.To be specific, we identify the outer air seed q s at (10, 10, 10) voxels from the right anterior-inferior corner of the CT volume.
To overcome the optimal threshold variation problem, we propose a reformulation of the standard geodesic distance for segmentation.The proposed reformulation allows for segmentation with a fixed distance threshold across different subjects.
Reformulation of geodesic distance: Standard geodesic distance D(q) combines the spatial distance with the intensity difference along the path P from seed q s to the point q [8].At each point of the path, the combined difference is computed as a weighted L2-norm of the spatial and intensity gradients in the path direction at that point.Thus, the distance of a path refers to the summation of the combined differences at all the points in the path.The final geodesic distance D(q) is defined as the minimum distance of all the possible paths from seed q s to q. Formally, this can be expressed as follows: where Π qq s includes all possible paths from seed q s to q and ∇ P indicates the gradient along a path P. Therefore, ∇ P p and ∇ P I(p) are the spatial and intensity gradients, respectively.Usually, a voxel q is labeled as foreground if D(q) <= τ.Because of the inter-patient variation, fixing this threshold τ is difficult.
In our formulation, we weight the spatial gradient by the intensity gradient to find the combined difference instead of taking the L2-norm.Therefore, the reformulated distance is expressed as follows: In the new formulation, distance D(q) = 0 for the outer air region because the flat and continuous air region gives zero intensity gradients (i.e., |∇ P I(p)| = 0).Distance for the regions inside the neck is greater than zero because of the intensity differences.Because the target airway is disjointed from the air outside, the distance for this along with other regions inside the neck is also greater than zero.Thus, outer air can be segmented by simply finding the voxels {q| D(q) = 0}.The air inside (i.e., airway region) is then obtained by subtracting the outer air mask from the initial air mask.

Search Space
To define the search space for the needle route, we first identify the center of the vocal fold line (the segment that connects two vocal folds) as: p c = (p r + p l )/2.Then, we find the midline that passes through the center and is axially perpendicular to the vocal fold line (Figure 3).Thus, the equation of the mid-line is as follows: We also find the midpoint p m where the midline meets the skin.Computationally, we find this as the anterior-most point of the midline before the outer air region, which can be expressed as follows: where where p y is the component of position vector p along anterior-to-posterior axis.
Medically, the needle is inserted 5-10 mm lateral to the midpoint.Therefore, we also include all possible lines to the vocal folds from the skin points up to 10 mm (≈20 degrees) lateral to the midpoint p m .Thus, the search space Ω r for the right vocal fold spans 20 degrees to the right and the one for the left fold (Ω l ) spans 20 degrees to the left from the midpoint.Here, the center p c of the vocal fold line is the center of rotation.In Figure 3, e r and e l are the endpoints of the span of Ω r and Ω l .Considering the relative position of the cricothyroid membrane (i.e., the usual needle insertion region), each search space also spans 60 degrees inferior from the axial slice where the vocal folds are situated.
Finally, we refine the search space by excluding the lines that pass through the segmented airway.Thus, the refined search space holds all possible line segments from the skin to the target vocal fold, which do not enter the airway.

Optimal Needle Route Search
From this refined set of routes, we seek the route that faces minimal obstacles or rigidness so that the needle can be pushed through easily.Therefore, we define a cost function to represent this rigidness and find the lines from Ω r and Ω l that minimize the cost.It is important to note that we conduct separate searches for the right and left vocal folds using their respective search spaces.As a result, we obtain two distinct optimal needle routes, each used to reach a specific vocal fold.
In CT, intensity is directly proportional to the rigidness of a material.Therefore, a simple definition of the cost can be the sum of intensities over the route, which is as follows: where Γ r and Γ l indicate the optimal needle routes for right and left folds.However, summation may normalize thin rigidness (e.g., tiny calcifications) when the intensity sum is minimal despite containing a few high-intensity voxels.Hence, we use the maximum intensity over a route as the cost.Thus, the proposed optimal needle routes can be defined as follows: This helps avoid calcifications even if they are small.In addition, the needle insertion point is also ensured to be in the cricothyroid membrane region.Any route that instead passes through the surrounding thyroid and cricoid cartilages would give a larger cost due to the higher intensity of the cartilages.If the intensity sum is used, this high intensity is not reflected because cartilage are usually thin.

Data and Evaluation Method
To evaluate our method, we collected a neck CT dataset from Seoul National University Bundang Hospital (SNUBH), Seongnam, South Korea.A total of 136 neck CT volumes were acquired from 136 different patients during the period of October 2017-August 2021.All the patients were suspected of vocal fold paresis, whereas 29 patients among them were diagnosed with the paresis condition.Among them, 23 patients had unilateral paresis, and 6 patients had bilateral paresis.Table 1 summarizes our data.CT scan was performed using 64-row multidetector scanners (Brilliance; Philips Healthcare, Best, The Netherlands).A single-phase bolus injection of 100 cc of iodinated contrast media (Iohexol 300 mg/mL, GE Healthcare) was administered with 2 cc/s of flow rate.Image acquisition began 80 s after the injection, and the scan range was from the frontal sinus to the carina.The following imaging parameters were used: 120 kVp, 250 mAs, 0.835 pitch, 0.5 rotation time, 128 × 0.625 collimation, 1-2 mm slice thickness, and 0.5-1 mm increments.The resultant CT volumes had about 160 axial slices on average, where each slice had 512 × 512 voxels.The average voxel spacings were 0.44 mm, 0.44 mm, and 0.96 mm for the coronal, sagittal, and axial directions, respectively.
The final needle route results on the 136 test cases were presented to the expert at the same clinical site for evaluation.The expert evaluator (labeled as observer 1) in our study has more than ten years of experience as a TIL surgeon.We also obtained evaluation from a second observer (observer 2) who has one and a half years of experience with preoperative neck CT for vocal fold augmentation.Thus, we could determine if the final results are comparable to the inter-observer agreement.For evaluation, results were presented in DICOM using burn-in annotation with maximum intensity, where the needle path was indicated along with the vocal fold and needle insertion locations.The evaluators performed a binary evaluation, where each result was labeled as accurate or not accurate.Because the acceptable needle route is not a singular unique path, measuring the distance of the proposed path from the ideal one is not appropriate for the evaluation.Rather, the evaluators conducted a qualitative evaluation by carefully examining the results to determine the acceptability of the proposed route.A result was labeled accurate only if it was a safe and practically plausible route.To be specific, a result was considered to be accurate if the suggested route starts from the cricothyroid membrane and reaches the vocal fold without crossing or touching any critical regions.On the other hand, it was considered to be inaccurate if the route starts from any regions other than the cricothyroid membrane (e.g., thyroid/cricoid, etc.) or passes through the rigid cartilage.The route was also rejected if it came within 2 mm proximity of the paraglottic membrane.

Vocal Fold Localization Performance
Though the vocal fold localization is not the main contribution of this work, good localization accuracy is important because the needle route computation directly depends on it.In our previous work [7], extensive evaluations were performed which showed an error as low as 2.33 ± 1.17 mm.In the current study, we again focus on the acceptability or accuracy of localization (i.e., to verify whether the vocal fold is localized in the desired region with a safe margin from the paraglottic membrane).We report the localization accuracy in Table 2.We also present the accuracy for the paresis cases separately in this table.Overall, the vocal folds were localized with high accuracy of 93.02% and 94.12% according to observers 1 and 2, respectively.The localization accuracy was comparable to the inter-observer agreement of 96.69%.Similar to the findings of the previous study [7], the RL agent showed a greater performance over the standard end-to-end deep localization models (Table 3).[9] 79.41% 80.88% Deep Heatmap Regression [10] 85.29% 84.56% The above results are based on observer 1's evaluation.

Critical Region Segmentation Performance
To evaluate the performance of the proposed segmentation, we used the dice similarity coefficient (DSC) between the resultant segmentation mask and the ground truth region.Denoting the target foreground voxels in the segmentation mask as F and the voxels in the corresponding ground truth region as G, the DSC was calculated as 2|F ∩ G|/(|F| + |G|).
The ground truth region was manually obtained by using an interactive segmentation tool called ITK-SNAP [15].Both the outer air and airway segmentation are important because they play a major role in determining the needle route search space.We present the segmentation performance in Table 4.For the outer air segmentation, the standard geodesic distance-based method gave a DSC of 97.74%, whereas the proposed method gave a DSC of 99.54%.Similarly, for the airway segmentation, the standard distance resulted in a DSC of 96.50%, whereas the proposed reformulation resulted in a DSC of 97.98%.The improvement made by the proposed geodesic distance was significant showing a p-value < 0.02.In Figure 4, we present two examples of major improvement.Because the optimal distance threshold varies for different patients, the standard geodesic distance revealed under-segmentation and/or over-segmentation cases.On the other hand, the proposed reformulation allows us to determine the segmentation label based on zero-distance conditions consistently across all patients.Thus, it could generally provide a noteworthy improvement over the standard method.

Standard geodesic distance Proposed geodesic distance
Outer air Airway Outer air Airway

Needle Route Accuracy
Finally, we present the expert evaluation results for the computed needle routes in Table 5.We also present the volumetric renders of some example needle routes in different CT volumes in Figure 5.As mentioned earlier, a result is considered accurate only if it indicates a safe insertion path practically.A route with even a slight risk was labeled as inaccurate.The intensity sum-based cost function gave 84.19% and 84.93% accuracy with respect to observers 1 and 2, respectively.The proposed maximum intensity cost resulted in a further improvement (p-value < 0.05) showing 88.97% and 89.71%, respectively.Excluding the cases where the vocal fold localization was inaccurate, the needle route accuracy was 95.64% and 95.31%, respectively.The needle projection accuracy was considerably high, given that the inter-observer agreement was 95.58%.Figure 5 presents volumetric renders of the automated needle routes for different cases.The maximum intensity cost minimization could successfully avoid the thyroid and cricoid cartilages, placing the needle insertion points in the desired cricothyroid membrane.Moreover, the insertion points were also in the desired range laterally because our search space strictly followed the medically implied range after computing the midpoint.From this desired area, the proposed routes could reach the vocal folds without facing any obstacles.The entire process from vocal fold localization up to the optimal needle route search took about 9 s (as tested on a 3.60 GHz CPU and RTX 3080 GPU).Therefore, the proposed method can potentially be useful for guiding and accelerating preoperative planning.
The proposed method was robust under noisy CT volumes.The needle routes mainly relied upon airway segmentation and the minimal intensity path calculation.The airway segmentation was not affected by noise because the air has a clearly different intensity with a large margin of approximately 900 HU compared with other regions (fat, water, soft tissue, etc.).Moreover, the intensity of the thyroid cartilage is also high with a large difference (about 400 HU) from the other regions in the larynx.Therefore, the paths passing through cartilages were always avoided because the paths heading to the desired cricothyroid membrane constantly had minimal intensity even under the noise.In Figure 6, we present such a noisy case with a relatively low signal-to-noise ratio (SNR) of 7.27 dB whereas the average SNR was 9.42 dB.As mentioned earlier, no automatic needle trajectory planning work for TIL currently exists.Therefore, we could not directly compare our method to the existing works.A wide range of computational needle trajectory studies focused on liver tumor thermal ablation [16][17][18][19].Apart from the critical region, insertion angle to the liver capsule, interaction with healthy liver tissue, distance to critical structures (DCS), length of the path, etc. are considered constraints during the optimization in those works.However, these constraints are specific to the liver tumor ablation task and are unrelated to our study.Nonetheless, we implemented the path length and DCS constraints for a comparison with our intensitybased path planning.Table 6 compares the performance of different constraints for our trajectory planning.[19] 32.03% 33.84% Path length + DCS [17] 41.41% 44.62% The above results are based on observer 1's evaluation.
From Table 6, we can observe that the path length and DCS did not give fruitful performance because of their inappropriateness in our task.Since the distances of any points in the desired insertion area (cricothyroid membrane) to the target vocal fold are similar, the path length is not a critical constraint in our task.On the other hand, the shortest path is often pierced through the thyroid cartilage because this region is usually the closest to the vocal fold.Since our goal is to avoid cartilage-and calcification-like rigidness and propose a softer path from the cricothyroid membrane, the DCS constraint was also not helpful.The proposed intensity-based planning could always avoid such rigidness because calcification and cartilage have high intensity.Therefore, the insertion point was ensured to be in the desired membrane.

Limitations
Among the unsuccessful routes, about half of the cases were caused by inaccurate vocal fold locations.The remaining unsuccessful cases were labeled inaccurate because the resultant routes, despite not entering the airway, were almost touching the paraglottic membrane.Though such cases were few, the proposed search space and cost functions do not encode such a scenario.Therefore, we plan to incorporate a soft constraint into our cost function in our future work, which can also relate to the distance from the critical region.

Conclusions
To aid physicians in preoperative planning for TIL, we proposed the first computational method for automatically finding the optimal needle paths in neck CT.Each phase of the method was designed by closely relating to the medical routine and proposing appropriate image analysis techniques.The results of each phase were carefully evaluated, considering the practical plausibility of the proposed result.Overall, the needle paths computed by the proposed method had high accuracy and are shown to be potentially useful for preprocedural planning.

Figure 1 .
Figure 1.Automatic needle route planning for TIL.Based on the vocal fold locations and the segmented critical regions, the search space is defined by covering all possible trajectories from the anterior neck skin to the vocal folds.The optimal needle routes (one for each vocal fold) are then determined based on a minimal cost search.The first three figures (from left) are presented in axial view, whereas the last figure presents a volumetric render of the result.The blue and orange lines indicate the needle routes for the right and left folds, respectively.

Figure 2 .
Figure 2. Critical region segmentation.Initial air mask is obtained by HU thresholding.Outer air is then segmented based on the geodesic distance with respect to the outer air seed.The airway mask is obtained by subtracting the outer air from the initial air mask.

Figure 3 .
Figure 3.The search space for needle route.The initial search space is obtained by spanning 10 mm to the right (e r ) and left (e l ) from the midpoint p m on the skin w.r.t. and the vocal fold line.The final search space is obtained excluding the trajectories that pass through the airway.

Figure 4 .
Figure 4. Two examples (in top and bottom) of improved segmentation based on the proposed geodesic distance.Yellow region indicates the intersection of the resultant mask and the ground truth (i.e., true positive region).Green region refers to the over-segmented region (i.e., false positive region).Red region indicates the under-segmented region (i.e., false negative region).

Figure 5 .
Figure 5. Optimal needle routes by the proposed automatic method.The orange and blue lines in the render represent the route to the right and left vocal folds, respectively.The needle insertion lines are extended by 40 voxels beyond the skin for visualization purposes.

Figure 6 .
Figure 6.Needle routes in a noisy case.The critical regions (i.e., air and cartilage) are still clear under the noise because of their large intensity difference.(Left) axial view of the vocal folds.(Right) volumetric render of the resultant needle routes.

Table 3 .
Vocal fold localization performance of different methods.

Table 4 .
DSC of the segmentation result with respect to ground truth.

Table 6 .
Performance of different trajectory planning methods.