Imaging the Vocal Folds: A Feasibility Study on Strain Imaging and Elastography of Porcine Vocal Folds

: Vocal folds are an essential part of human voice production. The biomechanical properties are a good indicator for pathological changes. In particular, as an oscillation system, changes in the biomechanical properties have an impact on the vibration behavior. Subsequently, those changes could lead to voice-related disturbances. However, no existing examination combines biomechanical properties and spatial imaging. Therefore, we propose an image registration-based approach, using ultrasound in order to gain this information synchronously. We used a quasi-static load to compress the tissue and measured the displacement by image registration. The strain distribution was directly calculated from the displacement ﬁeld, whereas the elastic properties were estimated by a ﬁnite element model. In order to show the feasibility and reliability of the algorithm, we tested it on gelatin phantoms. Further, by examining ex vivo porcine vocal folds, we were able to show the practicability of the approach. We displayed the strain distribution in the tissue and the elastic properties of the vocal folds. The results were superimposed on the corresponding ultrasound images. The ﬁndings are promising and show the feasibility of the suggested approach. Possible applications are in improved diagnosis of voice disorders, by measuring the biomechanical properties of the vocal folds with ultrasound. The transducer will be placed on the vocal folds of the anesthetized patient, and the elastic properties will be measured. Further, the understanding of the vocal folds’ biomechanics and the voice forming process could beneﬁt from it.


Introduction
Our voice is the primary way of interpersonal communication. Not only for professional speakers or singers, many everyday routines rely on a functioning voice and sound production. The major part of the voice production takes place in the upper airway, beginning at the larynx. The vocal folds (VFs), located in the larynx, produce the primary sound. The air flow passes by and stimulates their oscillation [1]. The vocal tract finally forms the audible human voice.
The voice forming process as a whole is still being researched and not fully understood yet. The biomechanical properties of the VFs play a crucial role in the whole phonation process [2]. In this oscillating system, the eigenfrequencies are not only set by the spatial dimensions, but also by the mechanical properties. Changes in the elasticity are a good marker for benign and malignant changes. In addition, the diagnosis of voice disorders profits from new methods, which are able to examine the

Materials and Methods
The discussion of the materials and methods is divided into sections. At first, the anatomical properties of the human and porcine VFs are described. Based on this, the preparation of the porcine larynx and the gelatin phantoms is depicted. Further, the experimental setup for the mechanical testing of the gelatin phantoms is discussed. Finally, we will go into the image acquisition, image registration, and the following computation of the elastic properties.

Anatomical Structure of the Vocal Folds
The VFs, located in the larynx, are primarily responsible for the phonation process. Their geometrical measurements are 3-10 mm in the cranial-caudal, 10-20 mm in anterior-posterior, and 8-12 mm in medial-lateral direction [1]. They oscillate self-sustained with a high frequency and small amplitude. A study on ten individuals showed that the average fundamental frequency for males is around 174 Hz and for females around 278 Hz, and the lateral amplitudes are around 0.93 mm and 0.80 mm [19].
Histologically, three different main layers (Figure 1b) of the human VF can be separated: epithelium, lamina propria, and musculus (M.) vocalis, whereas the lamina is formed by sublayers (superficial, intermediate, and deep lamina propria) [20]. The epithelium is 0.05 mm thick, and the lamina propria is 1 mm thick [21]. The epithelium maintains the shape of the VF and protects the underlying tissue. The lamina propria mainly influences the elastic properties of the VFs. The M. vocalis acts actively as a muscle and passively as a stiff rubber band [22]. Primarily, the outer layers consist of extracellular matrix, whereas the inner layers consist of cells [1]. Different models were developed to describe the biomechanical properties and the oscillation of the VFs [23]. As a result of their complex structure, the elastic behavior of the VFs is nonlinear [1]. In the body-cover-model, the VFs are simulated by a two-layer continuum model, in which the body layer represents the M. vocalis and the deep lamina. Further, the cover is represented by the outer layers. The laryngeal muscles control the tension of the VFs: the body-tension is set by the contraction of the M. vocalis and M. cricothyroid, whereas the cover-layer has no contractile properties on its own [4]. Increasing the body tension leads to a decrease of the oscillation amplitude of it and concentrates the vibration on the cover-layer. Depending on the body-cover stiffness ratios, the phonation onset frequency is effectively controlled by either the body stiffness for small ratios or the cover stiffness for large ratios [4]. The contraction of M. vocalis may lead to an increase or decrease of the phonation frequency, whereas a contraction of M. cricothyroid always leads to an increase [1].
In this study, we used porcine VFs, which are quite similar to human VFs [24]. However, the porcine VFs lacks one layer of the lamina propria, and it is more challenging to distinguish the single layers. The thicknesses of the individual layers are comparable to the human VFs [21].

Preparation of Gelatin Phantoms and Porcine Larynx
To validate and test the algorithm, we used gelatin phantoms with known elastic properties. We used mechanical testing to measure the Young's modulus E and Poisson's ratio ν. Further, the results were used as initial values for the optimization of the finite element model (FEM).
Preparation of the gelatin phantoms: Gelatin is a widely-used material to substitute tissue for US testing [25]. The body of the phantom was made of 0.2 g/mL gelatin (Peter Pharma GmbH, Gelantine Rind) diluted with 0.01 g/mL of flour. The flour increased the acoustic reflection and the stiffness of the phantom [26]. The cuboid phantom had a base of 30 × 60 mm and a height of 30 mm. In the center of the phantom, we placed a cylindrical inclusion (diameter of 8 mm) with 0.5 g/mL of gelatin and 0.1 g/mL of flour. In that way, we had a phantom that had a stiffer core compared to the surrounding body. The elastic properties were considered homogeneous in these two regions. For both mixtures, pure phantoms with the same dimensions and elastic properties were produced. These pure phantoms were used in the following mechanical testing. The phantoms were stored at 4 • C and examined at room temperature.
Preparation of the larynx: We used porcine VFs (Figure 1a) for the ex vivo examination. The larynx was extracted from a pig at the local slaughterhouse (Erlangen). The surrounding tissue and the vestibular folds were removed. The remaining larynx was vacuumed and packed in plastic bags, shock frozen at −140 • C and stored at −80 • C. It was sent to Hall in Tirol by postal service in a thermally-insulated transport box. The larynx arrived frozen and was stored at −24 • C. One day before the examination, the larynx was thawed over night at a temperature of 5 • C. The larynx was examined in the holder shown in Figure 1a. The holder was specially designed and 3D-printed with PLA (polylactic acid). The larynx was fixed in it by two adjusting screws, pressing in the dorsal direction on the thyroid cartilage with flexible contact plates. On the posterior side of the larynx, a big contact plate ensured a tight grip. In the holder, we made sure the external forces on the larynx were minimized.

Mechanical Measurements of Gelatin Phantoms
To estimate the mechanical properties of the phantom, we used a compression test (Figure 2a). The compression test complied better with our examination method than the usual tensile testing. The phantom was compressed by a 3D-printed US transducer replica (not shown in the figure) fixed on a linear unit. The replica was used, because the cable of the US transducer would tamper with the force measurement. The linear unit was driven by a stepper motor. The steps of the motor gave information about the displacement of the transducer. Simultaneously, a load cell (HT Sensor TAL221 500 g, Data acquisition: SparkFun OpenScale) recorded the force on the surface of the phantom. The whole measurement process was controlled by MATLAB (MATLAB 2018b, The MathWorks, Inc., Natick, MA, USA). The engineering strain is given by: whereas ∆l is the measured change in length and l 0 the original length. The uniaxial normal stress σ was calculated by dividing the normal force F by the area A: σ = F /A. Young's modulus is the ratio of stress to strain: in the axial direction [27] (pp. [8][9]. To estimate Young's modulus E over a set of measurements, we used a linear regression on the stress-strain-curve ( Figure 2b). Deming regression was used, because it is useful for erroneous measurement results [28]. Young's modulus is the slope of the regression in the stress-strain-curve. In our case, the uniaxial Young's modulus of the body was E = 50 (± 1) kPa and of the inclusion E = 93 (± 2) kPa. The measured values were in the expected range of 40-110 kPa for gelatin [29,30]. Furthermore, Poisson's ratio ν of the body material was measured with the same setup. Additionally, an industrial camera (Lumenera LM135M) was placed in front of the phantom to estimate the height and the width. The images were processed by a Sobel edge detection, which identified the borders of the phantom [31] (p. 36ff.). The vertical ε yy and horizontal strain ε xx were calculated by Equation (1). Subsequently, Poisson's ratio: was estimated by a linear fit between the strains [32] (p. 88). The estimated Poisson's ratio for the body material was ν Body = 0.49, which fits the literature [33,34]. In the FEM of the phantom with the inclusion, Poisson's ratio of the body material was used, because it had a bigger impact on the overall Poisson's ratio.

Evaluation of the Image Data
The aim of the data evaluation process was to estimate the strain in the tissue and the elastic modulus of it. Several steps were necessary: (1) data acquisition (B-mode video sequence), (2) image filter, (3) image registration, (4) filtering of the displacement field (Savitzky-Golay filter), and (5) estimation of the strain distribution, respectively the elastic modulus ( Figure 3).

Image Acquisition and Preprocessing
The VFs and the phantoms were examined with the Alpinion E-Cube 15 EX, using the IO8-17 Transducer with a frequency range of 8-17 MHz. The examination of the phantoms was done in the setup shown in Figure 2a. The phantom with the inclusion was compressed 1 mm by the transducer. Examining the VFs, the transducer was directly placed on the surface in the position shown in Figure 1a. The examiner took care that the gap between the transducer and VF was completely filled with US gel. The tissue was compressed manually, and the examiner loaded the tissue in the caudal direction. Both compression processes were recorded as a B-mode video sequence.
The video sequence was saved in DICOM (Digital Imaging and Communications in Medicine) format and imported to MATLAB. The user manually chose a region of interest (ROI). We applied the following filters to prepare the frames for the image registration process: At first, the contrast was enhanced by a histogram equalization. Furthermore, the histograms of the images were matched to the first frame to avoid brightness differences. The metric of the image registration allowed only affine brightness differences [35]. To denoise the images, without loosing sharp edges, we used a non-linear anisotropic diffusion filter [36]. Then, the images were given to elastix for the registration process.
2.4.2. Image Registration elastix [35,37] is an open-source image toolbox for intensity-based image registration. The main task of image registration is to map one image to another and gain information about the spatial relationship between them. The main applications are aligning different image modalities, comparing images of different recording times, or preparing for segmentation and classification [35].
In our case, the challenge was to find the temporal relationship between frame I N (x) and the following frame I N+1 (x); whereas, x = [x, y] T is the position occupied by a material point [38]. The relation between I N (x) and I N+1 (x) is expressed by the transformation T(x). The optimal transformation perfectly aligns the two frames. To find this transformation, a cost function: was defined and minimized. To find the minimum, an iterative process was used [39]. In many cases, T is not well defined, because it is not necessarily possible to map every pixel in the first frame to a corresponding pixel in the following frame. For more details on the registration process itself, we refer to Klein et al. [35].
We used a non-rigid b-spline registration with a multiresolution approach [40]. We applied a normalized correlation coefficient (NCC) metric. The NCC metric had no spatial boundaries; therefore, regularization was necessary to avoid non-physical displacements (e.g., folding). We regularized the registration in two ways: First, we adjusted the maximum step size of the optimizer to the expected displacements. In that way, we avoided too large displacements. Secondly, we added a bending energy penalty P (T) to the cost function S(T, I N , I N+1 ) in Equation (4), resulting in a regularized cost function: C(T, I N , I N+1 ) = γ 1 S(T, I N , I N+1 ) + γ 2 P (T) The bending energy penalty smoothed the displacement field according to the bending of a thin metal plate [41,42]. The values of γ 1 and γ 2 weighted the influence of the two metrics on the registration process. The estimation of the values was an empiric and iterative process. In our registration, we used relative weighting, in which both metrics were weighted the same.

Displacement Field and Strain
The results of the registration were given back to MATLAB where the transformation T(x) was converted to the displacement field. The displacement field gave information about the spatial relationship of every material point between two frames. It consisted of the displacement vectors for all material points.
The displacement vector u(x, t) gave the relation between every voxel in the first frame to the same voxel in the following frame, which was estimated by the image registration.
Furthermore, strain was calculated. It was the change of length per unit length, which could be also expressed as the gradient of the displacement field [38].
The estimated displacement field was contaminated with noise. If the strain was directly calculated from the noisy displacement field, the noise was amplified. We used a Savitzky-Golay filter to reduce the noise, without losing local information [43]. The filter was applied in two ways: temporal and spatial. The temporal filter smoothed the displacement over a range of frames at the same location. The underlying assumption was that the displacement had no sudden changes over a minor time range. Secondly, the spatial filter smoothed the displacement field in one frame. Again, the assumption was that in adjacent regions, no sudden changes occurred, whereas continuous changes were mapped by the filter.
The strain in the tissue was calculated according to Lai et al. [44]. The displacement gradient for a two-dimensional displacement field is: Further, the deformation gradient F was calculated: I denotes the identity matrix. Subsequently, the 3D Eulerian strain tensor was determined by:

Elastic Properties
It is not possible to determine the elastic modulus E(x) directly from the displacement field or the strain image, so additional information is necessary to get quantitative results. The three-dimensional deformation of the tissue was measured in a two-dimensional plane. Therefore, we also reduced the model to a two-dimensional problem by making assumptions. We used the plane-stress approximation, which assumes that all out of plane stresses are zero [45]. The reconstruction of E(x) from a displacement field is an inverse problem. In our algorithm, we discretized the input to linear triangular elements. The linearity was only valid for small strains. Burks et al. [46] showed on ex vivo porcine VFs that the transition point from linearity to non-linearity was approximately at a strain value of 0.122. Consequently, we only evaluated measurements with lower strain values.
The quality of the estimation depends on the magnitude of the displacement field, due to the smaller signal-to-noise ratio [47]. To obtain greater displacements in the registration process, every hundredth frame was used. We solved this by using OpenQSEI, which is an open source FEM solver of elastic problems based on MATLAB [48].
The computation of E(x) is an inverse problem in the form of: in which U(E(x)) is the forward solution, e the assumed noise, and u(x, t) the measured displacement field. The assumed noise e was considered Gaussian-distributed and summed up all noise that occurred in the measurement process so far. The solution of Equation (10) can be found by solving: The observation noise covariance matrix is expressed by L T e L e = C −1 e ; furthermore, E L and E U define the lower and upper limit of E. The solution can be plagued by modeling errors, noise, and outlier data [47,49,50]. Therefore and because the inverse problem can be ill-posed, the computation needs a regularization term p E (E). The choice of p E (E) has a major impact on the result. We decided to use a Tikhonov uninformative regularization, because we expected no sharp features and a non-homogeneous background [47]. The Tikhonov regularization parameter α was chosen according to the noise level. If α was large, the regularization tampered with the solution. Consequently, if α was too small, the noise would destroy the solution. The right choice of α needs to account for those influences [51] (p. 123).
The choice of the boundary conditions is another crucial part of the FEM, and it has to be oriented on the physical settings (Figure 1a). In our case, the displacements at the tissue-transducer interface were close to zero (Figure 4). In the FEM, all displacements at this boundary were considered to be zero. The second boundary condition, which was used in the model, was a virtual force F v compressing the tissue (Figure 4). The force was seen as virtual, because in the reference system of the transducer, and consequently of the image, the tissue was compressed towards the transducer (in the positive y-direction). The real force, applied by the transducer, was compressing the tissue in the negative y-direction. Using those two boundary conditions, OpenQSEI solved the inverse problem (Equation (11)) with an iterative constrained Gauss-Newton optimization algorithm [48].

Measurements in Gelatin Phantoms
The gelatin phantoms were compressed by the linear unit, shown in Figure 2a. The resulting images were evaluated by the presented algorithm. Figure 5a shows the displacement field in the phantom. The strain image (Figure 5b) made the differences in the elasticity visible. It is easy to see that the region of the inclusion shared the same strain. The algorithm showed this applicability to B-mode images and its capability to estimate strain distribution.
The estimated elastic modulus E was projected on the US image (Figure 5c). The parameters for QSEI were adopted for the special requirements. In that case, we used the virtual force F v = 1.5 N, which was the force compressing the phantom. The start value for the optimization of the elastic modulus was E = 100 kPa. The Tikhonov regularization parameter was α = 10 −3 . Figure 5c shows that the measured elastic modulus was in the same range as the mechanically-measured one. The inclusion is clearly visible.

Ex Vivo Measurements on Porcine Vocal Folds
The images of the porcine VFs were analyzed with the same algorithm. The estimated displacement field (Figure 6a) and the strain field (Figure 6b) were mapped on the US images. The FEM solution (Figure 6c) was found with F v = 1 N and the start value for E = 10 kPa. The Tikhonov regularization parameter was chosen α = 10 −5 .

Discussion
The goal of this work was to show a new approach to measure the elastic properties of the VFs. At first, we showed the applicability of our algorithm on gelatin phantoms and compared the results with mechanical measurements. Later, we used the same algorithm on porcine VFs, and it showed feasible results. The discussion of the results is separated into three sections: (1) the discussion of the measurement method and the algorithm, (2) the discussion of the measurement on VFs, and (3) potential applications and outlook.

Examination Method and Algorithm
The examination method and the algorithm itself consisted of several steps and interfaces between different software tools ( Figure 3). Every step was optimizable and potentially erroneous, so we will discuss the individual steps. First of all, the image acquisition on the VFs was user dependent, because the VFs were compressed manually. The manual compression, therefore the strain field, was difficult to reproduce [16]. The shear-wave elastography could reduce the user dependency, bringing up other technical challenges [52]. For example, the choice of the different imaging and stimulation frequency was challenging.
Due to the measurement on ex vivo VFs, the elastic modulus was measured on relaxed muscles. For this reason, the presented results were only valid for muscles free from tension. Our approach makes anesthesia of the patient mandatory, which leads to relaxation of the muscles. Therefore, the measurement on muscles under tension is not possible. The transcutaneous approach was able to display the VFs under movement [53,54].
The measurement method and the algorithm were validated on rectangular gelatin phantoms with a central inclusion. In this case, the proposed method showed good results. The geometrical and elastic properties of these phantoms are different from the ones of the VFs. Therefore, the validation could be refined in two ways. On the one hand, the validation could be done on artificial VFs, or on the other hand on a series of measurements of porcine VFs. Our future goal was to validate it in both suggested ways.
The image registration of the video sequence was very time consuming. Every frame took around 10-15 min to register. An optimization of the calculation time was sought. The calculation could be reduced by reducing the iteration number of the optimizer, which possibly reduced the registration precision. Further investigations on the minimum number of iterations could be beneficial. Additionally, the registration could be done on a GPU [37].
The filter parameters, specifically the filter window and order of the polynomial, for the Savitzky-Golay filter of the displacement field are empirical values. The exact choice needs to be validated and optimized. Furthermore, the parameters (e.g., boundary conditions and Tikhonov regularization parameter) of the FEM solver were chosen manually. The boundary conditions of the FEM were an approximation of the real ones, and the measurement of the forces in the tissue was not done. To use this approach in non-lab settings, further research on the FEM estimation would be necessary. Especially, an exact integration of the displacement at the borders would be desirable. As a result of the approximated boundary conditions, the corners of the measured elastic modulus of the phantom (Figure 5c) appeared to be very high.
As a result, the algorithm will be enhanced. Especially, the automatic choice of the mentioned parameters and a shorter calculation time would be required for clinical applications.

Measurements on the VFs
The described measurement on the VFs was a new method and to our knowledge the first of its kind. That makes a direct comparison difficult. We compared our approach with alternative methods: with other medical imaging techniques, with different placements of the US transducer, and with mechanical measurements on porcine VFs.
First, we will compare our results to other medical imaging techniques, like OCT and magnetic resonance imaging (MRI) [55]. The optical methods (e.g., videostroboscopy, videokymography, and videoendoscopy) measure the movements and strains on the VF's surface [56]; whereas in our study, quasi-static loads on the VFs were evaluated. Due to the different nature of these methods, a reasonable comparison of the two approaches is not possible. Consequently, we will only discuss medical imaging techniques, which are able to image tissue in depth, notably US, OCT, and MRI.
The OCT examination of the VFs is a promising approach, due to the high-resolution of the images [57][58][59][60][61][62]. Imaging the different layers of the VFs, the OCT surpassed the US imaging [59,62]. However, the penetration depth in the mentioned studies was a maximal of 2.6 mm, whereas the penetration depth of the US was much higher, in our case 20 mm [59]. As a consequence, deeper layers can be displayed. Therefore, the M. vocalis was clearly visible on our images (Figure 6c). The resolution loss through US can be compensated by using higher frequencies. Up to 50 MHz, with a penetration depth of 5 mm, have been used to examine the VF tissue [63].
The MRI images of the phonation process measured the anatomical structures of the VFs. Measurements of the elastic properties with MRI have been shown in other organ systems [64].
Furthermore, the vocal tract [65,66] and artificial VFs [67] have been examined with MRI. The tissue needs to be excited by an internal or external source [64]. This would be one of the challenges of using magnetic resonance elastography to examine the VFs.
Ultimately, it was not clear that there was a superior method. All methods had their advantages and disadvantages. The US had a deeper penetration depth, whereas the OCT had a good resolution, and MRI was able to show the VFs during phonation. Further research in the elastic imaging of the VFs is necessary, in order to choose the best method [9]. Our USE approach had the advantages of being minimally invasive, not exposing the patient to radiation, using a standard US machine, and being well studied.
In addition, we will compare different transducer placements. In our study, we placed the transducer directly on the VFs, the so-called direct approach. However, other researchers placed the transducer on the skin of the patient, which is called the transcutaneous approach [10,11,[13][14][15]54,[68][69][70]. In the direct approach, it was easy to identify the VFs, whereas in transcutaneous approaches, the visualization rate was only 36.7% for the true VFs [71]. Those difficulties were based on two main factors: the VFs's anatomical position posterior of the thyroid cartilage and the physical limits of the US method itself. The penetration depth of the US decreased with the frequency, while the resolution increased [13]. Consequently, it was difficult to gain high-resolution images of the VFs in a transcutaneous approach. In the direct approach, we overcame both limitations: this allowed us to exactly place the transducer on the VFs, and we could use higher frequencies for good spatiotemporal resolution. Adversely, the spatial conditions in the human larynx are quite small, so a specially-designed transducer for in vivo measurements would be necessary. Additionally, in clinical use, the patient would have to be anesthetized, which makes the examination more invasive than the transcutaneous approaches.
Furthermore, we compared our results to mechanical measurements of the VF's elastic properties. Various measurement protocols (e.g., traction testing, shear rheometry, linear skin rheometry, indentation testing) have been used to estimate the elastic modulus of porcine VFs [1]. Using the traction testing approach on six porcine VFs, an elastic modulus of 16.3 (± 2.9) kPa was found [72]. Besides, Burks et al. [46] measured an elastic modulus of 17.86 (± 7.91) kPa in the low strain region using a traction testing approach based on digital image correlation (DIC). Both methods are also static methods, which measure the properties ex vivo, so the results were comparable with ours. In Figure 6c, we see that the estimated elastic modulus in the present study was in the same range.
On the other hand, a similar DIC-approach found a elastic modulus of 40.5 (± 15.7) kPa [73]. Miri et al. [73] explained the inconsistency with the different measurement methods, which led to an underestimation of the elastic modulus in mechanical measurements. The indentation testing method reported lower values between 1.6 and 5.7 kPa [1]. Those values were below the estimated values in our work. This could be a consequence of the highly different measurement methods.
Therefore, no conclusion can be drawn about the frequency dependency of the elastic modulus. Therefore, further research with vibration VFs would be necessary.

Potential Applications and Outlook
The measurement of the elastic properties of the VFs could be beneficial for diagnostic methods of voice disorders, but further research is necessary [3,4]. Moreover, the results of our study can be used in refining models of the VFs. The knowledge of the spatial distribution of the displacement, the strain, and the elastic modulus can be computerized and used for models of the VFs. We showed that it was possible to gain strain images and the spatial distribution of the elastic modulus of the true vocal folds ex vivo by an image registration approach. We will improve our algorithm, automate the parameter estimation for the image registration and the FEM, and test the whole measurement on a series of porcine larynges.