Random Fourier Features-Based Deep Learning Improvement with Class Activation Interpretability for Nerve Structure Segmentation

Peripheral nerve blocking (PNB) is a standard procedure to support regional anesthesia. Still, correct localization of the nerve structure is needed to avoid adverse effects; thus, ultrasound images are used as a guidance aid. In addition, image-based automatic nerve segmentation through deep learning has been proposed to mitigate ultrasonography issues such as attenuation and speckle noise. Notwithstanding, complex architectures highlight the region of interest while lacking suitable interpretability of the features learned from raw instances. Here, a kernel-based deep learning enhancement is introduced for nerve structure segmentation. In a nutshell, a random Fourier features-based approach is used to complement three well-known semantic segmentation architectures: the fully convolutional network (FCN), U-net, and ResUnet. Moreover, two ultrasound image datasets for PNB are tested. The obtained results show that our kernel-based approach provides a better generalization capability, as measured by image segmentation assessments on different nerve structures. Further, for data interpretability, a semantic segmentation extension of Grad-CAM++ for class activation mapping is used to reveal the relevant learned features separating nerve from background. Thus, our proposal favors both straightforward (shallow) and complex (deeper) neural network architectures.


Introduction
Recently, regional procedures have arisen as an attractive alternative to general anesthesia in the context of medical surgeries, enhancing post-operative mobility and reducing mortality and morbidity [1]. In this sense, peripheral nerve blocking (PNB) is a widely used method that involves the administration of an anesthetic substance in the area surrounding a nerve structure to block the transmission of nociceptive information [2]. Nevertheless, the success of PNB depends on the precise localization of the nerve structure, avoiding adverse effects such as neurological damage or intoxication due to the flow of the anesthetic into the bloodstream [3]. Concerning this, ultrasonography has been used to support PNB. This technique improves targeting accuracy by enabling real-time visualization of the nerve at low cost, while also being non-invasive and radiation-free [4].
Conventional 2D ultrasound images carry different challenges, such as attenuation, artifacts, and speckle noise-based disturbances, which make nerve localization by visual inspection difficult.

In turn, approaches based on explicit kernel approximation seek to estimate the nonlinear mapping directly. The fundamental work in [34] introduces the random Fourier features (RFF) estimator, founded on Bochner's theorem for stationary kernels [35]. The RFF approach tackles the significant computational and storage costs of kernel matrices [36]. In addition, several RFF variants (most of them approximating a Gaussian-based mapping) have been proposed to improve the computational cost and learning performance. For instance, the Fastfood algorithm employs structured matrices to reduce the RFF's computational time from O(QP) to O(Q log P), where Q is the number of random features and P is the input data dimensionality [37]. Later, the orthogonal random features (ORF) approach computes the RFF's Gaussian matrix through an orthogonalization procedure [38]. Likewise, the structured orthogonal random features (SORF) method extends the ORF technique using a normalized Walsh-Hadamard matrix [38]. The butterfly-based quadrature (BQ) variant handles butterfly matrices to improve performance [39]. Notwithstanding, RFF-based techniques depend on a trade-off between accuracy and computational burden [40].
This work proposes a kernel-based strategy to support nerve structure segmentation from ultrasound images using deep learning. Concerning this, an RFF-based approach is employed to approximate a Gaussian kernel's implicit mapping within three well-known architectures for image-based semantic segmentation. In particular, the FCN [12], U-net [14,41], and ResUnet [16] are studied. Our RFF-based improvement aims to provide a better generalization capability for ultrasound image-based nerve segmentation using both straightforward and complex architectures. For concrete testing, we coupled an RFF layer within the bottleneck end for the U-net and ResUnet architectures; meanwhile, the last pooling layer was used to locate our kernel enhancement in FCN. Moreover, two ultrasound image datasets were tested. The former belongs to the Universidad Tecnológica de Pereira and the Santa Mónica Hospital, Dosquebradas, Colombia, holding ultrasound images of sciatic, ulnar, median, and femoral nerves. The latter is a Kaggle Competition dataset [42], gathering ultrasound images of the brachial plexus (BP). For data interpretability, a semantic segmentation extension of the gradient-weighted class activation mapping (Grad-CAM++) strategy was applied [43], which aims to visually test the deep learning model's ability to learn relevant features separating nerve from background. Specifically, a Grad-CAM++ extension of the seminal work in [44] for semantic segmentation was proposed to capture the entire object's completeness. Then, an explanation map-based quantitative assessment was carried out for relevance analysis. The obtained results prove that our RFF-based improvement facilitates the discrimination between nerve structure and background while preserving data interpretability concerning the highlighted image regions.
The remainder of this paper is organized as follows. Section 2 depicts the materials and methods. Sections 3 and 4 present the experimental setup and the results obtained. Finally, Section 5 shows the concluding remarks.

Deep-Learning-Based Semantic Segmentation Fundamentals
Let {(I_n, M_n)}_{n=1}^N be an input-output set holding N labeled images, where I_n is the n-th image with R rows and C columns. The mask M_n encodes the one-hot membership of each pixel of I_n to the target class. For simplicity, gray-scale images and a binary segmentation problem are considered, e.g., background vs. object of interest.
A deep learning architecture for semantic segmentation often includes a stack of convolutional layers fed by the input images to predict each pixel's label by exploiting local spatial correlations. Thereby, let {W_l ∈ R^{P_l×P_l×D_l}}_{l=1}^L be a set of convolutional layers, where P_l and D_l denote the l-th layer's kernel size and number of filters, respectively (L is the number of convolutional layers). Given an input image I, a prediction mask M̂ ∈ [0, 1]^{R×C} can be computed as:

M̂ = (ϕ_L ∘ ϕ_{L−1} ∘ ⋯ ∘ ϕ_1)(I), (1)

where F_l = ϕ_l(F_{l−1}) = ν_l(W_l ⊗ F_{l−1} + b_l) ∈ R^{R_l×C_l×D_l} is a tensor holding D_l feature maps at the l-th layer, ϕ_l : R^{R_{l−1}×C_{l−1}×D_{l−1}} → R^{R_l×C_l×D_l} is a representation learning function, b_l ∈ R^{D_l} is a bias vector, and ν_l(·) is a nonlinear activation function, e.g., the rectified linear unit ReLU(x) = max(0, x). Notation ∘ stands for function composition and ⊗ for image-based convolution. Note that F_0 = I and F_L = M̂, where ν_L(·) can be fixed as a sigmoid or softmax function for bi-class and multi-class segmentation, respectively. In turn, the prediction accuracy relies on the parameter set θ = {W_l, b_l}_{l=1}^L, yielding:

θ* = arg min_θ E{L(M_n, M̂_n) : n ∈ {1, ..., N}}, (2)

where L : {0, 1}^{R×C} × [0, 1]^{R×C} → R is a given loss function and E{·} stands for the expected value operator. The optimization problem in Equation (2) can be solved through mini-batch gradient descent using back-propagation and automatic differentiation [45]. Concerning this, the representation learning stage depicted by the composition in Equation (1) can be built from several deep learning architectures. Consequently, the three most relevant approaches devoted to image-based semantic segmentation are briefly described:

- Fully convolutional network (FCN) [12]: known as the foundational semantic segmentation architecture, it avoids computational redundancy and replaces fully connected layers with convolutional ones. FCN is based on the well-known "very deep convolutional network for large-scale image recognition" model (also known as VGG-16) [46].
- U-net [14]: this approach aims to extract low-level features while preserving high-level semantic information. Moreover, the U-net algorithm aims to alleviate training problems related to a limited number of samples [47]. Remarkably, the U-net architecture includes an encoder and a decoder stage, forming a U-shaped network.
- Residual network and U-net (ResUnet) [16]: this approach enhances the U-net algorithm by including residual blocks. Thereby, residual learning is employed to cast the model layers as residual functions referenced to their inputs, instead of learning unreferenced mappings; that is, the enhanced feature maps can be rewritten as F_l = ϕ_l(F_{l−1}) + F_{l−1} [48]. Then, ResUnet combines low- and high-level features, favors network optimization, and includes a deeper representation learning stage than the U-net and FCN approaches.
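To make the residual formulation F_l = ϕ_l(F_{l−1}) + F_{l−1} concrete, the following NumPy sketch implements a toy residual block, where a per-pixel linear map plus ReLU stands in for the convolution ν_l(W_l ⊗ F_{l−1} + b_l); the shapes, weights, and 1×1 operation are illustrative assumptions, not the actual ResUnet layers:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_block(f, w, b):
    # Toy 1x1 "convolution" on an (H, W, D) feature map: a per-pixel
    # linear map followed by ReLU, standing in for nu(W (x) F + b).
    return relu(f @ w + b)

def residual_block(f, w, b):
    # Residual learning: the block outputs phi(F) + F, so the layers
    # fit a residual function referenced to the input.
    return conv_block(f, w, b) + f

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8, 4))       # F_{l-1}: an 8x8 map with D = 4 filters
w = rng.standard_normal((4, 4)) * 0.1    # toy weights (D_in = D_out for the skip)
b = np.zeros(4)

out = residual_block(f, w, b)            # same shape as the input map
```

With zero weights the block reduces to the identity mapping, which illustrates why residual blocks ease the optimization of deeper networks.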

Random Fourier Features Approximating Kernel Mappings
To improve the generalization capability of deep learning approaches for semantic segmentation, we propose to include a kernel mapping-based layer within the network architecture. For such a purpose, let x, x′ ∈ R^P be a pair of samples from a real random variable of dimension P. The well-known kernel trick indirectly computes the inner product between implicitly generated features from any pair x, x′ using a kernel function κ : R^P × R^P → R, so that κ(x, x′) = ⟨φ(x), φ(x′)⟩, where φ : R^P → H defines an implicit mapping to an "infinite-dimensional" Hilbert space H.
Due to the intractable mapping, kernel approaches demand high computational and storage costs for large training sets. The random Fourier features (RFF) method lightens the computational burden by taking advantage of Bochner's theorem for shift-invariant kernels, i.e., κ(x, x′) = κ(x − x′) [34]. Namely, a function κ(x − x′) is positive definite if and only if its Fourier transform is a non-negative measure, as follows:

κ(x − x′) = ∫_{R^P} p(ω) e^{jω⊤(x−x′)} dω = E_ω{ζ_ω(x) ζ_ω(x′)*}, (3)

where ζ_ω(x) = e^{jω⊤x}, so that ζ_ω(x)ζ_ω(x′)* is an unbiased estimate of κ(x − x′) when ω ∈ R^P is drawn from p(ω), and * stands for the complex conjugate. Moreover, since both the probability density p(ω) and the kernel function κ(·) are real, the integral in Equation (3) converges by replacing the exponential with a cosine. So, a real-valued mapping that satisfies the condition E{φ̂_ω(x) φ̂_ω(x′)} = κ(x − x′) can be written as φ̂_ω(x) = √2 cos(ω⊤x + b), when ω ∼ p(ω) and b ∼ U(0, 2π). Since the expected value of φ̂_ω(x)φ̂_ω(x′) converges to κ(x − x′), the estimator's variance can be reduced by concatenating Q randomly chosen mappings (normalized by √Q); then, the following approximation arises [49]:

φ̂(x) = √(1/Q) [φ̂_{ω_1}(x), φ̂_{ω_2}(x), ..., φ̂_{ω_Q}(x)]⊤, (4)

where φ̂ : R^P → R^Q. Overall, the Gaussian kernel is preferred because of its universal approximating property and mathematical tractability [50]. Then, for κ(x − x′) = exp(−‖x − x′‖²₂/2σ²), with σ² ∈ R₊ a given bandwidth, its Fourier transform yields p(ω) = N(0, σ^{−2} I), where 0 and I are an all-zero vector and the identity matrix of proper size.
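As a numerical illustration of the RFF estimator, the NumPy sketch below draws ω from the Gaussian kernel's spectral density and b ∼ U(0, 2π), and checks that the inner product of the random features approximates the exact Gaussian kernel; the values of Q, σ, and the data are illustrative assumptions:

```python
import numpy as np

def rff_map(X, Q, sigma, rng):
    """Random Fourier features for the Gaussian kernel
    exp(-||x - x'||^2 / (2*sigma^2)): omega ~ N(0, I/sigma^2), b ~ U(0, 2*pi),
    phi_hat(x) = sqrt(2/Q) * cos(omega^T x + b)."""
    P = X.shape[1]
    omega = rng.standard_normal((P, Q)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=Q)
    return np.sqrt(2.0 / Q) * np.cos(X @ omega + b)

rng = np.random.default_rng(42)
X = rng.standard_normal((50, 5))
sigma = 2.0

Z = rff_map(X, Q=5000, sigma=sigma, rng=rng)
K_hat = Z @ Z.T                                  # <phi_hat(x), phi_hat(x')>

sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dist / (2.0 * sigma ** 2))        # exact Gaussian kernel

err = np.abs(K - K_hat).max()                    # shrinks as O(1/sqrt(Q))
```

Increasing Q tightens the approximation at a linear cost in Q, which is the accuracy/burden trade-off discussed above.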
Afterward, to provide a better generalization capability founded on kernel mappings, an RFF-based layer from Equation (4) can be used to enhance the semantic segmentation architectures exposed in Section 2.1, by adding φ̂(·) to the function composition in Equation (1). In particular, we propose to add the RFF layer after the last pooling in FCN (see Figure 1). Similarly, the RFF mapping is added after the bottleneck end for the U-net and ResUnet architectures (see Figure 2).

Figure 2. RFF-U-net architecture for semantic segmentation. U-shaped network to preserve both low- and high-level features using an encoder-decoder approach. The RFF layer is located at the bottleneck (red block). The number of filters, kernel size, stride, and nonlinear activation function are depicted.

Relevance Analysis Based on Class Activation Mapping for Semantic Segmentation
Deep learning provides the most effective approach for today's intelligent systems; however, its prediction success is limited by the inability to explain its decisions to human users (interpretability). Therefore, highlighting the most relevant features for discriminating between nerve and background could help visualize what lies under the hood of a neural network [51]. Here, a semantic segmentation extension of the gradient-weighted class activation mapping (Grad-CAM++) algorithm [43] is used to provide an efficient data interpretability strategy, revealing fine-grained image details to capture the entire nerve's completeness.
Let S_l(λ) ∈ R^{R×C} be a class-specific upsampled saliency map regarding the output label λ ∈ {0, 1}, as follows:

S_l(λ) = ReLU( Σ_{d=1}^{D_l} γ_l^d(λ) µ(F_l^d) ), (5)

where F_l^d ∈ R^{R_l×C_l} stands for the d-th feature map at layer l computed for a given input image I, µ : R^{R_l×C_l} → R^{R×C} is an upsampling function, and γ_l^d(λ) ∈ R₊ is a saliency weight:

γ_l^d(λ) = Σ_{i,j} A_{ij}^d(λ) ReLU( ∂y(λ)/∂F_{l,ij}^d ), (6)

where A_{ij}^d(λ) ∈ A_l^d(λ), with A_l^d(λ) ∈ R^{R_l×C_l}, and:

y(λ) = 1⊤(G ⊙ M_λ)1 (7)

gathers a class-conditional score, where G = W_L ⊗ F_{L−1} + b_L ∈ R^{R×C} holds the last layer's linear activations, M_λ ∈ {0, 1}^{R×C} masks the pixels belonging to class λ, 1 is an all-ones column vector of proper size, and ⊙ stands for the Hadamard product [44]. Following the algorithm proposed in [43], the matrix A_l^d(λ) in Equation (6) can be computed as:

A_{ij}^d(λ) = ( ∂²y(λ)/∂(F_{l,ij}^d)² ) / ( 2 ∂²y(λ)/∂(F_{l,ij}^d)² + Σ_{a,b} F_{l,ab}^d ∂³y(λ)/∂(F_{l,ij}^d)³ ). (8)

It is worth mentioning that the weighted combination in Equation (6) allows our CAM-based approach to deal with different object orientations and views; meanwhile, the ReLU-based thresholding in Equations (5) and (6) constrains the relevance analysis to gather only positive gradients into S_l(λ), indicating visual features that increase the output neuron's activation rather than suppress it [52].
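Since the saliency weights only involve the feature maps and the score's first-, second-, and third-order gradients, they can be sketched numerically. The toy below uses an exponential score y = exp(c·ΣF) so those gradients have closed forms; the score function, the constant c, and the map sizes are illustrative assumptions, not the network's actual outputs (in practice the gradients come from automatic differentiation):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def grad_campp_gamma(F, g1, g2, g3, eps=1e-12):
    # alpha_ij = g2 / (2*g2 + sum_ab F_ab * g3): Grad-CAM++-style pixel weights;
    # gamma = sum_ij alpha_ij * ReLU(g1): the saliency weight of one feature map.
    denom = 2.0 * g2 + F.sum() * g3
    alpha = g2 / np.where(np.abs(denom) > eps, denom, eps)
    return np.sum(alpha * relu(g1))

def saliency(F_maps, gammas):
    # S = ReLU(sum_d gamma_d * F_d); the upsampling step is omitted for brevity.
    return relu(sum(g * F for g, F in zip(gammas, F_maps)))

rng = np.random.default_rng(1)
F_maps = [np.abs(rng.standard_normal((7, 7))) for _ in range(3)]
c = 0.5
gammas = []
for F in F_maps:
    y = np.exp(c * F.sum())            # toy score with closed-form derivatives
    g1 = np.full_like(F, c * y)        # dy/dF
    g2 = np.full_like(F, c ** 2 * y)   # d2y/dF2
    g3 = np.full_like(F, c ** 3 * y)   # d3y/dF3
    gammas.append(grad_campp_gamma(F, g1, g2, g3))

S = saliency(F_maps, gammas)           # non-negative 7x7 saliency map
```

The final ReLU guarantees a non-negative map, matching the positive-gradient constraint discussed above.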

RFF-Based Semantic Segmentation Pipeline and Main Contributions
In summary, we propose a twofold deep learning pipeline for nerve structure segmentation. (i) An RFF-based layer (as discussed in Section 2.2) is coupled with three well-known shallow and complex architectures (as discussed in Section 2.1). To the best of our knowledge, this is the first attempt to combine kernel mappings within shallow and deep models to support nerve structure detection from 2D ultrasound images. (ii) A CAM-based extension for semantic segmentation (see Section 2.3) from RFF-based mappings is proposed to highlight the most relevant features (image regions) favoring the discrimination between nerve and background. Figure 3 depicts our RFF-based semantic segmentation pipeline. For concrete testing, we apply our RFF-based proposal within the FCN [12], U-net [14,41], and ResUnet [16] approaches. Our main aim is to improve the representation learning of deep models using robust kernel mappings. Then, the RFF layer is placed after the last pooling block in FCN (see Figure 1), and our kernel-based enhancement is located at the bottleneck block in the U-net and ResUnet models (see Figure 2). Of note, the ResUnet architecture reformulates the U-net model through residual blocks (see Figure 4). In addition, given the RFF's O(QP) computational cost, we place the layer where the number of features is lowest; preliminary experiments that located the RFF layer at different positions confirmed these placements.

Experimental Setup
Our RFF-based deep learning enhancement approach was tested as a tool to support nerve structure segmentation from 2D ultrasound images. In particular, such a semantic segmentation task was studied on three well-known deep learning architectures: U-net, ResUnet, and FCN (as exposed in Section 2). Thereby, we aimed to demonstrate the discriminative capability and interpretability benefits of our kernel-based improvement. In the following, we present the studied datasets; next, we describe the quantitative assessment, method comparison, and implementation details.

The NSD dataset is a Kaggle Competition dataset [42]. It holds labeled ultrasound images of the neck concerning the brachial plexus (BP). In particular, 47 different subjects were studied, recording 119 to 580 images per subject (5635 in total) at 420 × 580 pixel resolution. For concrete testing, we performed a pruning procedure to remove images with inconsistent annotations, as suggested in [18-20], yielding 2323 samples.

Method Comparison, Performance Measures, and Implementation Details
To compare the performance of our RFF-based framework for nerve structure segmentation, which includes RFF-FCN, RFF-U-net, and RFF-ResUnet, where RFF stands for the approximating kernel mapping (see Figure 3), we considered the following relevant state-of-the-art approaches: (i) FCN [12], (ii) U-net [14,41], and (iii) ResUnet [16]. Moreover, for the NSD dataset, the following approaches are also studied: (iv) an automatic nerve structure segmentation methodology founded on the U-net algorithm [18], (v) an approach that couples linear Gabor binary patterns and deep learning [19], and (vi) an algorithm that comprises median filtering and dense atrous convolution with residual multi-kernel pooling to enhance a ResUnet strategy [20].
A hold-out cross-validation scheme is applied for all provided datasets, setting 70% of the samples for training, 10% for validation, and 20% for testing. Furthermore, as quantitative assessment of the semantic segmentation performance, the sensitivity (Sen), specificity (Spe), Dice coefficient, intersection over union (IOU), area under the ROC curve (AUC), and geometric mean (GM) are reported on the testing set, which can be written as follows:

Sen = TP/(TP + FN), (9)
Spe = TN/(TN + FP), (10)
Dice = 2TP/(2TP + FP + FN), (11)
IOU = TP/(TP + FP + FN), (12)
GM = √(Sen × Spe), (13)

where TP, TN, FN, and FP represent the true positive, true negative, false negative, and false positive predictions after comparing the actual and estimated label masks M_n and M̂_n for a given input image I_n. The AUC can be computed by varying the decision boundary concerning the Sen and Spe measures [53].

Next, to measure the data interpretability quality (relevance analysis), two explanation map-based measures are introduced, founded on the work in [43]. Thereby, let Ĩ(λ) = S̃(λ) ⊙ I be the explanation map of image I with respect to the normalized class activation mapping S̃(λ) = S(λ)/max(S(λ)) at a given layer of interest. Moreover, let ỹ(λ) = E{G̃_ij : ∀i, j | M_ij = λ} be the expected class-conditional score concerning S̃(λ), where G̃ = W_L ⊗ F̃_{L−1} + b_L, G̃_ij ∈ G̃, fixing F̃_0 = Ĩ(λ); that is, the explanation map Ĩ(λ) feeds the deep learning predictor in Equation (1) up to the penultimate layer, which holds a linear activation to preserve a class-conditional score as in Equation (7). Then, the following relevance analysis measures arise:

Increase Confidence = (100/N′) Σ_{n=1}^{N′} ϑ( ỹ_n(λ) > y_n(λ) ) [%], (14)
Win(M_r, M_r′) = (100/N′) Σ_{n=1}^{N′} ϑ( ỹ_n^r(λ) > ỹ_n^{r′}(λ) ) [%], (15)

where ϑ(·) is an indicator function that returns 1 when the argument is true and 0 otherwise, and N′ is the number of testing images. For the Increase Confidence measure, the ideal value equals 100% and quantifies how well an explanation map highlights the most relevant regions for decision-making, e.g., accurate nerve segmentation.
Namely, it counts, as a percentage, the number of images for which the explanation map, providing only the image-relevant patterns instead of the whole image, increases the prediction score. Then, for pair-wise comparison, the Win approach computes the percentage of times in which the explanation-map confidence of a model M_r is better than that of a model M_r′.

For FCN, U-net, ResUnet, RFF-FCN, RFF-U-net, and RFF-ResUnet, an Adam optimizer is fixed, using a 10^−3 learning rate for the Nerve-UTP dataset and 10^−4 for NSD. A Dice-based loss is employed in Equation (2), as follows:

L(M_n, M̂_n) = 1 − (2 Σ_{i,j} M_{n,ij} M̂_{n,ij} + ε)/(Σ_{i,j} M_{n,ij} + Σ_{i,j} M̂_{n,ij} + ε), (16)

where ε = 1 avoids numerical instability. A batch size of 32 samples is fixed, and the Q hyper-parameter in Equation (4) is set per architecture and dataset as detailed below.

Concerning the Nerve-UTP dataset, the sciatic and femoral nerves provide the most challenging scenarios. Indeed, straightforward models such as FCN, U-net, and ResUnet cannot capture boundary regions of the nerve. However, after our RFF-based improvement, the nerve localization is more accurate. Meanwhile, the BP in NSD poses a more difficult task compared to Nerve-UTP. As seen, the input images gather noisy samples, which can be related to attenuation, artifacts, and speckle noise [5]. Again, the FCN and U-net algorithms show poor performance, e.g., false-positive regions are highlighted as nerve; however, their RFF-based alternatives mitigate such false-positive predictions. It is worth mentioning that both ResUnet and RFF-ResUnet provide false-positive segmentations, which can be explained by the overfitting issue of deeper architectures [33].

Figure 5. Visual segmentation results. Red contour: target segmentation. Blue contour: predicted segmentation. The sciatic, ulnar, femoral, and median nerves are shown for the Nerve-UTP dataset. The brachial plexus (BP) of the NSD dataset is also presented. FCN [12], U-net [14,41], and ResUnet [16] algorithms were tested.
Moreover, their RFF-based improvements (our proposal) are displayed, fixing the Q factor value as 128, 64, and 8 for RFF-FCN, RFF-U-net, and RFF-ResUnet on Nerve-UTP, and 128, 8, and 8 on NSD.

Table 1 presents the comparison results between straightforward state-of-the-art methods for semantic segmentation and our RFF-based enhancement concerning the Sen, Spe, Dice, IOU, and GM quantitative assessments (see Equations (9) to (13)). In addition, a non-parametric Friedman test was computed for statistical significance. The null hypothesis was that all algorithms perform equally [54,55]. For concrete testing, we fixed the significance threshold as p-value < 0.05. In this sense, a Chi-square of 2856.32 was obtained for the Dice measure (p-value = 1.75 × 10^−218). Of note, all remaining measures also reject the null hypothesis.
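The pixel-wise measures in Equations (9)-(13) and the Dice-based training loss with ε = 1 can be sketched in NumPy; the small masks below are illustrative examples, and the code is a stand-in for the actual TensorFlow/Keras implementation:

```python
import numpy as np

def segmentation_scores(M, M_hat):
    """Pixel-wise Sen, Spe, Dice, IOU, and GM from binary masks."""
    M, M_hat = M.astype(bool), M_hat.astype(bool)
    tp = np.sum(M & M_hat)          # true positives
    tn = np.sum(~M & ~M_hat)        # true negatives
    fp = np.sum(~M & M_hat)         # false positives
    fn = np.sum(M & ~M_hat)         # false negatives
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return dict(Sen=sen, Spe=spe,
                Dice=2 * tp / (2 * tp + fp + fn),
                IOU=tp / (tp + fp + fn),
                GM=np.sqrt(sen * spe))

def dice_loss(M, M_hat, eps=1.0):
    """Soft Dice loss: 1 - (2*sum(M*M_hat) + eps)/(sum(M) + sum(M_hat) + eps).
    eps = 1 avoids numerical instability on empty masks."""
    return 1.0 - (2.0 * np.sum(M * M_hat) + eps) / (np.sum(M) + np.sum(M_hat) + eps)

M = np.zeros((4, 4)); M[1:3, 1:3] = 1.0          # 4-pixel target
M_hat = np.zeros((4, 4)); M_hat[1:3, 1:4] = 1.0  # 6-pixel prediction, 2 false positives
scores = segmentation_scores(M, M_hat)
# tp=4, fp=2, fn=0 -> Sen=1.0, Dice=0.8, IOU=2/3
```

Note that the loss operates on soft predictions during training, so perfect overlap drives it to zero while disjoint masks push it toward one.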

Semantic Segmentation Results
At a glance, our RFF-based enhancement favors the segmentation prediction in most cases; in particular, the FCN and U-net architectures benefit from our kernel mapping to find a representative feature space to discriminate between nerve and background. Indeed, RFF-FCN and RFF-U-net obtain the first ranking places for most of the studied measures. However, again, as shown in Figure 5, the ResUnet-based models suffer from overfitting, i.e., see the low Sen values of the ResUnet method for challenging nerves. Still, RFF-ResUnet mitigates such behavior, enhancing the segmentation assessment for the BP identification task. Remarkably, RFF-FCN and RFF-U-net achieve the best rankings (first and second place) for Dice and IOU, which are often used to assess semantic segmentation tasks. Then, our kernel approach preserves a trade-off between network complexity and representation learning capability [34,36]. Next, we applied a pair-wise post hoc analysis regarding the Dice results reported in Table 1 to compute a p-value for the statistical comparison between models M_r and M_r′ [55]. As seen in Table 2, FCN vs. RFF-FCN and U-net vs. RFF-U-net depict p-value < 0.05; that is, our kernel-based approach obtains better segmentation results with pair-wise statistical significance for shallow architectures. ResUnet and RFF-U-net present a similar performance (p-value > 0.05); in this sense, our kernel layer concedes segmentations similar to residual blocks coupled with a U-net scheme. Similarly, RFF-FCN vs. RFF-U-net displays a p-value = 0.076, which shows that akin detections are retrieved after adding an RFF layer to FCN and U-net.

Table 2. Pair-wise post hoc analysis using the Friedman test concerning the Dice-based results in Table 1. The (r, r′) element reports the p-value for the statistical comparison between models M_r and M_r′. If p-value < 0.05, then a statistical difference between approaches is accepted.
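The Friedman test underlying Tables 1 and 2 can be illustrated with the following simplified statistic (without tie correction); the score matrix is synthetic, not the paper's actual Dice results:

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square over an (N subjects x k algorithms) score matrix.
    Each row is ranked (higher score -> higher rank); tie correction omitted."""
    N, k = scores.shape
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1.0  # per-row ranks 1..k
    mean_ranks = ranks.mean(axis=0)
    return 12.0 * N / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2.0) ** 2)

# Synthetic case: one algorithm always ranks last and one always first, so the
# statistic is large and the null hypothesis (equal performance) is rejected.
scores = np.tile([0.1, 0.2, 0.3], (10, 1))
stat = friedman_statistic(scores)  # 12*10/(3*4) * ((1-2)^2 + 0 + (3-2)^2) = 20
```

A large statistic (compared against a chi-square distribution with k − 1 degrees of freedom) yields a small p-value, as reported for the Dice measure above.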
Table 3 exhibits the method comparison results for the Dice measure on 2D brachial plexus nerve segmentation (NSD, a well-known Kaggle Competition dataset). At first sight, our RFF-U-net brings the best nerve segmentation results, improving by ∼3% over its straightforward counterpart, U-net. The kernel improvements of FCN and ResUnet also afford a Dice boost. Although state-of-the-art methods, such as those of [19,20], introduce preprocessing or feature extraction techniques before the deep-learning-based prediction, U-net-based methods (such as the one proposed in [18]) seem to be more appropriate for NSD, which poses a challenge concerning noisy images, requiring a suitable representation learning.

Table 3. Method comparison results for the NSD dataset (brachial plexus nerve segmentation). The average Dice is presented regarding the testing set. Bold: highest Dice-based performance. RFF-FCN, RFF-U-net, and RFF-ResUnet stand for our kernel-based deep learning enhancement.

Relevance Analysis Results
Figures 6 and 7 show some visual inspection results for the Nerve-UTP and NSD datasets. In particular, the normalized class activation mapping S̃_l(λ) is plotted as a heatmap on the 2D input image I (some illustrative examples of the testing set are selected), where l is the convolutional layer just before our RFF-based enhancement (λ = 0 stands for the background class and λ = 1 for the nerve). As seen, our kernel mapping helps focus the class activation maps on image regions related to the nerve structure.
Notably, our RFF-based improvement prunes the representation learning stage by avoiding nerve pixels when studying the background class; namely, our approach steers the network's attention toward the nerve structure in the ultrasound images. Yet, for the NSD, this behavior is less noticeable visually, although it is supported by the semantic segmentation and CAM-based performance measures.
As exposed in the semantic segmentation results, the sciatic and femoral nerves present the most challenging scenarios, and external pixels close to the nerve structure are highlighted as relevant according to the class activation heatmaps; however, ultrasound images provide non-stationary conditions (shift-variant patterns), making it necessary to retain neighboring pixels around the class of interest to promote a proper segmentation. Further, we analyze the concentration of the class activation maps by averaging the target region over the testing samples (see Figures 8 and 9). The median operator is used to avoid outliers. Then, centering and scaling are applied for visualization, and the average heatmap is shown for the background and nerve labels. Notably, the RFF enhancement applied to the U-net architecture yields a close neighborhood-based representation to discriminate between background and nerve. Indeed, U-net's class activation maps exhibit salient information for neither the nerve nor the background class. Regarding ResUnet and RFF-ResUnet, condensed class activation maps are obtained for the nerve class. Still, as presented in Table 1, the overfitting issue arises for such architectures, with lower ranking performances compared to the FCN and U-net variants. On the other hand, the U-net architecture highlights the nerve region and its surroundings almost equally, whereas our proposed RFF-U-net makes the nerve stand out, showing a better representation. In addition, for ResUnet and RFF-ResUnet, our proposal reduces the background attention of the network. Next, Table 4 reports the quantitative results for relevance analysis using the explanation map-based measures in Equations (14) and (15). Note that the Increase Confidence aims to compute the score gain within a given model M after feeding the network with the explanation map Ĩ(λ); meanwhile, the Win assessment compares the score gains of M_r and M_r′.
Overall, nerve confidence boosts are achieved under easy semantic segmentation scenarios, e.g., the ulnar and median nerves. Conversely, for the sciatic and femoral nerves, the Increase Confidence is low. Nonetheless, our kernel approach boosts the network score from explanation maps for shallow architectures such as FCN and U-net. Lastly, the Win-based results prove that our RFF-based variants (RFF-FCN, RFF-U-net, and RFF-ResUnet) improve the model score from their explanation maps, promoting relevant and discriminative input image patterns.
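The explanation map-based measures reduce to simple counts once the per-image scores are available; the sketch below computes them over hypothetical score arrays (the values are illustrative, not actual model outputs):

```python
import numpy as np

def increase_confidence(y_exp, y_full):
    """Percentage of test images whose explanation-map score exceeds the
    full-image score (Increase Confidence; ideal value: 100%)."""
    return 100.0 * np.mean(np.asarray(y_exp) > np.asarray(y_full))

def win(y_exp_a, y_exp_b):
    """Percentage of images where model A's explanation-map score beats
    model B's (pair-wise Win measure)."""
    return 100.0 * np.mean(np.asarray(y_exp_a) > np.asarray(y_exp_b))

y_full = np.array([0.60, 0.55, 0.70, 0.40])  # scores on the whole images
y_exp = np.array([0.65, 0.50, 0.75, 0.45])   # scores on the explanation maps
ic = increase_confidence(y_exp, y_full)      # 3 of 4 images improve -> 75.0
```

In the paper's setting, the scores would be the class-conditional activations produced by feeding each explanation map through the network up to the penultimate layer.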

Conclusions
We proposed a kernel-based enhancement to support 2D nerve structure segmentation from ultrasound images using deep learning. Our proposal incorporates a random Fourier features (RFF)-based layer, which estimates a kernel mapping (e.g., a Gaussian one [34]), within three well-known architectures for semantic segmentation: the fully convolutional network (FCN) [12], U-net [14,41], and residual U-net (ResUnet) [16].
Our strategy seeks to improve the deep learning potential with the generalization capability of kernel methods while preserving mini-batch gradient descent optimization. Concretely, we apply an RFF layer after the bottleneck end for U-net and ResUnet; likewise, the RFF is joined after the last pooling in FCN. Furthermore, a class activation mapping algorithm, termed Grad-CAM++ [43], is extended for semantic segmentation to visualize heatmaps that reveal the model's ability to extract relevant features from ultrasound images. Next, explanation maps are used as a quantitative assessment of the increased confidence (the deep learning score before the last decision-making layer) for semantic segmentation tasks. To the best of our knowledge, this is the first attempt to enhance 2D ultrasound image segmentation for nerve structure identification using both shallow and deep networks while preserving class activation interpretability.
Experiments were carried out on two ultrasound 2D image datasets: (i) Nerve-UTP that belongs to the Universidad Tecnológica de Pereira and the Santa Mónica Hospital, Dosquebradas, Colombia; holding ultrasound images of sciatic, ulnar, median, and femoral nerves. (ii) NSD as a Kaggle Competition dataset [42], gathering ultrasonography records of the brachial plexus.
Obtained results prove that our RFF-based improvement facilitates the discrimination between nerve structure and background in terms of conventional performance measures: sensitivity, specificity, Dice, intersection over union, area under ROC curve, and geometric mean. Indeed, our approach can improve the discrimination effectiveness of straightforward (shallow) architectures, i.e., FCN and U-net, leveraging nonlinear kernel-based mapping within a deep learning paradigm; it preserves the performance of deeper approaches such as ResUnet, which holds a residual learning philosophy with more training parameters than FCN and U-net. In turn, the RFF-based mapping also favors the explanatory capacity of the segmentation algorithm, finding relevant maps that highlight salient image regions related to the nerve structure. All experiments were conducted on Python (TensorFlow and Keras), and both datasets and code are publicly available.
As future work, the authors plan to couple attention mechanisms for semantic segmentation [56] within the introduced kernel-based representation. Furthermore, an RFF-layer extension for direct image-based convolution could benefit the algorithm training and hyperparameter tuning [31]. Further, variational autoencoders can be incorporated within our scheme to avoid overfitting and to benefit data interpretability [57]. Lastly, the extension of our deep learning pipeline to provide 3D nerve segmentation is an exciting research line [58].

Funding:
Under grants provided by the Minciencias project (code 111084467950) "Desarrollo de una herramienta de seguimiento de aguja y localización de nervios en ecografía para la práctica de anestesia regional: Aplicación al tratamiento de dolor agudo traumático y prevención del dolor neuropático crónico". C. A. Jimenez-Castaño is funded by the "Convocatoria del Fondo de Ciencia, Tecnología e Innovación del Sistema General de Regalías para la conformación de una lista de proyectos elegibles para ser viabilizados, priorizados y aprobados por el OCAD en el marco del Programa de Becas de Excelencia (Corte 2)-Minciencias". A.M. Alvarez-Meza thanks the project "Prototipo de visión por computador para la identificación de problemas fitosanitarios en cultivos de plátano en el departamento de Caldas" (Hermes code 51175), funded by Universidad Nacional de Colombia.

Institutional Review Board Statement:
We use publicly available datasets as exposed in [1,42]; however, we did not collect any data from human participants ourselves.

Informed Consent Statement:
This study uses anonymized public datasets as presented in [1,42].

Conflicts of Interest:
The authors declare that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.