Pre-Cancerous Stomach Lesion Detections with Multispectral-Augmented Endoscopic Prototype

Featured Application: The presented work has been developed to be used in operating rooms, during upper gastrointestinal exploration of the stomach. The system can distinguish between healthy mucosa, chronic gastritis and intestinal metaplasia.

Abstract: In this paper, we are interested in the in vivo detection of pre-cancerous stomach lesions. Pre-cancerous lesions are unfortunately rarely explored in research papers, as most studies focus on cancer detection or are conducted ex vivo. For this purpose, a novel prototype is introduced. It consists of a standard endoscope with multispectral cameras, an optical setup, a fiberscope, and an external light source. Reflectance spectra are acquired in vivo on 16 patients with a healthy stomach, chronic gastritis, or intestinal metaplasia. A specific pipeline has been designed for the classification of spectra between healthy mucosa and different pathologies. The pipeline includes a wavelength clustering algorithm, spectral features computation, and the training of a classifier in a "leave one patient out" manner. Precision and recall of around 80% have been obtained, and two discriminant wavelength ranges were found in the red and near-infrared ranges: [745, 755 nm] and [780, 840 nm]. The new prototype and the associated results give good arguments in favor of future common use in operating rooms, during upper gastrointestinal exploration of the stomach for the detection of stomach diseases.


Introduction
Based on GLOBOCAN 2018 data [1], stomach cancer is the sixth most common and the fourth most deadly cancer. Unfortunately, the five-year survival rate of this cancer is about 30% and has not increased since the 1990s. To improve this survival rate, it is important to have a better diagnosis and a better understanding of the early stages of cancer. Thus, a promising approach is to focus on pre-cancerous lesions and to use new imaging modalities with supervised learning techniques.
Presently, practitioners mostly use upper gastrointestinal endoscopy, under white light (WL) or Narrow Band Imaging (NBI), to diagnose stomach pathologies. Recognizing inflammatory lesions is still a hard task as they are often undetectable, especially under white light. Lesions are often too small to be seen at a macroscopic level. Thus, biopsies are performed in a systematic and non-oriented way. Small pieces of the stomach are collected and analyzed a posteriori in a laboratory to make a diagnosis. As the location of the samples is randomly chosen, the result can differ from one location to another and depends on the endoscopist's experience.
gray-level image to observe the fluorescence response and the other to obtain a multispectral image of fluorescence [15].
An attractive aspect of hyperspectral acquisitions is the link between spectra and biological parameters. A spectrum is the sum of several chromophores' contributions. Thus, an unmixing process can be applied. For example, Grosberg et al. characterized gastric mucosa with hyperspectral two-photon microscopy. They obtained images at the cellular level. Unmixing was performed to distinguish four tissue types: epithelium, lamina propria, collagen, and lymphatic tissue [16]. Bergholt et al. acquired spectra via Raman spectroscopy [17]. Spectra were also unmixed to find the relative contribution of DNA, proteins, lipids, and glycoproteins. Relative contributions were successfully used to classify four tissue types: healthy, intestinal metaplasia, dysplasia, and adenocarcinoma. They obtained a sensitivity between 80% and 90% and a specificity higher than 90% with a Partial Least Squares-Discriminant Analysis (PLS-DA).
Martinez et al. developed a system with a filter wheel for acquisitions at six different wavelengths. The principal drawback is that images must be registered. A "Nearest Neighbors" classifier and an SVM classifier (with a linear or a Gaussian kernel) have been used on 5 × 5 patches to detect pre-cancerous states in the stomach [18]. The best results have been obtained with the SVM classifier with a Gaussian kernel, with an accuracy of 77%. Except for this work, we rarely find other papers related to in vivo stomach exploration.
Other techniques have been developed to help the detection of pathologies with imaging systems. For example, contrast agents can be used to characterize the stomach wall, as in chromoendoscopy [19]. Some agents are absorbed by cells or accumulate on the peaks and valleys of the mucosa, thus showing the structure of the cells. The use of contrast agents is not mandatory. Some biological substances can be stimulated by light, acting as fluorophores. This phenomenon is exploited by fluorescence techniques: when fluorophores are excited at a certain wavelength, they re-emit light at a longer wavelength [20]. In practice, collagen is the most important fluorophore in the sub-mucosa, but we can also find pyridoxal phosphate, riboflavin, phospholipid and porphyrins I and II (PPS) [21]. In contrast, chromophores (hemoglobin, for instance) do not re-emit light but absorb it. The combined effect of fluorophores and chromophores is usually visualized with a multispectral camera [22].
In the present paper, a new prototype is introduced for acquiring NBI endoscopic images and multispectral images. We demonstrate how this prototype can be used for the recognition of several pre-cancerous gastric lesions such as chronic gastritis or intestinal metaplasia. In Section 2, we first summarize a previous preclinical study on mice's stomach [23]. Then, a new clinical prototype based on multispectral cameras is detailed, and the data acquisition and signal processing methods are given. In Section 3, the classification results are presented and discussed.

Preclinical Study on Mice's Stomach
In a previous paper, the reflectance of the stomach has been studied on mice infected by H. pylori, a bacterium involved in the development of gastritis in mice and humans [23]. The mice were sacrificed at different time points after infection and their stomachs resected for multispectral analysis with a spectrometer. The work on mice can be seen as a preliminary work in which a classification pipeline has been developed. This pipeline allowed us to separate spectra between control and inflamed mice with an accuracy of 98% and, more interestingly, to identify two wavelength ranges in the red/infrared ranges that are discriminant for the classification: [620, 668 nm] and [668, 950 nm]. This study indicates that analyzing reflectance spectra is a promising way to detect pathologies. We re-use the methodology and the pipeline for the processing of in vivo human data, as presented in the following paragraphs.
The data that are acquired consist of a set of spectra. More precisely, there are two sets of spectra: one in the visible range with wavelengths between 400 and 630 nm with a 10 nm step (thus, 24 points per spectrum) and one in the near-infrared with wavelengths between 610 and 840 nm with a 10 nm step (also 24 points per spectrum). It is interesting to examine the correlation between the wavelengths (i.e., the 24 points). Figure 1a,b show this correlation. The value of each point (i,j) of these correlation matrices corresponds to the correlation between reflectance spectra at wavelengths i and j. In both images, red squares can be observed along the diagonal. These squares indicate that some wavelengths are highly correlated, especially the ranges around [410, 480 nm] and [520, 580 nm] in the visible and around [610, 690 nm] and [780, 840 nm] in the near-infrared. This observation motivates us to cluster the spectral bands into coherent groups. Figure 1c,d show the results of the band clustering algorithm. A median spectrum is presented with vertical black lines marking the cluster borders. For the visible (VIS) camera, eight bands are obtained, and, for the near-infrared (NIR) camera, seven bands are obtained.
(a) Correlation matrix with the VIS camera.
(b) Correlation matrix with the NIR camera.
(c) Separation into spectral ranges for the VIS camera.
(d) Separation into spectral ranges for the NIR camera.

To obtain these results, we recursively merge the neighboring bands whose correlation exceeds 90%. At every iteration, the correlation between neighboring bands is computed, and the two most correlated bands are merged. The detailed algorithm is presented in Algorithm 1. The algorithm uses the same scheme as hierarchical clustering with an additional connectivity constraint: in our case, only neighboring wavelengths can be merged. At the beginning of the algorithm, every wavelength is a cluster, and at each iteration, two clusters are merged. This algorithm is unsupervised because the label (i.e., control, chronic gastritis, or metaplasia) is not used.

Algorithm 1: Pseudo code of the wavelengths clustering algorithm
Input: M = an ($N_{spectra}$, $N_{bands}$) matrix containing all spectra
       τ = a constant equal to 0.9 by default

The wavelengths that have been clustered are reduced to their spectral mean $\mu_i$ on the cluster range according to Equation (1):

$$\mu_i = \frac{1}{N_i} \sum_{\lambda = \lambda_i}^{\lambda_{i+1}} S_\lambda \qquad (1)$$

where $N_i$ is the number of wavelengths in the range $[\lambda_i, \lambda_{i+1}]$ and $S_\lambda$ is the reflectance of the spectrum at $\lambda$.
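The clustering and the reduction of Equation (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cluster_wavelengths` and `reduce_bands` are hypothetical, and representing each cluster by its mean reflectance when computing correlations between neighbors is an assumption, since the text does not specify how the correlation between merged bands is evaluated.

```python
import numpy as np

def cluster_wavelengths(M, tau=0.9):
    """Merge neighboring bands while some adjacent pair correlates above tau.

    M is an (n_spectra, n_bands) array. Returns a list of (start, stop)
    index pairs (stop exclusive), one per cluster of contiguous bands.
    """
    clusters = [(i, i + 1) for i in range(M.shape[1])]
    while len(clusters) > 1:
        # Assumption: a cluster is represented by its mean reflectance.
        means = np.stack([M[:, a:b].mean(axis=1) for a, b in clusters], axis=1)
        corr = np.corrcoef(means, rowvar=False)
        # Correlation of each cluster with its right-hand neighbor only
        # (connectivity constraint).
        neighbor_corr = np.diagonal(corr, offset=1)
        best = int(np.argmax(neighbor_corr))
        if neighbor_corr[best] <= tau:
            break
        merged = (clusters[best][0], clusters[best + 1][1])
        clusters[best:best + 2] = [merged]
    return clusters

def reduce_bands(M, clusters):
    """Replace each cluster by its spectral mean, as in Equation (1)."""
    return np.stack([M[:, a:b].mean(axis=1) for a, b in clusters], axis=1)

# Synthetic data: two independent groups of three nearly identical bands.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=(2, 200, 1))
M_demo = np.hstack([base_a + 0.01 * rng.normal(size=(200, 3)),
                    base_b + 0.01 * rng.normal(size=(200, 3))])
clusters = cluster_wavelengths(M_demo, tau=0.9)
reduced = reduce_bands(M_demo, clusters)
```

On real data, `M` would hold the 24-point calibrated spectra of one camera, and the returned index pairs map directly to wavelength ranges on the 10 nm grid.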
After the reduction of the number of wavelengths, features are extracted. Three types of features are investigated: either the reduced bands are directly used, or the features are computed as the ensemble of all possible subtractions or divisions. More formally, we denote the ratios as $R_{i,j}$ and the differences as $D_{i,j}$. They are defined in Equations (2) and (3):

$$R_{i,j} = \frac{\mu_i}{\mu_j} \qquad (2)$$

$$D_{i,j} = \mu_i - \mu_j \qquad (3)$$

Then, a univariate selection step is performed to reduce the complexity of the classifier. Reducing complexity helps us to avoid over-fitting problems. The F-test of ANalysis Of VAriance (ANOVA) [24] is used to extract the k most discriminant features. It consists of computing, for each feature, a p-value, i.e., the probability of observing class differences at least this large if the feature had the same mean in all classes. The lower the p-value is, the more discriminant the feature is.
The classification consists of a simple SVM classifier with a linear kernel. Leave-one-patient-out cross-validation is performed: at every iteration, all spectra except those of one patient are used to train the classifier, and the spectra of the held-out patient are used for testing. The complete pipeline is written in Python using the scikit-learn library [25] and is summarized in Algorithm 2.
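As an illustration of the feature computation and of the univariate selection, the sketch below builds ratio or difference features from reduced bands and keeps the single best one with scikit-learn's `SelectKBest` and `f_classif`. The helper `pairwise_features` is hypothetical, and using every ordered pair with i ≠ j is an assumption about what "all possible" combinations means; the data are synthetic.

```python
from itertools import permutations

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def pairwise_features(mu, kind="ratio"):
    """Build R_ij = mu_i / mu_j or D_ij = mu_i - mu_j for all i != j.

    mu is an (n_spectra, n_reduced_bands) array of cluster means.
    Returns the feature matrix and the list of (i, j) index pairs.
    """
    pairs = list(permutations(range(mu.shape[1]), 2))
    if kind == "ratio":
        feats = [mu[:, i] / mu[:, j] for i, j in pairs]
    else:
        feats = [mu[:, i] - mu[:, j] for i, j in pairs]
    return np.stack(feats, axis=1), pairs

# Synthetic example: 3 classes, 4 reduced bands, band 0 / band 2 depends
# on the class label.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 30)
mu_demo = rng.uniform(0.5, 1.0, size=(90, 4))
mu_demo[:, 0] = mu_demo[:, 2] * (1 + 0.5 * y_demo + 0.01 * rng.normal(size=90))

X_demo, pairs = pairwise_features(mu_demo, kind="ratio")
selector = SelectKBest(f_classif, k=1).fit(X_demo, y_demo)  # ANOVA F-test
best_pair = pairs[selector.get_support(indices=True)[0]]
```

Here `best_pair` holds the band indices of the most discriminant ratio; `selector.pvalues_` gives the per-feature p-values if they are needed for inspection.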

Algorithm 2: Classification pipeline
Input: M = an ($N_{spectra}$, $N_{bands}$) matrix containing all spectra
       y = a vector of $N_{spectra}$ elements corresponding to labels
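A leave-one-patient-out evaluation of this kind can be sketched with scikit-learn as follows. This is only a sketch under stated assumptions: the function name `leave_one_patient_out` and the `patients` vector (one patient identifier per spectrum) are hypothetical, and placing the feature selection inside the cross-validated pipeline, so that it is refitted on each training fold, is a design choice of this example rather than something stated in the text.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def leave_one_patient_out(X, y, patients, k=1, C=1.0):
    """Predict each spectrum with a model trained on all other patients."""
    model = make_pipeline(
        SelectKBest(f_classif, k=k),   # univariate ANOVA selection
        SVC(kernel="linear", C=C),     # linear SVM
    )
    return cross_val_predict(model, X, y, groups=patients,
                             cv=LeaveOneGroupOut())

# Synthetic example: 6 patients (2 per class), 20 spectra each, 5 features,
# with feature 3 carrying the class information.
rng = np.random.default_rng(0)
patients_demo = np.repeat(np.arange(6), 20)
y_lopo = np.repeat([0, 0, 1, 1, 2, 2], 20)
X_lopo = rng.normal(size=(120, 5))
X_lopo[:, 3] += 5 * y_lopo

y_pred = leave_one_patient_out(X_lopo, y_lopo, patients_demo, k=1, C=1.0)
accuracy = float((y_pred == y_lopo).mean())
```

Grouping by patient ensures that spectra from the same patient never appear in both the training and the test set of a fold.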

Description of the Prototype
The system designed in this study can acquire NBI images and multispectral images. A standard endoscope (Olympus Exera III), already available in most operating rooms, is used. In addition, there are two multispectral cameras (XIMEA, based on CMOSIS CMV2000 technology [26]), one in the visible range (450-620 nm) with 16 bands and one in the red and near-infrared range (600-1000 nm) with 25 bands. The cameras are connected to a computer through USB 3.0 ports. For convenience, these cameras are respectively called VIS and NIR. To acquire images inside the stomach, a fiberscope (microflex fiberscope from ITConcepts, 2.5 m long and 2.5 mm thick [27]) is used. The fiberscope contains emitting fibers to illuminate the stomach and receiving fibers to capture images. Note that this fiberscope must be sterilized before each use; this is one of the current limitations, as the multispectral system cannot be used while the fiberscope is being cleaned. To connect the fiberscope to the cameras, a beamsplitter system is used (TwinCam from Cairn Research [28]). A dichroic filter splits the light rays in two according to the wavelength ranges [470-620 nm] and [600-975 nm]. Additionally, a filter is added to block ultraviolet (UV) light. In our case, a powerful 1000 W external light source is needed because the fiber and the sensitivity of the camera induce a loss of intensity. Thus, a Mercury-Xenon light source driven by a controller device helps to illuminate the stomach (Newport reference: 66924-1000XF-R1).

The system must be calibrated spatially and spectrally. The spatial and spectral calibrations have been presented in a previous paper [29]. Spatial calibration is needed to correct the radial distortion and to register the NBI image with the multispectral images. The calibration is done by using the tip of the fiberscope, which is visible in the NBI images.
Spectral calibration is required because the filters of the cameras are based on the Fabry-Perot structure. They have a primary response but also a secondary response. Moreover, there is a noisy response outside the expected wavelength range, known as the crosstalk problem. The calibration has been done by learning a multilinear transformation on a set of patches of the ColorChecker®. More concretely, the 24 bands ([400, 630 nm] with a step of 10 nm) are deduced from the 16 original bands by multiplication with a 16 × 24 matrix and, in the same way, the 24 bands ([610, 840 nm] with a step of 10 nm) are deduced from the 25 original bands by multiplication with a 25 × 24 matrix. After this transformation, all spectra are normalized according to their $\ell_2$ norm, meaning that each spectrum is divided by the square root of its sum of squares.
In a practical situation, the endoscopist inserts the fiberscope in the operating channel of the endoscope, as described in Figure 2. This operating channel is usually used to insert tools such as forceps or cutting/grasping instruments. The endoscopy usually lasts 1 or 2 minutes, during which the multispectral cameras acquire images approximately every second. A dedicated acquisition software has been developed in the C# programming language with the .NET 4.5 framework and the xiAPI library, which allows the control of the multispectral cameras [30]. This software manages the data saving and adapts the exposure time of the cameras automatically. The cameras have been optically calibrated such that the focus distance is approximately 2 cm. This distance is the mean distance between the fiberscope tip and the stomach wall.
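The spectral calibration step can be illustrated with a small sketch. It assumes, for illustration only, that the transformation matrix is estimated by ordinary least squares between raw camera responses and reference spectra of the calibration patches, without an intercept term; the actual fitting procedure is described in [29] and may differ. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_calibration(raw_patches, ref_patches):
    """Estimate T such that raw_patches @ T approximates ref_patches.

    raw_patches: (n_patches, n_raw_bands) camera responses.
    ref_patches: (n_patches, 24) reference spectra on the 10 nm grid.
    Assumption: ordinary least squares, no intercept.
    """
    T, *_ = np.linalg.lstsq(raw_patches, ref_patches, rcond=None)
    return T  # shape (n_raw_bands, 24)

def apply_calibration(raw, T):
    """Map raw responses to 24-point spectra and normalize by the l2 norm."""
    spectra = raw @ T
    return spectra / np.linalg.norm(spectra, axis=-1, keepdims=True)

# Synthetic example for the VIS camera: 16 raw bands, 24 output points.
rng = np.random.default_rng(0)
T_true = rng.uniform(size=(16, 24))
raw_demo = rng.uniform(size=(24, 16))   # 24 calibration patches
ref_demo = raw_demo @ T_true

T_est = fit_calibration(raw_demo, ref_demo)
calibrated = apply_calibration(raw_demo, T_est)
norms = np.linalg.norm(calibrated, axis=1)
```

The same two functions would apply to the NIR camera with a 25 × 24 matrix.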

Data Acquisition and Preprocessing
The presented bimodal system can acquire multispectral images at approximately one image per second. This long exposure time is due to the low power of the light that comes back through the fiber. Figure 3a shows an example of a multispectral image acquired in the stomach. The image is rearranged in 16 small images to show the 16 bands of the VIS camera. We can see the disk in the center, which is the area covered by the fiberscope. Not all images contain useful information. In general, multispectral images are often blurry. This phenomenon is due to the movements of the endoscope, the poor spatial resolution, the fiberscope, and the high exposure time. Consequently, it is challenging to use textural information. In addition, some parts of the image can be too dark or over-exposed. Therefore, we keep only pixels whose maximum is under 512. Indeed, the camera's technical documentation specifies that if the 10th bit is set to 1, then the pixel is considered saturated. In the same manner, pixels whose spectral mean does not exceed 128 are considered too dark to be exploited. To illustrate this choice, we present an example in Figure 3b. Three pixels are compared: in blue, a pixel which is too dark, in green, a valid pixel, and in red, a saturated pixel.
A linear transformation described in a previous paper [29] is applied to the spectral radiance to get estimated reflectance spectra (Figure 3c), i.e., to estimate the 24 points in the VIS and the 24 points in the NIR from the original 16 bands of the VIS camera and the 25 bands of the NIR camera. As the multispectral images are difficult to interpret directly, an experienced gastro-endoscopist selected NBI endoscopic images in which the fiber is focused on a significant zone of the pathology. Then, we select the temporally nearest multispectral image for the analysis.
The image is then filtered with a low-pass filter to reduce the noise. Various filtering techniques exist, such as the use of wavelets [31]. In our case, we have chosen to use a simple pyramidal filter. Then, spectra are taken on a grid by considering one pixel out of every seven (vertically and horizontally). We gathered data for 16 patients. Table 1 classifies them according to their histological results. For all these patients, 200 valid spectra were randomly taken among the valid spectra to have the same number of spectra for each patient. Figures and results from the two cameras are intentionally separated. Cameras acquire images independently, and in practice, if no pixel in an image is valid (i.e., all pixels are either too dark or over-exposed), the image is removed.
Figure 4a,b show the median spectra of the 16 patients after calibration, respectively for the VIS camera (Figure 4a) and the NIR camera (Figure 4b). The color indicates histological results: blue for control, green for chronic gastritis, and red for intestinal metaplasia. We do not see specific behavior in the visible range, but, for the NIR camera, control spectra seem clustered and have a lower derivative. On the spectral range [610, 730 nm], plots corresponding to metaplasia are under the plots of the control class, and conversely for the spectral range [730, 840 nm].
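The two validity criteria can be expressed as a boolean mask over the image. The sketch below assumes that the multispectral image is stored as an (height, width, bands) array of raw sensor values and that the maximum and the mean are taken across the bands of each pixel; the array layout and the function name `valid_pixel_mask` are illustrative assumptions.

```python
import numpy as np

SATURATION_LEVEL = 512  # 10th bit set: pixel considered saturated
DARK_LEVEL = 128        # spectral mean at or below this: pixel too dark

def valid_pixel_mask(cube):
    """Return a (height, width) mask of pixels that are neither
    saturated nor too dark. cube has shape (height, width, n_bands)."""
    not_saturated = cube.max(axis=-1) < SATURATION_LEVEL
    bright_enough = cube.mean(axis=-1) > DARK_LEVEL
    return not_saturated & bright_enough

# Three example pixels with 16 bands: too dark, valid, saturated.
dark = np.full(16, 50)
valid = np.full(16, 300)
saturated = np.full(16, 300)
saturated[5] = 600
cube_demo = np.stack([dark, valid, saturated])[np.newaxis]  # shape (1, 3, 16)
mask_demo = valid_pixel_mask(cube_demo)
```

An image for which `mask_demo.any()` is false would be discarded entirely, in line with the rule stated for images without valid pixels.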
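The grid sampling and the random selection of 200 spectra can be sketched as follows. The low-pass filter is deliberately left out, and the sketch works on a single image for simplicity, whereas in the study the 200 spectra are drawn per patient; the function name `sample_spectra`, the fixed seed, and the array layout are assumptions made for illustration.

```python
import numpy as np

def sample_spectra(cube, mask, step=7, n_spectra=200, seed=0):
    """Take one pixel out of every `step` in both directions, keep the
    valid ones, then draw up to `n_spectra` of them without replacement.

    cube: (height, width, n_bands) calibrated spectra.
    mask: (height, width) boolean validity mask.
    """
    grid_spectra = cube[::step, ::step]
    grid_mask = mask[::step, ::step]
    candidates = grid_spectra[grid_mask]          # (n_valid, n_bands)
    rng = np.random.default_rng(seed)
    n = min(n_spectra, len(candidates))
    idx = rng.choice(len(candidates), size=n, replace=False)
    return candidates[idx]

# Synthetic 210 x 210 image with 24 points per spectrum, all pixels valid.
rng_demo = np.random.default_rng(1)
cube_grid = rng_demo.uniform(size=(210, 210, 24))
mask_grid = np.ones((210, 210), dtype=bool)
sampled = sample_spectra(cube_grid, mask_grid)
```

Drawing the same number of spectra for every patient keeps the dataset balanced across patients, which is the stated purpose of this step.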
(a) Median spectra from VIS camera.
(b) Median spectra from NIR camera.

The obtained spectra are then processed by the pipeline detailed in Section 2. The next sections give the classification results obtained separately with VIS and NIR data.

Results
In this section, the classification results are presented. A support vector machine (SVM) with a linear kernel is trained in a "leave one patient out" manner. The three types of features are tested, i.e., the reduced bands, the subtraction features, or the division features. Note that two hyperparameters can be tuned: the number of features k retained in the feature selection step and the regularization parameter c that controls the balance between the misclassification rate and the size of the margin. A higher value of c leads to fewer misclassified training samples but also a smaller margin, making the classifier more prone to over-fitting.
In this section, data acquired by the two cameras are processed separately. Table 2 presents the results obtained with the VIS camera and Table 3 those obtained with the NIR camera. In Table 2, we observe that intestinal metaplasia is confused with chronic gastritis. For the three types of features, the results are close to 53% precision and 67% recall. Better results are obtained in Table 3, especially with division and subtraction features. The average over the three classes is 80% for the precision and 80% for the recall.
It can be seen in the confusion matrix (Figure 5) that the healthy class is sometimes confused with intestinal metaplasia and that chronic gastritis is sometimes confused with intestinal metaplasia. This second point is more acceptable in the sense that intestinal metaplasia usually appears as small patches in the middle of chronic gastritis. Thus, a patient with intestinal metaplasia always has chronic gastritis too.

It is interesting to detail the classification results for each patient. Figure 6 presents the number of spectra classified in each category for each patient. It can be seen that the classification is usually good except for patients 10 and 15, who are healthy and have mostly "intestinal metaplasia"-like spectra. Patient 13 has intestinal metaplasia, but their spectra are mostly classified as chronic gastritis. More generally, 13 patients out of 16 have most of their spectra classified in the right class.

Moreover, the interesting point is that a single ratio is sufficient to reach good classification results. The best results are obtained with k = 1; the best ratio is the one with the highest score in Figure 7b, i.e., $R_{4,6}$, which is the division of the range [745, 755 nm] by the range [780, 840 nm]. It can be observed in Figure 7a that the distribution of the three classes (blue for healthy, green for chronic gastritis, and red for intestinal metaplasia) for this particular ratio shows a good separability, especially between healthy and the two pathologies. It is also interesting to note the agreement with the preclinical study on mice. In both cases, the most discriminant wavelength range is in the red and near-infrared part of the spectra.
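As a worked illustration of this single-ratio feature, the sketch below computes the mean reflectance over [745, 755 nm] divided by the mean reflectance over [780, 840 nm] for NIR spectra sampled on the 610 to 840 nm grid with a 10 nm step. The helper names and the synthetic spectra are hypothetical; on this grid, the first range contains only the 750 nm point and the second contains seven points.

```python
import numpy as np

# NIR wavelength grid after calibration: 610, 620, ..., 840 nm (24 points).
wavelengths = np.arange(610, 850, 10)

def band_mean(spectra, lo, hi):
    """Mean reflectance over the wavelengths in [lo, hi] (inclusive)."""
    selected = (wavelengths >= lo) & (wavelengths <= hi)
    return spectra[:, selected].mean(axis=1)

def red_nir_ratio(spectra):
    """Ratio of the [745, 755 nm] mean to the [780, 840 nm] mean."""
    return band_mean(spectra, 745, 755) / band_mean(spectra, 780, 840)

# Synthetic spectra: reflectance 2.0 everywhere except 1.0 at 750 nm.
spectra_demo = np.full((5, wavelengths.size), 2.0)
spectra_demo[:, wavelengths == 750] = 1.0
ratio_demo = red_nir_ratio(spectra_demo)
```

With k = 1, the classifier reduces to thresholds on this scalar value, which is why the class distributions of the ratio are informative on their own.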

Conclusions
In this paper, a new prototype is introduced to acquire endoscopic NBI videos and multispectral videos during in vivo stomach exploration. The prototype is used to acquire spectra on 16 patients. For each of these patients, 200 spectra are selected, leading to a dataset of 3200 spectra. Spectra obtained with multispectral cameras are processed with a classification pipeline containing a wavelength reduction algorithm, the computation of three types of features (reduced bands, subtractions and divisions) and a leave-one-patient-out classification with an SVM classifier with a linear kernel. The classifier can distinguish a healthy stomach wall from chronic gastritis and intestinal metaplasia with a precision of about 80%. Moreover, two discriminant wavelength ranges are identified in the red and near-infrared ranges: [745, 755 nm] and [780, 840 nm]. To the best of our knowledge, the analysis of this ensemble of three types of tissues (i.e., chronic gastritis, intestinal metaplasia, and healthy) has never been considered in a previous paper.
These promising results must be confirmed on a larger dataset. The prototype can also be improved to have better quality for the images and, thus, to be able to exploit textural information caught by the multispectral cameras. Finally, unmixing algorithms can be used to get information on the relative concentration of chromophores and thus have a better understanding of the biological changes that appear with pre-cancerous lesions.