1. Introduction
Celiac disease (CD) is a gluten-sensitive immune-mediated enteropathy that occurs in genetically predisposed individuals [
1]. Diagnosis of celiac disease is made by combining clinical data, serological tests, and histopathological features [
1,
2]. Although celiac disease has traditionally been regarded as a disease of infancy, its onset usually occurs in patients aged between 10 and 40 years, when the typical signs of malabsorption are often replaced by an atypical presentation [
3,
4,
5,
6].
The clinical presentation is variable and spans a continuous spectrum [
3,
4,
5,
6], with several degrees of severity correlated with histological severity and levels of tissue transglutaminase [
7,
8]. The “classical” gastrointestinal symptoms include persistent diarrhea, abdominal distension, weight loss, abdominal pain, constipation, and vomiting [
9]. Celiac disease is also associated with several non-gastrointestinal manifestations, such as growth and development alterations, neurologic and behavioral symptoms, liver disease, iron deficiency, skin alterations (dermatitis herpetiformis), dental and metabolic bone diseases, arthritis, and cardiomyopathy [
9,
10].
Histological characteristics of the small intestine (usually evaluated in duodenal biopsies) include mucosal inflammation, villous atrophy, and crypt hyperplasia, which occur after exposure to dietary gluten and improve after gluten is removed from the diet [
11]. These histological features are variable and range from mild alteration with only increased numbers of intraepithelial lymphocytes, to severe atrophy and epithelial apoptosis [
12,
13,
14,
15,
16]. These alterations are assessed in several classifications, including the Marsh [
17], Marsh–Oberhuber [
18], Corazza–Villanacci [
19], Q-Marsh scale [
20], and Q-histology [
2].
The pathogenesis of celiac disease includes genetic factors (HLA DR3-DQ2, DR4-DQ8, several non-HLA loci, and autoimmune disorders), adaptive immune response (gliadin reactive T lymphocytes), autoantibodies and intraepithelial lymphocytes (IELs), and innate immune response. In patients with celiac disease, the immune response to fractions of gliadin results in an abnormal inflammatory reaction characterized by infiltration of the lamina propria and epithelium by chronic inflammatory cells and villous atrophy [
4]. A comprehensive review of the pathogenesis was conducted in our recent publication [
21].
Primary treatment for celiac disease is a gluten-free diet. Persistent or recurring symptoms may be due to a lack of adherence to dietary protocol, an incorrect initial diagnosis, or complications of refractory celiac disease and lymphoma [
1]. Among the different primary intestinal T-cell lymphomas, enteropathy-associated T-cell lymphoma (EATL) [
22,
23,
24,
25] may be preceded by refractory celiac disease [
26].
The diagnosis of celiac disease is based on the combination of clinical data (gastroenterologist), serology (clinical pathologist), and duodenal biopsy with histological evaluation performed by a certified anatomical pathologist [
1]. Artificial intelligence technology allows computers to imitate human intellectual capacity and solve problems [
27]. Modern computer vision systems achieve high accuracy in image recognition and analysis. However, these systems do not understand what they observe. In recent years, applications of artificial intelligence in celiac disease diagnosis have been developed. The most relevant studies can be divided into two groups: those that apply computer vision to endoscopic images and those that analyze histological images [
28,
29,
30,
31,
32,
33]. These studies have provided the basis for further development of artificial intelligence-based analysis of celiac disease, such as the inclusion of other types of pathologies, as done in our study.
Several machine learning and deep learning algorithms have been developed to construct models that make predictions on images. Convolutional neural networks are supervised algorithms that are mostly used for image recognition workloads [
34]. Widely used pre-trained models for image classification include ResNet (Residual Networks), Inception (GoogLeNet), VGG (Visual Geometry Group), EfficientNet, DenseNet (Dense Convolutional Network), MobileNet, NASNet (Neural Architecture Search Network), Xception (Extreme Inception), AlexNet, and Vision Transformers (ViT).
This study used a convolutional neural network to classify images of celiac disease, small intestine control, duodenal inflammation, duodenal adenocarcinoma, and Crohn’s disease.
2. Materials and Methods
A script was written to create and train a deep learning network with 71 layers and 78 connections (
Figure 1 and
Figure 2). The script was run to create network layers (
Appendix B), import training and validation data, and train the network. The code was created in MATLAB (R2023b Update 8 (23.2.0.25999560) 64-bit (win64) 29 April 2024) (MathWorks, Tokyo, Japan) and was based on transfer learning from the ResNet-18 (version 23.2.0) [
35] (
Figure 1 and
Figure 2). All analyses were performed using a desktop computer equipped with an AMD Ryzen 9 7950X CPU (AMD Japan Ltd., Marunouchi, Chiyoda-ku, Tokyo, Japan) [
36], 32 GB of RAM, and an Nvidia GeForce RTX 4080 SUPER graphics card (Nvidia, Minato-ku, Tokyo, Japan) [
37].
ResNet-18 is a model that was pre-trained on a subset of the images in the ImageNet database [
38]. This database includes 1000 types of objects and contains more than 1,000,000 images. ResNet-18 is a convolutional neural network that is 18 layers deep, with an input size of 224-by-224 (224 × 224 × 3), a size of 44.0 MB, and 11.7 million parameters.
The analysis of the convolutional neural network (CNN) included the following steps: loading the pre-trained network, replacement of final layers, training of the network, prediction and assessment of network accuracy, and deployment of results.
The input images were hematoxylin and eosin (H&E)-stained histological images of several diseases (
Figure 3).
The diagnostic dataset included H&E-stained biopsies from 16 patients with celiac disease (57 biopsies), selected from the Department of Pathology, Hospital Clinic of Barcelona, Spain, as previously described [
21]. The clinicopathological characteristics, such as age, sex, biopsy location, anatomical pathology diagnosis, and the Marsh–Oberhuber histological grade [
21,
39,
40] are shown in
Appendix A.
First, the input data for celiac disease included 7294 images, and the small intestine control included 11,642 images. The color images had three channels: red, green, and blue. Examples are shown in
Figure 4 and
Figure 5.
The data (images) were split into three sets: a training set used for training the network (70%), a validation set used to monitor its performance during training (10%), and a test set (holdout) used after training to assess how well the network performed on new data (20%). The order of the images was randomized to ensure that the network learned the classes at a more even rate. As transfer learning (adjustment of a pre-trained network) was performed on ResNet-18, the fully connected and classification layers were removed and replaced with new layers with an output size of 2. Augmentation was not performed during training. To avoid overfitting, the initial learning rate was set to 0.001. The maximum number of epochs was set to five.
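The 70/10/20 split with randomized image order can be illustrated with a short sketch. It is written in Python rather than in the MATLAB used for the study, and it assumes a simple random (non-stratified) split; the function name split_indices and the fixed seed are hypothetical and are not taken from the original script.

```python
import random


def split_indices(n_images, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle image indices and split them into train/validation/test lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # randomize image order before splitting
    n_train = int(n_images * train_frac)
    n_val = int(n_images * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder (about 20%) is the holdout set
    return train, val, test


# Image counts reported in the text for the first (two-class) analysis.
N_CELIAC = 7294
N_CONTROL = 11642

train_idx, val_idx, test_idx = split_indices(N_CELIAC + N_CONTROL)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the three lists are slices of a single shuffled index list, no image can appear in more than one set, which is the property that makes the holdout test set a fair estimate of performance on new data.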
Data normalization was applied to the input images through two layer types. The image input layer (imageInputLayer) passes 2-D images to the neural network and applies data normalization. The batch normalization layer (batchNormalizationLayer) normalizes a mini-batch of data across all observations for each channel independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, batch normalization layers are used between convolutional layers and nonlinearities, such as ReLU layers. In MATLAB, layer = batchNormalizationLayer(Name, Value) creates a batch normalization layer and sets its optional properties (TrainedMean, TrainedVariance, Epsilon, parameters and initialization, learning rate and regularization, and Name) using one or more name-value pairs. After normalization, the layer scales the input with a learnable scale factor γ and shifts it by a learnable offset β [
52].
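The batch normalization step can be written compactly. The following is the standard formulation, given here as an illustration rather than as a transcription of the study's code: for a mini-batch of m activations x_i in one channel, μ_B and σ_B² are the mini-batch mean and variance, ε is the small constant (the Epsilon property) that keeps the denominator away from zero, and γ and β are the learnable scale and offset.

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta
```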
Second, the analysis was repeated after adding a third histological subtype, non-specific inflammation of the small intestine (duodenum). Therefore, in this analysis, the input data included 7294 images of celiac disease, 11,642 images of small intestine control, and 5966 images of the small intestine (duodenum) with chronic and acute inflammation (
Figure 6).
Third, 3723 images of a fourth histological subtype, duodenal adenocarcinoma (
Figure 7), were used as test images for the previously trained convolutional neural network. The purpose of this analysis was to determine how the previously trained network, which was trained using celiac disease, small intestine control, and non-specific inflammation of the duodenum, would classify an unknown histological disease.
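A short sketch may help to show why this test is informative. It assumes that the network ends in a softmax layer, as is usual for ResNet-18 classifiers; in that case the output probabilities are distributed only over the classes seen during training, so an image from an unseen disease is necessarily assigned to one of them. The raw scores below are invented for illustration and are not outputs of the trained network.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# The three classes the network was trained on in the second analysis.
CLASSES = ["celiac disease", "small intestine control", "duodenal inflammation"]

# Hypothetical raw scores for one adenocarcinoma image; the values are invented.
hypothetical_scores = [1.2, 0.3, 2.1]

probabilities = softmax(hypothetical_scores)
predicted = CLASSES[probabilities.index(max(probabilities))]
print(predicted)
```

Whatever the scores are, the predicted label is always one of the three trained classes; there is no output that means "none of the above" unless such a class is trained explicitly.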
Fourth, a convolutional neural network was trained using all the histological subtypes as input: celiac disease, small intestine control (both duodenum and ileum), non-specific inflammation of the duodenum, and duodenal adenocarcinoma (
Figure 4,
Figure 5,
Figure 6 and
Figure 7).
Finally, to extend the differential diagnosis to other intestinal pathologies related to alterations of the immune tolerance and immune homeostasis of the gut, the model included 13,032 images of Crohn’s disease (
Figure 8).
All cropped images, each 224-by-224 (224 × 224 × 3) in size, were reviewed by the histopathologist (J.C.), and non-diagnostic and artefactual images were excluded from the datasets.
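The text does not describe how the 224-by-224 crops were generated. As an illustration only, the sketch below assumes a regular grid of non-overlapping tiles in which partial tiles at the image border are discarded; the function name tile_origins and the example image size are hypothetical.

```python
def tile_origins(width, height, tile=224):
    """Return the top-left (x, y) corner of every full, non-overlapping tile."""
    origins = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            origins.append((x, y))
    return origins


# Hypothetical 1000 x 700 pixel field of view; partial edge tiles are discarded.
origins = tile_origins(1000, 700)
print(len(origins))
```

Each origin defines one candidate crop, which would then be kept or excluded after review by the pathologist.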
4. Discussion
Within the specialty of computer science, computer vision is a technique that allows computers to recognize the observable world. In the field of artificial intelligence, there are several machine learning and deep learning algorithms that build models that make predictions from images or videos [
62]. Convolutional neural networks are a type of supervised deep learning algorithm used for image recognition. A simple convolutional network comprises several steps: input of the image channels, convolution, pooling, further convolution and pooling, flattening, full connection to an artificial neural network, and prediction [
62].
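The convolution, pooling, and flattening steps can be illustrated on a toy example. This is a minimal sketch in plain Python, not the ResNet-18 implementation: the 5 × 5 single-channel image and the 2 × 2 kernel are invented, and the convolution is computed as a cross-correlation without padding, which is the convention in most deep learning libraries.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a square window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size)) for c in cols]
        for r in rows
    ]


def flatten(feature_map):
    """Turn a 2-D feature map into the 1-D vector fed to the fully connected part."""
    return [value for row in feature_map for value in row]


# Toy 5 x 5 single-channel image with a vertical edge, and a 2 x 2 edge kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = flatten(max_pool(conv2d(image, kernel)))
print(features)
```

The kernel responds only where the intensity changes from left to right, so the non-zero entries of the flattened vector mark the position of the edge; in a trained network, the kernel weights are learned rather than fixed by hand.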
The ResNet-18 network was used in this study. This convolutional neural network is a model pre-trained on a subset of the ImageNet database. It was trained with more than a million images and can classify images into 1000 different categories [
35]. This network has been used in several studies based on transfer learning, in medicine and in other fields, such as in the diagnosis of intracranial hemorrhage in CT scans [
63], heartbeat classification of electrocardiogram (ECG) signals [
64], dynamic gesture recognition [
65], selective transplanting of leafy vegetable seedlings [
66], automatic classification of malaria parasites on the blood smear [
67], prostate imaging [
68], classification of Alzheimer’s disease levels [
69], and diabetic retinopathy [
70], among others. Therefore, the ResNet-18 model is a useful network that can be applied to many types of studies, including our study of celiac disease.
Convolutional neural networks and image recognition have also been applied to celiac disease research, including the analysis of whole slide images [
29,
71,
72,
73,
74] and endoscopic images [
75,
76]. Therefore, computer vision is a useful tool in the field of histopathology.
Our group has published several papers on the use of artificial intelligence, including machine learning and artificial neural networks, in the field of lymphoma research [
77,
78,
79,
80,
81,
82,
83]. In these publications, the focus was on data analysis of gene expression levels in the context of immuno-oncology in lymphoma and other hematological neoplasia [
77,
78,
79,
80,
81,
82,
83]. The most frequent lymphoma subtype that we analyzed was diffuse large B-cell lymphoma [
78,
79,
80,
81], which is one of the most frequent non-Hodgkin lymphomas [
26]. In addition, we have also published data analysis-based studies on celiac disease in which we highlighted the importance of the B and T lymphocyte associated (BTLA) gene [
21], and programmed cell death 1 ligand 1 (CD274 antigen) in ulcerative colitis [
84]. The subject of this article represents a shift from data analytics to computer vision.
In this study, a confusion matrix was used to measure the performance of the trained network. The data (images) were split into three sets: a training set used for training (i.e., teaching) the network (70%), a validation set used to monitor its performance during training (10%), and a test set used after training to assess how well the network performed on new data (20%). The order of the images was randomized to ensure that the network learned the classes at a more even rate. The confusion matrices of the test set are shown in the Results section. Of note, when the classes are imbalanced, accuracy alone can be misleading. The confusion matrices of our study included output data that were binary (
Figure 9) and multiclass (
Figure 10 and
Figure 11). All performance parameters were high, including accuracy (the proportion of correct predictions), precision (the proportion of predicted positives that are true positives), recall (the proportion of actual positives that are correctly identified, known in medicine as sensitivity), and F1-score (the harmonic mean of precision and recall). The fundamentals of clinical data science and modeling methodology are well described in chapter 8 of the book by Frank J.W.M. Dankers et al. [
85].
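The relationship between the confusion matrix and these performance parameters can be sketched as follows. The counts are invented and deliberately imbalanced to show how accuracy can look high while recall for the minority class is poor; they are not the results of this study. Rows are taken as the true class and columns as the predicted class, which is an assumption about the layout.

```python
def per_class_metrics(matrix, labels):
    """Compute precision, recall and F1 per class from a confusion matrix.

    matrix[i][j] is the number of images of true class i predicted as class j.
    """
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(len(labels))) - tp  # column minus diagonal
        fn = sum(matrix[k]) - tp  # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def accuracy(matrix):
    """Proportion of all images that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


# Invented, deliberately imbalanced counts; these are NOT the study's results.
labels = ["celiac disease", "control"]
toy_matrix = [
    [5, 5],    # true celiac disease: 5 correct, 5 missed
    [0, 90],   # true control: all 90 correct
]

print(accuracy(toy_matrix))
print(per_class_metrics(toy_matrix, labels)["celiac disease"])
```

In this toy case, 95 of 100 images are classified correctly, yet half of the celiac disease images are missed, which is why per-class precision, recall, and F1-score are reported alongside accuracy.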
This study focused on the identification and classification of celiac disease images compared with normal small intestine images obtained from the duodenum and ileum. The accuracy of the network was very high. With the addition of non-specific acute and chronic duodenal inflammation, the model could also properly classify three classes. Interestingly, when the network trained on three classes was tested with duodenal adenocarcinoma, it could not recognize that those samples belonged to a different type of disease. Therefore, the use of automated computer vision analysis for the evaluation of histopathological slides is not recommended without the supervision of a pathology specialist. However, when the network was trained with four classes of histological subtypes, it differentiated celiac disease, duodenal inflammation, small intestine control, and duodenal adenocarcinoma with good performance, supporting the usefulness of the convolutional neural network for classifying histological images.
To extend the analysis to other inflammatory conditions of the intestine, images of Crohn’s disease were included in the study, and the neural network also managed to classify the different diseases properly.
There are several reports of artificial intelligence studies of celiac disease diagnosis and classification using both endoscopic and histological images.
Regarding endoscopic images, Ciaccio EJ et al. used videocapsule endoscopy images to detect pathologic alterations in 13 celiac and 13 control patients using a masking strategy that achieved nearly 80% accuracy [
86]. Bing Nan Li implemented a principal component analysis (PCA) on videocapsule endoscopy images to develop a computerized tool for celiac disease recognition; a dataset of 240 images was used, and the strip PCA method had an average recognition accuracy of 93.9% [
87]. Jahmunah Vicnesh et al. used DAISY descriptors for the automated diagnosis of celiac disease by videocapsule endoscopy reaching an accuracy of 90% [
88].
Regarding histological images, Joseph DiPalma et al. proposed a deep learning-based methodology for improving the computational efficiency of histology image classification based on distillation and self-supervision [
89]. Florentino Luciano Caetano Dos Santos et al. used machine learning to assess and classify IgA-class endomysial autoantibody (EmA) images with an accuracy of 97% [
90]. Kamran Kowsari et al. used a deep learning Hierarchical Medical Image classification (HMIC) approach to classify between celiac disease, environmental enteropathy, and histologically normal controls, with a precision of 91% [
91]. Joel En Wei Koh et al. used a Steerable Pyramid Transform (SPT) method and nonlinear features to automatically detect and classify celiac disease biopsy H&E images with an 89% accuracy [
92]. Oliver Faust et al. used high-magnification biopsy images and a Support Vector Machine (SVM) to classify celiac disease versus normal control with an accuracy of 98% [
30]. In the study of Prasenjit Das et al., the quantitative histological classification system based on software from Media Cybernetics and a Q-histological assessment reached a sensitivity of 94% [
93]. J Denholm et al. classified H&E images of celiac disease and normal biopsies acquired with several different types of digital slide scanners. The workflow included patch extraction, stain normalization, and classification using ResNet50, and the final model reached an accuracy of 97%. Overall, these studies classified celiac disease images successfully, but methodology and accuracy varied. Our study is comparable to that of J Denholm et al.; however, our study included a broader range of diagnoses.