2.1. Colonoscopy Database
All colonoscopy images were captured with general colonoscopes produced by Olympus Corporation (Shinjuku Monolith, 2-3-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo 163-0914, Japan). The colonoscope models were GF-260 and 290, so image quality was comparable across images. Endoscopic images were obtained by gastroenterologists at Taipei Medical University Hospital, Taipei, Taiwan.
A pilot analysis of a small amount of patient data was used to decide how pictures and data would be analyzed in the rest of the research. We analyzed data from 298 patients with MC in the database between 1 January 2016 and 17 July 2017. In this initial analysis, polyp texture in patients without tumor markers such as CEA or CA 19-9 was not related to pathological findings, so we included only the 124 patients who had CEA data and clear colonoscopy images. We then analyzed these patients’ colonoscopy images. This survey indicated that results can be unsatisfactory when a large image region is analyzed, when reflections from feces are not avoided, or when the white text recorded in the image is not removed, because each of these introduces interference. Correcting images according to each patient’s normal skin and luminosity had little effect on the final calculated result, because the calculation is based on the gray-level co-occurrence matrix (GLCM) analysis method; this correction was therefore not ultimately required. We also observed that taking a 100 × 100-pixel square from the center point of a polyp provided improved texture features and results.
Next, 51,891 colonoscopy records were acquired from the Division of Gastroenterology and Hepatology, Department of Internal Medicine, Taipei Medical University Hospital, covering the period from 1 January 2012 to 30 September 2017. The Taipei Medical University Human Research Joint Ethics Committee granted research permission and ensured that the research materials and images were deidentified. No informed consent was required for this retrospective study. After approval, the relevant research and statistical work commenced. Of these records, 28,974 showed polyps; thus, the polyp detection rate (PDR) was 55.84%. Because 12,244 records showed adenoma or colorectal cancer, the adenoma detection rate (ADR) was 23.60%. Finally, 834 patients presented with colorectal cancer, so the colorectal cancer detection rate was 1.61%.
Among these data, 1092 patients had a diagnosis of MC, giving an MC prevalence rate of 2.1%, similar to the prevalence rate of 1.78% in Mainland China [2]. These patients comprised 787 women and 305 men; MC was therefore 2.58 times as frequent in women as in men. Patient ages ranged from 17 to 97 years. Of these patients, 96 were over 80 years old, 480 were between 60 and 80 years old, and only three were younger than 20. The first patient underwent colonoscopy on 2 January 2012, and the last on 29 September 2017. Among these patients, 658 had polyps, 390 had colon adenoma detected by colonoscopy, and 18 had colon cancer on their pathology reports.
Based on the data collected from 1092 person-times, the PDR and ADR were 60.26% and 35.71%, respectively, the latter being close to the 34.7% noted in other studies [1]. The colorectal cancer detection rate was 1.65%. Compared with the general colonoscopy database described above, patients with MC had a slightly higher PDR (by 4.42 percentage points) and a much higher ADR (by 12.11 percentage points), whereas the colorectal cancer detection rate was similar. The finding that patients with MC were more prone to adenoma corresponds with results reported previously [1] and indicates that patients with MC are not only more likely to exhibit polyps under colonoscopy but also have a higher adenoma incidence than other people do. MC is thus associated with adenomas. The study initially used a smaller quantity of data and image analysis to help determine the pattern of screening and analysis for all subsequent data.
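The detection rates quoted above follow directly from the reported counts. The short sketch below reproduces the arithmetic; the function and variable names are illustrative only and are not part of the study's workflow.

```python
def rate_percent(cases: int, total: int) -> float:
    """Detection rate as a percentage, rounded to two decimals."""
    return round(100.0 * cases / total, 2)


# Counts quoted in the text for the general database and the MC subgroup.
general = {"total": 51_891, "polyp": 28_974, "adenoma": 12_244, "cancer": 834}
mc = {"total": 1_092, "polyp": 658, "adenoma": 390, "cancer": 18}

keys = ("polyp", "adenoma", "cancer")
general_rates = {k: rate_percent(general[k], general["total"]) for k in keys}
mc_rates = {k: rate_percent(mc[k], mc["total"]) for k in keys}

# Gaps in percentage points, taken between the rounded rates as in the text.
pdr_gap = round(mc_rates["polyp"] - general_rates["polyp"], 2)
adr_gap = round(mc_rates["adenoma"] - general_rates["adenoma"], 2)

# Female-to-male ratio among the 1092 patients with MC.
sex_ratio = round(787 / 305, 2)

print(general_rates, mc_rates, pdr_gap, adr_gap, sex_ratio)
```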
We excluded patients without CEA data, with poor colon preparation, or with unclear data. Finally, 370 patients were included in the image analysis study, in which the relationship between GLCM image texture features and CEA was compared. Among the 370 patients, 181 had polyps and pathological biopsies, and the study used the images of these 181 patients. All pictures were collected by one gastroenterologist. Using ImageJ, a 100 × 100-pixel block, taken either away from areas where feces had accumulated or from the center of a polyp, was isolated for analysis. Images were divided into three groups:
Cecum (C): images of the appendix and cecal mucosa, consisting of 181 images.
Splenic flexure (S): images of the splenic flexure, consisting of 181 images.
Polyp (P), if the patient had a polyp or a tumor. If multiple polyps were present, we collected the largest and most prominently textured polyp. The gastroenterologist manually cropped the image from the central point of the polyp. This also resulted in 181 images.
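In the study, the 100 × 100-pixel blocks were cut manually in ImageJ. Purely as an illustration of the geometry, a block centered on a chosen point could be extracted as in the sketch below; `crop_centered` is a hypothetical helper, and the image is assumed to be a list of pixel rows.

```python
def crop_centered(image, center_row, center_col, size=100):
    """Return a size x size block of `image` centered on the given point.

    `image` is a list of rows. A ValueError is raised when the block
    would extend beyond the image border.
    """
    half = size // 2
    top, left = center_row - half, center_col - half
    if (top < 0 or left < 0
            or top + size > len(image) or left + size > len(image[0])):
        raise ValueError("block extends beyond the image")
    return [row[left:left + size] for row in image[top:top + size]]


# Synthetic 200 x 300 image in which each pixel stores its own flat index.
width = 300
image = [[r * width + c for c in range(width)] for r in range(200)]
block = crop_centered(image, center_row=100, center_col=150)
print(len(block), len(block[0]), block[0][0])
```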
Figure 1a shows the cecal image of one patient with melanosis coli, and Figure 1b shows the same patient’s cecal image six months after stopping anthraquinone-containing laxative agents.
This section describes the image analysis methods used in this study, which are similar to those of a previous article [17]. Computer-based texture analysis of this kind has also been widely used in other areas [18]. The images are divided into three groups as previously described. Example images of patients with MC are illustrated in Figure 2a–c, and examples of regions of interest in Figure 2d–f.
The images demonstrate that MC has a particular dark brown pigmentation and presents a distinctive black texture; places without pigmentation are white. The color textures of polyps or tumors are clearly different from their surroundings, exhibiting lighter and more irregular textures, and most of them have no black texture.
We use the GLCM for quantitative feature extraction and analyze its 14 features. The characteristics examined in the study are divided into three types: pattern, brightness, and texture features. The analysis focuses mainly on the texture features, and the results are calculated from the RGB channels of the colonoscopy images. The correlation between features and pathological results is verified using the Pearson correlation coefficient. Pearson correlation coefficients and p values were calculated using MS Excel and IBM Statistical Package for the Social Sciences (SPSS). A Pearson correlation coefficient close to 0.4 and a p of <0.05 represent significant correlation.
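The study computed these statistics in MS Excel and SPSS. For readers who want to see the calculation itself, the following is a minimal sketch of the Pearson coefficient and the t statistic from which its p value is usually derived (with n − 2 degrees of freedom). The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (undefined when |r| == 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values: one texture feature and one pathology score per patient.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(feature, score)
t = t_statistic(r, len(feature))
print(round(r, 4), round(t, 4))
```

The p value is then read from a Student t distribution; the standard library has no such function, so that step is left to a statistics package.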
2.2. Image Texture Feature Analysis
Image texture features may be analyzed in many ways; common methods include GLCMs, gray-level co-occurrence histograms, gray-level run length matrices, gray-level size zone matrices, neighboring gray tone difference matrices, and gray-level dependence matrices. Although many new technologies have been developed, some traditional feature analysis methods retain their value. GLCMs count the probability of occurrence of pixel pairs at different directions and distances, and our study focuses on similar textures on polyps and MC mucosa, so we selected the GLCM method for this study.
The GLCM records how grayscale values change between pixels at given relative positions, reflecting the distribution of texture in space. The method was developed by Haralick et al. in 1973 [19]. Haralick held that image texture is composed of multiple texture units and that texture arises from the repeated occurrence of a grayscale distribution at spatial positions. Therefore, a certain grayscale relationship exists between two pixels separated by a certain distance in the image space, and the spatial relationship of grayscale in an image can exhibit graphical characteristics [20]. The GLCM of an image reflects information about its gray levels with respect to direction, adjacent interval, and variation amplitude. It constitutes the basis for analyzing the local patterns of images and their arrangement rules. In simple terms, the data are recorded in an N × N square matrix, where N is the number of gray levels, according to the grayscale relationships of pixel pairs in the image. For example, if the first cell of the matrix is 1, only one pair of horizontally adjacent pixels in the image both have gray level 1. This is illustrated in Figure 3.
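The counting procedure can be sketched in a few lines. The code below is a minimal illustration rather than the software used in the study: `glcm` and `normalize` are hypothetical helper names, gray levels are assumed to be integers starting at 0 (so the "first cell" corresponds to level 0 here), and the 4 × 4 test image is a small textbook-style example, not study data.

```python
def glcm(image, levels, d_row=0, d_col=1, symmetric=False):
    """Count gray-level pairs (i, j) for pixels separated by (d_row, d_col).

    `image` is a list of rows of integers in range(levels). The default
    offset (0, 1) pairs each pixel with its right-hand neighbor.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # also count the pair in the reverse direction
                    counts[j][i] += 1
    return counts


def normalize(counts):
    """Turn pair counts into joint probabilities p(i, j)."""
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
horizontal = glcm(image, levels=4)
for row in horizontal:
    print(row)
```

Other directions are obtained by changing the offset, for example `d_row=1, d_col=0` for vertical pairs.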
The GLCM has been widely used for texture analysis of various image types. Following various amendments, feature analysis methods were updated to improve results [21]. Ultrasound analysis for breast tumors [22] and rotator cuff tears [18], computed tomography for liver tumors [7] and hyperacute stroke [17], and image analysis for retinal vessels [23] have been successfully performed. The use of computed tomography to simulate colonoscopy for polyp detection has also been studied [24]. In addition, one study analyzed pathological biopsies for colorectal cancer and normal colorectal mucosa [25], and this technique can provide clinicians with valuable assistance. Because the GLCM is the earliest and most mature image texture analysis method, this study uses it to analyze the colorectal polyp image texture characteristics of patients with MC. The GLCM has a total of 14 features: autocorrelation, dissimilarity, energy, entropy, homogeneity, difference variance, difference entropy, information measure of correlation, inverse difference normalized, inverse difference moment, cluster prominence, cluster shade, contrast, and correlation. These features can be clinically applied to the analysis of various images. For the analysis of medical images, the features are employed mainly to study their correlations with various lesions. Of the 14 features analyzed in this study, only eight were related to the final results: entropy, energy, correlation, dissimilarity, homogeneity, autocorrelation, cluster_prominence, and cluster_shade.
Entropy is a measure of the quantity of information that an image possesses; texture information also qualifies as image information. It is a measure of randomness. When the elements of the co-occurrence matrix are maximally random, so that all values in the matrix are almost equal and dispersed, the entropy is large. Entropy thus represents the degree of nonuniformity or complexity of the texture in the image.
Energy is the sum of the squares of the values of the GLCM elements and is also called the angular second moment (ASM). It reflects the uniformity of the grayscale distribution of the image and the coarseness of the texture. If all values of the co-occurrence matrix are equal, the ASM energy is small; conversely, if some values are large and others are small, that is, when the elements of the co-occurrence matrix are concentrated, the ASM energy is high. Higher ASM energy indicates a more uniform and regularly varying texture pattern.
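The opposite behavior of entropy and energy can be checked on two toy matrices. This is a sketch under stated assumptions: the input is a normalized co-occurrence matrix whose entries sum to 1, the logarithm is taken to base 2 (other bases only rescale the value), and the function names are illustrative.

```python
import math


def energy(p):
    """Angular second moment: sum of squared joint probabilities."""
    return sum(v * v for row in p for v in row)


def entropy(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


# All pairs equally likely: maximal randomness.
uniform = [[0.25, 0.25], [0.25, 0.25]]
# A single pair dominates: perfectly regular texture.
concentrated = [[1.0, 0.0], [0.0, 0.0]]

print(energy(uniform), entropy(uniform))            # low energy, high entropy
print(energy(concentrated), entropy(concentrated))  # high energy, zero entropy
```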
Correlation is used to distinguish whether two objects are mutually correlated in shape and other features; the correlation value is then used to determine the characteristics of an object in order to locate it. CorrelationM indicates the gray-level linear correlation between a pixel and its neighbors, similar to Correlation.
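One common way to compute the GLCM correlation feature is as the covariance of the two gray-level indices divided by the product of the standard deviations of the row and column marginals. The sketch below assumes that definition, a normalized matrix as input, and gray levels numbered from 0; `glcm_correlation` is an illustrative name.

```python
import math


def glcm_correlation(p):
    """Linear correlation between the gray levels of paired pixels."""
    n = len(p)
    px = [sum(p[i]) for i in range(n)]                        # row marginal
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]   # column marginal
    mu_x = sum(i * px[i] for i in range(n))
    mu_y = sum(j * py[j] for j in range(n))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * px[i] for i in range(n)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * py[j] for j in range(n)))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant marginal")
    cov = sum((i - mu_x) * (j - mu_y) * p[i][j]
              for i in range(n) for j in range(n))
    return cov / (sd_x * sd_y)


same_level_pairs = [[0.5, 0.0], [0.0, 0.5]]      # neighbors always equal
opposite_level_pairs = [[0.0, 0.5], [0.5, 0.0]]  # neighbors always differ
print(glcm_correlation(same_level_pairs), glcm_correlation(opposite_level_pairs))
```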
Dissimilarity measures how much the gray-level values of paired pixels in an image differ. It is sensitive to the arrangement of gray-level values in space or the hue of the image.
Homogeneity reflects the uniformity of image textures and measures how much the image texture changes locally. A large value indicates little variation between different regions of the image texture; that is, the texture is locally uniform.
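Dissimilarity and homogeneity both weight each matrix entry by the distance between its two gray levels. The sketch below assumes the widely used forms, Σ|i − j|·p(i,j) for dissimilarity and Σ p(i,j)/(1 + (i − j)²) for homogeneity; some implementations use 1 + |i − j| in the denominator instead, and the function names here are illustrative.

```python
def dissimilarity(p):
    """Mean absolute gray-level difference between paired pixels."""
    n = len(p)
    return sum(abs(i - j) * p[i][j] for i in range(n) for j in range(n))


def homogeneity(p):
    """Weight each pair by closeness to the diagonal (inverse difference moment form)."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


smooth = [[0.5, 0.0], [0.0, 0.5]]  # neighbors always share a gray level
rough = [[0.0, 0.5], [0.5, 0.0]]   # neighbors always differ by one level

print(dissimilarity(smooth), homogeneity(smooth))
print(dissimilarity(rough), homogeneity(rough))
```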
Autocorrelation measures the degree of similarity of the GLCM elements in the row or column direction. The correlation value therefore reflects the local grayscale correlation in the image. When the matrix element values are similar, the correlation value is large; conversely, if the matrix element values differ greatly, the correlation value is small. If the image has a horizontal texture, the correlation value of the horizontal-direction matrix is greater than those of the matrices for the other directions. The more malignant the polyp in the image, the higher the correlation value for horizontal or vertical textures.
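In many GLCM toolkits the autocorrelation feature is computed as Σ i·j·p(i,j). The sketch below assumes that formula and numbers gray levels from 1 by default so that the lowest level still contributes; both the numbering and the name `glcm_autocorrelation` are assumptions made for illustration.

```python
def glcm_autocorrelation(p, first_level=1):
    """Sum of i * j * p(i, j), with gray levels numbered from `first_level`."""
    n = len(p)
    return sum((i + first_level) * (j + first_level) * p[i][j]
               for i in range(n) for j in range(n))


same_level_pairs = [[0.5, 0.0], [0.0, 0.5]]
opposite_level_pairs = [[0.0, 0.5], [0.5, 0.0]]
print(glcm_autocorrelation(same_level_pairs))      # 0.5*1*1 + 0.5*2*2
print(glcm_autocorrelation(opposite_level_pairs))  # 0.5*1*2 + 0.5*2*1
```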
Cluster_prominence and cluster_shade indicate a lack of symmetry in the gray-level distribution. Therefore, the more malignant a polyp, the more complex the polyp texture and the greater the surrounding asymmetry.
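Cluster shade and cluster prominence are commonly defined as the third and fourth moments of i + j about μ_x + μ_y, the sum of the marginal means. The sketch below assumes those definitions, a normalized matrix, and gray levels numbered from 0; the function name is illustrative.

```python
def cluster_features(p):
    """Return (cluster_shade, cluster_prominence) of a normalized GLCM."""
    n = len(p)
    mu_x = sum(i * p[i][j] for i in range(n) for j in range(n))
    mu_y = sum(j * p[i][j] for i in range(n) for j in range(n))
    shade = sum((i + j - mu_x - mu_y) ** 3 * p[i][j]
                for i in range(n) for j in range(n))
    prominence = sum((i + j - mu_x - mu_y) ** 4 * p[i][j]
                     for i in range(n) for j in range(n))
    return shade, prominence


balanced = [[0.5, 0.0], [0.0, 0.5]]   # symmetric gray-level distribution
skewed = [[0.75, 0.0], [0.0, 0.25]]   # mass shifted toward the dark level

print(cluster_features(balanced))  # shade is zero for a symmetric distribution
print(cluster_features(skewed))    # nonzero shade signals asymmetry
```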
In this study, the gastroenterologist manually marked each polyp with a 100 × 100-pixel box; the brightness and texture features were then extracted and quantified, and the GLCM features of the colonoscopy images were analyzed to obtain some of the MC characteristics. If the final pathological results confirm that the texture features show Pearson correlation with pathology, this model can be used as a reference for clinicians. A gastroenterologist performing a colonoscopy could then immediately treat polyps that may be poorly differentiated pathologically.
where $\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$ are the means and standard deviations (SDs) of the marginal distributions of $p(i,j \mid d,\theta)$;