Deep Learning Could Diagnose Diabetic Nephropathy with Renal Pathological Immunofluorescent Images

Artificial intelligence (AI) imaging diagnosis is developing rapidly, making enormous strides in medical fields. Regarding diabetic nephropathy (DN), medical doctors diagnose it from the clinical course, clinical laboratory data and renal pathology, evaluated mainly with light microscopy images rather than immunofluorescent images, because immunofluorescent images show no characteristic findings for DN diagnosis. Here, we examined whether AI could diagnose DN from immunofluorescent images. We collected renal immunofluorescent images from 885 renal biopsy patients in our hospital and created a dataset containing six types of immunofluorescent images (IgG, IgA, IgM, C3, C1q and Fibrinogen) for each patient. Using the dataset, 39 programs worked without errors (area under the curve (AUC): 0.93). Five programs diagnosed DN perfectly from the immunofluorescent images (AUC: 1.00). Analysis with local interpretable model-agnostic explanations (Lime) showed that the AI focused on the peripheral lesions of DN glomeruli. In contrast, the nephrologists' diagnostic performance (AUC: 0.75833) was slightly inferior to the AI diagnosis. These findings suggest that DN could be diagnosed by deep learning from immunofluorescent images alone, and that AI can identify as-yet-unrecognized characteristic regions in the immunofluorescent images that nephrologists usually do not use for DN diagnosis.


Introduction
Lifestyle-related diseases are a major issue for worldwide health. Diabetes mellitus (DM) and chronic kidney disease are considered lifestyle-related diseases. When DM worsens and progresses, diabetic nephropathy (DN) arises as a microvascular complication. The diagnosis of DN is based on a long history of DM and the presence of other microangiopathies such as diabetic retinopathy and/or diabetic neuropathy (Supplementary Figure S1). Sometimes a differential diagnosis is needed for DM patients, if urinary occult blood is clearly observed or a rapid deterioration of renal function is demonstrated. In that case, a renal biopsy may be performed to distinguish other glomerular diseases from the viewpoint of renal pathology. In the renal biopsy evaluation of DN, renal immunofluorescent staining images are considered unimportant because there are few features specific to DN in renal immunofluorescent images. DN is typically diagnosed by nodular lesions, exudative lesions and diffuse lesions in light microscopy images (Supplementary Figure S1). Immunofluorescent images are sensitive.

Image Processing
The image data were obtained as JPEG or TIFF files. We changed the resolution of the files from 2776 × 2074 pixels to 256 × 256 pixels. After the conversion, we converted the JPEG files into PNG files for analysis.
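The paper does not state which tool performed the 2776 × 2074 → 256 × 256 conversion; as a minimal, library-free illustration of the idea, the sketch below implements nearest-neighbor downsampling over a flat, row-major pixel list. A real pipeline would more likely use an image library such as Pillow.

```python
def downsample_nearest(pixels, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbor downsampling of a row-major pixel list."""
    out = []
    for y in range(dst_h):
        src_y = y * src_h // dst_h  # map target row back to a source row
        for x in range(dst_w):
            src_x = x * src_w // dst_w  # map target column back to a source column
            out.append(pixels[src_y * src_w + src_x])
    return out

# Toy 4 x 4 image reduced to 2 x 2 (the paper reduces 2776 x 2074 to 256 x 256).
img = list(range(16))
print(downsample_nearest(img, 4, 4, 2, 2))  # [0, 2, 8, 10]
```

Nearest-neighbor is only one resampling choice; area averaging or bilinear interpolation would preserve more of the fluorescence intensity information.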

Deep Learning
Python was used as the programming language, and the environment was Microsoft's Visual Studio Code and Neural Network Console (Sony Network Communications Inc., Tokyo, Japan). The input consisted of six types of renal immunofluorescent images (IgG, IgA, IgM, C3, C1q and fibrinogen) converted as described above. The renal pathological images were divided into training images and test images at a ratio of 8:2. The test images are shown in Supplementary Figure S3 (Supplementary Figure S3a–c). We performed supervised training for deep learning.
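The paper does not describe how the 8:2 split was drawn; the sketch below is one plausible way to do it in plain Python (the fixed seed and patient-level splitting are assumptions, not the authors' procedure). Splitting by patient keeps all six stain images of one patient on the same side of the split, which avoids leakage between training and test sets.

```python
import random

def split_dataset(patient_ids, train_ratio=0.8, seed=42):
    """Shuffle patient IDs and split them into training and test sets.

    Splitting at the patient level keeps the six immunofluorescent images
    (IgG, IgA, IgM, C3, C1q, Fibrinogen) of each patient together.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # dedicated RNG so results are reproducible
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

train, test = split_dataset(range(885))  # 885 biopsy patients, as in the paper
print(len(train), len(test))  # 708 177
```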

Statistical Analysis
Statistical analysis was performed with JMP (version 11.0.0 for Windows; SAS Institute Inc., Tokyo, Japan). Statistical significance was assessed by one-way analysis of variance (ANOVA) with Student's t-test. Data are shown as the mean ± SE. Significance was defined as p < 0.05. In addition, to determine the cut-off value for the DN diagnosis, a receiver operating characteristic (ROC) curve was constructed using JMP.
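The ROC analysis here was done in JMP, but the AUC values reported throughout can be reproduced from raw labels and scores. As an illustration (not the authors' code), the sketch below computes the AUC via its rank-based (Mann–Whitney) definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half.

```python
def auc_score(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Each positive/negative pair contributes 1 for a win, 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: higher scores should indicate DN.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc_score(labels, scores))  # 0.8888888888888888 (i.e., 8/9)
```

An AUC of 1.00, as achieved by the best programs in this study, means every DN case received a higher score than every non-DN case.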

Overview of Computer Schema of Deep Learning
An overview of the computational schema is shown in Figure 1. We input six kinds of immunofluorescent images: IgG, IgA, IgM, C3, C1q and Fibrinogen. Each image has a resolution of 256 × 256 pixels. Each image was analyzed separately, and the six types of data were then integrated, analyzed again and connected to the output. We used the Neural Network Console software provided by Sony. This software automatically adds or deletes layers to adjust the parameters and obtain an optimal result. Using this software, we created 419 different programs in this study. We evaluated each program with its learning curve. Some programs did not work well, and we selected the better-performing programs for this study.

AI Could Diagnose DN from Immunofluorescent Images
A total of 419 programs were trained using the immunofluorescent images obtained in our hospital (representative example: Supplementary Figure S4; a: schema of the program, b: learning curve, c: result of diagnosis). These programs ranged in accuracy from 30% to as high as 100%. The total area under the curve (AUC) of the diagnostic rate across all created programs was 0.71807 (R² = 0.2213, p < 0.0001; Figure 2a,b). Among the obtained programs, we then analyzed the 39 programs whose accuracy was 60% or more. In these extracted programs, the accuracy was 83.28 ± 11.64%, the precision 80.56 ± 21.83% and the recall 79.87 ± 15.65%, with an AUC of 0.92914 (R² = 0.4586, p < 0.0001; Figure 2c,d). Six programs showed 100% accuracy, precision and recall, with an AUC of 1.000 (R² = 1.000, p < 0.0001; Figure 2e,f). This indicates that the AI automatically extracted features from limited image information and that the judgment was reproduced at a high rate even on the test data.
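The accuracy, precision and recall figures above follow the standard confusion-matrix definitions; a minimal sketch (illustrative only, not the authors' evaluation code) for a binary DN / non-DN call:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall from binary labels (1 = DN, 0 = non-DN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against no positive calls
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Toy example: 8 test patients, one false positive and one false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```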
Diagnostics 2020, 10, x FOR PEER REVIEW
Figure 1. The overview of the convolutional neural network program. We used as input data six types of renal immunofluorescent images: IgG, IgA, IgM, C3, C1q and Fibrinogen (Fib).


The Differences of the Diagnosis among DN Immunofluorescent Images
Next, the DN images used in the test dataset were analyzed with respect to accuracy. We used test image data consisting, representatively, of six DN patients' images (Figure 3). We compared the accuracy among four types of programs: the complete diagnosis program (CP), the false-negative program (FN), the false-positive program (FP) and the program producing both false positives and false negatives (AV) (CP: Supplementary Figure S10b). In the AV program, patient #1 was not diagnosed as DN, and the other DN patients were diagnosed as DN only slightly above the diagnostic threshold. These results suggest that the AI could diagnose DN as well as a human could.

Lime Analysis
Next, to determine the characteristic findings in the glomerular changes of DN, we examined which parts of the image the AI mainly focused on during diagnosis, using local interpretable model-agnostic explanations (Lime). Lime analysis can show the regions on which the AI mainly bases its decision. We chose representative images of patient #1 (Figure 3). The CP program focused on part of the central site and the periphery (Figure 4 CP, Supplementary Figure S13). In addition, another CP program focused on the periphery (Supplementary Figure S14). These results suggest that the characteristic findings may be located at the periphery of DN glomeruli. In programs producing false negatives (FN), judgments were often made only at the margins (Figure 4 FN). Conversely, even in programs producing false positives (FP), judgments were based only on the margins (Figure 4 FP).
In programs with both false positives and false negatives (AV), judgments were made at the center and the margins (Figure 4 AV). These results suggest that, as a characteristic finding, the judgment was made mainly on changes of the capillary loops, compared against the central part.
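The study used the Lime library itself; as a self-contained illustration of the perturbation idea behind such explanations, the sketch below computes a simple occlusion-sensitivity map: each patch is masked in turn, and the drop in the model's output scores how much the classifier relies on that region. The toy "model" (mean brightness of the image border) is an assumption standing in for a classifier that, like the CP programs here, keys on the glomerular periphery.

```python
def occlusion_map(image, model, patch=2):
    """Score each patch by how much masking it lowers the model output."""
    h, w = len(image), len(image[0])
    base = model(image)
    heat = []
    for y in range(0, h, patch):
        row = []
        for x in range(0, w, patch):
            masked = [r[:] for r in image]  # deep-enough copy of the 2D grid
            for yy in range(y, min(y + patch, h)):
                for xx in range(x, min(x + patch, w)):
                    masked[yy][xx] = 0.0  # occlude this patch
            row.append(base - model(masked))  # larger drop = more important region
        heat.append(row)
    return heat

# Toy "model": mean brightness of the border pixels only.
def border_mean(img):
    h, w = len(img), len(img[0])
    vals = [img[y][x] for y in range(h) for x in range(w)
            if y in (0, h - 1) or x in (0, w - 1)]
    return sum(vals) / len(vals)

img = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(img, border_mean))  # [[0.25, 0.25], [0.25, 0.25]]
```

Lime proper goes further: it perturbs superpixels at random and fits a sparse linear model to the perturbed predictions, but the intuition is the same as in this occlusion sketch.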


The Diagnostic Comparison between Nephrologist and AI
To compare humans and AI, we gave nephrologists a diagnostic test using the same test data. Among the 39 programs described above whose accuracy was above 60%, the diagnostic performance of the AI on the test images was 83.28 ± 11.64% accuracy, 80.56 ± 21.83% precision and 79.87 ± 15.65% recall (Figure 2c,d). In contrast, the nephrologists' performance on the test images was 67.50 ± 6.12% accuracy, 62.62 ± 3.85% precision and 67.26 ± 9.96% recall (Figure 5a,b). These results suggest that the AI could diagnose DN better than the nephrologists.


Discussion
The purpose of this study was to examine whether AI can, by deep learning, automatically extract features from images that doctors do not consider important for diagnosis, and whether AI could make a diagnosis from them.
DN is usually diagnosed with reference to a long-term history of diabetes and the presence of microangiopathic complications such as diabetic retinopathy. When it is necessary to differentiate DN from other renal diseases in DM patients, because of the presence of atypical features such as hematuria, sudden onset of proteinuria or a change in renal function, a renal biopsy is performed and the patients are diagnosed from the renal pathology. In DN, characteristic lesions such as diffuse lesions, exudative lesions and nodular lesions are observed by light microscopy; however, the immunofluorescent images are not used because they show no significant findings. The AI could automatically extract and classify features from the immunofluorescent images. From the Lime analysis, the AI pointed to changes in the peripheral vessel loops. At the peripheral vessel loops, we could observe thickening and/or wrinkling of the loops by light microscopy. However, these changes were very subtle in the immunofluorescent images, so we could not observe any significant difference from other glomeruli with nephritis using the human eye.
However, in AI analysis there are still many unclear points about how judgments are made during the process; this is the so-called AI black-box problem. In the future, we hope that methods will be developed that allow humans to understand AI decisions. In this study, we did not examine the difference in the diagnostic rate when using light microscopy images. AI may prove even better at diagnosis with light microscopy images, because these contain significant diagnostic features.
As a study limitation, the AI diagnosis was performed only on immunofluorescent images from a small dataset of a limited number of patients. To validate the results, verification with data from other hospitals will be necessary. In addition, human doctors diagnose comprehensively from the clinical course, clinical findings and clinical images, such as light microscopy and electron microscopy images. Further examination is needed to compare humans and AI.
This study showed that AI can extract characteristic findings even from data that people do not normally use. This indicates that, when a medical doctor makes a clinical diagnosis, AI can independently extract findings that human doctors find difficult to notice. This study suggests that AI is not a threat to clinicians but a partner that points out difficult findings that humans do not notice.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1. Supplementary Figure S1: The diagnostic method for diabetes mellitus nephropathy (DN). Supplementary Figure S2

Funding:
This study was partially supported by the Yukiko Ishibashi Memorial Foundation.

Data Availability Statement:
The data that support the findings of this study are available on request from the corresponding author, S.K. The data are not publicly available according to the approval contents of the ethics committee at our hospital.