Development of “Mathematical Technology for Cytopathology,” an Image Analysis Algorithm for Pancreatic Cancer

Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer-related death worldwide. The accuracy of a PDAC diagnosis based on endoscopic ultrasonography-guided fine-needle aspiration cytology can be strengthened by performing a rapid on-site evaluation (ROSE). However, ROSE can only be performed in a limited number of facilities, due to a relative lack of available resources or cytologists with sufficient training. Therefore, we developed the Mathematical Technology for Cytopathology (MTC) algorithm, which does not require teaching data or large-scale computing. We applied the MTC algorithm to support the cytological diagnosis of pancreatic cancer tissues, by converting medical images into structured data, which rendered them suitable for artificial intelligence (AI) analysis. Using this approach, we successfully clarified ambiguous cell boundaries by solving a reaction–diffusion system and quantitating the cell nucleus status. A diffusion coefficient (D) of 150 showed the highest accuracy (i.e., 74%), based on a univariate analysis. A multivariate analysis was performed using 120 combinations of evaluation indices, and the highest accuracies for each D value studied (50, 100, and 150) were all ≥70%. Thus, our findings indicate that MTC can help distinguish between adenocarcinoma and benign pancreatic tissues, and imply its potential for facilitating rapid progress in clinical diagnostic applications.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer-related death worldwide; the 5-year survival rate for patients with PDAC is less than 10%, and most patients die within 2 years after diagnosis [1,2]. Although surgery is recommended for patients with early-stage or locally advanced disease, less than 20% of such patients are good candidates for resection. Because PDAC is a systemic disease, multimodal treatment is required, such as neoadjuvant/adjuvant chemotherapy and chemoradiation therapy [3]. Therefore, obtaining a definitive diagnosis by endoscopic ultrasonography-guided fine-needle aspiration cytology (EUS-FNA) and endoscopic ultrasonography-guided fine-needle biopsy (EUS-FNB) before surgery or treatment has become increasingly essential [4,5].
When making a definitive diagnosis of PDAC, a cytological diagnosis by EUS-FNA can be strengthened by performing a rapid on-site evaluation (ROSE), which helps to provide immediate feedback and enables a diagnosis to be made in the shortest possible time [6][7][8]. However, some issues exist with ROSE, such as the limited number of facilities where ROSE can be provided [9]. One reason for this may be the relative lack of cytologists who can immediately diagnose pancreatic cancer with ROSE; thus, the burden on cytologists is increasing. Therefore, there is a need to develop a new diagnostic-support technology.
Currently, attempts are being made to build automated systems using artificial intelligence (AI) [10,11]. However, AI has its own problems, such as high cost, low versatility (since the results depend on the quality and quantity of the teaching data), and the inability to explain the reason for a diagnostic result (the so-called black box problem). In addition, the distribution of cell nuclei, which provides a clue for diagnosis, is random, and this makes it difficult to apply AI algorithms that rely on supervised data. Therefore, using AI for cytological diagnosis is difficult, because the data need to be analyzed by capturing the random and three-dimensional distribution of lesions.
In contrast, the mathematical method developed in this study does not require teaching data or a large-scale computer system. Our mathematical method clarifies ambiguous cell boundaries by solving certain differential equations. Specifically, we (I) clarify ambiguous nuclear boundaries by solving a reaction-diffusion system and (II) quantitatively evaluate the cell nucleus status using mathematical principles, an approach known as the homology profile (HP) method, to match it with physicians' interpretations [12][13][14][15]. The HP method is an algebraic tool for measuring the topological features of objects [16]. Given a topological space, the HP algorithm computes the number of connected components and holes using the structure of that space, based on continuous thresholds. Recently, Qaiser et al. employed the HP algorithm for tumor segmentation by focusing on the connectivity between nuclei [12,17].
Using the HP method, medical images are converted into structured data, which renders them effective for use with AI techniques. We named the series of methods developed in this study the Mathematical Technology for Cytopathology (MTC) algorithm. The essence of MTC is the structuring of medical images; once the image data are structured, the bottleneck of applying AI technology to pathological images (a current limitation) will be largely eliminated, leading to rapid progress in clinical applications. MTC does not require a large-scale computational system and does not depend on monitoring data, such as staining conditions. MTC can be applied universally because it is robust. In addition, the algorithm is clear and the reason for the diagnosis can be explained. In this study, we investigated the applicability of MTC to support the cytological diagnosis of pancreatic cancer.

Study Design
This study was designed as an exploratory observational study to analyze whether PDAC can be diagnosed using MTC with medical records and existing cytology specimens, without involving invasion or intervention. This research was conducted as a joint effort between investigators at Mie University, Osaka University, and Tohoku University. We attempted to differentiate adenocarcinoma from benign pancreatic cells by quantitatively analyzing information on the distributions (size and variability of the cell cluster form) of cells and cell nuclei via MTC analysis of cytological images (103 normal and 143 adenocarcinoma specimens) obtained by EUS-FNA or EUS-FNB.

Procedure of EUS-FNA/EUS-FNB and Diagnosis
A convex-array echo-endoscope (GF-UCT260, Olympus, Tokyo, Japan) was used for EUS-FNA and EUS-FNB procedures. After identifying tumors using B-mode imaging and confirming the absence of vessels in the target area, we punctured the pancreatic mass under endoscopic ultrasonographic guidance. We mainly used four types of needles, namely, 25G and 22G FNA needles (EZ-shot 3 Plus, Olympus, Tokyo, Japan), and 22G and 19G Franseen needles (Acquire, Boston Scientific, Natick, MA, USA). The different needles were used according to their availability.
A cytologist immediately examined each specimen with ROSE using rapid staining (Diff-Quick stain; International Reagents, Kobe, Japan) to verify that a sufficient sample was obtained. Further punctures were performed in cases where an insufficient sample was obtained. We confirmed the diagnosis of PDAC by cytological and/or histological analyses with EUS-FNA and EUS-FNB specimens. Both the cytological and pathological diagnoses were based on the review of all these materials by cytopathologists.

Automatic Diagnosis Assistance System
To automate the ROSE analysis, it was necessary to analyze the morphology and arrangement of the cell nuclei, such as nuclear irregularity. Only the nuclei needed to be extracted; however, the nuclei were layered on top of each other, which made it difficult to separate them using ordinary image processing methods. Therefore, the cell nuclei were extracted using the "reaction-diffusion method," which separates them by driving gray areas to either black or white (Figure 1). At present, the "reaction-diffusion method" takes approximately one minute per image. With further improvements, it should be possible to process each image in approximately 10 s.

The Mathematical Method
Reaction-diffusion systems are often used to analyze self-organization phenomena [18], but they can also be applied to image analysis. Here, the method was applied to detect ambiguous boundaries of the nuclei. In general, physicians detect nuclei by ignoring small particles and light-colored areas. However, this judgment is poorly reproduced by ordinary image analysis. We sought to increase the reproducibility by solving a reaction-diffusion system. The key equation is Equation (3), which is presented below in the section "The Reaction-Diffusion System". The first term (the reaction term) changes brightly colored areas to black or white. The second term (the diffusion term) serves as an averaging factor, causing small particles to disappear. The associated mechanism can be explained as follows: if the value of u is in interval (i), then the reaction term in Equation (3) is negative, so the time derivative u_t is negative and the value of u decreases. Conversely, if u is in interval (ii), then the reaction term is positive and the value of u increases (Figure 2). Here, let interval (i) be (a, 0) and interval (ii) be (0, b). Therefore, the value of u finally converges to either a (black) or b (white). Figure 1 shows representative reaction-diffusion results (right panels), which seem to be close to the cytological images (left panels).
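The convergence to a or b described above can be illustrated with a minimal sketch. It assumes a cubic reaction term f(u) = -(u - a) u (u - b) with a < 0 < b, chosen only because it is negative on interval (i) and positive on interval (ii); the function names, the values a = -1 and b = 1, and the forward Euler integration are illustrative assumptions, not the implementation used in this study. Diffusion is omitted so that only the effect of the reaction term is visible.

```python
def reaction(u, a=-1.0, b=1.0):
    """Assumed cubic bistable term: negative on (a, 0), positive on (0, b)."""
    return -(u - a) * u * (u - b)


def relax(u0, a=-1.0, b=1.0, dt=0.01, steps=5000):
    """Integrate du/dt = reaction(u) with forward Euler, starting from u0."""
    u = u0
    for _ in range(steps):
        u += dt * reaction(u, a, b)
    return u


if __name__ == "__main__":
    # Gray values slightly below 0 drift to a (black), slightly above 0 to b (white).
    for start in (-0.6, -0.05, 0.05, 0.6):
        print(start, round(relax(start), 3))
```

Starting values on either side of 0 end at opposite levels, which is the thresholding behavior that the reaction term contributes before diffusion is added.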

Figure 2. If the value of u is in interval (i), then the reaction term is negative. Because u_t is then negative, the value of u decreases. Conversely, if u is in interval (ii), then the value of u increases. Here, let interval (i) be (a, 0) and interval (ii) be (0, b). Therefore, the value of u finally converges to either a (black) or b (white).
The idea of applying a reaction-diffusion system to image analysis was first introduced and developed by Nomura et al. for detecting edges in images with variable brightness [19]. In addition, Mahara et al. applied this method to detect vague boundaries, such as material grains (JIS-SUJ2) and capillaries at the base of the fingernails (Figure 3) [20].

The Reaction-Diffusion System
Local average thresholds were determined based on the diffusion equation given as Equation (1), where D_a is the diffusion coefficient. The value a is a threshold for running the FitzHugh-Nagumo (FHN) equations, i.e., Equations (3) and (4) [19,21]. The initial value a_0 of a is determined by Equation (2), where I is the gray-scale brightness (0-255) of each pixel in the original image, and I_max and I_min are the maximum and minimum brightness values of the pixels, respectively.
Next, a reaction-diffusion system based on the FHN equations was used. The system is described by Equations (3) and (4), where D_u and D_v are the diffusion coefficients for the variables u and v, respectively. The parameter ε is a small positive constant (0 < ε < 1). The parameter b is a positive constant and is spatially homogeneous. The initial value of u is defined by Equation (5), where C is a constant whose appropriate value depends on the original image used for edge detection. The initial value of v was set to zero uniformly over the domain.
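The displayed equations are not reproduced in the text above. As a reading aid, the block below sketches the form that FHN-type reaction-diffusion systems commonly take in edge-detection work of the kind cited in [19]. It is an assumption that is consistent with the symbols defined above (D_a, D_u, D_v, ε, a, and b), not a verbatim copy of the original equations. The first line plays the role of Equation (1), and the second and third lines play the roles of Equations (3) and (4). Equations (2) and (5) are not sketched, because their exact form cannot be inferred from the surrounding definitions.

```latex
\begin{align}
\frac{\partial a}{\partial t} &= D_a \nabla^2 a, \\
\frac{\partial u}{\partial t} &= \frac{1}{\varepsilon}\bigl[u(1-u)(u-a) - v\bigr] + D_u \nabla^2 u, \\
\frac{\partial v}{\partial t} &= u - bv + D_v \nabla^2 v.
\end{align}
```

In this assumed form with v = 0, the reaction term is negative for 0 < u < a and positive for a < u < 1, so u is driven toward 0 or 1 and a acts as the local threshold between them. The schematic in Figure 2 uses a different labeling, in which a and b denote the two stable levels.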

Numerical Computations
Numerical calculations were carried out as previously described [18][19][20][22,23]. The initial conditions of a and u were determined based on the pixel data of the images (Equations (2) and (5)). We discretized Equations (1), (3) and (4), using the finite-difference method for space and the fourth-order Runge-Kutta method for time. Neumann boundary conditions were applied.
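The numerical scheme can be sketched as follows, under several loudly stated assumptions: the FHN form u_t = [u(1 - u)(u - a) - v]/ε + D_u ∇²u and v_t = u - bv + D_v ∇²v is assumed rather than taken from the original equations; a single row of pixels is used instead of a two-dimensional image; the threshold a is held uniform instead of being obtained from Equation (1); and all parameter values and function names are illustrative and unrelated to the D values of 50, 100, and 150 used in this study. Space is discretized with second differences and zero-flux (Neumann) ends, and time is advanced with the classical fourth-order Runge-Kutta method.

```python
def laplacian(f):
    """Second difference with zero-flux (Neumann) ends: ghost value = edge value."""
    n = len(f)
    return [f[max(i - 1, 0)] - 2.0 * f[i] + f[min(i + 1, n - 1)] for i in range(n)]


def rhs(u, v, a, b, eps, du, dv):
    """Right-hand sides of the assumed FHN system for u and v."""
    lu, lv = laplacian(u), laplacian(v)
    n = len(u)
    f = [(u[i] * (1.0 - u[i]) * (u[i] - a) - v[i]) / eps + du * lu[i] for i in range(n)]
    g = [u[i] - b * v[i] + dv * lv[i] for i in range(n)]
    return f, g


def shifted(x, k, h):
    """Return x + h * k element-wise."""
    return [xi + h * ki for xi, ki in zip(x, k)]


def rk4_step(u, v, dt, **p):
    """Advance (u, v) by one classical fourth-order Runge-Kutta step."""
    k1u, k1v = rhs(u, v, **p)
    k2u, k2v = rhs(shifted(u, k1u, dt / 2), shifted(v, k1v, dt / 2), **p)
    k3u, k3v = rhs(shifted(u, k2u, dt / 2), shifted(v, k2v, dt / 2), **p)
    k4u, k4v = rhs(shifted(u, k3u, dt), shifted(v, k3v, dt), **p)
    n = len(u)
    u_new = [u[i] + dt / 6 * (k1u[i] + 2 * k2u[i] + 2 * k3u[i] + k4u[i]) for i in range(n)]
    v_new = [v[i] + dt / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(n)]
    return u_new, v_new


def evolve(u0, steps=400, dt=0.02, a=0.25, b=10.0, eps=0.5, du=4.0, dv=4.0):
    """Run the assumed system from u = u0 and v = 0; all defaults are illustrative."""
    u, v = list(u0), [0.0] * len(u0)
    for _ in range(steps):
        u, v = rk4_step(u, v, dt, a=a, b=b, eps=eps, du=du, dv=dv)
    return u


def demo_signal():
    """Toy row of 80 pixels: a wide bright block plus one isolated bright speck."""
    signal = [0.0] * 80
    for i in range(25, 50):
        signal[i] = 1.0
    signal[65] = 1.0
    return signal


if __name__ == "__main__":
    result = evolve(demo_signal())
    print([round(x, 2) for x in result])
```

With these illustrative settings, the wide block remains at a high level over the simulated time, whereas the single-pixel speck is averaged away by the diffusion term. This mirrors, in one dimension, the stated roles of the reaction and diffusion terms.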

Calculating the Quantitative Index
The quantitative index was calculated for the connected components with areas of 100-1000 pixels in the images. The connected components with areas outside of this range were considered to be noise or to reflect instances where the nuclear boundary could not be distinguished well. Seven quantitative indexes were calculated: (1) number of pixels; (2) area (pixels); (3) interquartile range of the area; (4) area/pixel; (5) average perimeter of the connected components; (6) average circularity of the connected components; and (7) interquartile circularity range of the connected components. The parameter D (induced by D_u and D_v) can be regarded as reflecting the state of the tissue staining. In this study, we selected three values of D (50, 100, and 150). Important indices for detecting adenocarcinoma cells were identified by calculating the accuracy, sensitivity, and specificity of each quantitative index.
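A minimal sketch of the per-component measurements is given below. It makes several assumptions that the text does not specify: 4-connectivity between pixels, perimeter counted as the number of pixel edges that face the background or the image border, and circularity defined as 4πA/P². The function name `component_stats` is hypothetical, and the binary mask is assumed to be a reaction-diffusion output in which nuclear pixels are marked as 1.

```python
import math
from collections import deque


def component_stats(mask, min_area=100, max_area=1000):
    """Area, perimeter, and circularity of each 4-connected component of a 0/1 mask.

    Components whose area lies outside [min_area, max_area] are discarded.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    stats = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over one component.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            area = 0
            perimeter = 0
            while queue:
                r, c = queue.popleft()
                area += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if not inside or not mask[rr][cc]:
                        perimeter += 1  # this pixel edge faces background or border
                    elif not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if min_area <= area <= max_area:
                stats.append({
                    "area": area,
                    "perimeter": perimeter,
                    "circularity": 4.0 * math.pi * area / perimeter ** 2,
                })
    return stats
```

Averages and interquartile ranges over the returned list would then give image-level indexes of the kind listed above. With this edge-counting perimeter, a filled square has a circularity of π/4, so the values are only comparable among masks measured in the same way.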

Classifying Tissues as Normal or Adenocarcinoma Tissues
Tissue classifications (i.e., normal or adenocarcinoma) were performed using univariate and multivariate analyses with the seven quantitative indexes described in the section "Calculating the Quantitative Index".
When performing univariate analysis, the median value was used as the threshold value to perform the classification. For multivariate analysis, we combined the quantitative indexes by summing them. We repeated the multivariate analysis while changing the number of combined quantitative indexes from two to seven (i.e., 120 combinations in total). In this analysis, min-max normalization was used to normalize each quantitative index. MATLAB R2020a (MathWorks, Natick, MA, USA) was used for the calculation.
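The combination count and the summed score can be sketched as follows. Choosing two to seven of the seven indexes gives 21 + 35 + 35 + 21 + 7 + 1 = 120 subsets, which matches the number stated above. The function names are hypothetical, and two points are assumptions rather than details taken from the text: the median of the summed score is reused as the threshold for the multivariate case, and a score above the threshold is labeled as adenocarcinoma. The study itself used MATLAB; Python is used here only for illustration.

```python
from itertools import combinations
from statistics import median


def index_subsets(n_indexes=7, min_size=2):
    """All subsets of index positions with at least min_size members."""
    return [subset
            for size in range(min_size, n_indexes + 1)
            for subset in combinations(range(n_indexes), size)]


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in values]


def classify_by_sum(table, subset):
    """Sum the min-max normalized indexes in subset and threshold at the median.

    table[i][j] holds quantitative index j for specimen i. Returns True where
    the summed score exceeds the median (assumed here to mean adenocarcinoma).
    """
    columns = [min_max([row[j] for row in table]) for j in subset]
    scores = [sum(col[i] for col in columns) for i in range(len(table))]
    cut = median(scores)
    return [score > cut for score in scores]


if __name__ == "__main__":
    print(len(index_subsets()))
```

For an index that tends to be lower in adenocarcinoma, the sign of that index would need to be flipped before summing; the text does not state how direction was handled, so this sketch leaves it out.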

Evaluating the Classification Accuracy
The accuracy, sensitivity, and specificity were calculated by the following equations:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Here, TP, TN, FP, and FN represent the numbers of true-positive, true-negative, false-positive, and false-negative results, respectively. MATLAB R2020a was used for evaluating the accuracy of the classification method.
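As a small worked illustration of these three measures, the sketch below computes them from the four counts using the standard definitions. The function name and the example counts are made up for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of true adenocarcinoma cases detected
        "specificity": tn / (tn + fp),  # fraction of true normal cases detected
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, tn=6, fp=4, fn=2))
```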

Results
Here, D was calculated as an index that depended on the diffusion coefficients D_u and D_v. First, to select the appropriate parameters, we generated reaction-diffusion images with three different D values (50, 100, and 150). As the D value decreased, unnecessary parts of the edges became visible instead of the core content. In contrast, as the D value increased, the unnecessary parts of the edges disappeared, but the core content also tended to almost disappear. Therefore, we selected three D values with acceptable performance: D = 50, D = 100, and D = 150 (Figure 4). For all the images, the quantitative indexes were extracted by cropping around the selected cell masses.
With the univariate analysis, the highest accuracies for each parameter were 71% (D = 50, number of pixels; Table 1), 69% (D = 100, interquartile range of circularity of the connected components; Table 2), and 74% (D = 150, interquartile range of circularity of the connected components; Table 3), respectively. These results showed that setting D to 150 resulted in the highest univariate accuracy (i.e., 74%) among the three values studied.
The multivariate analysis was performed using 120 combinations of evaluation indices. With the multivariate analysis, the highest accuracies for each parameter were 75% (D = 50, number of pixels + interquartile area range + average perimeter of the connected components; Table 1), 70% (D = 100, number of pixels + interquartile area range; Table 2), and 74% (D = 150, area/pixel + interquartile circularity range of the connected components; Table 3), respectively.

Discussion
The results of this study show that MTC could be used to distinguish between adenocarcinoma tissue and benign pancreatic tissue. Although MTC showed excellent results in discriminating adenocarcinoma from benign patterns in cytology images, there were three problems. One problem was that the edges of the cell clusters often overlapped with each other, making it difficult to capture individual nuclei, resulting in false-positive results. To solve this problem, the cytologist manually selected the region of interest to remove the unnecessary overlap. The second problem was related to the nuclear area in aggregated pancreatic cancer cells with mucus production. Although the distance between the nuclei was irregular, due to the wider cytoplasm of mucus-producing cancer cells, nuclear enlargement appeared to be relatively mild and contributed to the false-negative results. The last problem was that little information was available when the specimens were small and, thus, the judgments varied. The latter two problems can be solved by increasing the number of patterns and performing deep learning.
To the best of our knowledge, no studies have used machine learning or deep learning to support the cytological analysis of pancreatic tissues to diagnose adenocarcinoma in pancreatic EUS-FNA specimens. In 2021, Naito et al. reported the first application of deep learning to detect adenocarcinoma in pancreatic EUS-FNB specimens [24]. They stated that the specimens that pathologists needed for diagnosing adenocarcinoma included various tissue components, such as invasive ductal carcinoma cells in desmoplastic stromata and circulating fragmented and intact cancer cells in the blood. Histological diagnosis is based on the diagnosis of both cellular and structural atypia, whereas cytological diagnosis is based on the morphological abnormalities of individual cells, such as nuclear atypia. Therefore, it is more difficult to directly incorporate cytological diagnosis into deep learning than it is to incorporate histological diagnosis.
Recently, EUS-FNB has been used more than EUS-FNA for tissue acquisition [25][26][27], as EUS-FNB has been reported to provide more stable diagnostic results after improvements were made to the puncture needle [28,29]. However, while EUS-FNB is useful for diagnosing large masses, it is quite difficult to collect tissue fragments by EUS-FNB from small masses (i.e., <1 cm in diameter). In such cases, cytology by EUS-FNA with ROSE may often be more useful. Mie et al. reported that EUS-guided tissue acquisition from small solid pancreatic lesions for ROSE had a high diagnostic yield and was safe [30]. Similarly, in our institution, when tissue samples are obtained under EUS from a small pancreatic mass, they are confirmed by ROSE, and the samples are processed as direct smears and/or formalin-fixed core biopsy specimens.
The role of cytologists in ROSE is significant. Fitzpatrick et al. reported the diagnostic performance of cytopathology (CP) in the evaluation of pancreatic EUS-FNB specimens, evaluated the FNB diagnostic performance stratified by tissue triage, and reviewed the specimen types [31]. They reported that CP accurately diagnosed pancreatic FNB specimens, the ROSE review by CP improved the diagnostic yield and operating characteristics, and that a concurrent review of both the cytological features of direct smears and the architectural features of core biopsies improved the overall diagnostic performance. These findings highlighted the importance of CP in assessing FNB specimens to evaluate the adequacy and render a preliminary diagnosis at the time of the procedure. Therefore, it is important to train cytologists to perform ROSE quickly and accurately. The development of this diagnostic aid technology, MTC, is expected to be very useful in clinical practice, and serve as a good teaching tool for training cytologists, by comparing the analytical results of MTC with their own diagnoses.
Our study has several limitations. One limitation is that the test images were all obtained from a single institution; therefore, it is uncertain how well the MTC model would perform with images obtained from a different institution. Second, no external validation was performed. The third limitation is that the test set size was small (103 normal specimens and 143 adenocarcinoma specimens), and it might not include all the potential variations in cases that could be encountered. As future work, we intend to further develop and evaluate our model with multiple test sets obtained from different medical institutions, to assess its generalization performance and move closer towards adopting such assistive models in routine cytological diagnosis workflows.

Conclusions
In the future, we can expect to improve the accuracy by selecting optimal parameters for each extracted image. Because MTC is simple and does not require supervisory data, it can be applied in various medical facilities and is expected to be useful for diagnosis support.
Informed Consent Statement: Patient consent was waived. Instead of informed consent, the study content was disclosed to the study subjects and their families through the hospital's website, and patients were guaranteed the opportunity to refuse the use of their data for the study. Prior to the start of this study, the study outline was registered in the University Hospital Medical Information Network Clinical Trial Registry (UMIN-CTR) under identification number 000044462.