Machine Learning Based on Morphological Features Enables Classification of Primary Intestinal T-Cell Lymphomas

Simple Summary
We presented a machine learning approach for accurate quantification of nuclear morphometrics and differential diagnosis of primary intestinal T-cell lymphomas. The human interpretable machine learning approach can be easily applied to other lymphomas and potentially even broader disease categories. This approach not only brings deeper insights into lymphoma phenotypes, but also paves the way for future discoveries concerning their relationship with disease classification and outcome.

Abstract
The aim of this study was to investigate the feasibility of using machine learning techniques based on morphological features in classifying two subtypes of primary intestinal T-cell lymphomas (PITLs) defined according to the WHO criteria: monomorphic epitheliotropic intestinal T-cell lymphoma (MEITL) versus intestinal T-cell lymphoma, not otherwise specified (ITCL-NOS), which is considered a major challenge for pathological diagnosis. A total of 40 histopathological whole-slide images (WSIs) from 40 surgically resected PITL cases were used as the dataset for model training and testing. A deep neural network was trained to detect and segment the nuclei of lymphocytes. Quantitative nuclear morphometrics were further computed from these predicted contours. A decision-tree-based machine learning algorithm, XGBoost, was then trained to classify PITL cases into two disease subtypes using these nuclear morphometric features. The deep neural network achieved an average precision of 0.881 in the cell segmentation work. In terms of classifying MEITL versus ITCL-NOS, the XGBoost model achieved an area under receiver operating characteristic curve (AUC) of 0.966. Our research demonstrated an accurate, human-interpretable approach to using machine learning algorithms for reducing the high dimensionality of image features and classifying T cell lymphomas that present challenges in morphologic diagnosis. The quantitative nuclear morphometric features may lead to further discoveries concerning the relationship between cellular phenotype and disease status.


Introduction
Primary intestinal T-cell lymphomas (PITLs) are rare and therefore pose a challenge for diagnosis. According to the 2017 World Health Organization (WHO) classification of lymphoid neoplasms, PITLs include enteropathy-associated T-cell lymphoma (EATL),

Patient Samples
A total of 40 PITL patients, including MEITL (n = 26), ITCL-NOS (n = 10), and borderline cases (n = 4), were collected from 17 hospitals in Taiwan. All 40 specimens were surgical specimens, and their diagnoses were made by a senior hematopathologist (S.S.C.) based on the 2017 WHO criteria [1]. They were all PITL cases, as confirmed by staging workups. All cases were negative for Epstein-Barr virus (EBV) as assessed by in situ hybridization for EBV-encoded small RNA (EBER). Some cases had been included in our previous study on the significance of EBV in PITL [6]. All four cases (nos. T02, T13, T18, and T20) examined for serum anti-tissue transglutaminase by ELISA were negative [6]. The major clinicopathological findings of the four borderline cases are summarized in Table 1. Phenotypically, they were all consistent with MEITL, with the expression of CD8, CD56, and cytotoxic markers; however, the tumor cells were larger than would be expected in typical MEITL cases. The study was approved by the Institutional Review Board at Chi-Mei Medical Center (approval no. 10612-010).

Dataset Preparation
For each patient, one hematoxylin-and-eosin-stained section was scanned and digitized to create a total of 40 WSIs with an average size of 175,274 × 407,126 pixels. To avoid image areas of inferior quality (due to conditions such as out-of-focus scanning, tissue folding, and bubbles), the slides were reviewed by a senior hematopathologist (S.S.C.), who manually selected the most representative, high-quality area in each slide before scanning (a representative circled area is shown in Figure 1A). The sections were scanned using the Pannoramic 250 Flash digital slide scanner (3DHISTECH, Hungary) with a 40× objective. From the marked areas, 33 regions of interest (ROIs, size = 115 µm × 115 µm) randomly sampled from 19 cases were extracted for the development of the lymphocyte detection model, whereas 10 high-power fields (HPFs, size = 346 µm × 346 µm) from each case were randomly selected (green boxes in Figure 1A) for the development of nuclear morphometric feature extraction and the lymphoma classification model.
Cancers 2021, 13, 5463
For the lymphocyte detection model, the full dataset of 33 ROIs was split into training, validation, and testing sets at a ratio of 8:1:1. ROIs used as testing sets were contained in WSIs independent from those that contained training and validation ROIs. Within each ROI, the contours of lymphocytes were manually annotated by two pathologists (C.H.L. and S.S.C.). For the development of the disease classification model, a total of 400 HPFs were used as the dataset. A 3-fold cross-validation was employed to train and evaluate the model such that the images from an individual patient would only appear in either the training or validation set. The four borderline cases were excluded from both the training and testing sets and were used only as references. The classification model was tested at the case level. The recruited cases and the disposition of training and validation sets under different phases are summarized in Table 2.
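The case-level constraint on the cross-validation can be illustrated with a short sketch. It is not the authors' code: the case identifiers are hypothetical, and stratifying the folds by diagnosis is an assumption that the text does not state. The only property taken from the text is that all HPFs of one patient fall on the same side of each split.

```python
import random
from collections import defaultdict

def case_level_folds(case_labels, n_folds=3, seed=0):
    """Assign whole cases to folds (stratified by diagnosis; an assumption).

    case_labels: dict mapping case id -> diagnosis label.
    Returns a list of n_folds lists of case ids.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for case_id, label in sorted(case_labels.items()):
        by_label[label].append(case_id)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_label):
        cases = by_label[label]
        rng.shuffle(cases)
        # Deal the cases of each diagnosis round-robin across the folds.
        for k, case_id in enumerate(cases):
            folds[k % n_folds].append(case_id)
    return folds

def split_hpfs(hpf_case_ids, folds, val_fold):
    """Return (train_idx, val_idx); every HPF follows the fold of its case."""
    val_cases = set(folds[val_fold])
    train_idx = [i for i, c in enumerate(hpf_case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(hpf_case_ids) if c in val_cases]
    return train_idx, val_idx

# Hypothetical ids: 26 MEITL and 10 ITCL-NOS cases, 10 HPFs per case
# (the four borderline cases are left out, as in the text).
case_labels = {f"M{i:02d}": "MEITL" for i in range(26)}
case_labels.update({f"I{i:02d}": "ITCL-NOS" for i in range(10)})
hpf_case_ids = [c for c in case_labels for _ in range(10)]

folds = case_level_folds(case_labels)
train_idx, val_idx = split_hpfs(hpf_case_ids, folds, val_fold=0)
```

Because the split is made on case identifiers rather than on individual HPFs, no patient contributes images to both the training and the validation set of a fold.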

Lymphocyte Detection Model
To segment the neoplastic lymphocytes, an instance segmentation model was trained by employing a region-based CNN model, the hybrid task cascade regional proposal convolutional network (HTC-RCNN) [18], with ResNet50 [19] as the backbone. The HTC-RCNN model was trained using the stochastic gradient descent (SGD) optimizer at a learning rate of 0.001 and a batch size of 16 on a single NVIDIA V100 GPU. The performance of the detection model was evaluated using average precision (AP). To compute AP, each detected box was first matched to the ground truth to evaluate whether the intersection-over-union (IoU) was over a threshold of 0.5. Subsequently, precision scores were estimated under different recall thresholds ranging from 0 to 1, and AP was computed as the area under the precision-recall curve.
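The AP computation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the evaluation code used in the study: boxes are given as (x1, y1, x2, y2), detections are matched greedily in descending score order, and the precision-recall curve is made monotone before integration. The function names and the toy boxes are hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thr=0.5):
    """detections: list of (score, box); ground_truth: list of boxes."""
    if not ground_truth:
        return 0.0
    matched, hits = set(), []
    # Greedy matching in descending score order (an assumption).
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou > iou_thr:
            matched.add(best_j)
            hits.append(1)   # true positive
        else:
            hits.append(0)   # false positive
    # Precision and recall after each detection.
    tp, precisions, recalls = 0, [], []
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing in recall, then integrate.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Toy example: two annotated nuclei, three detections (one is a false positive).
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))]
ap = average_precision(dets, gt_boxes)
```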

Computation of Nuclear Morphometrics
The trained lymphocyte detection model was applied to the 400 HPFs to segment lymphocytes. For quantitative analysis of nuclear morphology, seven numerical attributes were calculated for each segmented lymphocyte nucleus (Table 3). For each HPF, four moment statistics (mean, variance, skewness, and kurtosis) were computed for each attribute over the whole cellular population within the HPF. Thus, the nuclear morphology of each HPF was characterized by 28 features. The throughput of our morphology extraction module was also estimated: it required an average of 2.22 ± 0.42 s (range: 0.91 to 3.13 s) to process each HPF, which could be acceptable for clinical practice. At the case level, the extracted features were further aggregated by averaging feature scores across the HPFs of each case to form a feature set (feature matrix). The feature set was used in a decision-tree-based machine learning algorithm (XGBoost) for disease type classification at the case level.

Table 3. Definition of attributes in cellular morphometrics.
Attribute | Definition
Ratio of axis length | The ratio of the longest axis to the second-longest axis
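The step from per-nucleus attributes to 28 HPF-level features, and then to one feature vector per case, can be sketched as follows. The attribute names are taken from the attributes mentioned in the text (the full Table 3 is not reproduced here, so the exact list is an assumption), and the use of population moments with excess kurtosis is also an assumption, since the estimator is not specified.

```python
ATTRIBUTES = ["area", "perimeter", "axis_ratio", "irregularity",
              "circularity", "entropy", "orientation"]  # assumed names
STATS = ["mean", "variance", "skewness", "kurtosis"]

def moment_stats(values):
    """Population moments of one attribute over all nuclei in an HPF."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0  # excess kurtosis
    return {"mean": mean, "variance": m2,
            "skewness": skewness, "kurtosis": kurtosis}

def hpf_features(nuclei):
    """nuclei: list of dicts (attribute -> value). Returns 7 x 4 = 28 features."""
    features = {}
    for attr in ATTRIBUTES:
        stats = moment_stats([nucleus[attr] for nucleus in nuclei])
        for stat in STATS:
            features[f"{stat}_{attr}"] = stats[stat]
    return features

def case_features(hpfs):
    """Average each of the 28 features across the HPFs of one case."""
    per_hpf = [hpf_features(nuclei) for nuclei in hpfs]
    return {name: sum(f[name] for f in per_hpf) / len(per_hpf)
            for name in per_hpf[0]}

# Toy case: two identical HPFs with four nuclei each (values are made up).
toy_hpf = [{attr: float(v) for attr in ATTRIBUTES} for v in (1, 2, 3, 4)]
toy_case = case_features([toy_hpf, toy_hpf])
```

Each case is thereby reduced to a fixed-length vector of 28 named features, which is the form a tree-based classifier expects regardless of how many nuclei were detected.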

T-cell Lymphoma Classification Model
We used extreme gradient boosting (XGBoost), a tree-based machine learning approach, to select important features for modeling case-level classification. For the machine learning algorithm, weighting by Gini was applied iteratively as the feature selection method to alleviate the effect of redundant features and to minimize the binary cross-entropy loss, while random search was employed for hyper-parameter optimization. Two XGBoost models were employed, taking different input sets: (1) only morphological features (28 feature scores per subject); (2) both morphological features and immunohistochemical (IHC) phenotypes (i.e., either positive or negative for the expression of CD8 and CD56). A binary classifier was trained to classify each HPF into one of two types of PITLs, either MEITL or ITCL-NOS, based on the selected features.
To compare the performance of the XGBoost approach with that of a CNN approach, a CNN with a ResNet50 backbone was also trained using the AdamW optimizer at a learning rate of 0.001 and a batch size of 16 on an NVIDIA Quadro RTX 8000. Given the limited number of training images, the AdamW optimizer was preferred over SGD in the current study to keep the model from becoming trapped in local minima. Data augmentation was applied during the training phase to increase data variability. The augmentation methods included random horizontal/vertical flipping, random translation, random rotation, random scaling, random color jittering in brightness/contrast/saturation/hue, random Gaussian blurring, and random cropping. Early stopping was employed to halt training when the validation loss did not improve for 10 epochs. The CNN took an HPF image as the input and predicted the disease type of the image, with each image labeled by its case-level diagnosis. For the testing phase, the case-level prediction of disease type was based on the classification decisions (outcome probabilities) averaged over the 10 HPFs of each case. The classification performance of the different models was evaluated using the area under the receiver operating characteristic curve (AUC) with 3-fold cross-validation, where each fold was split at the case level. Finally, DeLong's test was employed to compare AUCs between models, using two-tailed hypothesis testing at a significance level of 0.05.
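The case-level scoring used in the testing phase can be illustrated with a small sketch: HPF-level probabilities are averaged per case, and the AUC is then computed on the case-level scores. The AUC below uses the rank-based (Mann-Whitney) formulation; treating MEITL as the positive class, the case identifiers, and the probabilities are all assumptions made for illustration.

```python
from collections import defaultdict

def case_level_scores(hpf_probs, hpf_case_ids):
    """Average the HPF-level probabilities of each case."""
    sums, counts = defaultdict(float), defaultdict(int)
    for prob, case_id in zip(hpf_probs, hpf_case_ids):
        sums[case_id] += prob
        counts[case_id] += 1
    return {case_id: sums[case_id] / counts[case_id] for case_id in sums}

def roc_auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as one half. labels: 1 = positive class, 0 = negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: four hypothetical cases with two HPFs each (the study used ten).
hpf_probs = [0.9, 0.7, 0.2, 0.4, 0.2, 0.3, 0.3, 0.1]
hpf_case_ids = ["A", "A", "B", "B", "C", "C", "D", "D"]
case_truth = {"A": 1, "B": 0, "C": 1, "D": 0}  # 1 = MEITL (assumed coding)

scores = case_level_scores(hpf_probs, hpf_case_ids)
case_ids = sorted(scores)
auc = roc_auc([scores[c] for c in case_ids], [case_truth[c] for c in case_ids])
```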
In addition to the case-level prediction, the feature importance of the XGBoost was evaluated using "gain", which implies how informative the features are when used to split the data across all trees. Higher "gain" indicates a greater contribution of the corresponding feature to the model.
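For readers unfamiliar with "gain", the quantity can be made concrete with the split gain as defined in the XGBoost documentation, where G and H denote the sums of first- and second-order loss derivatives in each child node. The sketch below evaluates one candidate split for a binary cross-entropy loss; the feature values, labels, and threshold are made up, and the regularization values (lambda = 1, gamma = 0) are the library defaults rather than settings reported in the study.

```python
import math

def grad_hess(y_true, raw_score):
    """First and second derivatives of binary cross-entropy w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction obtained by splitting a node into left and right children."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

# Hypothetical feature (e.g., variance in perimeter) for four cases,
# with label 1 for the low-variance cases and an initial raw score of 0.
feature = [1.0, 2.0, 8.0, 9.0]
labels = [1, 1, 0, 0]
threshold = 5.0

g_l = h_l = g_r = h_r = 0.0
for x, y in zip(feature, labels):
    g, h = grad_hess(y, 0.0)
    if x < threshold:
        g_l, h_l = g_l + g, h_l + h
    else:
        g_r, h_r = g_r + g, h_r + h

gain = split_gain(g_l, h_l, g_r, h_r)
```

A feature's "gain" importance is then the average of such values over all splits in which the feature is used, so a feature that separates the two subtypes cleanly accumulates a high score.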

Modeling for Diagnostic Prediction from the Feature Profile
A general linear model (GLM) was employed to better understand the effects of different morphological features on diagnostic decisions. A repeated measurement method was used to control the within-subject variations. Given the fact that the cells sampled from a single HPF might not be fully representative of each case, 10 HPFs were randomly sampled and used to estimate the error variance sourced from HPF sampling. The effects of the morphological feature scores on the prediction of the diagnosis can be formulated as follows: $Y_{ij} = \mu + \alpha_j + \pi_i + \varepsilon_{ij}$, where $Y_{ij}$ (dependent variable) denotes the morphological feature score; $\mu$, the population mean of the feature; $\alpha_j$, the group mean score sourced from disease type $j$; $\pi_i$, the mean score of case $i$; and $\varepsilon_{ij}$, the residual error for the observation. The four borderline cases were not included in the statistical analysis.
Descriptive statistics of the 28 aggregate morphological feature measurements were computed for the two lymphoma types (MEITL vs. ITCL-NOS), and the F-test was conducted to test the differences in each feature score between the two disease subtypes.
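One way to obtain an F-statistic for the disease-type effect while treating the HPFs as repeated measurements of a case is to test the group effect against the between-case variation. When every case contributes the same number of HPFs (10 in this study), that ratio equals the one-way ANOVA F computed on per-case means. The sketch below follows this route; it is an illustration of the idea, not necessarily the estimation procedure used by the authors, and the data are made up.

```python
def mean(values):
    return sum(values) / len(values)

def group_effect_f(hpf_scores_by_group):
    """F-statistic for the group effect, using cases as the error term.

    hpf_scores_by_group: dict mapping group -> list of cases, where each
    case is a list of HPF-level scores of one feature (equal length assumed).
    Returns (F, df_between, df_within).
    """
    case_means = {g: [mean(case) for case in cases]
                  for g, cases in hpf_scores_by_group.items()}
    all_means = [m for ms in case_means.values() for m in ms]
    grand = mean(all_means)
    ss_between = sum(len(ms) * (mean(ms) - grand) ** 2
                     for ms in case_means.values())
    ss_within = sum((m - mean(ms)) ** 2
                    for ms in case_means.values() for m in ms)
    df_between = len(case_means) - 1
    df_within = len(all_means) - len(case_means)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data: three cases per group, two HPFs per case.
toy = {
    "MEITL": [[0.5, 1.5], [1.5, 2.5], [2.5, 3.5]],     # case means 1, 2, 3
    "ITCL-NOS": [[3.5, 4.5], [4.5, 5.5], [5.5, 6.5]],  # case means 4, 5, 6
}
f_stat, df_b, df_w = group_effect_f(toy)
```

The resulting F value would be compared with an F distribution with (df_between, df_within) degrees of freedom to obtain a p-value.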

The Lymphoma Nucleus Detection Model Shows High Sensitivity along with High Positive Predictive Value
An average of 892.18 ± 254.80 nuclei (range: 331 to 1502) were detected in each HPF. The lymphoma nucleus detection model achieved an average precision of 0.943 and 0.881 in segmenting lymphoma nuclei for the validation set and the testing set, respectively. For the testing set, our model achieved a precision (positive predictive value) of 0.911 with a recall (sensitivity) of 0.868. Representative examples of boundary-segmented lymphoma nuclei with monomorphic, pleomorphic, and borderline morphology are depicted in Figure 1B.

The T-Cell Lymphoma Classification Model Discriminated MEITL and ITCL-NOS Cases and Showed Higher Accuracy Than the CNN
The classification performance of the two XGBoost models, using morphological feature measurements with or without IHC phenotypes as inputs, in discriminating the two types of PITLs was evaluated using AUC. As depicted in Figure 2A, the XGBoost model using only morphological feature measurements achieved an AUC of 0.966 (95% CI: 0.949-0.984), while the XGBoost model using both morphological feature measurements and IHC phenotypes as inputs achieved an AUC of 0.955 (95% CI: 0.935-0.975), indicating no significant difference in discriminative power from adding the IHC phenotype as a model input (p-value = 0.412), as assessed by DeLong's test. In comparison, the deep CNN directly trained using HPFs as image inputs achieved an AUC of 0.820 (95% CI: 0.734-0.906), which was significantly inferior to the XGBoost method (without IHC, p < 0.01; with IHC, p < 0.01; Table 4).
As shown in Figure 3, there was a high concordance between the XGBoost model and the diagnoses by the senior hematopathologist except for case nos. T02, T20, and T70. Case no. T20 was diagnosed as ITCL-NOS by the hematopathologist but was predicted as MEITL by the XGBoost model, whereas case nos. T02 and T70 were diagnosed as MEITL but predicted to more likely be ITCL-NOS by the model.

The Importance of Features Obtained from the XGBoost Model Can Be Ranked
As illustrated in Figure 2B, the most important morphological features selected based on the "gain" derived from the XGBoost model included the variance in perimeter, the variance in nuclear area, and the mean of nuclear irregularity. When the IHC phenotype was used as an input in the XGBoost model, both CD56 and CD8 expression status ranked higher than most of the other morphological features, except for the variance in perimeter, indicating that the variance in perimeter contributed the most to the classification decisions.

Feature Analysis Using the GLM Enabled Explicit Interpretation of the Morphological Features
Cell contours delineated by the detection model enabled the quantitative comparison of the morphological profiles between the two disease subtypes. The 28 morphological feature scores computed for each case are presented in Table 5. These features were used to conduct statistical analysis using the GLM. Among all the detected cells, the average cell size was 41.72 ± 9.00 µm²; the axis ratio, 1.29 ± 0.06; the perimeter, 24.32 ± 2.66 µm; the irregularity, 1.15 ± 0.40; the circularity, 0.75 ± 0.03; and the entropy, 5.85 ± 0.10. The F-statistic derived from the GLM was employed to assess the significance of the difference between the means of the two disease subtypes for each morphological feature measurement. In general, the F-test assesses whether the two groups are drawn from the same population. A two-tailed hypothesis test was employed to examine the significance. The results reveal statistically significant differences in several morphological feature measurements between the two disease subtypes. For differentiating MEITL and ITCL-NOS, five of the seven measurements in variance were significant (p < 0.01), and five of the seven measurements in mean were significant. Only irregularity and perimeter showed a significant difference in both skewness and kurtosis. The orientation of cells in the local region showed no statistical significance in differentiating these two diseases. Features associated with the size of cells, including the perimeter and the area, showed a stronger effect (p < 0.001) for the variance than for the mean.
As illustrated in Figure 4B, the two disease subtypes can be visually separated by plotting the variance in nuclear perimeter against the variance in nuclear irregularity. The borderline cases (green dots), which could hardly be distinguished, are also shown in Figure 3B. For MEITL cases, the variance in perimeter and the variance in irregularity were smaller and showed a linear correlation; in contrast, the ITCL-NOS cases showed a higher variance in irregularity or perimeter.

The Model Produced a 1:1 Ratio Prediction for the Four Borderline Cases
The clinicopathological findings and the model prediction results of the four borderline cases as diagnosed by the senior hematopathologist are presented in Figure 3B. Phenotypically, they were all consistent with MEITL, with neoplastic cells expressing CD8, CD56, and cytotoxic markers; however, the tumor cells were larger than would be expected in typical cases of MEITL. Based on the model prediction, two cases (case nos. T31 and T63) were predicted as MEITL; while the other two cases (T05A and T60) were predicted as ITCL-NOS.

Discussion
Due to the rarity of PITL, differentiating MEITL from ITCL-NOS is challenging for pathologists, and the distinction relies on the experience of the pathologists in evaluating histopathological features, particularly monomorphic versus pleomorphic tumor cells, in conjunction with immunophenotype; nevertheless, the morphologic classification by pathologists could be subjective. In the current study, we collected PITL cases from 17 hospitals in Taiwan and demonstrated a successful workflow that could detect and segment lymphoma nuclei accurately using a CNN model, followed by knowledge-based morphological feature extraction for quantitative analysis and classification of PITL using an XGBoost model.
Previous studies have shown that analyzing cellular properties such as nuclear size and the homogeneity of the cell population can be clinically useful for differentiating groups of cells and for predicting prognosis. For example, Maqlin et al. [15] and Faridi et al. [14] applied boundary-based and region-growing methods to segment cells and extract morphological features such as nuclear size, solidity, and eccentricity to grade pleomorphism scores of WSIs of breast cancer cases, achieving an accuracy of 89.4% and 86.6%, respectively. Similarly, Moran et al. [16] employed the cell detection module in QuPath [20], an open-source image analysis toolbox for digital pathology, to extract the morphological features of Merkel cell carcinoma (MCC) and found that nuclear area and circularity were crucial factors for the prognosis of MCC. To date, many critical findings have been achieved by analyzing morphological features using traditional image-processing methods with handy packages such as QuPath and ImageJ; however, the selection of parameters such as background radius, nucleus radius, or local intensity threshold requires careful adjustment for each individual image. As most image-processing methods adopt the same hyper-parameter set to segment all cells, they may fail in some challenging cases, including: (1) images with crowded cells, (2) cells with inconsistent shape or size in a local area, and (3) cells with hyperchromatic or vesicular nuclei [21].
State-of-the-art deep-learning-based segmentation models have achieved excellent performance in nuclear segmentation [22,23]. As illustrated in Figure 5, QuPath with the default setting can segment the cell boundary successfully in most cases; however, fragile contours and false positives occurred frequently when cells were crowded, vesicular, or had atypical chromatin patterns. In comparison, our deep-neural-network-based algorithm showed a robust segmentation result with fewer fragile contours. Notably, nuclei of other types of cells, such as vascular endothelial cells, may fail to be distinguished from lymphocytes because only lymphocytes were labelled to train the detection model in the current study. Falsely included non-lymphoid cells may slightly distort the estimation of morphological features. The inclusion of various types of annotated cells is expected to improve the algorithm. Still, using CNNs to segment cells precisely, with no tedious parameter adjustment across different fields of view, could enable pathologists to analyze nuclear morphology without manual region selection and can be applied to WSIs efficiently.
In accordance with the aforementioned studies using nuclear morphometrics for classifying cells into benign versus malignant [24-26] or presence versus absence of Merkel cell polyoma virus in MCC [27], the current study demonstrated that nuclear morphological patterns made a significant contribution in differentiating MEITL from ITCL-NOS. Specifically, the variance in perimeter and the variance in irregularity were two of the most critical features for separating MEITL and ITCL-NOS into different clusters. As illustrated in Figure 4B, the shape of lymphocytes in a typical MEITL case (case no. T48) was apparently round and regular, while that in a typical ITCL-NOS case (case no. T45A) was pleomorphic. For atypical cases, such as case nos. T02 and T69 (Figure 4B), it was difficult to classify them as either MEITL or ITCL-NOS by eye according to the appearance of nuclei (in the middle of both slides). Using our algorithm-derived morphological indexes and classification scheme, the four borderline cases as judged by the experienced hematopathologist (typical immunophenotype but with more pleomorphic nuclei than typical MEITL cases) were classified either as MEITL (case nos. T31 and T63) or ITCL-NOS (nos. T05 and T60), indicating limitations of both manual and AI-assisted histopathological evaluation when it comes to making a clear-cut diagnosis. Alternatively, the degree of "monomorphism" might not be as strict as the name of MEITL implies, suggesting that there might be a certain degree of nuclear pleomorphism in cases currently defined as MEITL. The nature of such borderline cases in our series may potentially be revealed by their underlying genetic features, which we are currently investigating.
Immunophenotypically, Weisenburger et al. [28] reported that 19% and 6% of ITCL-NOS cases expressed CD8 and CD56, respectively, indicating a phenotypic overlap between MEITL and ITCL-NOS. Our algorithm could classify PITL cases into either MEITL or ITCL-NOS based on the morphological indexes; however, the relationship between these indexes and disease prognosis remains unclear. Further studies are needed to explore the relationship between morphological features and patient outcomes with these rare tumors. For differentiating MEITL and ITCL-NOS, immunophenotypic features were found to bring no additional benefits for classification performance to the tree-based models on top of morphological features in our models.

Figure 5. Comparison of contour segmentation results with the QuPath cell detection module. The first row represents the original image. For the QuPath cell detection module (the second row), default parameters were employed, except the nucleus sigma (radius of the nuclei) was set to 3 µm due to the pixel-to-µm ratio. Fragile contours (FC) and false positives (FP) occurred frequently in QuPath. Our lymphocyte detection model (the third row) showed fewer FC and FP but more false negatives (FN) at a score threshold of 0.2, along with an IoU threshold of 0.5, which might be due to the low confidence of the segmented contours.
Using CNNs as an end-to-end solution to estimate clinical quantification indexes, such as the nuclear atypia score [7,13], has become popular in recent years. In this study, the CNN model trained directly on images was capable of differentiating these two types of PITLs (AUC = 0.82); however, its classification performance was significantly inferior to that of the tree-based models (p < 0.01) using quantitative nuclear morphometric features as input. Previous studies have shown that CNNs can achieve an AUC of over 95% in the classification of lymphomas, for instance, diffuse large B-cell lymphoma [29] and follicular lymphoma (FL) [30]. In these studies, the researchers utilized architectural patterns rather than cellular details for analysis. For instance, in our previous study, we found that duodenal FL had a higher density of follicles and a larger follicle size compared with reactive lymphoid hyperplasia [31]. In the current study, the growth pattern of PITL, either MEITL or ITCL-NOS, was diffuse, and we could not use the growth pattern for differential diagnosis. Instead, we needed to extract morphological features of single tumor nuclei for second-order statistical analysis. Notably, CNNs are known to perform poorly when training data are insufficient, whereas XGBoost models with expert-defined features can achieve satisfactory performance with limited data, because the expert-defined features prevent the models from overfitting to the ultra-high dimensionality of image features. Incorporating more cases might improve the classification performance of the CNN model. The explainability of CNNs remains a debated but valuable topic for future research, especially in medical image analysis. In contrast, models based on segmented-then-extracted handcrafted features could be used as clinical indexes for prediction, given that the weight of each extracted feature can be estimated and understood by pathologists. The weighting of extracted features makes it easier for pathologists to associate their diagnostic experience with digital findings, and thus, the algorithm may offer the possibility of a reliable computer-aided system for routine diagnostic practice.
While most quantitative indexes, such as mitoses, tumor budding, and the Ki67 index, require pathologists to follow a complex visual scoring protocol to execute laborious counting, using ML algorithms to accomplish an automated scoring is ideal given that a computer algorithm can provide a consistent and objective result efficiently. Previous studies have demonstrated that an ML algorithm is able to achieve comparable quantification results with manual examination by pathologists in a variety of tasks, including cell counting and IHC scoring [32]. In this study, cellular morphology was quantified, and the results show that several extracted morphological indexes, including nuclear size, nuclear circularity, and irregularity, were found to be important morphological features and showed a significant contribution in regression models for differentiating MEITL from ITCL-NOS. These findings are in line with previous findings showing correlations between morphological indexes and the severity of diseases. With the aid of AI, such morphological evaluation and counting tasks can be easily accomplished on a larger scale, which is beyond most of the current visual scoring protocols using limited numbers of cells for the score index, after selecting the representative fields.
In the context of clinical practice, pathologists may combine the AI-derived results with immunophenotypic findings for diagnosis, particularly for morphologically borderline cases as shown in our current study. In computational pathology, the ML algorithm may be applied to other indications; for example, to differentiate classic versus pleomorphic cytomorphological variants among mantle-cell lymphomas or to classify low- versus high-grade follicular lymphoma. Therefore, our workflow might be applicable to various tumor types for morphological evaluations.

Conclusions
In conclusion, this study used ML techniques based on quantitative cell morphometric information to classify histopathological images of PITLs into two subtypes, either MEITL or ITCL-NOS. The classification performance of the XGBoost model was superior to that of the end-to-end CNN model, and the model could elucidate explicit relationships between predictions and morphological features. Furthermore, it achieved results comparable to those of the model incorporating immunophenotype and to the diagnoses of the senior hematopathologist. Our model may hold great potential to improve diagnostic consistency, efficiency, and accuracy for other types of cancers.

Informed Consent Statement:
The study was granted a waiver of the informed consent process because it is a retrospective study which involves no risk to patients and most patients had passed away before this study.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.