1. Introduction
Canine Soft Tissue Sarcoma (cSTS) is a heterogeneous group of mesenchymal neoplasms (tumours) that arise in connective tissue [
1,
2,
3,
4,
5,
6]. cSTS is more prevalent in middle-aged to older dogs and in medium to large-sized breeds, with a median reported age at diagnosis of between 10 and 11 years [
3,
7,
8,
9,
10]. The anatomical site of cSTS can vary considerably, but it is mostly found in the cutaneous and subcutaneous tissues [
9]. As in human soft tissue sarcoma (STS), histological grade is an important prognostic factor in cSTS and one of the most validated criteria for predicting outcome following surgery in dogs [
10,
11,
12,
13]. General treatment consists of surgically removing these cutaneous and subcutaneous sarcomas. Nevertheless, it is the higher-grade tumours that can be problematic, as their aggressiveness can reduce treatment options and result in a poorer prognosis. The focus of this study was on one common subtype found in dogs: canine Perivascular Wall Tumours (cPWTs). cPWTs arise from vascular mural cells and are often recognisable from their vascular growth patterns [
14,
15].
The scoring for cSTS grading is broken down into three major criteria: the mitotic count, differentiation and the level of necrosis [
9]. Mitosis counting is subject to high inter-observer variability [
16], depending on the expertise of the pathologist; however, the counting of mitotic figures is considered the most objective factor in comparison to tumour necrosis and cellular differentiation when grading cSTS [
16]. It is routine practice to investigate mitosis using 40× magnification; however, manual investigation of such high-power fields (HPFs) is a laborious task that is prone to error, which contributes to the inter-observer variability discussed above.
This study focused on creating a mitosis detection model, as the mitotic count is a significant criterion in the cSTS histological grading system [
13], and the density of mitotic figures is also considered highly correlated with tumour proliferation [
17]. Mitosis detection has been pursued in the computer vision domain since the 1980s [
18]. Before 2010, relatively few studies aimed to automate mitosis detection [
19,
20,
21]. However, since the MITOS 2012 challenge [
22], there has been a resurgence of interest. Mitosis detection can often be considered as an object detection problem [
23]. Rather than categorising entire images as in image classification tasks, object detection algorithms report the object categories present in an image together with an axis-aligned bounding box for each instance, which indicates its position and scale. In the case of mitosis detection, the objects of interest are mitotic figures. As a result, several approaches have used object detection algorithms for mitosis detection. An example of an object detection algorithm is the region-based convolutional neural network (R-CNN) [
24]. First, a selective search is performed on the input image to propose candidate regions, and a CNN is then used to extract features from each region. These feature vectors are used for classifying the regions and for training the bounding box regression. There have been many developments of this type of architecture, such as Fast R-CNN [
25] and Faster R-CNN [
26], which is the primary object detection model used in this work. One set of authors detected mitosis using a variant of the Faster R-CNN (MITOS-RCNN), achieving an F-measure score of 0.955 [
27].
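Because detections are expressed as axis-aligned bounding boxes, comparing a predicted box with an annotated one reduces to simple rectangle geometry. The following minimal Python sketch computes the intersection over union (IoU) of two boxes, a widely used overlap measure in object detection. The `Box` layout and function names are illustrative assumptions, and this section does not state which matching criterion the study used.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    # The overlap region is itself an axis-aligned box (possibly empty).
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter = area(overlap)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes only: a prediction shifted by half a box width.
ground_truth = Box(0, 0, 10, 10)
prediction = Box(5, 0, 15, 10)
overlap_score = iou(ground_truth, prediction)
```

A detection could then be counted as a true positive when its IoU with an annotated mitotic figure exceeds a chosen cut-off; the cut-off itself would be a design choice and is not specified here.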
Several challenges have been held in order to find novel and improved approaches for mitosis detection [
17,
22,
23,
28,
29]. Some of these challenges and research on mitosis detection methods have also been conducted using tissue from the canine domain [
30,
31,
32,
33].
The collaborating pathologists indicated that AI approaches for grading tasks in cSTS were desirable. This study therefore tackles one grading criterion, the mitotic count, by developing methods for mitosis detection in a subtype of cSTS: cPWT. To the best of our knowledge, this is the first work on the automated detection of mitoses in cPWTs.
3. Results
The pathologists-in-the-loop approach for dataset refinement was first applied as demonstrated by
Figure 2. In a preliminary investigation, two magnifications (40× and 20×) were used to determine the best resolution for our task (see
Table 2).
Table A6 and
Table A7 show the differences in mitotic candidate numbers before and after refinement (second review) for the training/validation and test sets, respectively. The first set of results from the optimised Faster R-CNN approach is depicted in
Table 3. This compares the performance of the Faster R-CNN trained on the initial mitosis dataset with that of the model trained on the refined dataset. Sensitivity improved for all folds when using the refined dataset; however, in some cases, such as fold-1 validation, fold-3 validation and fold-3 test, the F1-score was lower because precision decreased. This could be because the refined dataset contains more difficult examples for mitosis detection training. The initial dataset may have contained more obvious mitosis examples, so the model trained on it may have produced only detections that closely resembled these obvious examples.
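The interplay described above, in which sensitivity rises while the F1-score falls, follows directly from how the metrics are defined. The minimal Python sketch below uses the standard definitions of precision, sensitivity and F1-score; the counts are made-up illustrative values and are not taken from the study's tables.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, sensitivity (recall) and F1-score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts (NOT study data): the second model finds more
# mitoses (fewer FNs) but also produces many more false positives.
before = detection_metrics(tp=80, fp=40, fn=20)
after = detection_metrics(tp=90, fp=90, fn=10)
```

With these illustrative counts, sensitivity increases while precision drops enough to pull the F1-score down, which mirrors the pattern reported for some folds.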
Table 4 shows the Faster R-CNN results before and after F1-score thresholding was applied on the models trained using the updated mitosis dataset. The thresholds were predetermined on the validation set for each fold using Equation (
4) (see
Figure 4). When the optimal thresholds were applied, the F1-score improved considerably, largely because precision rose as the number of FPs fell. On the test set, the F1-score rose from 0.402 to 0.750. However, this increase in precision came at the expense of some sensitivity across all three folds; for example, on the test set the mean sensitivity over the three folds fell from 0.952 to 0.803. Nevertheless, the loss in sensitivity was considerably smaller than the gain in precision: sensitivity decreased by 14.9% and precision increased by 45.2%. This suggests that most TP detections prior to the adaptive F1-score thresholding had higher confidence scores than the FP detections.
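The thresholding procedure can be sketched as a sweep over candidate confidence thresholds on the validation detections, keeping the value that maximises the F1-score and then reusing it unchanged on the test set. The Python sketch below is a minimal illustration under stated assumptions: Equation (4) is not reproduced in this section, so the selection rule shown (maximise F1 over the observed scores) is an assumed form, and all names and toy values are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], is_tp: Sequence[bool],
                    n_ground_truth: int, threshold: float) -> float:
    """F1-score when only detections with score >= threshold are kept."""
    kept = [hit for s, hit in zip(scores, is_tp) if s >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = n_ground_truth - tp  # annotated mitoses that are no longer detected
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores: Sequence[float], is_tp: Sequence[bool],
                   n_ground_truth: int) -> Tuple[float, float]:
    """Return (threshold, F1) maximising F1 over the observed scores."""
    best_t, best_f1 = 0.0, f1_at_threshold(scores, is_tp, n_ground_truth, 0.0)
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, is_tp, n_ground_truth, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation detections (NOT study data): confidence score and whether
# each detection matches an annotated mitotic figure. Five mitoses are
# annotated in total, one of which the detector misses entirely.
val_scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.30, 0.20]
val_is_tp = [True, True, True, False, True, False, False]
unthresholded_f1 = f1_at_threshold(val_scores, val_is_tp, 5, 0.0)
threshold, val_f1 = best_threshold(val_scores, val_is_tp, 5)
```

In this toy example, raising the threshold removes low-confidence false positives faster than it removes true positives, so the F1-score improves; the selected threshold would then be fixed and applied to the held-out test detections of the same fold.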
4. Discussion
This study has demonstrated a method for mitosis detection in cPWT WSIs using a Faster R-CNN object detection model, an adaptive F1-score thresholding feature on output probabilities and the refinement of a mitotic figures dataset by keeping pathologists in the loop.
Many approaches in the literature use the highest resolution images for their object detection methods (typically at 40× objective); however, our preliminary investigation found that 20× magnification was beneficial for our task and the dataset provided, as shown in
Table 2. Nevertheless, this warrants a further investigation and additional discussions with the collaborating pathologists, who may provide reasoning as to why certain candidates were classed as mitosis at different resolutions.
Initially, the raw outputs of the Faster R-CNN model produced promising results with high sensitivities; however, these outputs required further post-processing to improve precision. Applying adaptive F1-score thresholds, with the optimal values predetermined on the validation set and then applied to the test set, proved an effective way of reducing the number of FP predictions. This dramatically increased the F1-score owing to a stark increase in precision, at a small cost in sensitivity. The two changes were not of equal magnitude: precision improved far more than sensitivity declined. This suggests that the majority of FP detections have lower confidence scores than TP detections.
Multi-stage (typically dual-stage) approaches have also become increasingly prevalent over the years; they typically select mitotic candidates in the first stage and then apply another classifier in the second stage [
32,
33,
47,
48,
49]. Although not reflected in the main findings of this study, we attempted to use a second-stage classifier (
Figure A1) on mitotic candidates to distinguish TPs from hard FPs, but without success (see results of the two-stage approach in
Table A8 and its subsequent ROC curves in
Figure A2). Most machine learning methods require large datasets for effective training, and such a dataset was not available once the candidates had been filtered using the adaptive F1-score threshold method. One could train models using the non-thresholded detections; however, this would result in a model that distinguishes between true positive mitoses and mostly obvious FP candidates. By applying the adaptive F1-score thresholding method, we constrained the dataset and attempted to learn the differences between TP and high-confidence hard FP detections, but the resulting dataset was not large enough for training.
Figure 5 depicts a 512 × 512 pixel image in the test set, highlighting FN and FP detections.
Different phases and other biological phenomena could influence the size of the mitosis region of interest. Going forward, it may also be worth labelling mitoses according to their phase, thus creating a multi-class problem rather than the binary problem addressed in this study. As a consequence, the size of the ground truth bounding boxes could also be varied depending on the target phase being classified. Nonetheless, the models were still able to predict the vast majority of mitoses in these phases.
It should further be noted that the methodology was applied only to patches from HPFs containing mitoses that were annotated by the collaborating pathologists. Therefore, we propose expanding our dataset to include a broader range of sections, including those not initially marked by pathologists, to evaluate and enhance our model’s generalisability. The data should include labels for areas containing tumour and non-tumour tissue to fully consider the overall impact of this mitosis detection method.
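The data constraint described above can be illustrated with a short Python sketch: filtering first-stage candidates by the confidence threshold leaves only true positives and high-confidence ("hard") false positives, and the hard-negative class in particular can become very small. The `Candidate` structure, function name and toy values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A first-stage detection (hypothetical structure)."""
    score: float   # detector confidence
    is_tp: bool    # True if it matches an annotated mitotic figure


def second_stage_split(candidates: List[Candidate], threshold: float
                       ) -> Tuple[List[Candidate], List[Candidate]]:
    """Split thresholded candidates into positives and hard negatives."""
    kept = [c for c in candidates if c.score >= threshold]
    positives = [c for c in kept if c.is_tp]
    hard_negatives = [c for c in kept if not c.is_tp]
    return positives, hard_negatives


# Toy candidates (NOT study data).
candidates = [
    Candidate(0.95, True), Candidate(0.90, True), Candidate(0.85, True),
    Candidate(0.60, False), Candidate(0.40, True),
    Candidate(0.30, False), Candidate(0.20, False),
]
all_pos, all_neg = second_stage_split(candidates, threshold=0.0)
pos, hard_neg = second_stage_split(candidates, threshold=0.40)
```

Here the thresholded split retains every true positive but only one of the three false positives, which shows how quickly the second-stage training set can shrink once easy false positives are removed.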
Our focus for this study is on cPWT; however, we could potentially adapt this method to other cSTS subtypes as well as to other tumour types. An additional study might explore the application of cPWT-trained models to different cSTS subtypes to assess if comparable outcomes are achieved. Nevertheless, given that tumour types from various domains exhibit unique challenges due to their specific histological characteristics, it may be necessary to train or fine-tune models using tumour-specific datasets to evaluate the efficacy of this approach.
While our F1-score demonstrates competitive performance for detecting mitosis in the canine domain, the clinical relevance and applicability of this metric should be taken into account. Future work should focus on employing this method as a supportive tool, assessing its practical effectiveness and reliability in a veterinary clinical setting.
To conclude, with our experimental set-up, the optimised Faster R-CNN model was a suitable method for detecting mitosis in cPWT WSIs. To the best of our knowledge, this is the first mitosis detection model applied solely to cPWT data, and we therefore consider the three-fold cross-validation mean F1-score of 0.750 a baseline for mitosis detection in cPWT.