1. Introduction
Cancer is the leading cause of death in companion animals, and mammary tumors, the most common neoplasm in female dogs, represent a serious issue in veterinary practice worldwide [
1,
2]. As in animals, human breast cancer (HBC) is the most common malignancy in women worldwide, and several clinical and molecular similarities between canine mammary lesions and HBC have been described [
3,
4]. Consequently, dogs have attracted considerable interest as potential animal models to study human cancer [
5]. Despite the tremendous effort made in fighting cancer, the biological and morphological heterogeneity of canine mammary tumors (CMTs) has challenged veterinary pathologists since the early days of diagnostic pathology. As a result, an increasing number of studies in this area have been published in recent decades. Mammary cancer is a multifactorial disease and various elements contribute to its occurrence and behavior [
6].
Epidemiological, clinical, histological, and molecular factors are considered important risk factors for mammary neoplasms. Among the main epidemiological CMT risk factors, age, breed, and reproductive and hormonal status are consistently reported in the literature [
1,
2,
6].
CMTs usually affect middle-aged and older dogs, with an increased risk between 8 and 11 years of age [
6]. Additionally, benign tumors are more likely in dogs ranging between 7 and 9 years, while malignant tumors are more frequently encountered in older dogs [
7,
8,
9,
10,
11]. However, the age-related peak incidence should be interpreted carefully, given that larger breeds of dogs have a naturally shorter lifespan and therefore tend to be younger than smaller breeds when they receive a cancer diagnosis [
1].
Mammary neoplasms can occur in dogs of any breed, although pure breeds seem more prone to developing CMTs [
1,
2,
6,
10]. Poodles, Chihuahuas, Dachshunds, Yorkshire Terriers, Maltese, and Cocker Spaniels are frequently listed as high-risk dog breeds in the small breed category. Some of the larger breeds are also at higher risk, including English Springer Spaniels, English Setters, Brittany Spaniels, German Shepherds, Pointers, Doberman Pinschers, and Boxers [
1,
2,
6,
10,
11]. However, considerable discrepancies exist between studies regarding the breed as a CMT risk factor. A representative case is the evaluation of familial or inherited germline mutations in the Breast Cancer 1 and 2 genes (BRCA1 and BRCA2), which in women are related to an increased lifetime risk of HBC but have led to nonspecific results in veterinary medicine [
12,
13,
14,
15].
More consistent data are reported regarding the effect of sex hormones, with general agreement that exposure to endogenous ovarian hormones is a cause of mammary tumor development in dogs [
1,
2] and references therein. According to Schneider and colleagues in 1969 [
16], dogs spayed before their first estrus have a 0.5% risk of developing CMTs in their lifetime, while the benefits of ovariohysterectomy diminish with each estrus cycle. It seems biologically plausible that the greatest benefit for CMT prevention is obtained when dogs are spayed early in their reproductive lifetime, probably by reducing the occurrence of proliferative stimuli and therefore the risk of cancer-related events (e.g., mutations) [
3,
4,
17]. Furthermore, a more recent prospective randomized study reported a significantly decreased risk for new tumor development by performing ovariohysterectomy concurrently with benign CMT removal [
18], while the same effect was not observed for malignant tumors [
19].
Clinical features of CMTs are also considered as prognostic factors by numerous studies [
1,
20,
21,
22,
23,
24]. Of note, the tumor size (T), the involvement of lymph nodes (N), and the presence of distant metastasis (M) are the key features of the clinical, prognosis-related, “TNM” staging system, developed in 1980 by the World Health Organization (WHO) and recently revised [
1,
20,
21,
22,
23,
24].
In the recent WHO version [
1], stage advances from I to II to III as the size of the primary tumor increases from smaller than 3 cm, to between 3 and 5 cm, to larger than 5 cm. Lymph node metastasis represents stage IV disease, regardless of tumor size, and distant metastasis constitutes stage V. Notably, tumor size is the critical parameter in stages I, II, and III and strongly affects CMT prognosis and outcome. According to MacEwen et al., 1985 [
25], dogs with tumors larger than 3.4 cm in diameter have a significantly worse outcome than dogs with smaller tumors, both in terms of remission and survival. Other authors, however, have found a change in prognosis only when tumors are larger than 5 cm [
21]. In one study, tumor size was not prognostic when node involvement was detected [
24]. Despite these discrepancies, tumor size remains a biologically plausible prognostic factor, considering that more aggressive tumors grow faster and, therefore, are larger and more likely to harbor metastatic subclones [
8]. Hence, staging systems that integrate different clinical parameters provide specific recommendations to support clinicians’ treatment decisions [
1,
26,
27].
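The staging rules summarized above can be written as a small decision function. This is only an illustrative Python sketch and not part of the study's analysis: the function name `who_stage` and its parameters are hypothetical, and assigning tumors of exactly 3 cm or 5 cm to stage II is an assumption based on the wording "between 3 and 5 cm".

```python
def who_stage(size_cm: float,
              node_metastasis: bool = False,
              distant_metastasis: bool = False) -> str:
    """Return the clinical stage (I to V) from the rules summarized in the text.

    Assumption: tumors of exactly 3 cm or 5 cm are placed in stage II.
    """
    # Distant metastasis defines stage V, regardless of other findings.
    if distant_metastasis:
        return "V"
    # Lymph node metastasis defines stage IV, regardless of tumor size.
    if node_metastasis:
        return "IV"
    # Without metastasis, the stage depends only on the primary tumor size.
    if size_cm < 3:
        return "I"
    if size_cm <= 5:
        return "II"
    return "III"


if __name__ == "__main__":
    for size in (2.0, 4.0, 6.5):
        print(size, "cm ->", who_stage(size))
```

The order of the checks reflects the hierarchy of the system: size only matters once nodal and distant metastases have been excluded.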
In this study, we performed a large retrospective statistical analysis of breed, spayed status, and age as epidemiological risk factors, and of tumor size as a clinical prognosis-related feature, in 1866 CMTs collected from three Departments of Veterinary Medicine at the Universities of Sassari (UNISS), Padua (UNIPD), and Perugia (UNIPG). We analyzed the relationship between these epidemiological and clinical risk factors and the histological diagnosis to test the ability of readily available clinical data to predict the diagnosis and, indirectly, the prognostic outcome. A supervised machine learning technique was compared to classical statistical analysis and used to investigate the ability to predict the diagnosis of CMTs (malignant versus benign).
2. Materials and Methods
This retrospective study reviewed CMT data from three tumor databases (UNISS, UNIPD, UNIPG). Approval from the University’s Animal Care Ethics Committee was not required because all samples were retrieved from the archives of the pathology laboratories and had been collected for diagnostic purposes.
The inclusion criteria for data selection were: dogs with a single mammary neoplasm; availability of a documented medical history including breed, age, and macroscopic tumor size as indicated either by the clinician or by the histological laboratory; and a histopathological diagnosis of the neoplasm.
All previous histological diagnoses were updated and classified according to the recent publication of Surgical Pathology of Tumors of Domestic Animals, Volume 2: Mammary tumors [
28].
2.1. Statistical Analysis—Descriptive Statistics and Univariate Analysis
To determine whether epidemiological characteristics (age, breed, spayed status) and clinical characteristics (tumor size) were associated with tumor diagnosis, each of these variables was examined in relation to the histological diagnosis. For statistical purposes, breed was classified as pure breed or mixed breed, and age was either treated as a numerical variable (in years) or categorized in 4 classes (0–4 years; 5–8 years; 9–12 years and >13 years).
The greatest diameter of the tumor (i.e., the size) was treated as a numerical variable (in centimeters), or categorized according to the WHO TNM system (3 categories: T1 < 3 cm, T2 = 3–5 cm, and T3 > 5 cm) or as proposed by Sorenmo and colleagues (5 categories: S1 < 1 cm, S2 = 1 to <2 cm, S3 = 2 to <3 cm, S4 = 3 to <5 cm, S5 > 5 cm) [
1,
8].
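The two size encodings described above can be illustrated with a short Python sketch. It is not the authors' code; the function names are hypothetical, and because the five-category scheme as stated leaves a tumor of exactly 5 cm unassigned, placing it in S5 is an assumption made here.

```python
from bisect import bisect_right

# Lower bounds (cm) of categories S2..S5 in the five-category scheme.
S_CUTS = [1, 2, 3, 5]


def who_t_category(size_cm: float) -> str:
    """WHO TNM size category: T1 < 3 cm, T2 = 3-5 cm, T3 > 5 cm."""
    if size_cm < 3:
        return "T1"
    if size_cm <= 5:
        return "T2"
    return "T3"


def s_category(size_cm: float) -> str:
    """Five-category size scheme (S1..S5).

    Assumption: a tumor of exactly 5 cm is assigned to S5.
    """
    # bisect_right counts how many cut points are <= size_cm.
    return f"S{bisect_right(S_CUTS, size_cm) + 1}"


if __name__ == "__main__":
    for size in (0.5, 1.0, 2.9, 3.0, 4.9, 7.0):
        print(size, who_t_category(size), s_category(size))
```

Note that the schemes differ mainly below 3 cm, where the WHO system has a single category and the five-category scheme has three.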
According to Pena et al., 2013, and references therein, malignant tumors were grouped into 3 histological categories (i.e., HD3 categories) based on morphological features and biological behavior as follows: group I, which included in situ carcinoma, simple carcinoma, carcinoma arising in a mixed tumor, complex carcinoma, mixed-type carcinoma, ductal carcinoma, and adenosquamous carcinoma; group II, which included solid carcinoma, comedocarcinoma, carcinoma and malignant myoepithelioma, and anaplastic carcinoma; group III, which included other histological types [
24].
Statistical analysis was carried out using Student’s t-test for continuous, normally distributed variables, and the chi-square (χ²) test and nonparametric Kruskal–Wallis ANOVA followed by Dunn’s post hoc test for categories. Data were analyzed with Stata version 11.2 (StataCorp, 2009), and results were considered significant when p ≤ 0.05.
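As an illustration of the chi-square test applied to two categorical variables (for example, spayed status versus benign or malignant diagnosis), the following Python sketch computes the Pearson statistic for a 2 x 2 table without continuity correction. The counts are hypothetical and are not taken from the study; the analysis itself was run in Stata.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    is erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = two groups, columns = benign / malignant.
    stat, p = chi_square_2x2([[10, 20], [20, 10]])
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```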
2.2. Statistical Analysis—Multivariate Analysis and Machine Learning Model
Logistic regression analysis was performed to evaluate the influence of the different covariates (age, tumor size, spayed status, and breed) on tumor diagnosis. Covariates were selected through a nested likelihood ratio test (
Table 1 and
Supplementary Materials).
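A nested likelihood ratio test compares a reduced model with a fuller model that adds covariates: the statistic is twice the difference in log-likelihood and is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The Python sketch below shows only this arithmetic; the log-likelihood values are hypothetical, and the function names are not taken from the study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of a chi-square distribution (integer df >= 1)."""
    z = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-z), 1.0                 # closed form for df = 2
    else:
        q, a = math.erfc(math.sqrt(z)), 0.5      # closed form for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(ll_reduced: float, ll_full: float, df_diff: int):
    """Return (statistic, p_value) for two nested models."""
    # Clamp at zero to guard against tiny negative values from rounding.
    statistic = max(0.0, 2.0 * (ll_full - ll_reduced))
    return statistic, chi2_sf(statistic, df_diff)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: adding one covariate improves the fit by 2.
    stat, p = likelihood_ratio_test(-1250.0, -1248.0, df_diff=1)
    print(f"LR = {stat:.2f}, p = {p:.4f}")
```

A covariate would be retained under this scheme when the p-value falls below the chosen threshold.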
The selected continuous covariates were then converted into categorical covariates according to the previously described schemes, generating two further models: the IC model, where the tumor size was encoded according to the WHO TNM system, and the IIC model, where the tumor size was split into 5 categories as previously reported by Sorenmo et al., 2009 [
1,
8].
Machine learning was used to investigate whether the diagnosis of mammary neoplasms in the dog (malignant versus benign) could be predicted from the recorded epidemiological (breed, spayed status, and age) and clinical (tumor size) factors. Models were built in the R programming language using the caret package, through algorithms provided by the GLM (for logistic regression) and the GBM (for stochastic gradient boosting) libraries [
29,
30,
31,
32,
33,
34,
35] (see
Supplementary Materials for details). In particular, the supervised machine learning technique employed was stochastic gradient boosting, a powerful learning method based on the combination of many simple models. The basic idea is to sequentially apply a “weak” learner (here, a decision tree) to modified versions of the initial data. Each time a tree is built, the data are modified by applying weights to increase the influence of misclassified observations. The final classification is performed through a weighted majority vote [
36,
37,
38,
39]. To assess the predictive performance of logistic regression (GLM) and stochastic gradient boosting (GBM), a nested cross-validation was performed [
39]. The dataset was split 5 times into a training set and a test set, with nonoverlapping test sets and 80% of cases kept for training in each split. The split was performed randomly within each of the two classes of the outcome, to preserve the overall class distribution of the data. For each of the two classifiers (even if not required for GLM, using the same procedure allows for an easier comparison), the tuning of the hyperparameters was performed through 10-fold cross-validation repeated 5 times [
34,
36]. Continuous features were centered and scaled. The best setup was chosen by optimizing the area under the receiver operating characteristic (ROC) curve [
40] and, with such parameters, a final fit was performed on the entire training set. The final result was obtained by repeating the procedure for each outer split and taking the average over the test sets.
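The outer loop of the evaluation described above can be sketched in Python as follows. This is a simplified illustration and not the authors' R/caret pipeline: the function names are hypothetical, the model fitting and inner tuning are delegated to a user-supplied callback (`fit_and_score`, an assumption of this sketch), and the area under the ROC curve is computed by direct pairwise comparison.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split case indices into k nonoverlapping test folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled cases of this class round-robin across folds,
        # which preserves the class proportions in every fold.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nested_cv(labels, fit_and_score, k_outer=5):
    """Average test AUC over the outer splits.

    fit_and_score(train_idx, test_idx) must tune and fit a model on the
    training indices only (the inner cross-validation lives there) and
    return one score per test index.
    """
    aucs = []
    for test_idx in stratified_folds(labels, k_outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_and_score(train_idx, test_idx)
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)
```

With five outer folds, each test fold holds about 20% of the cases, so roughly 80% remain for training, matching the proportion reported in the text.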
4. Discussion
In veterinary medicine, the increasing incidence of neoplastic disease represents a relentless challenge for veterinary oncology specialists. Consequently, considerable research effort has been devoted to improving early diagnosis and life expectancy in dogs harboring mammary tumors. Against this background, cancer research is mainly focused on the discovery and control of cancer-related risk factors [
43,
44]. However, a large retrospective statistical analysis that related the breed, hormonal status, age, and tumor size with the histological diagnosis and, consequently, with the possible behavior of CMTs, has not been previously performed.
In this work, an approximately equal proportion of benign (46.5%) and malignant tumors (53.5%) was observed, and mixed BTs accounted for the highest number of the total cases. Mixed neoplasms are the most frequent neoplasias in female dogs, and are characterized by the proliferation of both luminal epithelial and interstitial myoepithelial elements admixed with foci of mesenchymal tissues such as cartilage, bone, and fat [
28,
45]. The most frequent MT was simple tubular or tubulopapillary carcinoma (26.1%) followed by complex carcinoma (13.3%) confirming what has been reported in the literature [
10] and references therein.
In our study, sixty-one percent of CMTs were observed in pure breed dogs, suggesting, as previously described by Sorenmo and colleagues [
6], that the breed could be a putative risk factor, and that certain breeds, such as Miniature Toy, Shih Tzu as well as German Shepherd, are prone to develop mammary neoplasms [
1,
2,
6,
10,
11]. Interestingly, in our study, benign tumors occurred predominantly in small breed dogs, particularly in Yorkshire terriers, while malignant ones were detected with higher frequency in German Shepherd dogs. A better prognosis for small breeds has been previously reported in a retrospective multivariate survival analysis [
46]. However, given the increasing prevalence of CMTs in small breeds, it is uncertain whether small size in dogs could represent a reliable risk factor or whether these data are influenced by the greater veterinary care given to those breeds than to larger dogs [
10]. According to Salas and collaborators [
7], no significant association was observed between the breed and the development of BTs, MTs, as well as with the malignant carcinoma categories proposed by Pena et al., 2013 [
24]. Similarly, the breed showed only a slight influence in the logistic and GBM machine learning models (<1%), consistent with the considerable divergences between studies regarding the breed as a CMT risk factor. Moreover, considering that mutations in the BRCA1 and BRCA2 genes and their protein products have been variably associated with the development of CMTs, a definitive conclusion about CMT breed-related risk should be drawn in the context of genetic research [
12,
13,
14,
15].
Age is considered one of the most important risk factors for developing mammary tumors, with a peak incidence between 8 and 11 years and with younger dogs prone to having BTs [
6,
7,
8,
9]. These data seem to be confirmed by our study, in full agreement with what was reported by Sorenmo [
6].
Notably, simple MTs occurred at an older age than nonsimple ones. According to different authors [
47,
48], simple carcinomas have a poorer prognosis than complex ones, confirming, as proposed by Pena and collaborators [
24], that age should be considered an indirect but strong prognostic factor. Furthermore, these data are supported by the multivariate analysis, where a 12% increase in the odds of an MT per 1-year increase in age was observed.
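To make the reported effect concrete, a 12% increase in the odds per year corresponds to an odds ratio of 1.12 per year, which compounds multiplicatively over several years. The Python sketch below works through this arithmetic; the baseline probability of 0.5 is a hypothetical value chosen only for illustration, and the names are not taken from the study.

```python
import math

# Odds ratio per 1-year increase in age, as reported in the text.
OR_PER_YEAR = 1.12

# Corresponding logistic regression coefficient (log-odds per year).
BETA_AGE = math.log(OR_PER_YEAR)


def odds_multiplier(years: float) -> float:
    """Factor by which the odds of a malignant tumor change over `years`."""
    return OR_PER_YEAR ** years


def probability_after(baseline_probability: float, years: float) -> float:
    """Probability of malignancy after `years`, from a hypothetical baseline."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * odds_multiplier(years)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(f"coefficient per year: {BETA_AGE:.4f}")
    print(f"odds multiplier over 5 years: {odds_multiplier(5):.3f}")
    print(f"probability after 5 years from 0.5: {probability_after(0.5, 5):.3f}")
```

Over 5 years the odds are multiplied by about 1.76, not by 1.60, because the yearly increases compound rather than add.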
Hormonal exposure is a well-documented risk factor for canine mammary tumors, and steroid hormones, mainly 17 beta-estradiol (E2), are involved in cell proliferation by exerting an antiapoptotic effect that favors the neoplastic process [
10,
49]. Furthermore, the landmark publication by Schneider et al., in 1969, reported that mammary tumors occurred in 0.5% of females spayed before the first heat cycle, and that this incidence increased to 8% and 26% when the animals were spayed after the first or second heat, respectively [
16]. As a consequence, reproductive health policies promoting spaying at a very early stage of life have had a double beneficial effect, contributing to the reduction in the number of stray dogs and preventing mammary neoplasm development. Likewise, in our study, 83% of mammary neoplasms were diagnosed in unspayed dogs, supporting the protective effects of ovariohysterectomy as described by several authors [
16,
18,
19].
However, the lack of a significant association between tumor type (BTs versus MTs) and spayed status could suggest that hormonal influence has no appreciable effect on CMT malignancy, although 39% of the tumors observed in our cohort of spayed dogs were simple MTs, which are generally related to an overall poor prognosis when compared to complex tumors [
47,
48]. Nevertheless, considering that our dataset lacks information regarding the age of dogs at spaying and that most of the tumors occurred in unspayed dogs, probably as a consequence of ethical concerns regarding gonadectomy in Mediterranean countries, caution is warranted in generalizing the role of hormonal status in the onset of CMTs.
The size of the tumor is considered one of the main macroscopic findings related to CMT behavior. In the present study, we also considered the role of tumor size as a clinical, prognosis-related CMT factor, demonstrating that BTs were smaller than MTs, as previously reported by Sorenmo et al., 2009 [
8]. Furthermore, the tumor’s diameter was related to the histological malignant categories proposed by Pena [
24], with smaller neoplasms associated with a better prognosis than larger ones. However, considering the size of the tumor using the WHO classification and the five categories proposed by Sorenmo [
8], 62.5% of carcinomas were smaller than 3 cm, and 18% were less than 1 cm. Interestingly, these data conflict with what has been described by Sorenmo et al., 2009 [
8], who reported that only 3% of MTs were smaller than 1 cm, indicating that the tumor size should be carefully evaluated during the assessment of the TNM-WHO clinical staging, as previously suggested by Pena [
24].
Supporting these data, logistic regression identified age and size as the best predictors, with an overall diagnostic accuracy of 0.63 and low predictive values, both positive and negative. This level of accuracy is probably related to the limited number of factors used in our model. A similar predictive performance was observed using one of the most powerful machine learning models, suggesting that age and size are useful but not exhaustive parameters for the diagnosis of CMTs. Given the rapid advances in artificial intelligence and machine learning in cancer research, it is likely that the way cancer risk factors are investigated, and their broader impact, will shift toward a tailored, personalized approach for the individual animal.
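For reference, the accuracy and predictive values mentioned above are derived from a confusion matrix of predicted versus histological diagnoses. The Python sketch below shows the definitions; the counts are hypothetical, chosen only so that the accuracy equals 0.63, and do not reproduce the study's results.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy and predictive values from confusion matrix counts.

    tp/fn: malignant tumors predicted malignant/benign.
    tn/fp: benign tumors predicted benign/malignant.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for 200 test cases.
    for name, value in diagnostic_metrics(tp=70, fp=40, tn=56, fn=34).items():
        print(f"{name}: {value:.3f}")
```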