Using Machine Learning Approaches on Dynamic Patient-Reported Outcomes to Cluster Cancer Treatment-Related Symptoms

Asper, Nora; Witschel, Hans Friedrich; von Stockar, Louise; Laurenzi, Emanuele; Kolberg, Hans Christian; Vetter, Marcus; Roth, Sven; Kullak-Ublick, Gerd; Trojan, Andreas

doi:10.3390/curroncol32060334

by Nora Asper ¹,*, Hans Friedrich Witschel ², Louise von Stockar ³, Emanuele Laurenzi ², Hans Christian Kolberg ⁴, Marcus Vetter ⁵, Sven Roth ¹, Gerd Kullak-Ublick ⁶ and Andreas Trojan ⁶,⁷,*

¹ Center for Dental Medicine and Faculty of Medicine, University of Zurich, 8032 Zurich, Switzerland
² School of Business, FHNW, University of Applied Sciences and Arts Northwestern Switzerland, 4600 Olten, Switzerland
³ Mobile Health AG, 8008 Zurich, Switzerland
⁴ Marienhospital Bottrop GmbH, 46236 Bottrop, Germany
⁵ Cantonal Hospital Baselland, 4410 Liestal, Switzerland
⁶ Department of Clinical Pharmacology and Toxicology, University Hospital Zurich, University of Zurich, 8006 Zurich, Switzerland
⁷ BreastCenter Zürichsee, 8810 Horgen, Switzerland
* Authors to whom correspondence should be addressed.

Curr. Oncol. 2025, 32(6), 334; https://doi.org/10.3390/curroncol32060334

Submission received: 6 April 2025 / Revised: 7 May 2025 / Accepted: 4 June 2025 / Published: 6 June 2025

Abstract

In patients undergoing systemic treatment for cancer, symptom tracking via electronic patient-reported outcomes (ePROs) has been used to optimize communication and monitoring, to facilitate the early detection of adverse effects, and to compare the side effects of similar drugs. We aimed to examine whether the patterns in ePROs, without any additional clinician data input, are predictive of the underlying cancer type and reflect tumor- and treatment-associated symptom clusters (SCs). The data were derived from a total of 226 patients who self-reported on the presence and severity (according to the Common Terminology Criteria for Adverse Events (CTCAE)) of more than 90 available symptoms via the medidux™ app (versions 2.0 and 3.2, developed by mobile Health AG based in Zurich, Switzerland). Among these, 172 had breast cancer as the primary tumor, 19 had lung, 16 had gut, 12 had blood–lymph, and 7 had prostate cancer. For this secondary analysis, a subgroup of 25 patients with breast cancer was randomly selected to reduce the risk of overfitting. The symptoms were aggregated by counting the days on which a particular symptom was reported, resulting in a symptom vector for each patient. A logistic regression model was trained to predict the type of the respective tumor from the symptom vectors, and the symptoms with coefficients above 0.1 were graphically displayed. The machine learning model was not able to recognize any of the patients with prostate and blood–lymph cancer, likely because these cancer types were barely represented in the dataset. The Area Under the Curve (AUC) values for the three remaining cancer types were breast cancer: 0.74 (95% CI [0.624, 0.848]); gut cancer: 0.78 (95% CI [0.659, 0.893]); and lung cancer: 0.63 (95% CI [0.495, 0.771]). Despite the small datasets, for the breast and gut cancers, the respective models demonstrated a fair predictive performance (AUC > 0.7). The generalizability of the findings is limited, especially due to the heterogeneity of the dataset. This line of research could be especially interesting for monitoring individual treatment trajectories. Deviations in the electronic patient-reported symptoms from the treatment-associated symptom patterns could dynamically indicate treatment non-adherence or lower treatment efficacy, without clinician input or additional costs. Similar analyses on larger patient cohorts are needed to validate these preliminary findings and to identify specific and robust treatment profiles.

Keywords:

electronic patient-reported outcome (ePRO); symptom cluster; cancer; eHealth; real world evidence; adherence; decision support; machine learning

1. Introduction

Cancer is a societal and economic problem, leading to 9.7 million deaths in 2022 [1] and presenting a severe burden for people’s physical and psychological well-being [2]. Although cancer diagnostics and treatment options have progressed rapidly in recent years, digital symptom reporting and specific symptom management are progressing slowly [3]. Some studies indicate that specific symptoms tend to present concurrently and can be grouped into symptom clusters (SCs) [4,5]. SCs involve the co-occurrence of at least two symptoms and are distinct from other SCs, meaning that the correlations between symptoms within a single symptom cluster are stronger than the correlations between symptoms in different symptom clusters [4,6]. SCs are informative because the co-occurrence of symptoms can hint at shared underlying mechanisms; in psychiatry, for example, they are used to classify disorders [6]. In cancer, one of the most frequently described symptom clusters is the gastrointestinal SC, consisting of vomiting, general nausea, and a lack of appetite [5,7,8].

Two typical approaches have been used to cluster symptoms. Clinically determined SCs are based on physicians’ observation of symptom co-occurrence or on the primary cancer site. Statistically determined SCs are based purely on cluster analyses of symptom distributions [9]. However, patients’ subjective experiences have rarely been considered in the data [9,10]. Recent studies [10,11,12,13] performed surveys and interviews to create SCs based on how patients subjectively interpret their symptoms, perceive the relationships between them, and assess their impact on their quality of life. The patient-reported SCs show individual variability and differ significantly from the SCs derived from observational studies, multi-symptom assessment tools, and symptom checklists [10]. Patient-centered analyses support the development of more effective and individualized treatment plans, as objectively similar symptom profiles may require different treatment strategies based on personal interpretation and prioritization [9,10]. To improve access to treatment and create more precise and meaningful symptom clusters, smartphone applications (apps) can be used to capture real-time patient-reported data. Electronic patient-reported outcomes (ePROs) have generally been found to improve the quality of care through optimized communication and monitoring [14,15], and the continuous evaluation of the collected data has the potential to predict adverse events [16,17,18].

Recently, clustering approaches have turned to machine learning (ML) techniques to identify more complex patterns in larger datasets [19,20]. Unlike the traditional methods that depend on predefined symptom relationships, ML algorithms can autonomously learn patterns from the data itself [21]. One way ML has been applied is in reverse clustering. As opposed to classical symptom clustering, reverse clustering seeks to reproduce a known dataset. As such, patients are grouped by their existing diagnosis rather than into symptom profiles, and the coefficients are calculated accordingly. Reverse clustering is a way to validate the existing symptom clusters and assess whether symptom clusters accurately represent the population. Reverse clustering has been used in data science but has not yet been applied in the medical field [22]; it could improve our understanding of how symptom clusters manifest in different types of cancer and treatments [8,23,24]. However, despite the advances towards more patient-centered symptom clustering and the increasing use of ePROs, dynamic and real-time ePRO data have rarely been leveraged for symptom clustering purposes. Exploring new approaches that combine dynamic ePRO data with ML methods could improve our understanding and characterization of symptom patterns across different cancer types and treatments.

Given the increasing complexity and chronicity of cancer treatments, along with demographic shifts and limited healthcare resources, there is an increasing drive to support oncology through digital innovation. Despite this, tools for systematically and continuously capturing patients’ well-being, symptoms, and vital parameters during and beyond treatment are not yet widely established. Using the patient-centered app medidux™ as an example, this study demonstrates how the systematic and structured digital documentation of patients’ physical complaints, well-being, cognitive functioning, and vital signs can contribute to improving the quality and efficiency of cancer research and care.

In this study, we sought to examine whether the analysis of electronic patient-reported symptoms could “predict” the diagnosed underlying cancer type and confirm tumor- and treatment-associated specific symptom clusters. For this, we used ePRO data previously collected through a Class I medical device application. In previous work, this app demonstrated a high level of agreement between the patients and the physicians in their symptom ratings [25] and has been used to compare the performance of biosimilars [26] and to investigate symptom patterns indicative of emergency hospital visits [18].

2. Materials and Methods

2.1. Study Design

This study is a secondary analysis of data from 226 patients undergoing treatment, who self-reported the presence and severity of over 90 symptoms using the medidux™ app (formerly Consilium Care) in accordance with the Common Terminology Criteria for Adverse Events (CTCAE). The patient data were obtained from two studies that both received approval from the Swiss Institutional Review Board (KEK-ZH: 2021-D0051; KEK-ZH: 2017-02028) and were conducted in accordance with the current principles of the Declaration of Helsinki. Additionally, the study is registered on ClinicalTrials.gov (NCT05234021 and NCT03578731) and on the Swiss National Clinical Trials Portal (SNCTP000004711).

2.2. Participants

For the original 226 patients, the primary tumors included 172 breast, 19 lung, 16 gut, 12 blood–lymph, and 7 prostate cancer cases. For a more balanced analysis and to reduce the risk of overfitting, 25 patients treated for breast cancer were randomly selected. The eligible participants had initiated adjuvant or neoadjuvant systemic therapy, including at least one cytotoxic drug +/− antibody treatment, antibody–drug conjugate, or antihormone; were over 18 years old; owned a smartphone; spoke German; and had provided written informed consent. Of the 226 individuals in this study, 34 were male, 191 were female, and 1 was classified as other. The mean age of the participants was 58.4 years (Table 1).

2.3. Mobile App

The medidux™ app (formerly Consilium Care; versions 2.0 and 3.2, developed by mobile Health AG based in Zurich, Switzerland) used in this study is a patient-centered, CE-marked medical device software application designed as a digital companion during cancer therapy. Rather than repeatedly requiring patients to fill in digital questionnaires, the app allows patients to dynamically record their relevant symptoms according to the CTCAE [25]. Aside from over 90 available symptoms, the app also offers cognitive testing and functionalities to record patients’ well-being, vital parameters, and medications. Graphical representations of the recorded data are available for patients and care teams, as well as alerts in case the symptoms exceed grade 2 according to the CTCAE. During the study period of 84 days, the patients were encouraged to enter their individual symptoms daily as they occurred.

2.4. Statistical Analyses

For statistical analyses, the symptoms were represented as a vector for every patient by counting the days during the study period on which a particular symptom was reported. Logistic regression models were fitted for each cancer type to predict the presence or absence of that specific type of cancer based on the corresponding symptom vector of the patient. While logistic regression models are relatively simple, more sophisticated machine learning (ML) approaches were unwarranted due to the small and heterogeneous sample. A focus on human interpretability and simplicity is also especially important in clinical translational research to foster trust among clinicians who are unfamiliar with ML approaches [27]. To evaluate performance, each model was 10-fold cross-validated, and the Area Under the Curve (AUC) metric was calculated. The AUC is commonly used to assess performance and validate the predictive accuracy of diagnostic tests. It represents the degree to which a test can differentiate between patients who have a disease and those who do not, with a value of 1 indicating perfect discrimination and 0.5 suggesting a performance that is no better than random guessing [28]. The threshold for acceptable discrimination is typically an AUC value of 0.7 [29]. Further, a simple single-sample hypothesis-testing approach was applied to calculate the corresponding 95% confidence intervals (CIs). The model sensitivity and specificity were also calculated.
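The aggregation and one-vs-rest modeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy reports and labels are invented, and the plain gradient-descent fit, learning rate, epoch count, and L2 penalty are all assumptions, since the text does not specify the solver or any regularization.

```python
import math
from collections import defaultdict


def symptom_vectors(reports, symptoms):
    """Count, per patient, the distinct days on which each symptom was reported."""
    days = defaultdict(set)  # (patient, symptom) -> set of reporting days
    for patient, day, symptom in reports:
        days[(patient, symptom)].add(day)
    patients = sorted({patient for patient, _, _ in reports})
    return {p: [len(days[(p, s)]) for s in symptoms] for p in patients}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit a binary logistic regression by batch gradient descent.

    lr, epochs, and l2 are illustrative assumptions, not values from the study.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average gradient plus an L2 penalty on the weights (not on the bias).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy data: (patient, study day, symptom). A symptom reported twice on
# the same day (p1, day 2) is counted once.
SYMPTOMS = ["fatigue", "hot flashes", "dry mouth"]
REPORTS = [
    ("p1", 1, "hot flashes"), ("p1", 2, "hot flashes"), ("p1", 2, "hot flashes"),
    ("p1", 2, "fatigue"),
    ("p2", 1, "dry mouth"), ("p2", 3, "dry mouth"), ("p2", 3, "fatigue"),
    ("p3", 1, "hot flashes"), ("p3", 4, "hot flashes"), ("p3", 5, "hot flashes"),
    ("p4", 2, "dry mouth"), ("p4", 2, "fatigue"),
]
IS_BREAST = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}  # one-vs-rest target (invented)

vectors = symptom_vectors(REPORTS, SYMPTOMS)
patients = sorted(vectors)
X = [vectors[p] for p in patients]
y = [IS_BREAST[p] for p in patients]
weights, bias = fit_logistic(X, y)
```

One such binary model would be fitted per cancer type, each with its own 0/1 target, so that the coefficients of each model can be read as symptom associations for that type.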

Additionally, confusion matrices were generated to evaluate how well each model could identify the patients with the respective cancer type. The models were discarded if they failed to correctly identify at least one relevant patient. The large coefficients of each remaining model were extracted, assuming these represent typical symptoms of each cancer type or common adverse effects of the associated therapies. These symptoms were visualized in a symptom cloud to highlight relationships and prevalence.
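As a rough sketch of this step, the confusion-matrix counts, the derived sensitivity and specificity, the discard rule, and the extraction of large coefficients could look like the code below. The predictions and coefficient values are invented for illustration; the 0.1 cut-off is the one stated in the abstract, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for labels coded 1 (has the cancer type) and 0."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def keep_model(tp):
    """Discard rule from the text: keep a model only if it finds >= 1 true case."""
    return tp >= 1


def large_coefficients(coefficients, threshold=0.1):
    """Select symptoms whose coefficient exceeds the threshold."""
    return {s: c for s, c in coefficients.items() if c > threshold}


# Invented example: 4 patients with the cancer type, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
sens, spec = sensitivity_specificity(*counts)

# Invented coefficients, only to show the filtering step.
example_coefs = {"fatigue": 0.05, "hot flashes": 0.42, "dry mouth": -0.30}
selected = large_coefficients(example_coefs)
```

A conservative model of this kind, with few positive predictions, yields low sensitivity and high specificity, which is the pattern reported in the Results section.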

To address class imbalance, we initially trained the models on the full breast cancer dataset (n = 172) without down sampling. However, this led to a marked prediction bias towards the majority class. Therefore, we performed random down sampling of the breast cancer group (n = 25) to achieve a more balanced model. The performance metrics for the unbalanced analysis are reported in the Results section.
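Random down sampling of the majority class can be illustrated with a few lines of standard-library Python. The cohort sizes below are those reported in the text; the patient identifiers, the function name, and the fixed seed are assumptions made only so that the sketch is reproducible.

```python
import random


def downsample_majority(patients, majority_label, n_keep, seed=0):
    """Keep a random subset of n_keep majority-class patients and all others.

    patients is a list of (patient_id, cancer_type) pairs; seed is an assumption.
    """
    rng = random.Random(seed)
    majority = [p for p in patients if p[1] == majority_label]
    others = [p for p in patients if p[1] != majority_label]
    return rng.sample(majority, n_keep) + others


# Cohort sizes as reported: 172 breast, 19 lung, 16 gut, 12 blood-lymph, 7 prostate.
SIZES = {"breast": 172, "lung": 19, "gut": 16, "blood-lymph": 12, "prostate": 7}
cohort = [(f"{label}-{i}", label) for label, n in SIZES.items() for i in range(n)]

balanced = downsample_majority(cohort, "breast", n_keep=25)
n_breast = sum(1 for _, label in balanced if label == "breast")
```

With these numbers, the analysis set shrinks from 226 to 79 patients (25 breast plus 54 others), which reduces the dominance of the breast cancer class at the cost of discarding most of its data.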

To contextualize the performance of logistic regression, we additionally evaluated two alternative machine learning models: a decision tree and a gradient boosting model. These models were trained on the same symptom vectors and evaluated using 10-fold cross-validation. The performance was assessed using standard metrics, including the Area Under the Curve (AUC), 95% confidence intervals, sensitivity, and specificity. Detailed performance comparisons are reported in the Results section.
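The evaluation scheme shared by all three classifiers can be sketched independently of the model. The code below assigns patients to 10 folds and computes a rank-based AUC. Stratifying the folds by class and the fixed seed are assumptions (the text only states that 10-fold cross-validation was used), and the scores in the example are invented.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        indices = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[i] = position % k
    return folds


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 25 positives and 54 negatives, matching the down-sampled breast-vs-rest setting.
labels = [1] * 25 + [0] * 54
folds = stratified_folds(labels, k=10)
positives_per_fold = [
    sum(1 for label, fold in zip(labels, folds) if fold == f and label == 1)
    for f in range(10)
]

example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

For the smallest classes (for example, 7 prostate cases), 10 folds cannot each contain a positive case, which is one practical reason why such models are hard to evaluate on this dataset.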

3. Results

Over the three-month period, the patients reported between four and sixteen different symptoms [30]. With respect to the ePRO data, the participants reported a median of 3.0 symptoms daily (with a range of 1.2 to 3.3 symptoms), resulting in a total symptom count of 43,430. The median duration of symptom tracking was 82 days (ranging from 14 to 225 days). The three most frequently reported symptoms varied by cancer type, although fatigue consistently ranked as the most common symptom across all the categories. For the patients with breast cancer, the most reported symptoms included fatigue, hot flashes, and taste disorders. For blood and lymph cancers, fatigue, nausea, and dry mouth were the predominant symptoms. The patients with gut cancer reported fatigue, sensory disorders, and issues with the oral mucosa. The patients diagnosed with prostate cancer experienced fatigue, a dry mouth, and taste disorders (Figure 1).

Although the symptom clusters could potentially be broken down by specific treatment regimens, this analysis was not possible with the current dataset due to the limited number of events and the heterogeneity of the treatment regimens. The most frequently applied treatments for cancers of the breast, lung, and gut are listed in Table 2.

As illustrated by the confusion matrices (Figure 2), the machine learning model was not able to recognize any patients with prostate or blood–lymph cancers. Consequently, these cancer types were excluded from further analysis. The AUC values and the corresponding 95% confidence intervals for the three remaining cancer types were breast cancer: 0.74 (95% CI [0.62, 0.85]); gut cancer: 0.78 (95% CI [0.66, 0.89]); and lung cancer: 0.63 (95% CI [0.50, 0.77]). All the models showed low sensitivity (32%, 14%, and 16%, respectively) but high specificity (92%, 100%, and 98%, respectively). The AUC for lung cancer was not statistically significant, as the 95% confidence interval includes 0.5; for the other two cancer types, the confidence intervals indicate a fair performance.
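The significance reading above amounts to checking whether each confidence interval excludes the chance level of 0.5. The short sketch below applies that check, together with the 0.7 threshold for acceptable discrimination named in the Methods, to the three-decimal intervals given in the abstract. The function and variable names are illustrative.

```python
CHANCE_LEVEL = 0.5      # AUC of random guessing
ACCEPTABLE_AUC = 0.7    # threshold for acceptable discrimination cited in the Methods

# (AUC, CI lower bound, CI upper bound) as reported in the abstract.
REPORTED = {
    "breast": (0.74, 0.624, 0.848),
    "gut": (0.78, 0.659, 0.893),
    "lung": (0.63, 0.495, 0.771),
}


def excludes_chance(lower, upper, chance=CHANCE_LEVEL):
    """True if the interval lies entirely above or entirely below chance."""
    return lower > chance or upper < chance


summary = {
    cancer: {
        "significant": excludes_chance(lower, upper),
        "acceptable": value >= ACCEPTABLE_AUC,
    }
    for cancer, (value, lower, upper) in REPORTED.items()
}
```

Only the lung cancer interval reaches below 0.5, which matches the statement that its AUC did not reach statistical significance.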

In the preliminary analysis, we trained a logistic regression model on the full set of 172 patients with breast cancer without down sampling. While the model achieved a high sensitivity of 94% for breast cancer, sensitivity dropped to 14% for gut cancer and 5% for lung cancer. The AUC values also reflected this imbalance, with 0.65 (95% CI [0.54, 0.76]) for breast, 0.78 (95% CI [0.68, 0.88]) for gut, and 0.65 (95% CI [0.53, 0.77]) for lung cancer. These results demonstrated a strong prediction bias toward breast cancer and justified the application of down sampling in the final analysis to improve the sensitivity of the models.

To evaluate the model performance across the different classifiers, we assessed a decision tree and a gradient boosting model alongside logistic regression. For breast cancer, the AUCs and the corresponding 95% confidence intervals were 0.68 (95% CI [0.56, 0.80]) for the decision tree and 0.69 (95% CI [0.57, 0.81]) for gradient boosting. For gut cancer, the AUCs were 0.51 (95% CI [0.34, 0.68]) and 0.54 (95% CI [0.37, 0.71]), respectively. For lung cancer, both alternatives outperformed logistic regression, achieving 0.72 (95% CI [0.60, 0.84]) with the decision tree and 0.76 (95% CI [0.65, 0.87]) with gradient boosting.

Although gradient boosting achieved the best performance for lung cancer, logistic regression showed a superior performance in breast and gut cancers and also offers better human interpretability. Given this trade-off, logistic regression was selected as the primary model.
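The comparison behind this choice can be laid out by tabulating the AUC values reported above and picking the best classifier per cancer type. The numbers are taken from this section; the dictionary layout and names are illustrative only, and the sketch ignores the overlapping confidence intervals, so it should not be read as a formal test between models.

```python
# AUC values as reported in the Results section (down-sampled analysis).
REPORTED_AUC = {
    "breast": {"logistic regression": 0.74, "decision tree": 0.68, "gradient boosting": 0.69},
    "gut": {"logistic regression": 0.78, "decision tree": 0.51, "gradient boosting": 0.54},
    "lung": {"logistic regression": 0.63, "decision tree": 0.72, "gradient boosting": 0.76},
}


def best_model(auc_by_model):
    """Return the name of the classifier with the highest AUC."""
    return max(auc_by_model, key=auc_by_model.get)


best_per_cancer = {cancer: best_model(aucs) for cancer, aucs in REPORTED_AUC.items()}

# Number of cancer types for which each classifier has the highest AUC.
wins = {}
for model in best_per_cancer.values():
    wins[model] = wins.get(model, 0) + 1
```

Logistic regression has the highest AUC for two of the three cancer types, which, combined with its interpretability, is consistent with the choice described in the text.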

For the three cancer types, large coefficients are graphically displayed in a symptom cloud to illustrate their associations. In this graphical representation, the thicker lines signify higher coefficients, indicating a stronger association (Figure 3).

4. Discussion

Our study aimed to explore the potential of the reverse clustering of treatment-related ePROs to predict the underlying cancer types and to confirm the tumor- and treatment-associated symptom clusters. Despite the small number of participants, we demonstrated fair AUCs for breast and gut cancers. The models for prostate and blood–lymph cancers performed the worst, not correctly identifying a single patient with the relevant cancer type; however, these two cancer types had the fewest patients (7 and 12, respectively), which likely caused the poor model performance. The breast and gut cancer models nevertheless exhibited an acceptable performance with the data derived from only 25 and 16 patients, demonstrating that exploratory ML studies are feasible even with comparatively small ePRO datasets.

Despite not reaching statistical significance, the symptom cloud based on the lung cancer model appears plausible from a medical perspective, suggesting that further investigation may be warranted: the symptom clusters show strong similarity with those from other studies, with all the symptoms except nosebleeds also represented in these SCs [31,32]. Considering that all the symptoms were individually selected, rather than pre-specified through a questionnaire, the data analyzed are inherently informative. Likewise, our models for the SCs in breast and gut cancers overlap with the SCs throughout the literature, particularly for symptom clusters such as fatigue and pain for both cancer types and gastrointestinal symptoms, especially for gut cancer [23]. For the patients with gut cancer, our model additionally identified clusters involving sensory disturbances, high blood pressure, visual disturbances, and burning during urination. These symptoms are less frequently emphasized in prior symptom cluster studies [33], potentially reflecting the influence of comorbidities such as diabetes mellitus, which is often prevalent in this patient population, but not always explicitly accounted for in previous studies. In breast cancer, fatigue, sleep disturbances, and taste disorders also clustered as expected, which is consistent with prior reports [23,34]. Interestingly, our breast cancer model also revealed symptoms such as headaches and ringing in the ears in the symptom cloud. As these symptoms are not typically highlighted in the previous studies [23,34], their emergence may reflect therapy-associated neurotoxicity or previously under-recognized symptom constellations, thus potentially offering new hypotheses for further clinical investigation. Although SCs are likely to be influenced by specific treatment regimens (Table 2), this analysis was not conducted on the current dataset.
It is important to note that, from a human-interpretable perspective, the identified symptom clusters likely reflect treatment-related side effects rather than malignancy-specific symptoms alone. This interpretation is further supported by our comparison with prior studies, where many of the identified clusters correspond to known therapy-associated symptom patterns.

So far, ML approaches have been applied in various areas of medicine, including radiology for the detection of tumors [35,36] and cardiology, where ML models assist in cardiac image detection and the classification or prediction of hypertension [37,38]. The novelty of our approach lies in the fact that no clinical data were required since the dynamic symptom datasets were exclusively reported by empowered patients. Given larger sample sizes, it seems likely that robust treatment-specific symptom clusters can be derived from such ePRO data; this would allow for the improved monitoring of treatment trajectories, with deviations from those symptom patterns possibly acting as indicators for treatment non-adherence or lower treatment efficacy. In terms of adherence, if certain symptoms are no longer reported, this might indicate that medication is either routinely forgotten or that the patient has intentionally stopped their intake to mitigate a high side effect burden [39]. Alternatively, persistent symptom changes could also reflect reduced treatment efficacy, warranting further clinical assessment. If indicated by changes in an individual symptom cluster, physicians could have a conversation with their patients to identify the cause of non-adherence and either adjust their treatment or support them in building treatment habits. The earlier detection of such deviations could enable timely and proactive clinical interventions. Integrating symptom-based prediction models and results into digital monitoring tools and clinical practice could help prioritize the patients at a high risk of non-adherence, eventually improving treatment success and potentially reducing hospitalizations [15,18]. Such clinical applications underline the potential value of the proposed prediction model beyond the statistical performance.
Studies based on ePRO data may also improve the representation of elderly patients by increasing accessibility, improving patient engagement, and facilitating data collection for decentralized clinical trials (DCTs).

There are several limiting factors to this study. Reverse clustering has challenges such as the need for reliable and consistent initial data, as deviations in the initial entries could impede the formation of valid symptom clusters. The small sample size per cancer type, heterogeneity, and lack of detailed information on differential therapies, as well as on comorbidities, limit the generalizability of our findings and may lead to overfitting. To mitigate class imbalance, the breast cancer cohort was down sampled. While this limited the ability to fully capture symptom heterogeneity within the breast cancer group, it improved the model sensitivity to other cancer types. Due to the nature of the secondary dataset and the comparatively small sample size, detailed subgroup information regarding chemotherapy, antibody treatments, and radiotherapy could not be retrieved. All the models were strongly conservative and generally tended to reject the patients, which led to a large proportion of false negative cases. The high threshold likely led to fewer symptoms being identified within each symptom cluster, but resulted in clusters that were more clearly distinguishable between the different tumor entities. While low sensitivity indicates that the models at this performance level would not be suitable for screening purposes, they could have value for individual patients undergoing treatment. The typical symptom clusters could be used to detect changes at an individual level and help distinguish them from random fluctuations.

The fair AUC values for the breast and gut cancer models indicate that relevant patterns could be extracted from the data, although larger datasets will be needed to train and validate more robust models and potentially explore the combination of ePROs with clinical markers in a hybrid approach. External validation on independent datasets was not performed, as no independent dataset was available, which limits the generalizability of our findings. Future work could explore advanced deep neural networks for clustering, including generative models based on Riemannian geometry and fuzzy clustering combined with graph convolutional networks [40,41]. Since we performed a secondary analysis of an existing dataset, a formal sample size calculation or post hoc power analysis was not conducted. However, to our knowledge, this is the first study to introduce the concept of reverse clustering into the medical field.

5. Conclusions

In conclusion, this exploratory study examined the potential of symptom clusters based on treatment-related ePRO data to retrospectively “identify” the type of underlying cancer. Despite the small dataset, our breast and gut cancer models demonstrated a fair performance and overlapped with the symptom clusters derived from patient-reported outcomes in the existing literature, but further clinical investigation is required.

Author Contributions

The authors A.T., H.C.K., M.V., and G.K.-U. are physicians and contributed to the design and conduct of this study and the writing of this manuscript. N.A. drafted this manuscript as part of their master’s thesis, conducted the literature research, and contributed to the design of this study. L.v.S. is a psychologist involved with the medical device app and contributed to the writing and editing of this manuscript. H.F.W. and E.L. are biostatisticians who contributed to the design and statistical analysis of this study as well as the writing of this manuscript. Final approval of this manuscript was provided by all authors. 1. Guarantor of integrity of the entire study: A.T., H.C.K., E.L. and H.F.W. 2. Study concepts and design: A.T., E.L., H.F.W., N.A. and G.K.-U. 3. Literature research: N.A., A.T., E.L., S.R. and M.V. 4. Trial execution: A.T., H.C.K. and G.K.-U. 5. Data analysis: H.F.W., E.L., A.T., L.v.S. and N.A. 6. Statistical analyses: H.F.W., E.L. and L.v.S. 7. Manuscript preparation: N.A., L.v.S., H.F.W., A.T. and G.K.-U. 8. Manuscript editing: L.v.S., N.A., A.T., H.F.W., M.V. and S.R. 9. Main Author: N.A., H.F.W. and A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Foundation Swiss Tumor Institute Zürich (Grant No. REV2025).

Institutional Review Board Statement

The patient data were obtained from two previous studies that both received approval from the Swiss Institutional Review Board (KEK-ZH: 2021-D0051, 20211203, approved on 03.12.2021; KEK-ZH: 2017-02028, 20180219, approved on 19.02.2018) and were conducted in accordance with the current principles of the Declaration of Helsinki. Additionally, the study is registered on ClinicalTrials.gov (NCT05234021 and NCT03578731) and on the Swiss National Clinical Trials Portal (SNCTP000004711). All study documents were de-identified by assigning a unique ID to each patient. Functional data security was ensured by identification being made only possible via the patient’s ID. The data on the patient’s device was encapsulated in the app, and data exchange was encrypted with the patient’s ID.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study. This included informed consent allowing for the anonymized use of collected data for further research purposes.

Data Availability Statement

Data were available from the medical device patient app. All data generated or analyzed during this study are included in this article. Further enquiries can be directed to the corresponding author on reasonable request.

Acknowledgments

We would like to thank all patients for their contribution to the symptom data entries related to this manuscript.

Conflicts of Interest

A.T. is co-founder, stock owner and Chief Medical Officer of mobile Health AG, a start-up company that has developed, maintains and operates the medidux™ platform. L.v.S. is the Scientific Project Manager of mobile Health AG.

Abbreviations

The following abbreviations are used in this manuscript:

APP	Application
AUC	Area Under the Curve
CI	Confidence Interval
CTCAE	Common Terminology Criteria for Adverse Events
DCTs	Decentralized Clinical Trials
ePROs	Electronic Patient-Reported Outcomes
ML	Machine Learning
SCs	Symptom Clusters

Figure 1. Frequency of 10 most commonly reported symptoms categorized by tumor type.

Figure 2. Confusion matrices for all five tumor types.

Figure 3. Comprehensive cloud aggregation of symptoms for breast, lung, and gut cancers.

Table 1. Baseline characteristics.

Characteristic	Count	Percentage (%)
Overall	226	100
Primary Tumor
Breast Cancer	172	76.1
Lung Cancer	19	8.4
Gut Cancer	16	7.1
Blood–lymph Cancer	12	5.3
Prostate Cancer	7	3.1
Gender
Male	34	15
Female	191	84.5
Diverse	1	0.4
Mean Age	58.4
Selected Patients for Analysis	60
Primary Tumor
Breast Cancer	25	41.7
Lung Cancer	19	31.7
Gut Cancer	16	26.7
Gender
Male	15	25
Female	45	75
Diverse	0	0
Mean Age	50

Table 2. Most frequently applied treatments for breast, lung, and gut cancers, in percent.

Most Frequently Applied Therapies	Percentage (%)
Herceptin/Perjeta +/− Docetaxel/Carboplatin	16.3
Antihormone therapy +/− Everolimus or CDK4/6 inhibitor	16.3
Paclitaxel +/− Carboplatin	12.1
Docetaxel-Endoxan +/− Antihormone therapy	10.6
EC-Paclitaxel	9.9
Checkpoint inhibitor +/− Chemotherapy	8.5
Capecitabine	6.4
EC-Docetaxel	6.4
Docetaxel/Carboplatin	2.8
Platinum + Pemetrexed	2.1
FOLFIRI	2.1
CAPOX	2.1
Docetaxel	1.4
Platinum + Etoposide	1.4
FOLFOX	1.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
