Healthcare Sustainability: Hospitalization Rate Forecasting with Transfer Learning and Location-Aware News Analysis



Introduction
Healthcare systems strive to tackle severe pressure and sustainability challenges due to changing priorities in widespread pandemics, such as COVID-19 [1][2][3].In 2019, the Coronavirus (COVID-19) outbreak in Wuhan, China, rapidly spread to over 228 countries [4][5][6].This emerging infectious disease has become a pandemic with alarming scales in a short period.In January 2020, it was declared as a public health emergency of international concern by WHO [7].The Centers for Disease Control and Prevention (CDC) announced that more than 98 million confirmed cases and more than 1 million deaths have been recorded in the United States [8].This public health issue has forced the world to reconsider the existing sustainable strategy in healthcare, especially for responding to crises rapidly when a new pandemic breaks out [9].Despite the intense growth of research in COVID-19 and the current situation of various requests for healthcare services, relatively little progress has been made to improve healthcare sustainability [10].Healthcare sustainability refers to the ability of healthcare systems to meet present and future healthcare needs while simultaneously considering social, economic, and environmental factors [11].It involves the provision of high-quality healthcare services, efficient resource allocation, and the promotion of positive health outcomes [12][13][14].The current context requires effective and accurate AI technical support on big data analysis to derive meaningful information for decision-making to achieve the above purposes [15].
Early monitoring and forecasting of hospitalization rates give health organizations and public authorities valuable opportunities to adjust sustainability strategies. We aim to design an analytic framework that uses machine learning methods to predict hospitalization rates accurately and effectively during emerging pandemics (e.g., COVID-19). Existing research on healthcare analytics and forecasting has made great progress in various aspects. Mathematical methods, such as stochastic processes, Markov decision processes, and compartmental models, show great success in the theoretical analysis of macroscopic regularities of epidemic transmission, like the epidemic threshold and epidemic infection scale [16]. Yet their assumption of homogeneous data and their small sets of variables are insufficient to capture the variety of factors related to epidemic transmission processes [17]. Other statistical models, such as autoregressive (AR), autoregressive moving average (ARMA), and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX), are convenient and straightforward for obtaining accurate results in short-term time series analysis [18][19][20]. Nevertheless, their performance decreases in relatively long-term forecasting because of the evolution of COVID-19 and the impact of multiple complex factors over time. Although deep learning approaches, such as deep neural networks (DNN), recurrent neural networks (RNN), and temporal fusion transformers (TFT), are burgeoning methods for learning temporal patterns from different perspectives [21], these models need to be extended to include social factors when predicting healthcare performance indicators.
In general, several grand challenges lie in hospitalization rate forecasting, especially with sparse data and deficient historical experience. First, most existing mathematical and statistical methods work in isolation and cannot exploit experience from existing diseases in the forecasting problems of an emerging disease. For instance, they fail to transfer and utilize the learned knowledge of existing diseases (e.g., flu) to predict hospitalization rates during new epidemics (e.g., COVID-19). Second, most studies cannot directly and accurately capture the temporal dependencies of cultural/social factors during an emerging epidemic, such as the growing hospitalization rates caused by large-scale epidemic outbreaks after holidays and festivals in November and December 2021. Third, existing research in healthcare sustainability ignores the prime importance of forecasting techniques for a long-term development strategy. Accurate and effective analysis results can provide reliable and intelligent decisions to benefit health system management.
In this paper, we propose a novel analytical framework from the perspective of data science to provide more accurate monitoring and forecasting results for hospitalization. Given the delay in data monitoring and collection, we initially tackle the issue of hospitalization rate forecasting based on CDC tracking data of 50 US states with a lead time from 1 to 14 days. (Lead time is the time span that the model forecasts in advance. For instance, if the input is $X_T$ and lead time = 14, the expected output is $X_{T+14}$, where T is the window size.) Some studies indicate similar evolving patterns in existing contagious diseases and newly emerging diseases [22]. It is intuitive to conduct research in the initial phase of pandemics, based on the critical information and clues hidden in epidemic emergence and persistence mechanisms [23]. In this work, we aim to exploit the experience of existing infectious diseases, such as influenza (flu), to forecast hospitalization rates during an emerging pandemic, such as COVID-19. Due to data scarcity, learning and transferring knowledge directly from historical hospitalization data of existing diseases (e.g., flu) is hard. We use non-linear correlation tests to demonstrate the significant relationship between infection cases and hospitalization rates [24]. Based on the discovered significant correlations, we apply a Heterogeneous Transfer Learning (HTL) approach to learn common characteristics from rich infection case data of flu, and transfer the learned knowledge to predict hospitalization rates during COVID-19. Several prior studies have demonstrated the association of social factors with healthcare problems, such as the mediating impact of human awareness and behavior change [25,26]. Despite the potentially valuable information in text data, it is under-utilized in time series prediction. In our work, we analyze the effect of social factors on hospitalization rates during COVID-19 from two aspects: sentiment and semantic features of COVID-19-related news articles.
Specifically, we address three motivating questions: (1) Will public sentiments and attitudes (e.g., pessimistic or optimistic) affect hospitalization rates? (2) Will public policies (e.g., lockdown and quarantine) affect hospitalization rates? (3) Will the news information from different locations affect local health situations (e.g., COVID-19 rates and hospitalization rates)?
This paper proposes an analytical framework using machine learning techniques to provide AI technical support for healthcare sustainability. We formulate the problem as predicting hospitalization rates during emerging epidemics (e.g., COVID-19) using limited historical time series data and epidemic-related news articles. Our key contributions can be summarized as follows: (1) We apply the transfer learning architecture with dynamic location-aware sentiment and semantic analysis (TLSS) [27], which was initially designed for emerging epidemic forecasting. We extend TLSS into a new application scenario: hospitalization rate prediction during the outbreak of an emerging pandemic. (2) We leverage non-linear correlation tests to demonstrate the significant correlation between COVID-19 infection cases and hospitalization rates. This correlation allows us to utilize the rich infection data of existing diseases for hospitalization rate forecasting during an emerging disease outbreak. (3) We use sentiment and semantic analysis methods to extract relevant features from news articles. We apply multimodal data learning within TLSS to learn the impact of news sentiment and semantic information on hospitalization and to interpret nontraditional variation patterns. (4) We then concatenate the learned information from the infection records of an existing disease (e.g., flu), COVID-19 news semantic/sentiment features, and temporal dependencies in local time series data to forecast hospitalization rates during COVID-19 in a dynamic propagation process. (5) We conduct state- and country-level experiments on real-world hospitalization data during COVID-19 with different time settings. We evaluate the performance of various state-of-the-art methods with exogenous variables to demonstrate the efficacy and flexibility of TLSS in different application scenarios (e.g., hospitalization rate forecasting).
Overall, our research provides valuable statistical evidence and support that can enhance the sustainability of healthcare systems.We have developed and optimized an early-stage forecasting method for hospitalization rates during emerging epidemics, which can help predict the expected volume of patients in advance.By knowing the future possible hospitalizations in advance, health systems can manage costs accordingly, primarily to keep health costs under control and sustainable over the long term during pandemics.Furthermore, our method offers healthcare providers the opportunity to anticipate future demand, adjust medical resource allocation and staffing levels, and prevent hospitals from becoming overburdened.By forecasting hospitalization rates, our proposed method also helps public health institutions in enhancing patient care planning and coordination, improving service quality, and achieving overall healthcare sustainability.

Sustainable Development in Healthcare
Sustainability is a widely controversial subject that is hard to define and apply to real tasks, especially in the complex scenario of the healthcare perspective [28].Some studies stress that environmental, social, and economic development in healthcare institutes are the essential factors to realize sustainability over long-term development [29,30].A sustainable strategy should focus on optimizing resource utilization, delivering high-quality healthcare service, and managing clinical system and financial aspects [31,32].However, the application of theories into practical scenarios in a structured manner requires additional efforts to ensure the provision of professional support.Thus, despite the increasing interest in the sustainability of healthcare, COVID-19 demonstrates it is still a continuing challenge to improve healthcare services and optimize medical systems in terms of accessibility and outcomes [33].
Lennox et al. [11] followed Preferred Reporting Items for Systematic Reviews and Meta-Analysis guidelines to provide a systematic review of sustainability methods in healthcare.It provides enlightening insights and suggests exploring this topic to improve valuable resource allocation and patient outcomes.Capolongo et al. [34] constructed an innovative assessment system to provide a strategy for realizing sustainability.It indicates when involving medical facilities and resources, sustainability should be capable of delivering high quality and high efficiency in changing circumstances.Brambilla et al. [35] tested a multicriteria assessment tool and systematically analyzed the quality of operating hospitals in Germany, considering social, environmental, and organizational aspects.They assessed and analyzed the sustainability of existing operative health systems, but did not provide a specific improvement scheme to address the weakness shown in the assessment.Lennox et al. [10] explored how identified sustainability factors act on the improvement projects.They designed a sustainability work Long Term Success Tool to improve initiatives and investigated its critical features in real-world healthcare applications.However, they do not provide quantitative analysis to evaluate the improvement.
In general, most existing research emphasizes the significance of healthcare sustainability and provides valuable insights into this subject.However, there is a lack of extensive research that focuses on its practical implementation, especially in terms of utilizing AI technical support and employing robust quantitative analysis for health system optimization in sustainable development.

Time Series Forecasting in Healthcare
In healthcare analysis, many approaches focus on understanding epidemic spread patterns and forecasting hospital admissions [36][37][38].Time series regression is one of the main attempts of the problem formulation to model and simulate hospitalization rates.Several statistical data-driven approaches are designed and widely used for time series forecasting [39], such as AR, ARMA, and SARIMAX [40][41][42].Perone [20] applied several time series forecasting methods to predict the second wave of COVID-19 hospital admissions in Italy, including ARIMA, innovations state space models for exponential smoothing, neural network autoregression model, and all of their feasible hybrid combinations.Hybrid models achieve outstanding performance in short-term prediction, which can facilitate the decision-making of public health authorities.Cheng et al. [19] implemented the SARIMAX model to predict emergency department occupancy and demonstrated outstanding forecasting performance in real-time tasks.The latest advance in deep learning technologies demonstrated excellent learning capacity in time series analysis and provided innovative paradigms to capture temporal dependencies from complex data [43][44][45].Cheng et al. [46] proposed a novel bidirectional long short-term memory model to predict medical visits.The model performance was significantly improved by adopting the attention mechanism and time adjustment factors to learn the hidden states.Kaushik et al. [47] designed an ensemble model to predict patients' weekly spending on two pain medications at an average level.Despite these methods' impressive performance, their limitations are apparent, such as the unsatisfactory prediction for a more considerable lead time.Moreover, they do not consider the impact of exogenous factors, such as environment, geolocation, and social aspects.

Social Factor Impact on Healthcare
With the development of natural language processing (NLP), information can be extracted more accurately and effectively from unstructured textual data [48,49]. Textual data such as tweets, surveys, and news have become practicable auxiliary features in healthcare forecasting and analysis [50][51][52]. This observation inspired extensive experiments to utilize relevant features from textual documents in different ways and leverage them for analyzing real-world tasks [53]. Sentiment analysis approaches, such as latent Dirichlet allocation (LDA) [54], BERT-based sentiment classifier (BERTsent) [55], and VADER [56], and semantic analysis approaches, such as Doc2vec [57], global vectors for word representation (GloVe) [58], bidirectional encoder representations from transformers (BERT) [59], and sentence embeddings using siamese BERT-networks (Sentence-BERT) [60], are crucial and widely used in various healthcare-related tasks [61,62]. For more details about VADER and Sentence-BERT, see Section 3.2.2.
To explore the application of sentiment analysis in healthcare and extract the significant finding from the literature, Alamoodi et al. [63], Gohil et al. [64] conducted a comprehensive review for the existing work of textual data analysis.They provide precious inspiration and a meaningful context for future work in sentiment analysis within the healthcare domain.Some studies applied the National Research Council Canada (NRC) Word-Emotion Lexicon for sentiment analysis on textual datasets (e.g., news articles and tweets) to explore insight into the communication patterns and public sentiments during COVID-19 [65,66].Mourad et al. [67] adopted a lexicon-based data analytics methodology for analyzing social network users and content by utilizing NLP techniques.They provided valuable insights by discussing computing and non-computing implications for prospective solutions and social network management strategies during crisis periods.Mahdikhani [68] designed novel frameworks to detect public situations during different stages of the pandemic using text embedding methods.Their experiments demonstrated the influence of public situations on the retweetability of posted tweets during COVID-19.Zeng et al. [69] proposed an ensemble learning method to classify eligibility text criteria, which can choose suitable candidates for clinical trials based on their records.Gourisaria et al. [70] analyzed Twitter users' psychological reactions and discourse regarding COVID-19 using data-mining methodologies such as semantic analysis and topic modeling.They provide valuable insights about selecting the most suitable methods for different healthcare tasks.Despite existing work demonstrating the impact of textual resources, it is possible to improve the application of sentiment and semantic data for healthcare forecasting.In addition, most studies fail to capture temporal dependencies from complex text data precisely.

Problem Formulation
In this paper, we explore the association of hospitalization rates with the following factors: infection cases, geolocation, and news articles.We then apply and optimize a machine learning method for hospitalization rate forecasting.
To exploit the learned knowledge from infection case data in hospitalization rate forecasting, we apply non-linear correlation tests to demonstrate the significant relationship between hospitalization rates and infection cases.We then introduce a model TLSS with three modules: transfer learning, multimodal data learning, and prediction.In the transfer learning module of TLSS, we learn the general characteristics of existing infectious diseases (e.g., flu) from a source model Cola-GNN and transfer the knowledge to the target model for fine-tuning in hospitalization rate prediction during a new pandemic (e.g., COVID-19).In the multimodal data learning module, we collect temporal dependencies of hospitalization rates and encode news sentiment and semantic features in a dynamic propagation process.The prediction module is to concatenate the transfer learning knowledge, temporal dependencies, and sentiment and semantic features for future hospitalization rate forecasting.
Given current time k, we aim to forecast the future hospitalization rate at time k + h using time series data from the past time window $[k - T : k]$, where h is the lead time of the prediction and T is the historical window size (lag). We have daily historical data of hospitalization rates $Y \in \mathbb{R}^{N \times T}$, where N is the number of locations. We also have daily data of exogenous variables: COVID-19 infection cases $X \in \mathbb{R}^{N \times T}$, a pre-trained news sentiment feature $V \in \mathbb{R}^{N \times T}$, and a pre-trained news semantic feature $S \in \mathbb{R}^{N \times T \times 50}$. Both V and S are extracted from the pre-trained news data, where $V_{i,t} \in \mathbb{R}$ and $S_{i,t} \in \mathbb{R}^{1 \times 50}$ are the average emotion score and semantic vector of day t's news for location i (see Table 1 for major notations and descriptions).
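The windowing described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `make_windows` is hypothetical, and the sketch assumes that each input window holds exactly T daily values ending at day k (inclusive), which is one reading of the window $[k - T : k]$.

```python
import numpy as np


def make_windows(Y, T, h):
    """Build (input window, target) pairs from a daily series.

    Y has shape (N, n_days): N locations, one value per day.
    Returns inputs of shape (samples, N, T) and targets of shape (samples, N),
    where each target lies h days after the last day of its input window.
    Assumption: a window contains T values ending at day k (inclusive).
    """
    Y = np.asarray(Y, dtype=float)
    n_days = Y.shape[1]
    if n_days < T + h:
        raise ValueError("series too short for this window size and lead time")
    inputs, targets = [], []
    # k is the index of the last observed day in each window.
    for k in range(T - 1, n_days - h):
        inputs.append(Y[:, k - T + 1 : k + 1])
        targets.append(Y[:, k + h])
    return np.stack(inputs), np.stack(targets)


# Toy data: 2 locations, 30 days, window size 9, lead time 14.
Y_demo = np.arange(60, dtype=float).reshape(2, 30)
X_win, y_next = make_windows(Y_demo, T=9, h=14)
print(X_win.shape, y_next.shape)  # (8, 2, 9) (8, 2)
```

A larger lead time leaves fewer usable samples from the same series, which is one reason long-horizon forecasts are harder to train with sparse data.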

Methodological Approach
This section introduces our experiments' main methods, algorithms, and evaluation metrics.We apply them in model performance improvement and comparison of forecasting of hospitalization rates during COVID-19, which provide potentially valuable AI technical support in healthcare sustainability.

Non-Linear Correlation Test
Non-linear correlation tests assess whether the relationship between two variables departs from a constant ratio of change, that is, whether it is non-linear. By exploring the potential non-linear relationships between variables, we determine whether to employ more sophisticated approaches for data analysis instead of using linear models directly. In this work, we apply the White, Granger causality, and Brownian distance correlation tests to explore the relationship between hospitalization rates and infection cases during COVID-19 at both the country and state levels.

• White Test
The White test [71,72] is based on a neural network for neglected non-linearity, which uses a hidden layer to detect the relationship between time series vectors. Given historical COVID-19 cases data $X_{[k-T:k]}$ and hospitalization rate data $Y_k$, we test the non-linearity between them, where k is the current time stamp and T is the historical window size. The network is defined as
$$f\big(X_{[k-T:k]}\big) = X_{[k-T:k]}^{\top}\theta + \sum_{j=1}^{q} \beta_j\, \varphi\big(X_{[k-T:k]}^{\top}\alpha_j\big),$$
where the sum $\sum_{j=1}^{q} \beta_j\, \varphi(X_{[k-T:k]}^{\top}\alpha_j)$ includes the nonlinear components, θ is a parameter vector, $\beta_j$ is the weight of the neural network model from the hidden layer to the output layer, $\alpha_j$ is the weight of the neural network model from the input layer to the hidden layer, φ is the activation function, q is the number of hidden units, and j is the index of the hidden units. The null hypothesis can be defined as
$$H_0: \beta_1 = \beta_2 = \dots = \beta_q = 0.$$
There is a nonlinear correlation between COVID-19 infection cases and hospitalization rates if we reject the null hypothesis according to the chi-square and F distributions.
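• Granger Causality Test
In its standard form, the Granger causality test checks whether lagged infection cases improve the prediction of hospitalization rates beyond what lagged hospitalization rates already provide:
$$Y_k = a_0 + \sum_{l=1}^{L} a_l Y_{k-l} + \sum_{l=1}^{L} b_l X_{k-l} + \epsilon_1,$$
where L is the largest historical window size (lag value) and the residual $\epsilon_1 \sim N(0, \sigma)$ is a white noise series.

• Brownian Distance Correlation
Székely and Rizzo [75] proposed Brownian distance correlation and distance covariance to measure the nonlinear dependence and test the joint independence of random vectors in multiple dimensions. Given the random variables X and Y, the distance covariance v(X, Y) measures the distance between $f_X f_Y$ and $f_{X,Y}$:
$$v^2(X, Y) = \big\| f_{X,Y}(t, s) - f_X(t)\, f_Y(s) \big\|^2,$$
where $\|\cdot\|$ is the $L_2$ norm, t and s are vectors, $f_X$ and $f_Y$ are the characteristic functions of X and Y, and $f_{X,Y}$ is their joint characteristic function. In an empirical version, v(X, Y) is designed to test the independence hypothesis:
$$H_0: f_{X,Y} = f_X f_Y \quad \text{versus} \quad H_1: f_{X,Y} \neq f_X f_Y.$$
The distance correlation normalizes the distance covariance, $R^2(X, Y) = v^2(X, Y) / \sqrt{v^2(X, X)\, v^2(Y, Y)}$, and is set to 0 when the denominator is 0. In this paper, we aim to use the Brownian distance correlation $R(X_{[k-T:k]}, Y_k)$ for testing the non-linear dependence of hospitalization rates $Y_k$ at the current time k on the COVID-19 infection cases $X_{[k-T:k]}$.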
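To make the Brownian distance correlation R(X, Y) concrete, the following NumPy sketch computes the empirical statistic of Székely and Rizzo from double-centered pairwise distance matrices. It is a minimal illustration, not the code used in the experiments; the function names and the toy data are assumptions made here.

```python
import numpy as np


def _centered_distances(a):
    """Pairwise Euclidean distance matrix, double-centered by row/column means."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D series as n samples of dimension 1
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Empirical distance correlation R(x, y) in [0, 1] for paired samples."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy example: y depends on x only through a non-linear (quadratic) relation.
x_demo = np.linspace(-1.0, 1.0, 201)
y_demo = x_demo ** 2
pearson_demo = float(np.corrcoef(x_demo, y_demo)[0, 1])
dcor_demo = distance_correlation(x_demo, y_demo)
print(pearson_demo, dcor_demo)
```

In this toy case the Pearson correlation is essentially zero because the relation is symmetric around zero, while the distance correlation is clearly positive. That contrast is why a distance-based test is useful when the dependence between infection cases and hospitalization rates may be non-linear.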

Sentiment and Semantic Analysis
In this paper, we apply VADER and SBERT to extract the sentiment and semantic features, respectively, from COVID-19-related news articles at the country and state levels.

• Valence Aware Dictionary for sEntiment Reasoning (VADER)
VADER [56] is a text analysis method that can be used to measure the emotions, sentiments, and attitudes expressed in text. It is an unsupervised analysis that leverages a sentiment lexicon to annotate an emotion polarity score for each word of unlabeled data. The range of polarities is [−1, 1], where 1 indicates an extremely positive attitude, 0 indicates a neutral attitude, and −1 indicates an extremely negative attitude. VADER aggregates the polarity scores of the individual words in a sentence to represent the overall sentence sentiment. Each sentence produces a vector of sentiment scores with negative, neutral, positive, and compound polarities. The compound polarity represents an aggregate measure of all the other sentiments.

• Sentence-BERT (SBERT)
BERT [59] is a transformer-based machine learning technique for natural language processing. SBERT [60] is a modification of the pre-trained BERT network that leverages siamese and triplet network structures to generate semantic embeddings that can be compared using cosine similarity. Given sentences A and B with varying lengths, SBERT creates fixed-size embeddings u and v using BERT and a pooling layer. The two BERT networks that encode the sentence pair are identical down to every parameter.

TLSS: Transfer Learning Architecture with Dynamic Location-Aware Sentiment and Semantic Analysis
TLSS is a neural transfer learning architecture for learning and transferring general characteristics from existing epidemic diseases to predict a new pandemic. It also learns the impact of the exogenous variables, geolocation and news articles, on epidemic transmission. In this work, we extend this algorithm into a new application scenario to predict the hospitalization rates during COVID-19, based on the relationship between hospitalization rates and infection cases demonstrated by the nonlinear correlation tests. TLSS has the following modules:

• Heterogeneous Transfer Learning (HTL)
HTL [76] focuses on transferring knowledge from the source domain to a different but related target domain, in which data are heterogeneous in both feature and label spaces (see Appendix A). The HTL module of TLSS aims to learn the general patterns of existing epidemics (e.g., flu) and transfer the learned knowledge to forecast the hospitalization rates during the new pandemic (e.g., COVID-19). The base model is a cross-location attention-based graph neural network (Cola-GNN) [77], which is designed to combine the temporal dependencies and geolocation correlation for predicting long-term influenza-like illnesses (ILI). We pre-train the source model and share part of its parameters, the weights $W_G$, with the target model as initializations. Then, we fine-tune the target model TLSS on hospitalization rate data during COVID-19 and collect its hidden states, which are indexed by location i and time stamp k. Following the above process, we project the learned representation from the heterogeneous transfer learning module into the prediction module and combine it with news sentiment and semantic features for final predictions.

• Multimodal Data Learning
The multimodal data learning module captures the temporal dependencies of hospitalization rates and encodes sentiment and semantic features over time. It contains a dynamic location-aware analysis for sentiment and semantic features, which dynamically models the public emotions and opinions in different locations during COVID-19 from news data. Given pre-trained news sentiment data $V \in \mathbb{R}^{N \times T}$ and semantic data $S \in \mathbb{R}^{N \times T \times 50}$, where $V_{i,t} \in \mathbb{R}$ and $S_{i,t} \in \mathbb{R}^{1 \times 50}$ are the embeddings that represent the average emotion score and semantic feature of day t's news for location i, the module processes the data in the following steps:
(1) For each timestamp k, calculate the cosine similarity for news sentiment data ($C^v_k$) and news semantic data ($C^s_k$) with window size T between every two locations i and j:
$$C^v_k(i, j) = \cos\big(V_{i,[k-T:k]},\, V_{j,[k-T:k]}\big), \qquad C^s_k(i, j) = \cos\big(\mathrm{concat}(S_{i,[k-T:k]}),\, \mathrm{concat}(S_{j,[k-T:k]})\big),$$
where $V_{i,[k-T:k]} \in \mathbb{R}^{T}$ and $S_{i,[k-T:k]} \in \mathbb{R}^{T \times 50}$ represent the sentiment and semantic embeddings of location i, respectively, for the time span $[k - T : k]$. The concat function concatenates the T daily semantic vectors of a location into a single vector of length T × 50.
(2) Implement the location-aware attention mechanism to create an attention coefficient matrix A that measures the sentiment/semantic dependencies between every two locations i and j. The coefficient $a_{i,j}$ in A is computed from $h_i$ and $h_j$, the last hidden states $h_k$ of an RNN model for location i and location j, using trainable parameters (e.g., $W^s$).
(3) Compute the location-aware attention matrices of sentiment $\hat{A}^v_k$ and semantics $\hat{A}^s_k$, where $W_m \in \mathbb{R}^{N \times N}$ and $b_m \in \mathbb{R}$ are trainable parameters.
(4) Apply a linear transformation to the location-aware attention matrices of sentiment $\hat{A}^v_k$ and semantics $\hat{A}^s_k$:
$$L^v_k = \hat{A}^v_k W + b, \qquad L^s_k = \hat{A}^s_k W + b,$$
where $L^v_k \in \mathbb{R}^{N \times D}$ and $L^s_k \in \mathbb{R}^{N \times D}$ are dynamic matrices of sentiment and semantic features that change over different time stamps, and W and b are the trainable parameters, defined separately for each equation.

• Prediction
TLSS also learns RNN hidden states from historical hospitalization rates with window size T. The prediction module combines the embedding of the news sentiment feature ($L^v_{i,k} \in \mathbb{R}^D$), the embedding of the news semantic feature ($L^s_{i,k} \in \mathbb{R}^D$), the hidden states ($h_{i,k} \in \mathbb{R}^D$) from the RNN model, and the hidden states learned from Cola-GNN (denoted here by $g_{i,k} \in \mathbb{R}^F$):
$$\hat{y}_{i,k+h} = \varphi\big(\theta^{\top} [L^v_{i,k};\, L^s_{i,k};\, h_{i,k};\, g_{i,k}] + b_\theta\big),$$
where φ is the activation function, and $\theta \in \mathbb{R}^{D+D+D+F}$ and $b_\theta$ are trainable parameters. D is the dimension of the RNN hidden states and sentiment/semantic embeddings, and F is the dimension of the hidden states from the transferred knowledge of the source model.
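Two parts of the modules above are simple enough to sketch in NumPy: the pairwise cosine similarity between locations in step (1) of the multimodal module, and the final concatenation in the prediction module. The sketch below is illustrative only. It skips the attention and fusion steps, all function names are hypothetical, the window is assumed to hold T days ending at day k, and the `tanh` activation is an assumption because the text does not name the activation function.

```python
import numpy as np


def cosine_similarity_matrix(Z, eps=1e-12):
    """Pairwise cosine similarity between the rows of Z, shape (N, d) -> (N, N)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.maximum(norms, eps)  # eps guards against all-zero rows
    return Zn @ Zn.T


def location_similarities(V, S, k, T):
    """Sentiment and semantic similarity between every pair of locations.

    V: (N, n_days) daily sentiment scores; S: (N, n_days, 50) semantic vectors.
    Assumption: the window covers T days ending at day k (inclusive).
    """
    V_win = V[:, k - T + 1 : k + 1]                              # (N, T)
    S_win = S[:, k - T + 1 : k + 1, :].reshape(S.shape[0], -1)   # (N, T * 50)
    return cosine_similarity_matrix(V_win), cosine_similarity_matrix(S_win)


def predict(L_v, L_s, h_rnn, h_src, theta, b_theta, activation=np.tanh):
    """Concatenate the four feature groups per location and apply one linear layer.

    L_v, L_s, h_rnn: (N, D); h_src: (N, F); theta: (3 * D + F,).
    The activation is an assumption, not taken from the paper.
    """
    features = np.concatenate([L_v, L_s, h_rnn, h_src], axis=1)  # (N, 3D + F)
    return activation(features @ theta + b_theta)                # (N,)


# Random stand-ins for real features: 3 locations, 20 days, D = 4, F = 6.
rng = np.random.default_rng(0)
V_demo = rng.uniform(-1.0, 1.0, size=(3, 20))
S_demo = rng.normal(size=(3, 20, 50))
C_v, C_s = location_similarities(V_demo, S_demo, k=19, T=9)
y_hat = predict(
    rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4)),
    rng.normal(size=(3, 6)), theta=rng.normal(size=18), b_theta=0.0,
)
print(C_v.shape, C_s.shape, y_hat.shape)
```

Flattening the T × 50 semantic window before taking the cosine means that two locations are similar only if their news agrees day by day, not merely on average over the window.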

Evaluation Metrics
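We evaluate our models using the root mean squared error (RMSE) and the Diebold-Mariano (DM) test.
• The DM test measures the difference between the predicted values from two models ($\hat{y}_{1,i}$, $\hat{y}_{2,i}$) and the corresponding observed values $y_i$. With the loss differential $d_i = \ell(y_i - \hat{y}_{1,i}) - \ell(y_i - \hat{y}_{2,i})$ for a loss function ℓ (e.g., squared error), the test statistic is
$$DM = \frac{\bar{d}}{\sqrt{\widehat{\mathrm{Var}}(\bar{d})}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i.$$
• The root mean squared error (RMSE) is the standard deviation of the residuals, which measures the difference between the predicted values $\hat{y}_i$ from a model and the corresponding observed values $y_i$:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}.$$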
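Both metrics can be computed with a few lines of NumPy. The sketch below is a minimal illustration under stated assumptions: the DM statistic uses a squared-error loss, a long-run variance built from autocovariances up to lag h − 1, and a normal approximation for the two-sided p-value. These are common textbook choices and not necessarily the exact variant used in the experiments; the function names are hypothetical.

```python
import math

import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def dm_test(y_true, pred_1, pred_2, h=1):
    """Diebold-Mariano statistic and two-sided p-value (normal approximation).

    Assumptions: squared-error loss; long-run variance from autocovariances
    of the loss differential up to lag h - 1, where h is the lead time.
    A negative statistic means model 1 has the smaller average loss.
    """
    y = np.asarray(y_true, dtype=float)
    d = (y - np.asarray(pred_1, dtype=float)) ** 2 - (y - np.asarray(pred_2, dtype=float)) ** 2
    n = d.size
    d_bar = d.mean()

    def autocov(lag):
        return np.sum((d[lag:] - d_bar) * (d[: n - lag] - d_bar)) / n

    var_d_bar = (autocov(0) + 2.0 * sum(autocov(lag) for lag in range(1, h))) / n
    if var_d_bar <= 0.0:
        raise ValueError("non-positive variance estimate; DM statistic undefined")
    stat = d_bar / math.sqrt(var_d_bar)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return float(stat), float(p_value)


# Toy comparison: model 1 makes consistently smaller errors than model 2.
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
pred_a = y_obs + np.array([0.1, -0.2, 0.1, -0.2, 0.1, -0.2])
pred_b = y_obs + np.array([1.0, -0.5, 0.8, -1.2, 0.6, -0.9])
dm_stat, dm_p = dm_test(y_obs, pred_a, pred_b, h=1)
print(rmse(y_obs, pred_a), rmse(y_obs, pred_b), dm_stat, dm_p)
```

RMSE ranks models by average error size, while the DM test asks whether the gap between two models is larger than what forecast noise alone would produce.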

Comparison Methods
We calculate the RMSE to measure the model performance of TLSS and other state-of-the-art methods and their derivative approaches, such as autoregressive-OLS (AR ♦ ), autoregressive-gradient descent (AR), autoregressive moving average-OLS (ARMA ♦ ), autoregressive moving average-gradient descent (ARMA), vector autoregressive (VAR), recurrent neural network (RNN), long- and short-term time-series network (LSTNet), and cross-location attention-based graph neural network (Cola-GNN) (see Appendix B for the description and experiment setup of these methods). In the state-level experiment, we specifically use the DM test to assess the significance of the improvement achieved by TLSS over other models. In the country-level experiment, we use the t-test to compare the performance of the various models (e.g., AR ♦ , ARMA ♦ , AR, ARMA, VAR, RNN, and LSTNet) under two conditions: with and without the exogenous variables, such as COVID-19 cases, news sentiment, and news semantic features. We split all data into training, validation, and test sets in chronological order at a ratio of 80%-10%-10%, respectively. We use validation data to avoid overfitting and to determine the number of epochs for training. To handle variables at different scales, we normalize the hospitalization rate data to a range of 0 to 1 based on the training data. We also normalize the COVID-19 cases data and pre-trained semantic data (country-level and state-level) between 0 and 1 based on their overall dataset. Pre-trained news sentiment data (country-level and state-level) are in the range of −1 to 1 to measure the negative and positive emotions.
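The chronological split and the training-based normalization can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names; it assumes time runs along the last axis and that the scaling bounds for hospitalization rates come from the training portion only.

```python
import numpy as np


def chronological_split(data, train_frac=0.8, val_frac=0.1):
    """Split along the last (time) axis without shuffling: train, validation, test."""
    n = data.shape[-1]
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        data[..., :n_train],
        data[..., n_train : n_train + n_val],
        data[..., n_train + n_val :],
    )


def fit_minmax(train):
    """Scaling bounds taken from the training portion only."""
    return float(np.min(train)), float(np.max(train))


def apply_minmax(x, lo, hi):
    """Map values to [0, 1] relative to the given bounds."""
    if hi == lo:
        raise ValueError("constant training data cannot be min-max scaled")
    return (x - lo) / (hi - lo)


# Toy series: 2 locations, 100 days of steadily rising rates.
rates = np.vstack([np.linspace(1.0, 5.0, 100), np.linspace(2.0, 8.0, 100)])
train, val, test = chronological_split(rates)
lo, hi = fit_minmax(train)
train_n, val_n, test_n = (apply_minmax(part, lo, hi) for part in (train, val, test))
print(train.shape, val.shape, test.shape)  # (2, 80) (2, 10) (2, 10)
```

Because the bounds come from the training data alone, validation and test values can fall slightly outside [0, 1] when later rates exceed anything seen in training, as happens in this toy series. That is expected and avoids leaking future information into the scaling.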

Visual Examples to Describe the Association between Hospitalization Rates and News
Figure 1 shows visual examples to describe the potential valuable relationship between hospitalization rates and news articles, which is the motivation for using news articles as an auxiliary feature in hospitalization rate forecasting.Figure 1a exhibits the extensive coverage of epidemic policies (e.g., mask-wearing) in the news media aimed to control the virus spread, when hospitalization rates continue growing during COVID-19.Notably, the hospitalization rate subsequently decreased following the publicity and implementation of relative prevention policies.It suggests the latent impact of public opinions and policies on hospitalization rates during the COVID-19 outbreaks.Figure 1b shows that holiday-related COVID-19 news was widely reported in November and December 2021, respectively, in the US.Additionally, hospitalization rates grew rapidly after holidays like Thanksgiving and Christmas.It suggests that the social factors (e.g., public sentiments reflected in news articles) may imply informative clues in explaining some unconventional trends of hospitalization rates.

Experiment Setup
All programs are implemented using Python 3.7 and PyTorch 1.12.1 in Google Colab Pro with premium GPUs (e.g., NVIDIA V100 or A100 Tensor Core GPUs).

Country-Level and State-Level Experiments
We evaluate the experiment results with different time settings (lead time = {1, 7, 14}) and historical input window sizes T = {9, 15}. The window size of 9 days is obtained from AR model order selection with the function ar_select_order in the Python package statsmodels (statsmodels.tsa.ar_model.ar_select_order). The window size of 15 days is based on the longest incubation period of COVID-19, which is 14 days [81]. For baseline approaches containing an RNN module, the dimension of hidden units is tuned from {12, 20, 32, 64}, and the number of hidden layers is tuned from {1, 2, 3}. The batch size is 32, and the initial learning rate is selected from the set {0.001, 0.005, 0.01}. All models are trained using the Adam optimizer [82] with a weight decay of $5 \times 10^{-4}$ and a dropout rate of 0.2. We set the maximum number of training epochs to 1500 and stop early if the validation loss does not decrease for 200 epochs.
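The early-stopping rule in this setup (at most 1500 epochs, stop after 200 epochs without a lower validation loss) can be sketched in plain Python. The class and function names are hypothetical, and the list of validation losses stands in for a real training loop.

```python
class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=200):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(val_losses, max_epochs=1500, patience=200):
    """Walk through precomputed validation losses as a stand-in for training.

    Returns the epoch with the best validation loss and the last epoch run.
    """
    stopper = EarlyStopping(patience)
    last_epoch = -1
    for epoch, val_loss in zip(range(max_epochs), val_losses):
        last_epoch = epoch
        if stopper.step(epoch, val_loss):
            break
    return stopper.best_epoch, last_epoch


# Small patience so the toy run stops quickly.
best_epoch, last_epoch = run_training([1.0, 0.5, 0.6, 0.7, 0.8, 0.9], patience=3)
print(best_epoch, last_epoch)  # 1 4
```

In a real run, the model weights from `best_epoch` would typically be saved and restored for evaluation on the test set.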
We pre-train the source model Cola-GNN with input window sizes T = {9, 15} and use the same lead time values as the target model TLSS (lead time = {1, 7, 14}). In the multi-scale dilated convolution module of Cola-GNN, we set the number of filters to 10 and the long-term and short-term dilation rates to 2 and 1, respectively. For the RNN module, the dimension of hidden units is tuned from {10, 20, 30}, and the number of hidden layers is tuned from {1, 2}. The other hyper-parameters are consistent with those of the baselines, and the model is trained using the Adam optimizer. We initialize the parameters of the target model TLSS using the shared parameters of the dilated convolution layers from the pre-trained source model.

Pre-Train Sentiment and Semantic Data
We collect COVID-19-related news articles at the country and state levels from Refinitiv Real-time News and GDELT via keyword filtering. The raw data undergo preprocessing, which includes removing punctuation, URLs, and numbers, as well as converting the text to lowercase. We then tokenize the data and feed it into the VADER and SBERT models for sentiment and semantic feature extraction.
To generate a sentiment score for each news article, we pre-train the data using VADER, an unsupervised sentiment analysis method. VADER calculates a sentiment score for every word, and these scores are aggregated to determine the overall sentiment of an article. For each location, we apply average pooling to the sentiment scores of the news articles published in a day and obtain a scalar value that represents the average polarity of news at the current time stamp. Country-level sentiment data have only one location (i.e., the US), and state-level sentiment data have 50 locations (i.e., the 50 US states).
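The daily average pooling of sentiment scores can be sketched as follows. This is an illustration under stated assumptions: the article-level scores are taken as already computed (for example by VADER), the function name daily_mean_sentiment is hypothetical, and the toy scores are made up.

```python
from collections import defaultdict

def daily_mean_sentiment(records):
    """Average article-level scores per (location, date).

    `records` is an iterable of (location, date, score) tuples, where each
    score is assumed to be an article-level polarity produced beforehand.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for location, date, score in records:
        sums[(location, date)] += score
        counts[(location, date)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy scores (made up for illustration only).
records = [
    ("NY", "2021-11-25", 0.5),
    ("NY", "2021-11-25", -0.1),
    ("CA", "2021-11-25", 0.3),
]
pooled = daily_mean_sentiment(records)
```

Each (location, date) pair maps to one scalar, which matches the description of one polarity value per location and time stamp.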
The semantic analysis dynamically models public opinions and policies in different locations during COVID-19 from news data. We use the pre-trained SBERT model paraphrase-MiniLM-L6-v2 to generate an embedding vector for each news article with a maximum input length of 500 tokens and an output size of 768. We then apply Principal Component Analysis (PCA) to the resulting embeddings and reduce the output size to 50 dimensions. For each location i, we apply average pooling to the semantic vectors of the news articles on day t and obtain a vector S_{i,t} ∈ R^{1×50} that represents the average semantic feature at the current time stamp. Country-level semantic data have only one location (i.e., the US), and state-level semantic data have 50 locations (i.e., the 50 US states).
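The reduction and pooling steps can be sketched with NumPy alone. This is a sketch under assumptions, not the authors' pipeline: random vectors stand in for the SBERT article embeddings, PCA is computed through an SVD of the centered matrix, and the helper names pca_reduce and pool_by_key are hypothetical.

```python
import numpy as np

def pca_reduce(embeddings, n_components):
    """Project row vectors onto their top principal components via SVD."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def pool_by_key(vectors, keys):
    """Average the vectors that share the same (location, day) key."""
    pooled = {}
    for key in sorted(set(keys)):
        rows = [i for i, k in enumerate(keys) if k == key]
        pooled[key] = vectors[rows].mean(axis=0, keepdims=True)
    return pooled

rng = np.random.default_rng(0)
# Stand-in for article embeddings: 200 articles x 768 dimensions (random).
article_vecs = rng.normal(size=(200, 768))
# Toy (location, day) labels: odd articles go to NY, even articles to CA.
keys = [("NY", d % 4) if d % 2 else ("CA", d % 4) for d in range(200)]
reduced = pca_reduce(article_vecs, n_components=50)
S = pool_by_key(reduced, keys)
```

Each entry of S is a 1 × 50 row vector, the shape given for S_{i,t} in the text.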

Ablation Test
We perform ablation tests in hospitalization rate forecasting to evaluate the contribution of each module in TLSS:
• TLSS w/o transfer learning: Exclude the transfer learning module in TLSS and conduct training on the hospitalization rate data without utilizing knowledge learned from existing epidemics, such as the flu.
• TLSS w/o sentiment analysis: Exclude the sentiment analysis module in TLSS and ignore the sentiment information in news data.
• TLSS w/o semantic analysis: Exclude the semantic analysis module in TLSS and ignore the semantic information in news data.

Non-Linear Correlation Test Result
We implement three non-linear correlation tests (White test, Granger causality test, and Brownian distance correlation test) to detect the association between COVID-19 cases and hospitalization rates in country-level and state-level data, respectively. In the Granger causality test, the input window size (lag) is fifteen days, and the lead time is one day. In Table 3, we observe that the p-values [83] of the White test, Granger causality test, and Brownian distance correlation test are remarkably close to zero. This shows an extremely significant relationship between COVID-19 cases and the corresponding hospitalization rates at the country level. Based on this significant correlation, we learn and transfer knowledge from rich epidemic infection data to hospitalization rate forecasting, thus overcoming the challenge of homogeneous data insufficiency, such as sparse historical hospitalization records. It effectively increases the flexibility of the heterogeneous transfer learning architecture within TLSS.
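The idea behind the Granger causality test can be sketched with ordinary least squares in NumPy. The sketch compares a restricted autoregression of the target on its own lags with an unrestricted one that also includes lags of the candidate cause, and returns the F-statistic. It is an illustration on synthetic data, not the authors' code: the function names are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
import numpy as np

def lagged(v, lags):
    """Columns are v lagged by 1..lags, aligned with v[lags:]."""
    n = len(v)
    return np.column_stack([v[lags - k:n - k] for k in range(1, lags + 1)])

def granger_f_stat(x, y, lags):
    """F-statistic for 'lags of x help predict y beyond lags of y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, lags)])
    unrestricted = np.hstack([restricted, lagged(x, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic example: "hosp" follows "cases" with a one-day delay.
rng = np.random.default_rng(0)
cases = rng.normal(size=300)
noise = rng.normal(size=300)
hosp = np.empty(300)
hosp[0] = noise[0]
hosp[1:] = 0.8 * cases[:-1] + 0.1 * noise[1:]
f_cases_to_hosp = granger_f_stat(cases, hosp, lags=2)
f_hosp_to_cases = granger_f_stat(hosp, cases, lags=2)
```

In this synthetic setup the statistic is large in the cases-to-hospitalization direction and small in the reverse direction, which is the asymmetry the test is designed to detect.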
Considering the spatial variation, we apply non-linear correlation tests at the state level to explore the relationship between COVID-19 cases and hospitalization rates in 50 US states. We adopt distinct p-value thresholds (0.01, 0.05, and 0.1) to indicate the significance levels (highly significant, moderately significant, and weakly significant), corresponding to the 99%, 95%, and 90% confidence levels [84]. In Table 4, we count the number of states that show a significant correlation between COVID-19 cases and hospitalization rates. In the White test, the correlation between COVID-19 cases and hospitalization rates is significant in all 50 states, but the states AK, HI, and KS show relatively weaker significance compared with other locations. In the Granger causality test, the input window size (lag) is fifteen days, and the lead time is one day. The correlation is strongly significant in forty-three states, moderately significant in four states (GA, HI, ID, and WY), weakly significant in two states (ME and NM), and non-significant in the state KS. The distance correlation test also exhibits a significant correlation between COVID-19 cases and hospitalization rates in all 50 states. Table 5 indicates that the Brownian distance correlation is high in 24 states (i.e., correlation larger than 0.7), moderate in 23 states (i.e., correlation larger than 0.5 and less than 0.7), and relatively weak in 3 states (AK, HI, and KS) (i.e., correlation less than 0.5). Overall, we demonstrate the significant relationship between COVID-19 cases and hospitalization rates in more than 90% of US states (see Appendix C for detailed state-level experiment results).
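The Brownian distance correlation used in these thresholds can be sketched in NumPy from its standard sample definition (double-centered pairwise distance matrices). This is a minimal illustration on synthetic data, with a hypothetical function name; it returns only the correlation value, and a significance test (for example a permutation test) would have to be added separately.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample (Brownian) distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    def centered_distances(v):
        d = np.abs(v - v.T)  # pairwise absolute differences
        return (d - d.mean(axis=0, keepdims=True)
                - d.mean(axis=1, keepdims=True) + d.mean())

    a, b = centered_distances(x), centered_distances(y)
    dcov2 = (a * b).mean()
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y)))

# Synthetic example: an exactly linear relationship.
cases = np.linspace(0.0, 10.0, 50)
hosp_linear = 2.0 * cases + 3.0
dcor_linear = distance_correlation(cases, hosp_linear)
dcor_constant = distance_correlation(cases, np.ones(50))
```

The value lies between 0 and 1 and equals 1 for an exact linear relationship, so thresholds such as 0.5 and 0.7 can be read as cut-offs on dependence strength.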

Country-Level Experiment
In Table 6, we compare the RMSE of forecasting models with different exogenous variables in the country-level experiments. We train several state-of-the-art models on hospitalization rate data with an input window of fifteen lagged days and a lead time of one day. The large difference in RMSE values between the traditional AR♦ and ARMA♦ models and the other models arises from their different estimation methods. Traditional AR♦ and ARMA♦ use ordinary least squares (OLS), while the other models use gradient descent to estimate the unknown parameters.
Most methods exhibit relatively good performance in capturing temporal patterns due to the small information gap between the history window and the predicted time. When we only use the historical data of hospitalization rates in time series prediction, RNN achieves the best prediction result. When we add the exogenous variable COVID-19 cases to the model, the forecasting performance improves in most cases. The decreased prediction ability of RNN and LSTNet suggests that input complexity affects deep learning model performance in time series forecasting. When we add more exogenous variables, such as news sentiment and semantic features, to the model, most approaches improve further.
In Table 7, we apply Student's t-test (t-test) [85] to assess the difference in model performance when adding exogenous variables (e.g., COVID-19 cases, news sentiment feature, and news semantic feature). Although some methods show improved RMSE with exogenous variables, the improvements are not significant. However, this motivates further work on capturing temporal dependencies in multimodal data, especially complex text data. Moreover, we expand our analysis to the state level to consider the impact of geolocation.
Algorithms listed are autoregressive-OLS (AR♦), autoregressive-gradient descent (AR), autoregressive moving average-OLS (ARMA♦), autoregressive moving average-gradient descent (ARMA), vector autoregressive (VAR), recurrent neural network (RNN), and long- and short-term time-series network (LSTNet).

State-Level Experiment
In this section, we evaluate the model performance and explore the contribution of exogenous variables at the state level (Table 8). We compare the prediction accuracy and significance level of TLSS against other baseline models based on RMSE and the DM-test, with input windows of 9 and 15 lagged days and lead times of 1, 7, and 14 days. The large difference in RMSE values between AR♦, ARMA♦, and the other models arises from their different estimation methods. In Table 8, we observe that the performance of some models improves when the exogenous variable COVID-19 cases is added. Most baseline methods perform worse when news features are added. This suggests that the temporal dependencies of complex text data are hard to capture across multiple locations. VAR and LSTNet are sensitive to input complexity and lead time, which highlights the challenges of long-term forecasting with multimodal data. Cola-GNN shows significant improvement in hospitalization rate forecasting, demonstrating the importance of geolocation correlation at the population level. TLSS outperforms most models with stable and strong forecasting performance, showing its capacity to capture temporal dependencies from complex text data. When the lead time is one day, most methods achieve comparatively good performance in capturing temporal patterns without exogenous variables due to the small information gap between the history window and the predicted time. When the lead time is seven days, statistical models (AR, ARMA, and VAR) perform worse than deep learning methods (RNN, Cola-GNN, and TLSS); the decline is largest for VAR, which has the largest number of model parameters. This suggests that model complexity influences time series forecasting as the lead time grows, particularly with limited input. When the lead time is fourteen days, Cola-GNN is competitive with TLSS because it was originally designed for long-term epidemic prediction. It also indicates that the news impact may wane over time.
We apply the DM-test to compare the improvement and accuracy of TLSS against other baselines. In most cases, TLSS presents a statistically significant improvement in hospitalization rate prediction with limited data during the COVID-19 pandemic. It demonstrates that efficient information extraction and application from news data can significantly improve the model's accuracy. News can serve as a useful supplementary feature, especially for prediction with a lead time of less than fourteen days.
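A Diebold-Mariano (DM) comparison of two forecasts can be sketched as follows. The text does not state which loss function or variance estimator is used, so this sketch assumes squared-error loss, a rectangular-kernel long-run variance with horizon − 1 autocovariances, and a two-sided normal approximation. The function name dm_test and the synthetic forecasts are illustrative only.

```python
import math
import numpy as np

def dm_test(actual, pred_a, pred_b, horizon=1):
    """DM statistic and two-sided p-value under squared-error loss.

    A positive statistic means forecast A has the larger average loss.
    """
    actual, pred_a, pred_b = (
        np.asarray(v, dtype=float) for v in (actual, pred_a, pred_b)
    )
    # Loss differential between the two forecasts at each time step.
    d = (actual - pred_a) ** 2 - (actual - pred_b) ** 2
    n = len(d)
    centered = d - d.mean()
    # Long-run variance: lag-0 variance plus autocovariances up to horizon - 1.
    lrv = centered @ centered / n
    for lag in range(1, horizon):
        lrv += 2.0 * (centered[lag:] @ centered[:-lag]) / n
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d.mean() / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal
    return stat, p_value

# Synthetic forecasts: one noisy, one close to the truth.
rng = np.random.default_rng(0)
actual = rng.normal(size=200)
pred_weak = actual + rng.normal(scale=2.0, size=200)
pred_strong = actual + rng.normal(scale=0.5, size=200)
stat, p_value = dm_test(actual, pred_weak, pred_strong)
```

Swapping the two forecasts flips the sign of the statistic and leaves the p-value unchanged, so the order of the arguments only determines which model a positive value favors.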
Overall, TLSS outperforms all baseline methods in most situations. Directly incorporating exogenous variables, such as COVID-19 cases and news articles, does not automatically improve the performance of most models. It suggests that temporal dependencies are hard to capture accurately in multimodal data. Enhancing the capability of AI technical support in healthcare sustainability remains a persistent challenge.

Ablation Test
We perform ablation tests in hospitalization rate forecasting to evaluate the performance of each module within TLSS. Table 9 shows that TLSS achieves the best performance for forecasting hospitalization rates during COVID-19 with lead times of 1, 7, and 14 days. Additionally, we observe that TLSS shows significant improvement from the transfer learning architecture with lead times of 7 and 14 days. This is due to its ability to capture the spread patterns of existing diseases and transfer the learned knowledge to the target model for hospitalization rate prediction during the initial phase of an emerging disease outbreak. The higher accuracy at longer lead times also points to a possible delayed impact of infection cases on hospital admissions. For example, patients with confirmed COVID-19 may be hospitalized a few days later (e.g., after three days) when their condition worsens. Models that involve news sentiment and semantic analysis have better results with lead times of 7 and 14 days due to the latency effect of news opinions and attitudes on epidemic transmission and hospital resources. In hospitalization rate prediction with a one-day lead time, the news sentiment feature has a more significant influence than news semantic information. This finding indicates that public emotion plays a crucial role in epidemic prevention efforts. The ablation test results demonstrate the necessity of each model module and of the inclusion of exogenous variables. By considering the impact of social factors and learning the general characteristics of existing pandemics, TLSS achieves accurate prediction of hospitalization rates.

Healthcare Sustainability During an Emerging Pandemic
Healthcare systems are confronted with sustainability challenges while constantly improving their quality level and reducing unnecessary waste, especially during the outbreak of emerging diseases (e.g., COVID-19) [86][87][88]. An effective and accurate forecasting tool is of prime importance in understanding the expected volume of patients and thereby responding rapidly to pandemics [15]. In this paper, we apply the state-of-the-art method TLSS to a new application scenario: forecasting hospitalization rates. We also incorporate information on existing infectious diseases, news sentiment, and semantic information as exogenous features to model the dynamic propagation of new pandemics. Our proposed analytical framework outperforms other baseline models, especially with longer lead times, suggesting that existing methods struggle to capture accurate temporal dependencies for relatively long-term forecasting.
These accurate forecasting results can support healthcare institutions in making informed decisions toward sustainable finance management during pandemics. For example, institutions can avoid over-investing in unnecessary resources or under-investing in high-demand resources. Moreover, by anticipating future demands, health systems can enhance their flexibility in managing human and medical resources, such as constructing temporary hospital facilities to provide additional hospital beds during the peak period of the COVID-19 outbreak. Accurate hospitalization rate forecasting can also aid in planning and coordinating patient care, thus improving the service quality. For instance, it can improve patient outcomes, including shorter hospital stays, reduced readmissions, and fewer complications. In summary, hospitalization rate forecasting is crucial for health institutions to enhance overall healthcare sustainability over the long term.

Strength and Applicability of TLSS
More researchers have gradually recognized the critical impact of relevant historical knowledge on an emerging forecasting task, as discussed in Appendix A. They demonstrated that transfer learning architectures outperform traditional isolated machine learning approaches in many cases [89,90]. However, most of them use a homogeneous transfer learning approach, such as utilizing the knowledge of existing diseases (e.g., flu) to forecast a new epidemic (e.g., COVID-19). In this work, we successfully learn and transfer heterogeneous information from infection case data to the task of hospitalization rate forecasting based on the demonstrated non-linear correlation between them. It addresses the scarcity of historical data of the same type. For example, it is hard to learn knowledge from sparse hospitalization records of an existing disease (e.g., flu) and transfer it to predict hospitalization rates of an emerging disease (e.g., COVID-19). We also further demonstrate the contribution of the transfer learning module in the ablation test, particularly with lead times of 7 and 14 days. It implies that the learned experience from existing diseases significantly benefits relatively long-term hospitalization rate prediction. Additionally, the novel application of TLSS inspires us to explore other factors, such as mortality and chronic diseases, which may be highly correlated with our target task and can provide critical clues. We will examine the possibility of learning knowledge from multiple correlated factors and transferring it to the target task.
The COVID-19 pandemic has provided recent evidence that institutions incorporating environmental, social, and governance (ESG) factors have a competitive edge in long-term development [91]. It has driven multiple works focused on assessing hospital performance from an ESG perspective [92][93][94]. However, few papers value or quantitatively analyze the impacts of ESG components on healthcare sustainability, such as how social factors affect hospitalization rates. TLSS adopts a multimodal data learning module to capture the impact of news sentiment and semantic features on hospitalization rates over time. We successfully extract the relevant information and capture the temporal dependencies from the complex text data (i.e., news articles), thereby improving the accuracy in predicting hospitalization rates. It is beneficial for health systems to address sustainability challenges considering the impact of news opinions and emotions on human behavior. For instance, news and public policies can advise patients with mild illnesses to isolate at home, thus reducing the pressure of hospitalization during COVID-19. Furthermore, news sentiment exhibits a more significant influence than semantic information in hospitalization rate prediction in the ablation test. It suggests that public emotions and situations may affect human awareness and behavior more than public policies. Overall, our work further optimizes AI technical support by leveraging public opinions and concerns for strategizing sustainable development in healthcare.

Conclusions
We introduce healthcare sustainability challenges during an emerging pandemic (i.e., COVID-19) and propose an analytical framework using machine learning methods to address the issues from a data science perspective. In this paper, we utilize non-linear correlation tests (e.g., White test, Granger causality test, and Brownian distance correlation test) to demonstrate the significant relationship between infection cases and hospitalization rates during COVID-19. We then apply TLSS to a novel application scenario: hospitalization rate forecasting during COVID-19. According to the demonstrated impact of infection cases on hospitalization rates, we learn the general characteristics from rich historical infection records of existing diseases (i.e., flu), instead of from the sparse data of hospitalization rates directly. The learned knowledge is then transferred to a target model for hospitalization rate forecasting during the new epidemic (i.e., COVID-19). The heterogeneous transfer learning architecture within TLSS tackles the challenges of limited input data during the initial phase of an emerging epidemic and the homogeneous data scarcity of existing diseases. For instance, the hospitalization data are sparse both in the early stages of COVID-19 and in flu history records. We successfully incorporate social variables, such as news sentiment and semantic analysis, to forecast hospitalization rates during the spread of pandemics. We adopt a location-aware attention mechanism to capture the dynamic correlation between news text data and time series numerical data in multiple locations over time. We evaluate TLSS performance during COVID-19 propagation and demonstrate its effectiveness and accuracy in predicting future hospitalization rates across various lead times.
We provide valuable AI technical support for healthcare sustainability, facilitating urgent responses to crises when new epidemics break out. We optimize the early-stage forecasting of hospitalization rates during emerging epidemics to better understand the expected volume of patients. By accurately predicting future hospitalization rates, health institutions can effectively reduce their costs by avoiding over- or under-investing. Additionally, they can adjust staff and medical resource allocation to improve the service quality of patient care.
In the future, we intend to refine our AI analytics framework and improve its flexibility, adapting it to various scenarios by continuously updating it and purposefully utilizing appropriate techniques. We propose to provide comprehensive, readable, and visualized forecasting results, enabling broad application in healthcare sustainability that responds to the needs of both health providers and patients. Additionally, we aim to extend the exploration of ESG factors, including governance and climate risks, which may also have specific impacts on hospitalization rates or other critical health-related indicators. Such further research will provide a valuable and comprehensive analysis that supports healthcare sustainability in long-term development.
prediction. We apply it in hospitalization rate prediction during COVID-19 in 50 US states. It is the source model of TLSS.

Appendix C. State-Level Non-Linear Correlation Test Results for Hospitalization Rates and Infection Cases during COVID-19
and b_u ∈ R are trainable parameters. (3) Adopt an element gate to combine the sentiment/semantic cosine similarity matrix C^v_k / C^s_k and the attention coefficient matrix A:

Figure 1. Examples of the impact of news articles on hospitalization rates during COVID-19. (a) US: Normalized daily hospitalization count and normalized occurrence of keywords (e.g., mask) in news articles. (b) US: Normalized daily hospitalization count and normalized occurrence of keywords (e.g., holiday-related keywords) in news articles.
N×T hospitalization rates for N locations of window size T

Granger Causality Test
Granger causality is a statistical hypothesis test that evaluates the causal relationship between multiple time series. It is widely used in economics, financial econometrics, and business [73,74]. Given the time series variables X and Y, X_[k−T:k] Granger-causes Y_k when X_[k−T:k] happens prior to its effect, and X_[k−T:k] has unique information about the prediction of the future value Y_k, where k is the current time stamp and T is the historical window size. The Granger causality is typically calculated with the following bivariate linear autoregressive model:

3.3. Data Description
3.3.1. Dataset Description
We used the following datasets from 1 August 2020 to 30 September 2022. Please refer to Table 2 for more data descriptions.
• Hospitalization rate data are collected from the CDC COVID-19 Reported Patient Impact and Hospital Capacity by State Time Series [78]. It comprises the daily count of newly admitted patients with confirmed COVID-19 (new admission counts) in 50 US states. In the country-level experiment, we aggregate hospitalization rates across locations.
• COVID-19 Cases Data are collected from CDC US-COVID-19-Cases [8]. It comprises the daily count of newly confirmed COVID-19 cases (new patient counts) in 50 US states. In the country-level experiment, we aggregate COVID-19 cases across locations.
• Country-level COVID-19 Original News Data are collected from Refinitiv Real-time News [79], which comprises news articles related to COVID-19 in the United States.
• State-level COVID-19 Original News Data are collected from the Global Database of Events, Language, and Tone (GDELT) [80], which comprises news articles related to COVID-19 in 50 US states.
• Pre-trained News Sentiment Data (Country-level and State-level) are the sentiment-related features extracted from each news article. We pre-train the original country-level and state-level news data using VADER.
• Pre-trained News Semantic Data (Country-level and State-level) are the semantic-related features extracted from each news article. We pre-train the original country-level and state-level news data using SBERT.

Table 2. Data description. Size means the number of dates multiplied by the daily data dimension. The size of original news is the total number of news articles in the datasets.

Table 3. Country-level non-linear correlation test results for hospitalization rates and infection cases during COVID-19.

Table 4. Number of states with non-linear correlation test results for hospitalization rates and infection cases during COVID-19 with different statistical significance levels. p < 0.X represents p-values at different significance levels.

Table 5. Number of states with different Brownian distance correlations between hospitalization rates and COVID-19 infection cases.

Table 6. Country-level Experiment: RMSE performance of different methods on hospitalization rate data and exogenous variables (COVID-19 cases, news sentiment feature, and news semantic feature) using an input window of fifteen lagged days and a lead time of one day. Boldface indicates the best result of each column.

Table 7. The t-statistic of RMSE differences between hospitalization rate forecasting with and without exogenous variables (COVID-19 cases, news sentiment feature, and news semantic feature) at the country level.

Table 8. State-level Experiment: RMSE performance of different methods on hospitalization rate data and exogenous variables (COVID-19 cases, news sentiment feature, and news semantic feature) with input windows of 9 and 15 lagged days and lead times of 1, 7, and 14 days. We use the DM-test to compare the performance of TLSS against other models. Boldface and underlined indicate the best and the second-best result of each column.

Table 9. Ablation test result of TLSS with an input window of 15 lagged days and lead times of 1, 7, and 14 days. We use RMSE and the DM-test to compare TLSS with partial versions of this model to evaluate the contribution of each component within it. p-values (*, **, *** indicate statistical significance at p < 0.10, p < 0.05, and p < 0.01).

Table A1. State-level non-linear correlation test results for hospitalization rates and infection cases during COVID-19. Boldface in the White test results indicates a larger p-value compared with other states. Underlining of distance correlations indicates a smaller correlation compared with other states.