Path Analysis of Sea-Level Rise and Its Impact

: Global sea-level rise has been drawing increasingly greater attention in recent years, as it directly impacts the livelihood and sustainable development of humankind. Our research focuses on identifying causal factors and pathways on sea level changes (both global and regional) and subsequently predicting the magnitude of such changes. To this end, we have designed a novel analysis pipeline including three sequential steps: (1) a dynamic structural equation model (dSEM) to identify pathways between the global mean sea level (GMSL) and various predictors, (2) a vector autoregression model (VAR) to quantify the GMSL changes due to the signiﬁcant relations identiﬁed in the ﬁrst step, and (3) a generalized additive model (GAM) to model the relationship between regional sea level and GMSL. Historical records of GMSL and other variables from 1992 to 2020 were used to calibrate the analysis pipeline. Our results indicate that greenhouse gases, water, and air temperatures, change in Antarctic and Greenland Ice Sheet mass, sea ice, and historical sea level all play a signiﬁcant role in future sea-level rise. The resulting 95% upper bound of the sea-level projections was combined with a threshold for extreme ﬂooding to map out the extent of sea-level rise in coastal communities using a digital coastal tracker.


Introduction
Since the industrial age, global mean sea-level (GMSL) rise has become a persistent trend that has greatly accelerated within the last two and a half decades [1]. This is largely due to thermal expansion, or increased temperature leading to a greater volume of water per unit mass, and the melting of glaciers and land-based ice caps [2]. The IPCC reports that beyond 2050, anthropogenic activities such as increased greenhouse emissions could result in further climate warming and a substantial increase in GMSL [3]. Additionally, high tide floods are becoming more frequent and reaching farther inland [4]. Many large cities located along coastal regions are likely to suffer drastic socioeconomic impacts from sea-level rise and high tide flooding. Hundreds of coastal communities, home to over 634 million people [5], are likely to face increased flooding that would devastate homes, lives, and communities, as seen in recent major flooding occurrences in Miami, New York City, and New Orleans (Figure 1).
The inherent relationship between GMSL and predictors such as glaciers, ocean circulation, and thermal expansion is not fully understood. Several recent studies were proposed in an attempt to simplify the relationship [6,7]. These approaches are able to recognize a trend or inherent relationship, but an estimate of the underlying causal pathways for long-term projection of GMSL is not given. Some researchers made predictions within the next 100 years [8][9][10]. Based on these studies, the GMSL rise by 2100 is estimated to be in the range between 60 to 130 cm. However, such models are either overly simplified, lacking the incorporation of essential causal relations, or overly complicated, featuring sophisticated Stats 2022, 5 13 atmospheric and climate models that can only be shown as a black box. This is a significant gap between existing analyses and public comprehension-and hence public actions. "Seeing is believing"-if we, as unbiased citizen scientists, could utilize available public data, design a comprehensive analysis pipeline to uncover the underlying causal relations and predict GMSL on our own, we would then be more convinced of the impending danger and thus take up actions to control for the causal links, such as the man-made greenhouse gas emission, to help curb this chain reaction. To this end, we have proposed a three-step statistical analysis pipeline ( Figure 2) to model and predict the global mean sea level. Firstly, a dynamic structural equation model (dSEM) is used to establish a pathway model to evaluate the effects of greenhouse gases, increasing air and water temperatures, decreasing Antarctic and Greenland Ice Sheet size, and decreasing sea ice. This pathway can help gauge the underlying pathway towards sea-level rise and help confirm whether the increase in greenhouse gases was responsible for the chain reaction that resulted in increased air temperature, which thus increases water temperature, the melting of ice sheets and ice caps, which then contributes to sea-level rise. In addition, we aim to analyze the relationship between sea ice, which has no direct contribution to sea-level rise, and the various predictors in our model. In our second procedure, we used a vector autoregression model (VAR) with more lags to more accurately forecast global mean sea-level rise based on direct causal factors identified through the SEM analysis. In our third procedure, we used a generalized additive model (GAM) in order to link global mean sea level to regional coastal areas in the United States and forecast regional sea level changes. Lastly, our predictions were used to analyze the extent of extreme coastal flooding by 2050 and 2100 using tidal gauge data in coastal urban areas such as New York City, Miami, and New Orleans. (SEM) to identify pathways between the global mean sea level (GMSL) and various predictors, (2) a Vector Autoregression Model (VAR) to quantify the GMSL changes due to the significant relations identified in the first step, and (3) a Generalized Additive Model (GAM) to model the relationship between regional sea level (RSL) and GMSL.

Data
All data were recorded in monthly averages and available in publicly accessible repositories. To gather data on sea-level rise, we used NASA's sea-level data, which were generated through multi-mission ocean altimetry [11]. We gathered the data for the monthly average sea ice extent in the Northern Hemisphere, mainly composed of Arctic sea ice, through the NSIDC (National Snow & Ice Data Center) [12]. The Greenland and Antarctic Ice Sheet mass losses were accessed through the IMBIE (Ice sheet Mass Balance Inter-comparison Exercise) datasets [13] and NASA [14]. The data for the average monthly air and water (sea surface) temperatures were accessed through Berkeley Earth [15]. The atmospheric CO 2 data were taken from the Mauna Loa Observatory, as part of the Scripps CO 2 program [16]. The CH 4 and N 2 O data were taken from 2degreesinstitute [17]. Lastly, the local tidal gauge stations for coastal areas in New York, Miami, and New Orleans were accessed through the NOAA Center for Operational Oceanographic Products and Services [18].
Prior to analysis, all monthly data were combined to range from 1992 to 2020. The Water Temperature and Air Temperature represent the global average temperature. They are calculated by adding the corresponding global average water/air temperature anomalies and their baselines, which are the monthly means of the global average water/air temperature. Data for the Antarctic and Greenland Ice Sheet mass loss ranged from 1992 to 2002 were taken from NSIDC [13], and those ranging from 2002 to 2020 were from NASA [14] after removing the seasonal effect.

Step 1: The Dynamic SEM Model
Structural equation modeling (SEM) is a multivariate technique [19] towards the confirmatory analysis of causal associations among a set of variables [20,21]. Even so, SEM is flexible enough to incorporate exploratory analyses of data as unsure pathways can be kept in the hypothesis to be validated by the analysis. SEM assumes that proposed relations among variables can be represented in a set of structural regression equations and that these relations can be represented pictorially in the form of a path diagram [19,22].
The unified (or dynamic) structural equation model [22], also known as VAR-SEM model or MAR-SEM model, can be used to analyze the proposed causal associations among a set of time-series variables by incorporating both the contemporaneous relations as well as the longitudinal relations represented by the VAR model. Specifically, longitudinal temporal relations are defined as relationships between regions involving different time points, while contemporaneous relations reflect relationships between regions at the same time point. Let y j (t) be the j th measured at time t, j = 1, 2, . . . , m. The m-dimensional multivariate autoregressive process of order p (VAR (p)) with an added component of contemporaneous relations can be written as: Here y(t) = [y 1 (t), y 2 (t), . . . , y n (t)] is the (n × 1) vector of observed variables measured at time t, (t) is the (n × 1) vector white noise with zero-mean, A is the parameter matrix of the contemporaneous relations, and φ i , i = 1, 2, ..., p is a series of (n × n) parameter matrices representing the longitudinal relations.
There is no latent variable in our VAR(1)-SEM model. In our model, there are six observed variables in total, which are GMSL, Sum (glacier) Mass, Sea Ice, Water Temperature, Air Temperature, and Greenhouse Gas. The definition of the variables is shown in Table 1.
As it is a VAR(1)-SEM model, each variable has two states in the model, at time t and time t − 1. Thus, there are 6 × 2 = 12 variables in the VAR(1)-SEM totally. If we use the original data Sum Mass to estimate the parameters in the VAR-SEM model, the model will fail to converge, as the value of Sum Mass is 1000 times larger than others. Thus, we take the natural log on the Sum Mass (cumulative glacier mass change) plus 8000 towards the VAR(1)-SEM. Table 1. Definition of global variables used in the proposed analysis pipeline. The only variable not included are regional sea levels measured by various regional tidal gauge stations.

Variable Explanation
Air Temperature Global average air temperature GMSL Global mean sea-level variation Sum Mass Summation of Greenland and Antarctic cumulative glacier mass change Greenhouse Gas The amount of CO 2 , N 2 O and CH 4 in the air (parts per million) Water Temperature Global average water temperature Sea Ice Northern hemisphere sea ice extent Based on extensive literature review, we proposed a climate model as follows. First, increasing Greenhouse Gas would lead to an increase in Air Temperature. The increase in Air Temperature would correlate to rising Water Temperature. Increase in Air/Water Temperature would then lead to the loss of Sea Ice and Sum Mass (glacier mass). The loss of glacier mass would in turn increase the Global Mean Sea Level.
Based on the above induction and the model parsimony principle, our hypothesis for the dynamic SEM was generated with the longitudinal relations only involving the lag one pathways, that is, a VAR(1)-SEM model, as shown in Figure 3.  Table 1 for more variable definitions.

Step 2: A More Refined Vector Autoregression Model
Let y(t) = [y 1 (t), y 2 (t), . . . , y n (t)] denote an (n × 1) vector of time series variables. The basic p-lag vector autoregressive (VAR(p)) model [20] has the following form: where φ i are (n × n) coefficient matrices, c is an (n × 1) constant matrix, and (t) is an (n × 1) white noise with a mean of 0. The coefficient estimation can be obtained by using ordinary least squares (OLS) per equation. When using a VAR(p) model, lag length selection can be an essential part as adding unrelated variables may reduce the forecast accuracy by increasing the estimation error, while insufficient lagged variables are not able to model the internal relationship. The general approach is to fit VAR(p) models with orders p = 0, 1, . . . , p max and choose the value of p that minimizes some model selection criteria such as the widespread Bayesian information criterion (BIC): tˆ t is the estimate of the covariance matrix of the VAR error. For our analysis pipeline, we select significant direct predictors for GMSL based on the dynamic SEM results, where three predictors were selected, and BIC analysis shows that a lag 3 VAR model is the optimal choice, as shown in Figure 4. This further validated our pipeline of using a more detailed VAR(3) model as the second link, for more accurate global sea-level rise prediction, rather than using the less refined VAR(1)-SEM from the first step. To validate the VAR(3) model, we used both the walk-forward (one-step ahead) method and the train-test split method (80:20 train/test split ratio). For the walk-forward method, a moving forecast scenario is used. For the train-test split method, we make predictions for all the time steps in the test data and compare the predictions to the actual observations. We utilize the R package tsDyn [23] for the walk-forward validation and package vars [24] for train-test split validation. We also use a univariate ARIMA model as the baseline model to compare the effectiveness of our model.

STEP 3: The Generalized Additive Model
To predict the magnitude of sea-level rise for each coastal region of interest, we linked the global mean sea level (GMSL) to the regional sea level (RSL) using a generalized additive model (GAM). A GAM is a type of generalized linear model (GLM) in which the linear predictor is defined as the sum of the smooth functions, which may be formed as linear combinations of basis functions of the predictors plus a conventional parametric component of the linear predictor. It extends linear and generalized linear models by including smooth functions of predictors. The equation form of a GAM can be written as where g(.) is the link function, y is the response variable, X is the model matrix with n rows and p predictors (plus an intercept column), Z is an n by d matrix containing the basis functions of some subset of X, where d is of an arbitrary length, and β and γ are vectors of model coefficients.
The main feature that distinguishes the GAM from other models is that each predictor variable is assigned its smoothing function and these functions are added together to create a model [25]. Each function is also nonparametric, meaning that they are shaped by the data alone rather than parameters of previously defined distributions. Despite this, these functions can be both linear and nonlinear. This allows GAM to be more flexible than other models, preventing the data from being overfitted. When constructing the GAM, we used a simple linear regression model as a baseline and compared its performance to GAM. We conducted the GAM analysis with the assistance of R package "mgcv" [26]. In the mgcv implementation, GAM's penalized regression splines are used to define smoothing functions, where the model's (negative log) likelihood is modified by penalizing each smoothing function. The basis functions employed in these splines are typically designed to be as efficient as possible, given the many basis functions utilized. The GAM in mgcv tackles the smoothing parameter estimation problem using the generalized cross-validation (GCV) criterion. The GAM penalized likelihood maximization problem may be efficiently addressed using penalized iterative reweighted least squares (P-IRLS). Wood (2011 and documented detailed information about the underlying fitting methods [27,28]. The pathways for the GAM is shown in Figure 5.

Results
We performed the data analysis following the proposed three-step pipeline. The confirmed pathways and factors from the Step 1 dynamic structural equation model in the form of VAR(1)-SEM were subsequently used to guide the construction of the more refined vector autoregressive model in Step 2. Model selection with BIC has subsequently recommended a VAR(3) model. This model was fitted via existing data and subsequently used to predict the future global mean sea level. Finally, in Step 3, we applied the generalized additive model to link the global sea level to regional sea level. Predicted regional sea-level changes for three coastal regions (New York City, Miami, and New Orleans) were subsequently depicted using a digital mapping tool.

Dynamic SEM Results
After the data transformation, the dynamic VAR(1)-SEM can be analyzed by the usual SEM software as we have shown previously [21]. In this work, the SEM software adopted is the Lavaan package in R. Confirmed pathways are depicted in Figure 6 below, with the estimated path coefficients and other statistics tabulated in Table 2. In summary, we found that nearly all the contemporaneous pathways were significant except for the pathway between Sea Ice(t) and Sum Mass(t). The data validated a significant pathway from Greenhouse Gas increase to Air Temperature increase, to Water Temperature increase, to Sea Ice Extent, to Sum Mass (glaciers cumulative mass loss), and subsequently, to Global Mean Sea-level rise. Table 2 shows the estimates of the parameters in the VAR(1)-SEM model. The RMSEA of this model is 0.400. The Mean in this table is the mean slope across the multiple imputation samples. After we drop the non-significant pathway at the significance level of 0.05 (2-sided), the resulting validated VAR (1)-SEM model is represented by the following

GMSL Projection with VAR(3)
Using significant variables and pathways confirmed by the dynamic SEM analysis, we have subsequently selected monthly Average Sea Ice Extent, Water Temperature, the sum of Antarctic and Greenland Ice Sheet mass, in addition to GMSL, for the ensuing refined VAR analysis. Although Air Temperature is also significant, it is highly correlated to Water Temperature, and was thus discarded to avoid multicollinearity. The optimal number of lags is determined by using the Bayesian Information Criterion (BIC). This analysis resulted in the optimal number of lags being 3, and thus the VAR(3) model is fitted subsequently. The Multiple R 2 for GMSL is 0.997. To validate the superiority of the VAR(3) model, we have also compared it to the best baseline ARIMA model, the ARIMA(2,1,2) as selected by BIC. We applied Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) to compare the predicting performance of these two models. Both statistics are used for measuring the discrepancy between predicted and observed values, and thus a smaller value indicates better performance. These results are shown in Figure 7 and Table 3. One can find that the predicting accuracy is improved by using VAR(3) than the ARIMA(2,1,2) according to both RMSE and MAE.  After validating the model, we reconstructed the VAR(3) model by applying all observed data available to date, from January 1993 to May 2021, and forecasted the global mean sea level for the next 80 years (see Figure 8). The expected mean global sea level change is 0.31 m by the end of the 21st century. However, to ensure statistical confidence and to guard against the worst-case scenarios, we used an upper bound of the 95% prediction interval for the ensuing forecast.
The relevant equation used to predict GMSL by the VAR(3) model is shown below: 3.3. Projections of Coastal Regional Sea-Level Rise with GAM We analyzed regional tidal gauges located in areas of New York City and Long Island, Miami, and New Orleans. Using the generalized additive model (GAM), we modeled the relations between Global Mean Sea Level and Regional Sea Level (RSL). The RSL data for each region of interest are obtained by averaging the observed RSL from multiple regional tidal gauge stations. For instance, the RSL of New York City and Long Island is obtained from the Montauk, Kings Point, Battery Park, and Bergen Point tidal gauge stations. The related path diagram for this GAM analysis is shown in Figure 9.  According to the results from the GAM model, the expected RSL rise for the New York City and the Long Island area is approximately 0.24 m by 2050 and 0.65 m by 2100. The prediction intervals show the likely RSL rise can range from −0.43 to 1.04 m by the end of 2050 and from −1.22 to 2.63 m by the end of 2100. The performance of linear regression and GAM were compared, and we found that the adjusted R 2 improved by 0.083, and the deviance explained improved by 0.098 using GAM with its adjusted R 2 being 0.348. One may notice that the prediction bands would rise over time. This is due to two main reasons: (1) The increased variability of RSL as time increases, which subsequently increased the prediction temporal volatility of GAM; (2) To calculate the prediction bands, we first use the upper bounds and lower bounds of the 95% prediction intervals for GMSL from the VAR(3) predictions as our predictor. Using the upper bounds of the VAR(3) and then GAM prediction bands, we can predict the worst-case scenario for coastal flooding, which is defined at a certain threshold above normal sea level. According to NOAA, the threshold for major flooding scale is 1.17 m (±0.39) above the local diurnal tidal range [4]. Therefore, we can approximate the worst-case scenario for flooding for the New York City and Long Island region to be 1.  Using the NOAA Sea-level rise Viewer [29], we plotted the respective coastal inundation scenarios for 2050 and 2100, respectively, in New York City, Miami, and New Orleans (see .

Discussion
Using the dynamic VAR(1)-SEM Model based on an existing public database, we have confirmed a significant pathway, from rising levels of greenhouse gases, to air temperature increase, to sea water temperature increase, to sea ice extent, to land ice (glaciers) mass loss, and subsequently, to global sea-level rise. As expected, greenhouse gases were a significant predictor of air temperature and global mean sea-level rise. In contrast, a controversial discovery was that the sea ice turned out to be critical in keeping the effects of the dynamic SEM Model. This could be due to a number of reasons. Firstly, declining sea ice means that there is less white surface to reflect the incoming sunlight, which lowers the albedo and thus leads to greater absorption and warming in the atmosphere, increasing the melting of sea and land ice [30]. Secondly, sea ice is also integral to ocean circulation, as the brine rejection below the sea ice increases the density of the water, causing it to sink and contribute to the global ocean conveyor belt [30].
The VAR(3) model took up the three significant and most direct factors for global sea-level rise from the dynamic SEM analysis. By using the optimal number of predictive lags in time, which was found to be 3, the VAR(3) model rendered more refined forecasting of sea level for the next 80 years, projecting a global mean sea-level rise of approximately 0.31 m by 2100. The GAM, which was found to be superior to the linear regression, was able to link GMSL and RSL for three regions of interest: New York City and Long Island, Miami, and New Orleans. The respective regional coastal sea-level rise projections were 0.65 m for New York City and Long Island, 1.21 m for Miami, and 1.37 m for New Orleans by 2100. Furthermore, the worst-case scenarios for flooding are projected to be 2.6 m by 2050 and 4.19 m by 2100, for New York City and Long Island; 2.55 m by 2050 and 3.98 m by 2100, for Miami; 2.41 m by 2050 and 3.54 m by 2100, for New Orleans.
Regional differences in elevation, ocean circulation, and coastal topography could account for the differences in sea-level projections. When considering the increased scope and severity of tidal flooding from sea-level rise, it is essential to factor in the thresholds for major flooding scenarios. Decades ago, typically only powerful storms could cause coastal flooding; however, due to regional sea-level rise, rather common wind events and seasonally high tides are now often the main cause of coastal inundation in communities [31], indicating that such events will persist and even worsen in future decades. Based on the NOAA's major flooding threshold, our new RSL predictions, when analyzed in coastal regions, show extensive flooding inland. Notably, flooding in Manhattan can be observed reaching deep inland in the Central Park region, along with significant inundation along the coast of Florida.
As a major objective of our analysis, we wish to develop a sensible open-box analysis pipeline using publicly available data for sea-level rise prediction. We have found that our predictions are aligned with major climate models from public agencies such as the IPCC, which reports an estimated sea-level rise between 0.43 m (0.29-0.59 m) and 0.84 m (0.61-1.10 m) [3]. This has convinced us that climate change and sea-level rise are not fiction nor conspiracy. As temperature increases and climate change progressively worsens, causing sea-level rise and extreme tidal flooding, millions of lives, property value, and ecosystems are at stake. Potential impacts of recurrent coastal flooding extend beyond sea-level rise. Examples include infiltration and degradation of stormwater and wastewater systems, and saltwater intrusion that raises coastal groundwater tables [32]. Potential responses for prevention measures could include building protective barriers, especially in urban areas such as New York City, where infrastructure such as subways is unprepared for the increased amount and severity of flooding, as evidenced by Hurricane Elsa in July 2021 [33]. Additionally, we hope our analyses encourages the protection of communities that are disproportionately impacted by coastal flooding and are the most vulnerable to socioeconomic damage. Developing sustainable technology and good management of coastal communities is also essential. However, identifying the most appropriate way to respond to sea-level rise is not straightforward and is contested with a range of governance challenges. Finding an ideal response to sea-level rise requires an assessment of their relative effectiveness, technical limits, costs, drawbacks, and the specific governance challenges associated with each type of response. Through our analysis, we hope to directly impact public policy and opinion for the environmental protection of coastal properties and proactive preparation from the impending climate crisis on the effects of sea-level rise on communities.
Lastly, we acknowledge that our work is not without limitations. Even with our model, it still remains a complex issue involving many other variables such as isostatic adjustment, wind forcing, sea level pressure, or climatic periodicities. Additionally, to account for the relatively low adjusted R 2 values for the GAM model, our confidence intervals became much larger over time due to large noise in the dataset. Lastly, as the data for the Greenland and Antarctic ice masses were only accessible from 1992, and as it is critical to consider the proportions of missing data compared to the available dataset when utilizing imputation, our ability to create more accurate forecasting of sea-level rise was limited. As the concluding remark for our work, we call for public agencies to provide more comprehensive data sets for citizen scientists in order to be able to independently model, analyze, and thus verify critical claims on the ongoing climate crisis.