A Novel Similarity Measure of Spatiotemporal Event Setting Sequences: Method Development and Case Study

Abstract: Examining the similarity of event environments or surroundings (more precisely, settings) provides additional insight in analyzing event sequences, as it provides information about the context and potential common factors that may have influenced them. This article proposes a new similarity measure for event setting sequences, which involve the space and time in which events occur. While similarity measures for spatiotemporal event sequences have been studied, the settings and setting sequences have not. In modeling event setting sequences, we consider spatial and temporal scales to define the bounds of the setting and incorporate dynamic variables alongside static variables. Using a matrix-based representation and an extended Jaccard index, we developed new similarity measures that allow for the use of all variable data types. We successfully used these similarity measures, coupled with other multivariate statistical analysis approaches, in a case study involving setting sequences and pollution event sequences associated with the same monitoring stations, which validated the hypothesis that more similar spatial-temporal settings or setting sequences may generate more similar events or event sequences. In conclusion, the developed similarity measures have wide application beyond the case study to other disciplinary contexts and geographical settings. They offer researchers a powerful tool for understanding different factors and their dynamics corresponding to occurrences of spatiotemporal event sequences.


Introduction
An event setting, or more explicitly a spatiotemporal event setting, can be defined as a space and its collective influencing factors which are related to the occurrence of an event or sequence of events at a specific time and location. It can refer to the physical location, such as a specific venue or building, or to the overall atmosphere and environs or surroundings of an event. Similarity measures between events and event sequences have been well studied [1][2][3][4][5][6][7]. Assessing similarity between event settings adds another dimension to event sequence analysis in that it offers context and information on potential shared influencing factors. We hypothesize that the occurrences of at least some types of events and event sequences are likely to be related to the spatiotemporal settings from which they arise. In other words, spatiotemporal differentials in environmental settings contribute to variations in levels and patterns of event occurrences and event sequences.
While, as noted above, event sequence similarity has been well researched, no such similarity measures for event sequence settings have been found in the literature to date. This paper addresses this gap by developing similarity measures for event sequence settings. In [1], we established similarity measures for comparing event sequences and demonstrated their potential applications. In this study, we question whether similar patterns of event sequences reflect similarity in the spatiotemporal settings of the event sequences. A working hypothesis is that more similar spatial settings may generate more similar event sequences. epistemological framework of situated knowledge [24], which argues that our knowledge is contextualized by geographical location among other influences we may or may not be aware of.
Spatial context strongly influences transport disadvantage, which in turn affects social exclusion and well-being [25]. In travel behavior research, spatial context was shown to be strongly related to household travel patterns at an international scale [26]. A person's health-related problems can be strongly affected by different types of spatial context, such as environmental exposures [27,28], social environment (characteristics of communities and neighborhoods) [28,29], and ease of access to health services [30]. Spatial context greatly influences the likelihood of contracting a disease, the adoption of a healthy lifestyle, and the ease of access to medical services for disease diagnosis and treatment. An early psychological behavior study indicated that decision behavior is affected by spatial context or spatially varying factors [31]. A farming population was selected to study the effect of spatial context on decision processes because the outcomes of decision behavior are easily observable over the landscape. Decision making in farming is dispersed spatially among many farmers because of the uneven diffusion of market and technical information. With the strong emphasis on and integration of spatial context, a new area of ecological studies, called spatial ecology, has emerged [32,33].
Spatial context is also very important in the recognition of objects in images. In a content-based image retrieval experiment, incorporating spatial context models dramatically reduced misclassification and increased the accuracy of material detection by 13% [34]. To better recognize or identify defined objects (e.g., cars, rivers, sky) in an image, combining naturally classified textures or colors as spatial context greatly improved detection accuracy [35].
Spatial context plays an important role when measuring the similarity of two entities or event sequences. The effect of context on existing similarity measurement approaches has been reported on in the geospatial domain [36,37]. This work focuses on quantifying the impact of changing contexts on similarity measures, thus recognizing the potential influence of context on similarity measures embedded in that context. This paper focuses on measures of similarity for spatial settings with the expectation that setting similarity is likely influencing the similarity of event sequences observed within a setting.
In general, spatial-temporal contexts describe the general or broader context in which an event or phenomenon occurs. We distinguish spatial-temporal settings as referring to a specific location and time frame in which an event or phenomenon occurs. There is no fixed or natural scale [38] for such a setting, which may be as broad as a particular region or as specific as a particular location within a region. It can also refer to a specific point in time, or a specific time interval.
In this study, we develop similarity measures between individual spatiotemporal settings and sequences of spatiotemporal settings, which may affect or drive the formation of event sequence patterns. Spatiotemporal settings are characterized by a collection of parametric factors within the environs where events or event sequences are situated, with an emphasis on location, time, and circumstances. We discuss the concepts of classification and scale of spatiotemporal settings, followed by representation and variable selection for assessing spatiotemporal setting similarity. We then develop a matrix-based approach for computing similarity measures between spatiotemporal settings at a certain time point or period and sequences of spatiotemporal settings over serial times, which we evaluate through a case study. The developed similarity measure serves as an index that combines a set of quantitative and qualitative factors. The measures have broad application beyond ecological and environmental event settings, to social, cultural and health related contexts.

Model for Event Sequence Settings
As noted above, a conceptual challenge for modeling an event sequence setting lies in the spatial and temporal specification of the setting. The event sequence similarity measure described by Xu and Beard [1] assumes that time series and derived event sequences are observed at fixed point locations. Clearly, influences on a time series, and, by extension, on a derived event sequence, extend beyond a point location, but the projected extent will be application and scale dependent, the scale dependence here being a function of the pertinent event-generating processes. As [12] note, the space of a site is something that emerges through unfolding event relations. Thus, we assume that the space of an event setting will vary based on the observed process, local environmental circumstances, and monitoring practices, and will have scale implications for variable selection.
As with most analyses, spatial and temporal scales must be considered in identifying and characterizing spatiotemporal event sequence settings. As a basis for modeling sequences of spatiotemporal event settings, we first model an event-situated setting at a specific temporal scale or time point with different spatial scales. Figure 1 illustrates the potential for different spatial boundaries for a setting. Where a boundary is placed has implications for the set of influencing factors. With changes in spatial scale, the influencing factors for a setting may vary and may be both static and dynamic.

To account for the dynamic aspects of setting as relating to an event sequence at a location, we conceptualize the setting as a sequence, i.e., a sequence of settings at ordered time points, as illustrated in Figure 2. The measurement of spatial pattern and heterogeneity is dependent on the scale at which the measurements are made. In this study, we do not consider interactions between scales.
For a specific application context, we assume that we have determined the pertinent set of static and dynamic variables for representing all event settings at one spatial scale. For a set of monitored locations generating spatiotemporal event sequences as discussed in [1], we specify corresponding sequences of spatiotemporal event settings. Figure 2 graphically illustrates these conceptual sequences of spatiotemporal event settings with n dynamic and m static representative variables.

Matrix Representation of Sequences of Spatiotemporal Settings
For a given application context, we assume we have determined the major variables which strongly or satisfactorily represent the spatial settings for a set of sensor locations or monitoring stations where event sequences are observed. Given s locations or monitoring stations and t temporal points, we conceptually associate an event sequence with a setting sequence. We then represent these sequences of spatiotemporal event settings with an s × t matrix, as schematically illustrated in Figure 3.

For each setting λ with n dynamic variables (ν) and m static variables (ρ), i.e., $\lambda : \{\nu_1, \nu_2, \ldots, \nu_n, \rho_1, \rho_2, \ldots, \rho_m\}$, Figure 3 can be expanded to Figure 4 to become the variables-based matrix representation of the sequences of spatiotemporal event settings.
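The matrix representation described above can be illustrated with a minimal Python sketch. The class name `Setting`, the helper `build_setting_matrix`, and the variable names and values are hypothetical choices made for this illustration; they are not taken from the study. The sketch only shows how s stations and t time points give an s × t arrangement in which each cell holds n dynamic and m static variables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Setting:
    """One spatiotemporal setting: n dynamic and m static variables."""
    dynamic: Dict[str, float]  # values that may change between time points
    static: Dict[str, float]   # values assumed fixed over the study period


def build_setting_matrix(
    static_by_station: List[Dict[str, float]],
    dynamic_by_station: List[List[Dict[str, float]]],
) -> List[List[Setting]]:
    """Return an s x t list of lists: one row per station, one column per time point."""
    matrix = []
    for static, series in zip(static_by_station, dynamic_by_station):
        # The same static variables are repeated in every cell of a station's row.
        matrix.append([Setting(dynamic=dict(dyn), static=dict(static)) for dyn in series])
    return matrix


# Hypothetical data: s = 2 stations, t = 3 time points, n = 1, m = 1.
static_by_station = [{"elevation": 12.0}, {"elevation": 30.0}]
dynamic_by_station = [
    [{"temperature": 10.0}, {"temperature": 12.0}, {"temperature": 11.0}],
    [{"temperature": 9.0}, {"temperature": 13.0}, {"temperature": 12.0}],
]
setting_matrix = build_setting_matrix(static_by_station, dynamic_by_station)
```

Each row of `setting_matrix` is then one setting sequence, which can be paired with the event sequence observed at the same station.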

Pairwise Similarity between Individual Spatial Settings
Pairwise similarity between individual settings is fundamental to developing similarity measures between sequences of spatiotemporal settings. In a study of environmental settings, for example, pairwise similarity can be used to measure the similarity between two or more settings based on factors such as temperature, humidity, rainfall, and other environmental variables. By calculating pairwise similarity scores, we can gain insights into how similar or different settings are and identify patterns that may be useful in predicting future outcomes.
In this study, we develop a new pairwise similarity measure between spatial settings based on the modifications of the Jaccard index described in [1]. The original Jaccard index is a similarity measure commonly used in the context of sets or binary vectors, where each element can either be present or absent [39]. To adapt the Jaccard index for measuring the similarity between spatial settings associated with thematic events, we need to determine a set of common features, including static and dynamic variables, representing each spatial setting. Considering the number of common features for a pair of settings, we have two major considerations, (1) the magnitude or quantitative level of each element from both settings, and (2) that the values of the dynamic variables or elements should be measured at the same timestamps or time intervals.
We first identify the co-existing dynamic variables between two representative dynamic variable sets $l_{d1}$ and $l_{d2}$, and the co-existing static variables between two representative static variable sets $l_{s1}$ and $l_{s2}$ of two spatial settings, setting 1 and setting 2. We calculate the relative ratios of individual common variables, and then sum them separately over the dynamic and static variables. The modified Jaccard similarity between two spatial settings at time k can be expressed as the sum of relative ratios of all common features/variables divided by the total number of unique features/variables in both sets/settings, as in Equation (1):

$$J_k(l_1, l_2) = \frac{Sd_{k12} + Ss_{12}}{N_d + N_s} \qquad (1)$$

where
$l_1$: set 1 representing spatial setting 1, including the subset of dynamic variables ($l_{d1}$) and the subset of static variables ($l_{s1}$),
$l_2$: set 2 representing spatial setting 2, including the subset of dynamic variables ($l_{d2}$) and the subset of static variables ($l_{s2}$),
$Sd_{k12}$: sum of relative ratios of common dynamic variables between the two settings at time k,
$Ss_{12}$: sum of relative ratios of common static variables between the two settings, assuming no changes over time during the experiment,
$N_d = |l_{d1} \cup l_{d2}|$: cardinality of the union set of $l_{d1}$ and $l_{d2}$,
$N_s = |l_{s1} \cup l_{s2}|$: cardinality of the union set of $l_{s1}$ and $l_{s2}$.

We have two similarity calculation situations depending on variable types. First, if variable values are interval, ratio, binary, or categorical, the pairwise similarity at time k can be calculated using Equations (2) and (3). Note that categorical data can be converted to binary format based on the number of categories.
If not considering weights or relative importance of individual elements/variables, Equation (2) applies; if considering weights or relative importance of individual elements/variables, Equation (3) applies.

Second, if variables are ordinal valued, the similarity can be calculated using Equations (4) and (5): Equation (4) if not considering weights or relative importance of individual elements/variables, and Equation (5) if considering them,

where
$c_{kd12}$: the number of common dynamic variables between the two settings at timestamp k,
$c_{s12}$: the number of common static variables between the two settings,
$\omega_i$, $\omega_j$: weights or relative importance of dynamic and static independent variables to the response variable,
$n_i$, $m_j$: ordinal levels of dynamic variable i and static variable j, respectively,
$lev(d_{k1i})$, $lev(d_{k2i})$: the relative levels or magnitudes of two corresponding co-occurring elements in the two dynamic subsets $l_{d1}$ and $l_{d2}$ at timestamp k, respectively,
$lev(s_{1j})$, $lev(s_{2j})$: the relative levels or magnitudes of two corresponding co-occurring elements in the two static subsets $l_{s1}$ and $l_{s2}$, respectively.
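As a hedged illustration of the unweighted case for interval or ratio variables, the sketch below follows the verbal definition given above: the sum of relative ratios over common variables divided by the number of unique variables in both settings. The exact form of the relative ratio is not reproduced in this text, so the sketch assumes it is the smaller value divided by the larger value for non-negative magnitudes. The function names and the example values are hypothetical.

```python
def relative_ratio(a: float, b: float) -> float:
    """Assumed relative ratio of two non-negative magnitudes: min/max, in [0, 1]."""
    if a == b:
        return 1.0  # identical values, including the 0/0 case
    return min(a, b) / max(a, b)


def setting_similarity(dyn1: dict, dyn2: dict, sta1: dict, sta2: dict) -> float:
    """Modified Jaccard-style similarity between two settings at one time point.

    dyn1/dyn2 map dynamic variable names to values at the same timestamp;
    sta1/sta2 map static variable names to values.
    """
    common_dyn = dyn1.keys() & dyn2.keys()
    common_sta = sta1.keys() & sta2.keys()
    sd = sum(relative_ratio(dyn1[v], dyn2[v]) for v in common_dyn)  # dynamic part
    ss = sum(relative_ratio(sta1[v], sta2[v]) for v in common_sta)  # static part
    n_d = len(dyn1.keys() | dyn2.keys())  # unique dynamic variables
    n_s = len(sta1.keys() | sta2.keys())  # unique static variables
    if n_d + n_s == 0:
        raise ValueError("at least one variable is required")
    return (sd + ss) / (n_d + n_s)


# Hypothetical values: one shared dynamic variable, one shared static variable.
dyn_a = {"temperature": 10.0, "salinity": 30.0}
dyn_b = {"temperature": 20.0, "rain": 1.0}
sta_a = {"area": 4.0}
sta_b = {"area": 8.0}
similarity_ab = setting_similarity(dyn_a, dyn_b, sta_a, sta_b)  # (0.5 + 0.5) / (3 + 1)
similarity_aa = setting_similarity(dyn_a, dyn_a, sta_a, sta_a)  # identical settings
```

Variables present in only one setting contribute nothing to the numerator but still count in the denominator, which is what makes the measure Jaccard-like.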

Pairwise Similarity between Sequences of Spatial Settings
Sequences of a spatial setting refer to the different configurations of the setting or a physical space that occur over time due to changes in the dynamic variables, while the static variables are assumed stable during the study timeframe of interest. We can extend the modified Jaccard index-like pairwise similarity measure between individual settings to calculate the pairwise similarity between sequences of spatial settings if the data from different locations are collected at equal time intervals or in the same order. Assuming we have determined the granularity of time intervals or a certain sequential order and the total number of timestamps, T, the similarity between two sequences of spatial settings from two locations (S1 and S2) can be expressed as in Equation (8).
In dealing with the sequences of spatial settings, we also need to consider the data types and the weights or relative importance of explanatory variables to response variables (events or event sequences of interest). We therefore have four situations for calculating the similarity between setting sequences from different locations.
(1) Variable type: interval, ratio, binary and categorical; not considering the weights of individual variables.
(2) Variable type: interval, ratio, binary and categorical; considering the weights of individual variables.
(3) Variable type: ordinal; not considering the weights of individual variables.
(4) Variable type: ordinal; considering the weights of individual variables.
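A minimal sketch of situation (1) is given below. Because the sequence-level equation is not reproduced in this text, the sketch makes two explicit assumptions: the relative ratio of two non-negative values is taken as min/max, and the sequence similarity is taken as the mean of the per-timestamp similarities over the T timestamps. All function names and values are hypothetical.

```python
def ratio(a: float, b: float) -> float:
    """Assumed relative ratio for non-negative values: min/max, 1.0 if equal."""
    return 1.0 if a == b else min(a, b) / max(a, b)


def similarity_at_k(dyn1: dict, dyn2: dict, sta1: dict, sta2: dict) -> float:
    """Per-timestamp similarity: summed ratios of common variables / unique variables.

    Raises ZeroDivisionError if both settings have no variables at all.
    """
    total = sum(ratio(dyn1[v], dyn2[v]) for v in dyn1.keys() & dyn2.keys())
    total += sum(ratio(sta1[v], sta2[v]) for v in sta1.keys() & sta2.keys())
    return total / (len(dyn1.keys() | dyn2.keys()) + len(sta1.keys() | sta2.keys()))


def sequence_similarity(seq1: list, seq2: list, sta1: dict, sta2: dict) -> float:
    """Assumed aggregation: mean per-timestamp similarity over T aligned timestamps."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned in time")
    per_k = [similarity_at_k(d1, d2, sta1, sta2) for d1, d2 in zip(seq1, seq2)]
    return sum(per_k) / len(per_k)


# Hypothetical sequences with T = 2 and identical static variables.
seq_s1 = [{"temperature": 10.0}, {"temperature": 20.0}]
seq_s2 = [{"temperature": 20.0}, {"temperature": 20.0}]
static_s1 = {"area": 4.0}
static_s2 = {"area": 4.0}
sequence_sim = sequence_similarity(seq_s1, seq_s2, static_s1, static_s2)
```

The alignment check reflects the stated requirement that data from different locations be collected at equal time intervals or in the same order.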

Setting Similarity Analysis Workflow
To estimate similarity levels between event settings, a critical step is to effectively select and quantify the major attributes representing these settings where events or event sequences occur. As introduced above, the variables can be static, dynamic, or both, potentially covering a wide range of environmental variables. The selection of variables in developing similarity measures will be domain dependent and should be statistically discriminant. In a water quality monitoring application, for example, the static spatial setting variables of interest could include land cover, topography, and soils, and dynamic variables could be weather related. Figure 5 shows the steps for implementing similarity assessment between event settings or sequences of event settings in a specific domain.
Geographies 2023, 3, FOR PEER REVIEW 9
Figure 5. Spatial-temporal setting similarity analysis flowchart.
Define a thematic event and identify sequences of spatiotemporal events: we assume that the research focuses on an event or event sequences in a specific domain, that a series of spatiotemporal event sequences has been identified, and that similarity analysis between these sequences has been completed.
Identify relevant spatial settings and spatial features or variables: select candidate dynamic and static variables representing the spatial settings that are deemed relevant to event occurrences based on domain knowledge. In studying air pollution events, for example, we could include data on such variables as wind direction, wind speed, sites of local manufacturers, major pollution sources, concentration of major pollutants, and transportation density. A correlation matrix for these initially selected variables can be used to eliminate redundant information.
Collect spatial data and preprocess it: collect sufficient data on the pre-selected static and dynamic variables that are intuitively correlated with occurrences of the thematic events. Preprocessing of the collected data mainly includes a check for normal distribution, normalization, standardization of measurement units, and binarization of categorical data.
Analyze relative importance/weight of preliminarily selected variables: to improve computation speed and the accuracy of the similarity measures, we should identify the variables most relevant to the events of interest and reduce the number of variables. To determine which variables are most important to the thematic events and for the similarity measures, we can conduct relative weight analysis (RWA) [40][41][42] and partial least squares regression (PLSR) [43].
Calculate pairwise similarities between spatial setting sequences: once the most relevant features or variables are identified, we can use the similarity measures developed in this study to compute the pairwise similarity between spatial settings and between sequences of spatial-temporal settings, and form the similarity matrix.
Validate the similarity measure: with the similarity matrix of spatial setting sequences, we can further conduct clustering analysis to group event sequences associated with locations or stations, and then conduct the comparison analysis with clusters of event sequences as ground truth. The other approach is to compare the results with other methods.
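The preprocessing step in the workflow above mentions normalization and binarization of categorical data. The sketch below shows one possible form of each, assuming min-max scaling as the normalization and one indicator column per category as the binarization; the study does not state which normalization was used, and the function names and sample values are hypothetical.

```python
def min_max_normalize(values: list) -> list:
    """Rescale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def binarize_categorical(values: list) -> dict:
    """Convert a categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical columns for three stations.
wind_speed = [2.0, 4.0, 6.0]
land_cover = ["forest", "urban", "forest"]

wind_speed_scaled = min_max_normalize(wind_speed)
land_cover_binary = binarize_categorical(land_cover)
```

After binarization, each indicator column can be treated as a binary variable in the similarity calculation, so the number of variables grows with the number of categories.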

Case study: Setting Similarity of Coastal Monitoring Stations for Fecal Pollution
To demonstrate the use of the proposed method, we determined the pairwise similarities of 16 monitoring stations along the Maine coast using the selected setting attributes for coastal fecal pollution event sequences. The Maine Department of Marine Resources (DMR) manages the shellfish growing areas in coastal Maine based on the fecal pollution conditions observed at more than 2000 monitoring stations. Fecal coliform is a type of bacteria found in the intestines and feces of warm-blooded animals, including humans. It is used as an indicator of fecal contamination of water [44]. Monitoring fecal coliform levels in coastal waters is important because it can help identify sources of contamination and provide an early warning of contamination, enabling faster responses. Maine DMR typically collects water samples at these monitoring stations at regular intervals and analyzes them for fecal coliform levels. Grouping monitoring stations with similar spatial settings of fecal pollution events can provide several benefits. First, it can provide useful information for early detection of pollution events at similar stations [45,46]. Second, cluster analysis of monitoring stations across a wider area can help to identify trends and patterns in fecal coliform levels and pollution events, which can inform efforts to improve water quality. Third, building on the previous two benefits, it helps to optimize resource allocation and prioritize monitoring efforts based on areas of higher pollution risk, which can reduce costs and increase efficiency in monitoring and management activities. Fourth, it can support more informed decisions about pollution control measures, such as beach closures or water treatment. Lastly, it helps to increase public awareness of coastal water quality issues and the need for responsible use and management of marine resources.

Site and Variables
In this case study, we selected 16 monitoring stations along the Maine coast, with the following DMR-assigned location IDs: WE020.00, WE028.00, WG008.10, WG027.00, WG038.00, WM003.00, WN057.00, WN077.20, WQ023.00, WR011.00, WS027.00, WS051.00, WT015.00, WT018.00, WT024.00, WV019.00, as shown on the map (Figure 6). Multiple factors related to fecal coliform concentration around these monitoring stations contribute to characterizing the corresponding spatial settings for fecal pollution events. Some studies have shown that shoreline, basin hydrology, and marine environment affect the retention, survival, and distribution of fecal coliform [47]. Based on data availability, we selected a combination of basin characteristics as static variables and some marine environmental factors as dynamic variables. Their abbreviations and descriptions are shown in Table 1.

Table 1.
Abbreviation | Description | Unit
… | Cumulative precipitation in 24 h | inch
RainCum48 | Cumulative precipitation in 48 h | inch
RainCum72 | Cumulative precipitation in 72 h | inch
RainCum96 | Cumulative precipitation in 96 h | inch

Data Collection
We used the geolocations of the 16 selected monitoring stations to delineate the corresponding basins with StreamStats v4.13.0 (https://streamstats.usgs.gov/ss/, accessed on 25 February 2023) and downloaded all associated basin characteristics data. For the static variables described in Table 1, the data were extracted as shown in Table S1. We obtained marine environment related variables and fecal coliform measurements from Maine DMR (Table S2).

Methods
We used partial least squares regression (PLSR) analysis [41,42] to obtain the relative importance of all variables against the fecal coliform scores. We used the similarity measure developed in this study to compute the similarity matrix of spatial setting sequences, and used the method developed in [1] to obtain the similarity matrix of the corresponding fecal pollution event sequences with the same locked timestamps. After converting the similarity matrices of both setting and event sequences to distance matrices, we performed a cluster analysis [48].
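The conversion and clustering step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: distance is assumed to be one minus similarity, and a simple average-linkage agglomerative clustering is swapped in because the specific clustering method is not described here. The function names and the 4 × 4 similarity matrix are hypothetical.

```python
def to_distance(similarity: list) -> list:
    """Assumed conversion: distance = 1 - similarity, element by element."""
    return [[1.0 - s for s in row] for row in similarity]


def average_linkage(distance: list, n_clusters: int) -> list:
    """Merge the two closest clusters (by mean pairwise distance) until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > n_clusters:
        best = None  # (mean distance, index a, index b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [distance[i][j] for i in clusters[a] for j in clusters[b]]
                mean_d = sum(pairs) / len(pairs)
                if best is None or mean_d < best[0]:
                    best = (mean_d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical similarity matrix for four stations (indices 0 to 3).
similarity_matrix = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
distance_matrix = to_distance(similarity_matrix)
station_clusters = average_linkage(distance_matrix, n_clusters=2)
```

Running the same routine on the setting-sequence and event-sequence matrices gives two groupings of stations that can then be compared.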

Relative Weights and Selection of Representative Variables for Spatial Settings
The results of the partial least squares regression analysis on 39 variables revealed that some variables are more important than others in predicting fecal coliform levels (Table 2 and Figure 7). The sign associated with each variable indicates the direction of its impact on fecal coliform levels. Salinity has the highest relative importance and the strongest negative influence on fecal coliform levels. On the other hand, the shortest distance from the coastline to the basin centroid (COASTDIST), bank-full streamflow (BKSF), and the percentage of storage (combined water bodies and wetlands) from the National Wetlands Inventory (STORNWI) have the highest positive influence on fecal coliform levels.
To reduce the number of variables used to calculate similarity with the formula developed in this study, we selected the variables with higher weights. In this case study, we selected the variables whose relative importance has an absolute value greater than 1. We then re-ran PLSR with these selected variables against the corresponding fecal coliform levels. The relative importance of these variables from the second-round PLSR is shown in Table 3 and Figure 8; these values can be used as relative weights of the individual variables when calculating similarities between spatial setting sequences.
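The selection rule and the use of second-round importances as weights can be sketched as below. The variable names Salinity, COASTDIST, and BKSF come from the text, but every numeric value is hypothetical (the real values are in Tables 2 and 3), `VarX` is a placeholder, and rescaling the absolute importances to sum to 1 is an assumption made here for illustration. The PLSR fits themselves are not shown.

```python
def select_variables(importance, threshold=1.0):
    """Keep variables whose absolute relative importance exceeds the threshold."""
    return {name: value for name, value in importance.items() if abs(value) > threshold}


def normalized_weights(importance):
    """Scale absolute importances so they sum to 1 (an assumed normalization)."""
    total = sum(abs(value) for value in importance.values())
    return {name: abs(value) / total for name, value in importance.items()}


# Hypothetical first-round importances; signs follow the text, magnitudes are made up.
first_round = {"Salinity": -3.0, "COASTDIST": 2.0, "BKSF": 1.5, "VarX": 0.4}
selected = select_variables(first_round)
print(sorted(selected))  # ['BKSF', 'COASTDIST', 'Salinity']

# Hypothetical second-round importances for the selected variables only.
second_round = {"Salinity": -4.0, "COASTDIST": 2.0, "BKSF": 2.0}
weights = normalized_weights(second_round)
print(weights)  # {'Salinity': 0.5, 'COASTDIST': 0.25, 'BKSF': 0.25}
```

Taking absolute values discards the sign, which is reasonable for a weight that only expresses how much a variable contributes to the similarity score.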

Clustering Analysis of Spatial Setting Sequences and Fecal Pollution Event Sequences
We computed all pairwise similarities between spatial setting sequences with the method developed in this study, using the 16 variables selected in the previous section and 16 rainstorm-involved timestamps. The cluster analysis of spatial setting sequences, labeled by monitoring station, reveals underlying patterns and structure in the data for the selected static and dynamic variables (Figure 9). The result indicates three to four distinct clusters, each representing a group of spatial setting sequences with similar characteristics. Figure 9 shows that some, but not all, geographically proximate spatial setting sequences fall in the same or connected clusters, owing to the diverse contributions of the different static and dynamic variables. These clusters provide valuable information about the types of spatial setting sequences, which we next relate to clusters of fecal pollution event sequences.
We generated a similarity matrix between fecal pollution event sequences, also labeled by monitoring station, for the same time frame (16 days) as the corresponding setting sequences. After converting it to a distance matrix, we performed the cluster analysis and produced the similarity heatmap; the clustering result is shown in Figure 10. Three major clusters are clearly identified.

Cross Analysis between Clusters of Setting Sequences and Clusters of Event Sequences
Cross-analysis between clusters of spatial settings and clusters of event sequences can provide insights into the causes and effects of pollution events in coastal waters. We placed the clustering results for setting sequences and event sequences side by side to build the cross-comparison graph (Figure 11). Examining the members of the major clusters of setting sequences and pollution event sequences, we looked for cases in which at least two stations within one major event sequence cluster were also grouped in the same major setting sequence cluster. We found that 11 out of 16 monitoring stations show this pattern. Specifically, WS027.00, WT015.00, WT024.00, and WR011.00 in event sequence Cluster E1 are also in setting sequence Cluster S2; WG008.10 and WE020.00 in Cluster E2 are also in Cluster S1; WQ023.00 and WV019.00 in Cluster E2 are also in Cluster S2; and WG027.00, WG038.00, and WM003.00 in Cluster E3 are also in Cluster S1. This cross-analysis between clusters of spatial settings and event sequences can help to improve our understanding of the complex interactions between environmental factors and basin characteristics and to identify drivers of fecal coliform pollution events in coastal marine water.
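The counting rule used in this cross-analysis can be sketched as follows: group stations by their (event cluster, setting cluster) pair and keep the pairs shared by at least two stations. The memberships below are those listed in the text. The five remaining stations are set to `None` because their cluster assignments are not spelled out in the text, and the function name is hypothetical.

```python
from collections import defaultdict

# (event cluster, setting cluster) per station, as listed in the text.
# None marks stations whose memberships are not stated there; they are not counted.
membership = {
    "WS027.00": ("E1", "S2"), "WT015.00": ("E1", "S2"),
    "WT024.00": ("E1", "S2"), "WR011.00": ("E1", "S2"),
    "WG008.10": ("E2", "S1"), "WE020.00": ("E2", "S1"),
    "WQ023.00": ("E2", "S2"), "WV019.00": ("E2", "S2"),
    "WG027.00": ("E3", "S1"), "WG038.00": ("E3", "S1"), "WM003.00": ("E3", "S1"),
    "WE028.00": None, "WN057.00": None, "WN077.20": None,
    "WS051.00": None, "WT018.00": None,
}


def co_clustered(membership):
    """Return cluster pairs shared by at least two stations, with their stations."""
    groups = defaultdict(list)
    for station, pair in membership.items():
        if pair is not None:
            groups[pair].append(station)
    return {pair: sorted(names) for pair, names in groups.items() if len(names) >= 2}


matches = co_clustered(membership)
n_matched = sum(len(names) for names in matches.values())
print(n_matched, "of", len(membership))  # 11 of 16
```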

Discussion
We developed similarity measures through modeling spatial setting sequences. The model uses a matrix representation of spatiotemporal event settings and considers both static and dynamic variables. To measure the similarity between spatial settings, the Jaccard index is modified based on the variables' magnitude and the time interval at which dynamic variables are measured. Pairwise similarity between individual spatial settings is crucial for developing similarity measures between sequences of spatiotemporal settings based on specific criteria. The pairwise similarity measure can help to identify patterns and predict future outcomes of corresponding event sequences.
The model's matrix representation of sequences of spatiotemporal settings can be used to represent a set of sensor locations or monitoring stations where event sequences are observed. The matrix representation has the flexibility to include n dynamic and m static variables that represent all event settings at one spatial scale. The modified Jaccard index measures the similarity between individual spatial settings and forms the basis for similarity measures between sequences of spatiotemporal settings. The modified Jaccard similarity between two spatial setting sequences considers the relative ratios of common features/variables. These measures provide information on the differences or similarity of spatial settings, which in turn contribute to the analysis of event sequences arising from these settings.
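A minimal sketch of this representation and of a Jaccard-style comparison follows. It does not reproduce the formula developed in this paper. It substitutes the standard weighted Jaccard (Ruzicka) form, the sum of weighted minima over the sum of weighted maxima, and assumes non-negative, comparably scaled values, one matrix row per locked timestamp, and a plain mean over timestamps. All function names and numbers are hypothetical.

```python
def weighted_jaccard(a, b, weights):
    """Ruzicka-style similarity: sum of weighted minima over sum of weighted maxima."""
    num = sum(w * min(x, y) for x, y, w in zip(a, b, weights))
    den = sum(w * max(x, y) for x, y, w in zip(a, b, weights))
    return 1.0 if den == 0 else num / den


def build_sequence(dynamic_rows, static_values):
    """One row per timestamp: n dynamic readings followed by the m static values."""
    return [list(row) + list(static_values) for row in dynamic_rows]


def sequence_similarity(seq_a, seq_b, weights):
    """Mean per-timestamp similarity, comparing rows that share a (locked) timestamp."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same timestamps")
    scores = [weighted_jaccard(a, b, weights) for a, b in zip(seq_a, seq_b)]
    return sum(scores) / len(scores)


# Hypothetical values: 2 dynamic variables + 1 static variable over 2 timestamps.
station_a = build_sequence([[1.0, 2.0], [2.0, 2.0]], [4.0])
station_b = build_sequence([[1.0, 1.0], [1.0, 2.0]], [4.0])
weights = [1.0, 1.0, 1.0]
print(sequence_similarity(station_a, station_b, weights))
```

Repeating the static values in every row keeps each timestamp's setting self-contained, at the cost of some redundancy; identical sequences score 1 under this form.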
Through the case study, we demonstrated how to model spatial-temporal setting sequences and provided a useful framework for understanding and characterizing the spatial setting sequences that correspond to event sequences. The model's focus on defining the bounds of a setting and considering both static and dynamic variables allows for a comprehensive understanding of the associated event sequences. The pairwise similarity measure helps identify patterns in event settings or setting sequences and thereby better explain the occurrences of events and event sequences. The similarity measures developed in this paper, and the framework incorporating static and dynamic variables to represent settings, will provide useful tools for a range of applications, from characterizing environmental settings to predictive modeling.
One potential application of similarity measures for event sequence settings is in the field of disaster management. By analyzing the spatial-temporal settings of past disasters, emergency responders can better predict the likelihood and potential impact of future disasters and allocate resources more effectively. For example, if a particular region is prone to frequent flooding, similarity measures can be used to identify patterns in the spatial-temporal settings of past floods and help emergency responders anticipate and prepare for future floods in that region. The authors of [49] studied the relationship between hydrological similarity measures and regional flooding frequency. They examined the similarity between catchments in terms of the distribution of rainfall extremes and the extent of the impervious portion of the catchment. Similarly, our setting sequence similarity measures could be used to compare the frequency distribution of rainfall extremes and the extent of imperviousness across catchments. This could help identify catchments that have similar rainfall and land use characteristics and may therefore be suitable for pooling together in regional flooding frequency analysis.
The authors of [50] found that land cover/land use change (LCLUC) and sediment runoff affected by forestry practices and livestock grazing are temporally related to water quality. Our similarity measures could potentially be extended to that setting to compare the temporal patterns of land disturbance and water quality variables (total suspended solids (TSS), turbidity, and visual clarity). This could help identify whether changes in land disturbance are related to changes in water quality over time, and whether there are any spatial relationships between land disturbance and water quality. In addition, as noted in [50], land disturbance and sediment runoff change nonlinearly; our approach to similarity between setting sequences could be applied to study these nonlinear ecosystem dynamics.
Overall, the use of similarity measures for event setting sequences has a wide range of potential applications in various fields, including disaster management, urban planning, transportation planning, and cultural heritage management. By analyzing the spatiotemporal context of events and their surrounding environmental factors, researchers and practitioners can gain a deeper understanding of the underlying mechanisms that drive those corresponding events and event sequences and use that knowledge to make more informed decisions about the management and planning of future events and activities.

Conclusions
In conclusion, modeling spatiotemporal event sequences requires careful consideration of spatial and temporal scales to define the bounds of the setting. The dynamic aspects of the setting should also be accounted for by conceptualizing the setting as a sequence. A matrix representation of sequences of spatiotemporal event settings can be developed for each setting with both dynamic and static variables. Pairwise similarity between individual settings and sequences of spatial settings can be calculated based on modifications of the Jaccard index, using a set of spatial features that represent each spatial setting.
With a careful consideration of spatial and temporal scales to define the bounds of the setting, we developed a modeling approach that incorporates dynamic variables or features in addition to static variables. Using a matrix-based representation of spatiotemporal setting sequences, we developed new similarity measures that include quantitative levels of individual elements within the sequence and comparison with locked timestamps or order. These similarity measures allow for the use of all variable data types in the equations. Overall, these similarity measures, along with the matrix-based representation of spatiotemporal event setting sequences incorporating both static and dynamic variables, provide a novel method in support of event sequence analysis.
Future research could investigate the potential of using the proposed similarity measures to analyze the dynamics of complex systems, such as ecological or economic systems, where events and their settings or contexts can be critical factors in understanding the system behavior. By examining the similarity of event settings, researchers could gain insight into how different factors interact with each other over time and across different spatial scales, which could inform better decision-making in a wide range of fields, from urban planning to disaster management.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/geographies3020016/s1, Table S1: Static variables of basin characteristics associated with 16 monitoring stations; Table S2: Dynamic variables and fecal coliform scores in 16 monitoring stations.