GLM-Based Flexible Monitoring Methods: An Application to Real-Time Highway Safety Surveillance

: Statistical modeling of historical crash data can provide essential insights to safety man-agers for proactive highway safety management. While numerous studies have contributed to the advancement from the statistical methodological front, minimal research efforts have been dedicated to real-time monitoring of highway safety situations. This study advocates the use of statistical monitoring methods for real-time highway safety surveillance using three years of crash data for rural highways in Saudi Arabia. First, three well-known count data models (Poisson, negative binomial, and Conway–Maxwell–Poisson) are applied to identify the best ﬁt model for the number of crashes. Conway–Maxwell–Poisson was identiﬁed as the best ﬁt model, which was used to ﬁnd the signiﬁcant explanatory variables for the number of crashes. The results revealed that the road type and road surface conditions signiﬁcantly contribute to the number of crashes. From the perspective of real-time highway safety monitoring, generalized linear model (GLM)-based exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) control charts are proposed using the randomized quantile residuals and deviance residuals of Conway–Maxwell (COM)–Poisson regression. A detailed simulation-based study is designed for predictive performance evaluation of the proposed control charts with existing counterparts (i.e., Shewhart charts) in terms of the run-length properties. The study results showed that the EWMA type control charts have better detection ability compared with the CUSUM type and Shewhart control charts under small and/or moderate shift sizes. Finally, the proposed monitoring methods are successfully implemented on actual trafﬁc crash data to highlight the efﬁcacy of the proposed methods. The outcome of this study could provide the analysts with insights to plan sound policy recommendations for achieving desired safety goals.


Introduction
Traffic collisions account for over 1.35 million annual fatalities and approximately 50 million injuries worldwide, and are predicted to become the fifth leading cause of death by the year 2030 [1]. On average, road traffic crashes account for approximately 3% of nations' gross domestic product (GDP) worldwide, irrespective of their growth and rate of motorization [2,3]. Road traffic injuries are the third leading cause of death in Saudi Arabia, which poses a major socio-economic and public health concern for road safety agencies. The country has witnessed rapid economic growth and motorization, particularly after the oil boom in the early 1970s [4,5]. The fatality index (deaths/100,000 population) due to traffic crashes in Saudi Arabia is estimated to be around 27.4, which is significantly high compared with developed countries like the United States, Australia, Sweden, Netherland, the United require excellent expertise, high computational efforts, and time. The Conway-Maxwell (COM)-Poisson (COM-Poisson) model proposed by Guikema and Goffelt [45] is the better alternative, which is a flexible model and able to handle any type of disperse data [46]. Lord et al. [47] and Lord et al. [48] have successfully used COM-Poisson regression to model the traffic crash data.
In the literature, studies have mostly focused on statistical modeling of the traffic crash data. However, factors contributing to crash occurrences and frequency exhibit spatial heterogeneity and vary from one location to another. In an effort to achieve better surveillance of highway safety, large amounts of data are collected by road safety organizations worldwide. Traditionally, the majority of existing crash prediction models rely on aggregated information with relatively large time-scales, usually on a yearly basis. However, researchers have argued that the likelihood of crash potential is significantly influenced by short-term fluctuations in crash contributing factors such as traffic, weather, complex terrains, and so on. Crash frequency models developed using aggregated data are designed to yield the prediction results on average data over a more extended period of time that may lead to loss of potentially useful information about some important explanatory variables. They also result in an error due to unobserved heterogeneity. Therefore, it is crucial to investigate the disaggregate models, also known as real-time crash risk evaluation models, for estimating crash potential in smaller time-scales such as an hour, a day, or a week. Crash prediction models with more refined time-scales are useful as they lead to timely and better safety decisions to improve highway safety. To fill this research gap, this study proposes the application of the statistical process control (SPC) method for real-time monitoring of crash data in the Kingdom of Saudi Arabia. A control chart is a well-known tool for statistical process control (SPC), which is often used to detect abrupt changes in the data [49]. There exist several GLM-based control charts, which are designed on the residuals estimated through GLM modeling. For example, GLM-based control charts based on the Poisson model were proposed by Skinner et al. [50] and Asgari et al. [51] in their studies. Skinner et al. [50] and Jearkpaporn et al. [52] discussed GLM-based control charts based on the Gamma model, while a chart based on the binomial model was proposed by Shang et al. [53] and Amiri et al. [54]. The GLM-based control charts under the negative binomial model were discussed by Alencar et al. [55] and Urbieta et al. [56]. In their study, Kinat et al. [57,58] proposed GLM-based control charts by assuming an inverse Gaussian distributed response variable, while Mahmood [59] proposed GLM-based control charts under the zero-inflated models.
Recently, Park et al. [60] and Park et al. [61] proposed Shewhart type GLM-based control charts by assuming the COM-Poisson distributed response variable. Park et al. [60] considered deviance residuals as the plotting statistics, while in another study, the authors utilized randomized quantile residuals as the plotting statistics. In general, the Shewhart type charts are designed based on the current information, and they are used to detect a large deviation from the mean of data. Practitioners are usually interested in detecting small changes as early as possible, for which exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) are especially designed structures. Both EWMA and CUSUM charts are designed based on past and current information, which makes them more efficient to detect small or/and moderate shifts in the process mean. This study intends to design EWMA and CUSUM type GLM-based control charts using the deviance and randomized quantile residuals of the COM-Poisson regression model. Further, the performance evaluation and comparative analysis are conducted using the simulated data, and the proposed methods are implemented to monitor the number of crashes reported in Saudi Arabia.
The remainder of this paper is structured as follows. Section 2 presents a description of the data utilized in the current study. Section 3 highlights the proposed research methodology for crash frequency modeling and statistical process monitoring. Section 4 provides a comparative study of the proposed control charts using an extensive simulation study. Section 5 presents the implementation of proposed methods on the traffic collisions data. Finally, Section 6 summarizes key findings, study limitations, and outlook for future research.

Data Description
Motor vehicle crash data used in this study were procured from the ministry of transport, Riyadh, Saudi Arabia. A total of 47,984 crashes were reported during the three years (January 2017 to December 2019) of the study period. The study is focused explicitly on crashes involving only motor vehicles along inter-cities rural highways that fall under the ministry of transport jurisdiction. A significant proportion of these highways in the study area run through plain and desert terrain, having warm to high temperatures during most part of the year. Road inventory data are collected from the ministry for available sections, and for others, the geographic information system (GIS) tool was used to extract roadway geometric features. Each crash comprises several explanatory variables (shown in Table 1), including road type, road surface conditions at those sites, damage type post-crash, weather conditions, and presence or absence of road markings and cat eyes. In this study, frequency-based modeling was carried out for crashes aggregated on a daily basis. Figure 1 shows a time series plot of aggregated daily count data of the number of crashes. To access the best-fitting distribution of the aggregated crashes, we have implemented three well-known count models, i.e., Poisson, negative binomial, and  Table 2) indicate that COM-Poisson distribution is the best-fitting model as it produced the minimum values of decision criteria (i.e., loglikelihood, Akaike information criteria (AIC), and Bayesian information criteria (BIC)) compared with other models.
indicate that COM-Poisson distribution is the best-fitting model as it produced the minimum values of decision criteria (i.e., loglikelihood, Akaike information criteria (AIC), and Bayesian information criteria (BIC)) compared with other models.  As discussed earlier, each crash comprises a number of explanatory variables that were used in the modeling of the crash data set. To tackle the challenge of summing explanatory variables under each category against aggregated daily crashes, we have used the weighted average of the responses. The mathematical expression of the indexed value is defined below: where i is used to index days, j is used to index categories, Cij is the code value of category j on day i, Nij is the number of responses of category j on day I, and Nc is the total number of categories. For example, on day i, two crashes happened on the divided highway, one crash noted on the expressway, and five crashes reported on the single highway. Then, the indexed value for road type on day i can be obtained as follows:  As discussed earlier, each crash comprises a number of explanatory variables that were used in the modeling of the crash data set. To tackle the challenge of summing explanatory variables under each category against aggregated daily crashes, we have used the weighted average of the responses. The mathematical expression of the indexed value is defined below: where i is used to index days, j is used to index categories, C ij is the code value of category j on day i, N ij is the number of responses of category j on day I, and N c is the total number of categories. For example, on day i, two crashes happened on the divided highway, one crash noted on the expressway, and five crashes reported on the single highway. Then, the indexed value for road type on day i can be obtained as follows: Hence, all of the values of explanatory variables were converted into indexed values. Further, the Pearson correlation among all variables is estimated, and the plot is presented in Figure 2. It is noted that all the explanatory variables have a negative relation with the Hence, all of the values of explanatory variables were converted into indexed values. Further, the Pearson correlation among all variables is estimated, and the plot is presented in Figure 2. It is noted that all the explanatory variables have a negative relation with the number of crashes. The road surface condition is highly correlated, weather conditions are weakly correlated, and all other variables have a mild correlation with the number of crashes. In brief, the COM-Poisson distribution is the best-fitting distribution for the crashes, and there exist some significant correlated explanatory variables. Therefore, the COM-Poisson regression is further used to measure the relationship between the crashes and the corresponding significant explanatory variables. The description of the COM-Poisson regression and the relevant control charting design structure is provided in the next section.

Methodology
As discussed in the previous section, the COM-Poisson distribution was identified as the best-fitting distribution for the crashes, so this section consists of the COM-Poisson regression to assess the statistically significant explanatory variables for the crashes. Further, the existing monitoring schemes based on the COM-Poisson regression and the proposed structures are also discussed in this section. In brief, the COM-Poisson distribution is the best-fitting distribution for the crashes, and there exist some significant correlated explanatory variables. Therefore, the COM-Poisson regression is further used to measure the relationship between the crashes and the corresponding significant explanatory variables. The description of the COM-Poisson regression and the relevant control charting design structure is provided in the next section.

Methodology
As discussed in the previous section, the COM-Poisson distribution was identified as the best-fitting distribution for the crashes, so this section consists of the COM-Poisson regression to assess the statistically significant explanatory variables for the crashes. Further, the existing monitoring schemes based on the COM-Poisson regression and the proposed structures are also discussed in this section.

COM-Poisson Regression
Conway and Maxwell [62] proposed the Conway-Maxwell (COM)-Poisson (COM-Poisson) distribution, which is a flexible distribution to model the over/equi/under dispersed data. Let Y be the random variable, then the probability mass function of the COM-Poisson distribution is defined as follows: where λ i > 0, is the rate parameter, ν is the dispersion parameter, and is the normalizing factor. It should be noted that, when ν = 1, the COM-Poisson distribution reduces to the Poisson distribution; when ν = 0, the COM-Poisson distribution turns into the geometric distribution; and, when ν = ∞, it approaches the Bernoulli distribution. The mean and variance of the COM-Poisson distribution are expressed as follows [63]: where approximations hold for ν < 1 or λ > 10 ν . Further, the COM-Poisson regression model is used to describe the relationship between the response variable (Y) and the explanatory variables (X) as a function of η = X β, where β is the vector of unknown parameters. In this study, we are assuming link function g such as µ = E(Y) = g −1 (η) by following [64]. Suppose the likelihood function of the fitted model is represented by L(μ|Y) and for the saturated model is denoted by L(Y|Y) then the overall deviance can be expressed as follows: whereμ =Ê(Y) and the deviance residuals can be obtained by the following expression: where the deviance residuals are constrained such that y i > k for ν < 1/2k + 1, k ∈ N + . Moreover, the randomized quantile residuals of the COM-Poisson regression model are obtained as follows [60]: where Φ(.) is the cumulative distribution function of the standard normal distribution and U i is a uniform random variable between a i = lim is the cumulative distribution function of the response variable andθ i is the maximum likelihood estimate of the parameter θ i .

Monitoring Methods Based on the COM-Poisson Regression
Control charts are a well-known methodology of the SPC tool kit because of their efficacy in temporal monitoring of the time series process. A traditional control chart has two decision lines (i.e., lower control limit (LCL) and upper control limit (UCL)) along with a centerline. A plotting point refers to an in-control (IC) state if it lies between decision lines; otherwise, the plotting point is considered as out-of-control (OOC). Control charts have been used for numerous applications across various domains [65][66][67][68][69][70][71][72][73][74].

Existing COM-Poisson Model-Based Control Charts
Park et al. [60] and Park et al. [61] proposed Shewhart model-based control charts to monitor the COM-Poisson process. In these studies, the authors have used deviance residuals (dr) and randomized quantile residuals (qr) to define the plotting statistics. In the Shewhart model-based control charts, we define the plotting quantity as r = dr or r = qr. This quantity r of the COM-Poisson regression is used as a plotting statistic against the control limits given below: where L S1 and L S2 are specified to achieve the fixed in-control average run length (ARL 0 ). The chart declared an OOC point when any plotting statistic falls outside of the abovedefined control limits; otherwise, points are declared as IC. It is noted that, when r = dr, the said chart is known as the DR-COM-P Shewhart chart, while when r = qr, it is named as the QR-COM-P Shewhart chart.

Proposed COM-Poisson Model-Based Control Charts
The Shewhart charts are memory-less charts, which means they are designed based on current information [75]. Therefore, they are less prone to detect small shifts in the process. When the objective of the monitoring is to detect small to moderate shifts in the process, then EWMA and CUSUM charts are the alternatives [76]. The EWMA and CUSUM structures are also known as memory type charts as they used past information along with current information [77,78]. Hence, for the monitoring of small to moderate shifts, we have designed EWMA and CUSUM based charts in this study.

DR/QR-COM-P EWMA Control Charts
The EWMA statistic based on the residuals r (i.e., r = dr or r = qr) given in Equations (6) and (7) for the COM-Poisson regression is defined as follows: where λ is the smoothing parameter such that 0 < λ ≤ 1 and the starting value of the EWMA statistic is the process target (i.e., Z 0 = E(r)). The limits of the EWMA charts are given as follows: The charting constant L E1 and L E2 is set to obtain the fixed ARL 0 . When r = dr, the chart is named as the DR-COM-P EWMA chart, while when r = qr, it is known as the QR-COM-P EWMA chart. The EWMA chart signals an OOC point when any plotting statistic (Z i ) exceeds the control limits; otherwise, processes are declared as IC.

DR/QR-COM-P CUSUM Control Charts
The structure of the CUSUM chart is divided into two separate one-sided CUSUM statistics, i.e., C + and C − , which are defined as follows: where K is the reference value, and the starting values of the afore-mentioned CUSUM statistics are taken equal to zero (i.e., C + 0 = C − 0 = 0). The upper CUSUM statistic C + monitors the deviations above the target value E(r), while deviations below the target value are monitored through the lower CUSUM statistic C − . The CUSUM chart signals an OOC residual when C − i < −h 1 Var(r) and/or C + i > h 2 Var(r); otherwise, residuals are considered IC. Moreover, the parameters K, h 1 , and h 2 are carefully chosen against the pre-specified ARL 0 to determine the performance of the CUSUM chart. Furthermore, it is to be noted that the CUSUM chart is known as the DR-COM-P CUSUM chart when r = dr and the QR-COM-P CUSUM chart when r = qr.

Simulation-Based Assessment of Proposed Charts
In this section, we will provide a simulation study and the algorithm used to determine the coefficients of control limits. Moreover, the evaluation of the proposed charts and their comparison with the existing Shewhart structures are also presented.

Simulation Settings of COM-Poisson Model
For the evaluation of the proposed EWMA and CUSUM charts and the comparative analysis with existing Shewhart charts, we have used the following data generation model: . . n is the mean function and ν is the dispersion parameter. The X 1 and X 2 are independent auxiliary variables, generated through a standard normal distribution (i.e., X 1 ∼ N(0, 1) and X 2 ∼ N(0, 1)). The parameters β 0 are considered equal to 0.5, β 1 = 0.25, β 2 = −0.5, and the sample size is fixed as 1000 (i.e., n = 1000). Further, for the full coverage COM-Poisson model, varying choices of dispersion parameters were considered (i.e., ν = 0.5, 1.0, 1.5) to assess the performance of the proposed charts.
In the COM-Poisson model, λ i and ν are the fundamental parameters, where the main objective is to detect an increasing shift in λ i at the fixed dispersion parameter ν. Therefore, we evaluated the performance of the control charts by considering the shifts in the λ i , such that the process changes from λ i to λ i + δ Var(r). It is to be mentioned that the choices of δ and ν are made as listed below: Previous studies have proposed different evaluation metrics to assess the predictive performance of control charts. For example, Mahmood [59] used run length (RL) properties such as average run length (ARL) and standard deviation of run length (SDRL) to evaluate the model-based charts under zero-inflated models. For the current study, we are also evaluating the proposed EWMA and CUSUM charts based on the ARL and SDRL metrics. The ARL is the average number of samples until a signal occurs, which is further categorized into in-control ARL (ARL 0 ) and out-of-control ARL (ARL 1 ). The charts are designed based on the fixed ARL 0 and a chart having minimum ARL 1 is declared as the best chart among all others.

Algorithm for Charting Constants
As mentioned in Section 3.2, the control limits of each chart depend on the charting constants such as L S1 , L S2 , L E1 , L E2 , h 1 , and h 2 . The procedure to find the control charting constants for the stated charts at pre-specified ARL 0 = 200 is illustrated in the following steps: i.
Generate a data set of fixed sample size n using simulated model structure given in Section 4.1. ii. Fit the COM-Poisson model on the data set and obtain the deviance residuals (dr) given in (6) and randomized quantile residuals (qr) expressed in (7). Further, estimate the mean and standard error of the dr and qr. iii. For the Shewhart charts, set the values of L S1 and L S2 , and fix L E1 and L E2 for the EWMA chart. Similarly, fix the random values h 1 and h 2 for each CUSUM chart. Further, obtain the control limit(s) and control chart statistic(s) using the estimates from step ii and selected values. iv. In the case of Shewhart control charts, plot the dr and qr values against their respective control limits given in (8). For the EWMA control charts, use the respective dr and qr for obtaining the respective EWMA statistics using (9) and plot it against the respective control limits given in (10). However, in the case of CUSUM control charts, use the respective dr and qr for obtaining the respective CUSUM statistics using (11) and (12), and compare these statistics with their respective decision intervals. v.
Repeat steps i-iv for a large number of runs to obtain specified ARL 0 .
If pre-specified ARL 0 is not obtained, then adjust the afore-mentioned selected values and repeat steps i-v until pre-specified ARL 0 is achieved. It is noted that residuals sometimes showed asymmetry behavior. Therefore, we have set the limits in such a way that each side of the chart shows ARL 0 ≈ 400 to obtain an overall ARL 0 = 200. Further, the control charting constants are reported in Table 3 at different choices of dispersion parameters.

Performance Analysis
In this section, we have presented a comparative performance evaluation of existing and proposed charts schemes using the results obtained from an extensive simulation study with 10 6 iterations. The performance of memory type and memoryless control charts is assessed and reported in terms of ARL and SDRL in Tables 4-6. In addition, the impact of the reference value (K) on the performance of the CUSUM control charts is investigated. Similarly, the impact of the smoothing parameter (λ) on the performance of the EWMA control charts is also evaluated. Figure 3 shows the impact of charting parameters on the performance of the proposed charts.

Analysis Based on Under-Dispersed Data
By assuming ν = 1.5, the detection ability of the proposed EWMA and CUSUM charts is compared with the existing Shewhart charts in Table 4. The findings depict that the charts based on deviance residuals (i.e., DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) are relatively more efficient to detect increasing shifts in the mean. For example, when δ = 2, the ARL of the DR-COM-P Shewhart chart is reported as 26.73, which is lower than the ARL value of 27.08 of the QR-COM-P Shewhart chart. Further, at fixed λ = 0.2, a shift δ = 1 may cause 85 and 87 percent reduction in the ARL of the QR-COM-P EWMA and DR-COM-P EWMA charts, respectively. Similarly, at fixed K = 0.5, a 192.47-unit and 191.42-unit decrease is noted in the ARL of QR-COM-P CUSUM and DR-COM-P CUSUM charts, respectively, due to a shift δ = 2.5.
It is also noted that the EWMA type charts (i.e., QR-COM-P EWMA and DR-COM-P EWMA) have relatively better detection ability compared with CUSUM and Shewhart type charts. For example, at fixed λ = 0.2 and K = 0.5, when δ = 1, the ARL of QR-COM-P Shewhart is reported as 69.32, while ARL values of 29.33 and 32.98 are noted for the QR-COM-P EWMA and QR-COM-P CUSUM charts, respectively. Similarly, a shift δ = 1.5 may cause 79, 93, and 92 percent reduction in the ARL of DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM charts, respectively. charts is assessed and reported in terms of and in Tables 4-6. In addition, the impact of the reference value ( ) on the performance of the CUSUM control charts is investigated. Similarly, the impact of the smoothing parameter ( ) on the performance of the EWMA control charts is also evaluated. Figure 3 shows the impact of charting parameters on the performance of the proposed charts.

Analysis Based on Under-Dispersed Data
By assuming = 1.5, the detection ability of the proposed EWMA and CUSUM charts is compared with the existing Shewhart charts in Table 4. The findings depict that the charts based on deviance residuals (i.e., DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) are relatively more efficient to detect increasing shifts in the mean. For example, when = 2, the of the DR-COM-P Shewhart chart is reported as 26.73, which is lower than the value of 27.08 of the QR-COM-P Shewhart chart. Further, at fixed = 0.2, a shift = 1 may cause 85 and 87 percent reduction in the of the QR-COM-P EWMA and DR-COM-P EWMA charts, respectively. Similarly, at fixed = 0.5, a 192.47-unit and 191.42-unit decrease is noted in the of QR-COM-P CUSUM and DR-COM-P CUSUM charts, respectively, due to a shift = 2.5.
It is also noted that the EWMA type charts (i.e., QR-COM-P EWMA and DR-COM-P EWMA) have relatively better detection ability compared with CUSUM and Shewhart type charts. For example, at fixed = 0.

Analysis Based on Equi-Dispersed Data
The findings of the comparative analysis between the proposed and existing charts at fixed ν = 1 are given in Table 5. The results showed that the QR-COM-P Shewhart and DR-COM-P Shewhart charts depicted almost similar performance. Moreover, the deviance residuals-based charts (i.e., DR-COM-P EWMA and DR-COM-P CUSUM) have better detection ability as compared with quantile residual-based charts (i.e., QR-COM-P EWMA and QR-COM-P CUSUM). For example, at fixed λ = 0.35, when δ = 0.5, the ARL of the DR-COM-P EWMA chart is reported as 36.21, which is lower than the ARL value of 38.11 of the QR-COM-P EWMA chart. Further, at fixed K = 1, a shift δ = 1.75 may cause 97 and 98 percent reduction in the ARL of QR-COM-P CUSUM and DR-COM-P CUSUM charts, respectively.
Similar to the findings of under-dispersed data, the EWMA type charts (i.e., QR-COM-P EWMA and DR-COM-P EWMA) under equi-dispersed data showed relatively better performance compared with Shewhart and CUSUM type charts. For example, at fixed λ = 0.35 and K = 1, a shift δ = 1.25 may cause 92, 96, and 95 percent reduction in the ARL of QR-COM-P Shewhart, QR-COM-P EWMA, and QR-COM-P CUSUM charts, respectively. Similarly, when δ = 1.75, the ARL of DR-COM-P Shewhart is reported as 8.59, while ARL values of 4.70 and 4.92 are noted for the DR-COM-P EWMA and DR-COM-P CUSUM charts, respectively.

Analysis Based on Over-Dispersed Data
By fixing ν = 0.5, the performance of the proposed EWMA and CUSUM charts is compared with the that of Shewhart charts in Table 6. Once again, it is noted that the charts based on deviance residuals (i.e., DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) have relatively better detection ability as compared with the quantile residual-based charts (i.e., QR-COM-P Shewhart, QR-COM-P EWMA, and QR-COM-P CUSUM). For example, when δ = 0.12, the ARL of the DR-COM-P Shewhart chart is reported as 60.21, which is lower than the ARL value of 61.08 of the QR-COM-P Shewhart chart. Further, at fixed λ = 0.5, a shift δ = 0.08 may cause 63 and 66 percent reduction in the ARL of QR-COM-P EWMA and DR-COM-P EWMA charts, respectively. Similarly, at fixed K = 1.5, a 186.21-unit and 187.01-unit decrease is noted in the ARL of the QR-COM-P CUSUM and DR-COM-P CUSUM charts due to a shift δ = 0.22.
Similar to the other findings, under over-dispersion data, the EWMA type charts (i.e., QR-COM-P EWMA and DR-COM-P EWMA) have relatively better detection ability as compared with Shewhart and CUSUM type charts. For example, at fixed λ = 0.5 and K = 1.5, when δ = 0.10, the ARL of QR-COM-P Shewhart is reported as 82.39, while ARL values of 51.00 and 68.21 are noted for the QR-COM-P EWMA and QR-COM-P CUSUM charts, respectively. Similarly, a shift δ = 0.20 may cause 89, 93, and 92 percent reduction in the ARL of DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM charts, respectively.

Impact of Charting Parameters λ and K
The smoothing parameter λ plays a vital role in the detection ability of the EWMA chart. Therefore, for brevity, the performance of the QR-COM-P EWMA chart for three choices of λ (i.e., λ = 0.20, 0.35, and 0.50) at fixed v = 1.5 is plotted in Figure 3a. It is clearly seen that the detection ability of the QR-COM-P EWMA chart increases with the decrease of λ. For example, when δ = 1.5, the ARL values of QR-COM-P EWMA are reported as 16.00, 19.19, and 23.34 against λ = 0.20, 0.35, and 0.50, respectively.
In the CUSUM chart, the reference value K is used to set the detection ability of the CUSUM chart. To notice the effect of K, we have drawn the results of the DR-COM-P CUSUM chart for three choices of K (i.e., K = 0.5, 1.0, and 1.5) at fixed v = 1. The ARL curves revealed an inverse relationship between the detection ability of the CUSUM chart and reference value K. For example, when δ = 0.25, the ARL values of the DR-COM-P CUSUM chart are reported as 79.19, 102.87, and 125.53 against K = 0.5, 1.0, and 1.5, respectively. The impact of charting parameters λ and K is shown above in Figure 3.

Implementation of Proposed Methods for Real-Time Crash Monitoring
In this section, we have applied the COM-Poisson regression to observe the significant factors contributing to the crash occurrences. Further, based on the estimated model, we have implemented the proposed charts for real-time monitoring.
A close view of Figure 1 shows that daily crash counts witness a steady downward trend after December 2018 and do not exceed this point again. This downward trend in observed crash record may be attributed to the implementation and enforcement of a new automated citation system introduced at the start of 2018 under the SAHER program [79]. SAHER is an automated system adopted for controlling traffic using a digital network of cameras connected to the central information center. Hence, crash data to this point act as IC data and are used for establishing control chart structure, and the remaining data are considered for the monitoring phase (in an OOC state).
As discussed in Section 2, the indexed values of road type, road surface conditions, damage type post-crash, weather conditions, road markings, and cat eyes are considered as possible explanatory variables for the crashes. Therefore, to assess the most significant explanatory variables, we have computed pairwise COM-Poisson regression models based on IC data. Out of 63 different models, the following model is considered as the best model based on the minimum values of AIC = 6164.0668, BIC = 6182.9137, and Log − liklihood = −3078.0334.
where IVRT and IVRSC represent the indexed values of road type and road surface conditions, respectively. It is evident from the results of the IC model that the intercept term, road type, and road surface conditions are statistically significant explanatory/predictor variables for the crashes with p-values < 0.01 and standard errors of 0.1052, 0.0266, and 0.0120, respectively. Moreover, it is also observed that the estimated dispersion parameter is observed asν = 0.4169, which is also statistically significant with a p-value of less than 0.01 and a standard error of 0.021. For the development of the control chart setup, we have considered the abovementioned IC model. Further, using the normal probability plots plotted in Figure 4, it is observed that the IVRT is a normally distributed variable with a mean of 1.86 and variance of 0.017 (cf. Figure 4a). Similarly, the road surface conditions also follow a normal distribution, having a mean of 1.81 and variance of 0.08 (cf. Figure 4b). Hence, based on these estimates and following Section 4.1, we have set up the simulation settings of the COM-Poisson model, and control charting constants are obtained using the algorithm given in Section 4.1, and are reported in Table 7. It is noted that we have assumed ARL 0 = 200, λ = 0.50, and K = 1.5.  The randomized quantile residual-based charts (i.e., QR-COM-P Shewh COM-P EWMA, and QR-COM-P CUSUM) and deviance residuals-based charts COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) are implem the crash data set. The QR-COM-P Shewhart, QR-COM-P EWMA, and QR CUSUM charts are plotted in Figure 5, whereas the DR-COM-P Shewhart, DR EWMA, and DR-COM-P CUSUM are given in Figure 6. In every chart, the represents plotting statistics based on the residuals from IC model, while the bl used to show plotting statistics based on OOC residuals. Further, green dotted used to show the control limits of every chart.  The randomized quantile residual-based charts (i.e., QR-COM-P Shewhart, QR-COM-P EWMA, and QR-COM-P CUSUM) and deviance residuals-based charts (i.e., DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) are implemented on the crash data set. The QR-COM-P Shewhart, QR-COM-P EWMA, and QR-COM-P CUSUM charts are plotted in Figure 5, whereas the DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM are given in Figure 6. In every chart, the red line represents plotting statistics based on the residuals from IC model, while the blue line is used to show plotting statistics based on OOC residuals. Further, green dotted lines are used to show the control limits of every chart.
It is observed from Figures 5 and 6 that the QR-COM-P Shewhart chart signaled four false alarms with 26 OOC points, while the same number of signals is detected by the DR-COM-P Shewhart chart. The QR-COM-P EWMA and DR-COM-P EWMA charts show 13 and 14 false alarms, while 76 and 80 OOC signals are reported by QR-COM-P EWMA and DR-COM-P EWMA charts, respectively. Further, 22 and 23 false alarms are reported by the QR-COM-P CUSUM and DR-COM-P CUSUM charts, respectively, while 124 and 126 OOC points are signaled by the QR-COM-P CUSUM and DR-COM-P CUSUM charts, respectively. Symmetry 2021, 13, x FOR PEER REVIEW 17 of 21 Figure 5. Implementation of randomized quantile residual-based charts on crash data. IC, in-control; OOC, out-of-control.   . Implementation of deviance residual-based charts on crash data. Figure 6. Implementation of deviance residual-based charts on crash data.

Summary, Conclusions, and Recommendations
During the past few decades, many studies have proposed a wide range of statistical modeling approaches to explore the relationships between causal factors and crash occurrence. However, very few have focused on developing germane procedures for real-time highway safety surveillance using available crash records. Control charts are useful tools in SPC with numerous applications for active event monitoring. Traditionally, the Poisson distribution is frequently used to interpret the data for a control chart; however, it may be inappropriate to model under-dispersed or over-dispersed data. This study proposes the EWMA and CUSUM control chart scheme for highway safety monitoring using three years of crash data (2017-2019) for rural highways in Saudi Arabia. During the first stage of the study, three well-known count data distribution models (Poisson, negative binomial, and COM-Poisson) were investigated to identify the most appropriate distribution model for the data. The findings showed that the COM-Poisson regression model is the best fitted statistical model.
During the second stage of the study, EWMA and CUSUM type charts based on the residuals (i.e., deviance or randomized quantile residuals) of the COM-Poisson regression model were developed from a crash monitoring perspective. An extensive simulation study was designed to assess the performance evaluation of the proposed control charts scheme and their comparison with the existing Shewhart type chart. The results revealed that the charts based on deviance residuals (i.e., DR-COM-P Shewhart, DR-COM-P EWMA, and DR-COM-P CUSUM) were relatively more flexible and efficient in detecting increasing shifts in the mean. Further, it was noted the EWMA type charts (i.e., QR-COM-P EWMA and DR-COM-P EWMA) outperformed Shewhart and CUSUM type charts in terms of considered evaluation metrics (both ARL and SDRL). The results from the simulation study also indicated an inverse relationship of the reference value (K) smoothing parameter λ with the performance of the GLM-based CUSUM EWMA control charts. Finally, during the third stage of the study, the proposed monitoring methods were successfully implemented on real-time crash data. The findings of this study could provide useful essential guidance to policy-and decision-makers for initiating concrete steps to improve users' road safety. The proposed real-time monitoring for highway safety surveillance can guide on effective and proactive implementation of different hazard control measures to mitigate the occurrence of future traffic crashes. Some of the quick hazard control measures suggested in this regard include measures such as improving the surface conditions, installation of variable message signs for guidance and warnings, raised pavements, delineators on horizontal curves, weather warning systems, ramp metering, and induction of speed and traffic calming measures at crash hotpots, among others.
This study has a few limitations that might be addressed in future studies. It is worth noting that the present study was designed assuming the known parameters. It will be interesting to see the effect of parameter estimation on the charts in forthcoming studies. The performance of the proposed control chart scheme was verified using limited (three years) real-life crash data. Further investigations using other detailed datasets are needed for a more precise assessment of the statistical properties of the suggested monitoring procedures. Future studies could also consider the application of the proposed methods for real-life monitoring of crashes by individual severity groups (fatal, injury, and property damage) or specific crash types. The monitoring scheme based on crash severity classes will be helpful in prioritizing prevention strategies with the emphasis placed on more severe crashes. Future studies could also investigate the impact of auto-correlated response variables. Furthermore, one may adopt advanced charting structures such as moving average, progressive moving average, mixed EWMA-CUSUM, and HEWMA type charts.