2.2. Measuring Recovery Capability
Recovery is defined here as the process of reconstructing study regions so that life and livelihood can return to their pre-disaster states [11]. Recovery capability is therefore represented by the length of time required to return to the pre-disaster status. In our case, the waste-water discharge and waste-gas emission data of key utilities and main factories, which also reflect the factories and residential areas they serve or that surround them, provide a basis for measuring this capability. Under normal conditions, the amount of waste water and gas produced by residential life and local industrial production remains at a stable level. During the flooding period, however, these enterprises, or the factories and residential areas they serve, are inundated, so the hourly waste-water discharge and waste-gas emission are disturbed, producing abnormal records (extremely high or low values). It is therefore reasonable to use the waste-water and waste-gas discharge/emission data to assess recovery capability, since such data reveal whether residential life and local industrial production are in a normal or abnormal state.
To estimate the duration of a recovery process, this study proposes a new approach based on a time-series analysis of the waste-water and waste-gas discharge/emission data disclosed online by key monitored enterprises. Forty-two discharge/emission monitoring points at the sample enterprises record these data every hour (Figure 2). Because most of the discharge/emission records showed a disturbance during the flood period, and the enterprises attributed the disturbance to the flood, it can reasonably be assumed that the discharge/emission time series of an enterprise contains a change if the enterprise was affected by flooding. We therefore applied change detection analysis in the R programming language to extract the beginning and end times of the change, and used them to estimate the duration of the recovery process of each sample enterprise. The time taken by the enterprises to recover is then used to represent the recovery duration of the regions in the study area. The following paragraphs describe the procedure in detail.
To minimize the influence of autocorrelation (the correlation of a signal with itself at different points in time), plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were first examined to determine the degree of autocorrelation. The time series of each sample was then differenced, and the ACF and PACF were rechecked to confirm that the autocorrelation had been controlled. The detailed procedure is given in Figure 3.
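The check, difference and recheck loop can be illustrated with a minimal pure-Python sketch. It assumes first-order (lag-1) differencing and uses the common approximate 95% band of ±1.96/√n for judging ACF values; the helper names `acf`, `difference` and `significant_lags` are hypothetical, the PACF is not computed, and the study's own R workflow in Figure 3 is not reproduced here.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def difference(series, lag=1):
    """Lagged differences y[t] = x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def significant_lags(series, max_lag):
    """Lags (>= 1) whose |r_k| exceeds the approximate 95% band 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag)) if k > 0 and abs(r) > band]

# Hypothetical hourly series: a trend plus an alternating component.
raw = [t + (-1) ** t for t in range(100)]
before = acf(raw, 3)              # check
after = acf(difference(raw), 3)   # difference, then recheck
```

In this toy series, differencing removes the slowly decaying positive autocorrelation caused by the trend, while the recheck exposes a strong negative lag-1 autocorrelation from the alternating component, which is why the ACF and PACF are examined again after differencing.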
Next, an optimal model must be selected for the change detection procedure. Distribution tests showed that most samples do not follow any known distribution, so nonparametric approaches should be used to detect the change. In addition, because this study aims to detect the time points at both the beginning and the end of a change, at least two change points must be detected in the final results. The optimal model can therefore only be selected from nonparametric methods for multiple change point analysis.
Among the nonparametric models for multiple change point detection, those implemented in the changepoint, cpm and ecp packages in R are widely used, with proven effectiveness and efficiency in detecting anomalies in bioinformatics [43], transportation [44], finance [45], ecology [46], and other fields. Each of these nonparametric approaches has its pros and cons [47,48,49], so model selection should be based on model performance in the specific case study. We first extracted change points from the observed time series of the sample enterprises using all three approaches, and then compared the detection results with the real situation to identify the optimal model. After the optimal model was determined, its detection results were further refined for a better estimation of recovery duration.
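To make nonparametric multiple change point detection concrete, the following minimal sketch swaps in a simple, well-known technique: binary segmentation driven by a standardized Wilcoxon rank-sum (rank CUSUM) statistic. It is not the algorithm of the changepoint, cpm or ecp packages, and the `threshold` and `min_size` values are illustrative assumptions rather than settings from this study.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def best_split(segment, min_size):
    """Split index k maximizing the standardized rank-sum statistic of segment[:k]."""
    n = len(segment)
    ranks = average_ranks(segment)
    centre = (n + 1) / 2
    best_k, best_z, cumulative = None, 0.0, 0.0
    for k in range(1, n):
        cumulative += ranks[k - 1] - centre  # W_k - E[W_k]
        if k < min_size or n - k < min_size:
            continue
        z = abs(cumulative) / math.sqrt(k * (n - k) * (n + 1) / 12)
        if z > best_z:
            best_k, best_z = k, z
    return best_k, best_z

def binary_segmentation(series, threshold=3.0, min_size=5):
    """Recursively split segments while the best rank statistic reaches threshold."""
    change_points, stack = [], [(0, len(series))]
    while stack:
        start, end = stack.pop()
        if end - start < 2 * min_size:
            continue
        k, z = best_split(series[start:end], min_size)
        if k is None or z < threshold:
            continue
        change_points.append(start + k)  # first index of the new regime
        stack.extend([(start, start + k), (start + k, end)])
    return sorted(change_points)
```

For a series that shifts away from and back to its normal level, such as `[0.0] * 50 + [10.0] * 30 + [0.0] * 50`, the sketch returns the two change points `[50, 80]`, which correspond to the beginning and end of a disturbance. Ranks make the statistic insensitive to the distribution of the raw values, which is the property that motivates a nonparametric method here.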
The first step is to extract the possible change points related to the flood. According to the precipitation records of all rain gauge stations in the study area, the first record of 24-h precipitation exceeding 50 mm was at 26 June 2015 10:00, and the last was at 30 June 2015 9:00. Therefore, if a flood-induced change actually occurred, its start point can be assumed to lie within that period. Based on this assumption, only change points detected between 26 June 2015 10:00 and 30 June 2015 9:00 are retained as candidates for the start point, and the start of the change caused by flood damage must be one of them. For the end point, because recovery only begins after a disaster strikes, change points detected shortly before or after 30 June 2015 9:00 are treated as the candidates. To identify which candidates are the actual start and end points, we constrained the duration of the changes using the precipitation time series. Because sample enterprises are expected to be disturbed (the change starts) during the downpour and to recover (the change ends) after it, the start and end times of the downpour serve as constraints for determining the exact start and end points. A downpour is defined here according to the Chinese precipitation classification system, under which rainfall reaches downpour level when the 24-h precipitation accumulation exceeds 50 mm and the rain tends to continue.
Accordingly, the start and end times of the downpour were taken from the nearest rain gauge station of each sample monitoring point. The start point of the change at a sample monitoring point is the first candidate immediately after the first downpour record at its nearest rain gauge station, and the end point is the first end-point candidate immediately after the last downpour record at that station.
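The selection rule can be sketched as follows, under several stated assumptions: the 24-h accumulation is computed as a trailing 24-hour sum of hourly rainfall, the "tends to continue" part of the downpour definition is not modeled, "immediately after" is read as "at or after", and the end point must also fall after the chosen start. The functions `downpour_window` and `pick_start_end` and the rainfall values are hypothetical.

```python
from datetime import datetime, timedelta

def downpour_window(hourly_rain_mm, first_hour, threshold_mm=50.0):
    """First and last hours whose trailing 24-h rainfall exceeds threshold_mm."""
    flagged = []
    for i in range(len(hourly_rain_mm)):
        trailing_24h = sum(hourly_rain_mm[max(0, i - 23): i + 1])
        if trailing_24h > threshold_mm:
            flagged.append(first_hour + timedelta(hours=i))
    return (flagged[0], flagged[-1]) if flagged else None

def pick_start_end(candidates, window):
    """Start: first candidate inside the downpour window.
    End: first candidate at or after the last downpour hour (and after the start)."""
    first_rain, last_rain = window
    ordered = sorted(candidates)
    start = next((t for t in ordered if first_rain <= t <= last_rain), None)
    end = next((t for t in ordered
                if t >= last_rain and (start is None or t > start)), None)
    return start, end

# Hypothetical station: 20 mm/h from 10:00 to 13:00 on 26 June 2015.
t0 = datetime(2015, 6, 26, 0)
rain = [0.0] * 72
for h in range(10, 14):
    rain[h] = 20.0
window = downpour_window(rain, t0)
# Hypothetical candidate change points from the detection step.
candidates = [t0 + timedelta(hours=h) for h in (5, 15, 30, 40)]
start, end = pick_start_end(candidates, window)
```

Here the trailing sum first exceeds 50 mm at 12:00 on 26 June and last exceeds it at 10:00 on 27 June, so the candidate at hour 15 becomes the start and the candidate at hour 40 becomes the end.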
Next, the detection results must be validated, which is an important but often difficult step in change detection. In this study, validation is used to compare the performance of the three approaches and to determine which is optimal for change detection. Because a sample monitoring point shows a change only when the sample area or its surroundings are damaged by flooding, we compared the final change detection results of the sample enterprises with the flood-damaged area. For this purpose, a 100-m buffer was created around each sample monitoring point and checked for intersection with the damaged area. Damaged areas are defined by the inundation map created with the ArcHydro toolbox in ArcGIS software from the water-level data of gauge stations (Figure 4). The better the detected changes agree with the buffers that intersect the damaged area, the better the detection results. The detection accuracy is calculated using Equations (1) and (2), based on the real situation (damage or non-damage) and the detection result (change or non-change), as shown in Table 1.
Validation identifies the best of the three approaches. However, because any of the three detection methods may produce some misdetections, the results of the best approach are further modified to lower the misdetection rate. Misdetection arises from two mismatching conditions: samples located in the damaged area but with no change detected (damage but non-change samples), and samples with detected changes but located in non-damaged regions (non-damage but change samples). For samples in the first condition, the detection results of the other two approaches (mvc and cpm) are checked, and when both approaches yield similar results, these are used as substitutes. For samples in the second condition, the changes detected by the selected method are compared with the results of the other two approaches; when those two methods yield similar results, their common result replaces the one detected by the selected method.
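A minimal sketch of the validation tally and the correction rule is given below. Because Equations (1) and (2) are not reproduced in this text, overall accuracy is assumed here to be the share of samples whose detection matches the real situation. What counts as "similar" results is also not specified, so the sketch assumes that two results agree when both report no change, or when their start and end points differ by at most a hypothetical `tol_hours`, and it takes the mean of the two agreeing results as their common result. Results are hours since some reference time, or `None` for no change.

```python
def similar(a, b, tol_hours=6.0):
    """Hypothetical agreement rule: both 'no change', or start and end
    points within tol_hours of each other. Results are None or (start_h, end_h)."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a[0] - b[0]) <= tol_hours and abs(a[1] - b[1]) <= tol_hours

def tally(damaged, results):
    """2x2 table of real situation (damage / non-damage) vs detection (change / non-change)."""
    cells = {"damage_change": 0, "damage_nonchange": 0,
             "nondamage_change": 0, "nondamage_nonchange": 0}
    for d, r in zip(damaged, results):
        key = ("damage" if d else "nondamage") + ("_change" if r is not None else "_nonchange")
        cells[key] += 1
    return cells

def overall_accuracy(cells):
    # Assumed definition: share of samples whose detection matches reality.
    return (cells["damage_change"] + cells["nondamage_nonchange"]) / sum(cells.values())

def correct(damaged, best, alt1, alt2, tol_hours=6.0):
    """Replace misdetected results of the best method when the other two agree."""
    corrected = []
    for d, r, a1, a2 in zip(damaged, best, alt1, alt2):
        misdetected = (d and r is None) or (not d and r is not None)
        if misdetected and similar(a1, a2, tol_hours):
            # Common result taken as the mean of the two alternatives (assumption).
            r = None if a1 is None else ((a1[0] + a2[0]) / 2, (a1[1] + a2[1]) / 2)
        corrected.append(r)
    return corrected

# Hypothetical samples: damaged = buffer intersects the inundation map.
damaged = [True, True, False, False]
best = [(10, 40), None, (5, 20), None]
alt1 = [(11, 41), (12, 50), None, None]
alt2 = [(10, 39), (13, 52), None, (3, 8)]
fixed = correct(damaged, best, alt1, alt2)
```

In this toy example, the second sample (damage but non-change) takes the agreeing alternative result, and the third sample (non-damage but change) is reset to no change, so the assumed accuracy rises from 0.5 to 1.0.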
Note also that the changes detected in the previous step consist of two parts: the duration of the disturbance caused by flood damage, and the time a sample monitoring point takes to return to a stable status. To simplify the calculation, the disturbance duration of a sample monitoring point is assumed to equal the length of time during which the precipitation record of its nearest rain gauge station remains at downpour level. Under this assumption, the return time can be calculated with Equation (3).
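$$T_i = t_i^{\mathrm{c}} - t_i^{\mathrm{d}} \qquad (3)$$
where
$T_i$: time taken by the $i$th sample monitoring point to return;
$t_i^{\mathrm{c}}$: end time point of the detected change at the $i$th sample monitoring point;
$t_i^{\mathrm{d}}$: end time point of the downpour for the $i$th sample monitoring point.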
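Equation (3) amounts to subtracting two timestamps. The sketch below applies it to a few monitoring points; the point identifiers and times are hypothetical, and hours are used as the unit by assumption.

```python
from datetime import datetime

def return_time_hours(change_end, downpour_end):
    """Equation (3): return time = end of detected change - end of downpour (hours)."""
    return (change_end - downpour_end).total_seconds() / 3600.0

# Hypothetical monitoring points: (end of detected change, end of downpour).
points = {
    "P01": (datetime(2015, 7, 1, 21), datetime(2015, 6, 30, 9)),
    "P02": (datetime(2015, 6, 30, 15), datetime(2015, 6, 30, 9)),
}
return_times = {pid: return_time_hours(*ends) for pid, ends in points.items()}
```

Here P01 returns 36 h after its downpour ends and P02 returns after 6 h.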
Because the sample enterprises are power plants, sewage treatment plants and main factories, their status can serve as an important index of production and living conditions in the neighboring area, and their return duration can, to some extent, represent the recovery capability of that area, particularly where the area consists of built-up land, which is derived here from the artificial land cover of the GLOBELAND30 dataset [50]. Therefore, to evaluate the recovery capability of the study area, we mapped the return time over the built-up area based on the results calculated for the sample enterprises.
2.3. Connecting the Measurements of Recovery to Resilience
The existing literature has shown that infrastructure recovery capability can serve as an important proxy for community recovery capability [24,33,34,35,51] and provide a reference when assessing community resilience to natural disasters [52]. Therefore, in this step, the measured infrastructure recovery capability is used as an external validation metric to identify the dominant resilience factors in the study area. To this end, a number of variables were collected to represent the multi-dimensional nature of disaster resilience. The variables were selected from the social, economic, infrastructural, and environmental dimensions, which are commonly reported components of resilience [53]. Because several variables, such as the demographic data, came from census statistics at the sub-district (or town) level, the other continuous variables were aggregated to the same level to match the scale of the data. The final list of selected variables is given in Table 2.
The first component covers social aspects, integrating age distribution, sex ratio, informal settlers, and health service. Studies of these demographic subcomponents suggest that regions with fewer elderly people, children, or women, and with a low level of migration, are more likely to have higher community resilience and thus recover faster after a natural disaster [54,55]. The health service variable is represented by the share of health care facilities within a sub-district (or town); a higher share suggests a higher standard of living for local residents, which may promote pre-disaster preparedness and post-disaster recovery [12,16]. The second component focuses on economic indicators, combining indices of industrial development and economic stability. Its key indicators are the urban to rural ratio, GDP per capita, share of central business district (CBD), and manufacturing density. A higher urban to rural ratio and a greater GDP per capita in most cases indicate greater diversity and higher economic stability, which strengthen the economic component [11,16]. The share of CBD and the manufacturing density represent the commercial and manufacturing establishments that determine the exposure of economic assets to natural hazards, implying a longer recovery time once they are damaged [11,16,56]. The third component summarizes infrastructure conditions; it is designed to evaluate preparedness for resisting damage before a disaster, as well as the rapidity and redundancy of response during a disaster and of recovery after it. Indicators of this component, such as access to administration centers, hospitals, and open spaces, assess the capability of communities to deal with emergencies [55,57]. Road density is also included to represent pre-disaster evacuation capability and the ability to respond efficiently and recover quickly after disasters [11,12]. Environmental resilience is the last component of disaster resilience, and its variables describe ecological conditions such as river density, urban green level, elevation, and slope. Previous studies have found that a larger amount of urban green area can improve ecological conditions [15], while lower river density and a flatter surface reduce the risk of storm surge inundation and secondary landslides [12]. Lower elevation and slope also improve accessibility and ease rescue work [13,58].
After the variables representing disaster resilience were determined, the map of recovery time was spatially intersected with, and aggregated to, the sub-district (or town) level to generate validation metrics at the same scale. The mean recovery duration per sub-district (or town) was then calculated to represent its recovery capability; a longer recovery duration implies a lower recovery capability.
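Once a GIS spatial join has tagged each return-time value with its sub-district (or town), the aggregation step reduces to a grouped mean. The minimal sketch below assumes such tagged pairs are already available; the function name `mean_recovery_by_unit` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_recovery_by_unit(records):
    """records: iterable of (unit_name, return_hours) pairs from a prior spatial join."""
    grouped = defaultdict(list)
    for unit, hours in records:
        grouped[unit].append(hours)
    return {unit: mean(values) for unit, values in grouped.items()}

# Hypothetical return times over built-up cells in two sub-districts.
records = [("A", 10.0), ("A", 30.0), ("B", 50.0)]
unit_means = mean_recovery_by_unit(records)
```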
To validate the indicators of disaster resilience, regression analysis was applied between the external validation term, represented by the recovery capability of each sub-district (or town), and the variables collected as proxies for disaster resilience. To prepare the independent variables, all sub-district (or town) data were standardized through a “Min-Max” conversion, yielding variables rescaled to the 0–1 range. Variables identified as highly correlated (Spearman’s R > 0.700) were eliminated from further consideration to avoid collinearity problems. The return speed of the water level within 24 h after the strong rainstorm was included in the regression as a control variable, to avoid confounding effects arising from the intrinsic relationship between flood intensity and flood recovery. This control variable also separates the recovery process after flood recession from the total recovery process, which comprises both the recession of the floodwater and the restoration of people’s lives and livelihoods. Because the latter has been documented to be largely affected by the pre-existing level of community resilience [11,23], it is more reasonable to use it as the external term for validating resilience metrics.
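The variable preparation can be sketched as Min-Max rescaling followed by a Spearman-based filter. Spearman’s coefficient is computed here in the standard way, as the Pearson correlation of average ranks. The text does not say which member of a highly correlated pair was dropped, so the sketch assumes a greedy rule that keeps variables in listed order and compares absolute coefficients against 0.700; the variable names and values are hypothetical.

```python
import math

def min_max(values):
    """Rescale to the 0-1 range; a constant variable maps to zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def select_uncorrelated(variables, limit=0.700):
    """Greedy filter (assumption): keep a variable only if |rho| <= limit with every kept one."""
    kept = []
    for name, values in variables.items():
        if all(abs(spearman(values, variables[k])) <= limit for k in kept):
            kept.append(name)
    return kept

# Hypothetical sub-district variables.
raw_vars = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [2, 5, 1, 4, 3]}
scaled = {name: min_max(values) for name, values in raw_vars.items()}
kept = select_uncorrelated(scaled)
```

Because Spearman’s coefficient depends only on ranks, rescaling before or after the correlation check gives the same result; here `b` is a monotone copy of `a` and is dropped.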
Because preliminary testing showed that the variables violated the linearity and normality assumptions of linear regression, ordinal logistic regression was applied to assess the association among recovery capability, the selected resilience variables and the control variable. Ordinal logistic regression extends binary logistic regression, which handles a dichotomous dependent variable, to more than two ordered response categories. To prepare the dependent variable, the fifty-eight sub-districts (or towns) in the study area were therefore classified into four classes, coded from 1 to 4: 4 indicates a sub-district that could recover within 24 h, representing the highest recovery capability; 3 indicates the second-highest recovery capability (24 h ≤ recovery time < 48 h); 2 represents the third rank (48 h ≤ recovery time < 72 h); and 1 represents the lowest class, which takes 72 h or longer to recover.
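The class coding follows directly from the stated thresholds. The minimal sketch below applies it to hypothetical mean recovery durations. It assumes that a sub-district taking exactly 72 h falls into class 1, which is consistent with the half-open intervals used for classes 2 and 3.

```python
def recovery_class(hours):
    """Ordinal recovery class: 4 (< 24 h), 3 (24-48 h), 2 (48-72 h), 1 (>= 72 h)."""
    if hours < 24:
        return 4
    if hours < 48:
        return 3
    if hours < 72:
        return 2
    return 1

mean_hours = {"A": 10.0, "B": 24.0, "C": 60.5, "D": 96.0}  # hypothetical sub-districts
classes = {name: recovery_class(h) for name, h in mean_hours.items()}
```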
Unlike linear regression analysis, logistic regression models the log odds of the outcome as a linear combination of the predictor variables (Equation (4)). The outcome is the probability of one specific event occurring. In this study, the highest recovery class (Class 4) was assigned as the reference class, and an event was considered to occur when there is movement from one recovery class to the next.
$$\ln\left(\frac{P(Y=i)}{P(Y=K)}\right) = \beta_{i0} + \sum_{j=1}^{J} \beta_{ij} x_j \qquad (4)$$
In which,
$P(Y=i)$: the probability of observing the particular dependent variable value $i$ in the sample;
$P(Y=K)$: the probability of the reference dependent variable value $K$ in the sample;
$\beta_{i0}$: the multinomial logit estimate for observing $i$ relative to observing $K$ in the sample when the independent variables in the model are evaluated at zero;
$x_j$: the $j$th independent variable ($j = 1, \dots, J$);
$\beta_{ij}$: the coefficient between the $j$th independent variable and the natural log of the odds of the dependent variable equaling $i$ (relative to $K$), usually estimated by maximum likelihood.
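Inverting the reference-category log-odds relation of Equation (4) gives the predicted probability of each recovery class: each non-reference class $i$ receives $e^{\eta_i} / (1 + \sum_m e^{\eta_m})$ and the reference class $K$ receives $1 / (1 + \sum_m e^{\eta_m})$, where $\eta_i$ is the right-hand side of the equation. The sketch below shows this arithmetic only. The coefficients and predictor values are hypothetical, and no model fitting is performed.

```python
import math

def class_probabilities(intercepts, coefs, x):
    """Invert ln(P(Y=i)/P(Y=K)) = b_i0 + sum_j b_ij x_j for each non-reference class i.
    Returns probabilities for the non-reference classes followed by reference class K."""
    logits = [b0 + sum(b * xj for b, xj in zip(row, x))
              for b0, row in zip(intercepts, coefs)]
    denom = 1.0 + sum(math.exp(eta) for eta in logits)  # reference class has logit 0
    return [math.exp(eta) / denom for eta in logits] + [1.0 / denom]

# Hypothetical coefficients for classes 1-3 relative to reference class 4,
# with two standardized predictors.
intercepts = [-1.0, -0.5, 0.2]
coefs = [[2.0, -1.0], [1.0, 0.5], [0.3, 0.3]]
probs = class_probabilities(intercepts, coefs, [0.4, 0.7])
```

The probabilities always sum to one, and the log of each non-reference probability over the reference probability reproduces the corresponding linear predictor.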