Spatial Assessment of Urban Flood Susceptibility Using Data Mining and Geographic Information System (GIS) Tools

Using geographic information system (GIS) tools and data-mining models, this study analyzed the relationships between flooded areas and related hydrological factors to map the regional flood susceptibility of the Seoul metropolitan area in South Korea. We created a spatial database of factors including topography, geology, soil, and land use. We used 2010 flood data for training and 2011 data for model validation. Frequency ratio (FR) and logistic regression (LR) models were applied to the 2010 flood data to determine the relationships between the flooded area and its causal factors and to derive flood-susceptibility maps, which were validated using the area flooded in 2011 (not used for training). In the validation, the FR and LR models had accuracies of 79.61% and 79.05%, respectively. In terms of sustainability, floods affect water health and cause economic and social damage. These maps will provide useful information to decision makers for the implementation of flood-mitigation policies in priority areas in urban sustainable development and for flood prevention and management. Further analysis that includes data on economic and social activities, proximity to nature, population, and building density would make it possible to improve sustainability.


Introduction
The increase in global temperatures, seasonal shifts, and precipitation increases are just a few of the many consequences of climate change that are currently experienced on a daily basis [1,2]. Heavy rains, typhoons, floods, and other meteorological conditions cause variation in the hydrological system and increase the pressure on drainage systems, waterworks, and sewerage facilities in urban areas [3,4]. In addition, an increase in impermeable surfaces such as buildings and roads can lead to an exponential increase in surface flow by decreasing the absorption and storage of rainwater below the surface [3]. Subsequently, with a higher concentration of rainwater in rivers, flooding risks increase substantially. Therefore, it is vital to assess and prevent these risks in urban areas to promote sustainability. Countermeasures to prevent urban flooding, such as the extension of drainage pipelines and the improvement of undercurrent facilities and pumping stations, are vital; they require that information on susceptible areas be obtained prior to flooding. Hence, flood susceptibility assessment is required for decision making and prioritization in order to minimize flood damage.
Recent advances in technology, including the spread of the Internet and mobile devices, have led to an exponential increase in the accumulation of data, whilst the development of hardware and distributed-parallel processing technology has improved our ability to accumulate data, providing the basis for data mining [5]. In order to increase the utilization of large databases, it is important to make reasonable and rapid decisions, and to find significant information that can support decision making [6]. In this environment, modeling is evolving toward data-based learning instead of laboratory-based learning. Data-mining techniques are useful in elucidating the mechanisms between certain events and related variables [7]. In this paper, data mining is performed to analyze the correlation between flood data and related factors used in previous research.
The Seoul metropolitan area, the capital and largest city of the Republic of Korea, suffers from extensive flood damage attributable to heavy rainfall during the summer. Flood susceptibility maps are essential to prevent and skillfully manage flood disasters. In this study, we generated flood susceptibility maps for the Seoul metropolitan area using data-mining models in a geographic information system (GIS) environment. Many studies have applied GIS and remote-sensing techniques to produce thematic maps related to geomorphology, land cover, drainage patterns, geology, and soils [8][9][10].
Because floods depend on a complex set of related factors, it is important to use data-mining techniques to make a suitable assessment of flood susceptibility. Various data-mining approaches have been used for decades to predict and assess flood occurrence [11,12]. Soft computing or statistical data-mining techniques have recently been used, such as decision-tree based models [13][14][15], artificial neural network (ANN) based models [16][17][18], support-vector machines [19][20][21], and the logistic regression (LR) model [22,23]. In addition, a basic statistical model, the frequency ratio (FR), has been used to observe trends between flood-related factors and flood events [24][25][26][27]. More in-depth assessments have incorporated numerical modeling and analytic hierarchy process analysis [28][29][30]. Some studies have demonstrated hydraulic models or created additional thematic maps by mixing geophysical surveys with GIS [31][32][33] and remote-sensing techniques [34,35]. In this study, flood susceptibility analysis was performed using the FR and LR models, which are the most basic statistical data-mining models, for the case study in the Seoul metropolitan area.
Individual efforts to predict floods related to climate change and to design measures to efficiently prevent flooding include studies of climate change, the correlation of rainfall and flooding, and statistical analysis methodology for floods [36][37][38][39][40][41][42]. Studies mapping flood hazards [43][44][45] and analyzing the effects of urban growth or other factors on flooding [46,47] have also been conducted. The purpose of this study is to map the regional flood susceptibility of the Seoul metropolitan area in South Korea using data-mining and GIS techniques. We contribute to this body of work by generating susceptibility maps based on FR and LR models as a case study in this area. These models were then validated to assess the sensitivity of the Seoul metropolitan area to floods.

Study Area
Seoul is located in the middle of the Korean peninsula, covering an area of 605.41 km² at 126°59′40″ E, 37°33′59″ N (Figure 1). Although the city accounts for less than 1% of the total area of South Korea, 2011 census statistics indicated that the population had reached approximately 10 million citizens, or approximately one-quarter of the total population of South Korea. Due to the high frequency of temporary population migration, the city has a highly developed traffic system, which is vulnerable to serious damage during natural disasters such as landslides [1].
Seoul has a coastal climate, with an average annual temperature (1981-2010) of 12.5 °C. Although the cold high-pressure system of the continent affects the city in winter, the influence of the Sea of Okhotsk in summer typically leads to a seasonal temperature difference of 30 °C. The average annual precipitation is 1450 mm, and most precipitation occurs during the summer. The city is situated in a natural basin surrounded by numerous peaks over 300 m a.s.l. The highest peaks are Bukhansan (836 m), Dobongsan (740 m), Inwangsan (338 m), and Gwanaksan (629 m). Historically, these mountains have served as a natural fortress, protecting the city from damage. Other important mountains in the region are Buaksan (342 m), which protects the presidential palace of Cheongwadae, and Namsan (232 m), the old city shield of Seoul. The city is divided into Gangbuk and Gangnam by the Han River, which is one of four major rivers in Korea. Gangbuk developed in the valley between Dobongsan and Bukhansan. The region where the Cheonggyecheon, Jungnangcheon, and Han Rivers join is an alluvial plain. Geologically, the bedrock of Seoul is mainly granite and gneiss, which greatly affects the topography and soil. In the western part of the Gangbuk area, there are mainly banded gneisses; the eastern part contains mainly Daebo granite. In the Gangnam region, banded gneiss is present in the east, whereas a mixture of Daebo granite or granite gneiss and schist dominates in the west (Figure 2).

Spatial Datasets
Flood susceptibility is commonly determined using a number of factors such as hydrological factors, topographical characteristics such as surface and slope types, rivers, meteorological factors, soil and vegetation characteristics, and land-use or development status. In this study, we selected the 11 hydrological factors expected to be related to flood susceptibility and applied these to FR and LR models (Table 1, Figure 3). We identified areas experiencing frequent floods and, using an administrative area-based survey, generated an inventory map containing 4338 flooded zones within the entire area. Different sample sizes have been suggested in the literature for this type of analysis [48]. The flood-related factors considered in this study were collected and extracted from thematic maps provided by the South Korean government. The input data used in this study were factors related to topography, geology, land use, forests, and soil (Table 1).
Topography plays an extensive role in hydrogeological systems [49]. We created a digital elevation model (DEM) by applying the triangulated irregular network (TIN) method to contours digitized from a topographical map. The TIN was converted into a raster at a resolution of 5 × 5 m after removing the sink area, or internal drainage, from the elevation grid. The slope gradient was then calculated from the DEM using the ArcGIS 10.0 software. The topographic wetness index (TWI), stream power index (SPI), and slope length factor (SLF) were also calculated from the DEM using the ArcView software (Version 10.1, Esri, Redlands, CA, USA). We extracted topographical factors because flood occurrence is influenced by the geomorphological characteristics of the region. Topography is an important element in the spatial variation in hydrological conditions [50]. We calculated the distance from the river using topographic maps.
The SPI, an index of the erosive power of a stream, can be used to describe potential stream erosion at a given point on the surface of the terrain. As the area and slope of the watershed increase, the amount of water contributed by the upslope area and the water flow rate increase, causing an increase in the SPI value. The SPI can be defined [40] as:
$$\mathrm{SPI} = A_s \tan\beta \quad (1)$$
where $A_s$ is the specific catchment area and $\beta$ is the local slope gradient in degrees. TWI is a topographic factor used in the run-off model [51], and is defined as
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \quad (2)$$
where $\alpha$ is the cumulative upslope area draining through a point (per unit contour length) and $\beta$ is the slope angle at that point. TWI, the lateral transmission coefficient on the ground surface, indicates the probability of water flowing to a specific point on a local slope [51]. The TWI reflects the accumulation trend of water at any point in the watershed in terms of $\alpha$ and reflects the tendency of gravity to move the water along a slope, with $\tan\beta$ representing the approximate hydraulic gradient. Water infiltration depends mainly on the permeability and other properties of the material such as pore pressure, which affects soil strength [52].
The SLF for average erosion given a slope length $\lambda$, which is in feet (i.e., imperial units), varies as:
$$L = \left(\frac{\lambda}{72.6}\right)^{m} \quad (3)$$
where 72.6 is the revised universal soil-loss equation unit plot length in feet and $m$ is a variable slope-length exponent [52]. The exponent is related to the slope-length index $\beta$ (not to be confused with the slope angle in Equations (1) and (2)), which is the ratio of rill erosion caused by flow to inter-rill erosion due to raindrop impact, as shown in the following equations [53]:
$$m = \frac{\beta}{1 + \beta} \quad (4)$$
$$\beta = \frac{\sin\theta / 0.0896}{3.0(\sin\theta)^{0.8} + 0.56} \quad (5)$$
where $\theta$ is the slope steepness angle.
From the land-use map, the type of impermeable land use, agricultural land use, and retarding basin factors were extracted. We classified the regions into six types of impermeable land-use areas: residential, manufacturing, commercial, recreational, traffic, and public. In addition, soil-drainage and geology data were derived from the soil map and geological map, respectively. Thematic maps that include flood-related factors were acquired in vector format using the ArcGIS package. The 11 related factors (Table 1) calculated or extracted during this procedure were resampled to raster data at a 10 m spatial resolution and converted into a GRID file. We constructed a spatial database with a total of 11,222,445 cells (3705 rows × 3029 columns). There were a total of 127,087 flooded cells for 2010 and 65,964 for 2011. All of the factors were converted into ASCII format to apply the LR model. Before calculating the frequency ratios of the types and categories of each factor, the continuous factors were classified into 10 equal-area classes. The 10 classes were determined automatically so that each contained a similar number of cells.
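The three DEM-derived indices described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the ArcGIS/ArcView workflow used in the study: the function names, the `min_tan` floor for flat cells, and the toy input values are hypothetical, and the exponent relation m = β / (1 + β) is taken from the standard revised universal soil-loss equation formulation.

```python
import numpy as np


def stream_power_index(spec_catchment_area, slope_deg):
    """SPI = A_s * tan(slope), with slope given in degrees."""
    return spec_catchment_area * np.tan(np.radians(slope_deg))


def topographic_wetness_index(spec_catchment_area, slope_deg, min_tan=1e-6):
    """TWI = ln(alpha / tan(slope)).

    min_tan is an assumed floor that avoids division by zero on flat cells.
    """
    tan_slope = np.maximum(np.tan(np.radians(slope_deg)), min_tan)
    return np.log(spec_catchment_area / tan_slope)


def slope_length_factor(slope_length_ft, slope_deg):
    """L = (lambda / 72.6) ** m, with m = beta / (1 + beta).

    beta is the rill to inter-rill erosion ratio computed from the slope
    steepness angle; slope_length_ft must be in feet.
    """
    sin_theta = np.sin(np.radians(slope_deg))
    beta = (sin_theta / 0.0896) / (3.0 * sin_theta ** 0.8 + 0.56)
    m = beta / (1.0 + beta)
    return (slope_length_ft / 72.6) ** m


# Toy 2 x 2 rasters (hypothetical values, not data from the study).
area = np.array([[10.0, 50.0], [200.0, 1000.0]])   # specific catchment area
slope = np.array([[2.0, 5.0], [10.0, 20.0]])       # slope in degrees
length = np.array([[30.0, 72.6], [150.0, 300.0]])  # slope length in feet

spi = stream_power_index(area, slope)
twi = topographic_wetness_index(area, slope)
slf = slope_length_factor(length, slope)
```

Keeping the slope in degrees at the function boundary mirrors the way the text states the variables; the conversion to radians happens once inside each function.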

Method
Effective decision making requires both data and predictions based on potential conditions. Therefore, we applied the principle of conditional probability and selected an FR model to determine the spatial relationship between flooded areas and flood-related factors. We subdivided each data layer into classes within the study area, based on the percentage of the total area that was flooded [54]. In Equation (6), P(P) represents the area ratio of a class or type to a given number of unit cells, including the domain fraction of the class; P(O) is the percentage of flood occurrence within the class [55]. Then the FR value of the type or class of each factor C is:
$$\mathrm{FR}_C = \frac{P(O)}{P(P)} \quad (6)$$
We determined the correlation between flood occurrence and each flood-related factor using the FR model. A high correlation with flooding was represented by a value greater than 1 and a low correlation by a value less than 1. We then used these correlations to generate flood-susceptibility maps. Using the GIS, we reclassified all of the continuous raster factors into 10 classes on the basis of equivalent areas and calculated the FR for each class or type of each factor [56]. Finally, we determined flood susceptibility using the FR model, summing the FR maps of all factors with the raster calculation functions in the GIS.
The LR model is a multivariate analysis model useful for making predictions based on input variables. The purpose of LR is to explain the relationship between the dependent and independent variables [57]. In this study, the input into the LR model included observed values and the presence/absence of predictive variables, indicated by binary numbers. The dependent variable of the LR model was a categorical or binary number. An advantage of the LR model is that the independent variables can be either continuous or categorical, and additional link functions can be added to a general linear regression model to allow for continuous, discrete, or a combination of variable types. In this study, the dependent variable was the location of the flood and the independent variables were flood-related factors. We used the SPSS 21 statistical package to calculate the coefficients for each factor in the LR model [58]. The relationship between flood occurrence and the input factors can be conceptualized as:
$$P = \Lambda(z) \quad (7)$$
where $\Lambda$ is the logistic function. Then, the LR model is:
$$P = \frac{1}{1 + e^{-z}} \quad (8)$$
where $P$ is the estimated probability of the event, or susceptibility to flooding based on intrinsic properties [59]. The resulting curve is S-shaped, whereas $z$ represents a linear combination. Therefore, the LR model implies that the data match the following equation:
$$z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \quad (9)$$
where $b_0$ is the intercept, the $b_i$ ($i = 1, 2, \ldots, n$) values are the slope coefficients for the LR model, and the $x_i$ ($i = 1, 2, \ldots, n$) values are the independent variables [49]. The fitted linear model becomes an LR that gives the presence/absence of a flood from the values of the independent variables that were related to flood occurrence in previous years.
Figure 4 shows a flowchart outlining the methodology used in this study. Areas flooded in 2010 and 2011 were used to demonstrate the application and validity of the resulting flood-susceptibility maps (Figures 1 and 2). Areas flooded in 2010 were used as the dependent variable and the hydrological factors were used as independent variables. On the basis of the spatial database, a total of 11 factors covering topography, soil, geology, and land use were selected as appropriate independent variables for the model. Therefore, the calculated or extracted factors and the 2010 flooded areas were included in the LR model. To compare the results of the LR model with those of the probabilistic model, we applied an FR model in the flood-susceptibility analysis. The SPSS 20 statistical package was used in combination with GIS software to run these models. We validated the susceptibility map for the study area using areas flooded in 2011. To calculate the LR coefficients for the flood-related factors, we converted the spatial database to ASCII files using ArcGIS 10.0 and SPSS 20. The z values were then obtained in the GIS environment by multiplying each variable by its coefficient and summing the results.
We created flood-susceptibility maps by gathering flood-related data from topographic, geological, soil, and land-use maps, and constructing a spatial database describing the areas flooded in 2010. The database was established in a grid format and flood-susceptibility assessment was performed using the LR and FR models. After the study area was selected, we isolated the flooded areas for 2010 and 2011. The 2010 flooded area was used for training and the 2011 flooded area was used for validation.
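As a rough illustration of how the LR step turns factor rasters into a probability surface, the sketch below builds z as an intercept plus a coefficient-weighted sum of factor layers and passes it through the logistic function. It is a minimal sketch, not the SPSS/ArcGIS procedure used in the study: the function names, the two factor layers, and every coefficient value are hypothetical placeholders rather than fitted results from Table 3.

```python
import numpy as np


def logistic(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def flood_probability(factors, coefficients, intercept):
    """Compute P = logistic(z) with z = intercept + sum(b_i * x_i).

    factors:      dict mapping a factor name to a 2D raster (NumPy array)
    coefficients: dict mapping the same names to slope coefficients b_i
    intercept:    constant term of the linear combination
    """
    shape = next(iter(factors.values())).shape
    z = np.full(shape, intercept, dtype=float)
    for name, raster in factors.items():
        z += coefficients[name] * raster
    return logistic(z)


# Hypothetical 1 x 2 rasters and coefficients, for illustration only.
factors = {
    "twi": np.array([[5.0, 12.0]]),
    "slope": np.array([[20.0, 2.0]]),
}
coefficients = {"twi": 0.2, "slope": -0.05}
prob = flood_probability(factors, coefficients, intercept=-2.0)
```

In this toy case the second cell (wetter and flatter) receives the higher probability, which is the qualitative behavior the text describes for TWI and slope.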

Relationships between Flooded Area and Related Factors
The results of the FR model showed the relationship between the input factors from the geospatial database and the flooded area data used for training. Flooding was shown to occur at low altitudes and in regions with gentle slopes. In this study, topographical factors calculated from the DEMs were strongly correlated with flood susceptibility (Table 2), which indicates high flood susceptibility in these regions. Water flowed downslope from regions with high potential, where water is recharged, to those with low potential. Slope was an important factor influencing the balance between infiltration and runoff. The FR for slopes below 8.58° exceeded 1.0, whereas steeper slopes had FR values below 1.0. Infiltration was inversely proportional to the slope, and there was more infiltration and less runoff in regions with gentle slopes [60]. The FR of the SLF showed a trend similar to that of the slope gradient. As a result, the rainfall runoff rate increased, the infiltration rate decreased, and flood susceptibility decreased where elevation was high and slope was steep. Conversely, low elevation and gentle slope produced the opposite results. A higher TWI value indicates a larger area of lower slope; therefore, the higher the TWI, the greater the susceptibility to floods. There was a positive correlation between TWI and flood occurrence and a negative correlation between distance from the river and flood occurrence. This implies that flood susceptibility decreases with distance from the river.
For the impermeable land-use areas, the frequency ratio was higher in residential and traffic areas and lower in manufacturing and recreational areas. Among agricultural land-use types, non-green farm area (1.11) and barren ground (0.78) had relatively high FR values, followed by field (0.39) and paddy (0.33) areas. In the retarding basin area, the relationship with flood occurrence was very weak, with an FR value of 0.27. FR values were higher in poorly, somewhat poorly, and moderately well-drained soil areas (>1.0), and lower in well-drained soils (Table 2). Among the seven geological unit classes (porphyroblastic gneiss, limestone, alluvium, schist, garnet-bearing granite gneiss, banded gneiss, and granitoid granite gneiss), the FR values were higher for alluvium (2.03), garnet-bearing granite gneiss (1.82), and schist (1.79). Limestone and banded gneiss showed lower FR values (<1.0), while the others (porphyroblastic gneiss and granitoid granite gneiss) showed zero. Overall, the FR results showed a correlation between flood occurrence and each factor.
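The per-class FR values discussed above follow directly from cell counts. The sketch below, which is an illustration and not the authors' GIS procedure, first bins a continuous factor into roughly equal-area classes using quantiles and then computes, for each class, the share of flooded cells divided by the share of all cells. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def equal_area_classes(values, n_classes=10):
    """Assign each cell to one of n_classes quantile bins.

    Quantile edges give each class a similar number of cells, which
    approximates the equal-area classification described in the text.
    """
    inner_quantiles = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    edges = np.quantile(values, inner_quantiles)
    return np.digitize(values, edges)  # class ids 0 .. n_classes - 1


def frequency_ratio(class_ids, flooded):
    """FR per class = (share of flooded cells in class) / (share of all cells in class)."""
    class_ids = np.asarray(class_ids).ravel()
    flooded = np.asarray(flooded, dtype=bool).ravel()
    total_flooded = flooded.sum()
    ratios = {}
    for c in np.unique(class_ids):
        in_class = class_ids == c
        share_flooded = flooded[in_class].sum() / total_flooded
        share_area = in_class.sum() / class_ids.size
        ratios[int(c)] = float(share_flooded / share_area)
    return ratios


# Toy example: 8 cells in two classes, 4 flooded cells (hypothetical data).
toy_classes = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_flooded = np.array([1, 1, 1, 0, 1, 0, 0, 0])
toy_fr = frequency_ratio(toy_classes, toy_flooded)
```

In the toy example, class 0 holds half of the cells but three quarters of the flooded cells, so its FR is above 1, while class 1 falls below 1.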

Flood Susceptibility Mapping and Its Validation
The FR model was used to derive the correlation between the flood occurrence area and flood-related factors in the previous step. A grade for each element was defined as a category or range according to the type of each factor, and the ratio of non-flooded area to flooded area is shown in Table 2. We obtained an FR map for each factor by spatially assigning the FR value to each class. Finally, the flood susceptibility map was obtained by summing the FR maps of all factors:
$$\mathrm{FR} = \sum \mathrm{Fr} \quad (10)$$
where Fr is the rating of each factor's type or range.
We analyzed the relationship between flooded area and total area, and generated a flood susceptibility map using an LR model for further analysis (Figure 5). The LR coefficients (B), standard error of the slope coefficient (SE), and significance level (Sig.) of the variables were estimated using the maximum likelihood method (Table 3). An iterative algorithm is essential for parameter estimation in logistic multiple regression models when the relationship between independent variables and probabilities is non-linear [61]. DEM, TWI, distance from the river, type of impermeable land use, type of agricultural land use, retarding basin, soil drainage, and geology were important factors (P < 0.05). However, these results cannot be considered definitive [62]. Therefore, we used all 11 factors to predict flooding for management purposes. The Hosmer and Lemeshow 'goodness-of-fit' test resulted in a value less than 0.05 [63]; thus, the LR model was suitable for predicting flood susceptibility. We multiplied continuous data such as DEM, slope, SLF, SPI, and TWI by the coefficients obtained. Factors expressed by categorical data were multiplied by the coefficients obtained for each class (Table 2). Then we spatially summed all of the layers and added Z, a forecast parameter. Probabilities were calculated for all factors and the flood susceptibility was mapped using the LR coefficients (Equation (11), Table 3). In this analysis, TWI among the topographic factors, distance from the river, and the residential, commercial, traffic, and public land-use areas had positive coefficients, indicating a stronger effect on flood susceptibility. The retarding basin, soil drainage, and geology factors also showed positive values in all classes except for limestone.
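The FR mapping step, replacing each class with its FR rating and then summing the rating maps of all factors, can be sketched as follows. This is an illustrative sketch rather than the raster-calculator workflow used in the study; the function names, the two class rasters, and the FR lookup values are all hypothetical.

```python
import numpy as np


def fr_rating_map(class_ids, fr_by_class):
    """Replace each cell's integer class id with that class's FR value."""
    lookup = np.zeros(max(fr_by_class) + 1)
    for class_id, fr_value in fr_by_class.items():
        lookup[class_id] = fr_value
    return lookup[class_ids]


def susceptibility_index(rating_maps):
    """Sum the FR rating maps of all factors cell by cell."""
    return np.sum(rating_maps, axis=0)


# Two hypothetical factors on a 2 x 2 grid (class ids and FR values are made up).
slope_classes = np.array([[0, 1], [1, 0]])
slope_fr = {0: 1.5, 1: 0.5}
landuse_classes = np.array([[0, 0], [1, 1]])
landuse_fr = {0: 1.2, 1: 0.8}

index = susceptibility_index([
    fr_rating_map(slope_classes, slope_fr),
    fr_rating_map(landuse_classes, landuse_fr),
])
```

Cells with a higher summed index combine classes that were flooded more often than their area share would suggest, so they rank as more susceptible.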

Discussion and Conclusions
Floods are among the most damaging hydrological natural disasters worldwide. Governments and research institutes have evaluated flood susceptibility at the national scale and conducted studies to predict the spatial distribution of flood events. In the current study, FR and LR models were applied to create flood susceptibility maps of the Seoul metropolitan area. In the preliminary step of this process, we selected 11 core variables affecting flood susceptibility, and constructed FR and LR models to determine the relationships between flooded areas and flood-related variables. Factors related to flood occurrence were collected in a spatial database and flood-susceptibility values for the study area were calculated. We validated the susceptibility-prediction results using data on flooded areas from a year not used to train the models.
The FR model results showed that slope factors had a negative spatial relationship with flooded area, whereas TWI showed a positive correlation. Residential and transportation network areas were more susceptible to flooding than other land-use types, and schist units and moderately well-drained soil classes had a higher susceptibility than other related classes.
We compared and analyzed flood-susceptibility maps produced by the LR and FR models, and found that the accuracy of the LR model was 79.33% and that of the FR model was 79.61%. Both models therefore achieved accuracies above 75%.
Time-series analyses of floods are limited in that it is almost impossible to obtain data measuring the change in flood area over time; such analyses instead need to evaluate the drainage capacity, which affects flood occurrence in certain areas. In addition, existing location information related to flood damage is based on interviews in the damaged areas rather than on accurate coordinates; this could create problems in the spatial analysis. During flooding, the handling capacity of the study site, which is affected by factors such as drainage systems, waterworks, and sewer facilities, also plays an important role. However, it is difficult to acquire such data.

Discussion and Conclusions
Floods are among the most damaging hydrological natural disasters worldwide.Governments and research institutes have evaluated flood susceptibility at the national scale and conducted studies to predict the spatial distribution of flood events.In the current study, FR and LR models were applied to create flood susceptibility maps of the Seoul metropolitan area.In the preliminary step of this process, we selected 11 core variables affecting flood susceptibility, and constructed FR and LR models to determine the relationships between flooded areas with specific capacity and flood-related variables.Factors related to flood occurrence were collected in a spatial database and flood-susceptibility values for the study area were calculated.We validated the susceptibility-prediction results using data collected in previously flooded areas in years not used to train the models.
The FR model results showed that slope factors had a negative spatial relationship with flooded area, whereas TWI showed a positive correlation.Residential and transportation network areas were more susceptible to flooding than other land-use types, and schist units and moderately well-drained soil classes had a higher susceptibility than other related classes.
We compared the flood-susceptibility maps produced by the LR and FR models and found accuracies of 79.33% for the LR model and 79.61% for the FR model. Both models therefore exceeded 75% accuracy, which we consider reliable.
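The accuracy comparison above can be illustrated with a short sketch. It assumes that accuracy is measured as the area under a cumulative frequency curve, in which cells are ranked from highest to lowest susceptibility index and the cumulative share of validation flood cells is accumulated along that ranking. The function names and the four-cell example are hypothetical, and ties in the index are broken by input order, which a real analysis would need to handle more carefully.

```python
def success_rate_curve(index, flooded):
    """Rank cells by susceptibility (descending) and accumulate flood share.

    Returns (xs, ys): cumulative fraction of ranked cells and the
    cumulative fraction of flooded cells captured up to that rank.
    """
    if len(index) != len(flooded):
        raise ValueError("index and flooded must have the same length")
    total_flooded = sum(flooded)
    if total_flooded == 0:
        raise ValueError("need at least one flooded cell")

    n = len(index)
    order = sorted(range(n), key=lambda i: index[i], reverse=True)

    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += flooded[i]
        xs.append(rank / n)
        ys.append(captured / total_flooded)
    return xs, ys


def area_under_curve(xs, ys):
    """Trapezoidal area under the curve defined by (xs, ys)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Hypothetical susceptibility values and validation flood flags.
susceptibility = [0.9, 0.8, 0.4, 0.3]
flooded_validation = [1, 0, 1, 0]

xs, ys = success_rate_curve(susceptibility, flooded_validation)
auc = area_under_curve(xs, ys)
print(f"accuracy (area under curve): {auc:.2%}")
```

Under this assumption, a value of 50% corresponds to a ranking no better than chance, so values near 79% indicate that flooded cells are concentrated in the highest-ranked part of the map.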
Time-series analysis of floods is limited because it is almost impossible to obtain data measuring the change in flooded area over time; such analysis instead has to rely on evaluating the drainage capacity that affects flood occurrence in a given area. In addition, existing location information on flood damage is compiled from interviews in the damaged areas rather than from accurate coordinates, which can introduce errors into the spatial analysis. During flooding, the handling capacity of the study site, which depends on factors such as drainage systems, waterworks, and sewer facilities, also plays an important role. However, such data are difficult to acquire.
GIS and remote-sensing technologies enable the integration of different data sources and support decision-making through classified remote-sensing data [64]. GIS also allows quantitative assessment of changes over a wide range of spatial and temporal scales. In this case study, urban flood susceptibility was analyzed through data-based learning using topographic, land-use, soil, and geological data. The validation results indicate that each factor used in this study is statistically significant. Several indicators related to flood susceptibility play an important role in water management, whose interdisciplinary nature requires a database that contains diverse thematic information and is designed to support decision-making in flood-susceptible areas.
However, in this paper, the sewer system was assumed to have no influence. Although the city sewage system plays a very important role in flood occurrence, the underground facilities in Seoul are classified as national institutional facilities, and data on them are treated as confidential. Economic factors such as land price and building information were also not considered, because the land-price data for the study area had not been compiled precisely. Urban infrastructure data such as building information are still only partly available as GIS spatial information, and those data have not been verified. Moreover, rainfall patterns have recently been changing markedly, and it is very difficult to acquire data that reflect the effect of rainfall and the complexity of urban areas, such as the city's sewage system. Because all other existing data are already compiled as spatial information, further studies that account for the sewer network would be possible once sewer-related data are secured; additional data could likewise enable economic analysis.
In this study, validation was limited by the lack of field-observation data, a consequence of the time-varying nature of floods, and no separate correction for rainfall and flood occurrence was performed. We suggest that this urban flood-susceptibility mapping methodology can serve as a basis for establishing urban disaster-prevention policy and can contribute to protecting the public from flood disasters. The method applied in this study could also be used to map flood susceptibility in other regions in order to reduce flood risks to people and facilities, and to guide new infrastructure for the sustainable development of urban areas. The development of multiple models incorporating more case studies would permit future studies to generalize results based on flood-related factors.

Figure 1. Study area and flood area in 2010 and 2011: (a) The Korean Peninsula; (b) Seoul.

a Inundation trace map produced by the Ministry of Public Safety and Security (MPSS); b topographical factors were extracted from the digital topographic map by National Geographic Information Institute (NGII); c detailed soil map produced by the Rural Development Administration (RDA); d geological map produced by the Korea Institute of Geoscience & Mineral Resource (KIGAM).

Sustainability 2018, 10

Figure 4. Flowchart outlining the methodology of this study.

Figure 6. Cumulative frequency diagram showing flood susceptibility index rank (X-axis) as a cumulative percentage of flood occurrence (Y-axis).

Table 1. Data layer describing the flooded area within the study region.

Table 2. Frequency ratios for flooding and related factors.