Big Data for Natural Disasters in an Urban Railroad Neighborhood: A Systematic Review

Abstract: Landslides and floods are among the most common disasters in Brazil and are responsible for losses on social, environmental, and economic scales, even resulting in deaths. Floods can negatively affect the structure and operations of a railway network, causing travel delays, train service cancellations, and major fines for the railway. The objective of this article is to conduct a bibliographic review of what is available in publications on natural disasters, particularly landslides and floods, big data techniques, and railroads, at international and national levels. A bibliometric analysis was carried out according to the "PRISMA Flow Diagram" guidelines. The analysis in this study was conducted through searches of the following reference databases: Scopus, Web of Science, Scielo, and Google Scholar. After the keyword search was completed, the absence of available data and references relating to Brazil was verified. This justified the development of this and other related papers, and the efforts necessary to turn these data into useful information for the managers of cities and environmental institutions. The aim of this study is to fill the gap in the research, focusing on Brazil, related to big data, smart cities, and natural disasters (particularly, landslides and floods), and to propose other papers that can be developed in this subject area.


Introduction
Landslides and floods are among the most common disasters in Brazil, and they are responsible for social, environmental, and economic damage, often resulting in loss of life [1,2]. It is necessary to define the time, location, and severity of the impacts of natural disasters in order to develop mitigation techniques, appropriate responses, and firm recovery plans [3].
Railway infrastructure is crucial for transportation and contributes to economic and social welfare [4]. However, transportation is subject to frequent natural disasters, such as floods, earthquakes, and landslides [5].
Climate change has increased rainfall, raising concern that tornadoes and floods have also become more frequent [5]. As civil-engineering infrastructure projects expand, the resistance of these structures to external forces such as natural disasters should be verified [6].
Depending on the duration of the rainfall, rivers may also overflow onto roads and railroads [6]. Floods can negatively affect the structure and operation of a railroad network, causing travel delays, service cancellations, and major fines for the railroad company because of the delays [7,8].
Although the PRISMA Statement was originally created for reviews of health care interventions, studies from different subject areas have also used this method to develop systematic reviews. When searching for the term "Prisma Statement" in the "Article Title, Abstract, Keywords" field in Scopus, the filter "by subject area" shows the majority of the studies that used this method for systematic reviews belong to the health area (2506), but they can also be related to other subject areas such as agricultural and biological sciences (86), social sciences (62), multidisciplinary (60), engineering (50), environmental science (33), computer science (20), materials science (11), arts and humanities (8), business management and accounting (8), decision sciences (6), economics, econometrics and finance (6), mathematics (6), chemistry (4), physics and astronomy (3), energy (2) and earth and planetary sciences (1).
The PRISMA Statement consists of a flow diagram and a 27-item checklist. The review process follows the four stages of the PRISMA flow diagram: Identification (the number of records identified through database searching and the number of additional records identified through other sources), Screening (duplicate records are removed, and the numbers of records screened and excluded are reported), Eligibility (the number of full-text articles assessed for eligibility and the number of full-text articles excluded, with reasons), and Included (the number of studies included in the qualitative synthesis and, subsequently, in the quantitative synthesis) [16].
The analysis for this study was made through searches of references in the following databases: Scopus, Web of Science, Scielo, and Google Scholar. Keywords were chosen that referred to the study theme (big data (BD), natural disasters, landslides, and floods), techniques that can be used for BD analysis (machine learning, spatial analysis, remote sensing, and GIS), and terms related to the study area (railway and railroad). The Scopus database search used the fields "Article Title, Abstract, Keywords", whereas, with Web of Science, Scielo, and Google Scholar, only the "Title" field was used.
The search strategy used a structured form with the terms "Natural Disasters", "Landslides", or "Floods" in the field "Title words", combined (or not) with the secondary terms "Big Data", "Machine Learning", "Spatial Analysis", "Remote Sensing", "GIS", "Railway", and "Railroad" through the Boolean operators "AND" and "OR". In the first stage (Identification), the number of records identified in the search was obtained, in addition to the number of additional records identified from other sources. Following this, the duplicate records were removed. In the second stage (Screening), records published between 2008 and 2019 in Portuguese, English, or Spanish were selected. The studies that did not meet these criteria were excluded. In the third stage (Eligibility), the full-text articles were evaluated for eligibility. The adopted eligibility criteria were based on article relevance: first, we selected the 20 most relevant articles; then, we selected those whose abstracts, objectives, and conclusions were close to the study theme and were consequently useful to the research.
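The search strategy above can be sketched as a small query builder. This is a minimal illustration, not the authors' actual tooling: the function names are hypothetical, and the exact query syntax accepted by each database (Scopus, Web of Science, Scielo, Google Scholar) differs and is not reproduced here.

```python
from itertools import product

# Terms taken from the search strategy described in the text.
PRIMARY_TERMS = ["Natural Disasters", "Landslides", "Floods"]
SECONDARY_TERMS = [
    "Big Data", "Machine Learning", "Spatial Analysis",
    "Remote Sensing", "GIS", "Railway", "Railroad",
]


def build_queries(primary, secondary):
    """Hypothetical helper: primary terms alone, then every primary AND secondary pair."""
    queries = [f'"{p}"' for p in primary]
    queries += [f'"{p}" AND "{s}"' for p, s in product(primary, secondary)]
    return queries


def build_combined_query(primary, secondary):
    """Hypothetical helper: one query joining each group with OR and the groups with AND."""
    left = " OR ".join(f'"{p}"' for p in primary)
    right = " OR ".join(f'"{s}"' for s in secondary)
    return f"({left}) AND ({right})"


queries = build_queries(PRIMARY_TERMS, SECONDARY_TERMS)
combined = build_combined_query(PRIMARY_TERMS, SECONDARY_TERMS)
```

Enumerating the pairs separately keeps a per-combination record count, which is what a PRISMA identification stage needs; the single combined query is convenient only when a total is sufficient.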
Articles were classified according to relevance and placed in descending order from most to least relevant. The classification system considered how many of the search terms were found in each record. Studies with the highest classification appeared at the top of the list [17]. The fourth stage (Inclusion) first considered the number of studies included in the qualitative synthesis and then the number of studies in the quantitative synthesis. Papers were included when they described techniques (BD, machine learning, remote sensing (RS), geographic information systems (GIS); abbreviations found in Table 2) applied to risk management or to the study of floods and landslides.
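The relevance ordering described above (more matched search terms means a higher position) can be illustrated with a short sketch. The databases' own relevance algorithms are not documented in the text, so the scoring rule, the function names, and the three sample records below are assumptions made purely for illustration.

```python
import re

SEARCH_TERMS = [
    "Natural Disasters", "Landslides", "Floods", "Big Data",
    "Machine Learning", "Spatial Analysis", "Remote Sensing",
    "GIS", "Railway", "Railroad",
]


def relevance_score(text, terms):
    """Count how many distinct terms occur in the text as whole words (case-insensitive)."""
    return sum(
        1 for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    )


def rank_records(records, terms):
    """Sort records from most to least relevant; ties keep their original order."""
    return sorted(
        records,
        key=lambda r: relevance_score(r["title"] + " " + r["abstract"], terms),
        reverse=True,
    )


# Hypothetical records, invented for this example only.
records = [
    {"id": "A", "title": "Flood mapping with remote sensing",
     "abstract": "GIS-based analysis of floods."},
    {"id": "B", "title": "Railway timetabling",
     "abstract": "Logistic planning for trains."},
    {"id": "C", "title": "Big data and machine learning for landslides",
     "abstract": "Remote sensing and GIS inputs near a railway."},
]
ranked_ids = [r["id"] for r in rank_records(records, SEARCH_TERMS)]
```

Whole-word matching is used so that a short term such as "GIS" is not counted inside an unrelated word such as "Logistic".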

Results
In the Identification stage, 35,952 records were found with the keyword search in the databases, as well as 23 additional records from books, documents, and official websites (a total of 35,975 publications). Subsequently, 121 duplicate records were removed, resulting in 35,940 records.
In the screening stage, one filter was applied to return only studies published between 2008 and 2019, and another to retain those in Portuguese, English, and Spanish. These filters reduced the total to 23,959 records (in addition to the 23 additional records), removing 11,981 from the previous stage. In the eligibility stage, the 20 most relevant records were considered, and those whose abstracts, objectives, and conclusions were closest to the research theme were retained. Thus, 23,801 records were excluded because they were unrelated to the theme of this study, leaving 158 results (Table 1). The final stage comprises the studies included in the research. After a qualitative analysis and a quantitative synthesis, 90 references were included in the research (67 database records and 23 additional records). In the qualitative analysis, it was noted that there are many articles relating to the identification of natural disasters, mainly landslides and floods, but there is still a limited number of studies relating to the application of BD techniques for the analysis of natural disasters, particularly in reference to railroads. There is still a lack of knowledge concerning geospatial information in BD environments, particularly relating to natural disasters and rail transportation.
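The screening, eligibility, and inclusion counts reported above can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the number of eligible records not carried into the final set is derived here rather than reported.

```python
# Counts reported in the text for each PRISMA stage.
after_duplicates = 35_940        # database records entering screening
screening_excluded = 11_981      # removed by the year and language filters
eligibility_excluded = 23_801    # unrelated to the study theme
included_database = 67           # database records in the final set
included_additional = 23         # books, documents, official websites

after_screening = after_duplicates - screening_excluded
after_eligibility = after_screening - eligibility_excluded
total_included = included_database + included_additional

# Derived, not reported: eligible database records left out of the final set.
not_carried_forward = after_eligibility - included_database

print(after_screening, after_eligibility, total_included, not_carried_forward)
```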
For the quantitative analysis, the records related to the theme of this study were retrieved from the Scopus database for the period between 2008 and 2019, in all languages. In the search field "Article Title, Abstract, Keywords", the following terms were used: "Natural Disasters AND Big Data" and "Natural Disasters AND Machine Learning"; "Natural Disasters AND Remote Sensing" or "Natural Disasters AND GIS"; and "Natural Disasters AND Railway" or "Natural Disasters AND Railroad". Publications on natural disasters combined with BD and machine learning (ML) have increased since 2008, with a peak in 2019 (135 records). For natural disasters (ND) combined with RS and GIS, there were many more records, and the number of articles also increased (148 in 2008 and 255 in 2019). For natural disasters and railroads, few studies appear, with the fewest publications in 2010 (3) and the most in 2014 (22) (Table 2). Therefore, there is growth in the period between 2008 and 2019, with a peak in 2018.

Table 2. Scopus records per year, 2008 to 2019, by search combination.

Year    ND + BD/ML    ND + RS/GIS    ND + Railway/Railroad
2008    12            141            4
2009    5             149            10
2010    11            142            3
2011    11            219            19
2012    9             190            7
2013    20            195            17
2014    38            208            22
2015    44            217            11
2016    60            226            19
2017    94            208            13
2018    105           278            16
2019    135           255            19

The Scopus database was searched to determine which countries had publications on the theme of study in recent years. The field "Article Title, Abstract, Keywords" was used again, with the terms "Natural disasters", "Smart cities", and "Big data" joined by the operator "AND". We then applied the "Country/Territory" filter, which indicates the countries with the most publications and considers the location of all authors of each study. All languages were considered since the first publication in 2014, resulting in a total of 19 records that represent 24 countries. The majority of the studies were conducted in the United States and Indonesia, followed by Taiwan (Table 3).
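The yearly counts reported for Table 2 can be inspected programmatically to locate the peak and minimum years of each series. This is a minimal sketch: the values are transcribed from the table, while the assignment of the three columns to the three search combinations is inferred from the figures quoted in the text, and the helper names are hypothetical.

```python
# Yearly Scopus record counts transcribed from Table 2.
# Tuple order (inferred): (ND + BD/ML, ND + RS/GIS, ND + railway/railroad).
TABLE_2 = {
    2008: (12, 141, 4),  2009: (5, 149, 10),  2010: (11, 142, 3),
    2011: (11, 219, 19), 2012: (9, 190, 7),   2013: (20, 195, 17),
    2014: (38, 208, 22), 2015: (44, 217, 11), 2016: (60, 226, 19),
    2017: (94, 208, 13), 2018: (105, 278, 16), 2019: (135, 255, 19),
}


def series(column):
    """Return {year: count} for one column of the table."""
    return {year: row[column] for year, row in TABLE_2.items()}


def peak_year(counts):
    """Year with the highest count (first one if tied)."""
    return max(counts, key=counts.get)


def lowest_year(counts):
    """Year with the lowest count (first one if tied)."""
    return min(counts, key=counts.get)


bd_ml, rs_gis, rail = series(0), series(1), series(2)
print(peak_year(bd_ml), peak_year(rs_gis), peak_year(rail), lowest_year(rail))
```

With these values, the BD/ML series peaks in 2019, the RS/GIS series in 2018, and the railway series in 2014, with its minimum in 2010.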
Using the Scopus database (between 2008 and 2019, in Portuguese, English, and Spanish), the search used the keywords "Big Data", "Natural Disasters", "Floods", or "Landslides" in the search field "Article Title, Abstract, Keywords". "Lecture Notes in Computer Science, Including Subseries Lecture Notes in Artificial Intelligence, and Lecture Notes in Bioinformatics" published the most returned articles for natural disasters (14) and floods (21), while the "International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences ISPRS Archives" contained the most for landslides (5 published articles). The same search was applied, adding the term "Railway". The results were restricted, returning one article, published in the "2017 29th Chinese Control And Decision Conference (CCDC)".

Review of Identified Related Works
In 2006, a great flood occurred around the River March, located on the border between Austria and Slovakia. This affected the North Railway, a relevant rail connection of the Federal Austrian Railway between Vienna and the Czech Republic. This event indicated the high vulnerability of the railroad infrastructures to floods [8].
In Japan, many railroad systems actively respond when heavy rainfall occurs. Train services are suspended when high levels of precipitation are identified. In cases of sudden torrential rainfall, areas are selected for refuge. When stations are flooded, trains are directed to appropriate areas to allow passengers to find shelter. To facilitate this precipitation response, the system collects geographic data, including refuge areas close to the rails, information about roads leading to the evacuation points, and rail information to identify points where trains can stop [6]. Detecting sections of the railroad at risk of flooding allows the development of an effective plan and contributes to active safety management [4].
In this scenario, a more detailed predictive technology is required. Some authors study techniques to predict and mitigate the damage of floods on railways [18]. Through the use of results from evaluations of danger mapping, an algorithm has been developed based on wind prediction and precipitation to facilitate the identification of trains that will be affected by floods, where the trains must stop and which route the passengers may take to evacuate [8]. An empirical approach to modeling is presented to estimate the direct structural damage caused by floods to the infrastructure of a railway and the associated financial losses. Alternatively, a methodology has been proposed to quantitatively assess the vulnerability of the railroad system to floods using available historical data and GIS technology [5]. The study by [19] identified the risk of floods on railroads in the city of São Paulo and considered topographic and hydrological characteristics from the study area, and an analysis of the interference of such disasters on the operation of the railroad system was performed.
A flood-risk mapping system was presented by [6] to indicate the best places for trains to stop, as well as routes to shelter for passengers.
Natural disasters may also affect transportation. The main lines of transportation found in mountain terrain, such as roads and railroads, are vulnerable to landslides that occur on slopes or natural hills [20]. Some authors have studied the occurrence of these disasters in the surroundings of railroads. The survey by [21] attempted to integrate a wireless sensor network (WSN) with GIS for low-cost, real-time monitoring of landslides in stations and railroads around the region of Abruzzo, Italy. The susceptibility to landslides of the railway between the cities of Kral'ovany and Liptovský Mikuláš in Slovakia was evaluated by [22], and the research by [20] presented a quantitative model of landslide risk for transportation lines, with an example of a road and railway alignment in parts of the Nilgiri hills in southern India.
This kind of disaster causes direct or indirect damage to these installations. To estimate these risks, it is important to consider the expected number of landslides and their annual probability of occurrence. It is also necessary to estimate the probability of a landslide hitting a moving vehicle or a passenger. To establish this statistical relationship, there must be an inventory of landslides, although this in itself has advantages and disadvantages. The benefit of the inventory is that it is prepared immediately after a landslide event and records these events in an area. Its limitations are the lack of a continuous record and the difficulty in accessing the data [20].
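The two quantities mentioned above (the annual probability of occurrence and the probability of a landslide hitting a moving vehicle) can be illustrated with a simple sketch. The formulas are assumptions chosen for illustration and are not taken from [20]: a Poisson model converts an inventory count into an annual probability, and a basic occupancy argument gives the fraction of time a given point on the line is occupied by a vehicle. All function names and input values are hypothetical.

```python
import math


def annual_probability(n_events, years):
    """Probability of at least one landslide per year, assuming a Poisson process
    whose rate is estimated from an inventory of n_events over the given years."""
    rate = n_events / years
    return 1.0 - math.exp(-rate)


def occupancy_probability(vehicles_per_day, vehicle_length_m, speed_kmh):
    """Fraction of time a point on the line is occupied by a vehicle.
    Each vehicle covers the point for (length / speed) hours."""
    hours_per_vehicle = (vehicle_length_m / 1000.0) / speed_kmh
    return min(1.0, vehicles_per_day * hours_per_vehicle / 24.0)


def annual_hit_probability(n_events, years, vehicles_per_day, vehicle_length_m, speed_kmh):
    """Assumes landslide timing is independent of traffic at the affected point."""
    return annual_probability(n_events, years) * occupancy_probability(
        vehicles_per_day, vehicle_length_m, speed_kmh
    )


# Hypothetical inputs: 5 landslides recorded in 20 years at one cut slope,
# 24 trains per day, each 100 m long, travelling at 50 km/h.
p_hit = annual_hit_probability(5, 20, 24, 100, 50)
```

A gap in the inventory lowers the estimated rate directly, which is why the lack of a continuous record noted above matters for this kind of estimate.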
Landslides can affect the railroad network, so it is necessary to prevent these risks. Several authors propose strategies, including the identification of landslide scars through photo interpretation; risk maps generated by map algebra from thematic maps (pedological, geomorphological, land use and cover, lithologic, and hypsometric) [23]; the creation of a landslide susceptibility model supported by instability models and updated models for flow of debris, land, and rock falls [24]; and the application of databases to analyze landslide susceptibility [23], among other methods. Input parameters to evaluate landslide susceptibility along the railroad section between Kral'ovany and Liptovský Mikuláš, Slovakia, were used by [22].
The interpretation of geological conditions included morphometric parameters (digital elevation model (DEM), slope angle, and aspect) as well as landscape structure and landslide interpretation. The bivariate statistical analysis compared the landslide inventory map, as the dependent variable, with each parametric input map, which allowed a weight to be calculated for each input variable. The remaining steps were the weighting process, a prognostic map of landslide susceptibility, and, finally, verification. "The evaluation of the landslide susceptibility and the statistic process based in an axiom from the actualism", that is, it is assumed that future events will occur in the same way as past events ([22], p. 162).
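The weight calculation in a bivariate analysis can be illustrated with one common formulation, the statistical index, in which each class of an input map is weighted by the natural logarithm of its landslide density relative to the overall density. The text does not state which formula [22] used, so this choice, the function name, and the sample cell counts are assumptions.

```python
import math


def class_weights(class_cells, landslide_cells):
    """Statistical-index weight per class: ln(class density / overall density).
    Classes without any landslide cells get None, since ln(0) is undefined."""
    total_cells = sum(class_cells.values())
    total_slides = sum(landslide_cells.get(name, 0) for name in class_cells)
    overall_density = total_slides / total_cells
    weights = {}
    for name, n_cells in class_cells.items():
        density = landslide_cells.get(name, 0) / n_cells
        weights[name] = math.log(density / overall_density) if density > 0 else None
    return weights


# Hypothetical slope-angle map: raster cells per class and landslide cells per class.
slope_cells = {"0-10": 1000, "10-25": 1000, ">25": 500}
slope_slides = {"0-10": 10, "10-25": 40, ">25": 50}
weights = class_weights(slope_cells, slope_slides)
```

A positive weight marks a class with more landslides than average and a negative weight marks fewer; summing the weights of all input maps cell by cell would give a susceptibility score for a prognostic map.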
It is not just landslides that directly impact the railroad system. The railroad network in the Czech Republic is heavily dependent on an electricity system provided by aerial cables. Thunderstorms are capable of causing tree falls, and those close to the railroad can affect the electricity cables and physically block railroad transportation [25].
Regarding landslides, the study by [26] showed how the logistical problems linked to advanced monitoring techniques, such as the transfer and storage of data, can be handled in a way that is compatible with an early-warning system for landslides. The study focused on the interaction between an area monitoring tool (a ground-based interferometric radar) and a data collecting and processing center (DCPC). Strategies were developed by [27] for the prevention and risk mitigation of landslides. Two machine learning methods (support vector machines and an artificial neural network) were used together with a multivariate model (logistic regression) on data collected in China [28].
With the rise of BD, the challenge of decision making inside an organization has emerged with regards to ensuring there is sufficient information without adding irrelevant data [29]. In the geosciences, large data from historical searches or aerial and satellite high-resolution images are sources of valuable information. However, it is not always possible to extract information from these data using standard statistical techniques. It is necessary to use automated machine learning methods to obtain relevant patterns since high-performance data processing and viewing are widely applied with success to data related to natural disasters [28].
The integration of several data sources can contribute to the improvement of the quality and integrity of the data. However, before this integration occurs, the data must be validated. When different sources of BD are integrated, uncertainty is introduced into the disaster response. Manual interpretation and analysis of the data are no longer appropriate. More sophisticated methods of automatic analysis, like machine learning, are required to eliminate unrelated data and accelerate the analysis. This contributes to a more rapid forecast of disasters and identification of the ideal responses [13].
Some studies used BD tools for forecasting and monitoring natural disasters. The survey by [29] applied a modeling framework to a practical scenario in order to connect the tasks of decision-makers to data sources through the observation-aware decision model and notation (oDMN). oDMN is an integrated model for decision-making, and a modeling process was used to put oDMN into practice. The study by [30] classified tweets during a disaster, aiding the construction of a sentiment model about the needs of the people involved. The proposed model allowed the rescue team to better understand the situation and act properly.
In relation to BD resources applied to flood mapping, an approach called RFim (live flooding model) was presented by [31]. The model provides a simulation and forecasts the extent of a flood. RFim attempts to find a relation between remote-sensing data and in situ data to build a live forecast model. The model uses the spatial coverage and high resolution of remote-sensing images, in addition to the continuous temporal coverage and easy accessibility of in situ observations. Social media were used by [32] to improve flood control, with the data collected and analyzed using natural language processing and computer vision techniques. Further, a method was proposed by [33] to activate flood alarms using social media and rainfall measurements from authorized sources [28]. The research by [34] presented a flood-mapping technique based on an index calculated from multi-temporal radar image data, comparing many reference images with those acquired during the investigated flood and facilitating the categorization of flooded areas. Flooded areas were defined as areas exclusively or temporarily covered by water or areas with a mixture of water and vegetation.
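The idea of an index that compares a stack of reference radar images with images acquired during a flood can be sketched per pixel. The text does not give the formula used in [34]; the normalized-difference form below (reference mean against the minimum backscatter over all images) is one plausible formulation, assumed here, as are the function names, the threshold, and the sample values.

```python
def flood_index(reference_stack, flood_stack):
    """Per-pixel normalized difference between the mean of the reference images
    and the minimum over reference plus flood images (linear backscatter values).
    Open water lowers radar backscatter, so flooded pixels score close to 1."""
    n_pixels = len(reference_stack[0])
    index = []
    for i in range(n_pixels):
        ref_values = [image[i] for image in reference_stack]
        all_values = ref_values + [image[i] for image in flood_stack]
        ref_mean = sum(ref_values) / len(ref_values)
        lowest = min(all_values)
        denominator = ref_mean + lowest
        index.append((ref_mean - lowest) / denominator if denominator else 0.0)
    return index


def flooded_mask(index, threshold=0.7):
    """Hypothetical threshold; in practice it would be calibrated per scene."""
    return [value > threshold for value in index]


# Two hypothetical pixels: the first darkens strongly during the flood, the second does not.
reference = [[0.20, 0.10], [0.20, 0.10]]
flood = [[0.02, 0.10]]
index = flood_index(reference, flood)
mask = flooded_mask(index)
```

Areas with a mixture of water and vegetation can brighten rather than darken in radar images, so a sketch like this would only capture the open-water category mentioned above.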
In addition, systematic reviews, bibliometric aspects, and reviews of how BD are used in natural disasters can be seen in [13]. That study conducted a systematic review of the literature, analyzing the role of BD in natural disaster management. It highlighted the status of technology in providing significant and effective solutions in natural disasters. The article presented the conclusions of several researchers relating to varied scientific research and the technologies that influence the use of the BD environment in disaster management.
Based on a systematic literature review, the research by [35] examined BD in disaster management, presenting the main contributions, gaps, challenges, and an agenda for future research. [36] affirms that, in the era of the BD revolution, there is potential to mitigate the effects of disaster events by allowing access to live critical information. BD was associated by [37] with social media for the management of disasters, particularly those associated with relief. A systematic literature review of the applications of BD in disaster management was presented by [38], focusing on buildings and intelligent networks, with an emphasis on energy. The research by [39] analyzed the Brazilian Integrated Disasters Information System (S2iD) to provide qualification and transparency to the management of disasters in Brazil.

Conclusions
This database review has verified that there is a lack of references on the relevant subjects, especially in Brazil. Further studies on the subjects would be very useful for urban mobility because, especially in the Metropolitan Region of São Paulo in the state of São Paulo in Brazil, there are occurrences of landslides and floods on railway lines that interrupt the circulation of trains.
This justifies the development of this paper and similar studies related to the availability and quality of data, and the necessary effort to turn these data into useful information for city managers and environmental institutions. There is also a lack of research examining the landscape and the spatial data in the BD environment in relation to natural disasters such as floods and landslides.
In line with the objectives of this study, this review highlights this gap in research through an international and national bibliographic review. The aim of the paper is to address the lack of research, particularly in Brazil, related to BD, smart cities, and natural disasters, with a particular focus on landslides and floods, and to propose that other papers be developed along the line presented herein.
This manuscript is useful in two respects: presenting the current state of the literature on the topic in question (use of BD methods for studying impacts of natural disasters on railroad infrastructure), and also introducing non-biomedical researchers to the PRISMA guidelines and showing how they can be implemented in a systematic review for topics related to smart cities instead of health care.
Author Contributions: T.P.C. has made a significant individual contribution in defining and applying the design methodology, in software applications, formal analysis, writing the original draft, and reviewing and editing; A.C.C. is a fundamental contributor in supervision, project administration, funding acquisition, conceptualization, design methodology, writing the original draft, and reviewing and editing; J.A.Q. worked on supervision, funding acquisition, conceptualization and methodology design, resource identification, writing the original draft, and reviewing and editing. All authors have read and agreed to the published version of the manuscript.