Analysis of Fire Accident Factors on Construction Sites Using Web Crawling and Deep Learning Approach

Abstract: Fire safety on construction sites has been rarely studied because fire accidents have a lower occurrence compared to construction’s “Fatal Four”. Despite the lower occurrence, construction fire accidents tend to have a larger severity of impact. This study aims at using news media data and big data analysis techniques to identify patterns and factors related to fire accidents on construction sites. News reports on various construction accidents covered by news media were first collected through web crawling. Then, the authors identified the level of media exposure for various keywords related to construction accidents and analyzed the similarities between them. The results show that the level of media exposure for fire accidents on construction sites is much higher than for fall accidents, which suggests that fire accidents may have a greater impact on the surroundings than other accidents. It was found that the main causes of fire accidents on construction sites are violations of fire safety regulations and the absence of inspections, which could be sufficiently prevented. This study contributes to the body of knowledge by exploring factors related to fire safety on construction sites and their interrelationships as well as providing evidence that the fire type should be emphasized in safety-related regulations and codes on construction sites.


Introduction
Fire accidents are greatly affected by external environments, such as weather and surrounding buildings or hazards, making them difficult to control and prevent [1]. This is especially dangerous on building construction sites because fire safety equipment such as sprinklers and fire alarms may not yet be installed, depending on the progress of construction [2]. Most construction accident-related research focuses on construction's "Fatal Four" as they occur more frequently. The "Fatal Four" hazards provided by the Occupational Safety & Health Administration (OSHA) consist of falls, electrical exposure, struck-by, and caught-in/between. Since the "Fatal Four" is based on the frequency of accidents on construction sites, fire accidents are excluded. A frequency-based list also has the limitation of not considering wider impacts such as the secondary spread of accidents. Fall accidents on construction sites, for example, are less likely to lead to secondary accidents, whereas fires can spread. Therefore, it is necessary to examine the impact of fire accidents on construction sites and their surroundings in spite of their low occurrence. As a result, new fire safety regulations or rules for construction sites might be proposed.
The news media often covers accidents on construction sites that have a great social impact. Fatal accidents that affect the surroundings, such as major fire accidents, are more likely to be covered by the media. Therefore, lessons can be learned from the articles provided by the media. Measuring media exposure by accident type offers a new way to gauge the impact of each type of accident. Moreover, the articles provided by the media are organized in a similar format, which makes the data efficient for researchers to use. To explore the level of impact of fire accidents on construction sites, this study collected articles on construction site accidents reported in The New York Times over the past 20 years. The web crawling method was used for efficient and accurate data collection. In addition, to analyze and visualize the relationship between factors related to fire safety on construction sites, word embedding, network analysis, and the uniform manifold approximation and projection method were applied. This study contributes to the body of knowledge by exploring the factors of construction site fires and the relationships between them through the collection of media big data on fire accidents on construction sites. In addition, based on its results, this study promotes the active inclusion of fire as an accident type in construction site safety.

Background
The construction industry always considers safety, but the fatality rate on construction sites remains high [3]. According to the Occupational Safety & Health Administration (OSHA), 20.5% of fatal workplace accidents occur on construction sites [4]. In particular, a fire accident on a construction site can lead to secondary accidents such as fire spread and explosion, which can greatly increase the project cost and duration. According to the National Fire Protection Association, annual direct damage from fire accidents on US construction sites is estimated at about 172 million dollars [5]. When direct damage at sites under renovation and demolition is included, the figure amounts to 300 million dollars per year. To reduce construction site fire accidents, various studies have been conducted to analyze accidents on construction sites, but there are limitations. Many studies related to fire accidents on construction sites are limited to analyzing the causes of accidents individually [6,7]. In order to increase fire safety on construction sites, it is necessary to collect big data related to fire accidents on construction sites and analyze the factors of fire using various approaches.

Construction Site Fire Safety
There are various threats on construction sites that can lead to fire accidents, such as flammable materials and dangerous construction activities [8]. Fire accidents on construction sites are likely to cause secondary accidents such as collapse, burial, and explosion, which could lead to greater damage to life and property. Some studies have sought to find the main causes of fire accidents that can cause enormous damage to construction sites. One study shows that fire hazards in the category of unsafe site conditions are the leading cause of accidents on construction sites [9]. Unsafe conditions on construction sites include flammable materials stored on-site and hazardous construction activities that can directly cause fires. According to a related study, fires on construction sites are mainly caused by the transfer of sparks to the surrounding insulation during welding [10,11]. In addition, various gases are stored on the site depending on the type of construction, which can increase the risk of fire and explosion on the construction site [12]. The location and method of storing combustible materials on construction sites were identified as important factors in preventing fire accidents [13]. That study pointed out that site layout planning without adequate consideration of the storage location of flammable materials and heated devices is a main cause of construction site fires.
Some studies provide methods to evaluate fire hazards on construction sites, such as a fire hazard tracking system and an evaluation index [14,15]. One study collected information to evaluate construction site fire safety, provided relevant indicators, and evaluated the fire safety on the site by calculating this mathematically. In addition, there is research to improve fire safety by providing a monitoring system that can detect fires on construction sites in real-time. In related research, a real-time construction site fire detection system was developed by collecting CCTV screen data provided from construction sites and developing an image-based fire recognition model [16]. However, these studies to evaluate or monitor the fire risk of a construction site are limited in regard to improving the fire safety of a construction site in the design stage of a construction project because the related data is collected after the actual construction site is started. If data related to fire on the construction site are collected in advance and the fire factors on the site are analyzed, it can be effective in providing regulation before the start of the construction project that will prevent fires. In addition, when analyzing construction site fire factors, it is important to collect various data and analyze factors related to accidents on construction sites [17].

Web Crawling Application in Construction Research
Web crawling is a technique for systematically browsing the web for the purpose of web indexing [18]. It is often used for tracking web documents on the Internet to effectively collect the information the user needs [19]. Because the data on the web is huge, collecting web data manually can be time-consuming, and the accuracy can be reduced. Web crawling technology, by contrast, automatically traverses web pages and repeatedly collects information that fits the purpose. Web crawling is used in a variety of fields, especially in research involving decision models and prioritization [20,21]. Recently, research on safety and security has been conducted through web crawling [22].
The construction industry has also started using web crawling, and the most active field is construction material management and optimization. Related studies used web crawling technology to collect construction material information and provide automated processes [23,24]. Web crawling has also been used to manage massive documents in construction projects. As an example, web crawling was used to develop a system that collects the latest construction market text data and automatically assigns it to each applicable construction document [25]. Recently, this technology has expanded into various fields related to construction. An example is a study that collects a variety of geographic information on the web and provides a model to predict air emissions from heating [26]. However, few studies have analyzed the factors related to the safety of a construction site using web crawling. In this study, web crawling technology was used to find factors related to site safety, which may suggest a new approach to improving construction safety.

Word Embedding and Network Analysis
Word embedding is a technique that represents words as vectors so that words with similar meanings have similar representations [27]. This is a new way to represent words and documents and is one of the key breakthroughs in deep learning [28,29]. It has mainly been used as a new method for analyzing text or documents [30-32]. The main models using word embedding are Word2vec [33], GloVe [34], and fastText [35], and Word2vec was used in this study. The Word2vec model was created and published in 2013 by a Google research team. Word2vec is computationally efficient and accurate [36]. In addition, vectors pre-trained on a large amount of Google News data are available for Word2vec [37], so it is suitable for performing various analyses based on pre-trained data.
Network analysis is one of the methodologies for finding the relationship between various types of raw data and consists of nodes and edges [38]. Links between words can be expressed as structures, and relationships between words can be analyzed using them. In addition, the correlation between nodes can be analyzed by calculating the Jaccard coefficient, which can measure the similarity of sample data [39]. This network analysis has been widely used in recent safety issues, such as pandemic research [40].

Methodology
In this study, media data related to fires on construction sites was efficiently collected using web crawling. To analyze the collected data, word embedding and network analysis were used. Word embedding and network analysis play a role in providing relationships such as similarity by calculating vector values of important keywords in this study. In addition, uniform manifold approximation and projection (UMAP) was used to effectively express data analyzed using various methodologies in 2D space.

Web Crawling
This study implemented a web crawling method for data collection. The web crawler usually traverses web pages by using a recursive algorithm and then goes over a certain range defined by researchers. The crawler stores data in a data structure that researchers can use efficiently for their studies [41]. To begin with, the authors set the scope in which the crawler should travel. This research collected data from The New York Times, which is one of the top three media companies in terms of newspapers by circulation, and thus, it has sufficient representative data for the study. In addition, some media companies restrict crawling or limit the amount of data for crawling while The New York Times has generous terms for crawling the data.
This research utilized the search term "construction accident" and retrieved the most relevant articles within 20 years. Among the articles, the authors only handled data formatted as text. Because other formats such as blogs and interactive documents have irregular structures, articles in these formats could not be analyzed. Three libraries were central to the web crawling: Selenium, HTMLParser, and Beautiful Soup. Selenium is a set of tools that assists in the development of test automation for a web-based application [42]. Selenium automatically traverses the web pages and stores data within the defined range. Next, HTMLParser was used for parsing the HTML file of each article. With the parsed HTML, Beautiful Soup collected the data the researchers desired to utilize. Beautiful Soup is a Python package that parses HTML/XML to extract and edit information from web pages. It provides a simple interface for prototyping text analysis and data mining [43]. In this research, the authors retrieved the data from the parsed HTML and sorted out the bodies, titles, and dates of the articles. The structure of the articles provided by The New York Times has a tag for body and title. Collected data were used for word embedding in the next part of this study.
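The extraction step can be illustrated with a minimal sketch. It relies only on Python's standard-library `html.parser.HTMLParser` rather than the full Selenium and Beautiful Soup pipeline, and the sample markup (an `h1` title, a `time` element with a `datetime` attribute, and `p` body paragraphs) is an assumption made for illustration, not the actual page structure of The New York Times. The names `ArticleParser` and `parse_article` are hypothetical.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the title, date, and body paragraphs of one article page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.date = ""
        self.paragraphs = []
        self._current = None  # tag whose text is currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            # Assumed layout: the publication date sits in <time datetime="...">.
            self.date = dict(attrs).get("datetime", "")
        elif tag in ("h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # ignore inline tags such as <b> inside a paragraph
        text = " ".join("".join(self._buffer).split())
        if tag == "h1":
            self.title = text
        elif text:
            self.paragraphs.append(text)
        self._current = None


def parse_article(html):
    """Return the title, date, and joined body text of one HTML document."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "date": parser.date, "body": " ".join(parser.paragraphs)}


# Made-up sample page used only to exercise the parser.
SAMPLE_HTML = """
<html><body>
  <h1>Fire at Construction Site</h1>
  <time datetime="2015-07-14">July 14, 2015</time>
  <p>A fire broke out at a building under construction.</p>
  <p>Inspectors had <b>issued</b> a violation last month.</p>
</body></html>
"""

article = parse_article(SAMPLE_HTML)
```

Pages without the expected tags simply yield empty fields, which is one way irregular formats such as blogs and interactive documents could be detected and excluded.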

Text Data Preprocessing
Before embedding the words, preprocessing is an essential phase for utilizing text data. In this study, preprocessing was carried out in three steps: regularization, tokenization, and stop-word removal. The study first regularized all the articles by converting them to lowercase. The Natural Language Toolkit (NLTK) library was used to conduct the remaining two steps of natural language processing (NLP). Using the NLTK library, the sentences were broken down into tokens, that is, words, phrases, and symbols [44]. After the sentences were tokenized, the authors removed the "English stopwords" provided by NLTK. Stop words are words of little analytical value that are filtered out before or after processing natural language data. Eliminating stop words reduces the dimensionality of the vocabulary space, and examples of stop words are "the, in, a, an, etc." [45]. Some studies involving NLP remove more words than this study did, but this study kept the set of removed words to a minimum. Given that word embedding requires sequences of words, removing numerous words interferes with the effectiveness of word embedding.
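The three preprocessing steps can be sketched as follows. This is a simplified stand-in, not the authors' code: it uses a regular expression instead of the NLTK tokenizer (so punctuation and digits are dropped rather than kept as tokens) and a small hand-written stop-word set instead of NLTK's English list. The function name `preprocess` is hypothetical.

```python
import re

# Small stand-in stop-word list; the study used NLTK's English stop words instead.
STOP_WORDS = {"the", "in", "a", "an", "of", "at", "on", "and", "to", "was"}


def preprocess(text):
    """Lowercase, tokenize, and drop stop words while keeping word order."""
    lowered = text.lower()                                  # step 1: regularization
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)     # step 2: tokenization
    return [t for t in tokens if t not in STOP_WORDS]       # step 3: stop-word removal


tokens = preprocess("The fire spread to an adjacent building in the night.")
```

Word order is preserved because the embedding step later depends on which words appear near each other.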

Word Embedding (Word2Vec)
Word2Vec is one of the most powerful methods to implement word embedding. Because word2vec assigns a dense vector to each word, it has advantages over other word analysis techniques. Before word embedding algorithms such as word2vec appeared, researchers usually used one-hot encoding to analyze words. One-hot encoding converts each word to a vector with a single nonzero entry at a position unique to that word. Even though this made natural language processing (NLP) possible, it could not capture relations between words because every word in one-hot encoding is independent of the others. Word2vec resolves this problem, as it adds an embedding layer into the model. Word2vec follows a deep learning approach that probabilistically predicts word vectors by using a hidden layer. Since Word2Vec conducts unsupervised learning on raw text data, it creates word embeddings by maximizing the likelihood of predicting words from their context [46]. Cosine similarity between the resulting vectors measures how similar or dissimilar two words are. In this study, cosine similarity plays a significant role in analyzing words, as it helps to compare the words and construct relations among them.
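The contrast between one-hot vectors and dense embeddings can be seen in a short sketch. The dense vectors below are made-up three-dimensional values chosen for illustration; they are not taken from the model trained in this study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One-hot vectors for a three-word vocabulary: every pair is orthogonal.
one_hot = {"fire": [1, 0, 0], "fell": [0, 1, 0], "collapsed": [0, 0, 1]}

# Hypothetical dense vectors (illustrative values only).
dense = {"fire": [0.9, 0.1, 0.2], "fell": [0.1, 0.8, 0.5], "collapsed": [0.2, 0.7, 0.6]}

one_hot_sim = cosine_similarity(one_hot["fell"], one_hot["collapsed"])
dense_sim = cosine_similarity(dense["fell"], dense["collapsed"])
dense_other = cosine_similarity(dense["fire"], dense["fell"])
```

With one-hot encoding every pair of distinct words has similarity zero, whereas dense vectors allow graded similarities such as a high value for "fell" and "collapsed" and a lower one for "fire" and "fell".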
There are two models for implementing Word2Vec: the continuous bag-of-words model (CBOW) and the skip-gram model. Both models are widely used, and the choice between them depends on the goals and direction of the research. The skip-gram model predicts the surrounding context words from a given word. The model has multinomial distributions for the outputs. It can be effective when research focuses on low-frequency words. On the other hand, CBOW is more suitable for studies that concentrate on high-frequency terms. In this study, since the authors focused on important keywords that can explain accidents on construction sites and on frequently appearing words, the authors implemented Word2Vec with the CBOW model. The CBOW model predicts words by considering context, which means the target word W(t) is predicted from its surrounding words (W(t−n), W(t−n+1), ..., W(t+n)), and thus, it has one output vector. The input consists of one-hot encoded context words. Figure 1 shows an overview of the CBOW model. In the projection phase, the input generates output through a hidden layer h, which is computed as follows:
$$h = \frac{1}{C} W^{T} (x_1 + x_2 + \cdots + x_C)$$
where C is the number of context words in the window, x_i are the inputs, and W is the weight matrix. The output layer score u_j expresses the possibility of the j-th word, and the output y_j is obtained through the soft-max function:
$$y_j = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})}$$
where V is the vocabulary size. Through this process, the CBOW model returns one output from the surrounding words. There are several parameters for conducting word2vec with CBOW. The authors restricted the minimum count of words to 200. In other words, the authors only considered words that appeared at least 200 times across all articles. The word vectors were set to 400 dimensions, which gave better model accuracy. The window size was set to 5.
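A forward pass of the CBOW model can be sketched in plain Python under stated assumptions: a five-word toy vocabulary, a hidden layer of size 3, and arbitrary fixed weights in place of trained ones. The output weight matrix `W_out` and the helper names are hypothetical and only serve to produce the scores u_j.

```python
import math

VOCAB = ["fire", "spread", "adjacent", "building", "night"]  # V = 5
V, N = len(VOCAB), 3                                         # N = hidden-layer size

# Hypothetical weights: W is V x N (input to hidden), W_out is N x V (hidden to output).
W = [[0.1 * (i + j) for j in range(N)] for i in range(V)]
W_out = [[0.05 * (i - j) for j in range(V)] for i in range(N)]


def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]


def cbow_forward(context_words):
    """Return the hidden vector h and the softmax output y for one context."""
    C = len(context_words)
    # Sum the one-hot context vectors, then h = (1/C) * W^T * (x_1 + ... + x_C).
    x_sum = [sum(col) for col in zip(*(one_hot(w) for w in context_words))]
    h = [sum(W[i][k] * x_sum[i] for i in range(V)) / C for k in range(N)]
    # Scores u_j for every vocabulary word, then y_j = exp(u_j) / sum_j' exp(u_j').
    u = [sum(W_out[k][j] * h[k] for k in range(N)) for j in range(V)]
    exp_u = [math.exp(s) for s in u]
    total = sum(exp_u)
    y = [e / total for e in exp_u]
    return h, y


# Predict the missing centre word from its four neighbours.
h, y = cbow_forward(["fire", "adjacent", "building", "night"])
predicted = VOCAB[max(range(V), key=lambda j: y[j])]
```

The source does not name the Word2Vec implementation. If a library such as gensim were used, the stated settings would plausibly correspond to `Word2Vec(sentences, vector_size=400, window=5, min_count=200, sg=0)`, where `sg=0` selects CBOW.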

Network Analysis
Networks are a versatile method to show and analyze simple and complex interactions among factors in articles, and thus they are used for studies in diverse areas. The network representation is simple but rigid, since many details of a specific system are filtered out in order to concentrate on the interactions among its elements [47]. The network is represented with nodes and edges that connect nodes to each other. In this study, the top five frequent keywords and similar words that have over 0.5 cosine similarity with the keywords are represented as nodes. The authors measured the Jaccard coefficient between each pair of keywords to compare similarity and dissimilarity. The Jaccard coefficient is the number of features shared by two sets divided by the total number of distinct features in both sets [48]. The authors examined the interactions among the five keywords and computed the Jaccard coefficient for each of the ten possible keyword pairs.
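The Jaccard computation over the ten keyword pairs can be sketched as below. The text does not specify exactly which sets were compared, so the sketch assumes that each keyword is represented by its set of similar words; the word sets themselves are invented for illustration.

```python
from itertools import combinations

# Hypothetical similar-word sets (cosine similarity over 0.5) for each keyword.
similar_words = {
    "fire": {"night", "blaze", "monday", "friday"},
    "fell": {"monday", "friday", "scaffold", "crane"},
    "collapsed": {"monday", "friday", "scaffold", "crane", "wall"},
    "building": {"department", "inspectors", "issued", "city"},
    "people": {"workers", "crew", "city"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


pairs = list(combinations(similar_words, 2))   # 5 keywords give 10 pairs
scores = {(k1, k2): jaccard(similar_words[k1], similar_words[k2]) for k1, k2 in pairs}
```

A coefficient near 1 means two keywords share almost all of their similar words, while 0 means they share none.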

Uniform Manifold Approximation and Projection (UMAP)
From the Word2Vec results, the authors obtained 400-dimensional word vectors. UMAP was used to visualize the vectors in a low-dimensional space, as UMAP is a state-of-the-art technique for dimension reduction. Dimension reduction creates a low-dimensional space while preserving as much of the structure of the high-dimensional space as possible. UMAP has been widely used in various fields with large data sets [49]. Riemannian geometry and algebraic topology are the theoretical foundations of UMAP. UMAP operates on weighted graphs built from the k nearest neighbors of each point. UMAP is usually compared with the alternative dimension reduction technique, t-SNE. Compared with t-SNE, UMAP performs significantly faster and more efficiently and better preserves the global structure. t-SNE more easily suffers from the curse of dimensionality on large-scale data sets [49]. UMAP, however, shows almost no restriction on the embedding dimension, which makes it feasible for deep learning. Therefore, this study conducted dimension reduction with UMAP and visualized the word vectors in a two-dimensional space. Every word expressed as a multi-dimensional vector was reduced to a two-dimensional vector, and then the research team annotated every dot in the two-dimensional graph. Since the authors focused on the five most frequent keywords, the keyword annotations had larger fonts, and words similar to each keyword were shown in the same color.
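The neighbor graph that UMAP starts from can be illustrated with a small sketch. This covers only the k-nearest-neighbor step, not the full UMAP algorithm (edge weighting and layout optimization are omitted), and the three-dimensional vectors are made-up stand-ins for the 400-dimensional word vectors.

```python
import math

# Hypothetical low-dimensional word vectors (illustrative values only).
vectors = {
    "fell": [0.1, 0.8, 0.5],
    "collapsed": [0.2, 0.7, 0.6],
    "fire": [0.9, 0.1, 0.2],
    "night": [0.8, 0.2, 0.1],
    "building": [0.6, 0.3, 0.7],
}


def knn_graph(points, k):
    """Map each word to its k nearest neighbours by Euclidean distance."""
    graph = {}
    for word, vec in points.items():
        others = [(math.dist(vec, other), name) for name, other in points.items() if name != word]
        graph[word] = [name for _, name in sorted(others)[:k]]
    return graph


neighbors = knn_graph(vectors, k=2)
```

In practice a library would handle the whole projection. With the umap-learn package, for example, a call along the lines of `umap.UMAP(n_components=2).fit_transform(X)` would return two-dimensional coordinates, although the source does not state which implementation or settings were used.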

Results
The results section of this study consists of six subsections. The detailed and statistical approach to the data collected via web crawling is described in the preliminary analysis and basic statistical analysis sections. Based on the collected data, the main keywords of this study were analyzed using word embedding and network analysis methodologies. In order to intuitively provide data analyzed in multiple dimensions, this study presented results in 2D space using the UMAP concept. Finally, the collected data related to fire accidents on construction sites were extracted and analyzed, and specific factors related to the fire accident on the construction site were provided.

Preliminary Analysis
Through web crawling, a total of 1010 relevant articles from The New York Times were found. "Construction accident" was used as a search term on The New York Times website, and the top 1010 relevant articles were retrieved. Of all the data collected via web crawling, only document type articles were used for analysis, and 149 articles of the interactive document and blog type were excluded from the analysis. The article types excluded from the analysis are not valid for scraping because of the irregular structure. Therefore, a total of 861 articles were analyzed. After completing data cleaning, the authors classified articles according to composition. It was confirmed that the articles are generally composed of title, date, and body, and the text data included in the articles were classified with consideration of the compositions. The Beautiful Soup library was used to classify and scrape the parts needed for analysis. Table 1 shows the collection of source data.

Basic Statistical Analysis
The five keywords selected for analysis were "fire", "fell", "collapsed", "building", and "people". The selection criteria considered the types of accidents on construction sites and keywords representing building fire. In many studies, the three major elements of building fire are defined as "fire", "building", and "people" [50]. In addition, "fell" and "collapsed" are the most frequent types of accidents on construction sites [3]. The frequency of each keyword is shown in Figure 2. According to the results, "building" and "people" were the most frequent. The "fire", "fell", and "collapsed" keywords, which relate to the type of accident on the construction site, showed a relatively similar level of media exposure. Since "fell" and "collapsed" are similar words, it may be reasonable to combine them when comparing with the keyword "fire". After combining, "fire" has a media exposure level of 27%, and "fell" and "collapsed" combined have a level of 73%. This result differs significantly from the Bureau of Labor Statistics (BLS) analysis of construction site accident frequency. According to the BLS report, fall and collapse accidents account for about 40% of all accidents on construction sites, and fire accidents account for 2%. When these two categories are rescaled to sum to 100%, fire accidents account for about 5% (2/42) and fall-related accidents for about 95%. There is a large gap between the media exposure level for "fire" presented in this study and the frequency of fire accidents in the BLS report. A possible reason for this difference lies in the characteristics of the media that the authors described in the introduction section. Fire-related accidents may be exposed to the media more than fall-related accidents, which may suggest that fire accidents on construction sites have a greater severity than fall-related accidents.
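The rescaling behind the comparison above is simple arithmetic and can be checked with a few lines. The figures are those quoted in the text; the function name `normalized_share` is hypothetical.

```python
def normalized_share(part, *others):
    """Share of `part` after rescaling the listed categories to sum to 100%."""
    return 100.0 * part / (part + sum(others))


# BLS frequencies quoted in the text: fall/collapse about 40%, fire about 2%.
fire_frequency_share = normalized_share(2.0, 40.0)   # 2 / 42 of the two categories

# Media exposure share for "fire" reported in this study.
fire_media_share = 27.0
gap = fire_media_share - fire_frequency_share        # gap in percentage points
```

Under these figures the media share of "fire" exceeds its frequency share by a little over 22 percentage points.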

Word Embedding in Vector Space through Word2Vec
In this study, cosine similarity was used to provide semantic similarities between keywords. Ten keyword pairs were generated from the five keywords, and the similarity of each pair was calculated, as shown in Table 2. As a check on the reliability of the model, the cosine similarity of "fell" and "collapsed", which are the most similar words among the analyzed words, was examined first. The similarity between the two words was 0.951, much higher than the similarity between the other keyword pairs. This supports the reliability of the analysis. In the results related to the "fire" keyword, the "fire" keyword was more similar to the "building" keyword than to the "people" keyword. The similarity between "fire" and "building" was 0.525, and the similarity between "fire" and "people" was 0.331. It can be interpreted that "fire" showed a much higher degree of similarity to words related to "building" than to words related to "people". For the "fell" and "collapsed" keywords, their similarities with "people" and "building" were very close. Specifically, "fell" has a similarity with "building" and "people" of 0.443 and 0.486, respectively. In addition, the similarity between "people" and "building" was calculated as a negative value, indicating that the correlation between the two keywords is not great. Table 3 shows the top 20 similar words to each of the five selected keywords. Among the words similar to "fire", "fell", and "collapsed", "Monday" and "Friday" appeared across the board. Since "fire", "fell", and "collapsed" are all closely related to the types of accidents, this result is consistent with previous research findings on the construction industry's distribution of injuries across the weekdays [51]. However, the "fire" keyword showed a high degree of similarity with the word "night", which was not observed for the "fell" and "collapsed" keywords.
Generally, the possibility of spread increases when a fire occurs at night. There are many words that have the meaning of "administration" or "inspection" in words with high similarity to the "building" keyword. The top five words with the most similarity to the "building" keyword are "department", "inspectors", "issued", "commissioner", and "city", and these words tend to have a common meaning. Furthermore, most of the words with a high similarity to the "people" keyword are related to workers or activities related to construction workers.

Network Analysis of Keywords
Network analysis is an analysis method that describes the relationship of data with nodes and edges. It is not only efficient for interpreting the relationships between nodes but can also reveal nodes, other than the main nodes, that have an important impact on the network. In this study, keywords are nodes, and words with a cosine similarity of 0.5 or higher are connected by edges. Nodes depict the five keywords and their similar words, and the number of nodes is 136. Edges represent connections among words, and there are 353 edges in the network. In addition, the Jaccard coefficient between each pair of keywords was calculated. The Jaccard coefficient values between keywords are shown in Table 4. The Jaccard coefficient is a statistical value used to measure the similarity and diversity of sample data. Through this, the network of each keyword can be expressed as one unified network. This network graph is visualized in Figure 3. In the network, the sizes of nodes and annotations vary with degree: the more nodes a node is connected to (the higher its degree), the larger it is drawn. Among the keywords in this study, "collapsed" has the largest degree. Apart from the five keywords, the word whose node is noticeably larger is "death". This shows that "death" has a high degree in the network in Figure 3 even though it is not one of the five keywords.
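The construction of the thresholded network and its node degrees can be sketched in a few lines. The first three similarity values are those quoted in the text for the keyword pairs; the remaining triples are invented for illustration and do not reproduce the 136-node network of Figure 3.

```python
from collections import defaultdict

# (word, word, cosine similarity) triples; values after the third are hypothetical.
similarities = [
    ("fell", "collapsed", 0.951),
    ("fire", "building", 0.525),
    ("fire", "people", 0.331),
    ("collapsed", "death", 0.62),
    ("fell", "death", 0.58),
    ("fire", "death", 0.55),
    ("fire", "night", 0.60),
]
THRESHOLD = 0.5

adjacency = defaultdict(set)
for w1, w2, sim in similarities:
    if sim >= THRESHOLD:          # keep an edge only for sufficiently similar words
        adjacency[w1].add(w2)
        adjacency[w2].add(w1)

# Degree = number of distinct neighbours; used to scale node size in the drawing.
degree = {word: len(neighbors) for word, neighbors in adjacency.items()}
edge_count = sum(degree.values()) // 2
ranked = sorted(degree, key=lambda w: (-degree[w], w))
```

In this toy example "death" reaches the same degree as a keyword even though it is not one, which mirrors the kind of observation made about Figure 3.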

Visualizing with UMAP
In this study, the results analyzed through Word2vec were visualized in two-dimensional space using UMAP. This makes the spread among keywords and their similar words visible. Figure 4 shows the overall UMAP graph for this study. To make the graph easier to read, each keyword is expressed in a different color, and the range of each keyword is indicated by gradation. The gray dots on the UMAP are words with low similarity to the keywords. As shown in Figure 4, "fell" and "collapsed" almost overlapped on the UMAP. This means that the similarity between the two keywords in UMAP is very high. In the case of the "fire" keyword, the range appears wider than for other keywords, and there is an intersection with the "building" keyword. For the "fell" and "collapsed" keywords, the "people" keyword appeared to be closer than the "building" keyword. In this way, the study can provide information about the relations among words on a two-dimensional plane according to the similarity of words.

Factors Related to Fire Accidents
Articles related to fire accidents were classified to provide an in-depth analysis of the factors related to fire accidents on construction sites. Classifying the articles by the season in which they were written showed that most news articles related to fire accidents on construction sites were written in summer. According to the BLS report, which investigates the distribution of injuries throughout the year in workplaces, more injuries occur during the summer season than at other times of the year [52]. In addition, the two sources had a common feature that injuries were much less frequently reported near the end of the calendar year. Table 5 shows the distribution of articles and injuries related to fire accidents on construction sites by season. In this study, the topics and major factors of articles dealing with fire accidents on construction sites were also classified. In the collected articles, the biggest factor in the fire accidents was explosion caused by a chemical gas leak. In addition, building-related factors such as building code violations, lack of regular inspection, and inadequate fire safety systems account for about 40% of major factors in fire accidents. Factors related to the construction activities on sites, such as welding and activities related to demolition and renovation, accounted for a relatively small proportion. These results are consistent with the results of the analysis through Word2vec in this study. Table 6 shows the main factors in the articles dealing with fire accidents on construction sites.
Table 6. Distribution of major factors in fire accidents.

Factor                                  Percentage
Explosion related to chemical gas       20.0%
Violation of building and fire code     16.8%
Lack of building and site inspection    11.2%
Inappropriate fire safety system        10.4%
Carelessness                            8.8%
High wind                               5.6%
Absence of an evacuation plan           5.6%
Activities related to demolition        5.6%
Welding                                 4.8%
Activities related to renovation        3.2%
Etc.                                    8.0%
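The grouped shares discussed for Table 6 can be verified with a short calculation. The percentages are those listed in the table; assigning factors to the "building-related" and "activity-related" groups follows the categories named in the text and is otherwise an assumption.

```python
# Percentages as listed in Table 6.
factors = {
    "Explosion related to chemical gas": 20.0,
    "Violation of building and fire code": 16.8,
    "Lack of building and site inspection": 11.2,
    "Inappropriate fire safety system": 10.4,
    "Carelessness": 8.8,
    "High wind": 5.6,
    "Absence of an evacuation plan": 5.6,
    "Activities related to demolition": 5.6,
    "Welding": 4.8,
    "Activities related to renovation": 3.2,
    "Etc.": 8.0,
}

# Grouping is an assumption based on the categories named in the text.
building_related = [
    "Violation of building and fire code",
    "Lack of building and site inspection",
    "Inappropriate fire safety system",
]
activity_related = [
    "Welding",
    "Activities related to demolition",
    "Activities related to renovation",
]

total = round(sum(factors.values()), 1)
building_share = round(sum(factors[f] for f in building_related), 1)
activity_share = round(sum(factors[f] for f in activity_related), 1)
```

The building-related group sums to 38.4%, which matches the "about 40%" stated in the text, and the listed percentages add up to 100%.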

Discussion
This study explored the relationship between factors related to accidents on construction sites through web crawling and deep learning approaches. It is interesting to find that the media exposure level for fire is disproportionately high compared with the accident types in the typical construction "Fatal Four". The difference between the frequency of actual accidents and the level of media exposure can be used as evidence of the great impact that fire accidents have on construction sites. As OSHA publishes reports and statistics related to the "Fatal Four" every year, efforts in addressing them have been the focus of many stakeholders in the construction industry. Fire accidents have a low frequency, and thus, fire accidents are excluded from the list of major accidents on the construction site. However, the results of this study showed that fire accidents have a higher media exposure level, relative to their frequency, than other types of accidents that happen more often on job sites, which signifies a greater social impact and severity than other accident types. This result is in line with the characteristics of fire accidents. Fire accidents are more likely to lead to secondary accidents than other types of accidents on the construction site. As a fire spreads, it can affect the surrounding buildings and roads. Therefore, fire safety should at least be equally emphasized when developing on-site safety regulations and policies.

Construction Site Safety Training Period
This study conducted an analysis using Word2vec, one of the word-embedding models. The results of word embedding can serve as useful data for exploring ways to improve the safety of construction sites. For example, the results of this study can be used to determine the safety training cycle of a construction site. The lists of words with high similarity to the three accident-related keywords ("fire", "fell", "collapsed") all include "Monday" and "Friday". This result is consistent with statistical data on construction site accidents: according to a related study, workers' injuries on construction sites were highest on Monday [51]. These shared findings can help determine the timing of worker safety training to improve fire safety on construction sites. Fire accidents on construction sites may be prevented more effectively by conducting fire safety training for workers and inspections on Monday, the start of the week for construction projects. In addition, since the results of this study and the statistical data provided by the Bureau of Labor Statistics show that the frequency of accidents on construction sites increases in summer, this pattern should be reflected in the annual safety training schedule.
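The idea of looking for words that appear in the similarity lists of several keywords at once can be sketched as follows. This is a minimal illustration in pure Python, assuming cosine similarity as the similarity measure. The four-dimensional vectors in `TOY_VECTORS` are invented for the example and are not the embeddings trained in this study, and the helper names `cosine`, `most_similar`, and `shared_neighbors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, keyword, topn):
    """Return the topn words closest to keyword, excluding the keyword itself."""
    target = vectors[keyword]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != keyword]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:topn]]


def shared_neighbors(vectors, keywords, topn):
    """Words that appear in the top-n similarity list of every keyword."""
    neighbor_sets = [set(most_similar(vectors, k, topn)) for k in keywords]
    return set.intersection(*neighbor_sets)


# Made-up toy embeddings (NOT the study's data). The last dimension loosely
# stands for a "day of week" context shared by all three accident keywords.
TOY_VECTORS = {
    "fire":      [1.0, 0.0, 0.0, 0.6],
    "fell":      [0.0, 1.0, 0.0, 0.6],
    "collapsed": [0.0, 0.0, 1.0, 0.6],
    "Monday":    [0.1, 0.1, 0.1, 1.0],
    "sprinkler": [1.0, 0.0, 0.0, 0.0],
    "ladder":    [0.0, 1.0, 0.0, 0.0],
    "crane":     [0.0, 0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(shared_neighbors(TOY_VECTORS, ["fire", "fell", "collapsed"], topn=2))
```

With trained embeddings, the same intersection step would be applied to the similarity lists produced by the word-embedding model instead of the toy vectors.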

Fire Detection System on the Construction Site
The word "night" has a high degree of similarity to the "fire" keyword. Construction sites tend to have few occupants after work hours compared to other building types. In particular, a construction site may have only a limited number of employees staying overnight for monitoring, or may be completely empty, which hinders early detection of a fire. Furthermore, during the construction phase, fires are more difficult to recognize because safety equipment such as fire and smoke alarms has not yet been completed [2]. A report published by the National Fire Protection Association that breaks down direct property damage from construction site fires by time of day confirms this risk. According to this report, fires that occurred between midnight and 4 a.m. accounted for 31% of the total direct damage caused by fires on construction sites [5]. This is about a 15% increase in direct damage compared to other time periods.
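One way to put the 31% figure in context is to compare it with the share a four-hour window would hold if direct damage were spread evenly over the day. The sketch below makes that comparison. The uniform baseline is an assumption introduced here for illustration and is not necessarily the comparison made in the cited report, and the function name `uniform_share` is hypothetical.

```python
def uniform_share(window_hours: float, day_hours: float = 24.0) -> float:
    """Percentage of a day covered by a window, i.e. the share of damage
    expected if damage were spread evenly across all hours (assumption)."""
    return 100.0 * window_hours / day_hours


reported_share = 31.0              # % of direct damage, midnight to 4 a.m. [5]
baseline = uniform_share(4.0)      # share expected under an even spread
excess_points = reported_share - baseline
ratio = reported_share / baseline

if __name__ == "__main__":
    print(f"baseline: {baseline:.1f}%")
    print(f"excess: {excess_points:.1f} percentage points")
    print(f"ratio: {ratio:.2f}x")
```

Under this assumption, the midnight to 4 a.m. window covers about 16.7% of the day, so its reported share of damage is roughly 14 percentage points above the even-spread baseline.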
Early fire and smoke detection is essential to fire safety on building construction sites. Safety regulations and policies should be augmented to require fire and smoke detection or monitoring systems on construction sites so that night-time fires can be detected. In addition, the periodic fire safety system inspection program that is conducted on completed buildings should be extended to the construction stage. Fire protection systems on construction sites are more exposed to the risk of damage than those in completed buildings, and such damage could prevent them from functioning properly in the event of a fire at night. According to related studies, smoke and fire sensors are difficult to use properly in the construction stage due to the open environmental conditions and environmental complexities of the construction site [16]. These risks are also consistent with the findings of this study: many of the words with a high similarity to the "building" keyword carried the meaning of inspection. These results show that periodic and reliable on-site inspections are as important as installing additional fire protection systems to improve fire safety on construction sites.

Fire Safety Regulations on the Construction Site
According to the results of this study, the "fire" keyword showed higher similarity to the "building" keyword than to the "people" keyword. This can be explained by the fact that fires on construction sites are more closely related to building factors. Violations of the building code and the lack of regular inspection and training appeared as the main contributors to fires on building construction sites. According to the collected articles, the most frequently noted violations on construction sites were dangerous working conditions such as the absence of a fire safety system and inadequate safety supervision. Improper use of fire extinguishers, such as using them to prop open doors, and exposed electrical cords that could cause a fire were also cited as violations of the regulations. In addition, the absence of illuminated signs that are visible in the dark, which affects workers' evacuation in case of fire on a construction site, was pointed out. This result provides important evidence that many fire accidents can be prevented through site regulations and behavior changes that reflect the building factors of the construction site. Factors related to the fire safety system on construction sites can be addressed at the design stage of construction projects, and dangerous working conditions on the site can be prevented through appropriate safety supervision and inspection by the site manager.
The current construction site fire safety system does not sufficiently consider the characteristics of the construction stage. Therefore, a blanket approach to address all stages of construction using the same regulation and fire safety system might not be appropriate. It is necessary to consider applying differentiated fire safety regulations for each construction stage by defining major factors and related activities at each stage of construction. An alternative is to determine the fire risk in related activities based on the building code and install a separate fire safety system at the location and time vulnerable to fire on the construction site. For example, some of the news articles reported that site management was aware that welding activities were the main cause of fires on construction sites, but no separate fire safety system was considered for welding activities on site. Indiscriminate exposure of electrical cords and storage of combustible materials on the construction site must be thoroughly managed during welding. In addition, under the current construction safety program, it is difficult to efficiently prevent fire accidents due to the lack of OSHA's fire training program at the construction stage and related research to improve fire safety on construction sites. Therefore, more research and investment in fire safety on construction sites are needed.

Risk of Explosion
In the case of explosions, which account for the largest proportion of fire accident-related articles collected in this study, additional modifications to the fire-related regulations and site safety policies will be required. There are many activities where chemical gas is used on construction sites. Depending on the progress of the construction project, there is a possibility that gas pipes can be involved, and safe control on gas pipes is an important factor in improving fire safety. In particular, construction activities that use gas in confined spaces, such as underground, increase the likelihood of fires and explosions on construction sites. Since gas leaks and explosions in confined spaces can cause greater injuries to workers, additional fire safety systems and protective equipment are required through additional regulations based on construction activities. It is possible to consider a method that can inform workers of the danger of explosion in an enclosed space by applying a portable sensor, which has been recently used for indoor air quality analysis.
However, many articles collected in this study point out that there are still no major regulatory changes to address explosion risks. Each regulation should include a manual on handling hazardous gases for each construction activity, and external conditions such as weather that may affect gas expansion should also be considered. In addition, all stakeholders involved in the construction site must be continuously provided with detailed information on where and how much flammable materials such as chemical gases are stored. It is essential to modify and strengthen the building and fire regulations related to the management and inspection of chemical gases according to the construction stage.

Conclusions and Recommendations
Researchers have rarely focused on fire safety on construction sites due to its low occurrence compared with other accidents, such as falls. Researchers also have not adequately explored the factors of fire accidents on construction sites and the effects of fire accidents. This study contributes to the body of knowledge by providing evidence to support the importance of fire safety, identifying specific causes and related factors of fire accidents, and providing specific recommendations to address fire safety on construction sites.
To analyze the influence of factors related to fire accidents on construction sites, new approaches such as web crawling and word embedding using deep learning were introduced and used in this study. This study provides some evidence of factors that influence construction site fires based on previously unstudied media data. It was found that fire accidents could be an important factor threatening the safety of construction sites.
The results of the current study suggest that when developing regulations and policies to improve construction safety, the risk of fire accidents should not be overlooked and should be equally considered. In addition, this study suggested that building and fire code violations, lack of regular inspections, and an incomplete fire safety system were major factors in fire accidents on construction sites. This is not a result of careless and risky behavior of workers on construction sites but factors that can be prevented with appropriate regulations and on-site protection systems. By finding the reasons for fire accidents on the construction site and exploring the relationship between the related factors, this study makes meaningful contributions to developing safety-related regulations that consider the characteristics of construction sites.
The methodology and data chosen for the current study impose several limitations on the analysis. First, the data collected in the current study are limited to news articles; other sources that can reveal the impact of fire accidents, such as data related to fire insurance on construction sites, also need to be analyzed. Future studies should analyze factors related to fire safety on construction sites from various perspectives based on data collected from different sources. Second, this study analyzed the relationship and similarity between representative keywords such as "fire" and "building", but with sufficient data, the relationships between the sub-keywords belonging to these representative keywords can also be explored. Future studies can therefore analyze the factors related to the fire safety of construction sites at a finer level of detail, and through this, fire safety on construction sites can be considered on a larger scale. Applying more advanced data collection and analysis techniques would improve the reliability of the analysis and could reinforce the results of this paper. New methodologies from other fields, such as deep learning, should be actively introduced to suggest ways to improve the fire safety of construction sites.