Identifying Critical Infrastructure in Imagery Data Using Explainable Convolutional Neural Networks

To date, no method utilizing satellite imagery exists for detailing the locations and functions of critical infrastructure across the United States, which makes responding to natural disasters and other events challenging because of complex infrastructural interdependencies. This paper presents a repeatable, transferable, and explainable method for critical infrastructure analysis and the implementation of a robust model for critical infrastructure detection in satellite imagery. The model is a DenseNet-161 convolutional neural network pretrained on the ImageNet database and further trained on a custom dataset containing nine infrastructure classes. The resulting analysis achieved an overall accuracy of 90%, with the highest accuracies for airports (97%), hydroelectric dams (96%), solar farms (94%), potable water tanks (93%), hospitals (93%), and substations (91%). The relatively low accuracies for petroleum terminals (86%), water treatment plants (78%), and natural gas generation (78%) are likely influenced by the commonality of similar infrastructure components across classes. Local interpretable model-agnostic explanations (LIME) were integrated into the overall modeling pipeline to establish trust for users in critical infrastructure applications. The results demonstrate the effectiveness of a convolutional neural network approach for critical infrastructure identification, with higher than 90% accuracy for six of the nine critical infrastructure facility types.


Introduction
Critical infrastructure (CI) systems in the United States contain a diverse array of facilities, functions, and dependencies [1]. Failure of a facility in one sector can lead to cascading events that negatively impact CI across multiple sectors [2]. An example of this is infrastructure damaged during Hurricane Harvey, where a series of power failures led to a chemical plant explosion and chemical spills impacting the surrounding area [3]. The complexity of these systems, in addition to the siloed nature of CI in the United States, makes it difficult to identify and analyze these systems and their relationships using existing methods and information. The lack of a comprehensive understanding of the locations, types, and dependencies of CI across an area inhibits both the anticipation and response time of state and federal agencies when reacting to natural disasters or other events [4].
To date, no methods exist that enable the detailing of locations and functions of CI across the United States in a computationally efficient and repeatable manner. Instead, CI data exist in silos at the federal, state, municipal, and private levels [4]. Without an understanding of CI and their dependencies, it is difficult to provide accurate information for implementing risk mitigation measures and for emergency response. The ability to identify assets across multiple CI sectors is particularly important. Doing so enables not only the identification of individual assets but also inferences about the functional relationships between assets in different sectors, covering both service-provision and geographic infrastructural interdependencies [5]. An additional challenge associated with the study of CI is the evolving nature of CI systems, as facilities are constructed and decommissioned. It is therefore important to facilitate the timely updating of information about these changes.
In this paper, we present a new approach to address these challenges in CI analysis. The main contributions are to: (1) provide a novel repeatable machine-learning method for the cross-sector identification of multiple CI facilities in satellite imagery with a high degree of accuracy, an approach currently absent in the existing body of literature, and (2) provide explanations for the model's conclusions. To achieve this, we use a combination of unique data generation practices, a DenseNet161 convolutional neural network (CNN) architecture, and two explainability frameworks: local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP). The method enables the model to be easily transferred for use with new data as updated aerial imagery becomes available.
The rest of the paper is organized as follows:
• Section 2 describes related work in this area and the advancements of this work compared to prior studies.
• Section 3 describes our methodology, including data generation and model selection.
• Section 4 presents our results and provides discussion on the accuracy of the outcomes. To provide further insights into the results, this section also describes our explainability analysis, including the explainability frameworks implemented and the subsequent analysis of the results.
• Finally, Section 5 provides conclusions on this work and describes future research directions based on the outcomes of this study.

Background and Related Work
The identification of objects in satellite imagery data has been extensively studied. In the domain of CI, however, there has been limited application of current machine-learning techniques to the identification process. Recent work in the related domain of damage detection used modified weakly supervised attention networks to detect destruction in urban images [6]. Additionally, several works have identified components of CI, such as airports [7][8][9], ships [10][11][12], and roads [13]. These efforts span a range of architectures, including AlexNet [7] and VGG16 [14]; the most common approach for these problems uses variations of R-CNN architectures [8][9][10][11][12]. However, these previous works have focused on the detection of a single type of facility. Few, if any, efforts have included the domain diversity (i.e., range of CI sectors and facility types) provided in this work, and none have been developed with CI analysts as a target customer for the final results.
This work utilizes a DenseNet-161 architecture. A convolutional neural network (CNN) such as DenseNet-161 represents an image as a numerical array (a matrix of pixel values for each color channel), which is then passed through a series of convolution, pooling, flattening, dropout, and densely connected layers with activation functions; these layers form the basis of CNN architectures [15]. Through this process, the model "learns" the features common to a class of images (e.g., airports). When classifying an unknown image, a trained CNN uses the features learned during training to assign a class. Specific architectural features vary by architecture. For example, DenseNet architectures contain dense blocks, in which each layer receives the feature maps of all preceding layers within the block. These blocks mitigate the decline in accuracy attributed to the long path between the input and output layers of very deep networks [16]. To date, no significant work has been published on the use of explainable CNNs to identify specific CI facilities across diverse CI sectors. Previous work has addressed the detection of infrastructure expansion [17] and infrastructure quality [18]; however, most of this work focuses on change detection at the country or city level and does not address the identification of individual facilities. None of the prior work, whether on infrastructure expansion, infrastructure quality, or facility-scale detection of a single facility type, has incorporated elements of explainability for model conclusions in this domain.
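To make the layer sequence above concrete, the following is a toy NumPy sketch (not part of the authors' pipeline) of three of those operations on a single-channel image: a "valid" 2D convolution (implemented, as in most deep learning libraries, as cross-correlation), a ReLU activation, and 2×2 max pooling. The function names, the 6×6 test image, and the hand-set edge-detecting kernel are illustrative assumptions; in a trained CNN the kernels are learned and operate on multi-channel tensors.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2D cross-correlation, the operation CNN conv layers compute."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    """Activation function: keep positive responses, zero out negative ones."""
    return np.maximum(x, 0.0)

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; trailing rows/columns that do not fill a window are dropped."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                          # vertical edge between columns 2 and 3
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to left-to-right intensity increases
conv_out = conv2d_valid(image, kernel)      # 4x4 feature map
features = max_pool2d(relu(conv_out))       # 2x2 map, every entry 3.0
print(features)
```

Each pooling step halves the spatial size while keeping the strongest response, which is how a CNN condenses an image into progressively more abstract features.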
This paper describes the creation of a new approach that combines machine-learning methods with subject matter expertise to produce a domain-aware CI dependency analysis tool. Ultimately, this paper aims to introduce the use of a deep learning technique in the identification of CI and to establish baseline performance metrics for lifeline CI sectors using a DenseNet161 CNN architecture. The result of this work is a repeatable, transferable, and explainable method for CI detection. It is applied to nine different CI asset types, highlighting both the need for and the ability to identify and distinguish between facilities in different CI sectors. The results have applications in a range of end-use scenarios for CI, including emergency response, dependency analysis, and the identification of vulnerabilities that can be addressed to ensure the safety, security, and resilience of CI systems.

Materials & Methods
This section details the data generation and model selection processes used to build and train the proposed machine-learning model for CI identification. The complete workflow is described in the following text; as an overview, Figure 1 shows a schematic of the modeling and data analysis pipeline, tracing the flow of information from image data to final explainable predictions. The process includes a train/test split, development of a DenseNet161 model, application of the model to unknown data, explainability assessments via LIME (described in more detail in Section 4.4), and a final output consisting of the three most probable predictions, along with an image showing which superpixels were most influential in the top prediction.

Data Generation
For our analysis, we identified nine CI facility types within five CI sectors for study, as shown in Table 1. To qualify for consideration in this work, a facility type had to be critical to sector operations, possess a facility footprint detectable in a standard RGB satellite image, and contribute to demonstrating the model's ability to correctly identify heterogeneous facilities across multiple sectors. These nine facility types were selected to represent a range of CI functions in the energy, water, transportation, healthcare, and chemical sectors. Three of these sectors (energy, water, and transportation) are designated by the U.S. Department of Homeland Security as lifeline sectors; loss of a lifeline sector facility directly affects the resilience of the affected facility and any interdependent facilities [19]. As inputs for model training, facility locations were obtained from Idaho National Laboratory's All Hazard Analysis (AHA) database. AHA is a methodology and application to collect, store, and model the function, commodity, and service flows of interconnected systems, facilitating scalable and repeatable assessments of system behaviors suitable for vulnerability, consequence, and risk analysis [20]. Facility locations were then overlaid on the most recent data layer of the U.S. Department of Agriculture's National Agriculture Imagery Program (NAIP). The NAIP data set was selected for four fundamental reasons: (1) a 3-year refresh rate, (2) a resolution between 0.5 m and 2 m, (3) an average of 10% or less cloud cover in gathered images, and (4) coverage of the contiguous United States [21]. Single-facility images were extracted from NAIP using a combination of manual and automated techniques. An example of NAIP imagery with the AHA overlay is shown in Figure 2, where the green dot marks an airport identified in the AHA data.
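The automated part of the image extraction is not specified in detail; one plausible approach is to convert each facility's map coordinates to a pixel position in the NAIP tile and crop a fixed-size chip around it. The sketch below assumes that the facility coordinates are already in the raster's projected coordinate system, that the tile is north-up with square pixels, and that a 224 × 224 chip is wanted (a common CNN input size, not a value reported here). The names `world_to_pixel` and `extract_chip` are hypothetical; in practice a geospatial library would read the tile and its geotransform.

```python
import numpy as np

def world_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Map projected coordinates to (row, col) in a north-up raster.

    Assumes (origin_x, origin_y) is the upper-left corner of the raster and
    pixels are square with side `pixel_size`, in the same units as x and y.
    """
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)  # rows increase southward
    return row, col

def extract_chip(raster, x, y, origin_x, origin_y, pixel_size, chip_size=224):
    """Crop a chip_size x chip_size window centred on (x, y).

    Returns None when the window would extend past the raster edge, so that
    facilities near tile boundaries can be handled separately (e.g., manually).
    """
    row, col = world_to_pixel(x, y, origin_x, origin_y, pixel_size)
    half = chip_size // 2
    top, left = row - half, col - half
    if top < 0 or left < 0 or top + chip_size > raster.shape[0] or left + chip_size > raster.shape[1]:
        return None
    return raster[top:top + chip_size, left:left + chip_size]

raster = np.zeros((1000, 1000, 3), dtype=np.uint8)  # stand-in for an RGB imagery tile
raster[400, 300] = 255                               # mark the facility location
chip = extract_chip(raster, x=500150.0, y=3999800.0,
                    origin_x=500000.0, origin_y=4000000.0, pixel_size=0.5)
print(chip.shape)  # (224, 224, 3); the marked pixel sits at chip[112, 112]
```

Returning `None` for edge cases, rather than padding, keeps every chip composed entirely of real imagery.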
The number of unique images per facility type ranged from 479 to 2292, as shown in Table 2. The data were then randomly split into training (80%) and testing (20%) sets.

Model Selection
Numerous deep learning models and approaches are currently available to researchers working in the field of imagery classification. These approaches range from widely available and utilized architectures to advanced multimodal deep learning and cross-modality learning frameworks that allow for complex, in-depth analysis of the imagery they classify [22]. This work used the former approach, given the encouraging results of the widely available architectures throughout the training and testing process and the desire to make the work easily replicable by a wide audience. Additionally, we implemented deep learning rather than shallow methods (traditional machine learning) because of the complexity of the datasets. One challenge of automated CI analysis is that the facilities themselves are diverse, as are their backgrounds (i.e., the surrounding geography). During the exploratory phase of this project, the team implemented a range of network depths and network types and found that acceptable accuracy was not achievable with anything less than a deep and fully connected network.
The initial stage of model selection was exploratory: DenseNet-201, DenseNet-161, ResNeXt-101, and ResNet-152 were implemented and assessed for performance. These assessments focused on accuracy, training loss, validation loss, and a qualitative assessment of LIME explanations. Based on the preliminary findings, a DenseNet-161 architecture was implemented for the final model. The DenseNet architecture was developed by Huang et al. [16] and implements a densely connected CNN in which, within each dense block, every layer receives the feature maps of all preceding layers as input. Residual architectures such as ResNet and ResNeXt combine features through additive identity (skip) connections; dense blocks instead concatenate the feature maps of earlier layers. Dense blocks are separated by transition layers (convolution and pooling) that reduce the size of the feature maps. DenseNet's dense connectivity between layers is computationally expensive but facilitates the assessment of complex data, such as the remotely sensed imagery used in this work. Given the preliminary results, and because this architectural style is less prone to overfitting and requires fewer parameters to reach an accurate model than alternative architectures [16], DenseNet-161 was selected for this work.
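Dense connectivity can be illustrated by tracking the number of feature-map channels: each layer in a dense block adds `growth_rate` new channels that are concatenated onto all earlier ones, and each transition layer compresses the channel count. The sketch below assumes the DenseNet-161 configuration published by Huang et al. [16] and used by torchvision's `densenet161` (96 initial features, growth rate 48, blocks of 6, 12, 36, and 24 layers, compression 0.5); the helper `densenet_channels` is hypothetical.

```python
def densenet_channels(num_init_features, growth_rate, block_config, compression=0.5):
    """Track feature-map channels through DenseNet dense blocks and transition layers."""
    channels = num_init_features
    history = []
    for i, num_layers in enumerate(block_config):
        # Every layer in the block concatenates `growth_rate` new feature maps
        # onto everything produced before it, so channels grow linearly.
        channels += num_layers * growth_rate
        history.append(channels)
        if i != len(block_config) - 1:
            # Transition layer: a 1x1 convolution compresses the channels,
            # and pooling halves the spatial resolution.
            channels = int(channels * compression)
    return channels, history

# DenseNet-161 configuration.
final, per_block = densenet_channels(96, 48, (6, 12, 36, 24))
print(per_block)  # [384, 768, 2112, 2208]
print(final)      # 2208
```

The 2208 final features feed the classifier. In a typical transfer-learning setup, the 1000-class ImageNet classifier would be replaced by a nine-class layer, although the paper does not detail this step.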

Results and Analysis
This section describes the results of the model based on the DenseNet-161 architecture, including the overall accuracy and training loss, as well as cross-validated accuracy results for individual CI facility types. Next, the explainability activities conducted during model development are described, along with the implementation and outcomes of the LIME and SHAP explainability frameworks. Finally, the dataset analysis based on the explainability outcomes is discussed.
Remote Sens. 2022, 14, 5331 6 of 14

Model Accuracy
With the developed model, we evaluated the results by both accuracy and training loss, as shown in Figure 3. Accuracy is the proportion of correctly identified facilities when the trained model is applied to testing data; training loss aggregates the loss function (e.g., cross-entropy for multiclass classification) over the training samples in an epoch, penalizing both incorrect and low-confidence predictions. In general, a lower training loss corresponds to a more accurate model on the training data. Results for both the training data (80% training set) and the validation data (20% testing set) are shown. Figure 3 shows that, after an initial training phase, the highest overall validation accuracy of 82% was achieved at Epoch 33. Model accuracy and training loss improved significantly between Epochs 1-8 and improved only slightly with additional training. The values converge near their best performance around Epoch 15, suggesting that the model is slightly underfit and that additional improvements in performance are unlikely to be achieved by training for more epochs. The accuracy (Figure 3a) and loss (Figure 3b) curves for Epochs 9-50 contain a substantial amount of noise, which is likely caused by the inherent complexity of remotely sensed imagery data [23]: both the CI facilities themselves and the background pixels, which reflect the climate and landscape diversity of the United States, vary widely. The best accuracies for each class (i.e., each CI facility type) are ranked as follows: airports (97%), hydroelectric facilities (96%), solar farms (94%), hospitals (93%), potable water tanks (93%), substations (91%), petroleum terminals (86%), natural gas generation plants (78%), and water treatment plants (78%). These results are presented as a confusion matrix in Figure 4, which compares predicted labels (horizontal axis) with true labels (vertical axis). The number of images at each intersection of true and predicted labels is represented by a color bar, with the most frequent intersections shown in darker blue and the least frequent in white; darker cells on the diagonal therefore indicate higher accuracy. Airports, solar farms, hydroelectric facilities, substations, potable water tanks, and hospitals each achieved greater than 90% accuracy (i.e., more than 90% of the images in each of these classes were assigned their true label).
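Per-class accuracies such as those ranked above can be read off a confusion matrix laid out as in Figure 4 (rows are true labels, columns are predicted labels) by dividing each diagonal entry by its row total; overall accuracy is the trace divided by the total count. The sketch below uses made-up counts for three hypothetical classes, not the paper's data.

```python
import numpy as np

def per_class_accuracy(cm) -> np.ndarray:
    """Row-normalized diagonal: fraction of each true class that was predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

def overall_accuracy(cm) -> float:
    """Fraction of all images whose predicted label matches the true label."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

# Hypothetical counts: rows = true class, columns = predicted class.
cm = np.array([
    [97, 2, 1],
    [5, 78, 17],
    [0, 10, 90],
])
print(per_class_accuracy(cm))  # per-class accuracies 0.97, 0.78, 0.90
print(overall_accuracy(cm))    # 265 correct out of 300 images
```

Off-diagonal entries show which classes are confused with each other; in this toy matrix, most errors for the second class fall into the third.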

Cross-Validation
To further evaluate model accuracy, we conducted a k-fold cross-validation analysis [24]. Based on a series of test runs with k values ranging from 5 to 50, we determined that k = 10 was appropriate. In each of the k iterations, the data for each facility type were randomly shuffled and then split 80/20 into training and testing sets, and the CNN model was trained and evaluated on that split, giving k runs in total. The overall cross-validation process is shown in Figure 5. Table 3 presents the accuracy results averaged over all runs [25]. The similarity between the initial analysis results and the cross-validation results indicates that the accuracy results are consistent and do not depend strongly on a particular partition of the data. In addition, the high cross-validation accuracy suggests that the approach generalizes to new datasets, particularly for the identification of airports, hydroelectric dams, solar farms, potable water tanks, and substations.
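The randomized procedure described above can be sketched as follows. Note that shuffling each class and holding out 20% on every one of the k runs is, strictly speaking, repeated random subsampling; classical k-fold cross-validation instead partitions the data into k disjoint folds and holds out each fold once (10% per fold when k = 10). The callback `train_and_evaluate`, the toy labels, and the fixed seed are hypothetical stand-ins for retraining DenseNet-161 on each split.

```python
import random
from statistics import mean

def stratified_split(labels, test_fraction=0.2, rng=None):
    """Shuffle the indices of each class separately and hold out test_fraction of them."""
    rng = rng or random.Random()
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy so the grouping itself is not reordered
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test

def repeated_split_validation(labels, train_and_evaluate, k=10, seed=0):
    """Run k randomized 80/20 splits and average the per-class scores.

    train_and_evaluate(train_idx, test_idx) is a hypothetical callback that
    trains a fresh model and returns {class_name: accuracy}.
    """
    rng = random.Random(seed)
    runs = [train_and_evaluate(*stratified_split(labels, 0.2, rng)) for _ in range(k)]
    return {c: mean(run[c] for run in runs) for c in runs[0]}

labels = ["airport"] * 10 + ["substation"] * 20  # hypothetical toy labels

def dummy_train_and_evaluate(train_idx, test_idx):
    # Stand-in for training on train_idx and scoring each class on test_idx.
    return {"airport": 0.9, "substation": 0.8}

print(repeated_split_validation(labels, dummy_train_and_evaluate, k=10))
```

Splitting each class separately keeps the 80/20 ratio within every facility type, so small classes are always represented in the test set.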

Explainability
Many machine-learning models operate as black boxes. Beyond the accuracy results presented in the previous sections, a key element of this work is our explainability analysis of the model outcomes. Explainability is a process that helps determine why a machine-learning model produced a certain output for a given input, "explaining" how a trained model came to its conclusions and thereby providing a window into an otherwise black-box process. Different machine-learning models and approaches utilize different implementations of explainability [26]. For our purposes, we use explainability to ensure that the trained model is detecting the correct CI facilities for each class, to guard against any unknown bias in the training data set, and to provide a level of confidence in the model's conclusions. Additionally, we integrate our explainability approaches into the overall modeling pipeline to establish a basis of trust for potential non-expert users who view and interpret model classifications. This trust is particularly important for CI applications, where asset and facility identifications have lifeline-critical implications and where the information is used by CI owners, operators, and emergency response personnel. The following two sections describe the analysis outcomes from implementing the LIME and SHAP explainability frameworks.

LIME Implementations
LIME is a model-agnostic approach for explaining machine-learning classification models [27]. When applied to an image classification model, LIME begins by dividing an image into superpixels, i.e., defined regions within the image. It then generates perturbed copies of the image in which random subsets of superpixels are turned off, records the model's predicted probability for the class being explained on each copy, and fits a linear regression model to those probabilities, weighting each copy by its similarity to the original image. The coefficients of this regression assign a positive or negative weight to each superpixel, indicating how important that region is to the classification of the image. Figure 6 shows an example of LIME's weighted superpixels applied to a substation; in the rightmost part of the figure, the darkest colors indicate areas of higher correlation, i.e., regions weighted heavily in the model's classification. We used LIME to validate model classifications by running 100 random samples from each class through LIME and confirming that the classification model was classifying images based on the CI present in each sample image. Figure 7 shows example LIME results for a solar farm, water treatment plant, substation, petroleum terminal, airport, and hydroelectric dam. For each pair of images, the left-hand image shows the superpixels highlighted by LIME. In the right-hand image, red indicates negative correlation and blue indicates positive correlation between a superpixel region and the probability of a correct classification, with values ranging between −1 and 1.
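The perturb-and-regress loop described above can be sketched from scratch in NumPy. This is a simplified illustration of the LIME idea, not the `lime` package's implementation, whose defaults differ in details such as how hidden superpixels are filled and how samples are weighted. Assumptions: superpixels are precomputed as an integer label map (real pipelines use a segmentation algorithm), "off" superpixels are replaced by the image's mean color, samples are weighted with an exponential kernel on cosine distance, and a weighted ridge regression produces one weight per superpixel. The toy classifier and 4 × 4 image are hypothetical.

```python
import numpy as np

def lime_superpixel_weights(image, segments, predict_proba, class_idx,
                            num_samples=500, kernel_width=0.25, seed=0):
    """LIME-style weights for each superpixel.

    image: (H, W, C) float array; segments: (H, W) int array of ids 0..S-1;
    predict_proba: maps a batch (N, H, W, C) to class probabilities (N, K).
    """
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    # Binary masks: 1 keeps a superpixel, 0 turns it off. Row 0 is the original image.
    masks = rng.integers(0, 2, size=(num_samples, n_seg))
    masks[0, :] = 1
    fill = image.mean(axis=(0, 1))  # color used for "off" superpixels
    batch = np.empty((num_samples,) + image.shape)
    for n, mask in enumerate(masks):
        keep = mask[segments].astype(bool)  # (H, W) per-pixel keep flags
        batch[n] = np.where(keep[..., None], image, fill)
    probs = predict_proba(batch)[:, class_idx]
    # Proximity weights: cosine distance between each mask and the all-ones mask.
    on = masks.sum(axis=1)
    cos_sim = on / (np.sqrt(on) * np.sqrt(n_seg) + 1e-12)
    sample_w = np.exp(-((1.0 - cos_sim) ** 2) / kernel_width ** 2)
    # Weighted ridge regression: probs ~ intercept + masks @ coef.
    X = np.hstack([np.ones((num_samples, 1)), masks])
    Xw = X * sample_w[:, None]
    reg = 1e-3 * np.eye(n_seg + 1)
    reg[0, 0] = 0.0  # do not penalize the intercept
    beta = np.linalg.solve(X.T @ Xw + reg, Xw.T @ probs)
    return beta[1:]  # one weight per superpixel

segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])
image = np.zeros((4, 4, 1))
image[:2, :2, 0] = 1.0  # only superpixel 0 is bright

def toy_predict_proba(batch):
    # Hypothetical classifier: class 0 probability = brightness of the top-left quadrant.
    p = batch[:, :2, :2, 0].mean(axis=(1, 2))
    return np.stack([p, 1.0 - p], axis=1)

weights = lime_superpixel_weights(image, segments, toy_predict_proba, class_idx=0)
print(np.round(weights, 3))  # large positive weight for superpixel 0, about 0 elsewhere
```

Because the toy classifier only looks at the top-left quadrant, the regression attributes its output entirely to superpixel 0, which is the behavior the validation step above relies on: high weights should fall on the CI facility rather than on the background.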
Figure 6. Example of LIME's weighted superpixels feature applied to a substation.
Table 4 gives the LIME results for the nine CI facility classes. LIME provides explanations for the top three predictions for a given image. In Table 4, "First Guess" gives the percentage of test images for which the first (most probable) prediction was correct, and "Overall" gives the percentage for which the correct class appeared among the top three predictions. Comparing Table 4 with Table 3, LIME's performance was similar to the overall model accuracy. Because the LIME analysis used a sample of 100 images, a smaller testing data set than was used for cross-validation, some variability in the predicted classes is expected. For the potable water tanks and hospitals classes, LIME accuracy was lower than the DenseNet-161 model accuracy. We attribute this to the general ambiguity of the features within both classes: hospitals appear as generic buildings, and potable water tanks appear as circles. Because LIME is designed to distinguish unique features in an image, we suspect that the superpixels it establishes for these classes are less informative, resulting in the lower accuracy.
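The "First Guess" and "Overall" columns correspond to top-1 and top-3 accuracy computed from each image's class probabilities. A minimal NumPy sketch, with hypothetical probabilities for four classes:

```python
import numpy as np

def top_k_accuracy(probs, true_labels, k):
    """Fraction of samples whose true label is among the k highest-probability classes."""
    probs = np.asarray(probs)
    top_k = np.argsort(probs, axis=1)[:, ::-1][:, :k]  # class indices, most probable first
    hits = (top_k == np.asarray(true_labels)[:, None]).any(axis=1)
    return float(hits.mean())

probs = np.array([
    [0.70, 0.20, 0.05, 0.05],  # true class 0 is ranked first
    [0.10, 0.50, 0.30, 0.10],  # true class 2 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
])
labels = [0, 2, 3]
print(top_k_accuracy(probs, labels, 1))  # "First Guess"-style metric: 1 of 3 correct
print(top_k_accuracy(probs, labels, 3))  # "Overall"-style metric: 2 of 3 within the top three
```

Top-3 accuracy is always at least as high as top-1 accuracy, which is why the "Overall" values can only match or exceed the "First Guess" values.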

SHAP Implementations
SHAP is another explainability framework (with both model-agnostic and model-specific explainers) that uses Shapley values from cooperative game theory to determine which features of an image are crucial in the classification process. For images, pixels can be grouped into regions, and the model's prediction is distributed among those regions as contributions. For our purposes, we used SHAP with DeepExplainer, which is considered an enhanced version of the DeepLIFT algorithm. DeepExplainer approximates SHAP values by integrating over a set of background samples, such that the attributions for an image sum to the difference between the current model output and the expected model output over the passed background samples.
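DeepExplainer approximates Shapley values for large inputs; for a handful of features they can be computed exactly by enumerating every coalition, which makes the additivity property described above easy to check. In this toy sketch, three hypothetical image regions take either their actual values or a single background value, and the model is an arbitrary function with one interaction term; it illustrates Shapley values themselves, not DeepExplainer's algorithm.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (feasible only for small n).

    value_fn(coalition) returns the model output when only the features in
    `coalition` (a frozenset of indices) take their actual values and all
    others take background values.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

x = [1.0, 2.0, 3.0]           # hypothetical region values for one image
background = [0.0, 0.0, 0.0]  # hypothetical background (reference) values

def model(z):
    # Toy model: one additive term plus an interaction between regions 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]

def value_fn(coalition):
    z = [x[j] if j in coalition else background[j] for j in range(len(x))]
    return model(z)

phi = shapley_values(value_fn, 3)
print(phi)                                     # approximately [2.0, 3.0, 3.0]
print(sum(phi), model(x) - model(background))  # both approximately 8.0 (additivity)
```

The interaction term is split equally between regions 1 and 2, and the attributions add up to the change in model output relative to the background, the same property DeepExplainer's attributions satisfy with respect to the expected output over its background samples.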
Similar to the analysis conducted with LIME, 100 images were randomly selected and analyzed with SHAP, returning the top three classification predictions for each image. SHAP denotes the correlation between a pixel and the model's weighting of the pixel when classifying the image with pink highlighting when positive and blue highlighting when negative as shown in Figure 8. The accuracy results from SHAP are shown in Table 5. SHAP provides the top three predictions for a given image. In Table 5, both first guess and overall accuracy values are shown. "First Guess" gives the accuracy percentage of SHAP's first guess out of the randomly selected 100 images; "Overall" gives the percentage of correct estimations across the top three predictions. A notable difference in the SHAP results from LIME was SHAP's poor performance across all but two classes (airports and hydroelectric dams) for first guess accuracy. When the top three classifications are considered, SHAP's results improve but still underperforms compared to both model and LIME accuracy. Locating the cause of the SHAP's accuracy discrepancies would require further investigation. However, given the performance of LIME, the LIME-explainability framework is better suited for analysis of CI imagery data and is recommended for use in the overall modeling and analysis pipeline, as detailed in Figure 1.

Dataset Analysis
Considering the range of accuracy results across the CI facility types in Table 3, combined with the outcomes of the explainability analysis, we examined more closely the results for classes with less than 90% accuracy. Of the original nine classes, five initially exhibited less than 90% accuracy: potable water tanks, natural gas generation plants, petroleum terminals, substations, and water treatment plants. Class outputs were examined using LIME and SHAP to determine which features were being misidentified. Using this approach, we determined that a large source of class confusion was poor image quality in the training data. Once these images were removed from the data set, two of the five classes (potable water tanks and substations) reached 90% or greater accuracy when tested with cross-validation (Table 3). Data removal was based on two criteria: the clarity of an image and the amount of noise or competing, unrelated class features in the image. While this was performed manually, it is a one-time effort and does not need to be repeated for use of the data analysis pipeline. The low accuracy of the remaining three classes was attributed to commonality in imagery data between classes. When these classes were tested in isolation from other similar classes, their accuracy improved to above the 90% threshold.
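The first step of this review, finding classes below the 90% threshold and the class each is most often confused with, can be sketched from a list of (true, predicted) label pairs. This is a minimal sketch under that assumption; `flag_confused_classes` and the example labels are hypothetical, and the sketch does not reproduce the manual image-quality review itself.

```python
from collections import Counter, defaultdict


def flag_confused_classes(pairs, threshold=0.90):
    """Flag classes whose accuracy is below `threshold`.

    pairs: iterable of (true_label, predicted_label).
    Returns {class: (accuracy, most_frequent_wrong_prediction)} for flagged classes.
    """
    totals, correct = Counter(), Counter()
    wrong = defaultdict(Counter)  # true class -> counts of incorrect predictions
    for truth, pred in pairs:
        totals[truth] += 1
        if pred == truth:
            correct[truth] += 1
        else:
            wrong[truth][pred] += 1
    report = {}
    for cls, n in totals.items():
        acc = correct[cls] / n
        if acc < threshold:
            most_common = wrong[cls].most_common(1)
            report[cls] = (acc, most_common[0][0] if most_common else None)
    return report


pairs = ([("water treatment plant", "water treatment plant")] * 3
         + [("water treatment plant", "petroleum terminal")]
         + [("airport", "airport")] * 5)
print(flag_confused_classes(pairs))
# {'water treatment plant': (0.75, 'petroleum terminal')}
```

The most frequent wrong prediction points to the class pair whose imagery is likely to share visual features, which is where LIME or SHAP inspection would then be focused.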
Beyond the aforementioned data quality assurance, data set size is limited by the number of locations where critical infrastructure exists. For example, there are substantially fewer airports than substations, which leaves two options: accept substantially different data set sizes across classes or limit the number of locations included from the larger classes. An additional option is to introduce synthetically generated data, but as this is a benchmarking study, that is beyond the scope of the work presented here. The final challenge is the geographic diversity across the United States, which results in a wide range of landscape types in the imagery surrounding infrastructure. This effectively adds noise to the data, because during training the model has no way of knowing which pixels belong to the background and which represent components of interest. Current methods rely on rectangular areas of interest, which makes it difficult to develop a model when the target is non-rectangular or positioned at an inconvenient orientation. A potential solution in future work is to implement feature masking, or even manual reorientation of features, to reduce the amount of background included in the training data.
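The second option, limiting the number of locations drawn from the larger classes, amounts to random undersampling. The sketch below assumes samples are stored as (item, label) pairs and caps every class at the size of the smallest class; `undersample` and the example file names are hypothetical and do not describe how the study's data set was assembled.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, seed=0):
    """Return a class-balanced subset by randomly limiting every class to the
    size of the smallest class. samples: list of (item, label) pairs."""
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))
    if not by_class:
        return []
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    return balanced


samples = ([(f"airport_{i}", "airport") for i in range(3)]
           + [(f"substation_{i}", "substation") for i in range(10)])
balanced = undersample(samples)
print(Counter(label for _, label in balanced))
```

The trade-off is that most substation locations are discarded; the alternative of keeping imbalanced class sizes would typically be paired with class weighting during training.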

Conclusions
This work provides a foundational understanding of how effective deep learning is for CI analysis. Presently, CI analysis is a labor-intensive activity that depends on consistent manual assessments by subject matter experts. This is problematic during crisis conditions when efficiency is key to effective response, such as when a natural disaster occurs. This paper does not solve those problems but provides a baseline understanding of the effectiveness of convolutional neural networks for CI applications. This work benefits from the All Hazards (AHA) database, which includes the most extensive geospatial and dependency-focused data source for critical infrastructure within the United States. Even with AHA, there are still several challenges remaining in this domain, especially relating to the number of data points available and the impacts of geographic diversity.
The method detailed in this work produced a model trained to recognize the nine classes of interest in open-source satellite imagery, achieving an overall accuracy of 90%. The integration of a trust mechanism using LIME and SHAP gives potential users a high degree of confidence, particularly with LIME, when assessing model classifications. The work presented here is the first instance of using explainable CNNs to identify specific CI facilities across diverse CI sectors. Together, the trained model and explainability approaches provide a repeatable and reliable method for identifying the nine classes of CI for which the model was trained. In practice, the method could be used in additional CI research and analysis to identify previously unknown CI facilities. The method is transferable to new data as updated aerial imagery becomes available, and the model can easily be rerun to provide timely information on the construction or decommissioning of CI facilities. Given an updated imagery set for classification and a proper CI baseline for an area, the model could be used to increase situational awareness of CI assets for disaster preparedness and response.
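Rerunning the model on updated imagery yields a second set of classifications that can be compared against the baseline. The sketch below is one possible comparison, assuming each run is stored as a mapping from a hypothetical tile identifier to a predicted CI class (or `None` when no facility is detected); `compare_runs` and the tile data are illustrative only.

```python
def compare_runs(baseline, updated):
    """Compare two classification runs over the same tiles.

    baseline, updated: dicts mapping a tile identifier to the predicted CI class,
    or None when no CI facility was detected in that tile.
    Returns (appeared, disappeared, changed) dictionaries.
    """
    appeared, disappeared, changed = {}, {}, {}
    for tile in sorted(set(baseline) | set(updated)):
        old, new = baseline.get(tile), updated.get(tile)
        if old == new:
            continue
        if old is None:
            appeared[tile] = new  # possible new construction
        elif new is None:
            disappeared[tile] = old  # possible decommissioning
        else:
            changed[tile] = (old, new)  # class changed; may warrant manual review
    return appeared, disappeared, changed


baseline = {"t1": "substation", "t2": "airport", "t3": None}
updated = {"t1": "substation", "t2": None, "t3": "solar farm", "t4": "hospital"}
appeared, disappeared, changed = compare_runs(baseline, updated)
print(appeared)     # {'t3': 'solar farm', 't4': 'hospital'}
print(disappeared)  # {'t2': 'airport'}
print(changed)      # {}
```

Because classifications carry error, flagged tiles are best treated as candidates for review with LIME explanations rather than as confirmed changes.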
The nine classes studied in this work represent a significant advancement on prior work. In future work, the set of facilities investigated can be expanded to include the full range of CI facilities that exist. The current nine classes were chosen to demonstrate the method's applicability across multiple CI sectors, with a focus on lifeline sectors. CI sectors are composed of numerous individual facilities, some of which fall outside the scope of identification with traditional satellite imagery (e.g., buried pipelines or nondescript buildings). Of the facilities that do fall within that scope, data availability was a determining factor in facility type selection: if there was not enough location data for a given CI facility type, it was not included in this work. If additional data become available, the proposed approach can be used to identify those facilities.
Expansion of this work would include incorporating semantic segmentation to allow finer-grained analysis of the individual components of identified CI facilities. Successful component identification could in turn support the identification of dependencies, such as estimates of the treatment chemicals required at a water treatment plant or the feasible generation capacity of a power plant. Incorporating semantic segmentation would require an expanded, higher-resolution data set and expanded classification ability.
Funding: This work of authorship was prepared as an account of work sponsored by Idaho National Laboratory (under Contract DE-AC07-05ID14517), an agency of the U.S. Government. Neither the U.S. Government, nor any agency thereof, nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
Data Availability Statement: Data were derived from the United States Department of Agriculture's National Agriculture Imagery Program. Specific testing and training data sets can be obtained by contacting the authors.