Aerial Imagery-Based Building Footprint Detection with an Integrated Deep Learning Framework: Applications for Fine Scale Wildland–Urban Interface Mapping

Abstract: Human encroachment into wildlands has resulted in a rapid increase in wildland–urban interface (WUI) expansion, exposing more buildings and population to wildfire risks. More frequent mapping of structures and WUIs at a finer spatial resolution is needed for WUI characterization and hazard assessment. However, most approaches rely on high-resolution commercial satellite data with a particular focus on urban areas. We developed a deep learning framework tailored for building footprint detection in the transitional wildland–urban areas. We leveraged meter-scale aerial imagery publicly available from the National Agriculture Imagery Program (NAIP) every 2 years. Our approach integrated Mobile-UNet and a generative adversarial network. The deep learning models trained over three counties in California performed well in detecting building footprints across diverse landscapes, with F1 scores of 0.62, 0.67, and 0.75 in the interface WUI, intermix WUI, and rural regions, respectively. The biennial mapping captured both housing expansion and wildfire-caused building damage. The 30 m WUI maps generated from these finer footprints showed more granularity than the existing census tract-based maps and captured the transition of WUI dynamics well. More frequent updates of building footprints and improved WUI mapping will improve our understanding of WUI dynamics and provide guidance for adaptive strategies on community planning and wildfire hazard reduction.


Introduction
The wildland-urban interface (WUI), defined as the transition area where urban development meets or intermingles with the undeveloped wildland dominated by vegetation, is widespread across the globe [1,2]. The WUI extent has been increasing rapidly in many countries [3,4]. In the United States (US), decentralized urbanization has led to rapid development in the outlying fringes of cities, fragmented rural areas and forests, and subsequently an increase in the WUI areas over the past three decades [5,6]. Approximately one tenth of the U.S. land area is within the WUI, which is home to around one third of the houses [7]. The loss of wildland areas, which used to serve as critical buffers from natural disasters such as wildfires, when combined with a higher probability of human ignition, can increase community exposure to wildfire risk and destruction [6][7][8]. For example, the 2018 Camp Fire, which burned in the WUI, destroyed over 18,000 structures and caused 85 fatalities, with significant impacts on the functionality of urban facilities [9,10]. Therefore, it is critical to routinely examine and update the WUI extent and characteristics, especially at the individual building level, in order to understand the spatio-temporal dynamics of WUIs and assess the community wildfire risk for planning and hazard preparedness purposes.
This study therefore aims to develop a deep learning-based method to map building footprints across landscapes with diverse building-vegetation mixtures using NAIP aerial imageries and to further improve WUI mapping. Specifically, we first developed a combined framework integrating Mobile-UNet and generative adversarial network (GAN) for the semantic segmentation of building footprints, using ground truth footprints from three counties in California. The ability of the model to capture spatial patterns and temporal dynamics of buildings was then examined over another three counties, taking advantage of the full time series of NAIP imageries since 2010. We further explored the potential improvements in generating WUI maps and analyzed the WUI dynamics through time.

Datasets
NAIP aerial photos were downloaded from Google Earth Engine for six counties in California, i.e., Shasta, Lake, Napa, Sonoma, San Luis Obispo, and Orange Counties. These counties have experienced a rapid expansion of the WUI in the past two decades [40,41], and represent the diverse landscape from the northern to southern part of the state. The NAIP imageries provide ortho photography in four spectral channels (red, green, blue, and near infrared) for the whole continental United States during the agricultural growing season. In California, it provides high-resolution images at 1 m since 2009 and at 0.6 m since 2016, with a 2-year acquisition cycle.
For building footprint detection algorithm development, we focused on Shasta, Napa, and San Luis Obispo (Figure 1a), where complete building footprint data are available for the entire county. We obtained the footprint shapefiles, derived from various sources, from the corresponding county websites [42][43][44]. These footprints matched well with the ground truth in both location and geometry, based on visual inspection against ortho-imagery. We further converted these reference shapefiles into binary rasters at 1 m resolution to match the pixel size of NAIP imagery dating back to 2009. For comparison purposes, we also obtained the 2018 Microsoft building footprint data detected from centimeter-resolution Bing images, which have a precision of 99% and recall of 92% across the U.S. based on 15,000 tested buildings [29].

Deep Learning Model Architecture
As an advanced technique in computer vision, deep learning models have recently been applied to remote sensing imagery and achieved state-of-the-art results in both pixel- and object-level classification tasks [45][46][47][48][49]. Our model framework consists of two components (Figure 2). The Mobile-UNet was first used to generate building segment candidates from NAIP images, i.e., detecting candidate building pixels [50]. The fully convolutional network (FCN) can efficiently label pixels from high-resolution images [51]. The UNet model, for example, has been used to map objects such as tree crowns, roads, and buildings from commercial satellite images or aerial photos [52][53][54][55]. The UNet network structure uses convolutional layers to perform the semantic segmentation via spatial feature extraction by an encoder followed by segmentation construction by a decoder [52]. A UNet-based architecture was found to perform better for buildings in WUI regions than other network structures such as FCN or DeepLabv3 [45]. We used the Mobile-UNet model due to its improved accuracy and efficiency [50]. It replaces the UNet encoder with MobileNetV2, a simple but efficient network, for robust feature extraction [56]. The adoption of depth-wise separable convolution reduces both the size and the computational cost of the network [56]. Moreover, its implementation requires fewer parameters and thus potentially reduces over-fitting [50]. Features extracted from MobileNetV2 were further deconvolved to generate segmentation masks [50].
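To make the size reduction from depth-wise separable convolution concrete, the sketch below counts the weights of a standard 3 × 3 convolution and of its separable counterpart (a 3 × 3 depth-wise convolution followed by a 1 × 1 point-wise convolution). The channel sizes (64 in, 128 out) are hypothetical and are not taken from the model used in this study; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution plus a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
ratio = std / sep
print(std, sep, round(ratio, 2))
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is the kind of saving that motivates a MobileNetV2 encoder.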

Figure 1.
Foci of the study areas (a). NAIP image subsets from three counties, (b) Shasta, (c) Napa, and (d) San Luis Obispo were used to train and test the deep learning model. The model was applied to all NAIP imagery in Lake, Sonoma, and Orange Counties for building detection and WUI mapping every 2 years.

To refine the building segments from the Mobile-UNet, the conditional generative adversarial network (cGAN) was applied to combine the candidate map with the original input images for final prediction labels [57]. This second step is necessary due to the potential challenges of using coarser resolution imagery for building segmentation in the diverse WUI landscapes, e.g., missing pixels or partially occluded objects, and false alarms.
Originally proposed as the generative model for unsupervised learning, the GAN model includes a pair of two competing networks, namely, the generator and the discriminator [57,58]. The objective of the generative network is to generate fake samples while the discriminative network aims to evaluate outputs from the generator and distinguish these generated samples from the true data distribution [58]. cGAN extends the basic GAN model to condition on external information and thus can be used for image-to-image translation [59,60]. We here adopted the model structure proposed by Isola et al. (2018), which uses a U-Net-based generator and a convolutional PatchGAN discriminator, for image translation (Figure 2). In cGAN, the generator not only aims to synthesize realistic-looking images to fool the discriminator, but also uses auxiliary information to generate images matching the labels [61]. The PatchGAN discriminator runs convolutionally across the image, focuses on each N×N patch of the image, and determines if it is real or fake [60,62]. It only penalizes the structure at the scale of image patches and then averages out all responses to make the final decision [63].

Data Preparation
For the model development, we used the 2016 NAIP imagery for Napa and San Luis Obispo, and 2018 imagery for Shasta County to match the report years of the corresponding building footprint reference data. NAIP images were resampled from 0.6 m to 1 m, in order to apply the pre-trained model to 1 m NAIP images prior to 2016. Both the NAIP images and the reference building data were partitioned into blocks of 512 m by 512 m. We compiled a total of 2573 NAIP image subsets to cover different types of human settlement patterns in WUI, rural, and urban areas (Figure 1). These subsets represented over 10,000 buildings across these three counties. We further randomly sampled 1200 image blocks for model training and 670 blocks for the general model accuracy evaluation (Figure 1). To further examine the model performance across the four different residential patterns, the remaining 703 blocks were reserved as an independent evaluation dataset, including 128 interface WUI subsets (6573 buildings), 179 intermix WUI subsets (2430 buildings), 68 urban subsets (4878 buildings), and 327 rural subsets (1121 buildings).
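The partitioning and sampling described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the tiling assumes a 1 m raster so that a 512 m block is 512 pixels wide, and the purely random three-way split is a simplification (the study reserved the 703 evaluation blocks to cover the four settlement types).

```python
import random


def tile_origins(height, width, block=512):
    """Top-left (row, col) of every full block x block tile in a raster."""
    return [(r, c)
            for r in range(0, height - block + 1, block)
            for c in range(0, width - block + 1, block)]


def split_blocks(block_ids, n_train=1200, n_test=670, seed=0):
    """Shuffle block ids and split them into training, testing, and evaluation sets."""
    ids = list(block_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    evaluation = ids[n_train + n_test:]
    return train, test, evaluation


# A hypothetical 1024 x 1536 pixel raster yields 2 x 3 full tiles.
tiles = tile_origins(1024, 1536)

# The 2573 image subsets split into 1200 / 670 / 703 blocks.
train, test, evaluation = split_blocks(range(2573))
print(len(tiles), len(train), len(test), len(evaluation))
```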
To further evaluate the model's ability in capturing both spatial and temporal dynamics of the building footprint and WUIs, we used all NAIP images from 2010 to 2018 in Lake County, and in 2010 and 2018 for Sonoma County and Orange County.

Model Configurations
The structure of our framework integrating two models is shown in Figure 2. The model takes image blocks at the size of 512 by 512, corresponding to 512 m by 512 m on the ground. We first applied two image preprocessing steps, histogram normalization through adaptive equalization and wavelet-based image denoising [64]. During the preliminary experiment, we also compared different input channels, including all four input bands, natural color composite (red, green, and blue), color infrared (near infrared, red, and green), and top three principal components. Simple RGB input was found to provide the best results and thus was used for this study.
The Mobile-UNet component consists of a contraction path and an expansion path. The contraction path applies an encoder with five inverted residual blocks to the input NAIP image to extract features. Each block includes a 1 × 1 convolution with batch normalization, the rectified linear unit (ReLU) activation function, and a stride of 1; a 3 × 3 depth-wise convolution with batch normalization, the ReLU function, and a stride of 2; and one more 1 × 1 convolution with batch normalization but without a non-linear function. The expansion path uses the decoder to create segmentation maps of candidate building footprints. Each upsampling layer in the decoder is fused with the feature map of the same scale from its symmetric downsampling layer.
Both the raw NAIP image and the candidate maps of building footprints were then fed into the cGAN component of the model. The generator of the cGAN follows a basic U-Net structure. The downsampler of the generator has seven 4 × 4 convolutions with the batch normalization, the LeakyReLU activation function, and a stride of 2. The upsampler uses the 4 × 4 deconvolution with the 50% dropout rate, the batch normalization, the ReLU function, and a stride of 2. The loss of the generator is calculated as the combination of the sigmoid cross entropy loss and mean absolute error between the generated image and the real image [60]. The PatchGAN discriminator applies blocks of 4 × 4 convolutions with batch normalization and LeakyReLU activation to generate 30 × 30 patches. It uses the Adam optimizer to minimize the sum of the sigmoid cross entropy losses of the real and the generated images.
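The two loss terms described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, inputs are flattened lists of values, and the weight on the mean absolute error term (`l1_weight=100.0`) is an assumed value, since the weighting is not given here.

```python
import math


def sigmoid_cross_entropy(logits, targets):
    """Mean binary cross entropy computed from raw discriminator logits."""
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def mean_absolute_error(generated, real):
    """Mean absolute pixel difference between generated and real images."""
    return sum(abs(g - r) for g, r in zip(generated, real)) / len(real)


def generator_loss(disc_logits_fake, generated, real, l1_weight=100.0):
    """Adversarial term (fool the discriminator) plus weighted MAE term.

    l1_weight is an assumed value, not one reported in this study.
    """
    ones = [1.0] * len(disc_logits_fake)
    adversarial = sigmoid_cross_entropy(disc_logits_fake, ones)
    return adversarial + l1_weight * mean_absolute_error(generated, real)


def discriminator_loss(disc_logits_real, disc_logits_fake):
    """Sum of the cross entropy losses on real and generated patches."""
    real_loss = sigmoid_cross_entropy(disc_logits_real, [1.0] * len(disc_logits_real))
    fake_loss = sigmoid_cross_entropy(disc_logits_fake, [0.0] * len(disc_logits_fake))
    return real_loss + fake_loss


# Toy example: an undecided discriminator (logit 0) and a half-wrong pixel.
g_loss = generator_loss([0.0], [0.5], [1.0])
d_loss = discriminator_loss([0.0], [0.0])
print(round(g_loss, 4), round(d_loss, 4))
```

With a 30 × 30 PatchGAN output, each list of discriminator logits would hold 900 values, one per patch, and the mean in `sigmoid_cross_entropy` corresponds to averaging the patch responses.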
The resulting images of building segmentations were further converted into shapefile format for geospatial analysis and applications. Finally, we applied a post-processing algorithm to smooth the output building segmentations, remove noise pixels, and regularize the shape and the geometry [65][66][67]. Specifically, we removed polygons smaller than 4 m² and straightened any narrow side of a building outline shorter than 4 m.
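The small-polygon cleaning step can be approximated on the raster before vectorization, as in the sketch below. This swaps in a simpler technique than the polygon-based post-processing described above: it assumes a 1 m binary mask (so 4 m² corresponds to 4 pixels) and 4-connectivity, both of which are assumptions made for illustration, and it does not attempt the outline straightening.

```python
from collections import deque


def remove_small_segments(mask, min_pixels=4):
    """Zero out connected groups of building pixels smaller than min_pixels.

    mask is a list of lists holding 0 (background) or 1 (building).
    4-connectivity is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search to collect one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_pixels:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_segments(mask)
for row in cleaned:
    print(row)
```

In this toy mask the 2 × 2 segment (4 pixels) is kept, while the 2-pixel and 1-pixel segments are removed.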

Model Evaluation
We evaluated the model performance at the building segment level with the testing and evaluation datasets, respectively. Evaluations were performed at the county level and across four types of residential landscape, i.e., urban, interface WUI, intermix WUI, and rural areas. Besides overall pixel-level accuracy, three metrics, namely precision, recall, and F1 score (also known as the Dice score), were calculated to assess the segmentation results at the object level, according to the number of objects correctly or falsely predicted by the model, as shown by Equations (1)-(3) [68][69][70]. Precision represents, out of all the detected building footprints, what percentage is truly positive, while recall quantifies, out of all the reference building footprints, what percentage is detected. The F1 metric provides a balanced evaluation of model performance by taking the harmonic mean of precision and recall, thereby accounting for both false positives and false negatives.
In addition, intersection over union (IoU), also known as the Jaccard index, was calculated to assess the overlap between the predicted segmentations and the reference footprints (Equation (4)). Using our pre-trained model, we also detected the building footprints for another three counties, i.e., Lake, Sonoma, and Orange Counties, from NAIP imagery. The 2018 wall-to-wall mapping results were compared with the 2018 Microsoft building footprints generated from very high-resolution Bing images. We randomly sampled 600 sites of 512 m by 512 m from the WUIs of these three counties and calculated the evaluation metrics. We also evaluated the spatial consistency between our whole-county maps and the Microsoft data through visualizations of both subset regions and aggregated 300 m building count maps over time.
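The four metrics can be illustrated with a short sketch that assumes their standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The function names and the toy counts are hypothetical, and the rule for deciding when a detected object counts as correct is not specified here, so the counts are simply taken as given.

```python
def precision_recall_f1(tp, fp, fn):
    """Object-level precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return precision, recall, f1


def iou(pred_pixels, ref_pixels):
    """Intersection over union of two sets of (row, col) building pixels."""
    union = pred_pixels | ref_pixels
    return len(pred_pixels & ref_pixels) / len(union) if union else 0.0


# Toy counts: 80 buildings matched, 20 false detections, 40 missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)

# Toy footprints that share 2 of 4 distinct pixels.
pred = {(0, 0), (0, 1), (1, 0)}
ref = {(0, 1), (1, 0), (1, 1)}
overlap = iou(pred, ref)
print(round(precision, 3), round(recall, 3), round(f1, 3), overlap)
```

Because F1 is the harmonic mean of precision and recall, it equals 2TP / (2TP + FP + FN), so a model cannot score well by trading many false positives for a high recall or vice versa.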

WUI Mapping
Based on the building footprints detected by our model in Lake, Sonoma, and Orange Counties, we further mapped the wildland-urban interface. WUI is the area containing at least one housing unit per 0.16 km² (40 acres), following the Federal Register definition [1]. Based on the vegetation information, it can be further split into intermix WUI, where vegetation coverage is higher than 50%, and interface WUI, where vegetation coverage is lower than 50% but the land is within 2.4 km of a continuously heavily vegetated area that includes 75% wildland vegetation and is larger than 5 km² [1,3]. Additionally, if the intermix WUI is within a heavily vegetated area, it is further defined as a heavily vegetated intermix WUI [1].
We built a 30 m binary mask for vegetation and another mask for continuously heavily vegetated areas using National Land Cover Data (NLCD) available in 2011, 2013, 2016, and 2019 [71]. Forests, shrublands, herbaceous plants, and woody wetlands from the NLCD layers are masked as vegetation. We then applied a moving window approach to quantify the housing density, the vegetation cover, and the distance to remote areas. A 400 m by 400 m moving window (16 ha, 40 acres) was chosen to calculate housing density. For each 30 m pixel, if a housing unit exists in the 16 ha moving window, the vegetation percentage is then examined within the neighborhood of the pixel. If the fractional vegetation cover is higher than 50%, the pixel is labeled as intermix WUI. However, if the vegetation cover is lower than 50% but the closest continuously heavily vegetated zone is within 2.4 km of the pixel, the pixel is labeled as interface WUI. For an intermix WUI pixel, if the vegetation coverage is higher than 75% within the 2.25 km by 2.25 km moving window (5 km²), the pixel is further classified as highly vegetated intermix WUI.
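The per-pixel decision rules above can be summarized in a small function. This is a sketch only: the function and argument names are hypothetical, the inputs are assumed to have been computed beforehand with the moving windows, and the handling of a vegetation cover of exactly 50% (treated here as a candidate for interface WUI) is an assumption, since the rules only cover values above and below 50%.

```python
def classify_wui(has_housing, veg_fraction, dist_to_heavy_veg_km,
                 veg_fraction_large_window):
    """Assign a WUI class to one 30 m pixel.

    has_housing: at least one housing unit in the 400 m x 400 m window.
    veg_fraction: vegetation cover (0-1) in the pixel neighborhood.
    dist_to_heavy_veg_km: distance to the closest continuously heavily
        vegetated zone.
    veg_fraction_large_window: vegetation cover (0-1) in the
        2.25 km x 2.25 km window.
    """
    if not has_housing:
        return "non-WUI"
    if veg_fraction > 0.5:
        if veg_fraction_large_window > 0.75:
            return "highly vegetated intermix WUI"
        return "intermix WUI"
    # Assumption: exactly 50% cover falls through to the interface test.
    if dist_to_heavy_veg_km <= 2.4:
        return "interface WUI"
    return "non-WUI"


examples = [
    classify_wui(True, 0.6, 5.0, 0.5),
    classify_wui(True, 0.8, 0.0, 0.9),
    classify_wui(True, 0.2, 1.0, 0.3),
    classify_wui(True, 0.2, 3.0, 0.3),
    classify_wui(False, 0.9, 0.0, 0.9),
]
print(examples)
```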
To evaluate the model applicability in county-wide WUI mapping, we generated WUI maps for Lake, Sonoma, and Orange Counties in 2010, based on the building footprints detected in this study. Results of the 2010 WUI mapping were compared with the existing widely used WUI maps developed by the SILVIS Lab [7]. The SILVIS WUI map relied on the housing density data from TIGER at the census block scale, available in 1990, 2000, and 2010 [3]. We also derived another set of WUI maps using the Microsoft building footprints, available only in 2018, for comparison. To examine whether our approach can capture the temporal dynamics of WUI areas, we further performed a wall-to-wall mapping of the WUI region in the whole of Lake County for the years 2010, 2012, 2014, 2016, and 2018.

Model Performance
The integrative deep learning model was built with the 1200 NAIP image blocks, using the labeled reference data in three counties. Overall, it performed well in detecting building footprints over California's diverse landscape with various housing densities ... (Figure 3h,i). Our approach even identified buildings omitted by the reference data in the rural areas of Calistoga (Figure 3l). Occasionally several adjacent buildings were identified as one large building segment, for example in dense urban areas, and some detected building segments were smaller than the reference, for example because of pixel-level omission at the building edges (Figure 3d,f).
A comparison with the full testing dataset from 670 blocks showed a high overall pixel-level accuracy of 97% for building segmentation (Table 1). An F1 score of 0.53 and IoU of 0.52 suggest a reasonable performance on individual building detection. Similar results were found when evaluated with the additional evaluation dataset, with an F1 score of 0.64 and IoU of 0.5.
The accuracy of our approach varied slightly with housing density (Table 2). Relatively more precise and robust results were found over less populated regions such as intermix WUI or rural areas, as shown by F1 scores of 0.67 for the intermix WUI and 0.75 for rural regions vs. 0.62 for interface and urban areas, and higher percentages of predicted building objects being correctly mapped in dense residential areas. However, relatively lower recall values for these two sparse regions, especially for the intermix WUI, indicated potential omission of some buildings when they are highly intermixed with vegetation. In interface WUI, the model captured the individual buildings slightly better than in the intermix WUI and rural areas. We also found greater overlap between the NAIP-based building segmentation and the reference building footprints in the interface WUI and urban areas (IoU of 0.53) vs. 0.47 for intermix and 0.43 for rural areas. Similarly, better results were found in counties with more dispersed building patterns, i.e., San Luis Obispo and Shasta Counties, than counties with denser communities such as Napa County, as shown by much higher F1 scores and IoU across the four different types of human settlements.
The model also captured the total building count and footprint coverage well along the gradient of different housing densities across multiple counties (Table 2). The result showed a very good agreement on the percentage of identified building footprint areas, with an error of around 1% in dense settlements compared with the ground truth information. It detected 80% of the building count in the interface areas, accounting for 5.1% of the total land area, similar to 6.15% from the reference data. In the intermix region, the model slightly overestimated the total building count but mapped a similar total percentage of the building area (1.39% vs. 1.47%). The detected building footprint areas in regions with very dense or sparse housing also agreed well with the reference data, accounting for 7.26% vs. 8.83% and 0.39% vs. 0.40% of the land areas, respectively.
The integrative model developed here significantly improved the accuracy of building footprint detection, compared with the Mobile-UNet only and cGAN only models (Table 1). It had a more balanced performance, as shown by the higher F1 score of 0.53 in the testing dataset and 0.64 in the evaluation dataset, compared with 0.41 and 0.48 by Mobile-UNet, as well as 0.31 and 0.35 by cGAN only. Although the Mobile-UNet model itself identified a higher percentage of reference building objects (recall), it was more prone to false detections, as indicated by its lower precision (Table 1 and Figure 4). For example, some discrete and noisy pixels falsely detected by Mobile-UNet over intermix WUI or rural areas, probably due to confusion with bare ground, were removed by applying the cGAN to the Mobile-UNet results and NAIP images in a second step (Figure 4b,c). Incorporation of cGAN also improved the separation of adjacent buildings and filled the missing pixels for relatively large buildings in the Mobile-UNet outputs (Figure 4a,b). For example, predictions of Mobile-UNet clumped adjacent buildings together as one large and long object in communities by Moonstone Beach in Cambria and the buildings close to Nacimiento Lake in Paso Robles (Figure 4a,b). The synthesized model, however, successfully solved this problem by learning the separation of those mixed pixels with building boundaries and residential spacings in input images.

Building Footprint Mapping and Patterns
We applied the trained model to the time series of NAIP imagery and mapped individual building footprints for Lake, Sonoma, and Orange Counties every 2 years from 2010 to 2018. For county-wide visualization purposes, we aggregated the building footprints into building counts at 300 m resolution (Figures 5 and 6). The building footprint mapping based on our approach captured the human settlement patterns from more remote to suburban counties well (Figure 5). For example, in Lake County, areas such as Clearlake city and Lakeport city were well identified as dense residential clusters and the expansion of houses into the WUI region was also delineated (Figure 5a). Orange County was mapped as highly urbanized, with a few smaller communities such as Silverado scattered in the rural region (Figure 5e). In Sonoma County, on the other hand, most buildings were clustered around Santa Rosa and human settlements spread towards the wildland areas (Figure 5c). Overall, the building patterns from our approach matched the 2018 Microsoft data, as shown by the building density aggregated at 300 m (Figure 5b,d,f). However, our detection may miss buildings in very dense regions. Across the random samples from each county, our detected building footprints showed good consistency with the Microsoft data, with F1 scores over 0.6 for all three counties. For Lake and Sonoma Counties with sparse housing arrangements, our predictions have high precision scores of 0.79 and 0.83, but relatively low recalls of 0.62 and 0.47 relative to the Microsoft buildings. Conversely, in Orange County with dense interface WUIs or cities, the predictions have a lower precision of 0.54 but a recall of 0.79, possibly constrained by the limited number of urban training samples used in the model.
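The 300 m aggregation used for the county-wide building count maps can be sketched as a simple binning of building locations into grid cells. The sketch assumes that each footprint is represented by its centroid in projected coordinates (meters) and that a building is assigned to the single cell containing that centroid; the function name, the grid origin, and the toy coordinates are hypothetical.

```python
from collections import Counter


def building_counts(centroids, cell_size=300.0, origin=(0.0, 0.0)):
    """Count buildings per grid cell from (x, y) centroids in meters.

    Returns a Counter keyed by (row, col) cell index relative to origin.
    Assigning each building to one cell by its centroid is an assumption.
    """
    x0, y0 = origin
    counts = Counter()
    for x, y in centroids:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        counts[(row, col)] += 1
    return counts


# Four hypothetical building centroids (x, y) in meters.
centroids = [(10.0, 20.0), (299.0, 299.0), (300.0, 0.0), (650.0, 310.0)]
counts = building_counts(centroids)
print(dict(counts))
```

Differencing such count grids between two mapping years gives a compact view of where building numbers increased or decreased.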
The time series of building footprints derived from NAIP imagery captured the dynamics of building expansion, for example, in Lake County (Figure 6). The total number of houses increased from 34,566 in 2010 to 45,695 in 2014 (Figure 6a,b). Transitions from rural to human settlements, such as infills, community expansion, and new community development, were well captured by the model. The intermix WUI region showed an increase in both building density, e.g., around the town of Clear Lake, and area expansion, e.g., new residential communities in the southern and northeastern parts of the county. Example subsets are shown at the individual building level (Figure 7). Our approach detected recreational houses built between 2010 and 2012 around Lake Pillsbury, as well as the changes in structures such as the dam across the Eel River and piers along the shoreline of the lake (Figure 7a). New houses were also detected across the whole Spring Valley community and along Spring Valley Road and Long Valley Road in the southwest of the community, resulting in an increase in housing density of approximately 25% from 2012 to 2014 in the local community (Figure 7b).
Our approach also identified building reduction caused by wildfire events. We found that the number of mapped buildings decreased by around 20% from 2014 to 2018. A closer examination of Lake County's fire history showed that around 2000 km², approximately 57% of the total area in Lake County, burned during 2015-2018 (Figure 6d), especially over the eastern and southern parts of the county, covering southern Mendocino National Forest and Cache Creek Wilderness. We found that a total of 6768 buildings shown in the 2014 building footprint map (Figure 6c) were within the 2015-2018 fire perimeters and 2459 buildings were destroyed. This result was consistent with the DINS building survey, which recorded a total of 2982 buildings damaged by 2015-2018 fire events in Lake County.
Although not designed for mapping building damage, the approach from this study captured the building loss from wildfires well (Figure 7c,d). For example, over the area burned by the 2015 Valley fire, we found that 2078 buildings, out of 3574 buildings in the pre-fire building footprint map, disappeared in our post-fire building footprint map (Figure 7c), while 80 out of 165 buildings were destroyed over the 892.8 ha burned by …
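The pre-fire versus post-fire comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that footprints and the fire perimeter are simplified to axis-aligned bounding boxes and that a building counts as lost when no post-fire footprint overlaps its pre-fire footprint. The names `overlaps`, `count_lost`, and `loss_rate` are hypothetical, and the matching rule is an assumption rather than the procedure used in this study.

```python
from typing import List, Tuple

# (xmin, ymin, xmax, ymax) in map units; a simplification of real footprint polygons
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True when two axis-aligned boxes share a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def count_lost(pre: List[Box], post: List[Box], perimeter: Box) -> Tuple[int, int]:
    """Count pre-fire buildings inside the perimeter and those with no post-fire match."""
    exposed = [b for b in pre if overlaps(b, perimeter)]
    lost = [b for b in exposed if not any(overlaps(b, p) for p in post)]
    return len(exposed), len(lost)


def loss_rate(lost: int, exposed: int) -> float:
    """Fraction of exposed buildings that disappeared; 0.0 when none were exposed."""
    return lost / exposed if exposed else 0.0


if __name__ == "__main__":
    # Counts reported in the text for the 2015 Valley fire and for all 2015-2018 perimeters
    print(round(loss_rate(2078, 3574), 3))  # Valley fire
    print(round(loss_rate(2459, 6768), 3))  # all 2015-2018 fire perimeters
```

With real data, the boxes would be replaced by footprint polygons and the overlap test by a polygon intersection, but the counting logic would stay the same.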

WUI Mapping-Spatial Patterns and Temporal Dynamics
We generated WUI maps every two years from 2010 onward in three counties, i.e., Lake, Sonoma, and Orange, using the building footprints derived from NAIP 1 m imagery with our approach and the NLCD vegetation map [72]. Overall, our WUI maps showed spatial patterns similar to those of the existing 2010 census tract-based SILVIS WUI maps within each county and across counties (Figure 8). For example, in Lake County, both approaches identified major clusters of interface WUI around the boundaries of major cities, such as Lakeport, Kelseyville, and Clearlake, around Clear Lake, transitioning into intermix WUI and highly vegetated regions. In the more urbanized counties (Figure 8d-i), such as Orange and Sonoma, our approach successfully mapped WUI areas with a low housing density, especially in large census tracts with small housing clusters scattered within vegetated wildlands, and thereby better captured the spatial extent of the WUI clusters. Orange County has the largest interface WUI area, followed by Sonoma and Lake Counties. In contrast, Sonoma and Lake Counties have much larger intermix WUI areas, similar to what was shown by the SILVIS maps.
However, our 30 m WUI maps identified larger WUI areas and showed more granularity and a smoother transition from urban to WUI areas than the SILVIS maps. Overall, the results from this study were similar to patterns derived from the Microsoft building footprints (Figure 8c,f,i). In Lake County, our approach mapped a total WUI area of 468 km², dominated by intermix WUI (375 km² vs. 94 km² of interface WUI), compared with 411 km² from the SILVIS WUI map (334 km² of intermix WUI vs. 78 km² of interface WUI). Our results identified total WUI areas of 1635 km² in Sonoma County and 660 km² in Orange County, which were 28% higher than and almost double the SILVIS estimates, respectively.
Among the WUI areas, both our maps and the SILVIS maps showed that intermix WUI was dominant in Sonoma County, accounting for 74% and 77% of the WUI areas, respectively, while Orange County was dominated by interface WUI, which accounted for 80% in our map and 82% in the SILVIS WUI map.
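To make the interface/intermix distinction concrete, the sketch below classifies a single map cell and computes the share of each WUI type from mapped areas. The thresholds follow the widely used U.S. Federal Register / SILVIS definition (at least 6.17 housing units per km², 50% wildland vegetation for intermix, and proximity to a large, heavily vegetated area for interface); they are stated here as assumptions and are not quoted from this section. The function names are hypothetical.

```python
from typing import Tuple

# Assumed thresholds (Federal Register / SILVIS-style definition), not values from this section
MIN_HOUSING_DENSITY = 6.17   # housing units per km^2, roughly one unit per 40 acres
INTERMIX_VEG_FRACTION = 0.50  # minimum wildland vegetation cover for intermix WUI


def classify_cell(housing_density: float, veg_fraction: float,
                  near_large_wildland: bool) -> str:
    """Label one cell as 'intermix', 'interface', or 'non-WUI'.

    near_large_wildland is assumed to be precomputed, e.g. True when the cell
    lies within 2.4 km of a large area with dense wildland vegetation.
    """
    if housing_density < MIN_HOUSING_DENSITY:
        return "non-WUI"
    if veg_fraction >= INTERMIX_VEG_FRACTION:
        return "intermix"
    return "interface" if near_large_wildland else "non-WUI"


def wui_shares(intermix_km2: float, interface_km2: float) -> Tuple[float, float]:
    """Return the fractions of total WUI area that are intermix and interface."""
    total = intermix_km2 + interface_km2
    if total == 0:
        return 0.0, 0.0
    return intermix_km2 / total, interface_km2 / total


if __name__ == "__main__":
    # Lake County areas reported in the text for this study's map
    intermix_share, interface_share = wui_shares(375.0, 94.0)
    print(round(intermix_share, 2), round(interface_share, 2))
```

Applied to the Lake County areas reported above, the intermix share comes out at roughly four fifths of the mapped WUI area.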
Using bi-annual building density and vegetation maps, the approach developed in this study captured the temporal dynamics of WUI areas and types well. For example, the time series of derived WUI maps in Lake County showed the changes in WUI regions every 2 years from 2010 to 2018 (Figure 9), associated with urban sprawl and wildfire disasters. We found that the combined area of interface and intermix WUI fluctuated from year to year. In the first half of the 2010s, WUI areas expanded steadily, reaching 210 km² in 2012 and 215.6 km² in 2014. Most of the expansion was found in regions transitioning from wildlands to intermix WUI, with additional housing development in some tracts of highly vegetated intermix regions further away from populated towns (Figure 9f,g). After the 2015 extreme fire events, the total WUI area decreased to 199.4 km² in 2016, but then increased to 215.5 km² in 2018 following community rebuilding [73][74][75]. Our approach also detected that a continuous, highly vegetated region in the southwest of the county had evolved into intermix WUI by 2018.
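The period-to-period changes implied by the WUI areas quoted above can be tabulated with a few lines of Python. This is only a worked illustration of the arithmetic; the dictionary holds the four values given in the text for Lake County, and the helper name `period_changes` is hypothetical.

```python
from typing import Dict, List, Tuple

# Lake County WUI area (km^2) for the years quoted in the text
wui_area_km2: Dict[int, float] = {2012: 210.0, 2014: 215.6, 2016: 199.4, 2018: 215.5}


def period_changes(series: Dict[int, float]) -> List[Tuple[int, int, float, float]]:
    """Return (start year, end year, change in km^2, change in %) for consecutive years."""
    years = sorted(series)
    changes = []
    for prev, curr in zip(years, years[1:]):
        delta = series[curr] - series[prev]
        percent = 100.0 * delta / series[prev]
        changes.append((prev, curr, round(delta, 1), round(percent, 1)))
    return changes


if __name__ == "__main__":
    for start, end, delta, percent in period_changes(wui_area_km2):
        print(f"{start}-{end}: {delta:+.1f} km^2 ({percent:+.1f}%)")
```

The drop between 2014 and 2016 and the rebound by 2018 are of similar magnitude, which matches the description of fire loss followed by rebuilding.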
Figure 8. WUI maps, including those based on the Microsoft building footprints (c,f,i). Results are shown over Lake County (top), Sonoma County (middle), and Orange County (bottom panel).

Discussion
Our study demonstrated the feasibility of an efficient approach for building surveys from high-resolution images and improved the temporal and spatial accuracy of WUI mapping. Further improvements are needed for operational and broader applications. First, the building detection in this study was limited to 1 m NAIP imagery in order to take advantage of the historical archives for bi-annual mapping. The model could be improved by using the 0.6 m NAIP images available in California after 2016 to better resolve mixed pixels at building edges. Additionally, some uncertainty in our building detection may be caused by inconsistencies in NAIP image acquisition, such as varying viewing angles, sunlight conditions, and imaging dates across images. Although denoising and equalization during preprocessing can help harmonize differences in ground reflectance, calibrating the input images across space and time could further improve the accuracy and generalization of the model. Whenever possible, other well-calibrated high-resolution imagery can also be used as an additional source for improved local-scale mapping. Second, more accurate ground truth building footprint data are also needed, especially in intermix and rural areas.
Moreover, the building detection model in this study was trained mostly on images within the WUI regions because of the WUI focus of our study. Although the model can successfully capture the spatial extent of housing development when applied to a large region, urban or dense residential areas might be underrepresented. Lastly, we used Mobile-UNet as the backbone of the model architecture because of its efficiency in applications. A previous study on WUI building detection showed that the UNet-based structure has promising performance; however, a more sophisticated feature extractor, such as a ResNet or VGG model, could improve detection accuracy through considerably increased network depth [76,77].
The improved performance of the combined network structure obtained by stacking UNet or GAN was consistent with previous studies on image harmonization and noise cleaning for products derived from medical or remote sensing images [78][79][80][81]. Through image-to-image translation, a GAN can serve as a post-processing step to reinforce spatial contiguity, remove artifacts or undesired objects, and boost and harmonize the quality of predictions from relatively low-resolution or compressed inputs [80,81]. Only a limited number of studies have focused on building detection in the WUI at 1 m resolution. Caggiano et al. detected building footprints within sparse WUIs using object-based approaches in four counties of Colorado from NAIP images in 2014 [82,83]. Their approach achieved an overall accuracy ranging from 50 to 95%, a precision of 0.66, and a recall of 0.51 [82]. The other study in the WUI achieved a high F1 score of around 0.8, but was based on 0.5 m fused commercial SuperView-1 satellite data, whose spatial resolution is twice as fine as that of the NAIP images [45].
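For a rough comparison with the F1 scores reported in this study, the precision and recall quoted above for the object-based approach can be combined into an F1 score. The calculation below assumes that the two values are overall figures for the same test set, which this section does not confirm.

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.66 \times 0.51}{0.66 + 0.51}
    = \frac{0.6732}{1.17}
    \approx 0.58
\]
```

Here $P$ denotes precision and $R$ denotes recall; $F_1$ is their harmonic mean, so it is pulled toward the lower of the two values.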
Most previous research on building detection has focused on urban regions, which have quite different landscapes and housing patterns from wildland–urban interfaces. Although trained for the WUI areas, our model performed reasonably well over the urban regions, with an F1 of 0.61 and an IoU of 0.53. Compared with urban building segmentation models, our model had very competitive recall scores but slightly lower precision scores, possibly because of the far fewer urban samples in our training data [37]. For instance, a Locally Constrained You-Only-Look-Once (YOLO) framework for object detection was developed for NAIP images, with F1 scores varying from 0.73 to 0.8 across testing cities in Minnesota [36]. In similar studies using semantic building segmentation methods on NAIP images within urban regions, deep learning models such as SegNet, CRFasRNN, or FCN were constructed for dense residential areas in the U.S. and achieved overall accuracies ranging from 0.62 to 0.71 and IoU values ranging from 0.45 to 0.58 [37,84]. As shown in these studies, models built on dense urban regions performed relatively worse, with many false positives, when applied to sparse landscapes such as deserts, mountainous areas, or agricultural lands, and required further modification and retraining [37,84].
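For readers comparing the precision, recall, F1, and IoU values quoted above, the following minimal sketch shows how these scores relate when computed pixel by pixel from a predicted and a reference binary mask. Whether the scores in this study and in the cited studies were computed per pixel or per building object is not stated in this section, so the pixel-wise convention and the function name `segmentation_scores` are assumptions.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = building pixel, 0 = background


def segmentation_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute pixel-wise precision, recall, F1, and IoU for two equally sized masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted and present
            elif p and not t:
                fp += 1          # building predicted but absent
            elif t and not p:
                fn += 1          # building present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(segmentation_scores(predicted, reference))
```

A model with high recall but lower precision, as described above for the urban regions, would show few missed building pixels (`fn`) but more spurious ones (`fp`).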
In terms of WUI mapping, our approach improved upon previous methods and was able to delineate the natural transition from dense urban regions to WUIs and rural human settlements. Currently, the most widely used SILVIS WUI dataset relies on housing densities from census tracts, which capture spatial heterogeneity only at relatively coarse scales [7]. Although several other recent studies have explored the use of building locations or individual building information for WUI mapping, their results were only available for a single year, whereas our bi-annual WUI maps provide more frequent updates from free NAIP imagery [31,85].
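The contrast between tract-level housing density and density derived from individual buildings can be sketched with a simple moving window over a gridded building count. The example below is a hedged illustration only: the 30 m cell size matches the WUI map resolution mentioned in the text, but the circular window, its radius, and the function name `window_density` are hypothetical choices, not parameters taken from this study.

```python
from typing import List


def window_density(counts: List[List[int]], cell_m: float,
                   radius_m: float) -> List[List[float]]:
    """Building density (buildings per km^2) in a circular window around each cell.

    counts[i][j] is the number of detected buildings in grid cell (i, j).
    Cells of the window that fall outside the grid are ignored.
    """
    rows, cols = len(counts), len(counts[0])
    r = int(radius_m // cell_m)  # window radius expressed in whole cells
    density = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, n_cells = 0, 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    if di * di + dj * dj > r * r:
                        continue  # outside the circular window
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += counts[ii][jj]
                        n_cells += 1
            area_km2 = n_cells * cell_m ** 2 / 1e6
            density[i][j] = total / area_km2
    return density


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
    for row in window_density(grid, cell_m=30.0, radius_m=30.0):
        print([round(v, 1) for v in row])
```

Because the density is recomputed around every cell, isolated housing clusters inside a large, mostly vegetated tract remain visible instead of being averaged away over the whole tract.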

Conclusions
We developed and evaluated a deep learning framework to detect individual building footprints over the transitional areas from urban to wildland. By taking advantage of the publicly available NAIP aerial imagery at meter scale, our framework offers an efficient way to produce high-resolution building footprint maps every other year. Our analysis in California showed that the combination of Mobile-UNet and a generative adversarial network had a more balanced detection performance. When examined at a large scale over three counties, the total detected building area agreed well with the reference building area. Bi-annual footprint maps of Lake, Sonoma, and Orange Counties showed the capability of the integrated approach to capture the spatial patterns and dynamics associated with urban expansion and wildfire damage. We further applied a moving window-based workflow for WUI mapping using the derived fine-scale building footprints. The resulting WUI maps showed finer granularity than those based on census tract-level housing density, and are expected to contribute to community development planning, wildfire risk assessment, and adaptive strategies for climate adaptation and disaster response.