Fusion of High Resolution Aerial Multispectral and LiDAR Data : Land Cover in the Context of Urban Mosquito Habitat

Remotely sensed multi-spectral and -spatial data facilitates the study of mosquito-borne disease vectors and their response to land use and cover composition in the urban environment. In this study we assess the feasibility of integrating remotely sensed multispectral reflectance data and LiDAR (Light Detection and Ranging)-derived height information to improve land use and land cover classification. Classification and Regression Tree (CART) analyses were used to compare and contrast the enhancements and accuracy of the multi-sensor urban land cover classifications. Eight urban land-cover classes were developed for the city of Tucson, Arizona, USA. These land cover classes focus on pervious and impervious surfaces and microclimate landscape attributes that impact mosquito habitat such as water ponds, residential structures, irrigated lawns, shrubs and trees, shade, and humidity. Results show that synergistic use of LiDAR, multispectral and the Normalized Difference Vegetation Index data produced the most accurate urban land cover classification with a Kappa value of 0.88. Fusion of multi-sensor data leads to a better land cover product that is suitable for a variety of urban applications such as exploring the relationship between neighborhood composition and adult mosquito abundance data to inform public health issues. OPEN ACCESS Remote Sensing 2011, 3 2365


Introduction
Mosquitoes are a primary vector of a variety of animal and human diseases including malaria, West Nile Virus (WNV), encephalitis, and dengue fever in humans, and heartworm in dogs [1].Various environmental and human controlled factors determine whether a mosquito can survive in an area.The majority of mosquitoes require temperatures above freezing, a source of water to procreate, and a food source, pollen or blood, to occupy a region.Land use and land cover (LULC) along with topography represent important environmental variables that determine suitable mosquito habitat and breeding grounds which can range from wetlands [2] to urban areas [3].
Through the use of remote sensing technologies, various mosquito habitat-related LULC types can be delineated.For nearly 40 years, epidemiologists have advocated the use of remote sensing image data to advance the study of factors influencing people's health [4].Remote sensing technology has been used to study the spatial relationship of malaria and risk factors [5], other diseases, and disease vectors [6].However, the ever expanding science of remote sensing has never been used to its full potential to aid the field of epidemiology [7].According to Herbreteau et al. [7], six mosquito vector studies with the primary purpose of LULC classifications have used remote sensing technologies, but none of these have utilized high nominal spatial resolution (~1 m) images for classification purposes.
This study produced a fine spatial resolution LULC map using the latest remote sensing technology.The LULC map will ultimately be used to determine areas suitable for mosquito breeding in the study area.This paper focuses on the fusion of high spatial resolution multispectral reflectance imagery and Light Detection and Ranging (LiDAR) data to produce a more accurate urban land cover product than traditional single sensor classifications.We also posed that the Normalized Difference Vegetation Index (NDVI) data derived from the multispectral imagery would aid in the extraction of vegetation from human-made materials, while elevation and attribute height data extracted from the LiDAR data would help to discriminate attributes such as buildings, roads, and the often dry streams and waterways.
More accurate LULC classification improves the ability to model mosquito habitat locations, the spread of mosquitoes, and the probability of mosquito presence within the study area.The best classification output is going to be used in the Dynamic Mosquito Simulation Model (DyMSiM) in order to simulate the growth and structure of mosquito populations [8].DyMSiM is driven by climate data and can be parameterized for different LULC cover classes and water sources.The combination of temperature, precipitation, and LULC data will produce a spatially explicit and highly accurate prediction of mosquito population.For example, at a fine nominal spatial scale, DyMSiM could delineate level roadways as unsuitable breeding ground for mosquito even if the climate data indicated the area was favorable habitat.Even though we cannot identify all suitable habitats such as potholes or buckets, the ability to accurately model the spatial component of mosquito populations at a one meter resolution will permit the DyMSiM model to be run at sufficient spatial detail to enable city managers to better minimize mosquito impacts on inhabitants.
The goal of this study is to produce the most accurate urban LULC classification at one meter nominal spatial resolution in the context of mosquito habitat.The ability to delineate trees from herbaceous cover helps in identifying the microclimate variables required for mosquitoes to survive and thrive in the harsh desert temperatures and dry environments.The classification of man-made ponds and areas where water can pond naturally within the urban environment are necessary in determining where mosquitoes lay their larva.Distinguishing between buildings and roadways helps in determining areas where human influence has created mosquito habitat (e.g., irrigated areas) in the mixed urban desert environment.

Study Area
The study area covers 76 km 2 in the urban center of Tucson, Arizona, USA at 32°14′N, 110°57′W (Figure 1).The study area was chosen to coincide with seasonal meteorological and mosquito trapping observation sites.The area has an annual average precipitation of approximately 300 mm [9] and is characterized by a bimodal precipitation pattern, with the majority of rainfall occurring during the summer monsoon season (July, August, September) and the winter season (December, January, February, March).Winter nighttime temperatures rarely dip below 0 °C, while daytime temperatures average about 20 °C.Summer maximum temperatures often exceed 37 °C [9].
A variety of vegetation types are evident within Tucson.Native desert trees, shrub, cacti and grasses are abundant and in some areas threatened by introduced plants (e.g., Buffelgrass (Pennisteum ciliare); tamarisk (Tamarix spp.))The study area includes a college campus, two golf courses, numerous recreational parks, residential neighborhoods, swimming pools, and schools.Over the past few years, mosquitoes in the Southwest have become an increasing nuisance and more importantly, a public health concern due to outbreaks of WNV [8].The variety of LULC classes encompassed by the study area will be used to determine how these land cover classes impact mosquito habitat and populations and will be reported on in a separate manuscript.

Research Design
The diagram below provides an overall view of the mosquito habitat mapping and modeling framework and includes the multi-sensor data processing and analysis steps needed to create base maps for a better understanding of urban mosquito habitats and abundance (Figure 2).The LiDAR data and derived products (Intensity, Digital Terrain Model, and Canopy Height Model) provide 3-D information about urban habitat attributes like vegetation and structures.The LiDAR data was used in combination with multispectral and NDVI data to derive LULC classes and attributes that can discriminate mosquito habitat during their life cycle.We applied CART to derive LULC classes that were then used to create base maps (e.g., pervious and impervious surfaces, and Vegetation-Impervious Surface-Soil (V-I-S) composition [10]).Using a LULC accuracy assessment, we evaluated which combination of the LiDAR and multispectral data layers provided the most accurate LULC results.The next phase (dashed lines in Figure 2) of this research, not included in this paper, will integrate the LULC maps with ancillary geographic information, field-based temperature and humidity data, and mosquito counts to create habitat suitability maps, and model urban mosquito abundance and response to the environmental conditions using DyMSiM [8].

Multispectral Images and LiDAR Data
The multispectral data used for the classifications were collected by the National Agriculture Imagery Program (NAIP) between 23 June and 25 June 2007.Acquisition of NAIP imagery is handled by the United States Department of Agriculture (USDA) for various public and private uses.The data is available as digital ortho quarter quad (DOQQ) tiles that cover 47 km 2 [11].Four DOQQs were mosaiced and clipped to cover the study area for use in the classification.The multispectral data have a nominal spatial resolution of one meter and were collected in four spectral channels (blue, green, red, and near-infrared portions of the electromagnetic spectrum).
NDVI data was used to delineate between vegetation, soils, water, and human-made materials that represent the urban landscape.The red and near-infrared (NIR) reflectance data were used to create a NDVI image.NDVI [12] was calculated for each pixel using the following equation: Airborne multi-path discrete first and last return LiDAR data were collected for the Pima Association of Governments (PAG) by the Sanborn Map Company.Discrete-return LiDAR collects both elevation and intensity data for each object the laser hits until the laser hits the ground [13].For example, a laser can hit multiple parts of a tree canopy before finally contacting the ground meaning that the sensor will collect multiple pieces of elevation data from one pulse at one location [14,15].LiDAR data were acquired through multiple flight paths during February 2008 [16].
The data provided by the LiDAR system can be used to produce 3-D images (Figure 3), along with bare earth surface models, and canopy height models (CHM) [17].Forty one-mile by one-mile tiles of LiDAR data covers the research area.The LiDAR data point sample density was one point per meter or better.Ground sampling distance for the LiDAR data was generally one meter with a horizontal accuracy of one meter and a vertical accuracy of 37 cm.
Bare earth surface data were processed and provided with the original LiDAR point cloud data.The bare earth surface was created using an algorithm designed to locate and designate the last or lowest pulse return at each Cartesian (x,y) coordinate pair as the bare earth surface point for that location.Lee and Shan [18] used elevation data derived from LiDAR in combination with other image data to classify a coastal study area; in their study inclusion of the bare earth surface data and height attributes helped to distinguish between spectrally similar water and marshland.Fusion of LiDAR bare earth data and multispectral image data has also been used to classify various vegetation types within rangelands [19].Bare earth data were converted with a spatial resolution of six feet (feet is the unit used in the original data).The original LiDAR data were collected at a spatial resolution of one meter, but were processed at six feet based on testing to minimize data gaps and ensure a smooth surface [15].The six foot data were subsequently resampled to one meter to match and conserve the nominal spatial resolution of the multispectral NAIP image data.The bare earth surface and original point cloud data were used in combination to produce a Canopy Height Model (CHM) that provides the gridded heights of objects above the ground.To produce the CHM, the bare ground value is subtracted from the highest pulse return for each (x,y)-location.This means that elevation does not matter because the ground is treated as a reference with a height of zero over the entire study area.Therefore, in the CHM image, trees or buildings with the same height at different elevations were represented with the same pixel value.Inclusion of the CHM as a variable in our classification helped distinguish between buildings and roadways which shared similar spectral signatures, but varied in elevation.Other studies have fused multispectral and LiDAR CHM data to increase the separation of roofs, roads, and buildings that are otherwise confused due to similar spectral signatures [18,20,21].This study expanded on these methods by classifying buildings in addition to seven other classes to inform the habitat model.The CHM for this study was also created with a spatial resolution of six feet and subsequently resampled to one meter to allow for integration with the one meter multispectral data.
LiDAR intensity data acquired along with the LiDAR height data were also explored as a variable for the LULC classification.The pulse return intensity was calculated by examining the amount of energy measured by the sensor upon its return.Intensity data was synthesized from the LiDAR data with a spatial resolution of six feet to avoid gaps in the data and was resampled to one meter for integration with the other data sets.Integration of LiDAR intensity data has generally been neglected as an additional input for classification purposes.A few studies have used intensity data with promising results for separating certain materials and distinguishing between LULC types [22,23].Charaniya et al. [22] performed a fusion of LiDAR intensity data, a digital elevation model (DEM), and black and white photography to delineate asphalt, grass, roofs, and trees.This research built upon that study by integrating the various components of LiDAR data along with multispectral image data to classify a heterogeneous urban environment.

Urban Land Cover Classes
Eight LULC classes were defined to produce the classification product pertinent to mosquito habitats.The eight classes were: herbaceous, trees/shrubs, bare ground, pools, water, structures, pavement, and shadow.Although the LiDAR data would eliminate the need for a shadow class, the shadow class is required because the multispectral bands observed shaded areas, especially in images not acquired when the sun was at its highest point.The NAIP images west of the highway were collected at non-optimal sun-angles, resulting in more shaded areas being observed in these areas.The herbaceous class was composed of both non-irrigated and irrigated vegetation, the latter mainly represented by fields of grass.The trees/shrubs class consisted of vegetation with an above-ground level height such as bushes, desert shrubs, palm trees, and other desert trees.Identifying trees was crucial in identifying mosquito habitats as trees provide the shade necessary for mosquitoes to breed in the extreme heat.Without the shade provided by trees and shrubs mosquito larvae would die.The bare ground class was defined as surfaces not covered by vegetation or human-made construction (i.e., parking lots) and was primarily comprised of washes and open lots.Chlorinated water sources in residential and park areas were defined as part of the pool class.The water class was non-chlorinated water sources such as ponds and lakes (natural or constructed) that could contain wildlife.Distinguishing pools that could contain chlorinated water from ponds and lakes that did not contain chlorinated water was used to determine both suitable and unsuitable areas for mosquito breeding.The middle of both water body types were deemed unsuitable for mosquito breeding as larva keep to the shallow waters.The structures class was defined as human-made structures that have a height above the ground including, but not limited to, houses, apartments, office buildings, parking garages, and covered parking.Most human-made structures were impermeable which could allow for water to pool.However, at the spatial scale of this study it was impossible to recognize pooling directly.Additionally, the majority of the tops of such structures reach an unsuitable temperature for mosquito procreation.The other human-made class, pavement, is composed of roadways, parking lots, and sidewalks.Pavement was considered unsuitable for mosquito habitat at the one meter scale due to traffic and climate.

Classification and Regression Tree (CART) Analyses
Research has demonstrated that the classification and regression tree (CART) technique and software is able to handle a larger amount of data in a smaller amount of time than a traditional maximum likelihood classifier (MLC) [24,25].This is ideal for our study as the high resolution datasets tend to be large and the CART software allows for the stacking of multiple datasets without impacting processing efficiency.Studies comparing classifiers have demonstrated that the CART classifier produces as accurate, if not more accurate, results compared to other common classification algorithms [24,26,27].
Studies have used different combinations of multispectral and LiDAR data to produce a LULC classification.However, one study used an object-oriented non-parametric classifier and the other used a parametric classifier [22,28].This study used the CART classifier employing a non-parametric algorithm that can handle ratio and nominal data as input.
The CART software [29] worked in tandem with an ERDAS Imagine extension that was developed by the Multi-Resolution Land Characteristics Consortium (MRLC; http://www.mrlc.gov/).Similar processing and classification procedures used for the 30 m Landsat-based National Land Cover Dataset (NLCD) [30] were also applied to the 1m multispectral and LiDAR data acquired for the study area.Sampling to obtain the CART training data representative for each of the eight land cover classes, including different roof and road colors, came from selected combinations of multispectral image data, NDVI data, and LiDAR data.Application of CART provides the classification and tree results, information on the data layers that were most often used in the classification, and information about data layers that were not used.The CART results were then used within the ERDAS extension to produce a classified raster product.The raster was reclassed to represent the eight classes essential to urban mosquito habitat study.Once the raster was reclassed, the accuracy assessment was performed to produce the error matrix for the raster product.
To determine whether the fusion of aerial multispectral imagery data with different LiDAR-derived data could improve the CART LULC classification an iterative test was performed by adding the aforementioned datasets layer by layer and performing a classification for each combination.All classifications were performed with the same training data and assessed for accuracy with the same validation points.

Classification Training Data
Sampling for the eight classes was done using the high resolution multispectral image data through expert knowledge.The image data was loaded into ArcGIS and a polygon vector file was created to extract sampling data.Polygons were drawn around groups of pixels representative of each of 14 original classes, which would be consolidated to eight classes for the LULC map, to produce a representative sample of attributes and spectral signatures for each of the classes.Some classes, like the structure class, were composed of separately sampled classes to account for the variability in spectral and height attributes among roof colors.The structure class was comprised of a sampling of white roofs, black roofs, grey roofs, and red roofs that would be combined after the classification into the singular class of structure.The pavement class was comprised of a sampling of grey, black, and red pavements, while the herbaceous class was comprised of irrigated and non-irrigated vegetation.Some classes were sampled more frequently, such as structures and the herbaceous classes, as they represent a larger portion of the study area.Once a sufficient number of polygons were created for each class, the polygon vectors were converted to a raster to enable the CART classification.

Classification Accuracy Assessment
A global positioning system (GPS) receiver was used to locate selected accuracy points throughout the entire study area.Accuracy points were not selected within the classification's training site pixels.25 accuracy points were chosen for each of the eight classes, except for the water class due to an inadequate number of water bodies in the study area for both sampling and assessment.Points were selected in the middle of objects to avoid confusion that could be created by mixed pixels possibly representing multiple classes.Some of the points were collected at locations where mosquito traps were set.
The ERDAS Imagine accuracy assessment tool was used to examine what the accuracy points actually represent and compared to the CART classification of those same points.Data from the accuracy assessment were used in an error matrix which provides producer's and user's accuracies along with kappa values for each of the eight classes.Examination of the error matrix allowed us to determine whether certain classes were being confused and if the addition of data layers eliminated any confusion.An overall accuracy and kappa were calculated for the entire classification product [31,32].

Overview of the LULC Classifications and Iterations
Various combinations of the multispectral and LiDAR-derived image data were used to determine which combination of input variables provided the best LULC results to inform mosquito habitat.Table 1 provides an overview of the classification results based on the different data input combinations.Overall accuracies ranged from 71.1% to 89.2% and kappa values ranged from 0.67 to 0.88.Different issues were encountered throughout the process as data were added and subtracted from the input dataset.The four spectral bands of NAIP image data were used for the first LULC classification.The overall accuracy of the classification output was measured at 82.5% with a kappa of 0.80.There was some confusion between the two vegetation classes, herbaceous and trees/shrubs.The different species of vegetation in the study area all provide similar training data within the four spectral bands of data used for the first classification.Visually, trees in the area could be distinguished from fields of grass in the near-infrared portion of the spectrum due to the greenness of the leaves compared to some of the areas with patches of unhealthy grass.
Addition of an NDVI image layer led to a more accurate LULC classification (Figure 4).Discussion of the classification performed using the multispectral and NDVI data can be found in Section 3.2.
Addition of a LiDAR-derived CHM to the multispectral and NDVI dataset led to another improvement in accuracy and resulted in the highest overall accuracy and kappa value (Figure 5).Discussion of the classification performed with the inclusion of the CHM data can be found in Section 3.3.
The fourth classification (image not shown) integrated the bare earth surface DEM, CHM, NDVI data, and multispectral NAIP data.Inclusion of the LiDAR-derived DEM data led to an 8.2% decline in the overall accuracy of the classified output, from 89.2% to 81.0%.The kappa value decreased from 0.88 to 0.78, with decreases in the producer's accuracies for bare ground, water, and shadow classes.Inclusion of the DEM seemed to make the classification output less effective for areas where elevation changes were visibly evident, near mountains for example.
The fifth classification (image not shown) was performed with the NDVI, multispectral NAIP, and LiDAR-derived DEM data.This classification excluded CHM to examine the importance of the bare earth variable in LULC classification.Without the CHM data the overall accuracy of this combination of data was a low 71.1% with a kappa statistic of 0.67.The DTM data overpowered this classification iteration as elevation gradients were visible yet again; which possibly led to the large number of misclassified pixels.This study cannot truly be used as a measure of how useful the LiDAR-derived DEM data could be for a LULC classification due to minimal elevation changes in the study area.However, the DEM data could be useful for determining washes and streams based on their lower elevations compared to other classes.
The final classification conducted explored the addition of LiDAR intensity to the dataset.The issues that arose with the inclusion of intensity data are discussed in Section 3.4.

Multispectral and NDVI Data-Classification
Vegetation types were further delineated by combining a NDVI image and four spectral bands of NAIP data (Figure 4).Addition of the NDVI data increased the overall accuracy by 1.6% to 84.0% when compared to the first classification, with a kappa of 0.82 (Table 2).The majority of inaccuracies still occurred in the shadow class, but the producer's accuracies of all the classes stayed the same or increased.Shadow pixels continued to be misclassified as water pixels, but all water pixels were accurately classified.Addition of the NDVI image did not reduce the confusion between the herbaceous and trees/shrubs classes since the additional data distinguishes vegetation health but not vegetation heights.Both the multispectral and multispectral with NDVI classifications exhibited confusion between the bare ground class and the herbaceous class; this could be due to the presence of weeds based on the time of year the NAIP image data was acquired.However, the multispectral with NDVI classification did improve the accuracy of some of the other LULC types being defined.The water, pavement, and shadow classes demonstrated improved producer's accuracies.Even though the producer's accuracy of the shadow and water classes improved there was a lot of confusion between the two classes.Nine of the pixels that should have been classified as shadow were defined as water pixels by the classifier.Examining the visual appearance of the shadow and water classes, the human eye is unable to distinguish between the two classes when taken out of context.Comparison of the spectral signatures of both classes also demonstrated the difficulty of delineating between the two classes based solely on spectral information.Both class types were devoid of vegetation leading to no improvement in discrimination with the addition of the NDVI band.

LiDAR-CHM, Multispectral, and NDVI Data-Classification
The third classification integrated LiDAR-derived CHM, NDVI data, and multispectral NAIP data (Figure 5) producing the best result with an overall accuracy measured at 89.2% (Table 3), a 5.2% increase over the second classification product.An immediate improvement in the quality of the classification could be assessed based on a visual inspection.The error matrix confirmed the improved results.The classification had a kappa value of 0.88, markedly higher than the kappa value, 0.82, of the second classification using NDVI and NAIP multispectral information.Inclusion of the CHM data eliminated all confusion between the herbaceous and trees/shrubs classes with both classes achieving very high producer's accuracies.Inclusion of the height data also improved the producer's accuracies of the bare ground, pavement, and shadow classes, but decreased the accuracy of the water class.One of the pavement pixels was misclassified as a structure pixel; further examination of the accuracy points revealed that a structure was in close proximity to the pavement leading to the error.The accuracy point selected to represent the pavement at that location was probably an ill-advised choice.Addition of the CHM as a classification variable was performed to examine if the data could aid in the separation of the LULC types versus a classification executed using only multispectral and NDVI information.Examination of the CHM model results showed that some classes had constant heights, while others had varying heights for similar objects.Buildings and trees were the objects with the greatest heights compared to all other classes, but each house or tree had its own unique height value.The combination of the three datasets eliminated any confusion between the two vegetation classes.The herbaceous class always had a lower height than any of the trees or shrubs.The inclusion of the CHM provided the classifier with the information required to distinguish the two vegetation classes.Inclusion of the CHM data increased the confusion between shadow and structure pixels, but decreased misclassification of shadow pixels as water pixels.The confusion between the shadow and structures classes was most likely caused by the sampling procedure and the fact that sampling was primarily based on the multispectral image data.Although there are no shadows in the LiDAR data, what may appear as a shadow in the NAIP imagery could be part of a building in the CHM.Training of the shadow class may have inadvertently included the sampling of height due to part of the shadow falling on the side of a building.A test was performed in which shadows were not sampled, however this lead to all the shadow pixels being classified as water.The shadow class in this study was small since the multispectral image data contained few shadowed areas due to the high sun angle when the imagery was acquired.A small part of the study area west of the highway where multispectral images were acquired earlier in the morning resulted in significant shadowing on the west side of the structures.

LiDAR Intensity, CHM, Multispectral, and NDVI Data-Classification
The intensity return data was also examined as a potential input to improve the quality and accuracy of the final classification product.Since the DTM did not improve the accuracy of the classification result the next dimension of the LiDAR data worth exploring was the return intensity data.A classification was executed with the multispectral NAIP, NDVI, CHM, and intensity data to determine whether inclusion of the intensity data could produce a more accurate output.Removal of the DTM data and inclusion of the intensity data led to a good classification with an overall accuracy of 83.5% with a kappa value of 0.81 (Table 4).However, the classification product was less accurate than the multispectral NAIP, NDVI, and CHM classification.Inclusion of the intensity data led to some confusion between the bare ground and herbaceous classes and difficulty in accurately identifying the shadow class.This iteration of the classification process handled the pavement class best of all the classifications, achieving a producer's accuracy of 88% with a kappa of 0.86.The streets are easily distinguished from other LULC types when examining the intensity layer.Issues that arose due to the quality of the intensity data will be discussed in the next section of this paper.
Examination of the intensity layer on its own provided some useful insights regarding the collection of LiDAR data.Many of the flight lines are visible on the intensity image due to the existence of certain bright objects in the area that are not in others.When a pulse from the LiDAR system hits a bright object on the ground a lower gain is applied to all the return signals, leading to a change in the intensity values with respect to areas that do not have bright targets (Figure 6).These changes in gains were not registered and applied to the intensity data, resulting in uncalibrated intensity data that varies per flight line.These gains would have to be incorporated in the intensity data processing stream to create calibrated, spatially, and temporally consistent LiDAR intensity data points.The inconsistent intensity data was not an ideal input for the classification performed in this research, but it does demonstrate that the data could be useful based on the accuracies achieved with the addition of the intensity data.As with the DTM data, the usefulness of the intensity data for this study area cannot be truly measured because of the inconsistent intensity values that correspond to bright objects such as radio antennas or shiny materials on roof tops.Collection of LiDAR intensity data is still being improved as the technology continues to evolve and sensors are designed that send and receive pulse data, register, and adjust for the variable gain settings.Several studies have attempted to normalize intensity data and achieved high classification accuracies with normalized data, but the process is time consuming and does not always work over a large area [33,34].The inability to use intensity data from various flight lines due to inadequate calibration during data acquisition collection made the inclusion of intensity information less beneficial in our study.

Significance of Including LiDAR Data
Results of the multispectral NAIP and NDVI classification data were compared to the results of the multispectral NAIP, NDVI, and CHM classification using McNemar's test [35].McNemar's test allows separate classifications that were assessed using the same set of accuracy points to be compared without bias.McNemar's test is a non-parametric test based on a binary distinction between correct and incorrect class allocations that uses a 2 by 2 matrix to calculate a chi squared value [35,36].
McNemar's test of the multispectral NAIP, NDVI, and CHM classification compared to the multispectral NAIP and NDVI classification shows that the classifications are statistically different at the p = 0.01 level.The results of McNemar's test demonstrate that the inclusion of LiDAR-derived CHM in this series of classifications lead to a more accurate and different land cover product.

Multi-Attribute Vegetation-Impervious Surface-Soil (VIS) Map
The LULC classification based on the multispectral NAIP, NDVI, and CHM can be used to create a multi-attribute Vegetation-Impervious Surface-Soil (V-I-S) Model [10] (Figure 7).The V-I-S model assumes that urban environment and land cover is a linear combination of the vegetation, impervious surface, and soil surface areas.The V-I-S model has been shown to effectively deal with the mixed-pixel problem of urban land cover classifications and has been used to predict runoff rates and patterns for storm water drainage [37].Similarly, the application of the V-I-S model can assist in identifying mosquito habitat.Shadow and water classifications were removed from the V-I-S reclassification.Mosquito habitat will occur in areas with high vegetation, moderate pervious surface presence, and low impervious surface presence.These variables can result in cooler microclimate temperatures and increased humidity to facilitate adult mosquito habitat.The impervious and pervious surfaces also can cause ponding and create so-called water containers that are necessary for mosquito larvae to survive in the southwestern climate in Tucson.

Conclusions and Future Work
The most accurate classification map produced in order to model mosquito habitat in our study area was achieved with only six data layers.The combination of multispectral NAIP, NDVI, and CHM increased the overall accuracy of the urban LULC classification by 5.2%.The increased ability to distinguish between trees and ground vegetation will aid in determining shaded areas that mosquitoes would likely use as habitat.Separation of buildings and streets by the classifier will allow us to determine where mosquitoes are less likely to find suitable habitat.The benefits of combining multispectral NAIP and LiDAR-derived products to produce highly accurate LULC classifications will reach beyond mosquito habitat research, informing issues such as urban planning [38], urban heat island, and habitat fragmentation.
CART was used to perform various iterations of the classification process to determine the best combination of data to produce the most accurate land cover/use map.Further examination of the fusion of high resolution multispectral image data and LiDAR data lead to more accurate classification products that can be utilized for various analysis purposes and take advantage of the three dimensional DTM and CHM data products.These data products could potentially be used to identify flat roofs and houses with cooling systems, which could allow for water to pond and provide a habitat for breeding mosquitoes.
As LiDAR technologies continue to improve, the fusion of multispectral image data and components of LiDAR data will allow for accurate classification maps.Once the return intensity data for an area can be easily and properly normalized, the data are expected to aid in the improvement of classification products.The LiDAR flight lines and quality of the LiDAR data acquisition are important considerations for providing a spatially consistent data set.When certain areas have higher point densities than other areas, the classification accuracy will likely be unevenly distributed.The LiDAR data used in this study was acquired differently based on topography and other landscape features; certain areas had a pulse density of 8 pulses per square meter, while other areas were only covered by 4 pulses per m 2 .Ideally, future data will be collected with equal coverage throughout the area of acquisition.
The classifications from this research will be input with other variables, such as temperature and precipitation data, into DyMSIM [8] to model the population of mosquitoes throughout the study area.The goal will be to model mosquito populations in urban regions to better understand seasonal mosquito abundance and habitat and provide tools to better monitor and mitigate the spread of WNV and other mosquito borne illnesses as well as to compare the success of different mitigation approaches.Socio-economic and political aspects in response to increasing mosquito populations could be examined in this context.
The ability to produce highly accurate LULC maps can be used to perform various forms of analysis including change analysis.The ability to accurately model mosquito populations will assist in identifying areas of high risk and focused application of mitigation practices to decrease the incidents of mosquito borne illnesses by altering the environments deemed mosquito habitats.The LULC classifications will also be used in conjunction with data from the mosquito traps to determine LULC impacts on mosquito population in Tucson, Arizona.GIS based integration of mosquito abundance and mosquito habitat variables such as distance from the edge of a lake or pond, culverts, areas where water can pond, and water drainage channels, in combination with land cover class distributions will provide an improved understanding of urban mosquito habitat.Together, these data sets will create mosquito habitat suitability models to identify the areas and microclimate conditions most likely to act as breeding locations.

Figure 1 .
Figure 1.Near-infrared, red, and green false color composite of a mosaic of the multispectral NAIP (National Agricultural Imaging Program) images of the urban study area in Tucson (AZ, USA) that were acquired 23-25 June 2007.The majority of the red colored areas are irrigated golf courses, parks, sports fields, gardens and trees along streets.The dark colored Tumamoc Hill (lower left section) is a long term ecological observation site with desert vegetation on a mountainous area within the city.

Figure 2 .
Figure 2. Overview of urban mosquito habitat mapping and modeling phases.The LIDAR and multispectral data processing steps are used as input to the CART based LULC classification.The LULC is used to derive a Vegetation, Impervious and Soil (V-I-S) map [10].These map products contribute to mosquito habitat and simulation (outlined with dashed lines and arrows) and development of urban mosquito abundance map.

Figure 3 .
Figure 3. LiDAR point cloud image of elevation (ft) for a subset of the study area at the University of Arizona campus.Buildings and trees are easily recognized by their height, shape, and texture.The dark blue elevation values are generally used to create the DEM.

Figure 4 .
Figure 4. Land Cover/Land Use map based on the CART classification performed using 2007 NAIP multispectral data and NDVI data for the study area.

Figure 5 .
Figure 5. Land Cover/Land Use map based on CART classification performed using 2007 4-band NAIP data, NDVI data, and 2008 LiDAR CHM data.

Figure 6 .
Figure 6.Flight line calibration and gain changes cause large variability in the 2008 LiDAR return intensity data in the selected study area, making it difficult in this case to include these to improve the classification.

Figure 7 .
Figure 7. V-I-S model showing percent cover type based on aggregating the 1m CART classification results to 10 m pixels.Water and shadow are masked in this visualization of multiple attributes that affect mosquito habitat.

Table 1 .
Producer's Accuracy, Overall Accuracy, and Kappa for each of the iterations representing different input variables for each of the LULC classifications.

Table 2 .
Error matrix results for the classification performed with 2007 NAIP multispectral data and NDVI data.

Table 3 .
Error matrix results for the classification performed with 2007 4-band NAIP data, NDVI data, and 2008 LiDAR CHM data.

Table 4 .
Error matrix results for the classification performed with 2007 NAIP data, NDVI data, 2008 LiDAR CHM, and 2008 intensity LiDAR data.Data includes 4 bands of spectral data, 1 band of NDVI data, 1 band of CHM data, and 1 band of intensity data.