An Investigation of GIS Overlay and PCA Techniques for Urban Environmental Quality Assessment: A Case Study in Toronto, Ontario, Canada

Abstract: The United Nations estimates that the global population will double in the coming 40 years, which may have a negative impact on the environment and human life, for example through increased water demand, overuse of power and anthropogenic noise. Modelling Urban Environmental Quality (UEQ) is therefore indispensable for better city planning and efficient control of urban sprawl. This study investigates the ability of remote sensing and Geographic Information System (GIS) techniques to model UEQ, with a case study in the city of Toronto, by deriving environmental, urban and socio-economic parameters. Remote sensing, GIS and census data were first obtained to derive these parameters. Two techniques, GIS overlay and Principal Component Analysis (PCA), were then used to integrate them. Socio-economic parameters, including family income, higher education and land value, were used as a reference to assess the outcomes of the two integration methods by evaluating the relationship between the extracted UEQ results and the reference layers. Preliminary findings showed that GIS overlay achieved better precision and accuracy (71% and 65%, respectively) than the PCA technique. The outcomes of the research can serve as a generic indicator to help authorities plan cities while considering all relevant social, environmental and urban requirements and constraints.


Introduction
Urban Environmental Quality (UEQ) is defined as an indicator that generically describes the urban, environmental and socio-economic conditions of an urban area. UEQ can be regarded as a multilayer concept that comprises physical, spatial, economic and social parameters at different scales [1]. Weng and Quattrochi [1] noted that UEQ can influence many governance aspects, including urban planning, infrastructure management, the economy, policy-making and social studies. However, it is challenging to predict and model the inter-relationships and dependencies among all of these factors. Satellite remote sensing can help model UEQ by providing continuous Earth observation images of the urban environment at different spatial, spectral and temporal resolutions [2][3][4]. A few preliminary studies have used multi-temporal and multi-resolution data to model UEQ [5][6][7][8], since these data provide a clear view of the land cover, water conditions and vegetation in urban areas [9,10]. As such,

Datasets
In this research, the city of Toronto, Ontario, Canada, was selected because of data availability and the population growth the city has experienced during the past decade. Figure 1 shows Toronto, which is the capital of the Province of Ontario and the largest city in Canada, with a total population of 2,615,060 [18]. The datasets used in this study fall into three major categories: (1) Landsat TM satellite images; (2) GIS data layers; and (3) socio-economic data. All of the data were collected in 2010 and 2011, since GIS and socio-economic data are not consistently available after 2011. A Landsat TM image was downloaded from the United States Geological Survey (USGS) Earth Explorer [19]. The spatial resolution of the Landsat images is 30 m for the multi-spectral bands and 120 m for the thermal band; however, the data provider had already resampled the thermal band to 30 m, predominantly to align it with the multi-spectral bands [20]. The image was acquired in summer (July) to avoid clouds and snow cover. In addition, a total of 14 GIS data layers for Toronto covering the same period were acquired from the Scholars GeoPortal [21]. These layers include land use, population density, building density, vegetation and parks, public transportation, historical areas, the Central Business District (CBD), sports areas, religious and cultural zones, shopping centres, educational institutions, entertainment zones, crime rate and health condition. The layers were first imported into the ArcGIS platform (ArcGIS; Esri; Redlands, CA, USA) for further analysis. As with the remote sensing data, all of the layers were projected to the Universal Transverse Mercator (UTM) Zone 17N coordinate system. The socio-economic parameters were derived from Toronto census data obtained from the City of Toronto census bureau at the census tract level.
The City of Toronto census bureau archives hundreds of variables related to socio-economic conditions. In this research, the socio-economic parameters included education (university certificate, diploma or degree), family income and land values. Table 1 summarizes the data sources used in this study. Figure 2 shows the overall workflow implemented in this research. The Landsat image was clipped to the study area to speed up data processing. The atmospheric correction model ATCOR2, developed by Richter [22], was used to perform radiometric calibration and remove atmospheric effects that alter the spectral characteristics of land features [23]. To run ATCOR2, weather information (e.g., air temperature, visibility) was obtained from historical records at the nearest weather station, at Lester B. Pearson International Airport.

Methodology
The calibration parameters for the Landsat TM sensor (gains and biases) were also incorporated into the atmospheric correction. After atmospheric correction, the bio-physical parameters, including NDVI, NDWI, the built-up index and LST, were derived from the Landsat image. Urban, environmental and socio-economic parameters were extracted from the remote sensing, GIS and census data so that all of the parameters could be combined in the subsequent process. GIS overlay and PCA (both pixel-based and object-based) were then implemented to integrate the urban, environmental and socio-economic parameters. Socio-economic parameters obtained from the City of Toronto census bureau, including family income, higher education level and land values, were used as a reference to assess the outcomes of GIS overlay and PCA. The validation was based on two criteria, precision and accuracy (refer to Section 3.5.3). The final stage of the work is to select the optimal integration method and use it to determine the locations with the best UEQ in Toronto.

Land Surface Temperature (LST)
Land Surface Temperature (LST) is an essential parameter in a variety of disciplines, used to study urban climate [24,25], UEQ [8], the urban heat island effect [26], urban expansion [27] and urban waste management [28]. LST results from land-surface processes that combine all surface-atmosphere interactions and energy fluxes between the atmosphere and the ground. Mapping LST from thermal remote sensing sensors is useful for large-scale environmental and urban studies. Landsat TM and ETM+ data have been widely used in urban environmental quality studies to derive LST [2][3][4]. Both Landsat TM and ETM+ have: (1) an image archive that the USGS [19] released free to the public in 2008; and (2) a short repeat cycle (16 days), which produces a voluminous data archive for multi-temporal studies.
Numerous researchers have discussed the use of LST and the challenges of retrieving it with known and unknown Land Surface Emissivity (LSE) [29,30]. In this research, the authors used Geomatica (version 10.1; PCI Geomatics, Markham, ON, Canada, 2007) to derive LST from the Landsat image. The adopted method takes into account the atmospheric correction of the thermal band. The computation of LST involves five steps. The first step is to convert the pixel value of the thermal band into radiance using Equation (1):
$$L_{sat} = \frac{L_{max} - L_{min}}{Q_{cal.max}}\, Q_{cal} + L_{min} \tag{1}$$
where $L_{sat}$ is the spectral radiance; $L_{max}$ is the spectral radiance scaled to $Q_{cal.max}$; $L_{min}$ is the spectral radiance scaled to $Q_{cal.min}$; $Q_{cal}$ is the quantized calibrated pixel value in digital numbers (DN); and $Q_{cal.max}$ is the maximum quantized calibrated pixel value, corresponding to $L_{max}$. For Landsat TM Band 6, $L_{max}$, $L_{min}$ and $Q_{cal.max}$ are 15.3032 W·m⁻²·sr⁻¹·µm⁻¹, 1.2378 W·m⁻²·sr⁻¹·µm⁻¹ and 255, respectively. The second step is to compute the emissivity. Many factors, including water content, chemical composition, structure and roughness, affect the emissivity of a surface [31]. Scholars emphasize that the surface temperature calculation relies mainly on the assumed emissivity value [32]. Some researchers assumed a constant emissivity of 0.95 [33]. Others treated a constant value as only one option and, as a rule of thumb, assigned three emissivity classes: vegetation ε = 0.97, soil ε = 0.96 and others ε = 0.98 [32]. If the emissivity is unknown, it can be estimated from NDVI using Equation (2) [34]:
$$\varepsilon = a + b \ln(\mathrm{NDVI}) \tag{2}$$
where a and b are obtained by regression analysis on a large dataset [35].
NDVI is the Normalized Difference Vegetation Index, which is calculated from the visible (red) and near-infrared multi-spectral bands, as shown in Section 3.1.2. The third step is to apply atmospheric correction to the thermal band. As mentioned in Section 3, weather information (e.g., air temperature, visibility), the acquisition date and time, and the latitude and longitude are needed to implement atmospheric correction. The atmospheric correction can be written as Equation (3) [36]:
$$L_C = \frac{L_{sat} - L_{up}}{\varepsilon\,\tau} - \frac{1 - \varepsilon}{\varepsilon}\, L_d \tag{3}$$
where $L_C$ is the atmospherically-corrected radiance; $L_{sat}$ is the spectral radiance (W·m⁻²·sr⁻¹·µm⁻¹); $L_{up}$ and $L_d$ are the upwelling and downwelling radiances (W·m⁻²·sr⁻¹·µm⁻¹); and ε and τ are the emissivity and atmospheric transmittance, respectively. The fourth step is to convert the corrected radiance into blackbody temperature using Equation (4):
$$T_{BBT} = \frac{K_2}{\ln\left(\dfrac{K_1}{L_C} + 1\right)} \tag{4}$$
where $T_{BBT}$ is the blackbody temperature in Kelvin (K), $K_1$ is calibration constant 1 (W·m⁻²·sr⁻¹·µm⁻¹) and $K_2$ is calibration constant 2 (K). For Landsat TM, $K_1$ and $K_2$ are 607.76 W·m⁻²·sr⁻¹·µm⁻¹ and 1260.56 K, respectively [37]. The fifth step is to convert the temperature from Kelvin to degrees Celsius using Equation (5):
$$T\,(^{\circ}\mathrm{C}) = T_{BBT} - 273.15 \tag{5}$$
The computed temperature (°C) is regarded as the LST derived from the Landsat image.
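The LST steps can be chained in a few lines of array code. The sketch below is a minimal Python/NumPy illustration, not the Geomatica workflow the authors actually used. It hard-codes the Band 6 constants quoted in the text and assumes a minimum quantized value of 0 in the DN-to-radiance step. Emissivity, transmittance and the up/downwelling radiances are inputs that must come from an emissivity model and an atmospheric model such as ATCOR2. The values in the usage line are hypothetical placeholders, not Toronto measurements.

```python
import numpy as np

# Landsat TM Band 6 constants quoted in the text
L_MAX = 15.3032   # W m^-2 sr^-1 um^-1
L_MIN = 1.2378    # W m^-2 sr^-1 um^-1
QCAL_MAX = 255
K1 = 607.76       # W m^-2 sr^-1 um^-1
K2 = 1260.56      # K


def dn_to_radiance(qcal):
    """Eq. (1): thermal-band DN -> at-sensor spectral radiance (assumes Q_cal.min = 0)."""
    return (L_MAX - L_MIN) / QCAL_MAX * np.asarray(qcal, dtype=float) + L_MIN


def emissivity_from_ndvi(ndvi, a, b):
    """Eq. (2): eps = a + b * ln(NDVI); only defined for NDVI > 0.

    a and b are regression coefficients the caller must supply (no defaults assumed).
    """
    return a + b * np.log(np.asarray(ndvi, dtype=float))


def correct_radiance(l_sat, eps, tau, l_up, l_down):
    """Eq. (3): remove upwelling path radiance and reflected downwelling radiance."""
    return (l_sat - l_up) / (eps * tau) - (1.0 - eps) / eps * l_down


def blackbody_temperature(l_c):
    """Eq. (4): invert Planck's law with the TM calibration constants (Kelvin)."""
    return K2 / np.log(K1 / l_c + 1.0)


def lst_celsius(qcal, eps, tau, l_up, l_down):
    """Eqs. (1), (3), (4) and (5) chained: DN -> LST in degrees Celsius."""
    l_c = correct_radiance(dn_to_radiance(qcal), eps, tau, l_up, l_down)
    return blackbody_temperature(l_c) - 273.15  # Eq. (5)


# Hypothetical atmosphere and emissivity, for illustration only
print(lst_celsius([120, 140], eps=0.95, tau=0.85, l_up=1.2, l_down=2.0))
```

Keeping each equation in its own function makes it easy to swap the emissivity source, for example a constant 0.95, the three-class rule, or `emissivity_from_ndvi` with fitted coefficients, without touching the rest of the chain.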

Normalized Difference Vegetation Index (NDVI)
Before satellite remote sensing, urban vegetation was usually monitored and mapped by combining colour infrared aerial images with fieldwork, which was essentially the only option for measuring urban vegetation [38]. With the availability of multi-source, multi-spectral satellite images, Fung and Siu [10] used Landsat and SPOT (Satellite Pour l'Observation de la Terre; Satellite for the Observation of Earth; Spot Image, Toulouse, France) images to quantify urban vegetation as a parameter for UEQ studies. Many researchers have used Landsat images to extract NDVI [2,8,39]. NDVI is a ratio that captures vegetation greenness and its changes over time, and it has been applied to mapping vegetation cover, biomass and Leaf Area Index (LAI) [40,41]. Most urban environmental studies have shown that NDVI is one of the most important parameters for assessing UEQ, with higher values representing a positive impact on the city [2,8]. NDVI (ranging from −1 to 1) monitors vegetation activity and its annual changes, and is calculated using Equation (6) [42]:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{6}$$
where NIR is near-infrared Band 4 and Red is red Band 3 of the Landsat TM image.
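NDVI is a per-pixel band ratio, so it vectorizes directly over whole image arrays. A minimal NumPy sketch, assuming the Band 4 and Band 3 arrays already hold atmospherically corrected reflectances. The helper name `normalized_difference` is our own, and pixels where both bands are zero are returned as NaN instead of raising a division error:

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN wherever a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(np.broadcast(a, b).shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir_band4, red_band3):
    """Eq. (6): NDVI from Landsat TM Band 4 (NIR) and Band 3 (red)."""
    return normalized_difference(nir_band4, red_band3)


# Dense vegetation, a bare surface and an empty (zero) pixel
print(ndvi([0.45, 0.20, 0.0], [0.05, 0.18, 0.0]))
```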

Normalized Difference Water Index (NDWI)
NDWI is another remote sensing-derived biophysical parameter, representing the surface moisture of vegetation cover as well as water bodies. Hardisky et al. [43] found that NDWI tracks changes in vegetation biomass and water stress more effectively than NDVI. NDWI can also be used to measure and assess the turbidity of water bodies from remote sensing data [44], and Liang and Weng [11] used NDWI as a UEQ parameter, with higher NDWI representing higher urban quality (e.g., areas close to the lake shore). NDWI (ranging from −1 to 1) is calculated using Equation (7) [14]:
$$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} \tag{7}$$
where NIR is near-infrared Band 4 and Green is green Band 2 of the Landsat TM image.

Normalized Difference Built-Up Index (NDBI) and Built-Up Index
NDBI is another ratio, representing the spatial distribution of urban and suburban areas, and has been used in many urban planning applications. Zha et al. [42] combined NDBI and NDVI to identify and monitor built-up areas in the city of Nanjing. Chen et al. [45] showed that land cover types can be represented using NDVI, NDWI and NDBI. Moreover, Faisal and Shaker [46,47] showed that a built-up index derived from NDBI and NDVI can represent industrial areas within a city. Therefore, in UEQ studies, higher NDBI/built-up values may be deemed to have a negative impact on the city. To derive the built-up layer, the NDBI values (ranging from −1 to 1) are first calculated using Equation (8) [42]:
$$\mathrm{NDBI} = \frac{\mathrm{MIR} - \mathrm{NIR}}{\mathrm{MIR} + \mathrm{NIR}} \tag{8}$$
where MIR is mid-infrared Band 5 and NIR is near-infrared Band 4 of the Landsat TM image. NDBI highlights urban regions and their annual changes. The built-up values are then obtained by subtracting the NDVI layer from the NDBI layer, following Zha et al. [42], as in Equation (9):
$$\text{Built-up} = \mathrm{NDBI} - \mathrm{NDVI} \tag{9}$$
Because both indices range from −1 to 1, the built-up values theoretically range from −2 to 2.
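The water and built-up indices follow the same band-ratio pattern as NDVI. The sketch below computes NDVI, NDWI, NDBI and the built-up layer from the four Landsat TM reflectance bands named in the text (Bands 2, 3, 4 and 5). The function name, the dictionary keys and the sample reflectances are illustrative choices, not values from the study. The built-up layer uses the NDBI minus NDVI difference of Zha et al.:

```python
import numpy as np


def _nd(a, b):
    """Normalized difference (a - b) / (a + b); NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(a + b != 0, (a - b) / (a + b), np.nan)


def tm_indices(green_b2, red_b3, nir_b4, mir_b5):
    """Spectral indices from Landsat TM reflectance bands."""
    ndvi = _nd(nir_b4, red_b3)    # Eq. (6)
    ndwi = _nd(green_b2, nir_b4)  # Eq. (7)
    ndbi = _nd(mir_b5, nir_b4)    # Eq. (8)
    builtup = ndbi - ndvi         # Eq. (9): NDBI - NDVI, range [-2, 2]
    return {"ndvi": ndvi, "ndwi": ndwi, "ndbi": ndbi, "builtup": builtup}


# One vegetated pixel and one paved pixel (made-up reflectances)
idx = tm_indices(green_b2=[0.08, 0.10], red_b3=[0.05, 0.15],
                 nir_b4=[0.40, 0.20], mir_b5=[0.20, 0.30])
print(idx["builtup"])
```

In this toy example the vegetated pixel gets a strongly negative built-up value and the paved pixel a positive one, which is the contrast the built-up layer is meant to capture.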

Land Use and Land Cover
Population growth can affect the urban environment and urban planning around the world. Monitoring land use and land cover is therefore needed to avoid potential problems in sustainable urban and environmental planning; it helps planners and decision makers build better urban environments in the near future and assess the quality of cities. Various studies have recommended building green cities rather than dense high-rise urban environments, since urban green space increases UEQ within a city [48][49][50]. Medium- to fine-scale land cover and land use maps can be derived from satellite images [51] or, more recently, airborne LiDAR data [52]. However, the accuracy of land cover and land use maps varies between satellites because of differences in their spatial resolutions. Assessing the urban quality of living also requires physical environmental parameters, such as roads, cropland and pasture, water, commercial and industrial areas, high, medium and low density residential areas, forest and grass, which are critical for assessing urban quality of life. These physical environmental parameters can also be used to extract some socio-economic parameters, such as population density and social conditions [11].

Urban Density
Around the world, residential areas are affected by population growth and migration. Building density is one of the most important parameters contributing to the urban heat island effect and urban quality assessment [53]. Building and population density can negatively influence UEQ and the transportation system in developing cities, mainly because a dense high-rise urban environment typically increases LST and noise pollution, together with a high demand for vehicle use [54]. However, most public services, public transportation and jobs are located within walking distance of high density areas. Remote sensing can help determine density values by extracting urban areas from the image [8,12]. Building density is calculated by dividing the extracted urban area by the total area, as shown in Equation (10), while population density is calculated by dividing the number of people by the urban area, as shown in Equation (11):
$$\text{Building density} = \frac{\text{Urban areas}}{\text{Total areas}} \tag{10}$$
$$\text{Population density} = \frac{\text{Number of people}}{\text{Urban areas}} \tag{11}$$
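Building and population density reduce to counting built-up pixels per zone. The following is a minimal sketch under several assumptions: a boolean built-up mask has already been extracted from the image (the thresholding rule is left to the caller), the zones (e.g., census tracts) have been rasterized to integer IDs 0 to n−1 on the same 30 m grid, and population density is reported per km² of urban area. The function name is hypothetical.

```python
import numpy as np

PIXEL_AREA_M2 = 30.0 * 30.0  # one Landsat TM multispectral pixel


def zone_densities(builtup_mask, zone_ids, zone_area_m2, zone_population):
    """Building density (Eq. 10) and population density (Eq. 11) per zone.

    builtup_mask : boolean raster of pixels classified as built-up
    zone_ids     : integer raster of the same shape, zones numbered 0..n-1
    """
    mask = np.asarray(builtup_mask, dtype=bool).ravel()
    zones = np.asarray(zone_ids, dtype=int).ravel()
    n = len(zone_area_m2)
    # Urban area of each zone = number of built-up pixels x pixel area
    urban_m2 = np.bincount(zones[mask], minlength=n) * PIXEL_AREA_M2
    building_density = urban_m2 / np.asarray(zone_area_m2, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # People per km^2 of urban area; NaN for zones with no built-up pixels
        pop_density = np.where(
            urban_m2 > 0,
            np.asarray(zone_population, dtype=float) / (urban_m2 / 1e6),
            np.nan,
        )
    return building_density, pop_density


bd, pd_ = zone_densities([[True, False], [True, True]], [[0, 0], [1, 1]],
                         zone_area_m2=[1800.0, 1800.0], zone_population=[90, 360])
print(bd, pd_)
```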

Public Transportation
Accelerating population growth may increase car ownership, which may raise carbon dioxide emissions and subsequently reduce road accessibility, especially in developing countries [55]. Transportation is the main sector shaping and connecting cities. Public transportation provides a faster, safer and easier way to travel around the city. It can help the city by connecting sub-centres around railway stations and encouraging linear development along transit routes [55]. Most automobile-dependent cities were found to lose traditional community support processes [55]. Public transportation is therefore one of the major parameters for UEQ.

Open Spaces and Entertainment Zones
Many UEQ studies have shown that open spaces and green areas are significant factors contributing to high environmental quality [8,12], mainly because open spaces and parks offer a healthy and comfortable environment by lowering LST and reducing air pollution, especially in high density areas. Entertainment areas are mainly located in public parks, plazas and open spaces, which host occasions such as Christmas and New Year celebrations. Famous open spaces, such as Times Square in New York City and Dundas Square and Nathan Phillips Square in Toronto, attract large numbers of visitors throughout the year, mainly because they are located within the core of high density areas and thus provide a vibrant atmosphere. This phenomenon supports the argument that high density areas are preferable to low density areas.

Historical Areas and Central Business Districts (CBD)
Historical cities around the world were mainly designed around walking distance. They typically feature high density, mixed land use and shaded streets in compact central forms, as in Jerusalem, Damascus, Athens and Istanbul. Historical walking cities were typically designed to extend about 5 km, keeping other facilities in the city within walking distance. A few cities still retain historical buildings and walking characteristics, such as Society Hill in Philadelphia, the North End in Boston and The Rocks in Sydney [55], mainly because historical areas preserve the value of past energy investment and provide visual and physical conservation of cultural identity [56]. Modern cities increasingly tend to rebuild and preserve historical areas, such as Arabella Park in Munich, to attract tourists and provide a vibrant atmosphere [55]. Historical neighbourhoods, which are usually located in the city centre, have a stronger positive influence on UEQ, and together with the CBD they are the most attractive regions in the city.

Crime Rate
Personal security is one of the most important factors for society, regardless of where people live. Crime can cause physical pain, anxiety and the loss of lives and property [57]. Anand and Santos [58] illustrated that the biggest influence of crime is the feeling of vulnerability in people's lives; the crime rate is thus negatively related to UEQ. People have been reported to move to suburban and low density areas out of a desire for new and better public schools and a low crime rate. However, in some cases, the low cost of housing may drive demand for more housing per person, which may form new clusters of urban crime [59]. Increasing the physical distance between the poor and the rich is not always the best way to reduce urban crime, particularly in the city centre. Instead, it is preferable to increase community services and the quality of life in those areas to make them more vibrant and reduce crime [54]. The crime rate is calculated by dividing the number of crimes by the total population, as shown in Equation (12):
$$\text{Crime rate} = \frac{\text{Number of crimes}}{\text{Total population}} \tag{12}$$

Education and Income
Education and income are two related socio-economic parameters. Research shows that wealthier urbanites tend to invest more in high quality properties and services, mainly because higher income and higher education give them the tools to access and process more information about high quality areas. In addition, people with high income and high education are better able to invest in higher quality areas than people with less education and less income [60]. Moreover, Kahn [54] pointed out that people with higher education and income are more interested in supporting UEQ-related issues. Wealthier and educated urbanites also tend to participate in politics and the community to enhance the quality of living in their neighbourhoods. Based on these arguments, areas with more highly educated and wealthier urbanites are considered to have higher UEQ. These areas are therefore used as the first category of reference in our study.

Land Values
Knowing the parameters that influence UEQ is an important advantage for designing and assessing future urban development. UEQ is assessed using various urban and environmental parameters. Reginster and Goffette-Nagot [61] investigated the relationship between UEQ and residential location in two Belgian cities and found that UEQ may positively affect land rent and income levels across residential locations in the city. Other research examined the relationship between a real estate valuation model and environmental parameters in Geneva, Switzerland [62], finding that urban and environmental parameters influence property prices within the city. Topcu and Kubat [63] also examined the urban and spatial factors that might influence urban land values in Istanbul. They found that the distance from the sea, the distances from the central business district, universities and sanitary facilities, as well as the colour of building facades, all have a predominant impact on residential land values. As a result, land values were assigned as the second category of reference in this research.

Ranking the Parameters
Since the aforementioned parameters are extracted from different data sources, they have different scales and units and cannot be combined directly. All of the obtained data (parameters), including raster, census and GIS data, were therefore first transformed to one spatial scale (sub-neighbourhood), as shown in Figure 3. All of the parameters were then ranked from 1 to 10 to normalize the observation values of each parameter.
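Bringing a raster parameter (e.g., NDVI or LST) to the polygon scale is a zonal-statistics step. The text does not specify the aggregation statistic, so the sketch below assumes a per-polygon mean. It is a plain NumPy stand-in, not the GIS tool the authors used, and it assumes the polygons have already been rasterized to an integer zone-ID grid aligned with the 30 m pixels, with −1 marking pixels outside every polygon. NaN pixels (no data) are ignored.

```python
import numpy as np


def zonal_mean(values, zone_ids, n_zones):
    """Mean raster value inside each zone (zone ids 0..n_zones-1).

    Pixels with zone id -1 (outside all polygons) or NaN values are ignored;
    zones without any valid pixel get NaN.
    """
    v = np.asarray(values, dtype=float).ravel()
    z = np.asarray(zone_ids, dtype=int).ravel()
    valid = ~np.isnan(v) & (z >= 0)
    sums = np.bincount(z[valid], weights=v[valid], minlength=n_zones)
    counts = np.bincount(z[valid], minlength=n_zones)
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts


lst = [[30.1, 31.5], [27.0, np.nan]]
zones = [[0, 0], [1, 1]]
print(zonal_mean(lst, zones, n_zones=3))
```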
To normalize the parameters and represent the significance level of each polygon within a parameter, the Z-score method was applied to all parameters. The Z-score is a statistical measure that standardizes a wide range of data to represent significant changes across the data [64]. Equation (13) shows the first step, normalizing the parameters using the Z-score:
$$Z_{i} = \frac{x_{i} - \mu_{i}}{\sigma_{i}} \tag{13}$$
where x is the observation value (polygon) (i.e., the GIS polygons of the parameters, as shown in Figure 4), i is the parameter, µ is the mean of the parameter and σ is its standard deviation. The second step is to use linear interpolation to rank the parameters from 1 to 10, as shown in Figure 5. A polygon with a high Z-score within a parameter receives a high rank (up to 10), and a polygon with a low Z-score receives a low rank (down to 1). Equation (14) shows the linear interpolation:
$$\mathrm{Rank} = \mathrm{Rank}_{min} + \frac{(\mathrm{Obs} - \mathrm{Obs}_{min})\,(\mathrm{Rank}_{max} - \mathrm{Rank}_{min})}{\mathrm{Obs}_{max} - \mathrm{Obs}_{min}} \tag{14}$$
where Obs is the current observation value, $\mathrm{Obs}_{max}$ is the maximum observation value, $\mathrm{Obs}_{min}$ is the minimum observation value, $\mathrm{Rank}_{max}$ is the maximum ranking value (10), Rank is the determined ranking value and $\mathrm{Rank}_{min}$ is the minimum ranking value (1).
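The two normalization steps (Z-score, then linear interpolation to 1 to 10) can be sketched as follows. This is a minimal illustration assuming one parameter's polygon observations arrive as a 1-D array. It returns continuous ranks; whether the study rounded them to integers is not stated. The function names are our own.

```python
import numpy as np


def z_score(x):
    """Eq. (13): standardize one parameter across all polygons."""
    x = np.asarray(x, dtype=float)
    sd = x.std()  # population standard deviation; the choice does not affect ranks
    if sd == 0:
        raise ValueError("constant parameter: Z-scores are undefined")
    return (x - x.mean()) / sd


def rank_1_to_10(x, rank_min=1.0, rank_max=10.0):
    """Eq. (14): linearly interpolate Z-scores onto [rank_min, rank_max]."""
    z = z_score(x)
    return rank_min + (z - z.min()) * (rank_max - rank_min) / (z.max() - z.min())


print(rank_1_to_10([2.0, 4.0, 6.0]))
```

For the three observations above the ranks are approximately 1, 5.5 and 10. Because the Z-score is an increasing linear function of x, interpolating Z-scores onto 1 to 10 gives exactly the same ranks as min-max scaling the raw observations directly.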

Data Integration of Multiple Environmental and Urban Parameters
Integration techniques can combine remote sensing and GIS data and have been applied in urban modelling and analysis [65]. Previous studies have demonstrated two integration techniques, PCA and GIS overlay, that are able to combine parameters of any type. In this research, three approaches (GIS overlay, pixel-based PCA and object-based PCA) were used to integrate the above-mentioned environmental and urban parameters.

Geographic Information System (GIS) Overlay
GIS overlay is a multi-criteria application that uses data layers for specific environmental thresholds. Remote sensing data are presented as digital data in raster format, whereas census data are presented in GIS vector format. Remote sensing data can thus be integrated with socio-economic data by converting them from raster to vector [7]. In this research, the GIS overlay integration method was used to combine the urban and environmental parameters for the UEQ assessment. All of the parameters were converted from raster to vector data so that they could be stored as attribute data, as shown in Figure 3 in Section 3.4. Because each parameter is ranked from 1 to 10, the sum of the data layers represents the UEQ value. Ranking was based on the observation values: the highest value is assigned 10 and the lowest value is assigned 1. However, some parameters, including crime rate, industrial areas and LST, are ranked inversely (i.e., the highest crime rate or LST value is assigned 1, and the lowest is assigned 10), as shown in Figure 5. All of the ranks are then summed to compute the UEQ, as shown in Figure 6.
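The overlay itself is a per-polygon sum of ranks. A minimal sketch, assuming every layer has already been ranked 1 to 10 with the highest observation receiving 10 and all layers share the same polygon order. For a linear 1 to 10 ranking, reversing the order for a negative parameter is the same as replacing rank r with 11 − r, which is what the code does. The layer names and values are hypothetical.

```python
import numpy as np


def gis_overlay_ueq(ranked_layers, negative=(), rank_min=1.0, rank_max=10.0):
    """Sum per-polygon ranks across layers to obtain a UEQ score.

    ranked_layers : dict of layer name -> array of 1-10 ranks (one per polygon),
                    all ranked so that the highest observation gets 10
    negative      : names of layers with a negative effect on UEQ (e.g. crime
                    rate, industrial areas, LST); these are flipped so that the
                    highest observation gets 1 and the lowest gets 10
    """
    total = 0.0
    for name, ranks in ranked_layers.items():
        r = np.asarray(ranks, dtype=float)
        if name in negative:
            r = rank_max + rank_min - r  # 10 -> 1, 1 -> 10, 5.5 -> 5.5
        total = total + r
    return total


layers = {"ndvi": [10, 1, 5.5], "crime_rate": [1, 10, 5.5], "lst": [2, 9, 4]}
print(gis_overlay_ueq(layers, negative={"crime_rate", "lst"}))
```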

Principal Component Analysis (PCA)
PCA is an analysis technique that compresses high-dimensional data into a lower-dimensional representation that retains most of the variance of the data [14]. PCA is commonly used in remote sensing applications. The covariance matrix used in standard PCA may not be the best option for data with different measurement units; the correlation matrix can be used instead, which is equivalent to standardizing each parameter to zero mean and unit variance. In this research, pixel-based and object-based PCA were used to assess UEQ in Toronto. In the pixel-based approach, all of the parameters were converted to raster format to extract pixel values for each parameter, and the pixel values were then used in the PCA model to compute the components that capture most of the variance of the data. In the object-based approach, the covariance or correlation matrix is derived from the observation values of the GIS polygons and is then used to compute the components in the PCA model to assess UEQ.
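Correlation-matrix PCA works the same way for both variants; only the rows differ (pixels in the pixel-based approach, polygons in the object-based one). The sketch below uses a plain NumPy eigendecomposition rather than the software the authors used. It also reports how many components have eigenvalues above one, the retention rule applied in the object-based analysis. Constant parameters must be dropped beforehand, because their standard deviation is zero. The function name is our own.

```python
import numpy as np


def correlation_pca(X):
    """PCA on the correlation matrix.

    X : 2-D array, rows = observations (pixels or polygons), columns = parameters
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per parameter
    R = (Z.T @ Z) / Z.shape[0]                 # correlation matrix (p x p)
    eigvals, eigvecs = np.linalg.eigh(R)       # R is symmetric
    order = np.argsort(eigvals)[::-1]          # sort components by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of total variance
    scores = Z @ eigvecs                       # component scores per observation
    n_kaiser = int(np.sum(eigvals > 1.0))      # components with eigenvalue > 1
    return eigvals, explained, eigvecs, scores, n_kaiser


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # 200 observations, 5 parameters
eigvals, explained, loadings, scores, n_keep = correlation_pca(X)
print(explained.round(3), n_keep)
```

The first column of `loadings` shows how strongly each parameter contributes to the first component, which is how a table of component loadings would be read.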

Accuracy Assessment
Several researchers have attempted to assess the accuracy of UEQ results using different methods, including e-mail questionnaires, field-based questionnaires and factor analysis. Although e-mail and field-based questionnaires can collect considerable amounts of data, both methods involve substantial data collection overheads. In addition, the factor analysis in previous work was performed using the same parameters that were used to compute the UEQ, which makes it unreliable and biased. Several researchers have shown that education level (university certificate or diploma), family income and land values represent UEQ in its economic and social aspects [54,60,61,62]. Since ground truth is lacking, we propose to use these socio-economic parameters as reference data to validate and assess the UEQ results. All of the observation data of the three socio-economic parameters were normalized to the same 1 to 10 scale, and their sum represents the reference, as shown in Table 2. In addition, a binary-classifier evaluation approach was used to assess UEQ based on two performance measures: precision and accuracy.
Precision (P) evaluates the probability that a positive outcome is correct, using Equation (15):
$$P = \frac{TP}{TP + FP} \tag{15}$$
Accuracy (Acc) evaluates the effectiveness of the classifier by its percentage of correct predictions, using Equation (16):
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$
where TP ("True Positive") denotes polygons from the proposed method that are physically located in the reference layer; TN ("True Negative") denotes polygons detected by neither the proposed method nor the reference layer; FP ("False Positive") denotes polygons from the proposed method that do not exist in the reference layer; and FN ("False Negative") denotes reference polygons that do not exist in the proposed method's output. With these two measures, we assessed the UEQ layer produced by each proposed method (GIS overlay and PCA) to identify the best method for our datasets.

Figure 7 shows the UEQ derived for Toronto using GIS overlay. The highest UEQ zones were found in Zones A, B, C and D (shown in green), while the lowest UEQ zones are shown in red. The highest UEQ zones result from the summation of the positive parameters (e.g., high vegetation areas, historical areas and areas served by public transportation) located within Zones A to D. By contrast, the negative parameters, including crime, industrial areas and high LST, are consistently located in the red zones. In addition, the highest UEQ values were found in high and moderate density areas, while the lowest values were found in industrial and low density areas.
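Precision and accuracy reduce to counting agreements between two per-polygon labelings: "high UEQ" according to an integration method and "high UEQ" according to the socio-economic reference. The text does not state how polygons were labelled as high, so the `top_fraction` rule below (flag the top share of scores) is a hypothetical choice for illustration, and the scores in the usage lines are made up.

```python
import numpy as np


def top_fraction(scores, frac=0.2):
    """Hypothetical labelling rule: flag the highest-scoring `frac` of polygons.

    Ties at the cut-off are all flagged, so slightly more than `frac` may be selected.
    """
    s = np.asarray(scores, dtype=float)
    k = max(1, int(round(frac * s.size)))
    cutoff = np.sort(s)[-k]
    return s >= cutoff


def precision_accuracy(predicted, reference):
    """Eqs. (15) and (16) from per-polygon boolean labels."""
    p = np.asarray(predicted, dtype=bool)
    r = np.asarray(reference, dtype=bool)
    tp = int(np.sum(p & r))    # high UEQ in both the method and the reference
    tn = int(np.sum(~p & ~r))  # high UEQ in neither
    fp = int(np.sum(p & ~r))   # flagged by the method only
    fn = int(np.sum(~p & r))   # flagged by the reference only
    precision = tp / (tp + fp) if tp + fp else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, accuracy


ueq = [29, 4, 18, 25, 7]   # e.g. GIS overlay sums per polygon (made up)
ref = [24, 6, 20, 12, 9]   # e.g. summed socio-economic ranks (made up)
print(precision_accuracy(top_fraction(ueq, 0.4), top_fraction(ref, 0.4)))
```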

Pixel-Based PCA
In this section, the relationships among all of the parameters were first investigated. In pixel-based PCA, all of the parameters were converted from vector to raster in order to compute the spatial correlation among them. Some parameters, including built-up areas, LST, industrial areas and crime rate, were reversed to avoid negative values in the correlation matrix. Pearson's correlation coefficient was computed to investigate the dependence among all of the parameters, which informs the subsequent PCA. Table 3 lists the resulting correlation coefficients. The reverse crime rate, for example, has a moderate correlation with reverse industrial areas (0.77), reverse built-up areas (0.77), green vegetation (0.75) and public transportation (0.70). These observations suggest that high vegetation areas are usually located in areas with low crime rates and little industry. Low crime rate is also associated with public transportation coverage, given the correlation observed between these two parameters; areas covered by public transportation are usually crowded with people, which may influence the crime rate. These observations also indicate that reverse built-up areas are highly correlated with industrial areas, which could help derive industrial areas from remote sensing data. High correlation between parameters may cause redundancy and slow down processing, so data reduction can improve processing time and cost. Four components were extracted from all of the parameters using the pixel-based PCA approach. Figure 8 shows the UEQ derived using the pixel-based PCA method. PC1 accounts for 95% of the total variance of the data, while Components 2, 3 and 4 together contain only 5%.
Because Component 1 carries most of the variance, it represents most of the parameters, including crime rate, NDVI, NDWI, reverse LST, areas close to water bodies, reverse industrial areas, reverse built-up areas, green vegetation and the public transportation parameter, as shown in Table 4. The low variance of Components 2, 3 and 4 shows that the pixel-based PCA result relied only on the first component, as shown in Figure 9.
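The share of variance carried by each component can be obtained from an eigendecomposition of the correlation matrix. This is a generic sketch, not the study's software: it assumes each raster cell is one observation and each column one (already reversed where needed) parameter, and it uses synthetic data in which four parameters follow one common signal, so that, as with PC1 here, almost all of the variance falls on the first component.

```python
import numpy as np


def pca_variance_shares(data):
    """PCA of standardized parameters via their correlation matrix.

    data: array of shape (n_cells, n_parameters), one column per parameter.
    Returns eigenvalues in descending order, each component's share of
    the total variance, and the eigenvectors (one column per component).
    """
    corr = np.corrcoef(data, rowvar=False)   # parameter-by-parameter Pearson matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so PC1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    shares = eigvals / eigvals.sum()         # the sum equals the number of parameters
    return eigvals, shares, eigvecs


# Synthetic example: four parameters driven by one common signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=500)
data = np.column_stack([signal + 0.1 * rng.normal(size=500) for _ in range(4)])
eigvals, shares, eigvecs = pca_variance_shares(data)
print(np.round(shares, 3))
```

A split such as the 95% versus 5% reported for the pixel-based PCA would appear here as `shares[0]` close to 0.95 with the remaining entries summing to about 0.05.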

Object-Based PCA
In the object-based approach, the polygons of each parameter were used in the PCA model to assess the UEQ. Table 5 presents the correlation coefficient matrix among all of the parameters. Population density has a moderate positive correlation with the historical areas parameter (0.66), whereas building density has a moderate negative correlation with green vegetation (−0.61), NDVI (−0.68) and NDWI (−0.67) and a positive correlation with built-up areas (0.67) and LST (0.78). NDVI has a strong positive correlation with NDWI (0.88) and a moderate positive correlation with green vegetation (0.66). However, NDVI has a high negative correlation with the built-up areas parameter (−0.90) and LST (−0.80), and a moderate negative correlation with building density (−0.68). The built-up areas parameter is positively correlated with building density (0.67) and strongly with LST (0.79), and negatively correlated with NDVI (−0.90) and NDWI (−0.89). Taken together, these correlations indicate that high NDVI values correspond to areas with low LST, low building density and more green space. As mentioned in the previous section, data reduction can reduce processing time and cost; therefore, the object-based approach was used to reduce the size of the data. In this study, five components with eigenvalues greater than one were extracted in the object-based PCA approach, as shown in Figure 10. Together, these five components account for 75% of the overall variance of the data, and Component 1 alone accounts for 36%.
Component 1 shows strong positive loadings for NDVI (0.88), NDWI (0.86), building density (0.80) and historical areas (0.86), and strong negative loadings for LST (−0.86) and built-up areas (−0.86). Component 1 therefore best represents the green areas within the city. Component 2 accounts for about 16% of the variance and mainly represents industrial areas (loading of 0.63) and the CBD (loading of 0.76), so it mainly reflects urban areas. Component 3 accounts for 9% of the variance and mainly represents sports areas (loading of 0.81). Component 4 accounts for 7% of the variance and mainly represents public transportation (loading of 0.70). Table 6 shows the overall map produced from Components 1 to 5, which together represent 75% of the overall variance in the data.
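Retaining components with eigenvalues greater than one (the Kaiser criterion) and computing their loadings can be sketched as follows. This is a generic illustration under stated assumptions: each row is one polygon, each column one parameter, and the data are synthetic, with four hypothetical "green" parameters sharing one signal and two hypothetical "urban" parameters sharing another. Note that the sign of each eigenvector, and hence of its loadings, is arbitrary.

```python
import numpy as np


def kaiser_loadings(data):
    """Keep components with eigenvalue > 1 and return their loadings.

    data: array of shape (n_polygons, n_parameters).
    Returns a loadings matrix (n_parameters x n_retained) and the share
    of total variance carried by the retained components.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # Kaiser criterion
    # Loading = eigenvector entry * sqrt(eigenvalue); for a correlation-matrix
    # PCA this equals the correlation between a parameter and a component.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    retained_share = eigvals[keep].sum() / eigvals.sum()
    return loadings, retained_share


rng = np.random.default_rng(1)
a, b = rng.normal(size=500), rng.normal(size=500)
data = np.column_stack(
    [a + 0.2 * rng.normal(size=500) for _ in range(4)]    # hypothetical "green" group
    + [b + 0.2 * rng.normal(size=500) for _ in range(2)]  # hypothetical "urban" group
)
loadings, retained = kaiser_loadings(data)
print(loadings.shape, round(retained, 2))
print(np.round(loadings, 2))
```

With two independent signals, two components pass the threshold and each parameter loads strongly on exactly one of them, which mirrors how Components 1 and 2 above separate green areas from urban areas.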

UEQ Validation Results
As mentioned in the previous section, four socioeconomic parameters were derived from census data. The combination of education level, family income and land values was used to validate the UEQ results, using the binary classifier evaluation approach described in Section 3.5.3. The results of GIS overlay and PCA (pixel-based and object-based) were validated using these socioeconomic parameters as the reference for this study. Since the aim is to highlight higher UEQ areas, the mean value was used as a threshold to derive them. Figure 11 shows the reference layer and its high-value areas. The highest reference values are found in the city centre and the western portions of the city, while most of the low values are found in the east and downtown. Figure 12 shows the GIS overlay result and its high-value areas; a few areas with high UEQ values are located in the north and east of the city. The precision and accuracy of the GIS overlay method were 71% and 65%, respectively. These values are limited mainly because the GIS overlay method uses all of the parameters, some of which may be negatively correlated with the reference layer and thus influence the overall result. Figure 13 shows the higher UEQ rankings derived using the pixel-based PCA method, which are mainly located in the centre, north, northwest and northeast portions of the city. Since the pixel-based PCA result relied on the single component carrying 95% of the variance, it shows lower precision and accuracy than GIS overlay: 68% and 63%, respectively.
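The mean-threshold step can be sketched in a few lines. The text does not state how education level, family income and land values were combined, so the equal-weight average below is an assumption, as are the function names and the sample values (assumed to be already normalized to a common scale).

```python
def combine_reference(education, income, land_value):
    """Hypothetical equal-weight average of three normalized census layers."""
    return [(e + i + v) / 3 for e, i, v in zip(education, income, land_value)]


def high_areas(values):
    """Flag units whose value exceeds the layer mean as "high"."""
    mean = sum(values) / len(values)
    return [v > mean for v in values]


# Hypothetical normalized values for four census units.
education = [8, 2, 6, 3]
income = [9, 1, 5, 4]
land_value = [7, 3, 7, 2]
reference = combine_reference(education, income, land_value)
print(reference)              # [8.0, 2.0, 6.0, 3.0]
print(high_areas(reference))  # [True, False, True, False]
```

Applying the same `high_areas` threshold to each method's UEQ layer yields the two boolean layers that the TP, TN, FP and FN counts compare; whether a value exactly equal to the mean counts as "high" is a choice the sketch makes arbitrarily.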
The pixel-based PCA yields lower completeness, precision and accuracy than GIS overlay, mainly because the component carrying 95% of the variance is built from only nine parameters, some of which have a low correlation with the reference layer. Figure 14 shows the object-based PCA result and its high-value areas, which are located in the centre, north, northwest and northeast portions of the city. The object-based PCA yields slightly higher precision (70% versus 68%) and accuracy (64% versus 63%) than the pixel-based PCA method. The main reason is that the object-based PCA considered five components, which capture more of the variation among the parameters, whereas only one component was considered in the pixel-based PCA. Another possible reason is that, in the pixel-based PCA, all of the vector data were converted to raster data, a step that may cause a certain loss of spatial information and thus affect the overall results. The object-based PCA method yielded precision and accuracy 1% lower than the GIS overlay method, as shown in Figure 15, and

Conclusions
In summary, this study used remote sensing and GIS techniques to assess UEQ in the city of Toronto, Ontario, Canada, by evaluating two integration methods: GIS overlay and PCA. One challenge for UEQ integration is that remote sensing, GIS and census data are collected at different scales and in different formats, which may require data normalization before further analysis. In this study, Z-score standardization was first applied to normalize all of the parameters, and linear interpolation was then used to rank all of the Z-score values from 1 to 10.
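The two normalization steps can be sketched as follows. This assumes the population standard deviation for the Z-score and a min–max linear mapping of the Z-scores onto the interval 1 to 10; neither choice is stated explicitly in the text, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a parameter: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # population SD is an assumption
    if sigma == 0:
        raise ValueError("a constant parameter cannot be standardized")
    return [(v - mu) / sigma for v in values]


def rank_1_to_10(values, new_min=1.0, new_max=10.0):
    """Linearly map the smallest value to 1 and the largest to 10."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant layer")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


raw = [10, 20, 30, 40, 50]  # hypothetical parameter values for five units
ranks = rank_1_to_10(z_scores(raw))
print([round(r, 2) for r in ranks])  # [1.0, 3.25, 5.5, 7.75, 10.0]
```

In this sketch both steps are linear, so the resulting ranks equal a direct min–max rescaling of the raw values; the Z-score step places parameters measured in different units on a common footing before any further processing.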
Integration techniques, including GIS overlay and PCA (both pixel-based and object-based), were used to integrate the environmental, urban and socio-economic parameters. GIS overlay is an effective tool for integrating datasets from different sources and offers a platform for building a comprehensive database to evaluate UEQ. Correlation analysis was used to investigate the dependence among urban, environmental and socioeconomic parameters. In our case study, green areas have a strong positive correlation with NDVI and NDWI and a negative relationship with the built-up areas parameter, LST, industrial areas, crime rate and building density. PCA, in turn, provides an efficient way to reduce data dimensionality and redundancy. Four components with eigenvalues greater than one were derived from the 19 parameters representing the urban and environmental aspects in the pixel-based PCA method, and five such components were derived in the object-based PCA method. Both methods (pixel-based and object-based) were tested because of the data availability. Other studies could consider only one PCA method, since the two methods did not yield significantly different UEQ results.
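The GIS overlay integration described for Figure 7, where positive parameters add to UEQ and negative ones lower it, can be sketched as an equal-weight sum of ranked layers. The equal weights, the reversal of negative parameters (rather than subtraction) and the layer names are assumptions for illustration only.

```python
def gis_overlay(layers, negative, low=1, high=10):
    """Equal-weight overlay of ranked layers (hypothetical scheme).

    layers: dict mapping parameter name to a list of 1-10 ranks, one per unit.
    negative: set of parameter names where a high rank means poorer quality
    (e.g. crime, industrial areas, LST); these are reversed before summing.
    """
    lengths = {len(ranks) for ranks in layers.values()}
    if len(lengths) != 1:
        raise ValueError("all layers must cover the same units")
    total = [0] * lengths.pop()
    for name, ranks in layers.items():
        for i, r in enumerate(ranks):
            # Reverse negative parameters so that a higher sum always means higher UEQ.
            total[i] += (low + high - r) if name in negative else r
    return total


layers = {
    "ndvi": [9, 2, 5],
    "public_transport": [8, 3, 6],
    "crime": [2, 9, 5],
}
print(gis_overlay(layers, negative={"crime"}))  # [26, 7, 17]
```

Because every parameter enters the sum, a parameter that is weakly or negatively related to the reference layer still shifts the result, which is consistent with the explanation offered above for the GIS overlay precision and accuracy.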
One of the key concerns in UEQ research is validating the final results against socio-economic references. Whereas some existing UEQ studies used email or questionnaire surveys to collect public opinion for UEQ assessment, this study used three socio-economic parameters (university certificate or diploma, family income and land values) as a reference for result assessment. The GIS overlay method achieved a precision of 71% and an accuracy of 65%, the pixel-based PCA method achieved 68% and 63%, and the object-based PCA method achieved 70% and 64%, respectively. In this study, GIS overlay produced better results than PCA (pixel-based and object-based) with respect to the reference parameters, which suggests that GIS overlay may be a better method for integrating multiple parameters.
Although the presented approach can be used by federal authorities and municipalities in both developing and developed countries wherever there is a need to improve existing areas or design new ones within a city, there are a few recommendations for similar future studies: (1) more up-to-date remote sensing and GIS data are required to consolidate the findings; (2) census socioeconomic data usually relate to administrative units and can change over short periods of time, which makes them difficult to obtain worldwide; (3) integrating remote sensing, GIS and socioeconomic data requires conversion between formats, such as from raster to vector or from vector to raster, a step that may cause a certain loss of spatial information. To conclude, remote sensing and GIS techniques can provide valuable information for modelling UEQ. However, other urban and environmental parameters, as well as empirical models (such as geographically weighted approaches), should be considered in order to develop a more universal indicator to predict UEQ. Further research is therefore under way to study different approaches to narrowing down the variety of parameters and to develop a new technique to retrieve UEQ in other Canadian cities.