Modeling the Census Tract Level Housing Vacancy Rate with the Jilin1-03 Satellite and Other Geospatial Data

Vacant housing is a characteristic phenomenon of urban decay and population loss. Exploring the correlations between housing vacancy and socio-environmental factors helps in understanding the mechanisms of urban shrinkage and revitalization. In recent years, rapidly developing night-time remote sensing, which can detect artificial lights, has been widely used in applications associated with human activities. Current night-time remote sensing studies on housing vacancy rates (HVRs) are limited by the coarse spatial resolution of the data. The launch of the Jilin1-03 satellite, which carries a high spatial resolution (HSR) night-time imaging camera, provides a new supportive data source. In this paper, we examined this new HSR night-time light dataset for housing vacancy rate estimation. Specifically, a stepwise multivariable linear regression model was used to estimate the housing vacancy rate at a very fine scale, the census tract level. Three types of variables derived from geospatial data and the night-time image represent the physical environment, landuse (LU) structure, and human activities, respectively. The linear regression models were constructed and analyzed. The results show that (1) the HVR estimation model using the Jilin1-03 satellite and other ancillary geospatial data fits well with the Census statistical data (adjusted R² = 0.656, predicted R² = 0.603, RMSE = 0.046) and is thus a valid estimation model; (2) the Jilin1-03 satellite night-time data contributed a 28% increase in fitting accuracy (adjusted R² from 0.510 to 0.656) and a 68% increase in predictive accuracy (predicted R² from 0.359 to 0.603) in the HVR estimation model. Reflecting socio-economic conditions, the luminous intensity of commercial areas derived from the Jilin1-03 satellite is the most influential variable for housing vacancy. Landuse structure indirectly and partially demonstrated that social environment factors in the community have strong correlations with residential vacancy. Moreover, the physical environment factor, which depicts vegetation conditions in residential areas, is also a significant indicator of housing vacancy. In conclusion, the emergence of HSR night light data opens a new door to future micro-scale studies within cities.


Introduction
The phenomenon of urban shrinkage in the Great Lakes watershed region of the U.S. from the 1970s to the 2000s was associated with depopulation and deindustrialization [1][2][3][4]. The traditional industrial cities in this region suffered a rapid decline in industry and population, leaving behind thousands of abandoned residential houses [5][6][7][8]. As estimated by the American Community Survey (ACS), 16,842,710 vacant properties existed in the United States in 2016 [9]. In this study, we focus only on vacant residential buildings, also called vacant houses. The causes of vacant houses are complicated and involve policy, economic, and social factors [8]. As a typical central city in the U.S., Buffalo has lost more than 70% of its manufacturing jobs and more than 50% of its population since industry declined around the mid-1970s [10]. As a result, a large number of abandoned properties brought serious social security and economic problems [9]. These vacant houses threaten neighborhood safety by creating conditions for crime. High vacancy rates reduce a city's tax base and result in dysfunctional economies and housing markets [11,12]. Thus, vacant houses serve as a symbol of city shrinkage and an indicator of community security and mental health. Relevant studies are beneficial to understanding socio-economic drivers and managing community revitalization.
Vacant houses are an expression of landuse (LU) change during urban shrinkage. In previous studies, land cover (LC) and LU change have been widely used to monitor human-induced changes in the environment [13][14][15][16][17][18], especially urbanization, which appears as increasing impervious surface and decreasing agricultural and naturally vegetated land [19][20][21]. However, the LC change from vegetated land to urban infrastructure in the urban expansion period is much easier to detect from remote sensing than the minor LU change in the urban shrinking period, such as housing vacancy. Physical change in LC is reflectance-sensitive, while LU change, such as vacancy status, makes little difference in remote sensing images. Unlike in developing countries, migration during suburbanization in developed countries is from urban to suburban areas rather than from rural areas, although both show similar suburban expansion phenomena. Thus, studying the inner-city areas, the origin of this migration, is essential to understanding the urban shrinking phenomenon, although detecting this minor LU change, housing vacancy, is challenging in remote sensing. In this case, we focus on the housing vacancy occurring at the origin of migration in the United States, instead of LULC change in developing countries.
Putting a high priority on vacant houses, the U.S. Census Bureau started sampling field surveys to collect demographic characteristics together with the housing vacancy rate (HVR), and this has occurred every 10 years since 1956. The statistical HVR data are estimated and updated annually based on the master address file (MAF) and the sampling field survey [22]. Although they are based on a costly door-to-door field survey, census data are still the primary data source for a variety of demographic research. Beyond the Census statistical data, however, a series of studies have tried to derive HVR from other data sources. Based on the relationship between vacant housing and the housing market, Lu et al. [23] calculated the commercial housing vacancy rate in Wuhan, China, using a mathematical method based on housing price data. Rosen et al. [24] developed a cross-section model to explain natural vacancy rates, which are generated from a price-adjustment model using rental price variables in 17 cities in the U.S. Using geographic information analysis, researchers have simulated housing occupancy status by integrating surrounding environmental factors. Bentley et al. [25] performed spatial distribution analysis and factor analysis for housing vacancy estimation in Detroit, MI. Wang et al. [26] measured the relationship between the vacancy rate and the central business district (CBD), then used a logit regression model to predict the vacancy rate, and reached a maximum R² of 81.5%. Huuhka et al. [9] used both geographical and statistical methods, which showed that several characteristics help in understanding vacancy. Silverman et al. [8] estimated vacancy and abandonment in Buffalo, NY, using a linear regression model with various socioeconomic characteristics, and reached a maximum R² of 56.8%. Although many studies have addressed HVR estimation, the data in these mathematical simulation methods come from out-of-date statistical sources. Thus, they cannot meet the demand for real-time monitoring of changing cities.
As the only data source that is objective, real-time, low-cost, and easily integrated with spatial data, remote sensing provides a new view from the sky and a series of real-time observation datasets [27]. Emmanuel et al. [28] and Ryznar et al. [29] found that the mean value of the normalized difference vegetation index (NDVI) of residential areas was significantly correlated with HVR. Deng et al. [30] explained and modeled residential vacancy by incorporating high spatial resolution (HSR) multispectral aerial photographs and geospatial landuse datasets. The accuracy at the census block and parcel levels is 47.4% and 80.44%, respectively. Notably, with the development of new sensors that can detect anthropogenic luminosity at night, the capability of remote sensing for human activities and associated applications has been greatly extended. Since the vacancy status of houses is caused by a lack of human activities, night-time light (NTL) data have been used as a proxy to derive vacancy rates. Yao et al. [31] assessed housing vacancy in urban areas in China using nocturnal light data from the Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) in 2011, and Chen et al. [32] used the same dataset to calculate HVR in 15 metropolitan areas in the U.S. with an R² of 0.734 in 2015. However, due to the coarse spatial resolution of these datasets, existing studies have only been carried out at the national or metropolitan scale, not at finer scales.
Previous studies on HVR estimation using NTL remote sensing are all at very coarse scales, such as the national or metropolitan scale. In practice, however, from the city manager's perspective, studies at finer scales, such as monitoring dynamics at the urban micro level and estimating socio-economic differences within the city, are more important in urban planning for understanding the mechanisms of urban environmental change and making detailed city management decisions. NTL is known to reflect the urban socio-economy objectively. Thus, finer resolution NTL sensors have become a pressing requirement in urban remote sensing studies. The Jilin1-03 commercial satellite, which was launched by China Changguang Satellite Technology Co., Ltd. in 2017, carries an HSR night-time imaging camera and provides a new supportive data source. Operating in a sun-synchronous orbit at an altitude of 535 km, the Jilin1-03 satellite has the capacity for video imaging, push-broom imaging, and night-time imaging. Its revisit cycle is 4.5 days with a side-swing angle of up to 30 degrees. The night-time imaging mode of the Jilin1-03 produces images with 12,000 × 5000 pixels, a 0.92 m spatial resolution, and three visible bands: blue, green, and red. The sub-meter spatial resolution of this satellite imagery provides an unprecedented opportunity for research at sub-city scales.
Table 1 compares the basic information of the DMSP/OLS, NPP/VIIRS, and Jilin1-03 NTL imagery. With Jilin1-03 satellite night-time imagery, studies on housing vacancy status can be performed at a finer scale. However, because this is a cutting-edge image data source, no HVR studies using Jilin1-03 NTL images have been carried out before. The feasibility of exploring the urban socio-economic environment using HSR NTL images is unknown. Corresponding to these research questions, the objectives of this study, the first of its kind, are to (1) assess the feasibility of HVR estimation at the census tract level using Jilin1-03 satellite images and other ancillary geospatial data and (2) evaluate the improvement in HVR estimation from adding fine spatial resolution (0.92 m) night-time images.

Study Area
Our study area is the City of Buffalo, a representative shrinking city located on the shores of Lake Erie in Western New York State, U.S. Buffalo began its decline more than a half-century ago. Based on U.S. Census data from 1950 to 2010, the population of the city of Buffalo shrank from 580,132 to 261,310. This population loss of 318,822 amounted to about 55% of the population in 1950. Population loss and housing unaffordability led to thousands of instances of housing abandonment and housing vacancy. The city of Buffalo had the third highest percentage of vacant housing in the nation, following Detroit and New Orleans. The percentage of vacant housing units reached a peak of 22.8% in 2006, when the number of vacant houses was 32,637 units [80]. After that, the HVR in Buffalo decreased slightly due to the implementation of a revitalization policy [10]. In 2016, Buffalo ranked sixth, with an HVR of 15% [80]. This study covers 105.2 km² of land area, including all 77 census tracts in the city of Buffalo in 2016 [8] (Figure 1).

The Jilin1-03 Satellite Night-Time Image
In this study, we used a Jilin1-03 satellite night-time image with a 0.92 m spatial resolution and an 8-bit quantization level. This image was acquired at a local time of 23:06:45 on 25 September 2017. The size of the field is 11 × 4.6 km, and our whole study area can be covered in this single acquisition. The image product has been orthorectified, geometrically corrected, and mosaicked. The image was projected into the North American Datum 1983 (NAD83) Contiguous USA Albers coordinate system, which has less distortion in this area.

Other Geospatial Data
The geospatial data used include the daytime aerial digital orthoimages, parcel data, and statistical housing vacancy data from the U.S. Census Bureau five-year estimates.
The digital orthoimage product is released by the New York State High Resolution Statewide Digital Orthoimagery Program, New York State GIS Clearinghouse. The products for a given county are updated once every three years. For example, for Erie County, the product was updated in 2002, 2005, 2008, 2011, 2014, and 2017. Orthoimages from 2017 with a 1 ft (30.48 cm) spatial resolution and 4 bands (natural color and near infrared) were employed in this study. We projected the orthoimages to the same coordinate system as that of the Jilin1-03 satellite night-time imagery.
The parcel data are used to make landuse maps. The Buffalo parcel data contain detailed landuse information and geometric parcel boundaries in a vector format and are compiled and provided by the Office of Geographic Information Services. The newest version of the parcel data is from 2016 and includes 10 landuse types in the city of Buffalo: commercial, recreation, community services, residential, road, industrial, public services, vacant land, park, and water. A landuse map was derived (shown in Figure 2). Some parcels with missing landuse data were labeled manually by referencing Google Maps Street View.

The U.S. Census Bureau estimates the rental and homeowner vacancy annually, as well as the characteristics of units available for occupancy. The statistical HVR data in 2016 for census tract areas in Buffalo, Erie County, were downloaded from the U.S. Census Bureau website [80].
Figure 2. Landuse map of Buffalo derived from parcel data.

Methodology
To estimate HVR at the census tract level, we adopted a stepwise multivariate linear regression model. Information extracted from the Jilin1-03 satellite night-time imagery, digital orthoimages, and landuse datasets is used in the modeling as explanatory variables, which cover three types of driving factors: the physical environment, landuse structure, and human activity. The reference value of the dependent variable is the statistical housing vacancy data from the U.S. Census Bureau, which is used to construct the model.
The method consists of the following steps. First, the Jilin1-03 satellite night-time imagery is pre-processed. Then, explanatory variables are extracted from the Jilin1-03 satellite night-time image and the other auxiliary geospatial data. Finally, the multivariate linear regression model is built with these variables, and its accuracy is assessed. A workflow chart is shown in Figure 3.

Pre-Processing of the Jilin1-03 Satellite Night-Time Image
The pre-processing of the image includes three steps: the removal of background noise, the conversion from DN value to radiance, and the conversion from the RGB image to a grayscale layer. Pixels whose DN values were not greater than 1 in all bands were regarded as background pixels and were set to 0, while the others were regarded as lit areas. The document provided by Changguang Satellite Technology Co., Ltd. includes the spectral response curve (Figure 4) and the conversion parameters (Table 2) from the DN value to the radiance. The conversion from the DN value to the radiance L (in W m⁻² sr⁻¹) of the three bands, under the condition of a 5.6× gain and a 100 ms exposure time, uses the corresponding band parameters a and b. The RGB image (Figure 5a) was converted into the radiance values of the three bands. Then, we derived a single grayscale layer (Figure 5b) representing the brightness, which can reflect the presence of human activity.
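The three pre-processing steps can be sketched as follows. This is only an illustration under stated assumptions: the text does not preserve the conversion equation, so a linear form L = a · DN + b is assumed here, the grayscale layer is assumed to be the sum of the three band radiances, and the parameter values are placeholders rather than the vendor's Table 2 values. The function name `preprocess_ntl` is hypothetical.

```python
import numpy as np

def preprocess_ntl(dn_rgb, a, b):
    """Return (brightness, lit_mask) for an (H, W, 3) array of DN values.

    a, b: per-band conversion parameters (placeholder values in this sketch;
    the real ones would come from the vendor's Table 2).
    """
    dn = dn_rgb.astype(np.float64)
    # Background removal: a pixel is background when all three DN values are <= 1.
    background = np.all(dn_rgb <= 1, axis=-1)
    # ASSUMPTION: linear DN-to-radiance form L = a * DN + b, applied per band.
    radiance = dn * np.asarray(a, dtype=np.float64) + np.asarray(b, dtype=np.float64)
    radiance[background] = 0.0
    # ASSUMPTION: grayscale brightness is the sum of the three band radiances.
    brightness = radiance.sum(axis=-1)
    return brightness, ~background

# Toy 2 x 2 image with placeholder parameters.
dn = np.array([[[0, 1, 1], [10, 20, 30]],
               [[1, 0, 0], [2, 0, 0]]], dtype=np.uint8)
a = [0.1, 0.2, 0.3]
b = [0.0, 0.0, 0.0]
brightness, lit = preprocess_ntl(dn, a, b)
```

Returning the lit mask alongside the brightness layer keeps the lit-area count available for the later luminous intensity calculation.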



Extracting Variables from the Multi-Source Data
Three types of factors (physical environmental factors, landuse structure factors, and human activity factors) were extracted from daytime remote sensing data, geospatial landuse data, and night-time remote sensing data, respectively.
Physical environmental factors represent the physical environmental quality of residential areas during the daytime. According to Deng et al. [30], the mean lot NDVI is a consistently effective measurement at both the block group and parcel levels and has a significant positive relation to residential vacancy. Similar findings were reported at the census tract level in former studies carried out in Detroit, MI [28,29]. To extend the usage of daytime remote sensing data in this case, we calculated the mean value and the standard deviation of the NDVI, as well as of the reflectance of the red and near infrared bands, for all residential areas in each census tract. The NDVI can be expressed as follows:
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \quad (2)$$
where R and NIR are the reflectance of the red and near infrared bands, respectively.
Landuse structure factors were derived from the landuse map by calculating the following variables in each census tract: (1) the area of each landuse class, (2) the total area of each census tract, and (3) the percentage area of residential land and the number of residential housing units of each census tract. These variables are included because we consider the potential impact of the surrounding landuse types on the occupancy status of residential areas.
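The NDVI statistics described above can be sketched as a zonal calculation over residential pixels. The array names (`nir`, `red`, `tract_id`, `residential`) and the function names are hypothetical, and the sketch assumes the orthoimage bands, a census tract raster, and a residential mask have already been aligned to the same grid. The standard deviation here is the population form (NumPy's default), which may differ from the one used in the study.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R); pixels with NIR + R == 0 become NaN."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def tract_ndvi_stats(nir, red, tract_id, residential):
    """Mean and standard deviation of NDVI over residential pixels per tract."""
    values = ndvi(nir, red)
    stats = {}
    for t in np.unique(tract_id[residential]):
        v = values[residential & (tract_id == t)]
        v = v[~np.isnan(v)]
        if v.size:
            stats[int(t)] = (float(v.mean()), float(v.std()))
    return stats

# Toy 2 x 2 scene with two tracts, all pixels residential.
nir = np.array([[0.6, 0.5], [0.4, 0.2]])
red = np.array([[0.2, 0.5], [0.1, 0.2]])
tract_id = np.array([[1, 1], [2, 2]])
residential = np.ones((2, 2), dtype=bool)
ndvi_stats = tract_ndvi_stats(nir, red, tract_id, residential)
```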
Because the urban nocturnal landscape reflects human settlements and activities, human activity factors were proposed. For each census tract, we calculated the sum of the brightness and the lit area for each landuse type directly from the night-time image. We then obtained the average brightness and the average luminous intensity for each landuse type in a census tract as the sum of brightness divided by the total area and the sum of brightness divided by the lit area, respectively.
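A minimal sketch of these two ratios for one landuse type in one census tract is given below. It assumes that "total area" means the area of all pixels of that landuse type in the tract, that a lit pixel is one with brightness greater than 0, and that each pixel covers 0.92 m × 0.92 m; the function and variable names are hypothetical.

```python
import numpy as np

def brightness_metrics(brightness, mask, pixel_area=0.92 ** 2):
    """Average brightness and luminous intensity for the pixels in `mask`.

    mask: boolean array selecting one landuse type within one census tract.
    pixel_area: ground area of one pixel in square metres (assumed).
    """
    vals = brightness[mask]
    total_area = vals.size * pixel_area
    lit_area = np.count_nonzero(vals > 0) * pixel_area
    total = float(vals.sum())
    return {
        "sum_brightness": total,
        "lit_area": lit_area,
        # Sum of brightness divided by the total area.
        "average_brightness": total / total_area if total_area else 0.0,
        # Sum of brightness divided by the lit area only.
        "luminous_intensity": total / lit_area if lit_area else 0.0,
    }

# Toy example with unit pixel area: two lit pixels out of four.
toy_brightness = np.array([[0.0, 4.0], [6.0, 0.0]])
toy_mask = np.ones((2, 2), dtype=bool)
metrics = brightness_metrics(toy_brightness, toy_mask, pixel_area=1.0)
```

The two ratios differ only in the denominator, so luminous intensity is never smaller than average brightness for the same zone.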

Establishing an HVR Regression Model with Multivariate Explanatory Variables
To evaluate the performance of the Jilin1-03 satellite night-time data in the HVR model, we first established Model 1 without explanatory variables from the night-time imagery. Physical environmental factors and landuse structure factors were used to construct the multivariate linear ordinary least squares (OLS) regression Model 1. The processing is at the census tract level. A stepwise regression method was used. The model formula is as follows:
$$\mathrm{HVR} = \sum_{j} \beta_j E_j + \sum_{k} \gamma_k S_k + b \quad (3)$$
where HVR is the housing vacancy rate at the census tract level, the dependent variable of the regression model; E_j and S_k are the physical environment and the landuse structure variables, respectively; β_j and γ_k are the model coefficients of the physical environment and landuse structure variables, respectively; and b is the constant.
For Model 2, human activity factors extracted from the Jilin1-03 satellite night-time imagery were added to Model 1. Therefore, the Model 2 formula with HSR NTL remote sensing data is
$$\mathrm{HVR} = \sum_{i} \alpha_i H_i + \sum_{j} \beta_j E_j + \sum_{k} \gamma_k S_k + b \quad (4)$$
where HVR is the housing vacancy rate at the census tract level, the dependent variable of the regression model; H_i is the newly added human activity variable; α_i is the model coefficient of the human activity variable; and b is the constant. Details of all explanatory variables and their data sources are described in Table 3.
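The idea of stepwise variable selection for an OLS model can be illustrated with the sketch below. It is not the procedure used in the study, whose entry and removal criteria are not stated; this version is a simple forward selection that adds the variable giving the largest gain in adjusted R² and stops when the gain falls below a threshold. The names `forward_stepwise` and `min_gain` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on X with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection on adjusted R^2 (illustrative criterion)."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate variable when added to the current set.
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic data: 77 "tracts", 5 candidate variables, 2 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(77, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=77)
selected, score = forward_stepwise(X, y)
```

Fitting Model 1 and Model 2 then amounts to running the same selection on the candidate set without and with the night-time light variables.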

Results of the HVR Estimation Regression Model
In this paper, two multivariate linear regression models at the census tract level were built by combining the Jilin1-03 satellite data and other geospatial data. Model 1 has an adjusted R² of 0.51 and an RMSE of 0.055, and Model 2, which includes the factors extracted from the Jilin1-03 satellite night-time data, achieves an adjusted R² of 0.656 and an RMSE of 0.046. Specifically, the predicted R² of Model 1 is 0.359, whereas that of Model 2 is 0.603. Predicted R² is calculated by systematically removing each observation from the data set, estimating the regression equation, and determining how well the model predicts the removed observation. Larger values of predicted R² indicate models of greater predictive ability for new observations [81].
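The leave-one-out definition of predicted R² above can be computed without refitting the model for every observation, using the PRESS statistic and the leverages of the OLS fit. The sketch below is a generic illustration on synthetic data, not the study's computation; the function name `fit_metrics` is hypothetical, and the RMSE is assumed to be the root of the mean squared residual.

```python
import numpy as np

def fit_metrics(X, y):
    """Adjusted R^2, predicted R^2 (via PRESS), and RMSE for OLS with intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # Leverages: diagonal of the hat matrix A @ pinv(A).
    hat_diag = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    # PRESS: squared leave-one-out prediction errors, e_i / (1 - h_ii).
    press = float(((resid / (1.0 - hat_diag)) ** 2).sum())
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return {
        "adjusted_r2": 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1)),
        "predicted_r2": 1.0 - press / ss_tot,
        "rmse": float(np.sqrt(ss_res / n)),  # ASSUMPTION: divisor n, not n - p - 1
    }

# Synthetic example.
rng = np.random.default_rng(1)
Xt = rng.normal(size=(30, 3))
yt = Xt @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=30)
m = fit_metrics(Xt, yt)
```

Because each leave-one-out error is the ordinary residual inflated by 1 / (1 − h_ii), PRESS is never smaller than the residual sum of squares, so predicted R² cannot exceed the ordinary R².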
Table 4 shows the analysis result of Model 1, in which seven significant explanatory variables were selected. The three selected physical environmental variables in Model 1 were the mean value and the standard deviation of the NIR band and the standard deviation of the NDVI. The four selected landuse structure variables were the percentage area of residential land in a census tract and the sums of parkland area, vacant land area, and industry area. The normalized coefficients suggest that the sum of vacant land area in a census tract has the most significant influence on residential vacancy, followed by the sum of parkland area and the mean value of the NIR band in the residential area. The p-values indicate that the mean value of the NIR band, the standard deviation of the NDVI, the percentage area of residential land in a census tract, and the sums of parkland area and vacant land area have a significant correlation with HVR at a confidence level of 99.9%. The collinearity statistics show that there is no significant multicollinearity among the variables in this regression model (i.e., all have a tolerance > 0.1 and a VIF < 10). Moreover, a sensitivity analysis, which evaluates the effect of different ranges of the factors on the model simulation results, was also carried out (see Figure 6). The sum of vacant land area and the sum of parkland area ranked as the top two. Their total-order sensitivity index values are 0.328 and 0.282, which indicates that they have the greatest impact on the model simulation results.
The results of Model 2, which involved the Jilin1-03 satellite night-time data, are shown in Table 5. The ten significant explanatory variables selected in this model include two physical environmental variables (the mean value of the NIR band and the standard deviation of the NDVI), four landuse structure factors (the percentage area of residential land in a census tract and the sums of parkland area, vacant land area, and industry area), and four human activity variables (the average brightness of the residential area, the sum brightness of the residential area and of vacant land, and the luminous intensity of the commercial area). The normalized coefficients suggest that the luminous intensity of the commercial area derived from the night-time imagery ranked as the most influential variable, followed by the sums of vacant land area and parkland area derived from the landuse map. The p-values indicate that all variables are significant above a 99% confidence level. Additionally, the collinearity statistics show that there is no significant multicollinearity among the variables in this regression model (i.e., all have a tolerance > 0.1 and a VIF < 10). The sensitivity analysis for this model (see Figure 7) shows that the luminous intensity of the commercial area has the highest total-order sensitivity index value, 0.465, followed by the sum of vacant land area (0.171), which means that the variation of these two variables has a significant impact on HVR.
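The collinearity check quoted above (tolerance > 0.1, VIF < 10) follows from regressing each explanatory variable on all the others: tolerance is 1 − R² of that auxiliary regression, and VIF is its reciprocal. The sketch below illustrates this on synthetic data in which one column is deliberately made nearly collinear with another; the function name `vif` and the data are assumptions, not the study's variables.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (tolerance = 1 / VIF)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary OLS regression of column j on the remaining columns.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: column 2 is almost a copy of column 0.
rng = np.random.default_rng(2)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.05 * rng.normal(size=200)
vifs = vif(np.column_stack([x0, x1, x2]))
tolerance = 1.0 / vifs
```

In this toy case the two near-duplicate columns fail the VIF < 10 rule while the independent column passes.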

Distribution of Estimated HVR
The HVR for each census tract was estimated using stepwise regression Model 2 with the Jilin1-03 satellite night-time data (Equation (4)). We mapped the spatial pattern of the HVR of Buffalo (see Figure 8a). Figure 8a illustrates that the majority of the vacant houses are concentrated on the east side and in the southwest area of the city.
The spatial pattern of error was also mapped to visually examine the distribution of the error between the estimated value and the statistical data from the U.S. Census Bureau (see Figure 8b). Most overestimations are located in the east, northwest, and southwest of the city, whereas underestimations occur in the central and southeastern parts. Comparing this with the distribution of the estimated HVR, we find that overestimations mainly occur in areas with higher estimates, while underestimations are associated with lower estimates. This underfitting is probably caused by factors that are not taken into account in our model. In addition, only linear relationships are represented in our models.

Sensitivity analysis (Figures 6 and 7): first-order and total-order sensitivity indices.


Significant Variable Analysis of the Regression Models
Among the physical environmental factors derived from orthoimages, two variables were significant in both Model 1 and Model 2. The mean value of the NIR band and the standard deviation of the NDVI are negatively related to residential vacancy. Both represent vegetation conditions. The lawns of residential areas in census tracts with high vacancy rates are unmaintained and thus withered, and vegetation has spread onto impervious surfaces such as sidewalks (Figure 9b). The spectral reflectance of green plants in the near-infrared band is very high, so the negative relationship between the mean value of the NIR band and housing vacancy is as expected. Similarly, vegetation withering reduces the high NDVI values, and vegetation covering impervious surfaces raises the low NDVI values. Thus, the standard deviation of the NDVI shows a negative correlation with HVR. These results indicate that physical environmental factors at the census-tract scale can express differences in HVR. Moreover, the mean value of the NIR band and the standard deviation of the NDVI can be significant indicators of housing vacancy.

Significant Variable Analysis of the Regression Models
Among the physical environmental factors derived from orthoimages, two variables were significant in both Model 1 and Model 2. The mean value of the NIR band and the standard deviation of the NDVI are negatively related to residential vacancy. Both represent vegetation conditions. The lawns of residential areas in high-vacancy-rate census tracts are poorly maintained; they wither and spread onto impervious surfaces such as sidewalks (Figure 9b). Because the spectral reflectance of green plants in the near-infrared band is very high, the negative relationship between the mean value of the NIR band and housing vacancy is as expected. Similarly, vegetation withering reduces the number of high NDVI values, and vegetation covering impervious surfaces raises otherwise low NDVI values. Thus, the standard deviation of the NDVI shows a negative correlation with HVR. These results indicate that physical environmental factors at the census-tract scale can express differences in HVR, and that the mean value of the NIR band and the standard deviation of the NDVI can be significant indicators of housing vacancy.
Among the landuse structure factors from the landuse map, which can express the social environmental characteristics of residential areas, such as infrastructure construction and the quality of life of local residents, three were selected in our model. Deng et al. [30] utilized the distance from residential parcels to facilities such as nearby commercial services to study how the geospatial distribution of different landuse types affects the occupancy status of houses. In this paper, we use only the landuse type distribution within a census tract and the characteristics of residential parcels (parcel areas and housing units) to explore the relationship between landuse structure and housing vacancy. From Tables 4 and 5, the percentage of residential area in a census tract and the sum of parkland area and industry area were negatively correlated with residential vacancy, whereas the sum of vacant land area shows a positive correlation. These observations can partly be explained by community attractiveness. A larger area of parkland and a higher percentage of residential area mean more infrastructure and a better living environment. Thus, these communities are more attractive to residents, and fewer houses in them are likely to be abandoned and left vacant. From another perspective, deindustrialization and industrial and residential suburbanization caused industrial and commercial areas to decline in downtown Buffalo [9]. Thousands of manufacturing jobs were lost, and thousands of people left, as industry moved out. A large number of properties were then abandoned and became vacant. The same findings were reported by Silverman et al. [8] and Troy et al. [83].
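As an illustration of how the two vegetation variables discussed at the start of this section might be computed, the following minimal sketch derives the mean NIR value and the standard deviation of the NDVI for the residential pixels of each census tract. It assumes the standard definition NDVI = (NIR - Red) / (NIR + Red). The function name `vegetation_stats`, the array inputs, and the convention that a tract id of 0 marks non-residential pixels are hypothetical and are not taken from the processing chain used in this study.

```python
import numpy as np

def vegetation_stats(nir, red, tract_ids):
    """Return {tract_id: (mean NIR, std of NDVI)} over residential pixels.

    nir, red  : 2-D arrays of band values (hypothetical inputs)
    tract_ids : 2-D integer array of the same shape; 0 marks pixels
                outside residential areas (an assumed convention)
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    tract_ids = np.asarray(tract_ids)

    # NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator stay NaN
    denom = nir + red
    ndvi = np.full(nir.shape, np.nan)
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)

    stats = {}
    for tid in np.unique(tract_ids):
        if tid == 0:
            continue  # skip non-residential pixels
        mask = tract_ids == tid
        stats[int(tid)] = (float(nir[mask].mean()),
                           float(np.nanstd(ndvi[mask])))
    return stats
```

Under these assumptions, a tract whose lawns have withered and spread over sidewalks would be expected to show both a lower mean NIR and a narrower spread of NDVI values than a well-maintained tract, which is the direction of the correlations reported above.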
Among the human activity factors extracted from the Jilin1-03 satellite night-time imagery, the average brightness of residential areas, the sum brightness of residential areas and vacant land, and the luminous intensity of commercial areas have significant correlations with HVR. The luminous intensity of commercial areas is the most influential variable based on the normalized coefficients in Table 5, which implies that HVRs are low in developed regions. Generally, large commercial areas are associated with economic development and higher incomes in a region [11]. Better housing affordability leads to higher housing prices and fewer vacant houses. The second most significant variable is the average brightness of residential areas: a higher average brightness implies a lower HVR in the census tract. This is in line with the general knowledge that houses where no one lives are dark. A smaller sum of brightness of vacant land can be associated with unsafe public security conditions, and hence a higher HVR in the region.
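The ranking of variables by normalized coefficients can be made concrete with a small sketch. It assumes that "normalized" means the usual standardized regression coefficient, beta_j = b_j * sd(x_j) / sd(y), which may differ from the exact convention behind Table 5. The function name `standardized_coefficients` and the toy inputs in the usage note are hypothetical.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Fit OLS with an intercept; return (raw slopes, standardized slopes)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Design matrix with a leading column of ones for the intercept
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    raw = coef[1:]

    # beta_j = b_j * sd(x_j) / sd(y): slope expressed in standard-deviation
    # units, so variables measured on different scales become comparable
    beta = raw * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return raw, beta
```

For example, with `X = [[0, 0], [1, 100], [2, 0], [3, 100]]` and `y = [0, 3, 4, 7]`, the raw slopes are 2 and 0.01, while the standardized slopes are about 0.894 and 0.2. The comparison of influence is therefore made on the standardized values rather than on the raw coefficients, whose size depends on the units of each variable.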

Improvements, Limitations, and Further Implications
Comparing Model 2 to Model 1, the adjusted R² increases from 0.510 to 0.656, and the predicted R² from 0.359 to 0.603. RMSE decreases from 0.055 to 0.046 owing to the inclusion of the Jilin1-03 satellite night-time data. Although Model 1 fits reasonably well (adjusted R² = 0.510), its predicted R² is only 0.359. This gap indicates a possible over-fitting problem: the model will not predict new observations as accurately as it fits the existing data. Considering both the adjusted R² and the predicted R², Model 2 is much better than Model 1 in both fitting and predicting accuracy. With the same training data and model complexity, the explanatory variables in Model 2 have more significant and more reasonable correlations with HVR. Thus, adding the human activity factors reduces the prediction bias.
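The three accuracy measures compared above can be sketched as follows. The definitions are assumptions, since they are not spelled out in this section: adjusted R² uses the usual degrees-of-freedom correction, predicted R² is taken as 1 - PRESS/SST (PRESS being the sum of squared leave-one-out residuals), and RMSE is the root of the mean squared residual. The function names `fit_metrics` and `relative_gain` are hypothetical.

```python
import numpy as np

def fit_metrics(X, y):
    """Return (adjusted R^2, predicted R^2, RMSE) for OLS with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef

    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    adj_r2 = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))

    # Diagonal of the hat matrix H = A pinv(A). The leave-one-out residual
    # is e_i / (1 - h_ii), so PRESS needs no refitting of the model.
    hat_diag = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    press = float(((resid / (1.0 - hat_diag)) ** 2).sum())
    pred_r2 = 1.0 - press / sst

    rmse = float(np.sqrt(sse / n))
    return adj_r2, pred_r2, rmse

def relative_gain(before, after):
    """Relative change of a score, e.g. from Model 1 to Model 2."""
    return (after - before) / before
```

Applied to the reported scores, `relative_gain(0.510, 0.656)` is about 0.286 and `relative_gain(0.359, 0.603)` is about 0.680, close to the 28% and 68% improvements attributed to the night-time data. A predicted R² well below the adjusted R², as in Model 1, means that the leave-one-out residuals are much larger than the in-sample residuals, which is the over-fitting symptom described above.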
Compared with previous studies, the result of Model 2 is better than that of another HVR estimation model at the census tract level [8] and of a model using daytime remote sensing data at the block group level [30]. As shown in many studies, NTL images can reflect the intensity of human activity at a large scale. Furthermore, night-time lighting information can also represent socio-economic development at the local scale. One significant phenomenon of human activities in Buffalo worth mentioning is population segregation. The African American population is highly clustered in the middle south, while the populations in the northwest and mid-south of Buffalo have lower incomes [83]. As Buffalo lost over 70% of its manufacturing jobs in the 1980s, workers were no longer able to afford the loans and taxes on their houses. Thus, housing unaffordability became a serious social problem, especially for poor people in African American communities [84]. A large number of residents moved out, and their houses were taken back by banks [8,10]. On the other hand, no residents or housing buyers were available at that time, especially for houses in lower-level communities associated with social security threats and undeveloped community services. Usually, low-income communities have less infrastructure and worse security conditions, which make them unattractive [12]. Even though housing prices are quite low in these communities, many vacant houses remain difficult to sell because the communities lack appeal. Thus, the HVR, like night-time light, is an expression of the socio-economic development of a region [11]. The distribution of vacant houses is highly related to segregation and social injustice in Buffalo. Although there is only a weak relationship between a vacant house and the reflectance of its particular parcel on an NTL image, we can still find clues in contextual information that help to estimate the aggregated HVR at a local scale.
Since this is the first time HSR NTL imagery has been used in HVR estimation, it is worth pointing out the limitations of this study. First, the Jilin1-03 satellite night-time data have two weaknesses. One is the limited light detection ability. Because the lower light detection limit is set to 7E-7 W/cm²/sr, weak lights are undetectable and useful information can be lost. The other is cost: the Jilin1-03 satellite night-time imagery is not free and must be ordered. Images of Buffalo are not archived, so the data must be obtained through reprogramming, at a higher price. A scene of Buffalo costs about 20,000 renminbi, which is equal to about 2870 USD. Thus, purchasing many Jilin1-03 images for a large-area or multitemporal study is costly. In this study, we used only one scene, although housing vacancy is a long-term phenomenon. With a limited budget, this was the best data we could obtain for our study area. Additionally, the HVR in Buffalo is not changing rapidly, and most night-time lights, such as road lamps and parking lot lamps, are consistent during the night. Therefore, although only a single image is used in this study, we can still extract useful information from it. Second, the regression framework is simple, and some useful factors may not be included. Because the model is linear, its limitations can lead to a misinterpretation of the practical meaning of the selected variables. Furthermore, it is still a challenge to infer human-activity parameters through remote sensing. Although 54 influential factors were involved in our models, housing vacancy remains highly uncertain because it results from human activity, which is difficult to predict. Additionally, our method was designed and tested in an urban area in a developed country. In developing countries, the lack of open, detailed landuse maps and the different driving mechanisms of housing vacancy imply that our method may not transfer directly to such study areas.
Future research should seek to (1) involve more variables when estimating HVR, (2) exploit the multi-spectral information in the Jilin1-03 satellite data, (3) employ long time series of Jilin1-03 satellite images, (4) test more models and develop advanced methods to extract information from NTL images, and (5) extend the use of the Jilin1-03 NTL data to other social applications.

Conclusions
To our knowledge, this is the first study to apply HSR night-time remote sensing to HVR modeling at a sub-city scale. As a brand new dataset, Jilin1-03 was explored in a typical urban social application: HVR estimation. In a stepwise multivariate linear regression model, three types of driving factors derived from Jilin1-03 and other data were analyzed. This study leads to two conclusions.
First, the HVR estimation model using the Jilin1-03 satellite and other ancillary geospatial data fits well with the census statistical data. The data used to extract variables in this study can be easily obtained and updated in a timely manner. Independent of ground surveys and statistical data, the established model using remote sensing and geospatial landuse data provides an alternative method for analyzing or even predicting HVRs, especially for years that lack recent HVR data or reliable statistical data.
Second, the Jilin1-03 satellite night-time data contribute a 28% increase in fitting accuracy and a 68% increase in predicting accuracy to the estimation model. The luminous intensity of commercial areas derived from the Jilin1-03 satellite, which reflects socio-economic conditions, was found to be the most influential variable for housing vacancy. The landuse structure indirectly and partially demonstrated that social environment factors have strong correlations with residential vacancy. Moreover, the physical environment factor depicting vegetation conditions in residential areas was also found to be a significant indicator of housing vacancy.
Our study is conducive to urban community management and revitalization policy formulation. The emergence of HSR night light data opens a new door to future studies at a microscopic scale, and it is important to explore further the potential of Jilin1-03 satellite HSR NTL remote sensing data.

Figure 2. Landuse map of Buffalo derived from parcel data.

Figure 4. The RGB relative spectral response of the camera on the Jilin1-03 satellite.

Figure 5. The Jilin1-03 satellite night-time image with (a) RGB bands and (b) the grayscale layer.

Figure 8. A distribution map of (a) the estimated HVR in Buffalo; (b) differences between the estimated values from our model and the Census Bureau statistical data.

Figure 9. (a) An occupied house in Buffalo, NY. (b) A vacant house in Buffalo, NY. Photos were taken in 2016 by Shengyuan Zou.

Table 3. Explanatory variables of the multivariate linear regression models.