# Population Estimates from Orbital Data of Medium Spatial Resolution: Applications for a Brazilian Municipality

## Abstract

## 1. Introduction

## 2. Materials and Methods

^{2} in 2010, and an annual growth rate of 1.15%. It is a metropolitan municipality whose distinctive occupation patterns justify its selection. Contagem combines extensive high-rise areas of high population density with less verticalized, lower-density neighborhoods and an important industrial complex.

#### 2.1. Dasymetric Mapping

#### 2.2. Methods by Zones and Pixels

#### 2.2.1. Method by Zones

$R^{2}$ = 0.84 and the linear correlation coefficient between the estimated and actual population density values was 0.92, while the median error was 17.4% in the training sample and 18.4% in the external validation. The urban population estimates, on the other hand, had errors of 1% and −3% for the training sample and the external validation, respectively (application to the Geelong district).

#### 2.2.2. Method by Pixels

$R^{2}$) or the mean square of the residuals, for example. The iterations would end when one of these measures no longer changed appreciably from one iteration to the next [43,44,45].

#### 2.3. Data Structuring

#### 2.3.1. Estimates Based on 2000 Data

#### 2.3.2. Estimates Based on 2010 Data

#### 2.4. Dependent and Explanatory Variables

- Model 1: dependent variable = density; explanatory variables: bands 1 to 5 and 7;
- Model 2: dependent variable = density logarithm; explanatory variables: bands 1 to 5 and 7;
- Model 3: dependent variable = density logarithm; explanatory variables: bands 1 to 5 and 7 and percentage of occupied area;
- Model 4: dependent variable = density logarithm; explanatory variables: bands 1 to 5 and 7 and percentage of occupied area and subnormal agglomerate (slums);
- Model 5: dependent variable = density logarithm; explanatory variables: bands 1, 4, 5 and 7, percentage of occupied area and subnormal agglomerate (slums);
- Model 6: dependent variable = density logarithm; explanatory variables: bands 1, 4 and 7, percentage of occupied area and subnormal agglomerate (slums);
- Model 7: dependent variable = density logarithm; explanatory variables: bands 1, 4 and 7, percentage of occupied area, subnormal agglomerate (slums) and three interaction variables between bands 1, 4 and 7 and subnormal agglomerate (slums).
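As an illustration of how a log-density specification such as Model 3 could be fitted and turned back into a population estimate, the sketch below uses ordinary least squares on synthetic data. The natural logarithm, the multiplication of the back-transformed density by tract area, the function names and all numeric values are assumptions made for the example; they are not taken from the study.

```python
import numpy as np

def fit_log_density(bands, occupied_pct, density):
    """OLS of log(density) on band reflectances and percentage of occupied area.

    Assumption: natural logarithm; the study does not state the base here.
    """
    X = np.column_stack([np.ones(len(density)), bands, occupied_pct])
    coef, *_ = np.linalg.lstsq(X, np.log(density), rcond=None)
    return coef

def predict_population(coef, bands, occupied_pct, area):
    """Back-transform the predicted log density and scale by tract area (assumed step)."""
    X = np.column_stack([np.ones(len(area)), bands, occupied_pct])
    return np.exp(X @ coef) * area

# Synthetic tracts (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 200
bands = rng.uniform(0.05, 0.40, size=(n, 6))    # bands 1 to 5 and 7
occupied_pct = rng.uniform(0.0, 100.0, size=n)  # percentage of occupied area
area = rng.uniform(0.05, 2.0, size=n)           # tract area
log_density = 6.0 + 0.02 * occupied_pct - 3.0 * bands[:, 3]
density = np.exp(log_density + rng.normal(0.0, 0.3, size=n))

coef = fit_log_density(bands, occupied_pct, density)
pop_est = predict_population(coef, bands, occupied_pct, area)
```

Because the dependent variable is a logarithm, the back-transformed estimates are always positive, which a model fitted directly on density (Model 1) does not guarantee.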

- Model 1: dependent variable = population; explanatory variables: bands 1 to 5 and 7;
- Model 2: dependent variable = population; explanatory variables: bands 1 to 5 and 7 and subnormal agglomerate (slums);
- Model 3: dependent variable = population; explanatory variables: bands 1 to 5 and 7, subnormal agglomerate (slums) and situation (urban / rural);
- Model 4: dependent variable = population; explanatory variables: bands 1, 4, 5 and 7, subnormal agglomerate (slums) and situation (urban / rural);
- Model 5: dependent variable = population; explanatory variables: bands 1, 4 and 7, subnormal agglomerate (slums) and situation (urban / rural);
- Model 6: dependent variable = population; explanatory variables: bands 1, 4 and 7, subnormal agglomerate (slums), situation (urban / rural) and three interaction variables between bands 1, 4 and 7 and subnormal agglomerate (slums).
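The interaction variables in the last pixel-level model can be pictured as products of each band with the slum indicator. The sketch below builds such a design matrix; coding subnormal agglomerate and urban/rural situation as 0/1 dummies is an assumption, and the function name and sample reflectances are hypothetical.

```python
import numpy as np

def pixel_design_matrix(b1, b4, b7, slum, urban):
    """Columns: intercept, bands 1/4/7, slum dummy, urban dummy,
    then the three band x slum interaction terms."""
    b1, b4, b7 = (np.asarray(v, dtype=float) for v in (b1, b4, b7))
    slum = np.asarray(slum, dtype=float)    # assumed coding: 1 = slum pixel
    urban = np.asarray(urban, dtype=float)  # assumed coding: 1 = urban pixel
    return np.column_stack([
        np.ones_like(b1), b1, b4, b7, slum, urban,
        b1 * slum, b4 * slum, b7 * slum,
    ])

# Three hypothetical pixels.
X = pixel_design_matrix(
    b1=[0.10, 0.12, 0.08],
    b4=[0.30, 0.25, 0.35],
    b7=[0.15, 0.20, 0.10],
    slum=[0, 1, 0],
    urban=[1, 1, 0],
)
```

With this coding, the interaction columns are zero outside slum pixels, so their coefficients only adjust the band slopes inside subnormal agglomerates.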

#### 2.5. Error Measures

$R^{2}_{\text{back}}$ and the total error. The median relative error corresponded to the median of the absolute values of the relative errors observed for each census tract (in the applications of [48]) and pixel (in the case of [49]). The second measure was $R^{2}_{\text{back}}$, which corresponded to the square of the linear correlation coefficient between the population estimates for tracts (or pixels) and the actual population values of those tracts (or pixels). The closer to one (1), the better the fit of the model. As it is calculated at the sector (census tract) level, $R^{2}_{\text{back}}$ is more closely connected to the median relative error than to the total error, and therefore tends to have low values. The third measure was the total relative error, which represented the variation of the estimated total in relation to the total observed for the set of tracts (or pixels), that is, for the municipality as a whole [48,49].
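The three measures described above can be sketched in a few lines. The function names and the sample values are hypothetical; the formulas follow the verbal definitions in the text and assume that every observed population is non-zero.

```python
import numpy as np

def median_relative_error(estimated, observed):
    """Median of the absolute relative errors, one per tract or pixel."""
    estimated = np.asarray(estimated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.median(np.abs((estimated - observed) / observed)))

def r2_back(estimated, observed):
    """Square of the linear correlation between estimated and actual populations."""
    r = np.corrcoef(estimated, observed)[0, 1]
    return float(r ** 2)

def total_relative_error(estimated, observed):
    """Relative difference between the estimated and observed totals."""
    return float((np.sum(estimated) - np.sum(observed)) / np.sum(observed))

# Hypothetical populations for four tracts.
observed = np.array([100.0, 200.0, 300.0, 400.0])
estimated = np.array([120.0, 180.0, 330.0, 360.0])

mre = median_relative_error(estimated, observed)
r2 = r2_back(estimated, observed)
tre = total_relative_error(estimated, observed)
```

The first two measures are computed unit by unit, whereas the third only compares sums, which is why it can be small even when individual tracts are poorly estimated.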

## 3. Results

#### 3.1. Models and Estimates at the Level of Census Tracts

$R^{2}_{\text{back}}$ was relatively weak for all models, in both 2000 and 2010. Because it is calculated at the sector (census tract) level, $R^{2}_{\text{back}}$ is more closely related to the average error across census tracts than to the total error. In the total error (macro level), census tracts in which the population was overestimated are offset by census tracts in which it was underestimated. As the total error works with the sum of the estimated populations, the individual errors of the census tracts end up being diluted. $R^{2}_{\text{back}}$ has no such "error compensation" because its calculation "sees" the individual values of the census tracts. Regarding the median relative error (MRE) of the census tracts, there was little difference between models (between 0.321 and 0.399 in 2000, and between 0.292 and 0.386 in 2010).
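A small numeric illustration of this compensation effect, using four hypothetical tracts (the figures are invented for the example and are not from the study):

```python
import numpy as np

# Two tracts are overestimated and two are underestimated.
observed = np.array([500.0, 800.0, 1200.0, 1500.0])
estimated = np.array([750.0, 600.0, 1500.0, 1150.0])

# Tract-level view: every tract is off by roughly a quarter or more.
relative_errors = (estimated - observed) / observed
median_abs_error = float(np.median(np.abs(relative_errors)))

# Municipality-level view: over- and underestimates cancel in the sum.
total_error = float((estimated.sum() - observed.sum()) / observed.sum())
```

Here the median absolute relative error is 0.25 while the total error is exactly zero, because both totals equal 4,000 inhabitants.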

#### 3.2. Models and Estimates at the Level of Pixels

$R^{2}_{\text{back}}$ showed relatively low values. This reflects the weak relation between population and the explanatory variables at the pixel level, and is corroborated by the high MREs at that level. However, in the internal validation of the models, the estimates show very low total errors, below 2% in absolute value in all models across the three databases under analysis, which can be considered an excellent result.

$R^{2}$ observed in model 2 (0.184), followed by models 6 (0.146) and 4 (0.126), while the lowest MRE was found in model 4 (0.314). The lowest total error was also found in model 4 (0.02%), below even the −0.06% reported by [44]. Model 4 was therefore chosen to calculate the estimates from the 2000 census tract data. It should be noted that incorporating the subnormal agglomerate variable (model 2) greatly improved the total error, while incorporating the urban/rural situation variable (model 3) did not. Among the models that use data from the 2010 census tracts, model 6 obtained the best results: with an $R^{2}_{\text{back}}$ of 0.269, it had the lowest MRE (0.280) and a total error of −0.40%, second in absolute value only to model 1 (0.15%). It is worth noting that incorporating the variables subnormal agglomerate (model 2) and urban/rural situation (model 3) did not improve the total error compared with the model that used only the reflectance of bands 1 to 5 and 7 (model 1).

$R^{2}_{\text{back}}$, 0.318 of MRE, and 0.4% of total error (being chosen to calculate the estimate). Neither incorporating the variables subnormal agglomerate (model 2) and urban/rural situation (model 3) nor using data from the 2010 census tracts improved the total error compared with the model that used only the reflectance of bands 1 to 5 and 7 (model 1). The results show that the pixel-based models fit better than the models at the census tract level, especially in the internal validation. Total errors of the selected models were below 0.5% in absolute value at the pixel level, against 6.29% and −1.50% at the census tract level for the 2000 and 2010 databases, respectively.

## 4. Discussion

#### 4.1. About Internal Validation

#### 4.2. About External Validation

## 5. Conclusions

**Figure 1.** RGB345 composition of the Landsat ETM+ satellite imagery in 2000 (left) and 2015 (right) - municipality of Contagem.

**Figure 2.** Evolution of the occupied areas (2000, 2010 and 2015) from the classification of Landsat ETM+ satellite imagery - municipality of Contagem.

**Figure 3.** Dasymetric mapping through Landsat 7 ETM+ satellite imagery pixels. Data from the census tracts of the 2010 census (left) and the statistical grid cells of 2010 (right) - municipality of Contagem.

| Characteristics | Sensor Parameters |
|---|---|
| Spectral bands (μm) | Band 1 - 0.45 to 0.52 |
| | Band 2 - 0.53 to 0.61 |
| | Band 3 - 0.63 to 0.69 |
| | Band 4 - 0.78 to 0.90 |
| | Band 5 - 1.55 to 1.75 |
| | Band 6 - 10.4 to 12.50 |
| | Band 7 - 2.09 to 2.35 |
| | Band 8 - 0.52 to 0.90 |
| Spatial resolution | 15 meters (panchromatic band) |
| | 30 meters (bands 1 to 5 and 7) |
| | 60 meters (band 6) |
| Radiometric resolution | 8 bits (256 gray levels) |
| Size of the scenes | 170 km (north-south) / 183 km (east-west) |

| Abbreviation | Variables | Source |
|---|---|---|
| Dens00 | Density of the census tract in 2000 | IBGE |
| Dens10 | Density of the census tract in 2010 | IBGE |
| Log(Dens)00 | Logarithm of the density of the census tract in 2000 | IBGE |
| Log(Dens)10 | Logarithm of the density of the census tract in 2010 | IBGE |
| TM1 | Average reflectance in band 1 (Landsat 7 ETM+) | USGS |
| TM2 | Average reflectance in band 2 (Landsat 7 ETM+) | USGS |
| TM3 | Average reflectance in band 3 (Landsat 7 ETM+) | USGS |
| TM4 | Average reflectance in band 4 (Landsat 7 ETM+) | USGS |
| TM5 | Average reflectance in band 5 (Landsat 7 ETM+) | USGS |
| TM7 | Average reflectance in band 7 (Landsat 7 ETM+) | USGS |
| AreaOcup00 | Percentage of occupied area in 2000 | USGS/IBGE |
| AreaOcup10 | Percentage of occupied area in 2010 | USGS/IBGE |
| AgSub00 | Subnormal agglomerate in 2000 | IBGE |
| AgSub10 | Subnormal agglomerate in 2010 | IBGE |

| Abbreviation | Variables | Source |
|---|---|---|
| Pop00pixel (census tract) | Population of the pixel in 2000 | IBGE |
| Pop10pixel (census tract or grid) | Population of the pixel in 2010 | IBGE |
| TM1 | Reflectance of the pixel in band 1 (Landsat 7 ETM+) | USGS |
| TM2 | Reflectance of the pixel in band 2 (Landsat 7 ETM+) | USGS |
| TM3 | Reflectance of the pixel in band 3 (Landsat 7 ETM+) | USGS |
| TM4 | Reflectance of the pixel in band 4 (Landsat 7 ETM+) | USGS |
| TM5 | Reflectance of the pixel in band 5 (Landsat 7 ETM+) | USGS |
| TM7 | Reflectance of the pixel in band 7 (Landsat 7 ETM+) | USGS |
| Urban00 | Situation (Urban / Rural) in 2000 | IBGE |
| Urban10 | Situation (Urban / Rural) in 2010 | IBGE |
| AgSub00 | Subnormal agglomerate in 2000 | IBGE |
| AgSub10 | Subnormal agglomerate in 2010 | IBGE |

**Table 4.** Internal validation of models at the census tract level (2000 and 2010) - municipality of Contagem.

| Database | Indicators | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|---|
| 2000 (sector) | R² (back) | 0.061 | 0.032 | 0.232 | 0.323 | 0.322 | 0.341 * | 0.341 |
| | Median relative error | 0.361 | 0.399 | 0.346 | 0.349 | 0.330 | 0.340 * | 0.321 |
| | Total error (%) | 115.7 | 44.40 | 10.52 | 6.56 | 6.81 | 6.29 * | 6.18 |
| 2010 (sector) | R² (back) | 0.262 | 0.179 | 0.281 | 0.322 | 0.335 | 0.341 * | 0.341 |
| | Median relative error | 0.316 | 0.386 | 0.305 | 0.308 | 0.292 | 0.300 * | 0.298 |
| | Total error (%) | 25.82 | 17.42 | −0.26 | −1.82 | −1.76 | −1.50 * | −1.52 |

**Table 5.** External validation: Percentage differences of the estimates via remote sensing for the year 2010 (based on a model created in 2000, at the sector level) and the official estimates of 2010, in relation to the population of the 2010 census - municipality of Contagem.

| Municipality | Abs/Diff (%) | Census 2010 | Pop of 2010 (Projection IBGE 2013) | Pop of 2010 (Projection IBGE 2008) | Model of 2000 - Census Tracts |
|---|---|---|---|---|---|
| Contagem | Absolute | 603,442 | 630,352 | 633,361 | 670,287 |
| | Difference (%) | 0.0 | 4.5 | 5.0 | 11.1 |
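The "Difference (%)" row can be reproduced from the absolute values in Table 5, taking the 2010 census as the reference. The snippet below is a minimal check; the helper name `percent_difference` is hypothetical.

```python
def percent_difference(estimate, reference):
    """Signed difference of an estimate relative to the reference, in percent."""
    return 100.0 * (estimate - reference) / reference

census_2010 = 603_442  # reference population (Table 5)
estimates = {
    "Projection IBGE 2013": 630_352,
    "Projection IBGE 2008": 633_361,
    "Model of 2000 - Census Tracts": 670_287,
}

# Rounded to one decimal place, as in the table.
differences = {
    name: round(percent_difference(value, census_2010), 1)
    for name, value in estimates.items()
}
```

The same calculation applies to the 2015 comparisons, with the IBGE post-census estimate as the reference instead of the census count.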

**Table 6.** Comparative analysis: Percentage differences of the estimates via remote sensing (based on models created in 2000 and 2010, at the tract level) and demographic estimates, for the year 2015, in relation to the IBGE's post-census estimates - municipality of Contagem.

| Municipality | Abs/Diff (%) | IBGE Post-Census Estimate (2015) | Simple Extrapolation | Vital Rates | Census Ratio | Model of 2000 - Census Tracts | Model of 2010 - Census Tracts |
|---|---|---|---|---|---|---|---|
| Contagem | Absolute | 648,766 | 636,155 | 666,439 | 622,170 | 701,389 | 691,204 |
| | Difference (%) | 0.0 | −1.9 | 2.7 | −4.1 | 8.1 | 6.5 |

**Table 7.** Internal validation of models at the pixel level (2000 and 2010) - municipality of Contagem.

| Database | Indicators | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|---|
| 2000 (sector) | R² (back) | 0.077 | 0.184 | 0.110 | 0.126 * | 0.110 | 0.146 |
| | Median relative error | 0.353 | 0.315 | 0.317 | 0.314 * | 0.316 | 0.318 |
| | Total error (%) | 1.73 | 0.09 | 1.61 | 0.02 * | 1.67 | 0.84 |
| 2010 (sector) | R² (back) | 0.144 | 0.186 | 0.154 | 0.171 | 0.170 | 0.269 * |
| | Median relative error | 0.312 | 0.293 | 0.302 | 0.282 | 0.281 | 0.280 * |
| | Total error (%) | 0.15 | 0.68 | 1.85 | 1.28 | 1.17 | −0.40 * |
| 2010 (statistical grid) | R² (back) | 0.099 | 0.127 | 0.136 | 0.110 | 0.170 * | 0.141 |
| | Median relative error | 0.330 | 0.323 | 0.328 | 0.315 | 0.318 * | 0.319 |
| | Total error (%) | −0.28 | −1.19 | −1.16 | −1.22 | 0.40 * | −1.34 |

**Table 8.** External validation: Percentage differences of the estimates via remote sensing for the year 2010 (based on a model created in 2000, at the pixel level) and the official estimates of 2010, in relation to the population of the 2010 census - municipality of Contagem.

| Municipality | Abs/Diff (%) | Census 2010 | Pop of 2010 (Projection IBGE 2013) | Pop of 2010 (Projection IBGE 2008) | Model of 2000 - Pixels |
|---|---|---|---|---|---|
| Contagem | Absolute | 603,442 | 630,352 | 633,361 | 599,371 |
| | Difference (%) | 0.0 | 4.5 | 5.0 | −0.7 |

**Table 9.** Comparative analysis: Percentage differences of the estimates via remote sensing (based on models created in 2000 and 2010, at the pixel level) and demographic estimates, for the year 2015, in relation to the IBGE's post-census estimates - municipality of Contagem.

| Municipality | Abs/Diff (%) | IBGE Post-Census Estimate (2015) | Simple Extrapolation | Vital Rates | Census Ratio | Model of 2000 - Pixels (From Census) | Model of 2010 - Pixels (From Census) | Model of 2010 - Pixels (From Statistical Grids) |
|---|---|---|---|---|---|---|---|---|
| Contagem | Absolute | 648,766 | 636,155 | 666,439 | 622,170 | 701,199 | 664,205 | 683,835 |
| | Difference (%) | 0.0 | −1.9 | 2.7 | −4.1 | 8.1 | 2.4 | 5.4 |

