Article
Peer-Review Record

Upscaling Household Survey Data Using Remote Sensing to Map Socioeconomic Groups in Kampala, Uganda

Remote Sens. 2020, 12(20), 3468; https://doi.org/10.3390/rs12203468
by Lisa-Marie Hemerijckx 1,2,*, Sam Van Emelen 1, Joachim Rymenants 1, Jac Davis 3, Peter H. Verburg 3, Shuaib Lwasa 4 and Anton Van Rompaey 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 31 August 2020 / Revised: 18 October 2020 / Accepted: 20 October 2020 / Published: 21 October 2020
(This article belongs to the Special Issue Remote Sensing Application to Population Mapping)

Round 1

Reviewer 1 Report

The authors address a very relevant topic which is nowadays widely discussed. Combining data from different sources is currently a hot topic, especially combining big data sources (here: satellite data) with survey data (here: a small household survey). Hence, I would love to see an appropriately updated version.

To help you better understand my comments: I come from survey statistics, and here are my major concerns about the current status of the article.

Major comments:

My biggest concern is about the sample, which is inadequately described. To my view, you generated a single-stage cluster sample (but use SRS inference! See e.g. (1)). However, the design and its randomness are not appropriately described. It seems to be a non-random selection of SAUs within which you draw households. If so, you have to describe the possible selection bias. If not, why don't you apply any type of calibration? How are the selected SAUs distributed among all SAUs? Next, you seem to ignore non-response, if I understood correctly. This is, indeed, the worst you can do. Especially when you talk about income or wealth, almost all household surveys show significant non-response bias. Please comment on this.
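
To make the SRS-versus-cluster point concrete, here is a minimal, purely illustrative sketch (all numbers are hypothetical, not taken from your data) of how single-stage cluster sampling inflates the variance of a mean relative to the naive SRS formula:

```python
# Illustrative only: hypothetical cluster count, cluster size and intra-cluster
# correlation; shows why SRS inference understates uncertainty under clustering.
import numpy as np

rng = np.random.default_rng(0)

n_clusters, m = 30, 18      # e.g. 30 sampled SAUs with ~18 households each (assumed)
rho = 0.15                  # assumed intra-cluster correlation of the welfare indicator

# simulate a clustered outcome: cluster effect + household-level noise
cluster_effect = rng.normal(0, np.sqrt(rho), n_clusters)
y = (cluster_effect[:, None] + rng.normal(0, np.sqrt(1 - rho), (n_clusters, m))).ravel()
cluster_id = np.repeat(np.arange(n_clusters), m)

# naive SRS standard error of the mean
se_srs = y.std(ddof=1) / np.sqrt(y.size)

# cluster-robust standard error: variability of cluster means over the number of clusters
cluster_means = np.array([y[cluster_id == c].mean() for c in range(n_clusters)])
se_cluster = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"SRS SE: {se_srs:.3f}, cluster-robust SE: {se_cluster:.3f}, "
      f"empirical design effect: {(se_cluster / se_srs) ** 2:.2f} "
      f"(theory: 1 + (m - 1) * rho = {1 + (m - 1) * rho:.2f})")
```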

With respect to the statistical methods, you use classification methods. However, these are known to be sensitive to informative (sampling) designs. Did you investigate this? Bayesian methods in particular have difficulties with clustered data. Further, you may encounter the problem of household versus individual information.

I would have loved to see more on your comparison of methods. My experience is different, though that may have to do with data availability. Can you elaborate on the dashed line in Figure 3? These lines are essential for implementing good prediction approaches (which I would prefer, e.g. small area methods). However, you need access to the census data. Would you have relevant information for prediction here? This question is important in two ways: first, for the original problem itself; second, because any appropriate type of benchmarking and validation would be extremely important. I know that you probably suffer from data access problems, which are even more frustrating in Europe.

Minor comments:

Your citation style is a little distracting: sometimes author-year, in most cases numbering, but not consistent (the double citation, e.g. on page 15, is partially wrong). The list of references is also a little messy.

In general, I would like to see the variables used for the prediction described more precisely. To my view, they are "hidden in the text".

Summary:

I agree with your conclusion; it is an intuitive approach. Nevertheless, you rely on a small survey on which I have too little information. And I even fear that the survey data lead to partially wrong conclusions.

Further, I get the impression that you wanted an example to showcase methods you find fancy. In your title, you use the word "extrapolation"; hence, I expect some real detail and methodological comparison on this. But, if so, you should also better explore some benchmarking on (politically) relevant questions.

So, please revise while sharpening the story and taking more care with the (survey) data (the remote sensing data are much more convincing). And I hope to see a nice update.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Very interesting and pragmatic research. The importance, design, and methods are very clearly explained. 

A few thoughts/recommendations:

  • You explain that you chose four clusters for consistency with the Vermeiren et al. study, but I am wondering how the results might differ with a more quantitatively oriented method of choosing the number of clusters (see the sketch after this list). Could there be a better, more descriptive solution? I am also wondering this, in part, because I don't completely see the value in the comparison with this prior study (you mention that the comparison study was focused primarily on income).
  • Did you consider any dimension reduction techniques (like PCA) that would have removed redundancy in your survey data? 
  • You mention that the seeming incongruence in "new middle" (relatively higher income but low living standard) could be explained by higher levels of remittances. But it doesn't appear that you asked respondents about remittances. Is that a hypothesis?
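
To illustrate the first point, here is a minimal sketch of a data-driven choice of the number of clusters (silhouette/elbow), shown on synthetic numeric data with scikit-learn; the same logic could be applied to the k-prototypes cost on your mixed survey variables:

```python
# Illustrative only: synthetic stand-in data, not the authors' survey.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # stand-in for household features

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
# The k maximising the silhouette (or sitting at the elbow of the clustering cost)
# would give a quantitative alternative to fixing k = 4 a priori.
```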

Congratulations!

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

  1. The article focuses on the interesting topic of using remote sensing data to derive census information. This is an area of remote sensing that really lacks attention. Unfortunately, however, the authors have not been able to adopt this concept thoroughly and properly. Instead, the authors have mixed various concepts and arguments in this article.
  2. With regard to the application of remote sensing techniques, this article does not deliver any contribution to the remote sensing community.
  3. The authors claim that they propose a “simple method to extrapolate household survey responses to a larger study area”. However, I cannot see any evidence of this method. The authors just overlay survey data on classified images. The accuracy assessment is rather poor, which makes the results unreliable. This concept should be built on a strong and advanced statistical analysis. Alternatively, the authors could have compared multiple methods.
  4. You mention “that several studies rely on visual, manual classification methods” as a weakness of previous articles. In this article, you repeat that weakness.
  5. In the literature review section, you need to review the articles critically. Here you merely list them.
  6. The argument for selecting the MLC method is not correct. The authors could simply use stronger classification techniques such as SVC or decision trees (a small comparison sketch follows this list).
  7. Line 9: “broad-scale analyses mask local socioeconomic inequalities” does not make sense!
  8. The heading of Section 2.6 is not clear.
  9. Please clearly specify the objectives of your study. I can see a hypothesis that has been discussed in various articles.
  10. The sentences are too long, which makes it difficult for readers to follow.
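
To illustrate point 6, here is a small sketch on synthetic data (not the authors' imagery) comparing maximum likelihood classification, which amounts to a Gaussian/quadratic discriminant, against SVC and a decision tree:

```python
# Illustrative only: synthetic "spectral" features and land-use labels.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

for name, clf in [("MLC (Gaussian/QDA)", QuadraticDiscriminantAnalysis()),
                  ("SVC", SVC(kernel="rbf", C=1.0)),
                  ("Decision tree", DecisionTreeClassifier(max_depth=8, random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.3f}")
```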

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

The paper presents an interesting methodological approach to associate spatial-material properties with social characteristics. The conclusions derived from this approach have been presented convincingly, taking limitations adequately into account.

The following questions and comments should be considered to improve the paper:

Table 2: it is not clear, and seems not convincing, to give the notion of “variable” a double meaning. For example, the variable “water source” is subdivided into 13 variables; the latter should be called characteristics or observations, not variables (which seems statistically erroneous).

The four land use classes should be described and explained in more detail. The distinctions along “permanent / semi-permanent” and “size” are fuzzy concepts and need more clarification, both theoretically and methodologically (how can permanent and semi-permanent dwellings, or semi-permanent and slum dwellings, be distinguished using remote sensing techniques?).

Lines 257-259: the statement made here is not clear to me. The study uses two methods, remote sensing and a household survey. While the first method is dedicated to classifying spatial characteristics, the latter is used to correlate these classified households (or groups of them) with socioeconomic data. How, then, can “a household [be] classified as high income”? This should also be made coherent in Figure 3.

Is the use of t-values common with the k-prototypes method? To my understanding, t-values are used to evaluate the quality of a sample with respect to the population.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I fully understand that this is about geosciences rather than statistics and that we come from different fields. Nevertheless, applying statistical methods requires appropriateness and correctness.


Three comments (remaining or updated):

1: In 2.2, you start with a formula. This gives the impression that you take care of representativity. Though I understand your reasons for the lack of probability sampling, the formula can be taken only as guidance. And knowing the difficulties, which you phrase correctly, I would have taken them into account when constructing the sample (even under your circumstances). Hence, I suggest clarifying more explicitly that you have taken something close to a convenience sample while roughly following what should have been done. This certainly has an impact on the results, but surely not on the applicability to the problem itself.

2: How you treat non-response, and how you argue about it, is unacceptable. Once you observe this during data gathering, it is already a nightmare. Omitting observations is the worst you can do. There are many R packages that include tests for missingness (or missing completely at random). And many of these packages also include multiple imputation (e.g. MICE). Even if one uses only standard settings, one might be interested in understanding whether the estimates under imputation differ from what you obtain.
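
As a purely illustrative sketch of what I mean (toy data and hypothetical column names; scikit-learn's IterativeImputer implements the chained-equations idea behind MICE):

```python
# Illustrative only: toy survey extract with missing income values.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.DataFrame({
    "household_size": [3, 5, 2, 6, 4, 1],       # hypothetical variables
    "rooms":          [1, 2, 1, 3, 2, 1],
    "income":         [120, np.nan, 80, np.nan, 150, 60],
})

imputer = IterativeImputer(sample_posterior=True, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed.round(1))

# Repeating this with different random states yields several completed datasets,
# whose pooled estimates can be compared against the complete-case analysis.
```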

3: I understand the many constraints that force sample sizes to be cut. Nevertheless, 543 is still low. And the argument that others do this as well (and none of them from survey statistics) is not strong. You use sophisticated methods, and measures of uncertainty are nowadays a must in the statistics literature (at least a resampling variance estimator, though then we are back to your sampling routine).
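
A minimal sketch of the kind of resampling variance estimator I have in mind, here a cluster bootstrap that resamples whole SAUs with replacement (all data and cluster sizes are synthetic):

```python
# Illustrative only: synthetic clusters standing in for sampled SAUs.
import numpy as np

rng = np.random.default_rng(0)
clusters = [rng.normal(loc=rng.normal(0, 0.5), scale=1.0, size=18) for _ in range(30)]

def statistic(sample_clusters):
    """Statistic of interest; here simply the overall household mean."""
    return np.concatenate(sample_clusters).mean()

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(clusters), size=len(clusters))  # resample whole clusters
    boot.append(statistic([clusters[i] for i in idx]))

print(f"point estimate: {statistic(clusters):.3f}, bootstrap SE: {np.std(boot, ddof=1):.3f}")
```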

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Thank you for addressing the comments. I can see that the quality of the manuscript has improved significantly. However, I still have major comments about a couple of methodological aspects of this article.

  1. Eq. 1 - It seems that the authors have not used this equation properly. Let’s assume that this equation and the relevant variables are correct. If so, if other scholars wanted to replicate this study in another city, even one with double the population, they would again arrive at a sample size of 543 households. Please clarify this issue.
  2. Eq. 2 - P represents the (estimated) proportion of the population rather than a standard deviation. The issue is that this equation has multiple versions (see the sketch after this list).
  3. Please provide references for the use of this equation in similar contexts.
  4. The other problem with the sample size is that you did not estimate a suitable sample size per target category; you only estimated the total sample size.
  5. Please explain how you determined the number of training samples for the classification.
  6. It seems that in your study you modeled everything. Even the training samples for the satellite image classification were not derived from the Landsat images themselves; you used other imagery, and overlaying those samples on the Landsat images could introduce hidden errors that cannot be detected by the classification accuracy assessment.
  7. Please add detail to Figure 3; it is very general and does not deliver useful information.
  8. Did the authors check the accuracy of the modeled maps (Figure 6), or just create them? Please clarify this in the manuscript.
  9. Overall, this manuscript could have been written with a much simpler structure: a straightforward process, but unfortunately written in a complex manner.
  10. I believe this study needs a couple of paragraphs about its limitations.
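
To make points 1-3 concrete: a commonly used form of this sample-size calculation is Cochran's formula with a finite population correction, sketched below; whether this matches the authors' Eq. 1 and Eq. 2 exactly is an assumption on my part.

```latex
% Cochran's sample size formula with finite population correction (assumed form).
\[
  n_0 = \frac{z^2 \, p\,(1-p)}{e^2},
  \qquad
  n = \frac{n_0}{1 + (n_0 - 1)/N}
\]
% Here p is the (estimated) population proportion (not a standard deviation),
% z the normal quantile, e the margin of error, and N the population size.
% For large N the correction term is negligible, so doubling the city's
% population leaves the required n essentially unchanged -- which is why, if
% this form was used, replication in a larger city returns almost the same
% sample size (point 1) and why p must be read as a proportion (point 2).
```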

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
