Peer-Review Record

Super-Resolution-Based Snake Model—An Unsupervised Method for Large-Scale Building Extraction Using Airborne LiDAR Data and Optical Image

Remote Sens. 2020, 12(11), 1702; https://doi.org/10.3390/rs12111702
by Thanh Huy Nguyen 1,2,*, Sylvie Daniel 2, Didier Guériot 1, Christophe Sintès 1 and Jean-Marc Le Caillec 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 18 April 2020 / Revised: 20 May 2020 / Accepted: 22 May 2020 / Published: 26 May 2020
(This article belongs to the Special Issue 3D City Modelling and Change Detection Using Remote Sensing Data)

Round 1

Reviewer 1 Report

The authors have proposed an automatic and unsupervised building extraction method. The presented results look promising and have been compared to several existing approaches in this field. Therefore, I recommend accepting this paper in its present form.

Author Response

We very much appreciate the careful review and the positive comments from the reviewer. Such comments motivate us, as they acknowledge the efforts made by the members of this research project.

Reviewer 2 Report

This is a very interesting paper, very well written and organized, presenting accuracy tests of the algorithm that look very good, as well as real-case results. Nevertheless, I have some remarks that I would like the authors to address. In addition to the points below, comments in the PDF file mark the places that raised doubts.

1- From reading the paper, I understood the following: you have a LiDAR point cloud and interpolate it to densify the height information, which you call SR, obtaining a z-image. Is this a surface? Is it an orthogonal projection like an orthophoto, but with height values instead of true colors? Or is it a central projection like a photograph, with distances to the projection centre instead of true colors? What is it? If it is an orthogonal projection, why is a 3D projection from the point cloud to the z-image needed? If it is a central projection, why create it at all, when the point cloud already has terrain coordinates and you can build a grid with whatever cell size you want and fill the gaps through interpolation?
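A minimal sketch of the gridding alternative the reviewer describes, i.e., rasterizing the LiDAR point cloud onto a regular terrain grid and filling empty cells by interpolation. This is illustrative only, not the authors' implementation; the function name, the cell size, and the highest-return rule are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def point_cloud_to_z_image(points, cell_size=0.5):
    """points: (N, 3) array of x, y, z terrain coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    x_min, y_min = x.min(), y.min()
    n_cols = int(np.ceil((x.max() - x_min) / cell_size)) + 1
    n_rows = int(np.ceil((y.max() - y_min) / cell_size)) + 1

    # Keep the highest return per cell, i.e., a simple digital surface model.
    z_image = np.full((n_rows, n_cols), np.nan)
    cols = ((x - x_min) / cell_size).astype(int)
    rows = ((y - y_min) / cell_size).astype(int)
    for r, c, h in zip(rows, cols, z):
        if np.isnan(z_image[r, c]) or h > z_image[r, c]:
            z_image[r, c] = h

    # Fill cells without any LiDAR return by interpolating from filled cells.
    filled = ~np.isnan(z_image)
    grid_r, grid_c = np.mgrid[0:n_rows, 0:n_cols]
    z_image = griddata((grid_r[filled], grid_c[filled]), z_image[filled],
                       (grid_r, grid_c), method="nearest")
    return z_image
```

Such a grid is an orthogonal raster of heights built directly in terrain coordinates, which is the point of the question: no 3D projection is needed to obtain it.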

2- The 3D projection with the camera pose parameters coming from the registration: which camera? Again, are you working with aerial photos (central projection) or orthophotos (orthogonal projection)? What do you call registration of the optical image? Camera pose parameters, or exterior orientation parameters, are determined during the flight (with GNSS/IMU), by spatial resection (for a single photo), or by aerial triangulation (bundle adjustment of a block of photos). These are not determined for orthophotos; normally, only a similarity plane transformation is.
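For reference, the standard photogrammetric relations behind the distinction drawn here (textbook forms, not taken from the manuscript): a central-projection photograph maps a ground point \((X, Y, Z)\) to image coordinates through the exterior orientation, i.e., the projection centre \((X_0, Y_0, Z_0)\) and a rotation matrix \(R = (r_{ij})\) from ground to image space, via the collinearity equations

\[
x = x_p - c\,\frac{r_{11}(X - X_0) + r_{12}(Y - Y_0) + r_{13}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)},
\qquad
y = y_p - c\,\frac{r_{21}(X - X_0) + r_{22}(Y - Y_0) + r_{23}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)},
\]

where \((x_p, y_p)\) is the principal point and \(c\) the principal distance. An orthophoto, by contrast, is already an orthogonal projection onto the ground plane, so no exterior orientation applies to it; at most, a plane transformation aligns it with other georeferenced data (see the transformations written out in the Round 2 report).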

3- The initialization of the snakes could be briefly explained instead of only pointing to a self-reference. If the initialization is done in the LiDAR point cloud and the snakes evolve in the z-image, why do you have to introduce optical images into the system? If you use them for removing vegetation from the z-image, why do you not explain how you did it? The z-images you show in Figure 4 still contain trees, and I cannot see, in your snake evolution, any differentiation according to whether a pixel lies on a tree or not.

4- Building masks: how are these obtained? I presume they are derived from the initial building points, but you do not show how far these are from the final shape. If they are too far (for instance, as you show in Figure 15a), the building mask will also be too far from the building shape, and in the final steps snake points can be misled into evolving inwards when they should still be evolving outwards. If the building masks are very near the final shape, why not take their contour as the final building contour?

5- A fundamental question that should be mentioned at the beginning and addressed in the Conclusions: the objective of the extracted information. Operators are developed with an objective. The extracted buildings, for instance in Quebec City, will be used for what? Are they good enough for that objective? What was expected: to know the number of buildings, to derive the built-up area, or to map the buildings? Whether the presented building extraction method is accurate or not also depends on the objective. An application is mentioned, flood risk assessment, but this is very vague. Assessment in terms of how many houses can be affected, or flood calculations, where the buildings must be accurately mapped to know how the water is diverted in an urban area?

These are the questions that came to my mind as I analysed your paper, which, I stress, is a good-quality paper. Congratulations on that, and I hope you can address my comments to make it even better and more self-contained.

Comments for author File: Comments.pdf

Author Response

We appreciate the careful review and the invaluable comments from the reviewer. We found them very helpful; they enabled us to make the paper better and more self-contained.

In the attachment, you will find our responses to the reviewer's comments. They are followed by the revised manuscript with the changes highlighted in blue.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Thank you for your answers and the corrections made.

A minor remark I would like the authors to correct: lines 178 to 182 of the revised version. 'Even if the images were orthorectified, or if the external orientation parameters were provided by a GPS/IMU system, such a misalignment cannot be neglected [49,50]. This step is especially necessary when the two datasets are not acquired simultaneously. In the case of orthorectified images, the 3-D projection is a Direct Linear Transformation (DLT) [51], or an orthographic projection if the involved misalignment is not significant.'

First of all, your images are already orthorectified and georeferenced. What you are describing here is the way to colour a point cloud from an aerial image; it has nothing to do with orthophotos and LiDAR, and a DLT is certainly not used to align orthophotos with LiDAR point clouds. The DLT is also a central-projection transformation. Here there is a misalignment, but that is all; you only need to apply a plane transformation (a 2D Helmert or, at most, a 2D affine transformation). Please do not mix translations and rotations with projections. I would suggest removing these lines from your text (line 178, after the full stop, to line 182). The corrections you have made, writing 'transformation parameters' throughout the paper, are fine.
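For clarity, the plane transformations referred to here can be written in their standard forms (not taken from the manuscript). The 2D Helmert (similarity) transformation has four parameters, a translation \((t_x, t_y)\), a scale \(s\), and a rotation \(\theta\):

\[
x' = t_x + s\,(x\cos\theta - y\sin\theta),
\qquad
y' = t_y + s\,(x\sin\theta + y\cos\theta),
\]

while the 2D affine transformation has six parameters:

\[
x' = a_0 + a_1 x + a_2 y,
\qquad
y' = b_0 + b_1 x + b_2 y.
\]

Either can absorb the planimetric misalignment between an orthophoto and a LiDAR-derived raster without invoking any central projection.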

Another point: vegetation removal. You do say, at line 173, that you remove the vegetation based on NDVI computed from an optical image. Now you say that you do not remove vegetation, because the removal creates low-energy pixels that mislead the snakes. I understand your approach, but you should not contradict yourself; please find a way to resolve this contradiction. If you do not remove the vegetation, why use optical images at all?
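For context, a minimal sketch of NDVI-based vegetation masking as it is commonly implemented; the band names and the threshold of 0.3 are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ndvi_vegetation_mask(red, nir, threshold=0.3):
    """Compute NDVI = (NIR - Red) / (NIR + Red) on co-registered bands
    and return a boolean mask that is True on likely vegetation."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = (nir - red) / (nir + red)
    return np.nan_to_num(ndvi, nan=0.0) > threshold

# Possible use: exclude vegetated pixels from the z-image, or simply flag them
# so the snake energy can ignore them instead of being attracted to trees.
# veg = ndvi_vegetation_mask(red_band, nir_band)
```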

After you correct these minor remarks, I think the paper can be published, and I will be delighted to add it to my 'Interesting Papers' directory.

Author Response

Thank you for the careful and invaluable review.

Please see the attachment.

Author Response File: Author Response.pdf
