3D Radiometric Thermography Mosaics with Low-Cost Mobile Sensor Stack
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a workflow for integrating RGB imagery, LiDAR-derived depth maps, and thermal infrared (TIR) data acquired from a low-cost mobile sensor stack into a unified 3D thermographic reconstruction. The topic is timely and relevant, particularly in the context of low-cost sensing and multi-modal data fusion for applications in the built environment and cultural heritage. Overall, the paper is well structured and clearly motivated, and it shows a commendable effort toward developing an open and reproducible workflow. The integration of radiometric thermal data into a 3D framework represents a potentially valuable contribution. At the same time, some aspects of the methodology, especially those related to co-registration and validation, would benefit from further clarification and strengthening in order to fully meet the expected level of rigor for Remote Sensing.
For this reason, I recommend Major Revision, with the aim of improving methodological clarity and robustness.
Co-Registration Methodology
The co-registration of RGB, depth, and thermal data represents the core of the manuscript, and the authors correctly recognize the limitations of simple alignment approaches based on translation and scaling. The introduction of a stereo calibration procedure using a custom thermal-RGB target is a strong and appropriate methodological choice, and the reported reprojection error suggests that the calibration performs reasonably well in image space.
However, the description of the calibration framework would benefit from additional detail. In particular, it would be useful to clarify how intrinsic parameters are modeled for both sensors, including lens distortion, and how reliable feature correspondences are established across modalities with very different spectral characteristics. A brief discussion on the stability of the calibration (for example with respect to acquisition conditions or sensor mounting) would also improve reproducibility.
More importantly, the quantitative validation of the co-registration remains somewhat limited. The current assessment relies mainly on visual inspection and a small set of manually selected points. While these results appear encouraging, a more structured validation, such as the use of independent check points or basic error statistics in object space, would significantly strengthen the conclusions.
In this context, it would also be beneficial to relate the proposed approach to existing multi-sensor co-registration frameworks developed in the UAV domain. Previous studies by Masiero and Cortesi on the integration of thermal, multispectral, and RGB imagery acquired from drone platforms have highlighted the importance of rigorous geometric calibration, careful modeling of extrinsic parameters, and the use of dedicated calibration targets to achieve reliable alignment across modalities. These works emphasize that accurate co-registration cannot rely solely on image-based transformations, but requires a comprehensive characterization of sensor geometry and acquisition conditions. Positioning the present methodology with respect to these established approaches would help to better clarify both its novelty and its current limitations.
The manuscript would also benefit from a clearer discussion of how errors propagate across the multi-sensor processing chain. The workflow involves several sequential steps, each introducing potential sources of uncertainty. Even a qualitative reflection on how these errors may accumulate would help contextualize the reported accuracy and provide a more complete understanding of the system’s performance.
The strategy of reconstructing geometry from RGB images and subsequently replacing them with thermal data is appropriate and consistent with established practices. Nevertheless, its effectiveness depends directly on the precision of the RGB–TIR alignment. Since some residual misalignments are observed, particularly near image edges, it would be useful to briefly quantify these effects and discuss their impact on the final thermographic mosaics.
Finally, the integration of LiDAR-derived depth maps appears to be a useful addition, especially for reconstructing feature-poor surfaces. The current description is largely empirical, and a slightly more explicit explanation of how these data are integrated, along with a simple comparison of results with and without depth information, would make this aspect of the work more generalizable.
Thermographic Aspects
The manuscript provides a useful overview of thermographic acquisition and interpretation, but some aspects could be refined to improve physical clarity. In particular, it would be beneficial to distinguish more explicitly between infrared radiance and temperature, and to more clearly frame the role of emissivity, reflected temperature, and environmental conditions in radiometric measurements.
The radiometric processing pipeline described in the manuscript is interesting and appears to be carefully implemented. Including a compact mathematical formulation, even in simplified form, would improve transparency and reproducibility, especially for readers interested in replicating the approach.
Validation Strategy
The authors appropriately describe their validation as preliminary, and the reported results are encouraging. To further strengthen the work, additional validation strategies could be considered. In particular, the use of contact temperature sensors, such as thermocouples or calibrated surface probes placed at selected locations, could provide independent reference measurements. This would allow a direct comparison between radiometric estimates and physically measured temperatures, thereby improving the reliability of the thermal component of the workflow.
Depth Sensors and Data Sources
The manuscript combines multiple depth sensing approaches, including LiDAR and other iPhone-based systems. While this is an interesting aspect of the work, a clearer distinction between the different sensing principles, resolutions, and expected accuracies would improve readability and help the reader better understand the role of each data source within the pipeline.
Language and Terminology
The manuscript is generally clear and readable. A few expressions could be slightly refined to align more closely with formal scientific language, for example by replacing informal terms such as “black-box AI” with “proprietary processing pipeline.” Similarly, terms such as “survey-grade” or “millimeter-level accuracy” could be more carefully defined or slightly moderated to better reflect the current level of validation.
Final Remarks
The manuscript presents an interesting and promising approach to low-cost multi-sensor integration for 3D thermography. With a more structured validation of the co-registration process, a clearer description of the calibration methodology, and minor refinements in the thermographic formulation and terminology, the work could represent a valuable contribution to the field.
Author Response
Response to Reviewer 1 Comments
Comment 1:
“However, the description of the calibration framework would benefit from additional detail. In particular, it would be useful to clarify how intrinsic parameters are modeled for both sensors, including lens distortion, and how reliable feature correspondences are established across modalities with very different spectral characteristics. A brief discussion on the stability of the calibration (for example with respect to acquisition conditions or sensor mounting) would also improve reproducibility.”
Response 1:
We have now added a description of the camera formulation used and of the model for radial and tangential distortion, together with further discussion of the errors introduced by the standard, purely affine transformation method (lines 436-459). We have also provided more information on the optimization process used to determine the distortion coefficients and stereo parameters (lines 482-496).
“This improved target design allowed for fitting of a stereo calibration model with an overall mean reprojection error of 0.49 pixels across the 51-image calibration set. The intrinsic matrices and distortion coefficients for each camera are estimated using Zhang's planar calibration method \cite{zhang2000flexible}, in which multiple views of a planar target with known feature geometry yield a homography $H_i$ for each view relating target-plane coordinates to image coordinates. Each homography imposes constraints on the image of the absolute conic $\omega = K^{-T} K^{-1}$, and stacking constraints from $N \geq 3$ views allows linear recovery of $\omega$, and hence of $K$, from which per-view extrinsics $(R_i, t_i)$ follow. Distortion coefficients $(k_1, k_2, k_3, p_1, p_2)$ and refined intrinsics are then obtained by nonlinear least-squares minimization of the total reprojection error. Multiview stereo calibration extends this by jointly optimizing both cameras' parameters together with a rigid relative pose $(R_{rel}, t_{rel})$, enforcing $X_{th}=R_{rel}X_{rgb}+t_{rel}$ across all views.”
“With this calibrated model, stereo rectification computes a pair of homographies that warp both images onto a common image plane, undistorting them via the estimated distortion coefficients and aligning corresponding points along horizontal scanlines for direct pixel-to-pixel matching between modalities (Figure \ref{rectified}).”
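As a reading aid for the quoted calibration description, the camera model can be sketched in plain Python. This is a minimal illustration, not the authors' MATLAB pipeline: it assumes the common Brown-Conrady convention for the coefficients (k1, k2, k3, p1, p2) used by OpenCV and MATLAB, a 3x3 intrinsic matrix K with optional skew, and the rigid relation X_th = R_rel X_rgb + t_rel from the quoted text. All helper names are hypothetical, and apart from the 615.54 px focal length the numeric values are placeholders.

```python
import math

def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion
    to normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

def project(point, K, dist):
    """Project a 3D point in the camera frame to distorted pixel coordinates."""
    X, Y, Z = point
    x_d, y_d = distort(X / Z, Y / Z, *dist)
    fx, skew, cx = K[0]
    _, fy, cy = K[1]
    return (fx * x_d + skew * y_d + cx, fy * y_d + cy)

def rgb_to_thermal_frame(point_rgb, R_rel, t_rel):
    """Rigid relative pose between the two cameras: X_th = R_rel X_rgb + t_rel."""
    return tuple(
        sum(R_rel[i][j] * point_rgb[j] for j in range(3)) + t_rel[i]
        for i in range(3)
    )

def mean_reprojection_error(observed_px, projected_px):
    """Mean Euclidean distance (pixels) between detected and reprojected features."""
    return sum(math.dist(o, p) for o, p in zip(observed_px, projected_px)) / len(observed_px)

# Placeholder values: fx = fy = 615.54 px is the thermal focal length quoted in this
# report; the principal point, relative pose and test point are hypothetical.
K_th = [[615.54, 0.0, 320.0], [0.0, 615.54, 240.0], [0.0, 0.0, 1.0]]
R_rel = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_rel = (0.012, 0.0, 0.0)  # metres, hypothetical baseline
p_th = rgb_to_thermal_frame((0.0, 0.0, 1.0), R_rel, t_rel)
print(project(p_th, K_th, (0.0, 0.0, 0.0, 0.0, 0.0)))
```

In a full calibration, projections like these are compared with detected target features, and the intrinsics, distortion coefficients and relative pose are refined by nonlinear least squares so that the mean reprojection error is minimized.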
Comment 2:
“More importantly, the quantitative validation of the co-registration remains somewhat limited. The current assessment relies mainly on visual inspection and a small set of manually selected points. While these results appear encouraging, a more structured validation, such as the use of independent check points or basic error statistics in object space, would significantly strengthen the conclusions.”
Response 2:
We agree with this, and see great potential for future work in integrating thermal-visual control points for process validation. Potential approaches have been added to the discussion section, lines 669 - 679.
Comment 3:
“In this context, it would also be beneficial to relate the proposed approach to existing multi-sensor co-registration frameworks developed in the UAV domain. Previous studies by Masiero and Cortesi on the integration of thermal, multispectral, and RGB imagery acquired from drone platforms have highlighted the importance of rigorous geometric calibration, careful modeling of extrinsic parameters, and the use of dedicated calibration targets to achieve reliable alignment across modalities. These works emphasize that accurate co-registration cannot rely solely on image-based transformations, but requires a comprehensive characterization of sensor geometry and acquisition conditions. Positioning the present methodology with respect to these established approaches would help to better clarify both its novelty and its current limitations.”
Response 3:
Thank you for the suggestion. We have added this work, along with others in this domain, to our related work section (lines 272-278), and related it to our use case on the pulpit, which employed sensors that were not well constrained.
Comment 4:
“The manuscript would also benefit from a clearer discussion of how errors propagate across the multi-sensor processing chain. The workflow involves several sequential steps, each introducing potential sources of uncertainty. Even a qualitative reflection on how these errors may accumulate would help contextualize the reported accuracy and provide a more complete understanding of the system’s performance.”
Response 4:
We have now added a discussion of potential sources of error along the pipeline (lines 680-693). Errors from the radiometric corrections and from the subsequent pipeline stages can be decoupled to some extent, since we assume the correction variables are constant across the entire image. The subsequent error sources are typical of most multi-view stereo rig photogrammetry pipelines: calibration error (which we have quantified in the Methods section) and errors in the photogrammetry pipeline itself (generally reported by most commercial photogrammetry suites). Greater automation and control over the full pipeline, together with a means of quantifying thermal error (as described in the response to Comment 2), would certainly be a valuable direction for future work.
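A simple way to make this qualitative error-propagation argument concrete is a root-sum-square error budget. This sketch assumes the stage errors are independent, zero-mean, and expressed in the same units at the object surface; correlated errors (for example a biased relative pose that affects every view) would not combine this way. The stage names and values in the usage lines are hypothetical.

```python
import math

def error_budget(sources):
    """Combine independent 1-sigma errors (same units) in quadrature.

    Returns the total 1-sigma error and each source's share of the total
    variance, which indicates which pipeline stage dominates.
    """
    total_var = sum(s * s for s in sources.values())
    shares = {name: (s * s) / total_var for name, s in sources.items()}
    return math.sqrt(total_var), shares

# Hypothetical per-stage errors at the surface, in millimetres.
total, shares = error_budget({"stereo registration": 0.8, "photogrammetry": 2.5})
print(f"total ~ {total:.2f} mm", {k: round(v, 2) for k, v in shares.items()})
```

When one term is several times larger than the others, its share of the variance approaches one, so improving the smaller terms barely changes the total.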
Comment 5:
“The strategy of reconstructing geometry from RGB images and subsequently replacing them with thermal data is appropriate and consistent with established practices. Nevertheless, its effectiveness depends directly on the precision of the RGB–TIR alignment. Since some residual misalignments are observed, particularly near image edges, it would be useful to briefly quantify these effects and discuss their impact on the final thermographic mosaics.”
Response 5:
We initially provided only the reprojection error from stereo calibration, which generally dominates the registration error between the RGB and thermal image views. Following this suggestion, we have added commentary on its practical effects at varying imaging distances at the end of Section 3.4, along with a table of values (lines 497-509):
“The mean reprojection error of 0.49 pixels across the calibration set indicates a well-fit model and bounds the expected 2D registration accuracy between the rectified RGB and thermal images. The corresponding metric registration error at the surface scales linearly with object distance as
\begin{equation}
\sigma_{xy} = \frac{Z \cdot \sigma_{px}}{f}
\end{equation}
\noindent where $Z$ is the object distance, $\sigma_{px} = 0.49$ is the mean reprojection error, and $f=615.54$px is the thermal camera focal length recovered from calibration. Evaluated across typical indoor inspection distances (Table \ref{regerr}), the surface registration error remains on the order of a few millimeters or less, which is substantially smaller than the geometric uncertainty of the photogrammetric reconstruction itself. The calibration is therefore not the limiting factor in the accuracy of the final textured thermal model; thermal features can be expected to land where they belong on the reconstructed surface, with overall metric accuracy governed by the RGB photogrammetry pipeline rather than by the stereo registration step.”
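The registration-error equation above is easy to tabulate. A minimal sketch, using the quoted values sigma_px = 0.49 px and f = 615.54 px; the distances are illustrative rather than those in the manuscript's table, and f is assumed to be in the pixel units of the images on which the reprojection error was measured.

```python
def surface_registration_error_mm(distance_m, sigma_px=0.49, focal_px=615.54):
    """sigma_xy = Z * sigma_px / f, with Z converted from metres to millimetres."""
    return distance_m * 1000.0 * sigma_px / focal_px

for z in (0.5, 1.0, 2.0, 3.0):  # illustrative object distances in metres
    print(f"Z = {z:.1f} m -> sigma_xy = {surface_registration_error_mm(z):.2f} mm")
```

Because the relation is linear in Z, doubling the object distance doubles the expected surface error; at 1 m it is a little under 0.8 mm.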
Comment 6:
“Finally, the integration of LiDAR-derived depth maps appears to be a useful addition, especially for reconstructing feature-poor surfaces. The current description is largely empirical, and a slightly more explicit explanation of how these data are integrated, along with a simple comparison of results with and without depth information, would make this aspect of the work more generalizable.”
Response 6:
We have detailed this further in the Discussion (lines 696-713), where we elaborate on the limitations of the method and on how depth is weighted from different sources (direct measurements vs. photo-derived measurements). This integration certainly needs to be improved and refined in our subsequent work. We have also introduced a table (Table 3 in the revised manuscript) that better describes the various depth sources.
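The response does not specify the weighting scheme. One common option, shown here purely as an illustration and not as the authors' implementation, is inverse-variance weighting, which is optimal for independent, unbiased Gaussian errors. The per-source sigmas are assumptions; in practice they might come from a per-pixel confidence map or from the expected accuracies summarized in the depth-source table.

```python
def fuse_depths(estimates):
    """Inverse-variance weighted fusion of independent depth estimates for one point.

    estimates: list of (depth_m, sigma_m) pairs, e.g. one from the LiDAR depth map
    and one from the photogrammetric reconstruction.
    Returns (fused_depth_m, fused_sigma_m).
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    w_sum = sum(weights)
    fused = sum(w * depth for w, (depth, _) in zip(weights, estimates)) / w_sum
    return fused, (1.0 / w_sum) ** 0.5

# Hypothetical values: a direct LiDAR reading and a noisier photo-derived depth.
print(fuse_depths([(1.52, 0.01), (1.48, 0.03)]))
```

The fused estimate leans toward the more certain source, and its sigma is never larger than the smallest input sigma.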
Comment 7:
“The manuscript provides a useful overview of thermographic acquisition and interpretation, but some aspects could be refined to improve physical clarity. In particular, it would be beneficial to distinguish more explicitly between infrared radiance and temperature, and to more clearly frame the role of emissivity, reflected temperature, and environmental conditions in radiometric measurements.”
Response 7:
Thank you. We have added information on the thermal image formulation and stronger definitions of the relevant terminology in Section 1.1 (IR Thermography) of the introduction, including details on the imaging characteristics of radiometric thermal sensors (lines 105-127), as well as toward the end of Section 3.2, TIR Data Preparation (lines 416-424).
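A compact form of the radiometric conversion often used with FLIR radiometric images can be sketched as follows, to illustrate the radiance/temperature distinction and the roles of emissivity and reflected temperature. It assumes the per-camera Planck constants R1, R2, B, F and O stored in the image metadata, treats raw counts as an affine function of in-band radiance, and neglects atmospheric attenuation (transmission of about 1), which is reasonable only at short range. The constants are placeholders rather than FLIR One Pro values, and this is not necessarily the exact formulation in the authors' scripts.

```python
import math

def raw_from_temperature(t_kelvin, R1, R2, B, F, O):
    """Forward Planck-type model: raw = R1 / (R2 * (exp(B / T) - F)) - O."""
    return R1 / (R2 * (math.exp(B / t_kelvin) - F)) - O

def temperature_from_raw(raw, R1, R2, B, F, O):
    """Exact inverse of raw_from_temperature; returns temperature in kelvin."""
    return B / math.log(R1 / (R2 * (raw + O)) + F)

def object_temperature(raw_measured, emissivity, t_reflected_k, planck):
    """Remove the reflected component in the raw (radiance-like) domain,
    assuming transmission ~ 1: raw_meas = e * raw_obj + (1 - e) * raw_refl."""
    raw_refl = raw_from_temperature(t_reflected_k, **planck)
    raw_obj = (raw_measured - (1.0 - emissivity) * raw_refl) / emissivity
    return temperature_from_raw(raw_obj, **planck)

# Placeholder constants; real values are read per camera from the image metadata.
PLANCK = {"R1": 21106.77, "R2": 0.012545258, "B": 1501.0, "F": 1.0, "O": -7340.0}

# Simulate a 300 K surface with emissivity 0.95 reflecting a 293.15 K environment.
raw_true = raw_from_temperature(300.0, **PLANCK)
raw_reflected = raw_from_temperature(293.15, **PLANCK)
raw_measured = 0.95 * raw_true + 0.05 * raw_reflected
t_obj = object_temperature(raw_measured, 0.95, 293.15, PLANCK)
print(f"{t_obj - 273.15:.2f} °C")  # recovers 26.85 °C (300 K)
```

Because emitted and reflected radiation mix in the radiance domain, the correction is applied to raw counts before conversion; correcting temperatures directly would be wrong, since temperature is a nonlinear function of radiance.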
Comment 8:
“The radiometric processing pipeline described in the manuscript is interesting and appears to be carefully implemented. Including a compact mathematical formulation, even in simplified form, would improve transparency and reproducibility, especially for readers interested in replicating the approach.”
Response 8:
The process has been detailed more thoroughly through the changes made in response to the previous comment. We have also added a figure (Fig. 10) containing a flowchart that describes the entire process in order.
Comment 9:
“The authors appropriately describe their validation as preliminary, and the reported results are encouraging. To further strengthen the work, additional validation strategies could be considered. In particular, the use of contact temperature sensors, such as thermocouples or calibrated surface probes placed at selected locations, could provide independent reference measurements. This would allow a direct comparison between radiometric estimates and physically measured temperatures, thereby improving the reliability of the thermal component of the workflow.”
Response 9:
Agreed; this was addressed in the additions made in response to Comments 2 and 4. The new additions to the Discussion propose a potential method for validating thermal and multi-view registration accuracy in the final models, extending common methods for establishing spatial accuracy with the addition of measurable thermal control points.
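The basic object-space and thermal statistics that such a validation would report can be sketched as follows. This assumes checkpoint coordinates measured independently of the model (for example by a total station) and contact-probe readings co-located with radiometric samples; the function names and all data are hypothetical.

```python
import math

def checkpoint_stats(model_pts, reference_pts):
    """Object-space residuals at independent 3D checkpoints (same units as inputs).

    Returns (mean error, RMSE, maximum error) of the Euclidean distances.
    """
    errors = [math.dist(m, r) for m, r in zip(model_pts, reference_pts)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return sum(errors) / n, rmse, max(errors)

def thermal_agreement(radiometric_c, contact_c):
    """Bias (mean difference) and RMSE of radiometric vs. contact-probe temperatures, in °C."""
    diffs = [r - c for r, c in zip(radiometric_c, contact_c)]
    n = len(diffs)
    return sum(diffs) / n, math.sqrt(sum(d * d for d in diffs) / n)

# Hypothetical checkpoint coordinates (mm) and co-located temperatures (°C).
model = [(0.0, 0.0, 0.0), (1000.0, 2.0, 1.0)]
reference = [(1.0, 0.0, 0.0), (1000.0, 0.0, 0.0)]
print(checkpoint_stats(model, reference))
print(thermal_agreement([21.4, 23.9, 19.8], [21.0, 23.5, 19.9]))
```

Reporting bias separately from RMSE distinguishes a systematic radiometric offset (for example a wrong emissivity setting) from random scatter.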
Comment 10:
“The manuscript combines multiple depth sensing approaches, including LiDAR and other iPhone-based systems. While this is an interesting aspect of the work, a clearer distinction between the different sensing principles, resolutions, and expected accuracies would improve readability and help the reader better understand the role of each data source within the pipeline.”
Response 10:
A new table has now been added to section 3.1.3 detailing the different modalities available for retrieving depth data from an iPhone, including some additional information on resolution, expected accuracies, sensor types, and qualitative descriptions.
Comment 11:
“The manuscript is generally clear and readable. A few expressions could be slightly refined to align more closely with formal scientific language, for example by replacing informal terms such as “black-box AI” with “proprietary processing pipeline.” Similarly, terms such as “survey-grade” or “millimeter-level accuracy” could be more carefully defined or slightly moderated to better reflect the current level of validation.”
Response 11:
Agreed. Several terms have been replaced per this guidance: “black-box” is now “proprietary”, “survey-grade” has been removed entirely (an especially problematic term in the USA), and “millimeter-level” has been rephrased in multiple places to be more specific to the orders of magnitude demonstrated in the results.
Reviewer 2 Report
Comments and Suggestions for Authors
This paper addresses the fusion and image mosaic methodologies for RGB imagery, LiDAR data, and thermal infrared (TIR) imagery, constituting a technically meaningful study. Nevertheless, the manuscript requires further refinement and enhancement to meet publication standards:
1) The primary contribution of this work resides in the proposition of a data processing framework with certain referential significance, yet no substantive theoretical or technical innovations are embodied. While the authors place emphasis on image mosaic, the detailed mosaic workflow is insufficiently elaborated, and no raw datasets dedicated to the mosaic task are presented. It is recommended that the authors supplement one to two groups of raw data acquired by the three aforementioned sensors, and furnish a detailed flowchart delineating the entire mosaic pipeline.
2) The sensors integrated with the iPhone, particularly the LiDAR and TIR modules, exhibit relatively restricted measurement range and spatial resolution. However, the dimensions of the targets depicted in Figures 12 and 13 appear considerably large. The authors are advised to specify the scale parameters of these targets. For scenarios where the full target cannot be captured in a single frame, in addition to exhibiting partial raw data, the approaches adopted to generate the final complete mosaic imagery should be explicitly expounded.
3) Given that this paper underscores the significance of thermal infrared image mosaic, the authors are expected to present a representative case to demonstrate the superiority of the proposed method. For example, thermal anomalies and associated detailed features that are undetectable in standalone raw visible or TIR imagery can be clearly visualized via the method developed in this study.
4) Within the Discussion section, the authors should intensively elaborate on the limitations of the proposed methodology and the inherent drawbacks of iPhone-integrated sensors, such as the incapability to perform detection for large-extent outdoor scenes, among other constraints.
Author Response
Response to Reviewer 2 Comments
Comment 1:
“The primary contribution of this work resides in the proposition of a data processing framework with certain referential significance, yet no substantive theoretical or technical innovations are embodied. While the authors place emphasis on image mosaic, the detailed mosaic workflow is insufficiently elaborated, and no raw datasets dedicated to the mosaic task are presented. It is recommended that the authors supplement one to two groups of raw data acquired by the three aforementioned sensors, and furnish a detailed flowchart delineating the entire mosaic pipeline.”
Response 1:
All datasets were already included as openly licensed, downloadable links, referenced multiple times (Appendix A, line 635). Beyond being available for download, the two case studies are accessible through a streaming web viewer (line 507), where users can query and measure the datasets at full resolution. To make this clearer, we have also added a note about data availability to the abstract (lines 33-35).
Comment 2:
“The sensors integrated with the iPhone, particularly the LiDAR and TIR modules, exhibit relatively restricted measurement range and spatial resolution. However, the dimensions of the targets depicted in Figures 12 and 13 appear considerably large. The authors are advised to specify the scale parameters of these targets. For scenarios where the full target cannot be captured in a single frame, in addition to exhibiting partial raw data, the approaches adopted to generate the final complete mosaic imagery should be explicitly expounded.”
Response 2:
Great point. We have added these basic notes to the text. Range is certainly a limitation when the LiDAR is employed, and we have made these limitations clearer in our descriptions of the LiDAR sensors by adding a table (Table 3). There is, however, no stated range cutoff for the thermal sensor, which has its focus set to infinity; the limiting factor is instead resolution, which we describe in terms of GSD. We have added further descriptions in the text at lines 626-628 for the pulpit and lines 596-597 for the Geisel Library.
We also added information about the thermal sensor, noting that it does not have an explicit range but is instead limited by its resolution (line 310).
Comment 3:
“ Given that this paper underscores the significance of thermal infrared image mosaic, the authors are expected to present a representative case to demonstrate the superiority of the proposed method. For example, thermal anomalies and associated detailed features that are undetectable in standalone raw visible or TIR imagery can be clearly visualized via the method developed in this study.”
Response 3:
We believe our objective has been misinterpreted; it is stated clearly in the main findings (lines 3-13), abstract (lines 15-36), and introduction (lines 39-161). We do not feel the need to justify creating mosaics beyond what we have already stated. We are not claiming that mosaicing allows users to see anomalies that cannot be perceived by other methods; the point is accurate contextualization, as described in our introduction and findings. For documentation and diagnosis, a high-resolution overview is very useful and often superior to a low-resolution image from a very limited viewpoint, and the way to achieve a high-resolution view of a larger space is to combine images as we do. A single image mosaic is easier to contextualize than hundreds of individual inputs. If an anomaly fills the entire view, it may not be recognized as an anomaly; if a small anomaly is viewed without its surroundings, it cannot be interpreted. Higher dimensionality (3D over 2D) further enables more effective contextualization of images within the space. By mosaicing the data in 3D, we can move between scales in a way that is not possible in person, and we expect to be able to measure the positions of non-visible features against visible ones with some level of confidence.
Comment 4:
“ Within the Discussion section, the authors should intensively elaborate on the limitations of the proposed methodology and the inherent drawbacks of iPhone-integrated sensors, such as the incapability to perform detection for large-extent outdoor scenes, among other constraints.”
Response 4:
Yes, we agree completely. We have added in-depth descriptions of the limitations and the need for further development to the Discussion (lines 674-686 and 701-718) and, as stated earlier, have added text and tables that better describe the system's depth-related limitations. We also added a short note to the abstract (lines 29-30) and the “Highlights” section (lines 9-11) to make the “short range” scope of the method explicit.
Reviewer 3 Report
Comments and Suggestions for Authors
Summary
This manuscript presents a well-executed methodology for creating 3D radiometric thermography mosaics using low-cost mobile sensors, specifically, the iPhone 12 Pro (RGB + LiDAR) and FLIR One Pro thermal camera. The authors introduce a stereo calibration model for multi-modal registration, an open-source pipeline for extracting radiometric thermal data, and demonstrate the approach through two case studies: the Geisel Library (exterior) and the Brunelleschi Pulpit in Florence (interior/heritage). The work achieves millimeter-level registration accuracy and maintains full 16-bit thermal data integrity throughout the processing chain. The manuscript is technically rigorous, ethically transparent about AI use, and makes a valuable contribution to accessible thermal diagnostics.
Major Strengths
- Technical rigor: The calibration methodology (Section 3.3) is exceptionally detailed, including ANSYS thermal simulation of the calibration target (Figure 7), MATLAB stereo calibration with 0.49-pixel reprojection error, and careful validation.
- Open-source contribution: The authors provide custom scripts for FLIR radiometric data extraction, depth map capture, and batch processing, enabling full reproducibility. The datasets are made publicly available with DOIs (Appendices A-C).
- Transparent AI disclosure (exemplary): Section 3.6 explicitly states the use of Claude.ai for literature review, grammar correction, and formatting. The authors take full responsibility for the content. This sets a high standard for ethical AI use in academic publishing.
- Practical validation: Two diverse case studies demonstrate the method's applicability to both exterior (Geisel Library) and interior/heritage (Brunelleschi Pulpit) contexts. Quantitative error analysis (Table 3) shows an average 3D registration error of 2.77mm, consistent with expected GSD ranges.
- Honest discussion of limitations: Section 5 candidly addresses unexplained ARKit depth map errors, challenges with featureless surfaces, the need for improved atmospheric correction, and the limitations of the current approach.
Minor Suggestions for Improvement
- Figures: Several figures referenced in the text (e.g., Fig. 7, Fig. 8, Fig. 9, Fig. 10, Fig. 11) appear to be missing from the provided PDF. Please ensure all figures are included with proper captions.
- Section numbering: There is inconsistency in section numbering (3.1.1, 3.2.1, then 3.1.2). Please renumber for logical flow.
- Uncited references: References [39] and [40] appear in the reference list but are not cited in the text. Please either cite them appropriately or remove.
- Equation (1) units: The GSD formula should specify units for pixel pitch and focal length to avoid ambiguity. Consider adding: "where pixel pitch and focal length are in consistent units (e.g., mm)."
- Table 2 GSD calculation: The GSD at 0.3m is 2.75mm and at 0.5m is 3.3mm – the progression appears non-linear. Please verify the calculations or provide the formula used.
- Section 3.4 heading: The heading "Depth Image and depth pairs" appears truncated. Please correct to "Depth Image and Depth Pairs" or a more descriptive title.
Conclusion
This is a technically strong, well-validated, and ethically transparent manuscript that makes a genuine contribution to accessible thermal diagnostics. The open-source tools and public datasets enhance its value to the community. I recommend acceptance after minor revision.
Comments on the Quality of English Language
The English is clear, professional, and requires no further improvement. I think the authors have used AI tools for grammar correction, which has resulted in polished text.
Author Response
Response to Reviewer 3 Comments
Comment 1:
“Several figures referenced in the text (e.g., Fig. 7, Fig. 8, Fig. 9, Fig. 10, Fig. 11) appear to be missing from the provided PDF. Please ensure all figures are included with proper captions.”
Response 1:
This was resolved during the transition to the MDPI LaTeX template, together with a careful review of all figure references. Thank you.
Comment 2:
“There is inconsistency in section numbering (3.1.1, 3.2.1, then 3.1.2). Please renumber for logical flow.”
Response 2:
Corrected. Ordering is now:
3.1.1. FLIR One Pro Hardware
3.1.2. iPhone Camera Hardware
3.1.3. iPhone Depth Sensors
3.2. TIR Data Preparation
3.3. Calibration and Transformation
3.4. Calibration as a Multi-View Rig
Comment 3:
“References [39] and [40] appear in the reference list but are not cited in the text. Please either cite them appropriately or remove.”
Response 3:
Similarly to comment 1, this has now been fixed along with the transition over to LaTeX. Unused citations have been removed, and a few new ones were added. Everything in the bibliography should now have an associated in-text citation.
Comment 4:
“The GSD formula should specify units for pixel pitch and focal length to avoid ambiguity. Consider adding: "where pixel pitch and focal length are in consistent units (e.g., mm)."
Response 4:
We’ve now stated this more explicitly (lines 322 - 324).
Comment 5:
“The GSD at 0.3m is 2.75mm and at 0.5m is 3.3mm – the progression appears non-linear. Please verify the calculations or provide the formula used.”
Response 5:
Good catch. We have rechecked and redone the calculations, and there were indeed some transcription errors. The table has been updated with the corrected values, computed in accordance with the provided formula.
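The reviewer's linearity check can be automated. For a fixed camera, GSD = Z * p / f, with pixel pitch p and focal length f in the same units, so GSD / Z must be constant across the rows of a GSD table. A minimal sketch; the function names are hypothetical, and the pixel pitch and focal length in the test values are illustrative, not the FLIR One Pro specification.

```python
def gsd_mm(distance_m, pixel_pitch_mm, focal_length_mm):
    """GSD = Z * p / f, with pixel pitch and focal length in the same units (mm)."""
    return distance_m * 1000.0 * pixel_pitch_mm / focal_length_mm

def check_gsd_table(rows, rel_tol=0.02):
    """rows: list of (distance_m, gsd_mm) for one camera.

    GSD / Z should be constant, so return the rows whose ratio deviates from
    the first row's ratio by more than rel_tol (relative).
    """
    ref = rows[0][1] / rows[0][0]
    return [(z, g) for z, g in rows if abs(g / z - ref) / ref > rel_tol]

# The two values questioned in Comment 5 are not proportional to distance.
print(check_gsd_table([(0.3, 2.75), (0.5, 3.3)]))  # -> [(0.5, 3.3)]
```

A table generated directly from the formula passes the check by construction, so any flagged row points to a transcription or calculation error.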
Comment 6:
“The heading "Depth Image and depth pairs" appears truncated. Please correct to "Depth Image and Depth Pairs" or a more descriptive title.”
Response 6:
Another good catch. The title of this subsection has been changed to “Depth Map Integration,” and the following sentence now begins “RGB image and depth pairs from the iPhone 12 wide lens… ”
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all the questions raised in the previous review, but the revisions made to the paper are limited, and the modified sections are not marked in the text. As a result, the revised manuscript is unsatisfactory. For example:
1) The paper's title uses Mosaic as a keyword, so the authors should focus on elaborating how large-scale targets as shown in Figures 13 and 17 are reconstructed using low-cost sensors, and demonstrate the advantages of the proposed method through examples or comparisons. However, in Section 3, the authors mainly discuss the registration and Integration of three data sources, while the actual mosaicing is implemented using existing software.
2) The authors were expected to provide a flowchart of the entire method, but they did not, nor did they clarify the advantages of the proposed method in its application fields. In responding to the third comment, the authors claimed that I misunderstood the research objective of this paper. Even so, the authors should still provide comparisons between their method and existing ones. Instead, the experimental section mainly presents the mosaicing results.
3) Comment 1 requested the authors to supplement raw data to help readers understand the data acquisition of the entire scene. However, the authors did not provide such supplementary data and only offered an accessible link. Furthermore, the dataset in the link is rather large, so general readers will not spend time downloading it.
Overall, the manuscript does not sufficiently highlight its key contributions and innovations. Only Section 3.4 is relatively convincing.
Author Response
Please see the attachment.
Author Response File:
Author Response.docx

