Camera Calibration Optimization Algorithm Based on Nutcracker Optimization Algorithm
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article proposes a camera calibration optimization algorithm based on the Steller Jay optimization algorithm, which improves calibration accuracy and stability by combining chaotic mapping and sine cosine optimization strategies. The paper addresses an important problem but falls short in technical rigor, experimental validation, and presentation. The following revisions are critical:
1. The claimed superiority of NCS over existing methods lacks sufficient justification. The paper does not adequately situate itself within recent advancements in optimization-based calibration. A deeper literature review and comparison with state-of-the-art methods are necessary.
2. The chaotic mapping and SCA integration lacks theoretical motivation. Why is Chebyshev mapping preferred over other chaotic systems? How does SCA complement NOA's exploration-exploitation balance? A mathematical analysis of convergence or computational complexity is missing.
3. Comparisons are limited. Include widely-used baselines like Particle Swarm Optimization (PSO), Genetic Algorithms (GA), or recent deep learning-based calibration methods to strengthen claims.
4. The conclusion overstates the results without addressing limitations. For example, the computational overhead of NCS versus traditional methods is ignored. Discuss trade-offs between accuracy and runtime. Future work suggestions are generic. Specify how the algorithm could be extended to multi-camera systems or dynamic environments with moving calibration targets.
5. Tables 2–4 are poorly formatted, with misaligned columns and ambiguous labels (e.g., "Method Method Method"). Use clear headings (e.g., "Algorithm," "Error after 500 iterations") and ensure numerical precision (e.g., excessive decimal places in Table 1).
I recommend a careful revision to address these issues, as the core idea has potential but is currently underdeveloped.
Author Response
Comment 1: The claimed superiority of NCS over existing methods lacks sufficient justification. The paper does not adequately situate itself within recent advancements in optimization-based calibration. A deeper literature review and comparison with state-of-the-art methods are necessary.
Response:
Thank you for your suggestions. Regarding this issue, we have added relevant research content on camera calibration in the introduction section, marked in blue.
Comment 2: The chaotic mapping and SCA integration lacks theoretical motivation. Why is Chebyshev mapping preferred over other chaotic systems? How does SCA complement NOA’s exploration-exploitation balance? A mathematical analysis of convergence or computational complexity is missing.
Response:
Thank you for your suggestions. Regarding this issue, we have added the following explanatory content in Section 2.2:
Among various chaotic systems (e.g., Logistic, Tent, Gauss, and Chebyshev), the Chebyshev map was selected for its favorable properties in population-based optimization. It exhibits high sensitivity to initial conditions, strong ergodicity, and a higher Lyapunov exponent than most other chaotic systems, enabling the generation of diverse initial populations. Compared with the Logistic map, the Chebyshev map avoids short periodic orbits and better maintains randomness, which is critical for escaping local optima during optimization. The Chebyshev map is based on Chebyshev polynomials and exhibits strong randomness and nonlinearity; its standard formulation is x_{k+1} = cos(n · arccos(x_k)), with x_k ∈ [-1, 1] and polynomial order n.
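For illustration, the chaotic initialization described above can be sketched as follows. The population size, search bounds, and polynomial order used here are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def chebyshev_init(pop_size, dim, lower, upper, degree=4, seed=0):
    """Generate an initial population with the Chebyshev chaotic map:
    x_{k+1} = cos(degree * arccos(x_k)), with x in [-1, 1]."""
    rng = np.random.default_rng(seed)
    # Random starting point in (-1, 1), avoiding the fixed points +/-1.
    x = rng.uniform(-0.99, 0.99, size=dim)
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = np.cos(degree * np.arccos(x))  # one chaotic iteration per row
        # Map the chaotic values from [-1, 1] into the search bounds.
        pop[i] = lower + (x + 1.0) / 2.0 * (upper - lower)
    return pop

# Illustrative usage: 30 candidates in a 5-dimensional box [-10, 10]^5.
pop = chebyshev_init(30, 5, np.full(5, -10.0), np.full(5, 10.0))
```

Seeding the population this way spreads candidates across the search space, which is the motivation the response gives for preferring the Chebyshev map over plain random initialization.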
In Section 2.3
While NOA performs well in global exploration through foraging and caching behavior, its local exploitation ability may weaken in the later stages of iteration. The integration of the Sine Cosine Algorithm (SCA) addresses this issue by introducing a deterministic yet oscillatory update mechanism that fine-tunes candidate solutions near promising regions. This hybrid strategy leverages the global exploration ability of NOA and the local refinement of SCA, thereby enhancing the overall balance between exploration and exploitation [14].
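A minimal sketch of the SCA refinement step described above is given below. The position-update equations and the linearly decaying r1 schedule follow the standard SCA formulation; all names and default values are illustrative assumptions:

```python
import numpy as np

def sca_step(pop, best, t, T, a=2.0, rng=None):
    """One Sine Cosine Algorithm update of the population toward `best`.

    r1 decays linearly from `a` to 0 over T iterations, shifting the
    search from large oscillations (exploration) to exploitation."""
    if rng is None:
        rng = np.random.default_rng()
    r1 = a - t * (a / T)                      # oscillation amplitude schedule
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=pop.shape)
    r3 = rng.uniform(0.0, 2.0, size=pop.shape)
    r4 = rng.uniform(0.0, 1.0, size=pop.shape)
    step = np.abs(r3 * best - pop)            # distance to the best solution
    sine_move = pop + r1 * np.sin(r2) * step
    cosine_move = pop + r1 * np.cos(r2) * step
    # Each coordinate chooses the sine or cosine branch at random.
    return np.where(r4 < 0.5, sine_move, cosine_move)
```

In the hybrid scheme, a step like this would refine candidates produced by the NOA phase near the current best solution; note that at t = T the amplitude r1 reaches zero and the update leaves the population unchanged.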
In Section 4.2
Although a strict mathematical proof of convergence is challenging due to the stochastic nature of metaheuristic algorithms, we provide a qualitative convergence analysis. The proposed NCS algorithm combines three modules: chaotic initialization, NOA exploration, and SCA-based refinement. Each module satisfies necessary conditions for global search completeness.
Regarding computational complexity, let N be the population size, D the problem dimension, and T the number of iterations. The overall complexity is then O(T · N · D), consistent with standard population-based optimizers. The Chebyshev mapping and SCA modules introduce only a small constant overhead per iteration, which is negligible compared to the gains in convergence stability and accuracy.
Comment 3: Comparisons are limited. Include widely-used baselines like Particle Swarm Optimization (PSO), Genetic Algorithms (GA), or recent deep learning-based calibration methods to strengthen claims.
Response:
Thank you for your suggestions. The proposers of the methods we compare against, such as SMA, NOA, and ORSMA, have already compared them with methods such as PSO and demonstrated their superiority. We therefore compare directly against these stronger baselines to showcase our advantages.
Comment 4: The conclusion overstates the results without addressing limitations. For example, the computational overhead of NCS versus traditional methods is ignored. Discuss trade-offs between accuracy and runtime.Future work suggestions are generic. Specify how the algorithm could be extended to multi-camera systems or dynamic environments with moving calibration targets.
Response:
Thank you for your suggestions.
Compared with classical calibration methods (such as Zhang's method or least-squares-based optimization), the use of chaotic initialization and hybrid metaheuristics inevitably increases running time, especially when the problem dimension or population size is large. However, in engineering applications camera calibration is a one-time preparatory step performed before the camera is deployed. Although our method increases calibration time, it achieves better results, and no additional computing resources are consumed in the subsequent work. We consider this trade-off necessary and worthwhile.
As for future work, our goal is to extend the proposed framework to more complex and practical environments. For example, by embedding geometric constraints between cameras in the objective function, the algorithm can be adapted to multi-camera systems to achieve joint estimation of intrinsic and extrinsic parameters. Furthermore, in dynamic environments where the calibration target may be in motion (for example, robotic-arm calibration or mapping based on unmanned aerial vehicles), our method can be extended to support online calibration by combining real-time feature tracking with incremental parameter-update strategies. These extensions will further enhance the practicality and generality of our method. However, the goal of this article is the optimization of camera calibration, so this subsequent work is not elaborated here. Thank you for your suggestions, which have provided us with more ideas for future work.
Comment 5: Tables 2-4 are poorly formatted, with misaligned columns and ambiguous labels (e.g., "Method Method Method"). Use clear headings (e.g., "Algorithm," "Error after 500 iterations") and ensure numerical precision (e.g., excessive decimal places in Table 1).
Response:
Thank you for your suggestions.
Corresponding modifications have been made in Tables 2-4, and the precision of the parameters has been unified, retaining four decimal places.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
Some of the remarks are embedded in the attached manuscript pdf.
There is a major issue with the manuscript I want to address. While the language is fine, the way the content is presented makes it very hard for the reader to follow your arguments. In Sections 4 and 5 everything was presented in a clear and straightforward way. But in Sections 2 and 3 I'm really missing an "explanatory" approach to presenting the matter. I know that the math or logic behind the algorithms may be complicated, but it's your task as an author to either explain it well to the reader or perhaps just to skip the details. Giving the details without explaining or motivating them is rather pointless.
Please bring a better explanatory structure into your manuscript Sections 2/3 and attend to the image/table properties as commented in the attached pdf.
Comments for author File: Comments.pdf
Author Response
Comments and Suggestions for Authors
Dear authors,
Some of the remarks are embedded in the attached manuscript pdf.
There is a major issue with the manuscript I want to address. While the language is fine, the way the content is presented makes it very hard for the reader to follow your arguments. In Sections 4 and 5 everything was presented in a clear and straightforward way. But in Sections 2 and 3 I'm really missing an "explanatory" approach to presenting the matter. I know that the math or logic behind the algorithms may be complicated, but it's your task as an author to either explain it well to the reader or perhaps just to skip the details. Giving the details without explaining or motivating them is rather pointless.
Please bring a better explanatory structure into your manuscript Sections 2/3 and attend to the image/table properties as commented in the attached pdf.
Response:
Thank you for your suggestions.
In response, we have substantially revised Sections 2 and 3 to improve the logical flow and readability. The key revisions include:
- Added overviews at the beginning of both sections to explain the motivation for introducing chaotic mapping and hybrid optimization, before diving into technical details.
- Inserted intuitive explanations before each mathematical expression or algorithmic step, so that every formula or component serves a clearly stated purpose.
The relevant modifications are highlighted in the text.
Reviewer 3 Report
Comments and Suggestions for Authors
Dear editor:
In this paper, the authors propose a camera calibration optimization algorithm based on the Steller Jay optimization algorithm. However, the description of the method and the experimental verification are insufficient, leaving many points that need to be clarified.
- In Equation 4, "if otherwise" should be revised to "otherwise" to maintain consistency.
- In Equation 19, a comma (",") should be inserted between k2 and k3.
- The paper dedicates a significant portion to introducing existing techniques while providing a relatively brief explanation of the proposed algorithm. It is recommended to expand the description of the proposed method.
- The abstract states that the paper proposes a camera calibration optimization algorithm based on the Steller Jay optimization algorithm, whereas the title is "Camera Calibration Optimization Algorithm Based on the Starling-Inspired Strategy," which leads to an inconsistency in content.
- In line 55, NOA is defined as the star crow optimization algorithm, while in lines 77-78, it is defined as the Nutcracker optimization algorithm, causing an inconsistency.
- Figure 3 appears to have limited significance, and an explanation for its inclusion is required.
- The values in Table 1 - Table 4 should maintain the same number of decimal places. Table 4 seems to serve little purpose when placed separately and could be deleted, with its content incorporated into Table 2 and Table 3.
- The authors should provide a detailed description of the experimental environment and parameters, and conduct a thorough analysis of the experimental results. The current content of the experimental section is insufficient.
- The authors should supplement the paper with information on how much the proposed method outperforms other algorithms in terms of relevant metrics.
Author Response
Comment 1: In Equation 4, "if otherwise" should be revised to "otherwise" to maintain consistency.
Response:
Thank you for your suggestions. The relevant issues have been modified.
Comment 2: In Equation 19, a comma (",") should be inserted between k2 and k3.
Response:
Thank you for your suggestions. The relevant issues have been modified.
Comment 3: The paper dedicates a significant portion to introducing existing techniques while providing a relatively brief explanation of the proposed algorithm. It is recommended to expand the description of the proposed method.
Response:
Thank you for your suggestions. Our method has been described in detail in Section 2 and Section 3 of the text.
Comment 4: The abstract states that the paper proposes a camera calibration optimization algorithm based on the Steller Jay optimization algorithm, whereas the title is "Camera Calibration Optimization Algorithm Based on the Starling-Inspired Strategy," which leads to an inconsistency in content.
Response:
Thank you for your suggestions. The relevant issues have been modified.
Comment 5: In line 55, NOA is defined as the star crow optimization algorithm, while in lines 77-78, it is defined as the Nutcracker optimization algorithm, causing an inconsistency.
Response:
Thank you for your suggestions. The relevant issues have been modified.
Comment 6: Figure 3 appears to have limited significance, and an explanation for its inclusion is required.
Response:
Thank you for your suggestions. The relevant issues have been modified. The 100-iteration and 200-iteration curves in Figure 3 validate the performance of our algorithm; we mainly analyze the 500- and 1000-iteration results to highlight the convergence and stability of our method. The relevant analysis is in Section 4.2.
Comment 7: The values in Table 1 - Table 4 should maintain the same number of decimal places. Table 4 seems to serve little purpose when placed separately and could be deleted, with its content incorporated into Table 2 and Table 3.
Response:
Thank you for your suggestions. The relevant issues have been modified. We list Table 4 separately to compare the results of 500 and 1000 iterations across the different algorithms, which presents them more intuitively.
Comment 8: The authors should provide a detailed description of the experimental environment and parameters, and conduct a thorough analysis of the experimental results. The current content of the experimental section is insufficient.
Response:
Thank you for your suggestions. The relevant issues have been modified.
In Section 4.1
All experiments were conducted on a desktop computer equipped with an Intel Core i7-12700F CPU @ 2.10 GHz, 32 GB RAM, and running Windows 11 with Python 3.10. The optimization algorithms were implemented using the NumPy and SciPy libraries. All calibration tasks were carried out using real images obtained from the camera of the Boya Gongdao R1-10Li underwater robot. The NCS algorithm was tested with the following default parameters: population size N=30, maximum iterations T=500, chaotic map parameter a=4, and SCA control parameters α ∈ [0, 2π], β ∈ [0, 1].
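For context, a reprojection-error objective of the kind such an optimizer would minimize can be sketched as below. This uses a simplified pinhole model without the distortion terms the paper also estimates; the function and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def project(points_cam, fx, fy, u0, v0):
    """Project 3-D points (already in the camera frame) with a pinhole model."""
    x = points_cam[:, 0] / points_cam[:, 2]
    y = points_cam[:, 1] / points_cam[:, 2]
    return np.stack([fx * x + u0, fy * y + v0], axis=1)

def mean_reprojection_error(params, points_cam, observed_uv):
    """Objective for the optimizer: mean Euclidean reprojection error (pixels)."""
    fx, fy, u0, v0 = params
    predicted = project(points_cam, fx, fy, u0, v0)
    return float(np.mean(np.linalg.norm(predicted - observed_uv, axis=1)))

# Synthetic sanity check: the error is zero when the true intrinsics are used.
true_params = (800.0, 800.0, 640.0, 360.0)
pts = np.array([[0.1, -0.2, 2.0], [0.3, 0.1, 3.0], [-0.2, 0.2, 2.5]])
uv = project(pts, *true_params)
err = mean_reprojection_error(true_params, pts, uv)  # → 0.0
```

A population-based optimizer such as NCS would treat `params` (extended with distortion coefficients) as a candidate solution vector and minimize this objective over the calibration images.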
In Section 4.2
NCS achieved the lowest reprojection error across all iterations. For instance, at 1000 iterations, NCS reduced the average reprojection error by 12.3% compared to NOA and 18.7% compared to SMA. Fig. 3(b) shows that NCS converged faster than other algorithms, particularly in the first 300 iterations, highlighting the benefit of chaotic initialization and SCA-based refinement. Across 10 independent runs, NCS exhibited lower standard deviation, suggesting improved robustness against random initialization effects. These results verify that the proposed NCS algorithm is effective in both performance and reliability.
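The relative-improvement figures quoted above are of the following form; the error values used here are placeholders for illustration only, not the paper's measurements:

```python
def reduction_pct(ours, baseline):
    """Relative reduction of `ours` versus `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline

# Hypothetical mean reprojection errors (pixels) at 1000 iterations.
errors = {"NCS": 0.142, "NOA": 0.162, "SMA": 0.175}
vs_noa = reduction_pct(errors["NCS"], errors["NOA"])
vs_sma = reduction_pct(errors["NCS"], errors["SMA"])
```

Reporting the reduction relative to each baseline, together with the standard deviation over the 10 independent runs, gives the robustness comparison described in the response.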
Comment 9: The authors should supplement the paper with information on how much the proposed method outperforms other algorithms in terms of relevant metrics.
Response:
Thank you for your suggestions. The relevant issues have been modified in Section 4.2.
NCS achieved the lowest reprojection error across all iterations. For instance, at 1000 iterations, NCS reduced the average reprojection error by 12.3% compared to NOA and 18.7% compared to SMA. Fig. 3(b) shows that NCS converged faster than other algorithms, particularly in the first 300 iterations, highlighting the benefit of chaotic initialization and SCA-based refinement. Across 10 independent runs, NCS exhibited lower standard deviation, suggesting improved robustness against random initialization effects. These results verify that the proposed NCS algorithm is effective in both performance and reliability.
Reviewer 4 Report
Comments and Suggestions for Authors
The first thing to note here is that in spite of the title, the paper really deals with optimization. The fact that the algorithms are applied to camera calibration is effectively a secondary topic. Moreover, it is clear that the paper makes no contribution to the advancement of camera calibration, and there are strong indications that the level of knowledge regarding state-of-the-art camera calibration is far from comprehensive. Whereas there may be some novelty in the optimization algorithms reported, there is no innovation whatsoever in relation to camera calibration.
Let me start by relating the industry & research best practise in automated metric camera calibration, which is a core function of the science of photogrammetry, indeed arguably more so than with computer vision because aspects such as accuracy, calibration stability, reliability and scene invariance are critical in applications as diverse as industrial vision metrology; 3D scene reconstruction for heritage recording from drone or terrestrial platforms; engineering structural monitoring; and topographic modelling & mapping from aerial and spaceborne cameras. The principal technique used is “self-calibration”, which is universally implemented via the addition of camera calibration parameters within the photogrammetric bundle adjustment. This process, which has a rigorous functional & stochastic model, centers upon linearized least-square adjustment, with iterative solutions generally involving 3 – 5 iterations, and the whole post photography process consuming a few to several seconds depending upon whether <10, >10 or even hundreds of images are involved. There are well-recognised network design conditions involved for unconstrained solutions, such as convergent imaging geometry and roll angle diversity within the set of images. Single- and multi-camera configurations are equally well accommodated, and there’s a wealth of literature on the approach – NONE of which is referenced in the present paper!
Moreover, the authors make the statement that “Self-calibration techniques … offer flexibility without requiring additional calibration objects but suffer from lower accuracy”. This is nonsense! As stated above, self-calibration is standard industry practise, which has been used for some five decades, and in the case of industrial metrology yields scene-independent reprojection error levels down to the 0.025 pixel level.
In order to highlight shortcomings in the calibration reported in Figure 2, note that there is no orthogonal camera-roll angle diversity in the 14 images. This geometry thus does not support uncoupled (uncorrelated) recovery of the interior (intrinsic) orientation parameters fx, fy, uo & vo. The results in Table 1 highlight this. The Realsense D435i cameras have a maximum resolution of 1920 x 1080 in RGB mode, with 3 micron, square pixels. This means that to a quite tight tolerance fx & fy should be equal, since there is effectively a single principal distance/focal length. But here the estimates for fx & fy vary by a factor of 5, which is clearly grossly wrong. Also, whereas the solution for uo is plausible, that for vo is not, since this solution (150 pixels) suggests a significantly offset optical axis, which we know the Realsense cameras do not have. The results in Table 1 are clearly erroneous. Applying supplementary 'optimization' to these results is a fruitless exercise, which simply yields a lower reprojection error within the same network of images. In order to test the calibration, which would have highlighted the errors, the authors should have fixed these calibration parameters within an independent exterior orientation of a second set of images, preferably of a non-planar object.
Bottom Line: The paper should be rejected. There are too many incorrect generalisations, and the actual calibration/optimization experiment reported is too cursory in nature; a much more comprehensive calibration experiment (without obvious design limitations & grossly erroneous results) is required, and the analysis of the proposed optimization algorithms requires a more comprehensive treatment, along with better experimental design (eg separating the calibration network from additional networks in which the estimated calibration parameters are fixed, and looking at not just the reprojection error in image space, but also the 3D error components within object space resulting from spatial triangulation).
Author Response
Comments and Suggestions for Authors
The first thing to note here is that in spite of the title, the paper really deals with optimization. The fact that the algorithms are applied to camera calibration is effectively a secondary topic. Moreover, it is clear that the paper makes no contribution to the advancement of camera calibration, and there are strong indications that the level of knowledge regarding state-of-the-art camera calibration is far from comprehensive. Whereas there may be some novelty in the optimization algorithms reported, there is no innovation whatsoever in relation to camera calibration.
Let me start by relating the industry & research best practise in automated metric camera calibration, which is a core function of the science of photogrammetry, indeed arguably more so than with computer vision because aspects such as accuracy, calibration stability, reliability and scene invariance are critical in applications as diverse as industrial vision metrology; 3D scene reconstruction for heritage recording from drone or terrestrial platforms; engineering structural monitoring; and topographic modelling & mapping from aerial and spaceborne cameras. The principal technique used is “self-calibration”, which is universally implemented via the addition of camera calibration parameters within the photogrammetric bundle adjustment. This process, which has a rigorous functional & stochastic model, centers upon linearized least-square adjustment, with iterative solutions generally involving 3 – 5 iterations, and the whole post photography process consuming a few to several seconds depending upon whether <10, >10 or even hundreds of images are involved. There are well-recognised network design conditions involved for unconstrained solutions, such as convergent imaging geometry and roll angle diversity within the set of images. Single- and multi-camera configurations are equally well accommodated, and there’s a wealth of literature on the approach – NONE of which is referenced in the present paper!
Moreover, the authors make the statement that “Self-calibration techniques … offer flexibility without requiring additional calibration objects but suffer from lower accuracy”. This is nonsense! As stated above, self-calibration is standard industry practise, which has been used for some five decades, and in the case of industrial metrology yields scene-independent reprojection error levels down to the 0.025 pixel level.
In order to highlight shortcomings in the calibration reported in Figure 2, note that there is no orthogonal camera-roll angle diversity in the 14 images. This geometry thus does not support uncoupled (uncorrelated) recovery of the interior (intrinsic) orientation parameters fx, fy, uo & vo. The results in Table 1 highlight this. The Realsense D435i cameras have a maximum resolution of 1920 x 1080 in RGB mode, with 3 micron, square pixels. This means that to a quite tight tolerance fx & fy should be equal, since there is effectively a single principal distance/focal length. But here the estimates for fx & fy vary by a factor of 5, which is clearly grossly wrong. Also, whereas the solution for uo is plausible, that for vo is not, since this solution (150 pixels) suggests a significantly offset optical axis, which we know the Realsense cameras do not have. The results in Table 1 are clearly erroneous. Applying supplementary 'optimization' to these results is a fruitless exercise, which simply yields a lower reprojection error within the same network of images. In order to test the calibration, which would have highlighted the errors, the authors should have fixed these calibration parameters within an independent exterior orientation of a second set of images, preferably of a non-planar object.
Bottom Line: The paper should be rejected. There are too many incorrect generalisations, and the actual calibration/optimization experiment reported is too cursory in nature; a much more comprehensive calibration experiment (without obvious design limitations & grossly erroneous results) is required, and the analysis of the proposed optimization algorithms requires a more comprehensive treatment, along with better experimental design (eg separating the calibration network from additional networks in which the estimated calibration parameters are fixed, and looking at not just the reprojection error in image space, but also the 3D error components within object space resulting from spatial triangulation).
Response:
Thank you for your suggestions.
This paper focuses on investigating a hybrid optimization strategy designed to improve the numerical optimization performance in camera calibration tasks. While the proposed approach is applied within the context of camera parameter estimation, the primary contribution lies in the design and evaluation of a novel optimization framework. It is not intended to challenge or replace existing geometric calibration models or photogrammetric workflows.
Self-calibration, especially as implemented in photogrammetric bundle adjustment, is a well-established and widely adopted technique. It achieves sub-pixel accuracy (e.g., ~0.025 px) when the imaging network satisfies proper geometric conditions (e.g., convergent views and roll angle diversity). The focus of our work is not to replace such rigorous models, but to explore optimization schemes that may assist parameter refinement or be deployed when traditional methods are limited by data or computation.
Although the proposed optimization method can improve numerical convergence and parameter stability under certain conditions, it does not compensate for inappropriate network geometry or systematic distortions in the imaging setup. In practical calibration workflows, the imaging configuration, initial estimation quality, and geometric diversity remain the decisive factors for overall calibration success. Our method should be viewed as a supplementary optimization tool, rather than a substitute for well-established calibration models and protocols.
After verification, the camera we adopted was not the RealSense D435i. Instead, we used underwater images collected by the camera of the R1-10Li robot from Boya Gongdao. The experimental settings have been corrected in the paper.
Due to the limitations of the experimental equipment and environment, we are currently unable to achieve roll-angle diversity or three-dimensional error calculation. Our aim is to propose an optimization framework that uses two-dimensional images collected at different orientations to optimize camera calibration. As noted above, the proposed method should be viewed as a supplementary optimization tool rather than a substitute for well-established calibration models and protocols.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear author(s),
thanks for your revised manuscript. I really think that the changes increase the readability quite a lot.
If the editor is fine with the still very small text size in the figures, I suggest to publish the manuscript in its present form.
Author Response
Dear Reviewer,
Thanks for your kind comments.
Yours,
Zelong.
Reviewer 3 Report
Comments and Suggestions for Authors
- In the introduction part, the author should adjust the order of references and number them according to the order in which they appear.
- 100 iterations should not appear twice in the caption of Figure 3.
- The experiment carried out by the author is too simple. The author should increase the algorithm verification and scene application results of the actual scene to verify the actual effect of the algorithm.
- The author describes the existing algorithms too much. The algorithm proposed in this paper is not innovative enough.
Author Response
Comments and Suggestions for Authors:
- In the introduction part, the author should adjust the order of references and number them according to the order in which they appear.
- 100 iterations should not appear twice in the caption of Figure 3.
- The experiment carried out by the author is too simple. The author should increase the algorithm verification and scene application results of the actual scene to verify the actual effect of the algorithm.
- The author describes the existing algorithms too much. The algorithm proposed in this paper is not innovative enough.
Response to Reviewer Comments:
We sincerely appreciate the reviewer’s valuable comments and suggestions, which have greatly contributed to the improvement of our manuscript. We have carefully addressed each point as follows:
- On the adjustment of reference order:
The references in the introduction section have been thoroughly reorganized and renumbered according to their order of appearance in the text, following the journal's formatting requirements.
- On the repeated labeling of "100 iterations" in Fig. 3:
The caption of Fig. 3 has been corrected. The duplication of "100 iterations" has been removed to ensure clarity and consistency.
- On the simplicity of experimental validation:
In response to this concern, we have significantly enhanced the experimental section by adding extensive validation on actual scene images. Specifically, real-world underwater scenes were used to verify the effectiveness of the proposed calibration algorithm. Furthermore, additional experiments have been conducted to quantitatively evaluate the calibration results through the generation of radial distortion vector fields and distortion symmetry heatmaps.
- On the lack of algorithmic innovation:
To further strengthen the innovation aspect, we have introduced comprehensive comparisons with several existing optimization algorithms (SSA, NOA, PSO, SMA, and ORSMA). Detailed analyses, including distortion vector field visualization and heatmap-based uniformity assessment, have been provided. The results demonstrate that the proposed NCS-based calibration approach achieves superior distortion correction accuracy and robustness in complex underwater environments compared to traditional methods, thereby addressing the reviewer's concern about originality.
Once again, we deeply thank the reviewer for the constructive feedback, which has substantially improved the quality and rigor of our manuscript.
Reviewer 4 Report
Comments and Suggestions for Authors
Firstly, the authors are thanked for adding the additional text on self-calibration in the Introduction. This was very necessary. Unfortunately, however, there remain problems which must be attended to, these mostly related to the actual experiment conducted.
Treating the issues in the order in which they are encountered in the paper:
Section 3.2: The paper has ‘camera calibration’ in the title, so there should be more information than just that listed by the ‘objective function’. Here there needs to be a reference (e.g. to an OpenCV reference document or review paper), and what are termed the ‘internal parameters’ (generally referred to as intrinsic parameters) need to be introduced: fx & fy are focal lengths, and uo, vo are the image coordinates of the principal point. The reason for spelling this out will soon become apparent.
Results are given for a camera calibration, but the camera specifications are not provided. It’s not good enough to say it’s simply ‘the camera of the Boya Gongdao R1-10Li underwater robot’, especially when again there is no reference and the information seems not to be accessible via Google. In order to evaluate results, readers need to know, at a minimum, the basic camera specs of nominal focal length, image resolution (eg 2048 x 2048 pixels or whatever) and preferably the pixel size. This allows the reader to better understand the results, eg whether the focal length estimates fx & fy are realistic & whether the radial distortion is expected to be barrel or pincushion.
Table 1 shows the results of Zhang’s calibration method. Unless there is something quite non-standard with the camera lens, these results look to be very much in error. For a normal spherical lens, the focal length estimates fx & fy should be very similar; indeed in theory there should be only one value f = fx = fy. Here there is a 5-fold difference between fx & fy, which makes no sense, even for an anamorphic lens. Something is clearly not correct. Also, for a non-offset (ie standard) lens, uo and vo can be expected to be approximately half the column and row resolution in pixels, since the origin is usually taken at the top left corner and the principal point is close to the centre of the sensor. But here the values of uo = 1190 & vo = 150 suggest an image resolution of approx. 2400 x 300 pixels. The uo value could be correct, but the CCD/CMOS chip is very unlikely to be 2400 x 300, so there’s clearly a problem. The erroneous values might have arisen from excessive projective coupling/correlation between fy & vo due to there apparently being no orthogonal roll variation in the images in Figure 2. Whatever, the results simply don’t look like they could possibly be correct. Knowing the camera specs would confirm this.
The same situation occurs with the radial and decentring distortion parameter estimates, which vary dramatically in the different solutions. Whereas small variations are anticipated since the k’s are polynomial coefficients, the actual distortion profiles should be the same … but the authors don’t show this; they simply “dump” the parameters values which don’t help the reader at all. Why not show the distortion curves?
The single RMS re-projection error alone is generally not a sufficient indicator of camera calibration accuracy and reliability, and for this reviewer this highlights the main problem with the paper: it purports to deal with a means to improve camera calibration, but really gives not enough information on the experimental validation. The camera is not detailed, the recovery of some of the camera parameters looks to be seriously in error, the computed focal lengths are not analysed, nor are the distortion characteristics addressed, and there’s no validation other than simply presenting a one-number indicator, the re-projection error.
Bottom Line: The paper should be rejected in its present form; a major rework & possibly the re-doing of the experimental work is warranted.
Author Response
Response to Reviewer Comments:
We sincerely appreciate the reviewer’s detailed and constructive feedback. In response to the concerns raised, we have carefully revised the manuscript as follows:
- Addition of Camera Imaging Model and Parameter Definition
In Section 3.1, we have added a detailed description of the camera imaging model, explicitly defining the intrinsic parameters (fx, fy, u0, v0) and their physical meanings. A corresponding reference to the OpenCV calibration model has also been provided to support the theoretical background.
- Provision of Camera Specifications
The detailed specifications of the camera used in the experiments have now been included. Specifically, the camera features a resolution of 1920×1080 pixels and employs a 1/2.8-inch CMOS sensor. This information helps readers assess the plausibility of the estimated intrinsic parameters and the expected distortion characteristics.
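For reference, the OpenCV-style pinhole projection that these intrinsic parameters describe can be sketched as follows. The numerical values below are illustrative placeholders for a 1920×1080 sensor, not the calibrated results reported in the paper:

```python
import numpy as np

# Hypothetical intrinsic parameters (NOT the calibrated values from the paper):
# focal lengths fx, fy in pixels and principal point (u0, v0),
# following the OpenCV pinhole camera model.
fx, fy = 1200.0, 1200.0
u0, v0 = 960.0, 540.0

# Intrinsic matrix K, as used by OpenCV's calibration routines.
K = np.array([[fx, 0.0, u0],
              [0.0, fy, v0],
              [0.0, 0.0, 1.0]])

def project(point_3d):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point_3d
    u = fx * x / z + u0
    v = fy * y / z + v0
    return u, v

# A point on the optical axis projects to the principal point.
print(project((0.0, 0.0, 2.0)))  # (960.0, 540.0)
```

For a non-offset lens, (u0, v0) should sit near (960, 540) for this resolution, which is the sanity check the reviewer applied to the reported principal point.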
- Visualization of Distortion Characteristics
In addition to reporting numerical distortion coefficients, we have plotted the distortion curves and generated distortion vector fields for each calibration result. These visualizations provide a more intuitive and comprehensive analysis of the distortion behavior and the correction performance.
- Expansion of Experimental Validation
To strengthen the experimental validation, additional comparative experiments have been conducted using multiple optimization algorithms (NCS, NOA, PSO, SMA, and ORSMA). Furthermore, the calibration results have been applied to real-world underwater scene images to demonstrate the practical effectiveness of the proposed approach.
- Addressing the Accuracy and Reliability of Calibration Results
Beyond the single RMS re-projection error, we have introduced additional evaluation metrics, including distortion symmetry heatmaps and distortion uniformity analyses, to more comprehensively assess calibration accuracy and reliability.
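As background for how such distortion vector fields and heatmaps are generated, a minimal sketch of the Brown–Conrady radial/tangential distortion model follows. The coefficient values are hypothetical illustrations, not the calibrated values from the paper:

```python
import numpy as np

# Hypothetical distortion coefficients (k1, k2, k3 radial; p1, p2 tangential);
# illustrative only, not the calibrated results.
k1, k2, k3 = -0.25, 0.08, -0.01
p1, p2 = 1e-4, -1e-4

def distort(x, y):
    """Apply radial + tangential distortion to normalized image coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd

# Sample the displacement (the distortion vector) on a grid of normalized
# coordinates; (dx, dy) is the vector field one would plot with a quiver plot,
# and its magnitude np.hypot(dx, dy) is what a symmetry heatmap visualizes.
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 5), np.linspace(-0.5, 0.5, 5))
xd, yd = distort(xs, ys)
dx, dy = xd - xs, yd - ys
print(np.hypot(dx, dy).max())
```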
We sincerely thank the reviewer again for the valuable comments, which have led to a significant improvement in the quality and completeness of the manuscript. We hope that the revised version meets the expectations and can now be considered for publication.
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
Although the authors have made numerous improvements, the low innovativeness of the method restricts the quality of the article.
Author Response
Dear Reviewer,
Thank you for your continued effort in evaluating our manuscript and for providing constructive feedback throughout the review process. We sincerely appreciate the insightful comments, which have helped us refine our work. After carefully considering the concerns regarding the perceived lack of innovation, we would like to further clarify the significance and contribution of our study.
To address the limitations of conventional camera calibration optimization methods—such as insufficient global search capability, convergence stagnation, and sensitivity to initial values—this paper introduces the Nutcracker Optimization Algorithm (NOA) into the domain of camera calibration and proposes a novel Hybrid Optimization Algorithm (NCS). This method jointly optimizes the intrinsic parameters and distortion coefficients of the camera by integrating chaos mapping and sine–cosine search strategies. The main contributions of this work are summarized as follows:
1.Development of a hybrid optimization framework for camera calibration:
Based on the NOA, a new hybrid optimization algorithm (NCS) is proposed by incorporating Chebyshev chaotic mapping and sine–cosine search strategies. The framework combines the global exploration ability of NOA, the population diversity enhancement of chaotic mapping, and the local refinement capability of the sine–cosine algorithm. This integrated design significantly improves both global search performance and local convergence precision, offering a robust solution for camera parameter optimization.
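To illustrate the two components grafted onto NOA, the sketch below shows a Chebyshev chaotic initialization and one sine–cosine position update. The parameter choices (map order 4, amplitude a = 2) are common defaults assumed for illustration, not the exact NCS settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def chebyshev_init(pop_size, dim, lb, ub, order=4):
    """Chebyshev chaotic map x_{k+1} = cos(order * arccos(x_k)), which stays
    in [-1, 1]; values are rescaled to [lb, ub] to spread the initial
    population more evenly than plain uniform sampling."""
    x = rng.uniform(-1.0, 1.0)
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        for j in range(dim):
            x = np.cos(order * np.arccos(x))
            pop[i, j] = lb + (x + 1.0) / 2.0 * (ub - lb)
    return pop

def sca_step(pop, best, t, t_max, a=2.0):
    """One sine-cosine update pulling each candidate toward the current best;
    r1 decays linearly, shifting from exploration to local refinement."""
    r1 = a - t * a / t_max
    r2 = rng.uniform(0.0, 2.0 * np.pi, pop.shape)
    r3 = rng.uniform(0.0, 2.0, pop.shape)
    r4 = rng.uniform(size=pop.shape)
    sin_move = pop + r1 * np.sin(r2) * np.abs(r3 * best - pop)
    cos_move = pop + r1 * np.cos(r2) * np.abs(r3 * best - pop)
    return np.where(r4 < 0.5, sin_move, cos_move)
```

In a hybrid of this kind, the chaotic map replaces random population initialization, while the sine–cosine step is typically interleaved with the base algorithm's own update to refine solutions near the incumbent best.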
2.Application of the proposed algorithm to high-precision camera calibration: To address the limitations of the traditional Zhang calibration method—such as its strong dependence on initial parameter estimates and tendency to converge to local optima—this work introduces the NCS algorithm into the context of high-precision camera calibration. By jointly optimizing the intrinsic matrix and high-order radial/tangential distortion coefficients, the method effectively improves calibration accuracy and reprojection stability. This integration offers a promising alternative optimization paradigm for vision-based robotic systems, especially in scenarios requiring both high accuracy and robustness.
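The reprojection-based objective that such a joint optimization minimizes can be sketched generically as follows (an RMS formulation; the paper's exact objective may include additional terms):

```python
import numpy as np

def rms_reprojection_error(observed, reprojected):
    """RMS pixel distance between detected corners and reprojected corners.

    `observed` and `reprojected` are (N, 2) arrays. In the full pipeline the
    reprojected points come from applying the candidate intrinsics and
    distortion coefficients to the known checkerboard corner positions.
    """
    diff = observed - reprojected
    return float(np.sqrt(np.mean(np.sum(diff * diff, axis=1))))

# Toy check: a uniform 1-pixel horizontal offset gives an RMS error of 1.0.
obs = np.array([[10.0, 10.0], [20.0, 20.0]])
rep = obs + np.array([1.0, 0.0])
print(rms_reprojection_error(obs, rep))  # 1.0
```

The optimizer treats the stacked parameter vector (fx, fy, u0, v0, distortion coefficients) as the search position and this error as the fitness to minimize.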
3.Comprehensive multi-angle, multi-scale evaluation of algorithm performance: A systematic experimental framework is established to evaluate the performance of the proposed NCS algorithm in comparison with several benchmark optimization methods, including SSA, SOA, PSO, SMA, and ORSMA. The evaluation is conducted under multiple iteration settings (100, 200, 500, and 1000) to assess the algorithm’s behavior across different optimization scales. Key performance indicators include convergence speed and reprojection error magnitude. Furthermore, a series of visual analyses are incorporated—such as distortion vector field plots, symmetric reprojection error heatmaps, and calibrated image rectification results—to provide an intuitive and quantitative assessment of the calibration quality. These experiments demonstrate the superior convergence stability and accuracy of NCS across diverse calibration scenarios.
The proposed algorithm framework provides an effective solution for underwater camera calibration and visual perception of complex scenes, and offers the necessary support for building a robust underwater vision system.
We hope this clarification adequately demonstrates the value of our approach and its alignment with current challenges. We remain open to any further suggestions the reviewers or editors may have to improve the manuscript. Thank you once again for your time and consideration.
Sincerely,
Zelong
Reviewer 4 Report
Comments and Suggestions for Authors
The authors have addressed my principal concerns satisfactorily & I now consider the paper to be acceptable for publication.
Author Response
Dear Reviewer,
Thanks for your kind comments.
Yours,
Zelong