Efficient Two-Stage Autofocus for Micro-Assembly Based on Joint Spatial-Frequency Image Quality Assessment
Abstract
1. Introduction
1. We propose WaveMamba-IQA, a joint spatial-frequency IQA model for autofocus that integrates DWT and ViT for complementary frequency and spatial feature modeling, and further combines MLTA with a Vision Mamba state space module to enable robust sharpness scoring.
2. We employ CMA-ES in the focus search stage to alleviate local-extrema issues encountered by conventional quadratic fitting or hill-climbing strategies, improving robustness in reflection-dominated scenes.
3. We design a geometry-constrained dual-camera autofocus workflow that couples a global horizontal camera with a high-magnification oblique camera for efficient initialization and refinement.
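The second contribution can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified 1-D evolution strategy that borrows the log-rank recombination weights and cumulative step-size adaptation from the CMA-ES tutorial and omits the covariance update, which in one dimension only rescales the step size. The sharpness curve `focus_curve`, the function name `es_focus_search`, and all parameter values are hypothetical.

```python
import numpy as np

def focus_curve(z):
    """Hypothetical sharpness curve: main peak near z = 3 plus small ripples."""
    return float(np.exp(-((z - 3.0) ** 2) / (2 * 1.5 ** 2)) + 0.05 * np.cos(5.0 * z))

def es_focus_search(score, z0, sigma0, bounds, lam=12, generations=40, seed=0):
    """Simplified 1-D evolution strategy (assumption: stands in for CMA-ES)."""
    rng = np.random.default_rng(seed)
    mu = lam // 2
    # Log-rank recombination weights over the best mu samples.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    mu_eff = 1.0 / np.sum(w ** 2)
    n = 1  # search dimension: a single focus axis
    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)
    d_sigma = 1 + 2 * max(0.0, np.sqrt((mu_eff - 1) / (n + 1)) - 1) + c_sigma
    chi_n = np.sqrt(2 / np.pi)  # expected |N(0, 1)| in one dimension
    mean, sigma, path = float(z0), float(sigma0), 0.0
    best_z, best_s = mean, score(mean)
    for _ in range(generations):
        # Sample candidate focus positions and keep them inside the travel range.
        zs = np.clip(mean + sigma * rng.standard_normal(lam), bounds[0], bounds[1])
        scores = np.array([score(z) for z in zs])
        order = np.argsort(-scores)[:mu]
        if scores[order[0]] > best_s:
            best_z, best_s = float(zs[order[0]]), float(scores[order[0]])
        # Weighted recombination of the best samples moves the mean.
        step = float(np.dot(w, (zs[order] - mean) / sigma))
        mean += sigma * step
        # Cumulative step-size adaptation.
        path = (1 - c_sigma) * path + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * step
        sigma *= np.exp((c_sigma / d_sigma) * (abs(path) / chi_n - 1))
    return best_z, best_s
```

A call such as `es_focus_search(focus_curve, z0=0.0, sigma0=3.0, bounds=(-10.0, 10.0))` starts far from the main peak with a step size wider than the ripple period, which is what lets the population step over the small local maxima that would trap a hill climber.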
2. Method
2.1. System Setup
2.2. Overall Pipeline
- WaveMamba-IQA sharpness evaluation model;
- Large-range autofocus for the horizontal camera;
- Calculation of the initial position for the oblique camera based on geometric priors;
- Small-range fine autofocus for the oblique camera.
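The control flow of the last three stages listed above can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the paper's procedure: a plain grid search stands in for the actual search strategy, and the callable `prior`, which maps the horizontal camera's focus position to an initial oblique-camera position, is a hypothetical placeholder for the geometric prior. The score functions and numeric values in the demo are likewise invented.

```python
from typing import Callable, Tuple

def grid_search(score: Callable[[float], float], lo: float, hi: float, step: float) -> float:
    """Evaluate score on a uniform grid and return the best position."""
    n = int(round((hi - lo) / step))
    positions = [lo + i * step for i in range(n + 1)]
    return max(positions, key=score)

def two_stage_autofocus(
    score_h: Callable[[float], float],   # sharpness score, horizontal camera
    score_o: Callable[[float], float],   # sharpness score, oblique camera
    h_range: Tuple[float, float],        # large search range, horizontal camera
    h_step: float,
    prior: Callable[[float], float],     # hypothetical geometric mapping
    o_halfwidth: float,                  # small search half-width, oblique camera
    o_step: float,
) -> Tuple[float, float, float]:
    # Stage 1: large-range autofocus for the horizontal camera.
    z_h = grid_search(score_h, h_range[0], h_range[1], h_step)
    # Geometric prior: initial position for the oblique camera.
    z_init = prior(z_h)
    # Stage 2: small-range fine autofocus around the initial position.
    z_o = grid_search(score_o, z_init - o_halfwidth, z_init + o_halfwidth, o_step)
    return z_h, z_init, z_o

# Demo with synthetic single-peak score functions (assumed, for illustration only).
z_h, z_init, z_o = two_stage_autofocus(
    score_h=lambda z: -(z - 12.0) ** 2,
    score_o=lambda z: -(z - 7.3) ** 2,
    h_range=(0.0, 20.0),
    h_step=1.0,
    prior=lambda z: 0.5 * z + 1.0,
    o_halfwidth=1.0,
    o_step=0.1,
)
```

The point of the structure is that the fine stage only has to cover a narrow window around `z_init`, so its cost depends on the accuracy of the prior rather than on the full travel range.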
2.3. WaveMamba-IQA Model
2.3.1. Overall Architecture
2.3.2. Wavelet Branch
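As a rough illustration of why wavelet subbands carry focus information, the sketch below computes a single-level 2-D Haar DWT with NumPy and compares the detail-band energy of a sharp edge with that of a blurred copy. The choice of the Haar wavelet, the single decomposition level, and the function names are assumptions made for this example; the source does not state which wavelet or how many levels the model uses.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2-D Haar DWT; image sides must be even."""
    img = np.asarray(img, dtype=float)
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    diff_cols = (a - b + c - d) / 2.0   # differences between adjacent columns
    diff_rows = (a + b - c - d) / 2.0   # differences between adjacent rows
    diff_diag = (a - b - c + d) / 2.0   # diagonal differences
    return approx, diff_cols, diff_rows, diff_diag

def detail_energy(img):
    """Total energy of the three detail subbands, a simple sharpness cue."""
    _, dc, dr, dd = haar_dwt2(img)
    return float(np.sum(dc ** 2) + np.sum(dr ** 2) + np.sum(dd ** 2))

# Sharp vertical edge, then a 3-tap horizontal box blur of the same image.
sharp = np.zeros((16, 16))
sharp[:, 7:] = 1.0
padded = np.pad(sharp, ((0, 0), (1, 1)), mode="edge")
blurred = (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0
```

Here `detail_energy(sharp)` exceeds `detail_energy(blurred)` because blurring moves energy out of the detail bands; a learned branch would consume the subbands themselves rather than this scalar.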
2.3.3. MLTA Mamba Block
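For orientation, a state space module of the Mamba family is built on a discretized linear recurrence. The sketch below shows only that core, with a diagonal state matrix, zero-order-hold discretization, and a forward plus backward scan in the spirit of Vision Mamba. It is a simplification under stated assumptions: the parameters are fixed rather than input-dependent (so there is no selectivity), the two directions are simply summed, and nothing here represents the MLTA component or the actual block design.

```python
import numpy as np

def ssm_scan(x, a, b, c, delta):
    """Linear SSM recurrence h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c . h_t."""
    a_bar = np.exp(delta * a)        # zero-order-hold discretization of diag(a)
    b_bar = (a_bar - 1.0) / a * b    # matching discretized input vector
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t
        y[t] = float(np.dot(c, h))
    return y

def bidirectional_scan(x, a, b, c, delta):
    """Sum of a forward scan and a backward scan over the token sequence."""
    x = np.asarray(x, dtype=float)
    forward = ssm_scan(x, a, b, c, delta)
    backward = ssm_scan(x[::-1], a, b, c, delta)[::-1]
    return forward + backward
```

With a one-dimensional state, `a = -1`, `b = c = 1`, and `delta = ln 2`, the discretized decay is 0.5 per step, so an impulse at the first token produces a geometrically decaying response in the forward direction and contributes only at its own position in the backward direction.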
2.4. Autofocus Procedure
2.4.1. Large-Range Global Autofocus for the Horizontal Camera
2.4.2. Oblique Camera Initial Position Estimation Based on Geometric Priors
2.4.3. Small-Range Fine Autofocus for the Oblique Camera
Algorithm 1. Two-Stage Autofocus Procedure.
3. Experiments
3.1. Datasets
3.2. Implementation Details
3.3. Evaluation Metrics
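The result tables report SROCC and PLCC. A minimal sketch of how these two correlation coefficients are commonly computed is given below; the function names are illustrative, and any preprocessing the authors may apply before PLCC (for example a nonlinear fitting step) is not shown because the source does not describe it.

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient between predictions and labels."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        mask = v == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))
```

SROCC measures whether the predicted scores order the images correctly, which is what a focus search depends on, while PLCC measures linear agreement with the labels. For a monotonic but nonlinear relation such as `x = [1, 2, 3, 4]`, `y = [1, 4, 9, 16]`, SROCC is 1 while PLCC is slightly below 1.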
4. Results
4.1. WaveMamba-IQA Model Performance
4.2. Ablation Study Analysis
4.3. Autofocus Pipeline Testing
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhang, J.; Dai, X.; Wu, W.; Du, K. Micro-Vision Based High-Precision Space Assembly Approach for Trans-Scale Micro-Device: The CFTA Example. Sensors 2023, 23, 450.
2. Bettahar, H.; Clevy, C.; Courjal, N.; Lutz, P. Force-Position Photo-Robotic Approach for the High-Accurate Micro-Assembly of Photonic Devices. IEEE Robot. Autom. Lett. 2020, 5, 6396–6402.
3. Zhang, Z.; Wang, X.; Zhao, H.; Ren, T.; Xu, Z.; Luo, Y. The Machine Vision Measurement Module of the Modularized Flexible Precision Assembly Station for Assembly of Micro- and Meso-Sized Parts. Micromachines 2020, 11, 918.
4. Ruggeri, S.; Fontana, G.; Fassi, I. Micro-Assembly. In Springer Tracts in Mechanical Engineering; Springer: Cham, Switzerland, 2017; pp. 223–259.
5. Shen, F.; Zhang, Z.; Xu, D.; Zhang, J.; Wu, W. An Automatic Assembly Control Method for Peg and Hole Based on Multidimensional Micro Forces and Torques. Int. J. Precis. Eng. Manuf. 2019, 20, 1333–1346.
6. Tamadazte, B.; Arnould, T.; Dembele, S.; Fort-Piat, N.L.; Marchand, E. Real-time vision-based microassembly of 3D MEMS. In 2009 IEEE/ASME International Conference on Advanced Intelligent Mechatronics; IEEE: Piscataway, NJ, USA, 2009; pp. 88–93.
7. Gibson, I.; Osterlund, E.; Truant, R. Using beads as a focus fiduciary to aid software-based autofocus accuracy in microscopy. Bio-Protocol 2025, 15, 1376.
8. Duceux, G.; Tamadazte, B.; Le-Fort Piat, N.; Dembele, S.; Marchand, E.; Fortier, G. Autofocusing-Based Visual Servoing: Application to MEMS Micromanipulation. In Proceedings of the International Symposium on Optomechatronic Technologies (ISOT); IEEE: Piscataway, NJ, USA, 2010; Volume 12, pp. 1–6.
9. Subbarao, M.; Tyan, J.-K. Selecting the optimal focus measure for autofocusing and depth-from-focus. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 864–870.
10. Pertuz, S.; Puig, D.; Garcia, M.A. Analysis of Focus Measure Operators for Shape-From-Focus. Pattern Recognit. 2013, 46, 1415–1432.
11. Qu, J.W.; Xu, D.; Zhang, D.P.; Xu, J.Z. High-Precision Measurement Method for Microsphere Hole Pose Based on Active Motion of Two Microscopic Cameras. Acta Autom. Sin. 2021, 47, 1315–1326.
12. Her, L.; Yang, X. Research of Image Sharpness Assessment Algorithm for Autofocus. In 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC); IEEE: Piscataway, NJ, USA, 2019; pp. 93–98.
13. Pauwelyn, A.; Carré, M.; Jourlin, M.; Ginhac, D.; Meriaudeau, F. Image Visual Quality: Sharpness Evaluation in the Logarithmic Image Processing Framework. Big Data Cogn. Comput. 2025, 9, 154.
14. Yu, S.; Chen, Z.; Yang, Z.; Gu, J.; Feng, B. Exploring Kolmogorov-Arnold networks for realistic image sharpness assessment. arXiv 2024, arXiv:2409.07762.
15. Jamil, S. Review of image quality assessment methods for compressed images. J. Imaging 2024, 10, 113.
16. Herath, H.M.S.S.; Herath, H.M.K.K.M.B.; Madusanka, N.; Lee, B.-I. A systematic review of medical image quality assessment. J. Imaging 2025, 11, 100.
17. Mao, Q.; Liu, S.; Li, Q.; Jeon, G.; Kim, H.; Camacho, D. No-Reference Image Quality Assessment: Past, Present, and Future. Expert Syst. 2025, 42, e13842.
18. Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment. arXiv 2022, arXiv:2204.08958.
19. Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3667–3676.
20. Zhu, H.; Li, L.; Wu, J.; Dong, W.; Shi, G. MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment. arXiv 2020, arXiv:2004.05508.
21. Shi, J.; Gao, P.; Qin, J. Transformer-based no-reference image quality assessment via supervised contrastive learning. Proc. AAAI Conf. Artif. Intell. 2024, 38, 4829–4837.
22. Yu, X.; Yu, R.; Yang, J.; Duan, X. A robotic auto-focus system based on deep reinforcement learning. In Proceedings of the 5th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 11–13 August 2018.
23. Guan, F.; Li, X.; Yu, Z.; Lu, Y.; Chen, Z. QMamba: On first exploration of vision mamba for image quality assessment. arXiv 2024, arXiv:2406.09546.
24. Wei, Y.; Liu, B.; Zhu, Z.; Ma, Y.; Liang, F.; Li, Z. MCN: A mixture capsule network for authentic blind image quality assessment. Knowl. Based Syst. 2025, 331, 114840.
25. Lu, Y.; Li, W.; Ning, X.; Dong, X.; Zhang, Y.; Sun, L. Image quality assessment based on dual domains fusion. In Proceedings of the 2020 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS), Shenzhen, China, 23 May 2020; pp. 1–6.
26. Mallat, S.G. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
27. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
28. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
29. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. arXiv 2024, arXiv:2401.09417.
30. Hansen, N. The CMA Evolution Strategy: A Tutorial. arXiv 2016, arXiv:1604.00772.
31. Xu, K.; Qin, M.; Sun, F.; Wang, Y.; Chen, Y.-K.; Ren, F. Learning in the Frequency Domain. arXiv 2020, arXiv:2002.12416.
32. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
33. Briggs, W.L.; Henson, V.E. The DFT: An Owner’s Manual for the Discrete Fourier Transform; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1995.








| Dataset | Image Size | No. of Images | No. of Sequences | Images per Sequence |
|---|---|---|---|---|
| Horizontal Camera | | 1600 | 4 | 400 |
| Oblique Camera | | 800 | 4 | 200 |

| Method | SROCC (Horizontal) | PLCC (Horizontal) | SROCC (Oblique) | PLCC (Oblique) |
|---|---|---|---|---|
| MANIQA [18] | 0.9704 ± 0.0032 | 0.9589 ± 0.0041 | 0.9523 ± 0.0038 | 0.9410 ± 0.0030 |
| HyperIQA [19] | 0.9652 ± 0.0044 | 0.9511 ± 0.0037 | 0.9472 ± 0.0031 | 0.9323 ± 0.0038 |
| WaveMamba-IQA (Ours) | 0.9786 ± 0.0029 | 0.9624 ± 0.0023 | 0.9598 ± 0.0025 | 0.9443 ± 0.0031 |

| Comparison | SROCC (Horizontal) | PLCC (Horizontal) | SROCC (Oblique) | PLCC (Oblique) |
|---|---|---|---|---|
| WaveMamba-IQA vs. MANIQA | 0.0169 | 0.1385 | 0.0182 | 0.0418 |
| WaveMamba-IQA vs. HyperIQA | 0.0078 | 0.0001 | 0.0007 | 0.0087 |

| Method | SROCC (Horizontal) | PLCC (Horizontal) | SROCC (Oblique) | PLCC (Oblique) |
|---|---|---|---|---|
| Mamba (w/o Wavelet) | 0.9732 ± 0.0031 | 0.9613 ± 0.0032 | 0.9556 ± 0.0028 | 0.9428 ± 0.0021 |
| Mamba (w/ DFT) | 0.9752 ± 0.0033 | 0.9584 ± 0.0028 | 0.9537 ± 0.0033 | 0.9371 ± 0.0037 |
| WaveMamba-IQA | 0.9786 ± 0.0029 | 0.9624 ± 0.0023 | 0.9598 ± 0.0025 | 0.9443 ± 0.0031 |

| Comparison | SROCC (Horizontal) | PLCC (Horizontal) | SROCC (Oblique) | PLCC (Oblique) |
|---|---|---|---|---|
| WaveMamba-IQA vs. Mamba (w/o Wavelet) | 0.0473 | 0.0524 | 0.0280 | 0.0284 |
| WaveMamba-IQA vs. Mamba (w/ DFT) | 0.0355 | 0.0057 | 0.0101 | 0.0196 |

| Avg Steps (Horizontal) | Avg Steps (Oblique) | Avg Single Inference Time | Avg Total Inference Time | Avg Total Time Including Motion | Success Rate |
|---|---|---|---|---|---|
| 28 | 10 | 0.12 s | 4.18 s | 18.23 s | 98.33% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, J.; Kang, T.; Zhao, X.; Sun, M.; Yang, Y. Efficient Two-Stage Autofocus for Micro-Assembly Based on Joint Spatial-Frequency Image Quality Assessment. J. Imaging 2026, 12, 137. https://doi.org/10.3390/jimaging12030137

