CycleGAN-Based Translation of Digital Camera Images into Confocal-like Representations for Paper Fiber Imaging: Quantitative and Grad-CAM Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. Image Details
2.2. CycleGAN-Based DSC-to-WCM Conversion
2.3. Experiments
2.3.1. Quantitative Image Quality and Similarity Evaluation
2.3.2. Classification-Based Evaluation Using WCM-Trained Models
2.3.3. Grad-CAM-Based Attention Analysis
3. Results
4. Discussion
4.1. Image-Level Evaluation: Fidelity, Structure, and Domain Alignment
4.2. Task-Level Evaluation: Classification and Attention Alignment
4.3. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DSC | Digital Still Camera |
| WCM | White-light Confocal Microscope |
| FID | Fréchet Inception Distance |
| SSIM | Structural Similarity Index Measure |
| MAE | Mean Absolute Error |
| PSNR | Peak Signal-to-Noise Ratio |
| CNN | Convolutional Neural Network |
| Grad-CAM | Gradient-weighted Class Activation Mapping |
References




Hardware, software, and training configuration for the CycleGAN-based DSC-to-WCM conversion:

| Category | Parameter | Description |
|---|---|---|
| Hardware | CPU | AMD Ryzen Threadripper PRO 5965WX |
| | GPU | 3× NVIDIA RTX A6000 (48 GB) |
| | RAM | 256 GB (32 GB × 8, 3200 MHz) |
| Software | OS | Ubuntu 22.04 LTS |
| | Framework | PyTorch 1.9.0 (NGC Container: PyTorch 21.06) |
| | CUDA/Python | CUDA 11.3.1 / Python 3.8 |
| Model Config | Generator | ResNet-based (9 blocks) |
| | Input Patch Size | 1024 × 1024 pixels |
| | Batch Size | 1 |
| Training | Total Epochs | 200 (100 constant + 100 linear decay) |
| | Learning Rate | Adam optimizer |
| | Loss Function | Least-Squares GAN Loss + Cycle-Consistency Loss |
| Performance | Inference Time | approx. 1.99 s per patch |
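For concreteness, the learning-rate schedule in the table (constant for the first 100 epochs, then linear decay to zero over the remaining 100) maps directly onto PyTorch's `LambdaLR`. The sketch below is illustrative only, not the authors' training code: the stand-in generator and the lr = 2 × 10⁻⁴ / β₁ = 0.5 values (the common CycleGAN defaults) are assumptions, since the table omits the exact figures.

```python
import torch

# Stand-in module; the paper uses a ResNet-based generator with 9 blocks.
G = torch.nn.Conv2d(3, 3, kernel_size=7, padding=3)

# Adam optimizer; lr=2e-4 and betas=(0.5, 0.999) are assumed defaults,
# not values stated in the table.
opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

# 200 epochs total: factor 1.0 for the first 100 epochs, then a linear
# decay to 0 over the last 100, matching the schedule in the table.
def lr_lambda(epoch: int) -> float:
    return 1.0 - max(0, epoch - 100) / 100.0

# Call scheduler.step() once per epoch after the optimizer updates.
scheduler = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_lambda)

# Loss terms named in the table: least-squares GAN loss (MSE against
# real/fake targets) plus an L1 cycle-consistency loss.
gan_loss = torch.nn.MSELoss()
cycle_loss = torch.nn.L1Loss()
```

The discriminators and the full two-domain training loop are omitted; the snippet only pins down the optimizer, schedule, and loss terms listed above.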
Image quality and similarity evaluation. (a) Paired full-reference metrics; "Proposed" columns compare DSC inputs with their converted outputs, "Reference" columns compare DSC with WCM images.

| Metric | Kozo (Proposed) | Mitsumata (Proposed) | Gampi (Proposed) | Average (Proposed) | Kozo (Reference) | Mitsumata (Reference) | Gampi (Reference) | Average (Reference) |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB, ↑) | 7.39 | 7.76 | 9.56 | 8.24 | 6.14 | 6.58 | 7.71 | 6.81 |
| SSIM (↑) | 0.32 | 0.30 | 0.23 | 0.28 | 0.20 | 0.18 | 0.12 | 0.17 |
| MAE (↓) | 182.44 | 181.65 | 153.41 | 172.50 | 165.85 | 165.21 | 142.39 | 157.82 |

(b) Distribution-level FID; "Proposed" columns compare converted outputs with WCM images, "Reference" columns compare DSC with WCM images.

| Metric | Kozo (Proposed) | Mitsumata (Proposed) | Gampi (Proposed) | Average (Proposed) | Kozo (Reference) | Mitsumata (Reference) | Gampi (Reference) | Average (Reference) |
|---|---|---|---|---|---|---|---|---|
| FID (↓) | 212.39 | 244.30 | 135.48 | 197.39 | 381.03 | 399.14 | 364.00 | 381.39 |
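The paired metrics in (a) follow their standard definitions and can be reproduced with off-the-shelf tooling. The sketch below (NumPy plus scikit-image ≥ 0.19, assuming 8-bit RGB inputs) is illustrative, not the authors' evaluation code. FID in (b) is a distribution-level statistic over pretrained Inception features, so it is usually computed with a dedicated package such as pytorch-fid rather than per-pair arithmetic.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def similarity_metrics(img_a: np.ndarray, img_b: np.ndarray) -> dict:
    """PSNR, SSIM, and MAE between two uint8 RGB images of equal size."""
    psnr = peak_signal_noise_ratio(img_a, img_b, data_range=255)
    # channel_axis=-1 treats the last axis as color channels (skimage >= 0.19).
    ssim = structural_similarity(img_a, img_b, channel_axis=-1, data_range=255)
    # MAE on the raw 0-255 intensity scale, averaged over pixels and channels.
    mae = np.mean(np.abs(img_a.astype(np.float64) - img_b.astype(np.float64)))
    return {"PSNR": psnr, "SSIM": ssim, "MAE": mae}
```

Averaging these per-pair scores over each fiber class would yield table rows of the form shown above.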
Classification performance (%) of WCM-trained models, evaluated on converted images (Proposed) and on original WCM images (Reference).

(a)

| Metric | Kozo (Proposed) | Mitsumata (Proposed) | Gampi (Proposed) | Average (Proposed) | Kozo (Reference) | Mitsumata (Reference) | Gampi (Reference) | Average (Reference) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 96.06 | 95.11 | 98.22 | 96.46 | 99.78 | 99.50 | 99.72 | 99.67 |
| Precision | 94.91 | 94.21 | 94.94 | 94.69 | 99.34 | 100.00 | 99.17 | 99.50 |
| Recall | 93.17 | 90.92 | 100.00 | 94.69 | 100.00 | 98.50 | 100.00 | 99.50 |
| F1-score | 94.03 | 92.54 | 97.40 | 94.66 | 99.67 | 99.24 | 99.59 | 99.50 |

(b)

| Metric | Kozo (Proposed) | Mitsumata (Proposed) | Gampi (Proposed) | Average (Proposed) | Kozo (Reference) | Mitsumata (Reference) | Gampi (Reference) | Average (Reference) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.64 | 98.64 | 99.94 | 99.07 | 99.56 | 98.86 | 99.31 | 99.24 |
| Precision | 98.81 | 97.21 | 99.83 | 98.62 | 98.76 | 99.91 | 97.76 | 98.88 |
| Recall | 97.08 | 98.75 | 100.00 | 98.61 | 99.92 | 96.67 | 100.00 | 98.86 |
| F1-score | 97.94 | 97.97 | 99.92 | 98.61 | 99.34 | 98.26 | 98.97 | 98.86 |
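The per-class figures above follow the usual precision, recall, and F1 definitions. A minimal sketch of how such a table can be assembled with scikit-learn; the labels and predictions here are hypothetical placeholders (0 = Kozo, 1 = Mitsumata, 2 = Gampi), not the study's data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical ground truth and classifier output for illustration only.
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 2, 1]

acc = accuracy_score(y_true, y_pred)
# average=None (the default) returns one score per class label.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)

print(f"Overall accuracy: {100 * acc:.2f}%")
for name, p, r, f in zip(["Kozo", "Mitsumata", "Gampi"], prec, rec, f1):
    print(f"{name}: precision {100*p:.2f}%, recall {100*r:.2f}%, F1 {100*f:.2f}%")
```

Per-class accuracy, as reported in the tables, would additionally be computed in a one-vs-rest fashion for each fiber class.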

