Hyperspectral Imaging Combined with Deep Learning for the Detection of Mold Diseases on Paper Cultural Relics
Abstract
1. Introduction
2. Related Work
2.1. Dimension Reduction
2.2. Vision Transformer
2.3. Segmentation Network Based on the Combination of ViT and CNN
3. Methodology
3.1. Dimensionality Reduction with Locally Linear Embedding (LLE)
3.2. Mold Semantic Segmentation
3.2.1. Overall Network Architecture
3.2.2. Feature Pyramid Module
3.2.3. MAXViT Module
3.2.4. Dynamic Sparse Attention Module
3.3. Experimental Design
3.3.1. Experimental Procedure
3.3.2. Spectral Data Acquisition
3.3.3. Experimental Configuration
4. Experiments and Result Analysis
4.1. Experimental Comparative Analysis
4.2. Ablation Experiment
4.3. Application Testing
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References

| | Layer | Configuration Parameters | Output Size |
|---|---|---|---|
| Input | - | - | 3 × W × H |
| Encoder | S1 | | |
| | S2 | | |
| | S3 | | |
| | S4 | | |
| Decoder | Z1 | | |
| | Z2 | | |
| | Z3 | | |
| | Z4 | | |

| Main Technical Indicator | Value | Main Technical Indicator | Value |
|---|---|---|---|
| Spectral Range | 400–1000 nm | AD Dynamic Range | 12 bits |
| Spectral Resolution | <3 nm | Image Resolution | 1920 × 1200 |
| Number of Spectral Channels | 300 | Lens | Canon 24 mm |
| Detector | CMOS/InGaAs (TE Cooled) | Pixel Size | 5.86 μm × 5.86 μm |

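
The instrument table lists only the spectral range, channel count, and AD bit depth. As a rough illustration, the sketch below derives the mean channel spacing and the number of quantization levels from those three values. The helper `band_center_nm` is hypothetical and assumes evenly spaced channels, which the table does not state.

```python
# Values copied from the instrument table; everything else is plain arithmetic.
SPECTRAL_RANGE_NM = (400.0, 1000.0)
NUM_CHANNELS = 300
AD_BITS = 12

# Mean sampling interval per channel across the full range.
mean_spacing_nm = (SPECTRAL_RANGE_NM[1] - SPECTRAL_RANGE_NM[0]) / NUM_CHANNELS

# Number of distinct intensity levels a 12-bit converter can represent.
levels = 2 ** AD_BITS


def band_center_nm(index: int) -> float:
    """Approximate center wavelength of a channel (0-based index).

    ASSUMPTION: channels are evenly spaced over the range. The real
    wavelength calibration of the instrument may differ.
    """
    if not 0 <= index < NUM_CHANNELS:
        raise IndexError("channel index out of range")
    return SPECTRAL_RANGE_NM[0] + (index + 0.5) * mean_spacing_nm


print(f"mean channel spacing: {mean_spacing_nm:.1f} nm")
print(f"quantization levels:  {levels}")
print(f"first/last band center: {band_center_nm(0):.1f} / {band_center_nm(299):.1f} nm")
```

Under the even-spacing assumption the mean interval is 2 nm per channel, which is compatible with the listed spectral resolution of <3 nm.
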
| Model | Parameters (M) | FLOPs (G) | Aspergillus niger mIoU (%) | Penicillium citrinum mIoU (%) | Cladosporium cladosporioides mIoU (%) |
|---|---|---|---|---|---|
| ResNet50 | 35.3 | 36.3 | 87.7 | 83.3 | 85.4 |
| DeepLabv3 | 39.6 | 40 | 92.4 | 82.5 | 86.3 |
| MaxViT | 26.8 | 9.2 | 94.3 | 80.6 | 88.4 |
| TransUNet | 93.2 | 31.5 | 96.3 | 89.6 | 87.8 |
| HiFormer | 34.2 | 17.4 | 96.1 | 90.3 | 87.9 |
| Ours | 42.8 | 15.7 | 96.8 | 90.1 | 89.3 |

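
The comparison above is reported in mIoU (mean intersection over union). For readers unfamiliar with the metric, a minimal NumPy sketch of one common way to compute it from integer label maps follows. It is not the authors' evaluation code: the function name `mean_iou`, the two-class toy masks, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Return (mIoU, per-class IoU) for two integer label maps of equal shape."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # ASSUMPTION: classes absent from both maps are skipped (IoU undefined).
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return float(np.nanmean(iou)), iou


# Toy 2 x 3 masks with hypothetical labels: 0 = background, 1 = mold.
target = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
miou, per_class = mean_iou(pred, target, num_classes=2)
print(f"per-class IoU: {per_class}")
print(f"mIoU = {100 * miou:.1f}%")
```

In the toy case one background pixel is predicted as mold, so the background IoU is 2/3, the mold IoU is 3/4, and their mean is 17/24. Whether the reported figures average over classes per image or over the whole test set is not stated in this section.
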
| Model | Stages marked (of 1–4) | Parameters | FLOPs | Penicillium citrinum mIoU (%) |
|---|---|---|---|---|
| Ours | none | 43 M | 15.8 G | 86.9 |
| Ours | √ | 43 M | 15.7 G | 89.5 |
| Ours | √ √ | 43 M | 15.7 G | 90.1 |
| Ours | √ √ √ | 42 M | 15.6 G | 90.0 |
| Ours | √ √ √ √ | 42 M | 15.6 G | 88.7 |
| MaxUnet | none | 26 M | 9.2 G | 82.4 |
| MaxUnet | √ | 26 M | 9.1 G | 83.9 |
| MaxUnet | √ √ | 26 M | 9.0 G | 87.3 |
| MaxUnet | √ √ √ | 25 M | 8.9 G | 86.3 |
| MaxUnet | √ √ √ √ | 25 M | 8.9 G | 84.2 |

| Model | Parameters | Complexity (FLOPs) | K | Penicillium citrinum mIoU (%) |
|---|---|---|---|---|
| Ours | 41 M | 15.5 G | 24 | 87.6 |
| Ours | 42 M | 15.6 G | 32 | 88.9 |
| Ours | 42 M | 15.6 G | 40 | 89.6 |
| Ours | 43 M | 15.7 G | 48 | 89.2 |
| Ours | 43 M | 15.7 G | 56 | 90.1 |
| Ours | 43 M | 15.8 G | 64 | 87.1 |
| MaxUnet | 24 M | 8.8 G | 24 | 85.1 |
| MaxUnet | 24 M | 8.9 G | 32 | 84.8 |
| MaxUnet | 25 M | 8.9 G | 40 | 83.9 |
| MaxUnet | 25 M | 9.0 G | 48 | 84.7 |
| MaxUnet | 26 M | 9.0 G | 56 | 85.8 |
| MaxUnet | 26 M | 9.1 G | 64 | 85.2 |

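
To make the K sweep easier to read, the short sketch below picks the best-scoring K for each model and reports how far the score moves across the sweep. The mIoU values are copied from the table above; the dictionary layout and variable names are illustrative only.

```python
# Penicillium citrinum mIoU (%) per K, copied from the table above.
results = {
    "Ours": {24: 87.6, 32: 88.9, 40: 89.6, 48: 89.2, 56: 90.1, 64: 87.1},
    "MaxUnet": {24: 85.1, 32: 84.8, 40: 83.9, 48: 84.7, 56: 85.8, 64: 85.2},
}

# K with the highest mIoU for each model.
best_k = {model: max(scores, key=scores.get) for model, scores in results.items()}

# Spread between the best and worst setting, rounded to one decimal place.
spread = {
    model: round(max(scores.values()) - min(scores.values()), 1)
    for model, scores in results.items()
}

for model in results:
    print(f"{model}: best K = {best_k[model]} "
          f"({results[model][best_k[model]]}%), spread = {spread[model]} points")
```

Both models peak at K = 56 in these figures, and the score drops again at K = 64.
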
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Y.; Song, Q.; Song, T.; Dong, S.; Wu, Q.; Long, Z. Hyperspectral Imaging Combined with Deep Learning for the Detection of Mold Diseases on Paper Cultural Relics. Heritage 2025, 8, 495. https://doi.org/10.3390/heritage8120495
