Article

Non-Destructive Species Discrimination of Japanese Bast Fibers: A Feasibility Study Using Micro-Hyperspectral Imaging and Chemometrics

1 Life and Culture Department, Seirei Women’s Junior College, Akita 011-0937, Japan
2 Faculty of Fine Arts, Aichi University of the Arts, Nagakute 480-1194, Japan
3 School of Information Science and Technology, Aichi Prefectural University, Nagakute 480-1198, Japan
* Author to whom correspondence should be addressed.
Submission received: 15 March 2026 / Revised: 5 May 2026 / Accepted: 12 May 2026 / Published: 15 May 2026

Abstract

Accurate paper fiber identification is essential for cultural heritage conservation. Traditional staining methods are destructive, while macroscopic AI models often lack physicochemical interpretability. This study explores the feasibility of a non-destructive analytical approach using micro-hyperspectral imaging (Micro-HSI) to overcome both limitations. Three traditional Japanese bast fibers, Kozo, Mitsumata, and Gampi, were analyzed as standard reference samples. Relative reflectance spectra were extracted from microscopic fiber regions using Micro-HSI. Dynamic normalization and Savitzky–Golay first-derivative filtering were applied to suppress scattering effects and baseline drift. Principal component analysis (PCA) and linear discriminant analysis (LDA) were applied in parallel for dimensionality reduction and supervised classification, respectively. The results indicated that unsupervised PCA exhibited substantial inter-class overlap because of the shared cellulose matrix among the fiber types. In contrast, supervised LDA amplified subtle chemical differences and achieved clear separation among the three fibers. Feature-loading analysis indicated that the classification was mainly associated with visible range reflectance characteristics, lignin π→π* absorption bands in the 400–450 nm region, and near-infrared O−H and C−H overtone vibrations near 835 nm. Leave-One-Specimen-Out Cross-Validation yielded an overall accuracy of 77.8%, with error-free classification of Kozo (F1 = 1.00) and misclassification limited to the chemically similar Gampi and Mitsumata pair. This proof-of-concept study demonstrates that combining Micro-HSI with chemometric analysis enables non-destructive fiber discrimination while retaining physicochemically interpretable spectral features. The findings also establish a microscopic spectral reference framework for future non-destructive analysis of historical paper materials.


1. Introduction

1.1. The Materiality of Paper Artifacts and the Urgency of Conservation

Paper is both a primary carrier of historical records and a structurally vulnerable organic substrate. Moisture fluctuation, temperature cycling, light, and microbial activity degrade it over time. These factors cleave cellulose chains and oxidize lignin, reducing both mechanical stability and visual legibility [1].
Effective restoration requires fiber compatibility between repair materials and the original substrate; mismatches in chemistry or physical behavior introduce secondary damage such as tearing or discoloration at the repair interface. For many historical papers, however, source fibers remain poorly documented. Broad labels (such as hemp, bark fiber, bamboo) are frequently the only available descriptors, and these categories do not resolve the species-level chemical and structural differences that vary with plant origin and pulping method [2]. For conservation work, which demands species-level precision, this gap represents a significant practical challenge.

1.2. Limitations of Traditional Fiber Identification

Standard fiber identification relies on destructive staining (most commonly the JIS P8120 iodine-zinc chloride test) combined with optical microscopy [3]. Neither approach is well-suited to heritage objects. Physical sampling removes material that cannot be replaced, and the morphological approach has inherent limitations: beating and aging routinely eliminate the surface features (such as lumens, pits, cross-markings) upon which identification relies. For closely related bast fibers such as mulberry species, even intact samples present ambiguous morphology [4]. Consequently, curators must often classify valuable historical papers as having an “undetermined fiber composition”.

1.3. Spectroscopic Approaches for Fiber Discrimination

Spectroscopic techniques provide an alternative approach for identifying bast fibers based on their molecular composition. Reflectance and infrared spectroscopy have been widely applied to characterize natural fibers in textiles and plant materials. Previous studies demonstrated that spectral reflectance combined with multivariate analysis can effectively distinguish different natural textile fibers in a non-destructive manner [5].
Similarly, infrared spectroscopy has been used to discriminate traditional bast fibers employed in cultural artifacts. Combining spectral measurements with chemometric analysis enables the detection of subtle biochemical differences between bast fibers [6].
These studies demonstrate the potential of optical spectroscopy for fiber classification. However, most spectroscopic measurements rely on bulk samples or macroscopic measurements, which may include mixed materials or background interference from paper structures.

1.4. Hyperspectral Imaging and the Need for Micro-Scale Analysis

Hyperspectral imaging (HSI) has recently emerged as a powerful tool for the non-destructive analysis of cultural heritage materials. By recording spectral information at each pixel, HSI enables spatially resolved acquisition. Recent studies have successfully applied short-wave near-infrared hyperspectral imaging to predict chemical properties and visualize compositional variations in historical paper [7], and hyperspectral data combined with multivariate statistical models such as Principal Component Analysis (PCA) can further improve classification performance in complex spectral datasets [8]. When combined with microscopic optics, this spatially resolved capability allows spectral extraction directly from individual fiber regions, bypassing the background interference inherent in bulk measurements.
Alongside chemical methods, macroscopic AI screening provides a rapid, non-destructive alternative for paper tracing. As we demonstrated in our previous study on patch-based classification [9], macro image networks are highly effective for high-throughput preliminary screening. This approach has a fundamental limit, however, when applied to complex or mixed-material papers: macroscopic RGB sensors capture only morphological and textural features, and their diagnostic power deteriorates sharply when fibers are physically degraded or heavily blended during manufacturing.
Despite these technological advances, most existing hyperspectral and AI studies focus on the macroscopic imaging of entire documents, leaving the micro-scale fiber level largely unexplored. Consequently, the intrinsic spectral signatures of individual fibers within paper structures remain poorly characterized.

1.5. Research Aim

To address this limitation, this study explores the feasibility of using micro-hyperspectral imaging (Micro-HSI) for the identification of bast fibers at the microscopic level. Three traditional Japanese bast fibers (Kozo, Mitsumata, and Gampi) were selected as standard samples. Spectral signatures were extracted from individual fiber regions and analyzed using chemometric methods to establish a spectral reference for species-level fiber identification and to assess the potential of Micro-HSI as a non-destructive diagnostic tool for cultural heritage conservation.

2. Materials and Methods

2.1. Optical and Chemical Basis of Bast Fiber Spectroscopy

The Visible and Near-Infrared (VNIR) optical response of bast fibers is governed by three biopolymers (cellulose, hemicellulose, and lignin), whose contributions are chemically distinct but spectrally overlapping. Cellulose and hemicellulose together dominate the bulk signal. Specifically, cellulose is largely transparent across the visible range, while its O−H and C−H bonds generate overtone and combination absorption bands in the NIR [10]. Meanwhile, hemicellulose adds a broadly similar hydroxyl and ether signature that blends into the cellulose baseline without introducing sharp discriminating features [10]. The chemical contrast needed for fiber identification comes primarily from lignin. Aromatic phenols in lignin undergo π→π* electronic transitions that strongly attenuate reflectance in the UV and blue-violet region. This response varies with lignin retention across species and pulping conditions [11]. Because cellulose composition is nearly identical across bast fiber types, residual lignin distribution is what makes spectral discrimination possible. Hyperspectral imaging in the VNIR range captures these subtle reflectance differences at the spatial resolution of individual filaments, converting species-level biochemical variation into measurable spectral contrast without any physical sampling.

2.2. Micro-Hyperspectral Imaging Principles

To capture these biopolymer-level spectral differences at the fiber scale, Micro-HSI was employed. In Micro-HSI, microscopic imaging and spectral acquisition are combined to generate a three-dimensional data cube I(x, y, λ), in which each spatial pixel contains a complete reflectance spectrum across the measured wavelength range. Unlike macroscopic spectrometers, Micro-HSI allows for the precise selection of microscopic regions of the fibers as regions of interest (ROIs), thereby effectively circumventing the diffuse reflection background interference caused by air voids in the paper.
In this study, relative reflectance spectra were calculated using a standard white reference panel according to:
$$R(x, y, \lambda) = \frac{I_{\text{sample}}(x, y, \lambda)}{I_{\text{white}}(x, y, \lambda)}$$
where $I_{\text{sample}}$ represents the recorded intensity from the fiber sample and $I_{\text{white}}$ represents the intensity from the reference panel under identical illumination conditions. The sensor’s dark current was not physically subtracted; as a constant additive offset, it is eliminated mathematically during first-derivative pre-processing (Section 2.6). Wavelength-dependent dark-current components may not be fully removed by differentiation alone, however, and represent a minor residual source of systematic noise.
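The ratio above can be illustrated with a short NumPy sketch. In this study the conversion was performed by the acquisition software, so the code below is only an illustration of the arithmetic, not the authors' implementation. The function names, the `(rows, cols, bands)` array layout, the rectangular ROI, and the synthetic values are all assumptions introduced for the example.

```python
import numpy as np

def relative_reflectance(sample_cube, white_cube, eps=1e-12):
    """Pixel-wise ratio R = I_sample / I_white for cubes shaped (rows, cols, bands)."""
    sample = np.asarray(sample_cube, dtype=float)
    white = np.asarray(white_cube, dtype=float)
    # Guard against division by zero in dead or unlit reference pixels.
    return sample / np.maximum(white, eps)

def roi_mean_spectrum(reflectance_cube, row_slice, col_slice):
    """Average the spectra of all pixels inside a rectangular ROI (hypothetical helper)."""
    roi = reflectance_cube[row_slice, col_slice, :]
    return roi.reshape(-1, roi.shape[-1]).mean(axis=0)

# Synthetic 4 x 4 pixel cubes with 3 bands (illustrative values only).
white = np.full((4, 4, 3), 200.0)
sample = np.full((4, 4, 3), 100.0)

R = relative_reflectance(sample, white)
spectrum = roi_mean_spectrum(R, slice(1, 3), slice(1, 3))
print(spectrum)  # one reflectance value per band
```

Averaging over the ROI after taking the ratio keeps each pixel's illumination correction local, which matters if the white panel is not uniformly lit across the field of view.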

2.3. Sample Selection and Preparation

Our analysis centers on the three primary bast fibers in Japanese papermaking: Kozo (Broussonetia papyrifera), Mitsumata (Edgeworthia chrysantha), and Gampi (Diplomorpha sikokiana). We acquired all materials from a fully authenticated 1973 archive (Encyclopedia of Handmade Japanese Paper, The Mainichi Newspapers).
To establish a robust optical baseline, we utilized six 100% pure paper specimens per species, all free of fillers and chemical additives. Restricting the reference set to unprocessed materials ensures that the resulting spectra reflect intrinsic fiber chemistry rather than processing variables, a necessary condition for later application to historically aged or mixed-pulp samples.

2.4. Experimental Setup and Hardware Configuration

Data acquisition was performed using a custom Micro-HSI setup: a push-broom hyperspectral camera (NH-9, EBA Japan, Tokyo, Japan) mounted on an upright metallurgical microscope (ECLIPSE LV100ND, Nikon, Tokyo, Japan) through a standard C-mount. The optical path operated in reflection mode, illuminated by a stabilized 12 V, 50 W halogen lamp fixed at a 45° angle. The detector covers the 350–1100 nm range at a 5 nm resolution. To ensure sufficient spatial resolution for isolating individual fibers from the background, all hyperspectral cubes were captured using a 50× objective lens. The technical specifications of the imaging platform are summarized in Table 1.
All hyperspectral cubes in this study were acquired at a scan rate of 100 lines s⁻¹, an exposure time of 9.93 ms per line, and a gain of 50. Acquisition parameters were automatically encoded in the raw data filenames by the HSDAnalyzer software version 1.2 (EBA Japan, Tokyo, Japan), which also applied white reference correction for any difference in acquisition conditions between the sample and white-panel measurements prior to reflectance calculation.

2.5. Data Acquisition and Standardization Protocol

All observations were standardized under invariant magnification, illumination, and exposure settings. During the calibration phase, hyperspectral data from a standard white panel were registered as the system baseline, allowing the software to automatically convert the raw fiber images into calibrated relative reflectance. ROI selection was performed manually on these calibrated files to isolate pure fiber signatures. To maintain consistency, a single operator completed all selections, although intra-operator repeatability was not formally assessed in this feasibility study. Sampling was strictly confined to the paraxial region and the primary focal plane. We specifically selected the bulky, central segments of the fiber bodies to bypass edge diffraction artifacts, while deliberately avoiding air voids and surface impurities.
Ten ROIs were sampled from each specimen. For each fiber type, six independent paper specimens were analyzed, yielding 60 individual ROI spectra per class for chemometric modeling (Section 2.6). The per-specimen mean spectrum, computed by averaging the ten ROIs within each specimen, was used exclusively for the normalized reflectance visualizations presented in Section 3.1. Prior to the combined analysis, each specimen was inspected individually to assess within-species spectral consistency; the results are reported in Section 3.1.
A notable feature of the proprietary software (HSDAnalyzer version 1.2) is that it scales 100% relative reflectance to a raw integer of 4000. However, the export automatically generates a %Average (/4000 × 100) column, which restores the true percentage. These pre-calibrated values were parsed directly into our chemometric models.

2.6. Spectral Pre-Processing and Chemometrics

All pre-processing and chemometric steps were implemented in custom Python 3.9 scripts using SciPy and scikit-learn, with Processing 4.2 used for visualization.
For visualization, each mean spectrum was scaled by its within-file maximum reflectance value:
$$R_{\text{norm}}(\lambda) = \frac{R(\lambda)}{R_{\max}} \times 100\%$$
This per-spectrum rescaling procedure, referred to throughout as dynamic normalization, places all spectra on a common 0–100% axis without distorting their spectral shape.
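As a minimal sketch of this rescaling, the example below divides each spectrum by its own maximum. The function name and the two synthetic spectra are assumptions chosen for illustration. The second spectrum is exactly half the first, mimicking a purely multiplicative intensity difference between two specimens.

```python
import numpy as np

def dynamic_normalize(spectrum):
    """Scale one spectrum so that its maximum reflectance equals 100%."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum / spectrum.max() * 100.0

# Two synthetic spectra with the same shape but different absolute levels.
two_specimens = np.array([
    [20.0, 30.0, 40.0],
    [10.0, 15.0, 20.0],
])
normalized = np.array([dynamic_normalize(s) for s in two_specimens])
print(normalized)
```

Both rows collapse onto the same curve after normalization, which is the sense in which the rescaling removes a multiplicative intensity factor while leaving spectral shape unchanged. Additive offsets are not removed by this step.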
Coefficient of variation (CV) analysis was used to identify the optimal wavelength window for modeling; the selected range and justification are detailed in Section 3.2. Based on those results, the 400–1000 nm window was adopted for all chemometric modeling, with data outside this range excluded from further analysis.
The uneven thickness and surface topology of handmade paper introduce baseline drift unrelated to fiber chemistry. A Savitzky–Golay (S–G) first-derivative filter (11-point window, second-order polynomial) was applied to the truncated spectra to address this [12]. Differentiation removes constant additive offsets, substantially reducing both scattering-induced baseline shifts and dark-current contributions from the camera sensor. The resolution of faint chemical absorption features is also improved, as derivative transformation sharpens inflection points that are otherwise obscured by the background slope.
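The filter settings stated above map directly onto `scipy.signal.savgol_filter`. The sketch below is an illustration under stated assumptions rather than the authors' script: a 5 nm sampling step over 400–1000 nm is assumed, and the spectra are synthetic straight lines that differ only by a constant offset.

```python
import numpy as np
from scipy.signal import savgol_filter

STEP_NM = 5.0  # assumed sampling interval between adjacent bands
wavelengths = np.arange(400, 1001, 5).astype(float)  # 121 bands

def sg_first_derivative(spectra, step_nm=STEP_NM):
    """First derivative with an 11-point, second-order Savitzky-Golay filter."""
    return savgol_filter(spectra, window_length=11, polyorder=2,
                         deriv=1, delta=step_nm, axis=-1)

# Synthetic linear spectrum and a copy shifted by a constant baseline offset.
slope = 0.02  # reflectance units per nm
base = 30.0 + slope * (wavelengths - 400.0)
shifted = base + 12.0

d_base = sg_first_derivative(base)
d_shifted = sg_first_derivative(shifted)
print(np.max(np.abs(d_base - d_shifted)))  # offset has no effect on the derivative
```

Passing `delta` gives the derivative in reflectance units per nanometre. Because the derivative of a constant is zero, the 12-unit offset disappears, which is the mechanism the text relies on for baseline suppression.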
For chemometric modeling, the first-derivative spectra were standardized by Z-score transformation:
$$Z(\lambda) = \frac{R(\lambda) - \mu(\lambda)}{\sigma(\lambda)}$$
where $\mu(\lambda)$ and $\sigma(\lambda)$ are the mean and standard deviation across all training spectra at wavelength $\lambda$.
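One way to apply this transformation is with scikit-learn's `StandardScaler`, which computes the per-wavelength mean and standard deviation on the training spectra and reuses them for any held-out spectra. The sketch below uses random synthetic data. The array sizes and variable names are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Rows are spectra, columns are wavelengths (synthetic stand-ins).
train = rng.normal(loc=2.0, scale=3.0, size=(50, 4))
test = rng.normal(loc=2.0, scale=3.0, size=(10, 4))

# Fit mu and sigma on the training spectra only, then reuse them for test data.
scaler = StandardScaler().fit(train)
train_z = scaler.transform(train)
test_z = scaler.transform(test)

print(train_z.mean(axis=0))  # approximately zero in every column
print(train_z.std(axis=0))   # approximately one in every column
```

`StandardScaler` uses the population standard deviation (`ddof=0`). Whether the original scripts used the population or sample form is not stated in the text.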
While the normalized mean spectra provide a visual basis for identifying morphological differences, all chemometric analyses, including PCA, Linear Discriminant Analysis (LDA), and cross-validation, were conducted on the full 60 individual ROI spectra per class to ensure statistically robust and generalizable results. The standardized spectra were submitted to two parallel analyses. PCA was applied without class labels to identify the principal axes of spectral variance; the number of components retained was determined by identifying the elbow in the scree plot (see Section 3.3 for detailed results). Because unsupervised decomposition (such as PCA) often struggles to isolate trace lignin and hemicellulose differences within a dominant shared cellulose matrix, LDA was additionally applied. Using class labels, LDA found the projection that maximized between-class relative to within-class variance, orienting the analysis toward class-separating features rather than total variance.
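The two parallel analyses can be sketched with scikit-learn as follows. This is a hedged illustration on synthetic data, not the study's spectra. The 20-band width, the random seed, and the small class shifts placed on two arbitrary bands are assumptions chosen to mimic a dominant shared signal with weak class-specific differences.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_per_class, n_bands = 60, 20
labels = np.repeat(["Kozo", "Mitsumata", "Gampi"], n_per_class)

# Shared variance in every band plus a small class-specific shift in two bands.
X = rng.normal(size=(3 * n_per_class, n_bands))
X[labels == "Mitsumata", 3] += 1.5
X[labels == "Gampi", 7] += 1.5

# Unsupervised: axes of maximum total variance, class labels not used.
pca = PCA(n_components=3).fit(X)
pca_scores = pca.transform(X)

# Supervised: axes maximizing between-class relative to within-class variance.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, labels)
lda_scores = lda.transform(X)

print(pca.explained_variance_ratio_)  # share of total variance per PC
print(lda.explained_variance_ratio_)  # share of discriminant variance per LD
```

With three classes, LDA yields at most two discriminant axes, so the two ratios printed for LDA sum to one. This matches the way LD1 and LD2 percentages are reported as shares of discriminant variance rather than of total spectral variance.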
Classifier performance was evaluated by Leave-One-Specimen-Out Cross-Validation (LOSO-CV). In each fold, all ROIs from one specimen were held out as the test set; the LDA model was retrained on the remaining 17 specimens and used to predict the withheld specimen’s class by majority vote across its ten ROIs. This procedure was repeated for all 18 specimens, yielding an accuracy estimate that reflects specimen-level generalizability and avoids data leakage between ROIs from the same physical sample. The complete analytical workflow, from sample preparation through classification and validation, is summarized in Figure 1.
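A minimal sketch of this validation scheme, assuming scikit-learn's `LeaveOneGroupOut` as the splitter, is given below. The function name, the synthetic features, and the tie-breaking rule (the first label encountered wins) are assumptions. The source does not state how ties in the ROI vote were handled.

```python
from collections import Counter

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loso_specimen_accuracy(X, y, specimen_ids):
    """Hold out one specimen per fold and classify it by majority vote of its ROIs."""
    correct, n_specimens = 0, 0
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=specimen_ids):
        # Scaler and LDA are refitted inside each fold to avoid leakage.
        model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
        model.fit(X[train_idx], y[train_idx])
        votes = model.predict(X[test_idx])
        predicted = Counter(votes).most_common(1)[0][0]
        correct += int(predicted == y[test_idx][0])
        n_specimens += 1
    return correct / n_specimens

# Synthetic layout: 3 classes x 6 specimens x 10 ROIs, 8 features per ROI.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 60)
specimen_ids = np.repeat(np.arange(18), 10)
X = rng.normal(size=(180, 8))
X[y == 1, 0] += 8.0  # well-separated classes, for illustration only
X[y == 2, 1] += 8.0

accuracy = loso_specimen_accuracy(X, y, specimen_ids)
print(accuracy)
```

Grouping by specimen identifier is what keeps ROIs from the same sheet out of both the training and test sets in any given fold. The synthetic classes here are deliberately easy to separate, so the printed accuracy says nothing about the real data.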

3. Results

3.1. Raw Spectral Signatures and Baseline Calibration

The following observations are based on the per-class mean spectra computed for visualization; quantitative chemometric analysis is reported in Sections 3.3–3.5.
Single-specimen inspection revealed that within-species spectral morphology was consistent across specimens despite differences in absolute reflectance level. Mitsumata spectra shared a characteristic concave depression between approximately 450 and 850 nm. Kozo spectra showed a gradual upward slope from shorter to longer visible wavelengths, reproducible across all six specimens. Gampi displayed a moderately flat, intermediate trajectory distinct from both.
Per-class mean spectra were computed to confirm this consistency across the full dataset. Prior to normalization, absolute reflectance levels varied substantially across specimens of the same species, producing broad SD envelopes and visually overlapping profiles (Figure 2); this amplitude variation reflects physical scattering from fiber surface roughness rather than chemical differences between species. Following dynamic normalization, the characteristic morphological features of each fiber type became resolvable in the mean curves. As shown in Figure 3, this reflects the intended effect of dynamic normalization: rescaling suppresses absolute intensity differences driven by physical scattering, exposing the underlying spectral shape of each fiber type at the cost of increased relative dispersion. Specifically, the profiles reveal the concave visible range profile of Mitsumata, the gradual upward slope of Kozo, and the flatter intermediate trajectory of Gampi.
Between 400 and 750 nm, the three fibers diverge visibly (Figure 3). Kozo (red) shows a gradual upward slope across the visible range, reaching its relative maximum near 750 nm. Gampi (green) follows a flatter, slightly concave trajectory at moderate reflectance. Mitsumata (blue) records the lowest overall visible-range reflectance among the three, but is distinguished by higher-frequency fluctuation and a localized reflectance peak at 425 nm that exceeds both Kozo and Gampi at that wavelength. This feature was not observed in the other two species.
Above 750 nm, all three fibers show a step-like reflectance increase peaking near 850 nm. Their relative ranking shifts beyond 900 nm: Mitsumata reflectance progressively exceeds the group mean, while Kozo drops to the lowest relative position in this region.

3.2. Wavelength Selection via Coefficient of Variation Analysis

The coefficient of variation (CV) was computed across the full spectral range to evaluate data consistency (Figure 4). In the visible region (450–750 nm), CV remained between 13% and 24%, indicating that dynamic normalization effectively suppressed scattering-induced variance and preserved consistent intra-class spectral profiles.
Beyond 760 nm, CV increased sharply, exceeding 50% at several wavelengths. This elevation is attributed to reduced detector sensitivity combined with weak C−H and O−H overtone signals in this region. Inter-species variation in CV was also observed: Mitsumata showed the highest mean CV (27.53%), reflecting greater structural heterogeneity at the fiber level, while Kozo was the most uniform (19.74%). Based on these findings, subsequent chemometric modeling was confined to the 400–1000 nm range to maintain an acceptable signal-to-noise ratio while retaining the principal VNIR absorption features.
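For reference, a per-wavelength coefficient of variation can be computed as in the sketch below. The use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which form was used, and the three two-band spectra are synthetic.

```python
import numpy as np

def coefficient_of_variation(spectra):
    """Per-wavelength CV in percent for an array shaped (n_spectra, n_bands)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    sd = spectra.std(axis=0, ddof=1)  # sample SD, an assumption
    return sd / mean * 100.0

# Three synthetic spectra: band 0 varies between spectra, band 1 is constant.
spectra = np.array([
    [10.0, 50.0],
    [20.0, 50.0],
    [30.0, 50.0],
])
cv = coefficient_of_variation(spectra)
print(cv)
```

Because CV divides by the mean, bands with low reflectance or low detector response inflate the ratio even when absolute scatter is modest, which is one reason CV tends to rise toward the edges of a detector's range.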

3.3. Unsupervised Dimensionality Reduction and Inter-Class Overlap

The number of principal components was determined by inspecting the scree plot (Figure 5). The explained variance curve shows a pronounced elbow after PC3, beyond which successive components contribute less than 5% variance each; PC1–3 were therefore retained for exploratory analysis, jointly accounting for 62.54% of total spectral variance (PC1: 28.93%, PC2: 22.97%, PC3: 10.63%).
The score plots, however, revealed substantial inter-class mixing across all PC pairs, with 95% confidence ellipses overlapping for all three fiber types (Figure 6a–c). Notably, intra-class dispersion also differs markedly among species: Kozo forms the most compact cluster, a pattern that may reflect more thorough removal of non-cellulosic components during pulping and a consequently more uniform fiber chemistry; Mitsumata shows the greatest scatter, indicating greater structural heterogeneity, which is consistent with the CV results in Section 3.2.
The observed overlap is consistent with the dominant contribution of the shared cellulose matrix to spectral variance. Where bulk polysaccharide structure drives much of the signal, PCA has insufficient sensitivity to resolve the trace lignin and hemicellulose differences that distinguish the three species. This behavior is expected for unsupervised decomposition of closely related cellulosic materials, and indicates that supervised classification is needed for species-level discrimination.

3.4. Interpretation of PCA Loading Features

Before proceeding to supervised classification, the PCA loading vectors were examined to identify which spectral regions drive the observed variance structure (Figure 7). PC1 (peak loading ≈ 0.153 at 585 nm) is dominated by broad visible range reflectance, with elevated loadings spanning 500–700 nm. This distribution corresponds to the differing visible range slopes among the three fiber types, particularly the gradual upward slope characteristic of Kozo, rather than discrete chemical absorption bands. PC1 therefore reflects aggregate light-scattering behavior associated with fiber microstructure instead of specific molecular markers.
PC2 and PC3 capture the chemically diagnostic signals. PC2 loading peaks sharply at approximately 400 nm (≈0.156), consistent with π→π* electronic transitions in lignin aromatic rings; this feature tracks residual lignin retained after pulping. PC3 is concentrated in the NIR, with a particularly pronounced loading peak at 835 nm (≈0.226), a wavelength associated with vibrational modes in hydrogen-bonded polysaccharide networks and indicative of variation in hemicellulose or cellulose organization.
Taken together, the three components separate the spectral variance into distinct physical-chemical contributions: bulk light-scattering behavior (PC1), lignin chromophore absorption (PC2), and NIR polysaccharide vibrational response (PC3). This breakdown offers a partial but chemically interpretable account of the spectral variance relevant to LDA discrimination.

3.5. Supervised Species Discrimination via LDA

Applied to the same standardized spectra, LDA resolved the inter-class overlap that PCA could not. In the fitted discriminant space, Kozo, Mitsumata, and Gampi form visually distinct clusters with no overlapping data points (Figure 8). LD1 (73.16% of discriminant variance) provides the primary separation: Kozo is positioned at a centroid of LD1 = 5.90, well separated from both Mitsumata and Gampi, which cluster on the negative side of the axis. LD2 (26.84%) resolves the remaining overlap between Mitsumata (LD2 centroid: 3.03) and Gampi (LD2 centroid: −3.16), with intra-class clusters remaining compact throughout.
The separation achieved by LDA indicates that species-specific biochemical differences, attributed to residual lignin and hemicellulose retained after pulping, produce measurable spectral contrast even where raw reflectance profiles appear similar. Consequently, Micro-HSI combined with first-derivative pre-processing and supervised LDA was sufficient for species-level discrimination among these three bast fiber types under controlled reference conditions. Whether this discrimination can be extended to specimens not included in model training is examined through cross-validation in the following section.

3.6. Cross-Validation Performance

The cluster separation observed in Figure 8 reflects the model fit on the full training dataset. To assess whether this separation generalizes to unseen specimens, LOSO-CV was applied as described in Section 2.6.
The classifier achieved an overall specimen-level accuracy of 77.8% (14/18 correct). Kozo was identified with perfect precision and recall (F1 = 1.00), while Mitsumata and Gampi each achieved an F1-score of 0.67 (Table 2). The confusion matrix indicates that all misclassifications occurred between Mitsumata and Gampi, with no cross-class errors involving Kozo (Figure 9).
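The reported metrics can be cross-checked arithmetically. Assume six specimens per class and that all four errors are swaps between Mitsumata and Gampi, as stated above. If a Mitsumata specimens are labeled Gampi and 4 − a Gampi specimens are labeled Mitsumata, the Mitsumata F1-score is (6 − a)/(8 − a), which rounds to 0.67 only for a = 2. The confusion matrix in the sketch below is therefore reconstructed from the reported numbers under these assumptions. It is not copied from Figure 9.

```python
import numpy as np

classes = ["Kozo", "Mitsumata", "Gampi"]
# Rows: true class, columns: predicted class (reconstructed, see assumptions above).
confusion = np.array([
    [6, 0, 0],
    [0, 4, 2],
    [0, 2, 4],
])

accuracy = np.trace(confusion) / confusion.sum()
precision = np.diag(confusion) / confusion.sum(axis=0)
recall = np.diag(confusion) / confusion.sum(axis=1)
f1 = 2 * precision * recall / (precision + recall)

for name, score in zip(classes, f1):
    print(f"{name}: F1 = {score:.2f}")
print(f"Accuracy = {accuracy:.1%}")
```

With only 18 specimens, each misclassified specimen shifts overall accuracy by about 5.6 percentage points, so the 77.8% figure should be read with that granularity in mind.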

4. Discussion

Material compatibility in paper restoration requires fiber identification methods that avoid physical sampling. Although macroscopic hyperspectral imaging and deep-learning-based screening tools can partially address this problem, their performance is still affected by surface heterogeneity and aging-related optical noise. By contrast, Micro-HSI operates at the fiber scale, allowing spectra to be collected directly from individual fibers rather than from the paper surface as a whole. As a result, the spectral response more closely reflects intrinsic biopolymer composition than surface morphology or coating effects.
At the microscopic scale, the shared cellulose backbone of Kozo, Mitsumata, and Gampi dominates the variance structure, an effect that becomes more pronounced when signals are integrated across the full paper matrix, as in hyperspectral mapping across entire documents [7]. Unlike PCA, which emphasizes overall variance, LDA maximizes separation between predefined classes. The resulting cluster boundaries are also consistent with established spectroscopic assignments: the 400–450 nm region corresponds to lignin aromatic transitions [11], and the 835 nm feature to polysaccharide network vibrations [10].
Among the three fibers, Mitsumata produced the most distinctive spectral response in the visible range, characterized by a concave reflectance profile and a localized peak near 425 nm that was not observed in Kozo or Gampi. This feature may be related to differences in lignin chromophore distribution. The higher mean CV of Mitsumata (27.53%) relative to Kozo (19.74%) further supports greater structural heterogeneity at the fiber level.
The partial confusion between Gampi and Mitsumata in cross-validation is consistent with their chemical similarity. Both species are processed by similar traditional cooking methods that substantially reduce lignin content, likely producing fibers with comparable residual aromatic chromophore concentrations. Kozo retains comparatively stronger lignin-associated spectral contrast in the 400–450 nm region. By comparison, Gampi and Mitsumata show more overlapping reflectance profiles within this discriminating band, which likely reduces LDA’s capacity to maintain reliable specimen-level separation.
The ten ROIs sampled from each specimen are not fully independent observations because they share the same physicochemical background. LOSO-CV partially addresses this issue by treating each specimen, rather than each ROI, as the unit of validation. However, the effective sample size for assessing generalizability remains limited to 18 specimens (six per class).
The reference spectra reported here were obtained from modern, unaged specimens. Reflectance in the visible range, particularly the lignin π→π* band associated with PC2, may shift because of photochemical oxidation and yellowing. Although the NIR polysaccharide features near 835 nm are likely to be more stable, aging may still affect spectral stability and class separation over time.
The use of a 50× objective lens, while essential for fiber-level spatial resolution, introduces a practical constraint: the short working distance limits applicability to flat, accessible surfaces and may preclude direct analysis of bound volumes or large fragile artifacts without prior surface preparation. Furthermore, wavelength-dependent dark-current components may not be fully eliminated by first-derivative preprocessing and therefore remain a minor source of systematic noise that should be characterized in future work.
The reference dataset was restricted to pure, unprocessed specimens in order to isolate intrinsic fiber signatures. Historical documents contain additional variables, including mixed pulps, degradation products, and surface coatings, which were intentionally excluded from the present study. Extending the method to real artifacts will require spectral unmixing approaches for multi-component systems, as well as expanded reference datasets incorporating naturally aged samples. Integration with white-light confocal microscopy may further improve spatial registration between chemical and structural information.

5. Conclusions and Future Developments

Micro-HSI combined with dynamic normalization and S–G first-derivative filtering successfully extracted fiber-level spectral signatures from traditional Japanese bast paper specimens without physical sampling. Dynamic normalization suppressed scattering-induced baseline variation, and derivative transformation removed residual drift while sharpening chemical absorption features.
PCA of the processed spectra revealed substantial inter-class overlap among Kozo, Mitsumata, and Gampi, which is attributable to their shared cellulose variance structure. Supervised LDA resolved the three fiber types into visually distinct clusters in the fitted discriminant space. The discrimination was mainly associated with lignin-associated spectral contrast in the 400–450 nm region and polysaccharide vibrational features near 835 nm, consistent with established spectroscopic interpretations. Cross-validation at the specimen level achieved an overall classification accuracy of 77.8%, with perfect classification of Kozo and partial confusion between Mitsumata and Gampi.
The two-stage analytical workflow, combining visual inspection of mean spectra with quantitative validation of individual ROIs, may also be useful for future microspectral studies of paper fibers. The study also provides a controlled microscopic spectral reference dataset for pure Japanese bast fibers, intended to support the calibration of macroscopic Stage 2 models within the Paper Road project, where uncharacterized fiber composition remains a source of prediction error.
Several limitations identified in this study suggest important directions for future research. First, the reference dataset was limited to modern, unaged specimens. Because visible range reflectance, particularly the lignin π→π* band, may shift due to photochemical oxidation and yellowing, future studies should investigate how natural aging influences the classification boundaries identified in this work.
Second, the partial confusion observed between Gampi and Mitsumata likely reflects their chemically similar low-lignin characteristics. Expanding the spectral database with additional specimens and incorporating species-specific chemical markers may improve specimen-level discrimination.
Third, adapting the acquisition protocol for bound volumes or fragile large-scale artifacts remains an important technical challenge.
Finally, the application of this method to historical documents will require spectral unmixing strategies capable of handling mixed-pulp systems, together with expanded reference datasets that include naturally aged and chemically treated materials.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; formal analysis, Y.Z.; investigation, Y.Z., Y.O. and A.I.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z. and K.S.; supervision, K.S. and K.M.; project administration, K.S.; resources, K.S. and K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Grant-in-Aid for Scientific Research, Japan Society for the Promotion of Science (JSPS), grant number 22H00003 (project: “Elucidation of the Paper Road by data science—Based on Quantitative, Qualitative research and AI Multidimensional analysis”).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The processed spectral data supporting the conclusions of this article are openly available in Zenodo (Zenodo, CERN, Geneva, Switzerland) at https://doi.org/10.5281/zenodo.19784679.

Acknowledgments

During the preparation of this manuscript, the authors used Claude Sonnet 4.6 (Anthropic, San Francisco, CA, USA) for the purposes of language editing and text revision. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
Micro-HSI: Micro-Hyperspectral Imaging
VNIR: Visible and Near-Infrared
NIR: Near-Infrared
UV: Ultraviolet
PCA: Principal Component Analysis
LDA: Linear Discriminant Analysis
PC: Principal Component
LD: Linear Discriminant
CV: Coefficient of Variation
SD: Standard Deviation
ROI: Region of Interest
S–G: Savitzky–Golay
LOSO-CV: Leave-One-Specimen-Out Cross-Validation (CV here denotes cross-validation)
NDT: Non-Destructive Testing
JIS: Japanese Industrial Standard

References

  1. Area, M.C.; Cheradame, H. Paper aging and degradation: Recent findings and research methods. BioResources 2011, 6, 5307–5337. [Google Scholar] [CrossRef]
  2. Avataneo, C.; Sablier, M. New criteria for the characterization of traditional East Asian papers. Environ. Sci. Pollut. Res. Int. 2017, 24, 2166–2181. [Google Scholar] [CrossRef] [PubMed]
  3. JIS P 8120:1998; Paper, Board and Pulps—Fiber Analysis. Japanese Industrial Standards. Japanese Standards Association: Tokyo, Japan, 1998.
  4. Lukesova, H.; Holst, B. Identifying plant fibers in cultural heritage with optical and electron microscopy: How to present results and avoid pitfalls. Herit. Sci. 2024, 12, 12. [Google Scholar] [CrossRef]
  5. Garside, P.; Wyeth, P. Identification of Cellulosic Fibres by FTIR Spectroscopy-Thread and Single Fibre Analysis by Attenuated Total Reflectance. Stud. Conserv. 2003, 48, 269–275. [Google Scholar] [CrossRef]
  6. Okuyama, M.; Sato, M.; Akada, M. The Study on Excavated Bast Fibres Using Synchrotron Polarized FT-IR Micro-Spectroscopy. Sen’i Gakkaishi 2012, 68, 55–58. [Google Scholar] [CrossRef][Green Version]
  7. Wu, Y.; Wang, B.; Chen, J.; Huang, X.; Xu, J.; Wei, W.; Chen, K. Non-destructive prediction and pixel-level visualization of polysaccharide-based properties in ancient paper using SWNIR hyperspectral imaging and machine learning. Carbohydr. Polym. 2025, 352, 123198. [Google Scholar] [CrossRef] [PubMed]
  8. Picollo, M.; Cucci, C.; Casini, A.; Stefani, L. Hyper-Spectral Imaging Technique in the Cultural Heritage Field: New Possible Scenarios. Sensors 2020, 20, 2843. [Google Scholar] [CrossRef] [PubMed]
  9. Kamiya, N.; Ashino, K.; Sakai, Y.; Zhou, Y.; Ohyanagi, Y.; Shibazaki, K. Non-Destructive Estimation of Paper Fiber Using Macro Images: A Comparative Evaluation of Network Architectures and Patch Sizes for Patch-Based Classification. NDT 2024, 2, 487–503. [Google Scholar] [CrossRef]
  10. Schwanninger, M.; Rodrigues, J.C.; Fackler, K. A review of band assignments in near infrared spectra of wood and wood components. J. Near Infrared Spectrosc. 2011, 19, 287–308. [Google Scholar] [CrossRef]
  11. Sadeghifar, H.; Ragauskas, A. Lignin as a UV light blocker—A review. Polymers 2020, 12, 1134. [Google Scholar] [CrossRef] [PubMed]
  12. Rinnan, A.; van den Berg, F.; Engelsen, S.B. Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem. 2009, 28, 1201–1222. [Google Scholar] [CrossRef]
Figure 1. Schematic overview of the complete analytical workflow. Background colors distinguish the pipeline stages. The light green panel shows the sample preparation of three Japanese bast fibers, denoted by color-coded squares: Kozo (red), Mitsumata (blue), and Gampi (green). The light blue panel illustrates acquisition with a Micro-HSI system comprising a hyperspectral camera and an upright metallurgical microscope. Within this panel, the inset microscopic image of Kozo fibers at 50× magnification illustrates ROI selection and relative reflectance calculation: the colored frames mark the selected regions of interest (ROIs), and the matching colored lines in the spectral plot are the relative reflectance spectra extracted from those ROIs. The light purple panel outlines the spectral pre-processing steps (dynamic normalization, S–G first-derivative filtering, CV-based wavelength selection, and Z-score standardization). Finally, the chemometric models (PCA and LDA) are built, and the LDA classifier is evaluated using Leave-One-Specimen-Out Cross-Validation to yield the final classification results.
Figure 2. Uncalibrated relative reflectance spectra of Kozo (red), Mitsumata (blue), and Gampi (green) prior to dynamic normalization. Solid lines indicate mean reflectance; shaded regions indicate SD envelopes across sampled ROIs. In the visible range, SD envelopes overlap substantially across all three species. Kozo shows the narrowest intra-class dispersion; Mitsumata the widest; Gampi intermediate.
Figure 3. Dynamically normalized relative reflectance spectra of Kozo (red), Mitsumata (blue), and Gampi (green). Solid lines indicate mean reflectance; shaded regions indicate SD envelopes across sampled ROIs. Following dynamic normalization, the mean spectral profiles become more distinguishable. Specifically, Mitsumata exhibits a concave visible range profile, while Kozo displays a gradual upward slope. However, the SD envelopes show increased overlap compared to the uncalibrated spectra.
Figure 4. Coefficient of variation (CV) across the full spectral range for Kozo (red), Mitsumata (blue), and Gampi (green). Within the 450–750 nm window, CV curves remain low and stable for all three species, though with diverging trends: Gampi shows a gradual upward trajectory while Kozo and Mitsumata trend downward. Across the broader 400–1000 nm range, inter-species differences in CV behavior become more pronounced. Beyond 760 nm, CV rises sharply in all three fiber types, attributed to reduced detector sensitivity and weak overtone signals in this region; above 950 nm, partial overlap between species curves is observable at several wavelengths. These characteristics justify restricting subsequent chemometric modeling to the 400–1000 nm window.
Figure 5. Scree plot showing the explained variance ratio of each principal component (bars) and cumulative variance (line). A pronounced elbow is visible after PC3; successive components each contribute less than 5% of the total variance. PC1–3 were retained for analysis, jointly accounting for 62.54% of total spectral variance.
Figure 6. PCA score plots with 95% confidence ellipses for Kozo (red), Mitsumata (blue), and Gampi (green): (a) PC1 vs. PC2, (b) PC1 vs. PC3, and (c) PC2 vs. PC3. Across all three projections, inter-class overlap is substantial, consistent with the SD envelope overlap observed in the normalized spectra. Intra-class dispersion differs markedly among species: Mitsumata shows the largest confidence ellipse and highest data scatter; Kozo is the most tightly clustered; Gampi falls intermediate.
Figure 7. PCA loading vectors for the first three principal components (PC1 in black, PC2 in orange, and PC3 in purple) across the 400–1000 nm spectral range. Background shading distinguishes the visible region (light green) from the near-infrared region (light pink). PC1 loadings are concentrated in the visible range, peaking at 585 nm. PC2 shows a sharp maximum at 400 nm, adjacent to the UV boundary. PC3 exhibits a particularly pronounced feature: a sharp positive loading peak between 820 and 850 nm (maximum at 835 nm), with markedly greater amplitude variation in this region compared to PC1 and PC2. Key loading maxima are annotated for each component.
Figure 8. LDA score plot for Kozo (red), Mitsumata (blue), and Gampi (green), each n = 60. All three fiber types form visually separated clusters with no overlapping data points in this projection. LD1 (73.16% of discriminant variance) provides the primary axis of separation, with Kozo positioned well apart from the Mitsumata and Gampi pair; LD2 (26.84%) resolves the remaining overlap between the latter two species.
Figure 9. Confusion matrix from Leave-One-Specimen-Out Cross-Validation (18 specimens total, six per class). Rows indicate true labels; columns indicate predicted labels. All four misclassifications occurred between Gampi and Mitsumata; Kozo was classified without error.
Table 1. Technical specifications of the Micro-HSI platform.
| Parameter | Specification |
| --- | --- |
| Camera type | Push-broom HSI (NH-9, EBA Japan, Tokyo, Japan) |
| Spectral range | 350–1100 nm (covering UV-Vis-NIR) |
| Spectral resolution | 5 nm |
| Scan rate | 1.0–109.0 lines s⁻¹ (100 lines s⁻¹) |
| Exposure time | 0.02–9.99 ms per line |
| Gain | 1–100 |
| Bit depth | 12-bit |
| Detector resolution | 2048 (H) × 1080 (V) pixels |
| Microscope platform | ECLIPSE LV100ND (Nikon, Tokyo, Japan) |
| Objective lens | 50× (NA 0.8) |
| Illumination | 12 V–50 W halogen lamp |
Table 2. Leave-One-Specimen-Out Cross-Validation results for all 18 specimens.
| | Predicted: Gampi | Predicted: Kozo | Predicted: Mitsumata | F1-Score |
| --- | --- | --- | --- | --- |
| True: Gampi | 4 | 0 | 2 | 0.67 |
| True: Kozo | 0 | 6 | 0 | 1.00 |
| True: Mitsumata | 2 | 0 | 4 | 0.67 |
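The overall accuracy and per-class F1-scores follow directly from the counts in Table 2. The short sketch below recomputes them with NumPy. The variable names are illustrative, and only the nine confusion-matrix counts are taken from the table.

```python
import numpy as np

labels = ["Gampi", "Kozo", "Mitsumata"]
# Rows: true class; columns: predicted class (counts from Table 2).
cm = np.array([[4, 0, 2],
               [0, 6, 0],
               [2, 0, 4]])

accuracy = np.trace(cm) / cm.sum()         # correct specimens / all specimens
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class (column sums)
recall = np.diag(cm) / cm.sum(axis=1)      # per true class (row sums)
f1 = 2 * precision * recall / (precision + recall)

for name, score in zip(labels, f1):
    print(f"{name}: F1 = {score:.2f}")
print(f"Overall accuracy = {accuracy:.1%}")
# Gampi: F1 = 0.67
# Kozo: F1 = 1.00
# Mitsumata: F1 = 0.67
# Overall accuracy = 77.8%
```

Because the Gampi and Mitsumata errors are symmetric (two specimens exchanged in each direction), precision and recall are both 4/6 for these two classes, which gives the identical F1-scores of 0.67.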

Share and Cite

MDPI and ACS Style

Zhou, Y.; Ohyanagi, Y.; Iwata, A.; Shibazaki, K.; Murakami, K. Non-Destructive Species Discrimination of Japanese Bast Fibers: A Feasibility Study Using Micro-Hyperspectral Imaging and Chemometrics. NDT 2026, 4, 15. https://doi.org/10.3390/ndt4020015

