Article

AI-Powered Spectral Imaging for Virtual Pathology Staining

1 Biomedical Engineering Faculty & Russell Berrie Nanotechnology Institute, Technion—Israel Institute of Technology, Haifa 3200003, Israel
2 Department of Physiology and Pathophysiology, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Bioengineering 2025, 12(6), 655; https://doi.org/10.3390/bioengineering12060655
Submission received: 17 March 2025 / Revised: 11 May 2025 / Accepted: 3 June 2025 / Published: 15 June 2025

Abstract

Pathological analysis of tissue biopsies remains the gold standard for diagnosing cancer and other diseases. However, it is a time-intensive process that demands extensive training and expertise, and despite its importance, it remains subjective and not entirely error-free. Over the past decade, pathology has undergone two major transformations. First, the rise of whole slide imaging has enabled diagnostic work at a computer screen and the integration of image processing tools to enhance diagnostics. Second, the rapid evolution of Artificial Intelligence has transformed numerous fields and has had a remarkable impact on humanity. The synergy of these two developments has paved the way for groundbreaking research in digital pathology. Despite encouraging research outcomes, AI-based tools have yet to be actively incorporated into therapeutic protocols, primarily because medical therapy demands high reliability, necessitating a new approach that ensures greater robustness. Another route to improving pathological diagnosis involves advanced optical methods such as spectral imaging, which reveals information from the tissue that is beyond human vision. We have recently developed a unique rapid spectral imaging system capable of scanning pathological slides and delivering a wealth of critical diagnostic information. Here, we present a novel application of spectral imaging (SI) for virtual Hematoxylin and Eosin (H&E) staining using a custom-built, rapid Fourier-based SI system. Unstained human biopsy samples are scanned, and a Pix2Pix-based neural network generates realistic H&E-equivalent images. Additionally, we applied Principal Component Analysis (PCA) to the spectral information to examine the effect of downsampling the data on the virtual staining process. To assess model performance, we trained and tested models using full spectral data, RGB, and PCA-reduced spectral inputs.
The results demonstrate that PCA-reduced data preserved essential image features while enhancing statistical image quality, as indicated by FID and KID scores, and reducing computational complexity. These findings highlight the potential of integrating SI and AI to enable efficient, accurate, and stain-free digital pathology.

1. Introduction

Artificial Intelligence (AI), particularly machine learning (ML), has seen widespread adoption across numerous domains, with healthcare emerging as one of its most impactful arenas. AI technologies have been applied across diverse clinical areas, from improving hematological analysis and managing blood disorders [1] to identifying antibiotic resistance patterns in Mycobacterium tuberculosis [2] and improving time-series predictions through temporal context modeling in dynamic patient monitoring [3]. In medical imaging, AI has been used for automated diagnostics, image segmentation, disease classification, and even prognosis prediction, showing the potential to improve clinical outcomes and reduce human error.
Within the field of pathology, AI has demonstrated remarkable capabilities in detecting cancerous regions, quantifying histological features, and assisting with diagnostic standardization [4,5,6]. These tools hold the potential to significantly enhance efficiency and accuracy in diagnostic pathology. However, despite promising research and technological progress, the clinical integration of AI remains limited due to concerns over reliability, interpretability, and regulatory challenges [7,8].
A major bottleneck in digital pathology is the need for high-quality stained tissue slides, which involves time-intensive sample preparation, chemical reagents, and human expertise. This creates a gap in exploring methods that can eliminate or reduce reliance on physical staining without compromising diagnostic accuracy. Virtual staining, particularly using AI and advanced imaging techniques, offers a potential solution.
In this context, whole slide imaging (WSI) has played a pivotal role. WSI enables the digitization of pathological slides, providing a foundation for AI-based image analysis. It has also facilitated the creation of large, annotated datasets, a key resource for developing ML algorithms. However, standard RGB imaging captures only limited information, motivating the pursuit of richer data modalities.
Spectral imaging (SI) is one such modality, capturing detailed spectral information beyond the capabilities of conventional RGB imaging and revealing subtle molecular and structural differences in tissues. Indeed, a few studies combining SI with digital pathology have demonstrated its potential in the field [9,10,11,12]. However, due to the complexity of SI acquisition methods, these applications have not been extensively explored [13], and their full potential when integrated with AI has yet to be realized. To bridge this gap, we have developed a unique rapid SI system capable of scanning an entire biopsy within a practical time frame for pathological use [14]. Spectral separation relies on Fourier spectroscopy, which utilizes an interferometer placed in the optical path, enabling the simultaneous acquisition of the spectra of all pixels in the image, as described in the Methodology Section. The Fourier method is known for its high throughput, translating to a better signal-to-noise ratio and shorter acquisition time.
One of the potential applications of SI is virtual staining (VS), where unstained samples are scanned, and AI analysis of the spectra generates color images that mimic traditional stained tissue. This method has two relevant applications. First, VS can generate a color image from an unstained tissue, typically resembling Hematoxylin and Eosin (H&E), the most used stain by pathologists. The second application involves scanning a stained sample, such as H&E, and transforming the image to simulate a different stain, such as MT, Jones [15], or even a fluorescent marker. This capability would provide pathologists with significantly more information than a single stain.
Prima facie, virtual H&E staining may appear unnecessary if the sample is expected to undergo conventional staining, given that H&E remains the gold standard and is used in over 85% [5] of biopsy samples. However, the virtual generation of special stains from H&E images offers significant advantages, as certain histopathological features cannot be identified with H&E alone and typically require additional staining, which is time-consuming, labor-intensive, and dependent on specialized equipment. Training an AI model to convert H&E-stained images into special-stained counterparts, however, presents a major challenge due to the difficulty of obtaining matched pairs of tissue samples with different stains.
A recent paper presented a novel approach for creating registered image pairs of tissue samples stained with H&E and another special stain [5]. The work is based on two networks: the first transforms images of unstained tissue into virtual H&E, and the second transforms H&E images into another special stain. The former model trains on a dataset of unstained and H&E corresponding images of a sample, and the main model trains on the virtual H&E stain resulting from the first model and a matching image of the sample with another special stain. This type of unstained-to-stained AI transformation can be used for in vivo applications, eliminating the need for chemical staining [16].
In this study, we propose a framework for virtual H&E staining using spectral data from unstained human biopsy samples followed by AI analysis (Figure 1). We used breast cancer biopsy samples that were scanned by our SI system twice, first without any stain, and again following standard H&E staining protocol (Figure 1A). The spectral images of the stained dataset went through a process of reduction to Red–Green–Blue (rRGB) based on the CIE 1931 scheme [17]. The unstained dataset was also reduced to rRGB (Figure 1B), but in addition, it was reduced by Principal Component Analysis (PCA) [18]. In contrast to previous works on virtual H&E staining, we did not make use of autofluorescence [5,19] in the invisible range; instead, we measured only the bright-field transmission in the visible spectral range. This is simpler and faster in comparison to fluorescence measurements, and yet, we achieved high performance, emphasizing the applicability of spectral information.
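The reduction of a spectral cube to rRGB via the CIE 1931 scheme can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the color matching functions are approximated here with a multi-lobe Gaussian fit (Wyman-style) rather than the standard tabulated values, and the XYZ-to-sRGB matrix and max-normalization are simplifying assumptions.

```python
import numpy as np

def gaussian(x, mu, s1, s2):
    """Piecewise Gaussian used to approximate CIE 1931 color matching functions."""
    s = np.where(x < mu, s1, s2)
    return np.exp(-0.5 * ((x - mu) / s) ** 2)

def cie_xyz_cmfs(wl_nm):
    """Approximate CIE 1931 2-degree observer via a multi-lobe Gaussian fit."""
    x = (1.056 * gaussian(wl_nm, 599.8, 37.9, 31.0)
         + 0.362 * gaussian(wl_nm, 442.0, 16.0, 26.7)
         - 0.065 * gaussian(wl_nm, 501.1, 20.4, 26.2))
    y = (0.821 * gaussian(wl_nm, 568.8, 46.9, 40.5)
         + 0.286 * gaussian(wl_nm, 530.9, 16.3, 31.1))
    z = (1.217 * gaussian(wl_nm, 437.0, 11.8, 36.0)
         + 0.681 * gaussian(wl_nm, 459.0, 26.0, 13.8))
    return np.stack([x, y, z], axis=-1)              # (n_wavelengths, 3)

def spectra_to_rrgb(cube, wl_nm):
    """Reduce a spectral cube (H, W, n_wl) to an rRGB image through CIE XYZ."""
    xyz = np.tensordot(cube, cie_xyz_cmfs(wl_nm), axes=([2], [0]))  # (H, W, 3)
    # Linear XYZ -> sRGB conversion matrix (D65 white point).
    m = np.array([[ 3.2406, -1.5372, -0.4986],
                  [-0.9689,  1.8758,  0.0415],
                  [ 0.0557, -0.2040,  1.0570]])
    rgb = xyz @ m.T
    return np.clip(rgb / max(rgb.max(), 1e-12), 0.0, 1.0)

wl = np.linspace(400, 750, 40)        # 40 visible-range channels, as in the paper
cube = np.random.rand(8, 8, 40)       # toy spectral patch
rgb = spectra_to_rrgb(cube, wl)
print(rgb.shape)                      # (8, 8, 3)
```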
The VS model is based on the Pix2Pix model [20] (Figure 1C) and was trained several times with different numbers of PCA components, with rRGB, and with the full spectrum. For all pre-processing methods, the results had comparable spatial metrics such as L1 [20], RMSE [21], PSNR [22], and SSIM [23], with a minor advantage for PCA with 10 and 11 components. However, when we evaluated the results with statistical measures such as FID [24] and KID [25], PCA with five components outperformed all other tests, including the full spectrum and rRGB.

2. Methodology

This study follows a multi-stage process that integrates SI with deep learning for virtual staining of pathology samples. First, we acquired spectral images of breast cancer biopsy tissues using our custom-developed rapid Fourier-based SI system, capturing data from each sample both before and after H&E staining. Next, we preprocessed the spectral data by reducing dimensionality through Principal Component Analysis (PCA) and converting images to Red–Green–Blue (rRGB) representations. These datasets were manually registered and then used to train multiple Pix2Pix conditional GAN models for virtual staining. We evaluated model performance across different preprocessing methods using both spatial (L1, RMSE, SSIM, and PSNR) and statistical (FID and KID) metrics. Finally, we evaluated the quality of the data with an expert. This section outlines these stages, including the SI system and data acquisition, dataset preparation, network architecture design, and dimensionality reduction.

2.1. Spectral Imaging System

Unlike traditional RGB imaging, which assigns three values per pixel, a spectral image captures a full spectrum for each pixel, typically consisting of 10–100 different wavelengths (see examples of 40 wavelengths in Figure 2). This results in a three-dimensional data structure that is widely used across various fields, including remote sensing, agriculture, and even art preservation [13]. Previous studies have highlighted the significance of SI in various life sciences applications [9,19,26], particularly in genetic screening through spectral karyotyping (SKY) [27,28]. However, its use in pathology remains limited, despite its potential to enhance AI applicability for clinical practice. One major challenge of SI is its relatively long acquisition time due to the large volume of data collected compared to standard RGB imaging. Most existing SI methods rely on a set of color filters matched to specific wavelength ranges, where the number of spectral points is determined by the number of filters used.
We recently developed a spectral imaging system based on Fourier spectroscopy [14], which measures the spectrum indirectly by recording an interference pattern, known as an interferogram, for each pixel (Figure 3). Each interferogram represents intensity as a function of the optical path difference (OPD) generated by a Sagnac interferometer, and the interferograms of all pixels in the image are collected simultaneously (Figure 3). The system has no moving parts, except for a computer-controlled microscope stage that is needed for scanning the sample in any case, similarly to WSI systems [29].
The spectrum at every pixel is derived from the interferogram using a fast Fourier transform along with well-established pre- and post-processing steps [30]. These operations are standard in Fourier spectroscopy and include the following: 1. apodization, which removes high-frequency artifacts in the spectrum arising from the finite length of the interferogram; 2. zero-filling, which better emphasizes the spectral resolution by interpolating additional data points; and 3. phase correction, which converts the complex spectra obtained after the Fourier transform into real-valued spectra. A detailed description of these common procedures can be found in Lindner et al. [14].
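The interferogram-to-spectrum recovery described above (apodization, zero-filling, FFT) can be illustrated on synthetic data. The OPD sampling, line amplitudes, and the use of a Hann window and magnitude spectrum in place of full phase correction are assumptions of this sketch, not details from the paper:

```python
import numpy as np

# Simulate one pixel's interferogram: intensity vs. optical path difference
# (OPD) for a two-line spectrum, then recover the spectrum with an FFT.
n = 256
opd = np.arange(n) * 0.1e-6                      # OPD axis, 0.1 um steps (assumed)
lines = {1 / 550e-9: 1.0, 1 / 650e-9: 0.6}       # wavenumber (1/m) -> amplitude
interferogram = sum(a * 0.5 * (1 + np.cos(2 * np.pi * k * opd))
                    for k, a in lines.items())

# 1. Apodization: remove the DC offset and taper the finite record
#    to suppress ringing artifacts in the spectrum.
apodized = (interferogram - interferogram.mean()) * np.hanning(n)

# 2. Zero-filling: pad the interferogram to interpolate extra spectral points.
padded = np.concatenate([apodized, np.zeros(3 * n)])

# 3. FFT; the magnitude stands in for full phase correction in this sketch.
spectrum = np.abs(np.fft.rfft(padded))
wavenumber = np.fft.rfftfreq(padded.size, d=0.1e-6)   # 1/m

peak_k = wavenumber[np.argmax(spectrum)]
print(f"dominant wavelength ~ {1e9 / peak_k:.0f} nm")  # near 550 nm
```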
Biopsy slides are measured on an Olympus IX81 inverted microscope equipped with a motorized stage (Prior Scientific, Cambridge, UK). The samples are measured with a 20× objective lens with NA = 0.8. A typical spectrum consists of 40 points in the visible spectral range of 400–750 nm. The spectral resolution in Fourier spectroscopy changes along the spectral range, from 5 nm at 400 nm to 20 nm at 800 nm [14]. The CMOS camera (Lumenera Lt225 NIR, now Teledyne Lumenera, Ottawa, ON, Canada) has a pixel size of 5.5 × 5.5 μm², so that each pixel images a sample area of 275 × 275 nm². This is oversampling, as the spatial resolution is limited to ~610 nm by the diffraction limit. The acquisition occurs ‘on the fly’ while the stage moves at a constant speed and the camera captures 150 images/second with an exposure time of 10 μs. Although the stage continues to move during camera exposure, its speed is carefully selected to ensure that the image smear remains below a quarter of a pixel, thereby preserving the spatial resolution. The system can also operate in ‘stop-and-go’ mode, which extends the acquisition time and is typically used in low-light conditions that require longer exposure times.
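The quarter-pixel smear constraint above fixes the maximum stage speed; a quick back-of-the-envelope check with the quoted numbers (275 nm of sample per pixel, 10 μs exposure):

```python
# Quarter-pixel smear constraint on stage speed, using the paper's numbers.
pixel_on_sample_nm = 275          # each camera pixel images 275 nm of sample
exposure_s = 10e-6                # 10 us exposure time
max_smear_nm = pixel_on_sample_nm / 4

# smear = speed * exposure  =>  speed_max = smear_max / exposure
speed_max_mm_s = (max_smear_nm * 1e-9 / exposure_s) * 1e3
print(round(speed_max_mm_s, 3))   # maximum stage speed in mm/s -> 6.875
```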
At the end of the acquisition process, the recorded intensities at each sample point are assembled to construct the interferogram at every pixel. The performance of the system was previously evaluated and shown to achieve high-quality imaging, limited by the diffraction limit [14]. Further details are provided in the Supplementary Materials.

2.2. Sample Preparation

Pathology tissue array slides were provided by TissueArray (Derwood, MD, USA) and include 9 cases of breast cancer in patients of different ages (44–58). Each case included 1–2 paraffin-embedded tissue samples, 5 μm thick and approximately 1.5 mm in diameter. Each sample was scanned twice, once before staining and again after H&E staining using the ab245880 H&E Kit (Abcam, Cambridge, UK). The main steps included deparaffinization and hydration of the sections, spectral imaging (SI) scanning, staining with Hematoxylin, rinsing with distilled water, brief application of Bluing Reagent, a second rinse, dipping in absolute alcohol, staining with Eosin Y solution, rinsing with alcohol, dehydration through graded alcohols, clearing, mounting in synthetic resin, and final rescanning.
The registration of unstained and stained images of the same tissue was performed manually using a Graphical User Interface (GUI) that we developed (MATLAB version R2021b). To simplify the procedure, small 256 × 256 pixel images were extracted from each registered biopsy pair with 50% overlap. Each image corresponds to a sample area of ~70 × 70 μm², and each tissue sample typically yielded 1000 to 2000 small images. For model development, we assembled a training dataset comprising 13,000 images from six patients and a test dataset containing 5000 images from three other patients.
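The 50%-overlap tiling described above can be sketched as follows. This is a hypothetical NumPy implementation; the actual patch extraction was done with the authors' MATLAB GUI:

```python
import numpy as np

def tile_image(img, tile=256, overlap=0.5):
    """Cut an image into tile x tile patches with the given fractional overlap."""
    step = int(tile * (1 - overlap))      # 50% overlap -> 128-pixel stride
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - tile + 1, step):
        for left in range(0, w - tile + 1, step):
            patches.append(img[top:top + tile, left:left + tile])
    return np.stack(patches)

# Toy "biopsy scan": 1024 x 1024 pixels with 40 spectral channels.
scan = np.zeros((1024, 1024, 40), dtype=np.float32)
patches = tile_image(scan)
print(patches.shape)                      # (49, 256, 256, 40)
```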

2.3. Principal Component Analysis

Principal Component Analysis (PCA) is employed as a data preprocessing technique to reduce the dimensionality of the spectral information before it is fed into the deep learning model. PCA is a widely utilized statistical technique in data analysis that facilitates a reduction in the dimensionality of datasets while preserving as much variance of data as possible [18]. This reduction is achieved by transforming the original variables into a new set of uncorrelated variables known as principal components, ordered by the amount of variance they capture from the data. The core idea behind PCA is to identify the directions along which the variation in the data is maximized. These directions are orthogonal to each other, ensuring that the principal components are linearly independent.
Mathematically, PCA involves covariance matrix computation, which is then decomposed into its eigenvalues and eigenvectors, representing the directions and the magnitude of the principal components, respectively. The original data is projected onto the eigenvectors to form the principal components, ranked according to their variance. By selecting the top n components, a reduced-dimensionality representation of the data is obtained. Another perspective on PCA is that it reduces the dimensionality of a dataset while preserving as much relevant information as possible. The loss is defined as
$$\mathrm{Loss}_n=\frac{\sum_{i=1}^{m}\left\lVert x_i-\hat{x}_i^{(n)}\right\rVert_2^2}{\sum_{i=1}^{m}\lVert x_i\rVert_2^2};\qquad \hat{x}_i^{(n)}=\sum_{j=1}^{n} p_{i,j}\,v_j$$
where $x_i$ (reshaped to one dimension) is the real spectral image, $\hat{x}_i^{(n)}$ is the reconstruction of $x_i$, $v_j$ is the $j$-th principal component vector, $p_{i,j}$ is the projection of $x_i$ on $v_j$, and $n$ is the size of the reduced dimension. Because $\hat{x}_i^{(n)}$ is a projection of $x_i$ onto a reduced dimension, $\mathrm{Loss}_n$ quantifies the lost information, and a perfect reconstruction gives $\mathrm{Loss}_n = 0$. The PCA solutions are given by the eigenvectors of the covariance matrix:
$$C=\frac{1}{m-1}\,\bar{X}^{T}\bar{X};\qquad C\,v_j=\lambda_j v_j;\qquad \bar{X}=\Big[\,x_1-\bar{x}\;\Big|\;x_2-\bar{x}\;\Big|\;\cdots\;\Big|\;x_m-\bar{x}\,\Big],\quad \bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_i$$
where $m$ is the number of samples, $\lambda_j$ are the eigenvalues sorted in descending order ($p < q \Rightarrow \lambda_p \geq \lambda_q$) corresponding to the eigenvectors $v_j$, and the columns of $\bar{X}$ are the mean-centered original samples. In this case, Equation (3) is given as follows:
$$\mathrm{Loss}_n = 1-\frac{\sum_{j=1}^{n}\lambda_j}{\sum_{j=1}^{N}\lambda_j}$$
where N is the size of the full spectrum (N = 40).
An additional benefit of PCA is that noise with a low standard deviation contributes little to the leading components [31]. Since each spectrum spans 40 spectral channels, processing the full data can be computationally expensive and partly redundant. We therefore applied PCA to the unstained spectral images before model training, which allowed us to assess the impact of a reduced spectral representation on the quality of virtual H&E staining. PCA was used to reduce the number of spectral channels to 3–15 candidate components, and the minimum number required to preserve high-quality virtual staining performance was determined. The impact of this dimensionality reduction on both spatial and statistical image quality metrics was then analyzed to assess its effectiveness for spectral-to-stain image translation.
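The PCA reduction and the eigenvalue form of the loss can be sketched as follows. The toy spectra and the use of NumPy's SVD are assumptions of this illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy spectra: m pixels x N=40 channels, generated from 5 latent components
# plus weak noise, mimicking a small set of underlying tissue constituents.
m, N, k = 2000, 40, 5
spectra = rng.normal(size=(m, k)) @ rng.normal(size=(k, N)) \
          + 0.05 * rng.normal(size=(m, N))

# Centre the data and diagonalize the covariance matrix (via SVD of Xbar).
mean = spectra.mean(axis=0)
Xbar = spectra - mean
_, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
eigvals = s ** 2 / (m - 1)                 # covariance eigenvalues, descending

def loss(n):
    """Fraction of variance discarded when keeping n components (Equation (3))."""
    return 1 - eigvals[:n].sum() / eigvals.sum()

reduced = Xbar @ Vt[:5].T                  # project each spectrum onto 5 PCs
print(reduced.shape, round(loss(5), 4))
```

With five latent constituents, keeping five components discards almost nothing but the noise floor, mirroring the paper's finding that a handful of components suffices.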

2.4. Network Architecture

To generate stained tissue images from spectral data of unstained tissue sections, we employed a conditional Generative Adversarial Network (GAN) based on the Pix2Pix framework for image-to-image translation, originally introduced by Isola et al. [20]. The network is trained to map input images to output images, learning to generate a realistic image conditioned on an input image. A conditional GAN (cGAN) is a GAN in which the generator and discriminator are conditioned on additional information; the generation process is guided by this input, allowing more controlled and specific outputs. In our case, the input to the network is a spectral image of an unstained tissue section, and the target is its corresponding H&E-stained image. The generator is trained to produce realistic stained images that closely match the ground truth, while the discriminator learns to distinguish between real and generated (virtual) stains.
The generator in Pix2Pix adopts a U-Net architecture [32] (Figure 4A), which consists of an encoder–decoder structure with skip connections between corresponding layers. These skip connections are essential for preserving spatial resolution and fine structural details during the translation process. Each convolutional layer uses 3 × 3 kernels, followed by batch normalization and a LeakyReLU activation function with a negative slope of 0.1.
The discriminator is implemented as a PatchGAN (Figure 4B), which classifies local image patches as real or fake, rather than the entire image. This design encourages the generator to produce realistic texture and fine-grained details. Like the generator, all convolutional layers in the discriminator include batch normalization and LeakyReLU activations (slope = 0.1).
Pix2Pix uses a combination of GAN loss and similarity loss. The former is calculated as the extent to which the generator succeeds in ‘fooling’ the discriminator and thereby helps in generating realistic images, while the latter ensures that the generated images closely resemble the ground truth (GT) images at a pixel level. The similarity loss is computed as the L1 loss between the GT and the virtually stained (VS) images, following the application of a Gaussian blur (kernel size 5 × 5) to both, to emphasize overall structural features over pixel-level noise. The generator loss can be defined as
$$\mathcal{L}_G=\mathbb{E}_{(x,y)\sim p_{X,Y}}\Big[-\log D\big(G(x),y\big)\Big]+\gamma_1\big\lVert \hat{G}(x)-\hat{y}\big\rVert_1+\gamma_2\,\mathrm{TV}\big(G(x)\big)$$
where $G$ and $D$ are the generator and discriminator, $p_{X,Y}$ is the ground-truth dataset of unstained and H&E pairs, $\gamma_1$ and $\gamma_2$ are hyper-parameters equal to 500 and $10^{-4}$, respectively, $\hat{G}(x)$ and $\hat{y}$ are $G(x)$ and $y$ after the Gaussian blur, and $\mathrm{TV}$ is a total variation regularization term [33], given by
$$\mathrm{TV}(x)=\sum_{c=1}^{3}\left[\sum_{i=1}^{H-1}\sum_{j=1}^{W}\big(x_{i,j,c}-x_{i+1,j,c}\big)^2+\sum_{i=1}^{H}\sum_{j=1}^{W-1}\big(x_{i,j,c}-x_{i,j+1,c}\big)^2\right]$$
where $H$ and $W$ are the spatial dimensions and $c$ indexes the color channels. This regularization promotes smooth, coherent outputs, since natural images typically exhibit low local variance.
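A minimal NumPy sketch of this generator loss: adversarial term, blurred L1 similarity, and the TV regularizer above. The blur sigma, the toy inputs, and the NumPy formulation (in place of whatever deep learning framework the authors used) are assumptions of this illustration:

```python
import numpy as np

def gaussian_blur5(img, sigma=1.0):
    """Separable 5x5 Gaussian blur over the two spatial axes of (H, W, C)."""
    t = np.arange(-2, 3)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()
    pad = np.pad(img, ((2, 2), (2, 2), (0, 0)), mode="edge")
    rows = sum(k[i] * pad[i:i + img.shape[0], 2:-2] for i in range(5))  # blur rows
    pad2 = np.pad(rows, ((0, 0), (2, 2), (0, 0)), mode="edge")
    return sum(k[j] * pad2[:, j:j + img.shape[1]] for j in range(5))    # blur cols

def total_variation(x):
    """Squared-difference total variation over an (H, W, 3) image."""
    dv = np.sum((x[:-1] - x[1:]) ** 2)
    dh = np.sum((x[:, :-1] - x[:, 1:]) ** 2)
    return dv + dh

def generator_loss(d_fake, fake, real, g1=500.0, g2=1e-4):
    """-log D(G(x), y) + g1 * blurred-L1 similarity + g2 * TV."""
    adv = -np.log(d_fake + 1e-8).mean()
    sim = np.abs(gaussian_blur5(fake) - gaussian_blur5(real)).mean()
    return adv + g1 * sim + g2 * total_variation(fake)

fake = np.random.rand(16, 16, 3)          # toy virtual stain
real = np.random.rand(16, 16, 3)          # toy ground-truth H&E patch
loss = generator_loss(d_fake=np.array([0.3]), fake=fake, real=real)
print(loss > 0)                           # True
```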
Figure 4. Model architecture. (A) The generator model, based on U-Net. All convolutions use 3 × 3 kernels and include batch normalization and a LeakyReLU [34] activation with a negative slope of 0.1. (B) The discriminator model, based on the PatchGAN architecture. Here too, all convolutions include batch normalization and a LeakyReLU activation with a negative slope of 0.1.
Like all GANs, we train the discriminator in parallel using the following loss function:
$$\mathcal{L}_D=\mathbb{E}_{(x,y)\sim p_{X,Y}}\Big[-\tfrac{1}{2}\log D(x,y)-\tfrac{1}{2}\log\big(1-D\big(G(x),y\big)\big)\Big]$$
This formulation ensures balanced adversarial training by encouraging the discriminator to distinguish between real and generated pairs while pushing the generator toward more convincing outputs. Changes in the loss function throughout the training process are shown in the Supplementary Material.
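The discriminator loss above can be sketched in a few lines. The toy discriminator scores are assumptions of this illustration:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """-1/2 log D(x, y) - 1/2 log(1 - D(G(x), y)), averaged over a batch."""
    return (-0.5 * np.log(d_real + eps)
            - 0.5 * np.log(1 - d_fake + eps)).mean()

# A confident discriminator (real -> 0.9, fake -> 0.1) incurs a low loss;
# a fooled one (real -> 0.1, fake -> 0.9) incurs a high loss.
good = discriminator_loss(np.array([0.9]), np.array([0.1]))
bad = discriminator_loss(np.array([0.1]), np.array([0.9]))
print(good < bad)   # True
```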

3. Results

Model Training

The model was trained multiple times using varying numbers of PCA components (1 to 15, and 20), as well as with the full SI data and rRGB inputs. For all methods, the datasets were normalized by centering and scaling each dimension separately using the mean and standard deviation of the entire training set [35]. Model parameter optimization was performed using the Adam optimizer [36] over 100 epochs, with the learning rate initially set to 0.02 and decayed by a factor of 0.85 every five epochs. All training was conducted on an NVIDIA GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA).
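The stated learning-rate schedule (0.02, decayed by a factor of 0.85 every five epochs over 100 epochs) can be written out directly; this mirrors a StepLR-style scheduler:

```python
# Learning-rate schedule used for training: start at 0.02 and decay by
# a factor of 0.85 every five epochs, over 100 epochs.
def lr_at(epoch, base=0.02, gamma=0.85, step=5):
    return base * gamma ** (epoch // step)

schedule = [lr_at(e) for e in range(100)]
print(round(schedule[0], 4), round(schedule[99], 6))   # initial and final rates
```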
As shown in Table 1 and Figure 5, all dataset methods achieved similar results in the spatial metrics, including L1, RMSE, PSNR, and SSIM, with a small advantage for PCA with 10 and 11 components. However, evaluating the results using statistical measures (FID and KID) shows a clear advantage for PCA with five components. These results are consistent with Figure 6, which shows the preserved information as a function of the number of PCA components, defined as $1-\mathrm{Loss}_n$ (Equation (3)). As shown in Figure 6, using PCA with five components preserves approximately 75% of the original information. Despite this information loss, the PCA-reduced data achieve high performance due to the lower dimensionality and the suppression of random noise [37,38].
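For reference, a spatial metric such as PSNR is computed per image pair as follows (standard definition; not the authors' exact evaluation code):

```python
import numpy as np

def psnr(gt, pred, data_range=1.0):
    """Peak signal-to-noise ratio between a ground-truth and a virtual stain."""
    mse = np.mean((gt - pred) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

gt = np.zeros((8, 8, 3))
pred = gt + 0.1                   # uniform error of 0.1 -> MSE ~ 0.01, ~20 dB
print(round(psnr(gt, pred), 2))
```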
Figure 7 presents the results of four cases (images), including an unstained sample, an H&E-stained sample, and the outputs of five models: three based on PCA (with five, ten, and twenty components), one based on rRGB, and one using the full spectrum. The model using the full spectrum mimics the stained tissue across the image with high accuracy. Nevertheless, it suffers from a few artifacts, particularly in bright regions. Additionally, both the full-spectrum and rRGB-based models failed to detect some nuclei and, in some cases, hallucinated nuclei that do not exist.
Based on an expert assessment, there is a high level of concordance between the commonly used Hematoxylin/Eosin (H&E) staining of the tissue and the ‘fake’ staining that was generated by AI. This is also supported by the statistical analysis presented above.
We hypothesize that the spatial metrics are limited by structural changes introduced during the staining process and by artifacts, as seen in Figure 7, which prevent higher model performance without overfitting to the training set. This can explain the comparable spatial metric values across methods. Nevertheless, the models produced high-quality results, demonstrating the advantage of using SI data over RGB data. We expect that cleaner data and autofluorescence information could improve the results even further.

4. Discussion

Although virtual H&E staining holds considerable promise for medical applications, further in-depth research is still required. To date, SI-based studies have been relatively scarce, primarily due to the technique’s inherent complexity and lengthy acquisition times. In this work, we present an innovative SI system based on Fourier spectroscopy that provides full spectral resolution at the 10 nm level while significantly shortening acquisition time, overcoming a major limitation of previous methods.
Previous studies have demonstrated the advantages of incorporating spectral information into AI-driven applications such as nucleus segmentation [9], cancer cells segmentation [10,39], and cancer detection [26,40,41]. Several works have also highlighted the value of SI in VS applications [19,42], as well as the utility of PCA in processing spectral data [43]. However, the extended acquisition time required for SI has remained a major limitation, often necessitating a compromise between data volume and spatial resolution. Fourier spectroscopy-based systems address this challenge by enabling the use of SI without these compromises.
A major challenge in SI that remains is the vast amount of data, much of which may be irrelevant, along with the noise introduced during acquisition. To address this, we evaluated a supervised AI-based VS approach using PCA for spectral dimensionality reduction. PCA serves as an effective down-sampling technique, while preserving essential information and simultaneously reducing noise.
The rapid SI system and its AI-based analysis have demonstrated excellent performance, surpassing both state-of-the-art results based on RGB information and results based on the full spectrum. This advantage persists even when one-dimensional PCA is utilized, further highlighting the effectiveness of the method. While the improvement in spatial metrics was minor, likely due to limitations caused by morphological differences between the unstained tissue and the H&E-stained tissue, a noteworthy improvement in statistical metrics was observed, with the highest result obtained for five spectral components.
We hypothesize that this behavior stems from the finite set of biological constituents that make up the tissue, each associated with characteristic absorption and transmission spectra. Consequently, the measured spectral data can be effectively represented by a finite number of components.
Accordingly, the integration of spectral imaging with AI and PCA in the analysis of pathological samples offers a rich, practical source of information with the potential to enhance diagnostic accuracy in pathology.
Several points remain to be elucidated in future studies, including the following:
  • There is a difference in the morphology of unstained and stained tissue irrespective of ‘fake’ or true H&E staining. This is most obvious for areas without cells that are found throughout tissue (see Figure 7); such areas became smaller and changed their shapes after H&E staining. Considering this observation, other morphologies must change as well but are not as obvious to the eye. Such changes after H&E staining are not currently known in the field. Thus, further studies will need to be performed to define the staining-induced differences in morphology and to determine whether they matter in clinical diagnosis.
  • While the concordance between real H&E and ‘fake’ H&E is high (Table 1), ‘fake’ H&E appears to sometimes add ‘stained’ structures (looking like cells) or fails to identify some cells that were stained by H&E. Further studies need to investigate the reason for this difference.
  • It is expected that these minor differences between ‘fake’ H&E and true H&E staining should not impact the diagnostic value of the ‘fake’ H&E. It is important to perform a blind study that will compare a series of patient samples examined after fake H&E and true H&E staining. If the examination of such samples by a pathologist leads to identical results in diagnosis, the minor differences reported in this study should not matter for clinical evaluation.

5. Conclusions

Our results highlight the significance of incorporating spectral information beyond standard RGB data. By leveraging such methods, a biopsy stained with one staining method could yield information equivalent to that of another or even multiple staining methods. This approach has the potential to enhance diagnostic accuracy, providing pathologists with critical insights while eliminating the need for complex and time-consuming multiple-staining procedures.
Furthermore, here we used only the visible light range (400–700 nm) for spectral measurements. Nevertheless, we believe that also incorporating the invisible light spectrum, especially in the infrared spectral range, or further physical information, such as polarization and Raman scattering, could further improve the results.
The integration of spectral imaging with AI presents a transformative opportunity for the field of pathology. By enabling stain-free or stain-to-stain analysis, this approach has the potential to streamline diagnostic workflows, improve accuracy, and reduce turnaround times. The ability to extract rich biochemical and structural information from native tissue further enhances diagnostic sensitivity and specificity. Ultimately, these advancements not only support more precise and efficient pathology practices but also contribute to earlier disease detection and improved patient outcomes, marking a significant step forward in global healthcare.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bioengineering12060655/s1, Table S1: The table of all the wavelengths that the system has detected; Figure S1: All the wavelengths according to their serial number; Figure S2: Similarity loss for all methods throughout the training stages (every 5 epochs).

Author Contributions

Conceptualization, A.S., M.A. and Y.G.; Methodology, A.S., M.A. and Y.G.; Software, A.S.; Validation, M.A.; Investigation, A.S., M.A. and Y.G.; Data curation, M.A.; Writing—original draft, A.S., M.A. and Y.G.; Writing—review & editing, A.S., M.A., S.M. and Y.G.; Visualization, A.S., M.A. and Y.G.; Supervision, Y.G.; Project administration, Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported (A.S., M.A., and Y.G.) by Israel Science Foundation grants 2019/17 and 2624/22, the European ATTRACT program for Hyperspectral Imaging for Precision Medicine in Cancer Diagnostics (HipMed) 2022, and the Zimin Institute for AI Solutions in Healthcare 2024. S.M. is partially supported by a Canada Research Chair (Tier 1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

All human tissues were collected under HIPAA-approved protocols.

Data Availability Statement

Due to the unique nature and complexity of the hyperspectral imaging data used in this study, we have opted to make the dataset available upon request. This approach ensures that we can provide appropriate guidance regarding the data’s structure, acquisition process, and intended use. Researchers interested in accessing the data are welcome to contact the corresponding authors at adam-soker@campus.technion.ac.il or yuval.garini@bm.technion.ac.il.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. El Alaoui, Y.; Elomri, A.; Qaraqe, M.; Padmanabhan, R.; Yasin Taha, R.; El Omri, H.; EL Omri, A.; Aboumarzouk, O. A Review of Artificial Intelligence Applications in Hematology Management: Current Practices and Future Prospects. J. Med. Internet Res. 2022, 24, e36490. [Google Scholar] [CrossRef]
  2. Serajian, M.; Testagrose, C.; Prosperi, M.; Boucher, C. A Comparative Study of Antibiotic Resistance Patterns in Mycobacterium Tuberculosis. Sci. Rep. 2025, 15, 5104. [Google Scholar] [CrossRef] [PubMed]
  3. Irani, H.; Metsis, V. Enhancing Time-Series Prediction with Temporal Context Modeling: A Bayesian and Deep Learning Synergy. Int. FLAIRS Conf. Proc. 2024, 37. [Google Scholar] [CrossRef]
  4. Shamai, G.; Livne, A.; Polónia, A.; Sabo, E.; Cretu, A.; Bar-Sela, G.; Kimmel, R. Deep Learning-Based Image Analysis Predicts PD-L1 Status from H&E-Stained Histopathology Images in Breast Cancer. Nat. Commun. 2022, 13, 6753. [Google Scholar] [CrossRef]
  5. De Haan, K.; Zhang, Y.; Zuckerman, J.E.; Liu, T.; Sisk, A.E.; Diaz, M.F.P.; Jen, K.-Y.; Nobori, A.; Liou, S.; Zhang, S.; et al. Deep Learning-Based Transformation of H&E Stained Tissues into Special Stains. Nat. Commun. 2021, 12, 4884. [Google Scholar] [CrossRef]
  6. Yoon, C.; Park, E.; Misra, S.; Kim, J.Y.; Baik, J.W.; Kim, K.G.; Jung, C.K.; Kim, C. Deep Learning-Based Virtual Staining, Segmentation, and Classification in Label-Free Photoacoustic Histology of Human Specimens. Light. Sci. Appl. 2024, 13, 226. [Google Scholar] [CrossRef] [PubMed]
  7. Kleppe, A.; Skrede, O.-J.; De Raedt, S.; Liestøl, K.; Kerr, D.J.; Danielsen, H.E. Designing Deep Learning Studies in Cancer Diagnostics. Nat. Rev. Cancer 2021, 21, 199–211. [Google Scholar] [CrossRef] [PubMed]
  8. Khare, S.K.; Blanes-Vidal, V.; Booth, B.B.; Petersen, L.K.; Nadimi, E.S. A Systematic Review and Research Recommendations on Artificial Intelligence for Automated Cervical Cancer Detection. WIREs Data Min. Knowl. Discov. 2024, 14, e1550. [Google Scholar] [CrossRef]
  9. Soker, A.; Brozgol, E.; Barshack, I.; Garini, Y. Advancing Automated Digital Pathology by Rapid Spectral Imaging and AI for Nuclear Segmentation. Opt. Laser Technol. 2025, 181, 111988. [Google Scholar] [CrossRef]
  10. Wahid, A.; Mahmood, T.; Hong, J.S.; Kim, S.G.; Ullah, N.; Akram, R.; Park, K.R. Multi-Path Residual Attention Network for Cancer Diagnosis Robust to a Small Number of Training Data of Microscopic Hyperspectral Pathological Images. Eng. Appl. Artif. Intell. 2024, 133, 108288. [Google Scholar] [CrossRef]
  11. Gao, H.; Wang, H.; Chen, L.; Cao, X.; Zhu, M.; Xu, P. Semi-Supervised Segmentation of Hyperspectral Pathological Imagery Based on Shape Priors and Contrastive Learning. Biomed. Signal Process. Control 2024, 91, 105881. [Google Scholar] [CrossRef]
  12. Lanaras, C.; Baltsavias, E.; Schindler, K. Hyperspectral Super-Resolution by Coupled Spectral Unmixing. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 3586–3594. [Google Scholar]
  13. Garini, Y.; Tauber, E. Spectral Imaging: Methods, Design, and Applications. In Biomedical Optical Imaging Technologies: Design and Applications; Liang, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 111–161. ISBN 978-3-642-28391-8. [Google Scholar]
  14. Lindner, M.; Shotan, Z.; Garini, Y. Rapid Microscopy Measurement of Very Large Spectral Images. Opt. Express 2016, 24, 9511. [Google Scholar] [CrossRef]
  15. Bai, B.; Yang, X.; Li, Y.; Zhang, Y.; Pillar, N.; Ozcan, A. Deep Learning-Enabled Virtual Histological Staining of Biological Samples. Light. Sci. Appl. 2023, 12, 57. [Google Scholar] [CrossRef]
  16. Li, J.; Garfinkel, J.; Zhang, X.; Wu, D.; Zhang, Y.; De Haan, K.; Wang, H.; Liu, T.; Bai, B.; Rivenson, Y.; et al. Biopsy-Free in Vivo Virtual Histology of Skin Using Deep Learning. Light. Sci. Appl. 2021, 10, 233. [Google Scholar] [CrossRef]
  17. Fairman, H.S.; Brill, M.H.; Hemmendinger, H. How the CIE 1931 Color-Matching Functions Were Derived from Wright-Guild Data. Color. Res. Appl. 1997, 22, 11–23. [Google Scholar] [CrossRef]
  18. Maćkiewicz, A.; Ratajczak, W. Principal Components Analysis (PCA). Comput. Geosci. 1993, 19, 303–342. [Google Scholar] [CrossRef]
  19. McNeil, C.; Wong, P.F.; Sridhar, N.; Wang, Y.; Santori, C.; Wu, C.-H.; Homyk, A.; Gutierrez, M.; Behrooz, A.; Tiniakos, D.; et al. An End-to-End Platform for Digital Pathology Using Hyperspectral Autofluorescence Microscopy and Deep Learning-Based Virtual Histology. Mod. Pathol. 2024, 37, 100377. [Google Scholar] [CrossRef]
  20. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar] [CrossRef]
  21. Hodson, T.O. Root-Mean-Square Error (RMSE) or Mean Absolute Error (MAE): When to Use Them or Not. Geosci. Model. Dev. 2022, 15, 5481–5487. [Google Scholar] [CrossRef]
  22. Korhonen, J.; You, J. Peak Signal-to-Noise Ratio Revisited: Is Simple Beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, Australia, 5–7 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 37–38. [Google Scholar]
  23. Hassan, M.; Bhagvati, C. Structural Similarity Measure for Color Images. Int. J. Comput. Appl. 2012, 43, 7–12. [Google Scholar] [CrossRef]
  24. Yu, Y.; Zhang, W.; Deng, Y. Frechet Inception Distance (FID) for Evaluating GANs; China University of Mining and Technology Beijing Graduate School: Beijing, China, 2021. [Google Scholar]
  25. Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying MMD GANs. arXiv 2021, arXiv:1801.01401. [Google Scholar]
  26. Brozgol, E.; Kumar, P.; Necula, D.; Bronshtein-Berger, I.; Lindner, M.; Medalion, S.; Twito, L.; Shapira, Y.; Gondra, H.; Barshack, I.; et al. Cancer Detection from Stained Biopsies Using High-Speed Spectral Imaging. Biomed. Opt. Express 2022, 13, 2503. [Google Scholar] [CrossRef] [PubMed]
  27. Garini, Y.; Macville, M.; Du Manoir, S.; Buckwald, R.A.; Lavi, M.; Katzir, N.; Wine, D.; Bar-Am, I.; Schröck, E.; Cabib, D.; et al. Spectral Karyotyping. Bioimaging 1996, 4, 65–72. [Google Scholar] [CrossRef]
  28. Garini, Y.; Young, I.T.; McNamara, G. Spectral Imaging: Principles and Applications. Cytom. Part. J. Int. Soc. Anal. Cytol. 2006, 69, 735–747. [Google Scholar] [CrossRef]
  29. Farris, A.B.; Cohen, C.; Rogers, T.E.; Smith, G.H. Whole Slide Imaging for Analytical Anatomic Pathology and Telepathology: Practical Applications Today, Promises, and Perils. Arch. Pathol. Lab. Med. 2017, 141, 542–550. [Google Scholar] [CrossRef] [PubMed]
  30. Bell, R. Introductory Fourier Transform Spectroscopy; Elsevier: Amsterdam, The Netherlands, 2012. [Google Scholar]
  31. Boyat, A.K.; Joshi, B.K. A Review Paper: Noise Models in Digital Image Processing. Signal Image Process. Int. J. 2015, 6, 63–75. [Google Scholar] [CrossRef]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  33. Chan, T.; Esedoglu, S.; Park, F.; Yip, A. Total Variation Image Restoration: Overview and Recent Developments. In Handbook of Mathematical Models in Computer Vision; Paragios, N., Chen, Y., Faugeras, O., Eds.; Springer US: Boston, MA, USA, 2006; pp. 17–31. ISBN 978-0-387-28831-4. [Google Scholar]
  34. Xu, J.; Li, Z.; Du, B.; Zhang, M.; Liu, J. Reluplex Made More Practical: Leaky ReLU. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; pp. 1–7. [Google Scholar]
  35. Huang, L.; Qin, J.; Zhou, Y.; Zhu, F.; Liu, L.; Shao, L. Normalization Techniques in Training DNNs: Methodology, Analysis and Application. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10173–10196. [Google Scholar] [CrossRef]
  36. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. ICLR 2015, 3, 7–9. [Google Scholar]
  37. Stelzer, E.H.K. Contrast, Resolution, Pixelation, Dynamic Range and Signal-to-Noise Ratio: Fundamental Limits to Resolution in Fluorescence Light Microscopy. J. Microsc. 1998, 189, 15–24. [Google Scholar] [CrossRef]
  38. Young, I.T. Quantitative Microscopy. IEEE Eng. Med. Biol. Mag. 1996, 15, 59–66. [Google Scholar] [CrossRef]
  39. Liu, K.; Lin, S.; Zhu, S.; Chen, Y.; Yin, H.; Li, Z.; Chen, Z. Hyperspectral Microscopy Combined with DAPI Staining for the Identification of Hepatic Carcinoma Cells. Biomed. Opt. Express 2021, 12, 173. [Google Scholar] [CrossRef]
  40. Ortega, S.; Halicek, M.; Fabelo, H.; Camacho, R.; Plaza, M.D.L.L.; Godtliebsen, F.; Callicó, G.M.; Fei, B. Hyperspectral Imaging for the Detection of Glioblastoma Tumor Cells in H&E Slides Using Convolutional Neural Networks. Sensors 2020, 20, 1911. [Google Scholar] [CrossRef] [PubMed]
  41. Souza, M.M.; Carvalho, F.A.; Sverzut, E.F.V.; Requena, M.B.; Garcia, M.R.; Pratavieira, S. Hyperspectral Imaging System for Tissue Classification in H&E-Stained Histological Slides. In Proceedings of the 2021 SBFoton International Optics and Photonics Conference (SBFoton IOPC), Sao Carlos, Brazil, 31 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4. [Google Scholar]
  42. Bayramoglu, N.; Kaakinen, M.; Eklund, L.; Heikkila, J. Towards Virtual H&E Staining of Hyperspectral Lung Histology Images Using Conditional Generative Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 64–71. [Google Scholar]
  43. Zhu, R.; He, H.; Chen, Y.; Yi, M.; Ran, S.; Wang, C.; Wang, Y. Deep Learning for Rapid Virtual H&E Staining of Label-Free Glioma Tissue from Hyperspectral Images. Comput. Biol. Med. 2024, 180, 108958. [Google Scholar] [CrossRef]
Figure 1. Method workflow. (A) Scanning tissue sections with the spectral imaging system, once before the staining process and then again after H&E staining. (B) After dividing the images into smaller patches and registering them in pairs, the images are down-sampled to rRGB. Unstained images are also down-sampled using PCA. (C) The pairs are used to train supervised Pix2Pix GAN models, once with rRGB and once with PCA, to create images of virtual H&E stain.
Figure 2. Example of nuclei and connective tissue spectra from images of an unstained sample (left) and an H&E-stained sample (right). Note that the unstained tissue is highly transparent and reveals very few details. To improve its visibility, we applied a color enhancement that gives the image a bluish, somewhat artificial appearance while emphasizing the spatial details of the tissue. The bottom graphs show the average transmission spectra from the blue and red squares shown in the images. Each image area is 141 × 141 μm², measured at 20× magnification.
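The average spectra in Figure 2 amount to averaging the spectral cube over the pixels of each square region; a minimal sketch (the cube size and ROI coordinates are made up for illustration):

```python
import numpy as np

def roi_mean_spectrum(cube, row, col, size):
    """Mean transmission spectrum over a square ROI of a spectral cube.

    cube: (H, W, B) array of per-pixel spectra; (row, col) is the ROI's
    top-left corner and size its side length in pixels.
    """
    return cube[row:row + size, col:col + size, :].mean(axis=(0, 1))

# Toy cube: 100 x 100 pixels, 40 wavelength bands
rng = np.random.default_rng(1)
cube = rng.random((100, 100, 40))
nuclei_spec = roi_mean_spectrum(cube, 10, 10, 20)   # e.g., the blue square
tissue_spec = roi_mean_spectrum(cube, 60, 60, 20)   # e.g., the red square
print(nuclei_spec.shape)  # (40,)
```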
Figure 3. SI system scheme. The collimated beam emerging from the microscope objective lens passes through the Sagnac interferometer, where an optical path difference is introduced for each beam based on its entrance angle, as illustrated by blue and yellow ray traces. At the exit from the interferometer, a lens focuses the light onto a CMOS camera. As the sample is scanned, an interferogram is recorded at each pixel. These interferograms are then Fourier-transformed to retrieve the corresponding spectra, yielding a full spectrum for each pixel in the image.
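The interferogram-to-spectrum step described in Figure 3 can be sketched with a toy Fourier-transform spectroscopy example; this simplified version ignores apodization and phase correction, and the OPD sampling and spectral lines are invented for illustration:

```python
import numpy as np

# Simulate the interferogram of one pixel for two spectral lines
n = 1000                              # samples along the optical path difference
d = 50e-9                             # OPD step in meters (assumed)
opd = np.arange(n) * d
wavelengths = np.array([500e-9, 625e-9])      # two toy emission lines
wavenumbers = 1.0 / wavelengths               # in 1/m
interferogram = sum(np.cos(2 * np.pi * k * opd) for k in wavenumbers)

# Fourier-transforming the interferogram recovers the spectrum vs wavenumber
spectrum = np.abs(np.fft.rfft(interferogram))
freqs = np.fft.rfftfreq(n, d=d)               # wavenumber axis (1/m)

# The two strongest components match the simulated lines
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(round(w * 1e9) for w in 1 / peaks))  # [500, 625] (nm)
```

The line positions here are chosen to fall exactly on FFT bins; a real system must also calibrate the OPD axis and correct for instrument phase.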
Figure 5. The six metrics for all types of datasets are displayed across three different scatter plots. All dataset methods achieved similar results in the spatial metrics, including L1, RMSE, PSNR, and SSIM, with a small advantage for PCA-10 and PCA-15. However, the evaluation of the results using statistical measures (FID and KID) shows a clear advantage for PCA-5. Arrows in the figure titles indicate the direction of better results.
Figure 6. Reconstruction information of 1-Loss(n) as a function of the number of components. Notably, the first component captures over 50% of the total information, while the first 5 components account for 75% of the information.
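The reconstruction-information curve of Figure 6 is closely related to the cumulative explained-variance ratio of the PCA eigenvalues; the sketch below computes that ratio on toy correlated spectra (the data and the printed checks are illustrative assumptions, not the paper's measurements):

```python
import numpy as np

def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first n components.

    X: (num_pixels, num_bands) matrix of spectra. Returns an array whose
    n-th entry is the variance fraction retained by components 1..n+1.
    """
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]   # descending eigenvalues
    return np.cumsum(eigvals) / eigvals.sum()

# Toy spectra: one dominant spectral shape plus weak band-wise noise
rng = np.random.default_rng(2)
weights = rng.random((5000, 1))                     # per-pixel abundance
X = weights @ rng.random((1, 40)) + 0.1 * rng.random((5000, 40))
curve = cumulative_explained_variance(X)
print(curve[0] > 0.5, curve[4] > 0.75)  # True True
```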
Figure 7. Examples of images reconstructed from unstained tissues using different models. The reconstructions based on full SI and rRGB exhibited higher noise levels and reduced effectiveness in identifying nuclei compared to results obtained using PCA. In addition, the VS method presented a cleaner result than the GT method. Each image is 141 × 141 μm², measured at 20× magnification. The black arrows indicate inaccuracies of the full SI- and rRGB-based models compared to the PCA-5-based model.
Table 1. Summary of the performance of selected methods across six evaluation metrics. The models were trained using the full spectral information, an RGB-only reduction, and various numbers of principal components (5, 10, 15, and 20). Bold values indicate the best performance for each metric. The arrows next to the metric names indicate the direction of better results: a down-pointing arrow means that a smaller value indicates better performance, and vice versa.
Dataset   FID ↓    KID ↓         L1 ↓     RMSE ↓   SSIM ↑   PSNR ↑
Full SI   0.1002   147 · 10⁻³    0.0481   0.0687   0.693    26.56
rRGB      0.1096   155 · 10⁻³    0.0490   0.0696   0.689    26.33
PCA-5     0.0614    80 · 10⁻³    0.0486   0.0690   0.686    26.46
PCA-10    0.0687    90 · 10⁻³    0.0500   0.0714   0.684    26.73
PCA-15    0.1005   149 · 10⁻³    0.0491   0.0699   0.694    26.31
PCA-20    0.1180   140 · 10⁻³    0.0504   0.0717   0.687    26.30
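The spatial metrics in Table 1 can be reproduced with straightforward numpy code; the sketch below uses a simplified single-window SSIM (reported SSIM values are typically computed with the sliding-window form, e.g., skimage's structural_similarity), and the image pair is synthetic:

```python
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def psnr(a, b, data_range=1.0):
    return 20.0 * np.log10(data_range / rmse(a, b))

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM; the sliding-window form is standard in practice."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2))

# Synthetic 'ground truth' vs 'virtual stain' pair differing by mild noise
rng = np.random.default_rng(3)
gt = rng.random((64, 64))
fake = np.clip(gt + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(psnr(gt, fake) > 20, 0 < global_ssim(gt, fake) <= 1)  # True True
```

FID and KID, in contrast, are distribution-level statistics computed over deep-network features of many images and cannot be expressed per image pair this way.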
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Soker, A.; Almagor, M.; Mai, S.; Garini, Y. AI-Powered Spectral Imaging for Virtual Pathology Staining. Bioengineering 2025, 12, 655. https://doi.org/10.3390/bioengineering12060655
