1. Introduction
Histopathological imaging is crucial for clinical diagnosis and biomedical research because it reveals the microscopic structure of tissue. In particular, whole slide imaging (WSI) has brought a digital revolution to traditional histopathology by enabling the storage, sharing, and analysis of high-resolution tissue images at scale. This advancement not only improves diagnostic efficiency but also makes it possible for experts to collaborate remotely, enhancing clinical decision-making and improving patient outcomes [
1]. However, despite these improvements, both traditional and digital histopathology methods still encounter technical and practical challenges.
Digital WSI faces issues with color fidelity, large file sizes that burden storage and transfer, and technical problems such as slide-scanning errors and focus failures, all of which complicate its use in clinical routine. Similarly, traditional microscopy remains limited by difficulties in accurately grading dysplasia, quantifying mitoses, and precisely delineating tumor margins [
2]. Emerging imaging modalities, such as hyperspectral and multispectral imaging, have shown potential for biomarker detection and digital staining; however, their clinical translation is constrained by the lack of standardized acquisition protocols and extensive validation studies [
3].
In addition to logistical benefits, digital microscopy in histological imaging can increase diagnostic accuracy by enabling detailed examination of tissue structure and cellular characteristics. Computational methods and artificial intelligence (AI) in the pathology workflow have further enhanced this process through automated feature extraction, pattern recognition, and improved diagnostic reproducibility. These developments highlight the growing need for interdisciplinary collaboration among pathologists, computer scientists, and clinicians to ensure that newly developed tools align with the practical needs of clinical diagnostics [
1]. However, these computational methods still depend on traditionally stained images, which limits how much biological information they can utilize.
Deep learning specifically has shown considerable impact on digital pathology by automating tasks such as tissue segmentation, enhancing digital staining techniques, and enabling quantitative tissue classification [
4]. Recent studies demonstrate that deep learning algorithms, including convolutional neural networks, achieve diagnostic accuracy comparable to experienced pathologists in tasks like tumor detection and histological grading [
5]. Moreover, integration of deep learning-based digital pathology with genomic and transcriptomic data presents further opportunities for diagnostic precision and personalized medicine approaches [
6].
Despite these developments, hematoxylin and eosin (H&E) staining, the classical technique of traditional histopathology, remains limited in several respects. While it provides useful morphological detail, it cannot reveal the molecular or functional information required for accurate diagnosis. Furthermore, H&E interpretation is subjective and largely dependent on the pathologist’s skill and experience, introducing inherent variability in diagnosis. Additionally, H&E staining provides only a static snapshot of tissue structure at a single point in time. As a result, it can miss important variations between different regions of the tissue and cannot show how biological processes change over time. These limitations are particularly significant in complex diseases such as cancer, where the tissue environment varies considerably. H&E also has limited sensitivity for detecting subtle abnormalities and cannot measure biochemical activity within tissues [
7].
To overcome these challenges, researchers have developed multimodal imaging methods that provide a broader range of biological information. Advanced techniques such as Coherent Anti-Stokes Raman Scattering (CARS) microscopy, Two-Photon Excitation Fluorescence (TPEF) microscopy, and Second Harmonic Generation (SHG) microscopy provide non-destructive visualization of tissue and deliver molecular, structural, and functional information without the need for staining [
8]. These modalities offer subcellular resolution and enable quantitative, high-throughput analysis. When integrated into digital pathology platforms, they enable large-scale automated assessments that address the increasing demands of modern diagnostics. Multimodal imaging thus provides a comprehensive characterization of tissue by combining complementary techniques that capture both morphology and chemical composition.
Despite rapid technological progress, several obstacles remain for the widespread clinical adoption of digital and multimodal imaging in pathology. Key challenges include the availability of high-quality annotated data, variability in imaging protocols, computational and storage demands, and the integration of new tools into existing laboratory workflows [
9]. Multi-center validation and standardization are required to ensure that new imaging techniques provide reliable and reproducible results. Nevertheless, recent research demonstrates that combining deep learning with multi-modal imaging can automate diagnostic processes, reduce pathologist workload, and support clinical decision-making by improving objectivity and reproducibility [
10].
In the context of assessing different imaging modalities, information content is essential for comparison. Information content describes the features that can be extracted from an image and is influenced by characteristics such as texture, intensity variation, and the spatial relationships between pixels [
11]. Quantitative measurement of information content is based on various techniques, including statistical and histogram analysis methods and entropy-based measurements. These techniques provide systematic approaches to describing images [
12].
Comparative analysis of information content across imaging modalities has important clinical and research implications. By identifying the modality that provides the most diagnostic or biological information, clinicians and researchers can make more focused and efficient imaging decisions. These comparisons lead to the development of hybrid imaging protocols that integrate the strengths of various techniques and enhance diagnostic accuracy. In addition, optimizing imaging strategies based on information content can improve data quality, reduce acquisition time, and minimize patient risk. Therefore, a systematic evaluation of information content is essential for advancing imaging technologies and methodologies [
7,
13].
In this study, we systematically compare the information content of histological and multimodal images using various feature extraction techniques and quantitative analysis metrics, in order to identify which modality provides the most information about the tissue.
2. Related Work
Although digital pathology and biomedical image analysis have advanced significantly, the comparative analysis of multimodal imaging methods remains limited. Recent work by Bocklitz et al. [
14] demonstrates how integrating nonlinear microscopy (CARS, TPEF, SHG; LSM 510 Meta, Zeiss, Jena, Germany) with Raman spectroscopy can generate pseudo-H&E images, illustrating the potential for enhanced cancer screening by combining diverse modalities. Similarly, Barker et al. [
15] showed that analyzing quantitative feature metrics across whole-slide images can improve automated tumor classification, highlighting the importance of information content analysis. Despite these advances, most studies focus on a single modality, such as H&E-stained whole-slide images, which limits investigation of the combined diagnostic advantages offered by complementary imaging techniques.
Madabhushi and Bhargava [
16] identified fundamental approaches for integrating molecular image analysis with digital pathology but highlighted the lack of operational frameworks for systematic multimodal comparison. Waqas et al. [
17] presented the potential of multimodal learning that merges the strengths of histology with radiology and omics data, but reported limited benchmarking against unimodal approaches, while Baig et al. [
18] reviewed applications of generative AI in anatomic pathology and observed that multimodal large language models remain underexplored in combined pathology interpretation. Campanella et al. [
5] showed that automated analysis of information content in large, diverse pathology image datasets can achieve clinical-grade diagnostic performance, supporting the practical use of information theory driven metrics in digital pathology. Expanding on these foundations, Li et al. [
19] conducted a comprehensive review of deep learning-based multimodal fusion strategies for medical image classification, categorizing current methods into input, intermediate, and output fusion, and emphasizing the potential of Transformer models for capturing complex information content across modalities. Beyond this underrepresentation of multimodal approaches, a major gap is the absence of quantitative evaluations of information content across modalities. Many existing studies rely on qualitative assessments and lack objective measures of the effectiveness of multimodal integration. Meyer et al. [
20] expanded multimodal microscopy with high spectral resolution CARS, demonstrating enhanced diagnostic capabilities in arterial tissue by combining information from multiple optical channels. Schmitt et al. [
21] further demonstrated the clinical potential of multimodal CARS, TPEF, and SHG microscopy for intraoperative tissue classification, underlining the growing role of integrated nonlinear imaging in clinical workflows.
Rajkumar and Kavitha [
22] demonstrated that entropy and cross-entropy calculations can directly quantify the increase in information content achieved by fusing images from different modalities, providing an objective measure of fusion quality. Razlighi and Kehtarnavaz [
12] proposed spatial entropy as a potential measure for comparing image information, but this remains underexplored in the context of histopathology. Kothari et al. [
23] highlighted the critical importance of extracting and analyzing information metrics at the pixel and object level, demonstrating that quantitative analysis of information content directly supports decision making in whole slide pathology imaging. Krafft et al. [
24] compared CARS and linear Raman imaging in colon tissue, illustrating that combining these modalities enhances chemical mapping and tissue characterization beyond what is possible with a single method. Race et al. [
25] presented a deep learning-based annotation transfer workflow for multimodal mass spectrometry imaging and H&E, which enables automated integration of disparate data sources and improved analysis of tumor heterogeneity.
Several studies have emphasized the importance of information theory in image evaluation. Tsai et al. [
26] showed that radiographic image quality can be directly measured using Shannon entropy and that information content correlates with diagnostic clarity. Kuwil [
27] presented a new feature extraction approach based on the statistical encoding of data distributions, which has the potential to substantially improve classification rates in medical images. Similarly, Al-Thelaya et al. [
1] reviewed the applications of color histograms and discriminative deep learning networks in the analysis of whole-slide histological images and highlighted the value of robust feature extraction techniques. Furthermore, Park et al. [
7] used radiomics-based extraction in neuroimaging, underscoring the importance of quantitative evaluation in multimodal imaging contexts.
Together, these studies highlight two gaps in the literature: the underrepresentation of multimodal approaches in comparative image analysis and the absence of standardized quantitative frameworks for evaluating information content across combined imaging modalities. Although feature extraction and analysis techniques have been developed, few studies have systematically used these methods to test the contribution of individual imaging sources to image features, or how they complement each other in histological interpretation. This emphasizes the need for systematic research that not only compares imaging modalities but also measures the information content they provide. Addressing this gap, our study presents a systematic, metric-based approach for evaluating both H&E and multimodal histological images, offering new insight into the advantages of multimodal integration.
3. Materials and Methods
3.1. Data Collection
This study uses histological and multimodal images originally introduced by Chernavskaia et al. [
28], which were obtained from colon tissue sections of patients with inflammatory bowel disease (IBD). The entire imaging pipeline is presented in
Figure 1, which describes the process of obtaining H&E and multimodal datasets from the same biopsy section.
The multimodal dataset consists of label-free nonlinear multimodal images that were obtained using Coherent Anti-Stokes Raman Scattering (CARS) microscopy, Two-Photon Excitation Fluorescence (TPEF) microscopy, and Second Harmonic Generation (SHG) microscopy. Each modality highlights specific biological structures, as shown in
Figure 2. The same tissue sections were later stained with hematoxylin and eosin (H&E) and imaged with standard brightfield microscopy. This provided spatial correspondence of the multimodal images with standard histopathological views.
A total of 20 image pairs were analyzed, each containing a multimodal image and its corresponding H&E-stained image. Each image pair covered the same anatomical regions of a single tissue section. To ensure a fair and spatially consistent comparison among modalities, each image pair was aligned into a common coordinate system using the Lucas–Kanade optical flow registration method [
29]. The resulting transformation matrices were applied to align the multimodal images with their corresponding H&E-stained images. This alignment ensured a fair comparison between modalities prior to feature extraction and quantitative analysis.
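For illustration, a minimal sketch of such a registration step is given below, using OpenCV; the corner-detection parameters, the affine motion model, and the helper function register_lucas_kanade are assumptions made for this example and do not reproduce the exact registration settings used in this study.

```python
# Illustrative sketch only (not the study's exact pipeline): align a multimodal
# image to its H&E counterpart with Lucas-Kanade optical flow, assuming both
# images are BGR arrays at the same scale, e.g., as loaded by cv2.imread.
import cv2
import numpy as np

def register_lucas_kanade(moving_bgr: np.ndarray, fixed_bgr: np.ndarray,
                          max_corners: int = 500):
    """Estimate an affine transform from LK-tracked corners and warp the moving image."""
    moving = cv2.cvtColor(moving_bgr, cv2.COLOR_BGR2GRAY)
    fixed = cv2.cvtColor(fixed_bgr, cv2.COLOR_BGR2GRAY)

    # Detect corners in the fixed (H&E) image and track them into the moving image.
    pts_fixed = cv2.goodFeaturesToTrack(fixed, maxCorners=max_corners,
                                        qualityLevel=0.01, minDistance=7)
    pts_moving, status, _err = cv2.calcOpticalFlowPyrLK(fixed, moving, pts_fixed, None)

    good_fixed = pts_fixed[status.ravel() == 1]
    good_moving = pts_moving[status.ravel() == 1]

    # Robustly fit an affine transformation mapping moving -> fixed coordinates.
    matrix, _inliers = cv2.estimateAffinePartial2D(good_moving, good_fixed,
                                                   method=cv2.RANSAC)
    h, w = fixed.shape
    aligned = cv2.warpAffine(moving_bgr, matrix, (w, h))
    return aligned, matrix
```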
3.2. Feature Extraction
Feature extraction employed both deep learning- and radiomics-based methods. To capture localized details, each image was divided into 100 non-overlapping patches of the same size. Feature extraction was performed individually on each patch in order to maintain spatial detail.
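A minimal sketch of this patching step is shown below, assuming the image height and width are divisible by the grid size; a 10 × 10 grid yields the 100 equal, non-overlapping patches described above.

```python
# Sketch: split an image into a 10 x 10 grid of equal, non-overlapping patches
# (100 patches in total), assuming the image dimensions are divisible by 10.
import numpy as np

def split_into_patches(image: np.ndarray, grid: int = 10):
    """Return a list of grid*grid non-overlapping patches in row-major order."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    return [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            for i in range(grid) for j in range(grid)]
```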
Pre-trained convolutional neural networks, including VGG16 [
30], ResNet50 [
31], DenseNet121 [
32], InceptionV3 [
33], and MobileNetV2 [
34], were employed to extract deep features. Each network was used with global average pooling and applied to the input patches without additional training, serving as a fixed feature extractor. The resulting features were saved in CSV format for analysis.
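The following sketch illustrates this type of feature extraction with pre-trained Keras backbones in global average pooling mode; the generic preprocessing choice (mode="tf") and the assumption that patches are already resized to the backbone input size are simplifications made for this example rather than the exact code used in the study.

```python
# Minimal sketch of deep feature extraction with pre-trained Keras backbones
# using global average pooling; preprocessing details are illustrative.
import numpy as np
import tensorflow as tf

def build_extractor(name: str = "MobileNetV2") -> tf.keras.Model:
    """Instantiate a frozen ImageNet backbone that outputs pooled feature vectors."""
    backbone_cls = getattr(tf.keras.applications, name)  # e.g. VGG16, ResNet50, ...
    return backbone_cls(weights="imagenet", include_top=False, pooling="avg")

def extract_features(model: tf.keras.Model, patches: np.ndarray) -> np.ndarray:
    """patches: (n, H, W, 3) array resized to the backbone's expected input size."""
    x = tf.keras.applications.imagenet_utils.preprocess_input(
        patches.astype("float32"), mode="tf")
    return model.predict(x, verbose=0)  # shape: (n, n_features)
```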
In addition to deep features, radiomic features were extracted with the PyRadiomics library. These consisted of first-order statistics, texture features, and shape-based descriptors [
35]. Together, the deep learning and radiomics features offered complementary representations of H&E and multimodal images for quantitative comparison.
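A minimal sketch of the radiomics extraction described above is given below, assuming grayscale patches and a whole-patch mask (no lesion segmentation); the extractor settings, enabled feature classes, and the helper function radiomic_features are illustrative assumptions and not the exact configuration used in the study.

```python
# Minimal sketch (not the exact study configuration): extract first-order,
# texture (GLCM), and 2D shape features from a grayscale patch with PyRadiomics,
# using a whole-patch mask because no segmentation is assumed here.
import numpy as np
import SimpleITK as sitk
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('firstorder')
extractor.enableFeatureClassByName('glcm')
extractor.enableFeatureClassByName('shape2D')

def radiomic_features(patch_gray: np.ndarray) -> dict:
    """Compute radiomic features for a single 2D grayscale patch."""
    image = sitk.GetImageFromArray(patch_gray.astype(np.float32))
    mask = sitk.GetImageFromArray(np.ones_like(patch_gray, dtype=np.uint8))
    return extractor.execute(image, mask)
```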
3.3. Quantitative Analysis of Information Content
Various quantitative methods were applied to the extracted features to compare and evaluate the information content of H&E and multimodal images. Principal component analysis (PCA) was used for dimensionality reduction and to assess the distribution of variance. The cumulative proportion of variance explained by the principal components was calculated, and the resulting curve was used to quantify information content with the following metrics:
A key metric was the inverse area under the curve (1-AUC) derived from the cumulative variance plot. A high 1-AUC value indicates that the explained variance is spread across a larger number of components, suggesting a richer and less redundant feature representation.
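For illustration, the 1-AUC metric can be computed from a feature matrix as in the following minimal sketch; rescaling the component index to [0, 1] before integration is an assumption of this example rather than a detail taken from the study.

```python
# Minimal sketch of the 1-AUC metric: one minus the area under the cumulative
# explained-variance curve of a PCA fitted to the feature matrix.
import numpy as np
from sklearn.decomposition import PCA

def inverse_auc(features: np.ndarray) -> float:
    """features: (n_samples, n_features) array of extracted features."""
    pca = PCA().fit(features)
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    x = np.linspace(0.0, 1.0, len(cumvar))   # component index rescaled to [0, 1]
    auc = np.trapz(cumvar, x)                # area under the cumulative curve
    return 1.0 - auc
```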
In addition, an inverse power law curve was fitted to the cumulative variance distribution, with fitting parameters a (the learning rate), b (the decay rate), and c. The fitted curve describes the decay of explained variance across the principal components and offers insight into the efficiency of information representation [36].
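A sketch of such a fit is given below, assuming a generic inverse power law of the form f(x) = a * x^(-b) + c; this functional form, the initial parameter guess, and the helper function fit_power_law are assumptions made for illustration, and the exact formulation used in the study may differ.

```python
# Illustrative sketch only: fit an assumed inverse power law f(x) = a * x**(-b) + c
# to the cumulative explained-variance curve over the principal component index.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(x, a, b, c):
    # a: learning rate, b: decay rate, c: constant offset (assumed roles)
    return a * np.power(x, -b) + c

def fit_power_law(cumulative_variance: np.ndarray) -> dict:
    x = np.arange(1, len(cumulative_variance) + 1, dtype=float)
    params, _ = curve_fit(inverse_power_law, x, cumulative_variance,
                          p0=(-1.0, 1.0, 1.0), maxfev=10000)
    return dict(zip(("a", "b", "c"), params))
```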
The uncertainty and distribution spread of the extracted features were quantified using Shannon entropy; higher entropy values indicate greater complexity and variability in information content. For visualization, radar plots were generated to compare multiple information content metrics across imaging modalities. Each radar plot incorporated Shannon entropy, the inverse area under the curve (1-AUC) of the PCA variance plot, the number of principal components required to explain 95% of the cumulative variance (PC95), and a logarithmic expression of the inverse power law parameters. This multi-metric visualization enables a comparative analysis of how each modality captures and distributes information. Collectively, these metrics provide a multifaceted characterization of the information content, as shown in
Figure 3.
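For illustration, a histogram-based Shannon entropy estimate over a feature matrix could be computed as in the sketch below; the number of bins and the discretization scheme are arbitrary choices for this example rather than the settings used in the study.

```python
# Minimal sketch: histogram-based Shannon entropy (in bits) of a feature matrix.
import numpy as np

def shannon_entropy(features: np.ndarray, bins: int = 256) -> float:
    values = features.ravel()
    hist, _ = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))
```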
3.4. Implementation Details
All analyses were implemented using custom-written functions in Python 3.9. Radiomic features were obtained using the PyRadiomics library, which was installed directly from its official GitHub repository (pip install git+
https://github.com/AIM-Harvard/pyradiomics.git, accessed on 11 December 2024). Deep learning features were extracted using pre-trained convolutional neural networks from TensorFlow Keras, including DenseNet121, MobileNetV2, InceptionV3, ResNet50, and VGG16. All feature extraction and evaluation procedures were carried out on a local machine equipped with an AMD Ryzen Threadripper 3960X 24-core processor (48 threads, 3.79 GHz), 128 GB RAM, and dual NVIDIA GeForce RTX 3090 GPUs with 24 GB of GDDR6 RAM each. The machine was set up with CUDA 12.6.
4. Results
4.1. Comparison of H&E, Multimodal, and Combined Datasets
To evaluate the differences in the information content of the H&E, MM, and combined datasets, six feature extraction methods were employed and evaluated using Shannon entropy, inverse area under the curve (1-AUC), log-transformed inverse power law fit, and the number of principal components required to explain 95% of the variance (PC95). We present a general pipeline of the analysis in
Figure 4, which summarizes the main steps of data collection, feature extraction, and analysis.
The combined dataset demonstrated the highest entropy across all feature extraction methods, indicating greater complexity and variability in the extracted features. For example, entropy values for the combined dataset were 0.5131 (DenseNet121), 0.5385 (InceptionV3), and 0.5740 (MobileNetV2), compared to 0.4482, 0.5001, and 0.5310 for H&E, respectively.
Similarly, 1-AUC values were higher for the combined dataset across all feature extraction methods. As shown in
Figure 5, the cumulative explained variance rises more slowly for the combined dataset, indicating that the variance is spread across a larger number of principal components and reflecting a richer information content representation. For example, MobileNetV2 yielded 1-AUC values of 0.0617 for H&E, 0.0880 for multimodal, and 0.1053 for the combined dataset. The other feature extraction methods also showed higher values for the combined dataset.
The combined dataset required a larger number of components to explain 95% of the cumulative variance (PC95) than the individual modalities. For example, the PC95 of the combined dataset reached 62 with MobileNetV2, compared to 33 and 47 for H&E and multimodal, respectively.
The logarithmic formulation of the inverse power law was higher (i.e., least negative) for the combined dataset across all methods. This indicates that the explained variance accumulates more slowly and, thus, that the information content is higher. For example, the values for DenseNet121, InceptionV3, and MobileNetV2 on the combined dataset were −0.7670, −0.6001, and −0.8534, respectively, compared to more negative values for the individual H&E and multimodal datasets. The corresponding PCA variance plots with the fitted inverse power law functions for each feature extraction method are shown in
Figure 6. The PC95 values and fitting parameters a, b, and c are also provided for the H&E, MM, and combined datasets.
The radar plots summarizing entropy, 1-AUC, PC95, and the log-inverse power law parameters visually confirmed that the combined dataset consistently encloses a greater area, indicating higher information content, as presented in
Figure 7.
Overall, the results demonstrate that combining H&E and multimodal image analysis enhances the descriptive features extracted across all evaluated measures and methods of feature extraction.
4.2. Comparison of Individual Channels in the Multimodal Image
To evaluate the contribution of the individual channels of multimodal images, the red (CARS microscopy @ 2930 cm−1), green (TPEF microscopy @ 503–548 nm), and blue (SHG microscopy) images (LSM 510 Meta, Zeiss, Jena, Germany) were analyzed separately as well as in their RGB combination. Six feature extraction methods were applied, followed by the calculation of Shannon entropy, the logarithmic form of the inverse power law fit, the number of principal components that account for 95% of the variance (PC95), and 1-AUC (see
Supplementary Figures S1 and S2). This comparison aims to determine whether the combination of color channels enhances the information content of multimodal images.
Across all feature extraction methods, the combined RGB configuration had the highest entropy, indicating higher information content in the extracted features. For example, entropy values for the combined images reached 0.5040 with MobileNetV2, 0.4867 with ResNet50, and 0.5014 with PyRadiomics, compared with the individual channels. Among the individual images, the TPEF microscopy @ 503–548 nm (Green 1) and CARS microscopy @ 2930 cm−1 (Red 1) images tended to have greater entropy values than the SHG microscopy (Blue) images.
The outcomes of the inverse power law fit highlighted differences across channels, with combined RGB images generally showing higher information content. Regarding PC95, the combined dataset required a higher number of principal components to achieve 95% cumulative variance, indicating that a broad range of features contributed to the overall representation. For instance, MobileNetV2 needed 76 components for the combined data, compared with 24, 26, and 29 for single modalities.
All measurement metrics are summarized visually in
Figure 8 with radar plots. These plots show that the combined RGB configuration covered the largest area across all feature extraction methods, demonstrating superior information content. The CARS microscopy @ 2930 cm−1 (Red 1) and TPEF microscopy @ 503–548 nm (Green 1) images performed similarly, while the SHG microscopy (Blue) images consistently had the lowest values.
Collectively, these findings indicate that combining the individual modalities significantly enhances the information content of feature representations in multimodal images.
4.3. Comparison of CARS Microscopy Channels at 2930 cm−1, 2850 cm−1, and Their Combination
In order to compare the spectral variation in CARS microscopy, the CARS microscopy @ 2930 cm−1 (Red 1) and CARS microscopy @ 2850 cm−1 (Red 2) channels were analyzed individually and in combination. Feature extraction was conducted using six different techniques. Each channel configuration was evaluated using four criteria: Shannon entropy, 1-AUC, PC95, and the logarithmic expression of the inverse power law (see
Supplementary Figures S3 and S4). A summary of these metrics is shown in
Figure 9.
Although the CARS microscopy @ 2930 cm−1 (Red 1) and CARS microscopy @ 2850 cm−1 (Red 2) channels each contain valuable information, their combination yields greater entropy, PC95, and 1-AUC values in most cases. These findings suggest that incorporating multiple CARS spectral components enhances the overall information content extracted from the tissue.
5. Discussion
The information content of multimodal images and histological tissue images was evaluated quantitatively using various feature extraction methods and information content-based analysis metrics. Three comparative analyses were performed: (1) H&E vs. multimodal vs. combined datasets, (2) individual multimodal channels vs. their combination, and (3) CARS microscopy spectral components @ 2930 cm−1 and @ 2850 cm−1 vs. their combination. In all cases, the combined datasets outperformed the individual modalities with higher entropy, 1-AUC values, PC95, and less negative inverse power law logarithmic formulation values.
In the H&E vs. multimodal image comparison, the combined dataset showed higher information content than the individual modalities. This pattern persisted across all six feature extraction methods, with the most pronounced differences observed with MobileNetV2 and InceptionV3. In the second comparison, the combined dataset again demonstrated the greatest information content, with the CARS microscopy @ 2930 cm−1 (Red 1) and TPEF microscopy @ 503–548 nm (Green 1) images generally superior to the SHG microscopy (Blue) images. Finally, in the comparison of CARS microscopy @ 2930 cm−1 (Red 1) vs. CARS microscopy @ 2850 cm−1 (Red 2), the combined CARS channels produced the best results across all metrics. This confirms that combining spectral domains across different CARS wavenumbers provides complementary chemical and structural information.
These observations support the idea that combining the modalities enhances the quality of the extracted feature representations. The consistency of the results across evaluation metrics and feature types suggests that a quantitative evaluation of information content is a reliable framework for comparing modalities and for developing more informative image-based diagnostic systems.
In conclusion, this paper quantitatively demonstrated that combining imaging modalities increases information content, as evidenced by consistently higher Shannon entropy, 1-AUC values, and PC95 across all tested cases: H&E vs. multimodal vs. combined datasets, individual multimodal channels vs. their RGB combination, and individual CARS microscopy channels vs. their combination. These objective criteria provide a reproducible framework for comparing imaging modalities using quantitative measures and support the added value of multimodal approaches in biomedical image analysis. These findings have implications for both research and clinical practice, where improving the accuracy and interpretability of image analysis remains essential.
While these results are promising, there are limitations to note. The evaluation was performed on a selected dataset of colonic tissue sections acquired using a specific imaging setup. This may affect the generalizability of the results to other tissue types, disease conditions or imaging environments. Nevertheless, the consistent trends observed across all feature extraction methods and measurement criteria suggest that the proposed approach is robust and could be applicable in a broader context. Further validation with diverse datasets is necessary to establish its adaptability and reliability.
Looking ahead, future research could extend this work in several directions. A key next step is the extension of the evaluation framework to other tissue types and diseases to determine its generalizability across a broad range of biomedical contexts. The inclusion of additional imaging modalities, for example immunohistochemistry, may further reveal the advantages of multimodal fusion.
In addition, connecting the evaluated information content with practical tasks such as classification, segmentation, or registration would help clarify how differences in information content influence diagnostic or analytical outcomes. Lastly, developing standardized tools or open-source libraries for automated analysis of image information content would facilitate wider application of this methodology across digital pathology and biomedical image research.
6. Conclusions
This study presents a robust quantitative framework for evaluating information content in histological and multimodal tissue images using multiple feature extraction methods and established analysis metrics. Comparisons among H&E, multimodal, and combined imaging modalities consistently demonstrate that the combined datasets capture more information content, as measured by entropy, 1-AUC, PC95, and the inverse power law parameters. The higher information content of the combined datasets captures more of the biological detail in tissue images, which can improve diagnostic accuracy and pathology interpretation.
Our results emphasize the practical advantages of integrating various imaging channels. By applying complementary information content metrics and extraction pipelines, we discovered that combining multiple imaging modalities significantly increases the diversity and complexity of the features extracted from whole images and individual channels. Analysis of CARS microscopy, TPEF microscopy and SHG microscopy channels revealed that each modality offers distinct insights and their combination enhances tissue characterization more effectively than using any single modality. The improvements observed in our combined datasets reflect a general enhancement across all information content metrics, suggesting the broad applicability of our framework for research and routine clinical decision-making.
As digital pathology advances, unified, reproducible, and transparent evaluation strategies will become increasingly important for clinical workflows. Our framework offers a reproducible approach to evaluating how different imaging modalities contribute to feature representation in histological image analysis. Bridging the gap between quantitative analysis and clinical use requires accessible software and standardized workflows that can be integrated into digital pathology systems and medical records for routine adoption by multidisciplinary care teams. This approach could support the development of more effective imaging strategies and inform future computational pathology efforts by guiding modality selection and feature evaluation. Future work should focus on integrating this quantitative evaluation with clinical outcome data in order to assess diagnostic and prognostic improvements directly. Additionally, the development of user-friendly tools for automated analysis of information content will facilitate integration into research and clinical workflows, accelerating the translation of multimodal imaging into practical benefits.
Our study is based on a single, well-characterized colonic tissue dataset that uses a specific imaging and processing pipeline. This may limit the generalizability of our findings. Future work should validate these findings across different tissue types, disease states, technical platforms, and larger, more varied cohorts. Additionally, future work should investigate how quantitative metrics relate to clinical endpoints, such as diagnostic accuracy, prognostic value, and therapeutic decision support.
In summary, our findings support the systematic quantification of information content as a foundation for integrating new imaging modalities and analytic techniques in digital pathology. We anticipate that the continued refinement and validation of our framework, combined with efforts to relate imaging information to patient outcomes, will help advance precision diagnostics and improve clinical care.