Interpreting Performance of Deep Neural Networks with Partial Information Decomposition

Tianyue Liu; Binghui Guo; Ziqiao Yin; Zhilong Mi; Donghui Jin

doi:10.3390/e28010050

,

and

¹

School of Artificial Intelligence, Beihang University, Beijing 100191, China

²

Zhongguancun Laboratory, Beijing 100194, China

³

LMIB and SKLCCSE, Beihang University, Beijing 100191, China

⁴

Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beijing 100083, China

Entropy2026, 28(1), 50;https://doi.org/10.3390/e28010050
(registering DOI)

Version Notes

Order Reprints

Abstract

Robustness to distributional shifts remains a critical limitation for deploying deep neural networks (DNNs) in real-world applications. While DNNs excel in standard benchmarks, their performance often deteriorates under unseen or perturbed conditions. Understanding how internal information representations relate to such robustness remains underexplored. In this work, we propose an interpretable framework for robustness assessment based on partial information decomposition (PID), which quantifies how neurons redundantly, uniquely, or synergistically encode task-relevant information. Analysis of PID measures computed from clean inputs reveals that models characterized by higher redundancy rates and lower synergy rates tend to maintain more stable performance under various natural corruptions. Additionally, a higher rate of unique information is positively associated with improved classification accuracy on the data from which the measure is computed. These findings provide new insights for understanding and comparing model behavior through internal information analysis, and highlight the feasibility of lightweight robustness assessment without requiring extensive access to corrupted data.

Keywords:

partial information decomposition; model interpretation; corruption robustness

1. Introduction

In recent years, deep neural networks (DNNs) have achieved remarkable success in various domains such as image recognition [1], recommendation systems [2], and natural language processing [3]. However, their performance heavily relies on the distribution of training data [4], making them vulnerable to distributional shifts between training and test domains [5]. This issue is particularly critical in autonomous driving, where complex and dynamic environments introduce natural corruptions such as illumination changes, adverse weather, and blur or defocus of images, all of which can severely impair visual perception models [6]. Such degradations not only reduce model accuracy but may also lead to serious safety failures, making corruption robustness a key prerequisite for deploying DNNs in autonomous driving systems. Empirical evidence shows that mainstream DNNs often experience substantial accuracy degradation when exposed to such corruptions [7]. Although augmenting training data with corrupted samples may improve robustness, it often leads to overfitting to seen corruption types, limiting generalization to unseen degradation patterns [8]. These challenges underscore the need for reliable corruption robustness evaluation and a deeper understanding of its structural foundations. Mainstream corruption robustness evaluation methods typically assess model performance degradation under naturally corrupted inputs using benchmark datasets [9], or quantify sensitivity through output fluctuations and confidence shifts [10]. However, these approaches often require large volumes of corrupted data and are computationally expensive. More importantly, being based on posterior observations, they offer limited insight into the intrinsic relationship between a model’s internal information structure and its robustness.

Information theoretic approaches have increasingly informed the theoretical understanding of DNNs [11]. Researchers have sought to characterize how information is transmitted and transformed across neural layers using concepts such as mutual information and entropy, aiming to uncover their intrinsic links to generalization, representational efficiency, and robustness. The Information Bottleneck (IB) theory [12] provided an early perspective by framing deep learning as a trade-off between input compression and task-relevant retention, with subsequent studies linking compression dynamics to generalization [13,14].

Building on this, partial information decomposition [15] (PID) has emerged as a refined multivariate framework that decomposes mutual information into redundant, synergistic, and unique components, enabling deeper insights into how neurons share, integrate, and specialize information. PID has been applied to both biological and artificial neural systems. In neuroscience, Luppi et al. [16] used PID to map the brain’s informational architecture, while Varley et al. [17] employed partial entropy decomposition to uncover higher-order structures in human activity. In artificial systems, Wollstadt et al. [18] defined redundancy and relevancy for feature selection, Wibral et al. [19] proposed PID as a unified framework for neural goal functions, and Dewan et al. [20] applied Diffusion PID to interpret generative models. Together, these studies highlight PID’s role in elucidating information dynamics, refining feature selection, and advancing model interpretability. Recent studies have begun to investigate the functional roles of the components defined by PID. Proca et al. [21] applied PID to simple neural networks across supervised and reinforcement learning tasks, finding that synergistic information supports multimodal integration and multitask learning, while redundancy correlates with robustness, highlighting the role of internal information dynamics in general learning ability. Moreover, Tax et al. [22] used PID to analyze hidden neuron representations in Boltzmann machines, revealing a staged learning pattern from redundancy to uniqueness. While these findings provide empirical support for the functional significance of PID components, they are mostly derived from small-scale networks and have not yet been extended to more complex architectures or realistic tasks.

Inspired by the aforementioned studies, this work investigates how the structure of neuronal information interactions relates to model performance, with particular emphasis on robustness under natural corruptions. We introduce a robustness assessment approach based on PID measures, aiming to estimate a model’s stability across corruption scenarios solely from the information structure revealed by its neural activations on clean images. Just as network depth is widely seen as a structural indicator of model expressiveness [23], we explore whether information structure can likewise serve as a prior for robustness, enabling efficient model assessment and comparison. Our main contributions are: (1) we reveal how redundant, synergistic, and unique information components differentially account for performance variation under corruption; (2) through empirical analysis on two benchmark corruption datasets and six mainstream architectures, we validate the potential of inferring robustness using PID metrics computed from clean samples, providing theoretical insights into model assessment and design.

2. Partial Information Decomposition

In information theory, Mutual Information (MI) [24] is a fundamental measure used to quantify the dependence between two random variables. For two discrete random variables X and Y, mutual information is defined as follows:

I (X; Y) = \sum_{x \in X} \sum_{y \in Y} p (x, y) log \frac{p (x, y)}{p (x) p (y)},

(1)

where

p (x, y)

denotes the joint probability distribution of X and Y and

p (x)

,

p (y)

are the marginal distributions. MI can also be equivalently expressed in terms of entropy:

I (X; Y) = H (Y) - H (Y ∣ X),

(2)

here,

H (Y)

represents the Shannon entropy of Y, quantifying the information content of the variable:

H (Y) = - \sum_{y \in Y} p (y) log p (y) .

(3)

The term

H (Y | X)

, known as conditional entropy, measures the remaining uncertainty in Y given knowledge of X:

H (Y | X) = - \sum_{x \in X} \sum_{y \in Y} p (x, y) log p (y | x) .

(4)

MI thus measures the reduction in uncertainty about Y given knowledge of X, i.e., how much information X provides about Y. However, in multivariate settings, MI cannot distinguish between different types of contribution from multiple sources, for example, whether they offer redundant, unique, or synergistic information about the target variable. To address this limitation, PID provides a framework for decomposing mutual information into interpretable atomic components.

PID, introduced by Williams and Beer [15], extends Shannon’s framework to analyze how multiple source variables jointly contribute to a target variable. Given a set of sources

X = {X_{1}, X_{2}, \dots, X_{n}}

and a target Y, PID decomposes the total mutual information

I (X; Y) = H (Y) - H (Y ∣ X)

into the following fundamental components:

Redundant Information: Information that is shared by multiple source variables—i.e., the same information about Y is provided by more than one source.

Unique Information: Information that is exclusively provided by a single source variable and not available from any other sources.

Synergistic Information: Information that can only be obtained through the joint consideration of multiple source variables, which cannot be accessed from any individual source alone.

To illustrate the decomposition more intuitively, consider a simplified case with two source variables

X_{1}

and

X_{2}

, and a single target variable Y. The mutual information between the joint sources and the target can be decomposed into a sum of partial information atoms as shown in Equation (5),

I (X_{i}; Y)

denotes the mutual information between a single source

X_{i}

and the target Y, and

I (X_{1}, X_{2}; Y)

denotes the mutual information between the joint sources

(X_{1}, X_{2})

and the target Y.

R (X_{1}, X_{2}; Y)

represents the redundant information about Y shared by

X_{1}

and

X_{2}

,

U (X_{i}; Y)

represents the information uniquely provided by

X_{i}

, and

S (X_{1}, X_{2}; Y)

represents the synergistic information about Y that is only provided jointly by

X_{1}

and

X_{2}

. The relationships among these components can be intuitively illustrated using a Venn diagram, as depicted in Figure 1.

Figure 1. Partial information diagram.

\{\begin{matrix} I (X_{1}, X_{2}; Y) & = R (X_{1}, X_{2}; Y) + U (X_{1}; Y) + U (X_{2}; Y) + S (X_{1}, X_{2}; Y) \\ I (X_{1}; Y) & = R (X_{1}, X_{2}; Y) + U (X_{1}; Y) \\ I (X_{2}; Y) & = R (X_{1}, X_{2}; Y) + U (X_{2}; Y) \end{matrix}

(5)

This results in an underdetermined system of three equations with four unknowns. Although PID provides a conceptual framework to differentiate redundant, unique, and synergistic information, it does not prescribe a specific method for computing these quantities. Currently, there is no universally accepted redundancy function, and alternative formulations characterize distinct facets of multivariate information. Two commonly used redundancy functions are

I_{min}

, which was originally proposed by [15], and

I_{M M I}

[25]. As demonstrated in the study by [21], these two measures exhibit consistent behavior in various experimental settings. Therefore, for computational tractability, we adopt the

I_{min}

measure to compute the redundancy function, that is,

R (X_{1}, X_{2}; Y) = min \{I (X_{1}; Y), I (X_{2}; Y)\} .

(6)

3. Methods

3.1. Datasets and Models

This study utilizes the CIFAR10 and ImageNet datasets for experimental purposes. CIFAR10 consists of 10 object classes with images sized at 32 × 32 × 3. ImageNet comprises a larger scale image collection, for simplicity, 100 classes are selected from the 1000 categories available in the ILSVRC-2012 subset. Since the purpose of this work is to compare models’ classification performance and PID measures, the number of ImageNet classes used does not affect the relative behaviors of the models, and the conclusions remain generalizable.

To evaluate model performance under noisy and degraded conditions, ImageNet-C and CIFAR10-C corruption benchmark datasets are used, as introduced in [26]. These datasets simulate common types of real-world degradation, including four major corruption categories: noise, blur, weather, and digital distortions. Two specific corruption types are selected from each category, including Shot Noise, Gaussian Noise, Motion Blur, Defocus Blur, Snow, Fog, Pixelate and Contrast. Detailed descriptions of these corruption types are available in [26]. Each type of corruption contains five severity levels, denoted as s. Since images at

s = 5

are heavily degraded and typically render models ineffective, only levels 1 to 4 are included in this work. Uncorrupted (clean) images from the original ImageNet and CIFAR10 datasets are treated as

s = 0

.

Six convolutional neural networks with diverse architectures are selected for analysis of information interaction characteristics: MobileNet V2 [27], AlexNet [28], VGG 16 [29], Inception V3 [30], ResNet 50 [31], and DenseNet 121 [32]. All models are initialized using pretrained weights provided in PyTorch 2.9.1 and are not fine-tuned on any corrupted data. Each model is evaluated in all types and severity levels, its performance and PID characteristics are analyzed accordingly. All experiments are conducted on a multi-core CPU platform equipped with dual Intel Xeon Gold-class processors. Under this setup, the PID analysis for a single model and dataset takes close to two hours.

3.2. Corruption Robustness

To evaluate model stability under image corruptions, this study adopts the corruption robustness definition proposed by [26], which is formally expressed as follows:

E_{c \in C} [\underset{(x, y) \sim D}{Pr} (f (c (x)) = y)],

(7)

where

C

denotes the set of corruption functions,

D

represents the distribution of clean image data, and f is the image classification model. Unlike adversarial robustness, which focuses on performance in the worst case, this formulation emphasizes performance on average under natural image degradations. To comprehensively assess model robustness, three evaluation metrics are employed.

Mean corruption accuracy ( $mCA$ ): Let

A_{0}^{f}

represent the model classification accuracy in clean images. For each type of corruption c and severity level s, the accuracy of the model is indicated as

A_{s, c}^{f}

. The specific accuracy of corruption is defined as the average over severity levels:

{C A}_{c}^{f} = 1 / 4 \sum_{s = 1}^{4} A_{s, c}^{f}

. Averaging across all selected corruption types yields the mean corruption accuracy:

m C A = \frac{1}{| C |} \sum_{c \in C} (\frac{1}{4} \sum_{s = 1}^{4} A_{s, c}^{f}) .

(8)

Accuracy standard deviation ( $σ_{Acc}$ ): it quantifies performance variability across clean and corrupted data and is defined as follows:

σ_{A c c} = std (\{A_{0}^{f}\} \cup \{A_{s, c}^{f} ∣ s = 1, \dots, 4; c \in C\}),

(9)

specifically,

σ_{A c c}

is computed as the standard deviation over the classification accuracy on clean data (

A_{0}^{f}

) together with the accuracies obtained under all considered corruption types

c \in C

and severity levels

s = 1, \dots, 4

, thereby measuring the dispersion of model performance across different evaluation conditions.

Relative accuracy degradation ( $Δ Acc$ ): it measures the performance drop under corruption relative to clean data:

Δ A c c = \frac{m C A - A_{0}^{f}}{A_{0}^{f}} .

(10)

3.3. The Computation of PID

To investigate the relationship between the interaction of neuronal information and model performance, PID is performed on neuron activations obtained from the test datasets across various models. In information theoretic terms, the sources in PID can refer to individual neurons, specific input dimensions, or combinations of features treated as independent random variables. In this study, the sources are defined as the set of neurons within a given network layer. Specifically, we select neurons from the last pooling layer of each model, as this layer captures high level semantic representations that are critical to final decision-making [33]. The target variable is defined as the label of the ground truth class that corresponds to each input. During testing, sampled neuron activations are used to estimate the probability distributions of network activity, which are subsequently employed to quantify redundant, unique, and synergistic information shared between neuron pairs with respect to the classification target. This analysis enables a comparative analysis of the information interaction patterns across different models.

The PID framework is applicable to any number of source variables K, where K can represent the full set of neurons in a layer (full-order computation, denoted as PID-K) or be restricted to

K = 2

for second-order analysis (PID-2). In PID-2, information decomposition is performed only for neuron pairs, and the average of these pairwise decompositions is taken as the final metric. As the number of neurons increases, full-order decomposition over all neurons becomes computationally intractable. In addition, evaluating PID-2 over all neuron pairs can be computationally expensive for large layers. To address this, a uniform sampling strategy can be applied to efficiently approximate the second-order measurements. Moreover, prior work [21] has shown that second-order and full-order PID measures exhibit consistent qualitative behavior across different tasks. Based on these theoretical foundations, this study employs second-order PID analysis with uniform sampling, balancing computational feasibility and analytical reliability.

To analyze the information interaction among neurons within an information theoretic framework, following the approach in [21], the continuous activation values of neurons are discretized using a binning strategy that divides the range into 10 intervals of equal width, thereby transforming the continuous distributions into discrete probability distributions. 10 bins are used to ensure sufficient samples in each source–target pair for reliable estimation. Using the resulting discrete representations, the mutual information between the activation of a single neuron

X_{i}

and the class label Y, denoted as

I (X_{i}; Y)

, is computed. Likewise, for each sampled neuron pair

(X_{i}, X_{j})

, the joint mutual information

I (X_{i}, X_{j}; Y)

is calculated.

With these mutual information values, partial information decomposition is applied according to Equations (5) and (6) to decompose the joint information into redundant information

R (X_{i}, X_{j}; Y)

, synergistic information

S (X_{i}, X_{j}; Y)

, and unique information

U_{1} (X_{i}; Y)

and

U_{2} (X_{j}; Y)

. This decomposition enables the quantification of both individual and joint neuronal contributions to the classification task, thereby establishing a basis for analyzing the information processing mechanisms of different models. To ensure comparability across models and corruption types, we normalize the decomposed information by the total mutual information, yielding the relative measures of redundancy rate (RR), synergy rate (SR), and uniqueness rate (UR).

4. Results

4.1. Preliminary Observations on the Connection Between Neuronal PID Structure and Model Robustness

Each model is evaluated on the clean test set and eight corrupted test sets, with both prediction outputs and neuronal activations recorded. Partial information decomposition is subsequently applied to neuron pairs using clean image samples, in order to investigate potential relationships between internal information processing mechanisms and model behavior under corruption.

Figure 2a presents the classification accuracy of each model on the ImageNet-C and CIFAR10-C datasets across eight corruption types and five severity levels (

0 \leq s \leq 4

), together with the corresponding accuracy standard deviation

σ_{A c c}

, where a smaller

σ_{A c c}

indicates less variation in accuracy across corruption types and severity levels. Figure 2b illustrates the PID results on clean images, including RR, SR, and UR. A comparison of the two subfigures indicates that models with smaller

σ_{A c c}

tend to exhibit higher RR and lower SR values, whereas UR does not show a consistent trend. These observations provide preliminary evidence that the internal information structure of neurons may be related to model robustness under corruption.

Figure 2. (a) Comparison of experimental results for different models; (b) PID measures on clean images (

s = 0

). Models with smaller performance variability across corruption conditions tend to exhibit higher RR and lower SR values, whereas UR shows no consistent trend.

4.2. Rank Correlation Analysis Between PID Measures and Robustness

To further assess the statistical validity of the trends observed in the previous section, a non-parametric rank correlation analysis is conducted to quantify the relationship between PID measures and model robustness. Unlike correlation metrics that rely on linear assumptions, Spearman’s

ρ

and Kendall’s

τ

are better suited for for capturing monotonic relationships, particularly when dealing with nonlinearity and ordinal consistency. Accordingly, this study applies both

ρ

and

τ

to systematically assess the rank correlation between three PID measures (RR, SR, UR) and the three robustness measures (

m C A

,

σ_{A C C}

,

Δ A c c

). This analysis aims to explore the potential of these metrics as predictors for model robustness.

The rank correlation results are presented in Figure 3. They reveal a consistent trend between the information decomposition metrics measured on images that

s = 0

and the robustness of models under corrupted conditions. Notably, RR exhibits a strong negative correlation with

σ_{A C C}

, with both Spearman’s

ρ

and Kendall’s

τ

reaching −1 and p-values below 0.01 across both datasets. This result indicates that models with higher redundancy tend to show smaller performance fluctuations across various corruption types and severities. This phenomenon can be attributed to the fault-tolerant properties of redundancy in neural networks. In scenarios where multiple neurons transmit overlapping information, the network remains stable even if part of the neurons is disrupted by noise or distortion, as the unaffected neurons compensate for the loss, thereby maintaining the overall stability of the system [34]. Similar conclusions were drawn by Barret et al. [25], who demonstrated that redundancy enhances a model’s resistance to external perturbations and ensures that critical information is robustly transmitted under varying input conditions.

Figure 3. Rank correlations between PID measures on clean images (

s = 0

) and robustness indicators on corrupted images. (a,b) show results on the ImageNet-C, (c,d) correspond to the CIFAR10-C. Significance levels are denoted by asterisks: * p < 0.05, ** p < 0.01, and *** p < 0.001.

SR exhibits a negative rank correlation with model robustness across both datasets. Specifically, models with higher SR tend to show greater fluctuations in accuracy under different types and severities of corruption, while the strength and statistical significance of this correlation vary between datasets, suggesting a less consistent association compared to RR. This observation can be explained by findings from [21], which suggest that neurons relying heavily on synergistic representations are more sensitive to input perturbations. Because synergistic information emerges through the combined activity of multiple sources, disruption to any one of these sources can compromise the integrity of the shared information. Consequently, the failure of such neurons to maintain stable information integration undermines the decision-making performance of model.

The differences in RR and synergy rate SR across models can be traced to their architectural design, reflecting how structural mechanisms shape internal information dynamics. ResNet50 and DenseNet121 consistently exhibit higher RR and lower SR, which can be attributed to their skip connections and dense feature reuse. These design choices promote overlapping information pathways, thereby enhancing redundancy and reducing reliance on fragile synergistic interactions. In contrast, AlexNet and MobileNetV2 show lower RR and higher SR, indicating that their relatively shallow or lightweight convolutional structures depend more on joint feature integration. Such reliance on synergy makes them more sensitive to corruption-induced perturbations, as disruption of any source neuron can compromise the shared representation. InceptionV3 demonstrates moderate RR but the lowest SR, suggesting that its multi-branch architecture fosters representational diversity while minimizing dependence on synergistic encoding. Finally, VGG16 lies in the middle range, with balanced RR and SR values, consistent with its deep yet sequential convolutional design that neither strongly emphasizes redundancy nor synergy. These observations highlight that redundancy and synergy rates reflect architectural design choices. Models with mechanisms that encourage overlapping information transmission tend to achieve greater robustness, whereas architectures that rely heavily on synergistic integration are more vulnerable to corruption.

UR shows a strong positive correlation with

m C A

, although the statistical significance is weaker in the CIFAR10-C dataset. Its rank correlation with

Δ A c c

is statistically significant on ImageNet-C but not on CIFAR10-C, and its correlations with

σ_{A c c}

are not statistically significant on both datasets; therefore, the association between UR and robustness indicators is not consistent across datasets. This suggests that UR may be more involved in task-specific feature representation while playing a relatively limited role in general robustness mechanism.

In summary, these findings demonstrate that the PID measures of neurons, particularly redundancy and synergy, capture structural differences in how models respond to corrupted inputs. These results provide both theoretical support and empirical evidence for developing model evaluation approaches grounded in internal information structure.

4.3. Connection Between Unique Information and Classification Performance

Building upon the established association between neuronal information structure and model robustness, this section investigates how information interaction patterns relate to classification performance under varying corruption conditions. To this end, we evaluate model performance and compute PID measures on clean images (

s = 0

) and mildly corrupted images (

s = 1

) across eight corruption types in both ImageNet-C and CIFAR10-C. The results are illustrated in Figure 4a,b, where the x-axis represents corruption types and different markers denote distinct models. The results reveal that models achieving higher accuracy typically exhibit higher UR values, suggesting that unique information may reflect a model’s ability to extract discriminative features.

Figure 4. Classification accuracy and PID measures of six models on (a) CIFAR10-C (

s = 0

and

s = 1

), (b) ImageNet-C (

s = 0

and

s = 1

), (c) ImageNet-C (

s = 3

), and (d) ImageNet-C (

s = 4

). Models with higher accuracy generally exhibit higher UR under clean and mild corruption, whereas at severe corruption levels accuracy declines sharply and the correlation with UR becomes unclear.

However, this trend weakens as the severity of corruption increases. Figure 4c,d presents the results for

s = 3

and

s = 4

on ImageNet-C, where the classification performance of all models declines significantly, and the differences in UR become increasingly disordered. At these higher corruption levels, accuracy approaches random behavior, as severe distortions overwhelm the useful signal, which obscures the correlation with UR and makes the trend less clear. This observation implies that UR may serve as a meaningful structural indicator of model capacity under high-quality input conditions, whereas under severe corruption, models may rely more on redundant mechanisms or other robustness strategies to maintain performance.

To further examine the discriminative capacity of unique information in classification tasks, UR were compared between correctly and incorrectly classified samples. As depicted in Figure 5, for corruption levels ranging from

s = 1

to

s = 4

, correctly classified inputs consistently exhibit higher UR values than misclassified ones. This effect is particularly pronounced under mild corruption conditions, supporting the inference that increased unique information is linked to improved discriminative capability.

Figure 5. Comparison of UR between correctly and incorrectly classified samples on ImageNet-C. (a–d) correspond to corruption severity levels 1 through 4, respectively. Correctly classified samples consistently exhibit higher UR values than misclassified samples, with the difference most pronounced under mild corruption.

Theoretical support for this observation can be found in prior studies on the dynamics of information learning. Tax et al. [22] performed an analysis of neural information dynamics using the PID framework during the training of Boltzmann machines and observed that neurons first acquire redundant information before gradually specializing to encode unique information about the target variable. This transition reflects an increasing ability to capture discriminative features, thereby enhancing classification performance. From the perspective of statistical decision theory, Venkatesh et al. [35] further demonstrated that the amount of unique information provides an upper bound on the minimum risk achievable by a given information source in a decision task. In other words, neurons with higher UR contribute to lower uncertainty in classification decisions. Therefore, the positive relationship between UR and classification performance is supported by both empirical evidence and theoretical foundations.

The UR also reflects architectural design. InceptionV3, ResNet50, and DenseNet121 consistently show higher UR, indicating that multi-branch, residual, and dense connectivity promote feature specialization and discriminative capacity. MobileNetV2 exhibits intermediate UR, balancing efficiency with moderate uniqueness. In contrast, VGG16 and AlexNet display low UR, consistent with their sequential or shallow structures that rely more on shared representations. Overall, architectures that encourage diverse and specialized feature encoding achieve higher UR, while simpler sequential designs yield lower UR and weaker discriminative ability.

5. Conclusions

Within the framework of PID, the structure of neuronal information interactions is examined as a potential indicator of model robustness. PID measures derived from clean-image neuron activations are analyzed to explore the relationship between internal information interaction mechanisms and model robustness under corruption. This approach facilitates a novel robustness evaluation paradigm that does not rely on extensive corrupted test samples, thereby contributing both theoretical insights and practical value to model assessment.

Experiments conducted on the ImageNet-C and CIFAR10-C datasets reveal consistent trends in the connection between information decomposition and robustness. Models with a higher redundancy rate tend to achieve more stable performance across diverse corruption types and severity levels, whereas higher synergy rate is generally associated with increased performance variability. These results suggest that redundancy contributes critically to robustness by supporting tolerance to input degradation, while a greater dependence on synergistic information may heighten sensitivity to noise and perturbations. In addition, the unique information rate correlates with classification performance on high-quality inputs, indicating that unique information reflects the degree of specialization in encoding discriminative features. However, as corruption severity increases, the influence of unique information diminishes, and the network increasingly relies on redundancy to preserve performance.

Overall, our findings highlight the distinct functional roles of the components in PID. Redundancy is essential for robustness, synergy facilitates complex feature integration but introduces sensitivity, and uniqueness contributes to accuracy under high-quality input conditions. Future research should further investigate strategies for modulating the information structure of neural models to achieve a better balance between robustness and accuracy. For instance, specific architectural designs or training procedures could be developed to enhance redundancy and thereby improve stability under challenging conditions. Additionally, identifying methods to promote the learning of unique information without compromising robustness could be a promising direction toward building more interpretable neural networks.

Author Contributions

Conceptualization, methodology, validation, visualization, and writing—original draft preparation, T.L.; investigation, T.L. and B.G.; writing—review and editing, Z.Y., Z.M., D.J. and B.G.; supervision and funding acquisition, B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science and Technology Major Project grant No. 2021ZD0201302, and Zhongguancun Laboratory.

Data Availability Statement

The data used in this study are publicly available [26].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Rawat, W.; Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 2017, 29, 2352–2449. [Google Scholar] [CrossRef]
Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. (CSUR) 2019, 52, 1–38. [Google Scholar] [CrossRef]
Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 2018, 13, 55–75. [Google Scholar] [CrossRef]
Torralba, A.; Efros, A.A. Unbiased look at dataset bias. In CVPR 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1521–1528. [Google Scholar]
Hendrycks, D.; Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–12. [Google Scholar]
Dong, Y.; Kang, C.; Zhang, J.; Zhu, Z.; Wang, Y.; Yang, X.; Su, H.; Wei, X.; Zhu, J. Benchmarking robustness of 3d object detection to common corruptions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1022–1032. [Google Scholar]
Rusak, E.; Schott, L.; Zimmermann, R.S.; Bitterwolf, J.; Bringmann, O.; Bethge, M.; Brendel, W. A simple way to make neural networks robust against diverse image corruptions. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part III 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 53–69. [Google Scholar]
Geirhos, R.; Temme, C.R.; Rauber, J.; Schütt, H.H.; Bethge, M.; Wichmann, A.F. Generalisation in humans and deep neural networks. arXiv 2018, arXiv:1808.08750. [Google Scholar]
Schneider, S.; Rusak, E.; Eck, L.; Bringmann, O.; Brendel, W.; Bethge, M. Improving robustness against common corruptions by covariate shift adaptation. Adv. Neural Inf. Process. Syst. 2020, 33, 11539–11551. [Google Scholar]
Chun, S.; Oh, S.J.; Yun, S.; Han, D.; Choe, J.; Yoo, Y. An empirical evaluation on robustness and uncertainty of regularization methods. arXiv 2020, arXiv:2003.03879. [Google Scholar] [CrossRef]
Yu, S.; Giraldo, L.G.S.; Príncipe, J.C. Information-theoretic methods in deep neural networks: Recent advances and emerging opportunities. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual, 19–27 August 2021; pp. 4669–4678. [Google Scholar]
Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the 2015 IEEE Information Theory Workshop (itw), Jeju Island, Republic of Korea, 11–15 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–5. [Google Scholar]
Shwartz-Ziv, R.; Tishby, N. Opening the black box of deep neural networks via information. arXiv 2017, arXiv:1703.00810. [Google Scholar] [CrossRef]
Gabrié, M.; Manoel, A.; Luneau, C.; Macris, N.; Krzakala, F.; Zdeborová, L. Entropy and mutual information in models of deep neural networks. arXiv 2018, arXiv:1805.09785. [Google Scholar] [CrossRef]
Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515. [Google Scholar] [CrossRef]
Luppi, A.I.; Rosas, F.E.; Mediano, P.A.; Menon, D.K.; Stamatakis, E.A. Information decomposition and the informational architecture of the brain. Trends Cogn. Sci. 2024, 28, 352–368. [Google Scholar] [CrossRef] [PubMed]
Varley, T.F.; Pope, M.; Grazia, M.; Joshua; Sporns, O. Partial entropy decomposition reveals higher-order information structures in human brain activity. Proc. Natl. Acad. Sci. USA 2023, 120, e2300888120. [Google Scholar] [CrossRef]
Wollstadt, P.; Schmitt, S.; Wibral, M. A rigorous information-theoretic definition of redundancy and relevancy in feature selection based on (partial) information decomposition. J. Mach. Learn. Res. 2023, 24, 1–44. [Google Scholar]
Wibral, M.; Priesemann, V.; Kay, J.W.; Lizier, J.T.; Phillips, W.A. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 2017, 112, 25–38. [Google Scholar] [CrossRef] [PubMed]
Dewan, S.; Zawar, R.; Saxena, P.; Chang, Y.; Luo, A.; Bisk, Y. Diffusion pid: Interpreting diffusion via partial information decomposition. Adv. Neural Inf. Process. Syst. 2024, 37, 2045–2079. [Google Scholar]
Proca, A.M.; Rosas, F.E.; Luppi, A.I.; Bor, D.; Crosby, M.; Mediano, P.A. Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks. Plos Comput. Biol. 2024, 20, e1012178. [Google Scholar] [CrossRef] [PubMed]
Tax, T.M.; Mediano, P.A.; Shanahan, M. The partial information decomposition of generative neural network models. Entropy 2017, 19, 474. [Google Scholar] [CrossRef]
Rolnick, D.; Tegmark, M. The power of deeper networks for expressing natural functions. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–14. [Google Scholar]
Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamical gaussian systems. Phys. Rev. E 2015, 91, 052802. [Google Scholar] [CrossRef]
Hendrycks, D.; Dietterich, T. Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–16. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobilenetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 27 December 2025). [CrossRef]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
Zhang, C.; Liu, A.; Liu, X.; Xu, Y.; Yu, H.; Ma, Y.; Li, T. Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE Trans. Image Process. 2020, 30, 1291–1304. [Google Scholar] [CrossRef] [PubMed]
Schneidman, E.; Bialek, W.; Berry, M.J. Synergy, redundancy, and independence in population codes. J. Neurosci. 2003, 23, 11539–11553. [Google Scholar] [CrossRef] [PubMed]
Venkatesh, P.; Gurushankar, K.; Schamberg, G. Capturing and interpreting unique information. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2631–2636. [Google Scholar]

Figure 1. Partial information diagram.

Figure 2. (a) Comparison of experimental results for different models; (b) PID measures on clean images (

s = 0

). Models with smaller performance variability across corruption conditions tend to exhibit higher RR and lower SR values, whereas UR shows no consistent trend.

Figure 3. Rank correlations between PID measures on clean images (

s = 0

) and robustness indicators on corrupted images. (a,b) show results on the ImageNet-C, (c,d) correspond to the CIFAR10-C. Significance levels are denoted by asterisks: * p < 0.05, ** p < 0.01, and *** p < 0.001.

Figure 4. Classification accuracy and PID measures of six models on (a) CIFAR10-C (

s = 0

and

s = 1

), (b) ImageNet-C (

s = 0

and

s = 1

), (c) ImageNet-C (

s = 3

), and (d) ImageNet-C (

s = 4

). Models with higher accuracy generally exhibit higher UR under clean and mild corruption, whereas at severe corruption levels accuracy declines sharply and the correlation with UR becomes unclear.

Figure 5. Comparison of UR between correctly and incorrectly classified samples on ImageNet-C. (a–d) correspond to corruption severity levels 1 through 4, respectively. Correctly classified samples consistently exhibit higher UR values than misclassified samples, with the difference most pronounced under mild corruption.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.