An Information Theoretic Approach to Privacy-Preserving Interpretable and Transferable Learning

Abstract: In order to develop machine learning and deep learning models that take into account the guidelines and principles of trustworthy AI, a novel information theoretic approach is introduced in this article. A unified approach to privacy-preserving interpretable and transferable learning is considered for studying and optimizing the trade-offs between the privacy, interpretability, and transferability aspects of trustworthy AI. A variational membership-mapping Bayesian model is used for the analytical approximation of the defined information theoretic measures for privacy leakage, interpretability, and transferability. The approach consists of approximating the information theoretic measures by maximizing a lower bound using variational optimization. The approach is demonstrated through numerous experiments on benchmark datasets and a real-world biomedical application concerned with the detection of mental stress in individuals using heart rate variability analysis.


Introduction
Trust in the development, deployment, and use of AI is essential in order to fully utilize the potential of AI to contribute to human well-being and society. Recent advances in machine and deep learning have rejuvenated the field of AI with an enthusiasm that AI could become an integral part of human life. However, a rapid proliferation of AI will give rise to several ethical, legal, and social issues.

Trustworthy AI
In response to the ethical, legal, and social challenges that accompany AI, guidelines and ethical principles have been established [1][2][3][4] in order to evaluate the responsible development of AI systems that are good for humanity and the environment. These guidelines have introduced the concept of trustworthy AI (TAI), and the term TAI has quickly gained attention in research and practice. TAI is based on the idea that trust in AI will allow AI to realize its full potential in contributing to societies, economies, and sustainable development. As "trust" is a complex phenomenon studied in diverse disciplines (e.g., psychology, sociology, economics, management, computer science, and information systems), the definition and realization of TAI remain challenging. While forming trust in technology, users express expectations about the technology's functionality, helpfulness, and reliability [5]. The authors in [6] state that "AI is perceived as trustworthy by its users (e.g., consumers, organizations, and society) when it is developed, deployed, and used in ways that not only ensure its compliance with all relevant laws and its robustness but especially its adherence to general ethical principles".
Academics, industry, and policymakers have in recent times developed several frameworks and guidelines for TAI, including the "Asilomar AI Principles" [7], "Montreal Declaration of Responsible AI" [8], "UK AI Code" [9], "AI4People" [4], "Ethics Guidelines for Trustworthy AI" [1], "OECD Principles on AI" [10], "Governance Principles for the New Generation Artificial Intelligence" [11], and "Guidance for Regulation of Artificial Intelligence Applications" [12]. However, it was argued in [13] that AI ethics lack a reinforcement mechanism, and economic incentives could easily override commitment to ethical principles and values.
The five principles of ethical AI [4] (i.e., beneficence, non-maleficence, autonomy, justice, and explicability) have been adopted for TAI [6]. Beneficence refers to promoting the well-being of humans, preserving dignity, and sustaining the planet. Non-maleficence refers to avoiding bringing harm to people and is especially concerned with the protection of people's privacy and security. Autonomy refers to the promotion of human autonomy, agency, and oversight, including the restriction of AI systems' autonomy where necessary. Justice refers to using AI for correcting past wrongs, ensuring shared benefits through AI, and preventing the creation of new harms and inequities by AI. Explicability comprises an epistemological sense and an ethical sense. In the epistemological sense, explicability refers to explainable AI, developed by creating interpretable AI models with high levels of performance and accuracy. In the ethical sense, explicability refers to accountable AI.

Motivation and Novelty
The core issues related to machine and deep learning that need to be addressed in order to fulfill the five principles of trustworthy AI are listed in Table 1. Solution approaches to address issues concerning TAI have been identified in Table 1; however, a unified solution approach addressing all major issues does not exist. Despite the importance of the outlined TAI principles, their major limitation, as identified in [6], concerns the fact that the principles are highly general and provide little to no guidance for how they can be transferred into practice. To address this limitation, a data-driven research framework for TAI was outlined in [6]. However, to the best knowledge of the authors, no previous study has presented a unified information theoretic approach to study the privacy, interpretability, and transferability aspects of trustworthy AI in a rigorous analytical manner. This motivated us to develop such an approach in the present study. The novelty of this study is a unified information theoretic approach to "privacy-preserving interpretable and transferable learning", as represented in Figure 1: an approach for studying the privacy-interpretability-transferability trade-offs while addressing the beneficence, non-maleficence, autonomy, justice, and explicability principles of TAI.

Goal and Aims
Our goal is to develop a novel approach to trustworthy AI based on the hypothesis that information theory enables taking into account the privacy, interpretability, and transferability aspects of trustworthy AI principles during the development of machine learning and deep learning models by providing a way to study and optimize the inherent trade-offs. The aims focused on the development of our approach are the following:

Aim 1:
To develop an information theoretic approach to privacy that enables the quantification of privacy leakage in terms of the mutual information between sensitive private data and the data released to the public, without requiring prior knowledge about data statistics (such as joint distributions of public and private variables).

Aim 2:
To develop an information theoretic criterion for evaluating the interpretability of a machine learning model in terms of the mutual information between non-interpretable model outputs/activations and corresponding interpretable parameters.

Aim 3:
To develop an information theoretic criterion for evaluating the transferability (of a machine learning model from source to target domain) in terms of the mutual information between source domain model outputs/activations and target domain model outputs/activations.

Aim 4:
To develop analytical approaches to machine and deep learning allowing for the quantification of model uncertainties.

Aim 5:
To develop a unified approach to "privacy-preserving interpretable and transferable learning" for an analytical optimization of privacy-interpretability-transferability trade-offs.

Methodology
Figure 2 outlines the methodological workflow. For an information theoretic evaluation of privacy leakage, interpretability, and transferability, we provide a novel method that consists of the following three steps:

Variational Membership Mapping Bayesian Models
In order to derive analytical expressions for the defined privacy leakage, interpretability, and transferability measures, stochastic inverse models (governing the relationships among the variables) are required. In this study, variational membership mappings are leveraged to build the required stochastic inverse models. Membership mappings [14,15] have been introduced as an alternative to deep neural networks in order to address issues such as determining the optimal model structure, smaller training datasets, and the iterative, time-consuming nature of numerical learning algorithms [16][17][18][19][20][21][22]. A membership mapping represents data through a fuzzy set (characterized by a membership function such that the dimension of the membership function increases with increasing data size). A remarkable feature of membership mappings is that they allow an analytical approach to the variational learning of a membership-mappings-based data representation model. Our idea is to employ membership mappings for defining a stochastic inverse model, which is then inferred using the variational Bayesian methodology.

Variational Approximation of Information Theoretic Measures
The variational membership-mapping Bayesian models are used to determine lower bounds on the defined information theoretic measures for privacy leakage, interpretability, and transferability. The lower bounds are then maximized using the variational optimization methodology to analytically derive the expressions that approximate the privacy leakage, interpretability, and transferability measures. The analytically derived expressions form the basis of an algorithm that practically computes the measures using available data samples, where expectations over unknown distributions are approximated by sample averages.
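As a minimal illustration of the last step above, the sample-average device for approximating expectations over an unknown distribution can be sketched as follows; the function and distribution here are hypothetical toy choices, not the paper's measures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expectation: E[g(x)] for g(x) = x**2 with x ~ N(0, 1), whose true
# value is 1. With only samples available (the distribution "unknown"),
# the expectation is approximated by the sample average, exactly as the
# text does for expectations over p(t, x).
samples = rng.standard_normal(100_000)
estimate = np.mean(samples ** 2)
print(estimate)  # close to the true expectation 1.0
```

The Monte Carlo error shrinks at the usual 1/sqrt(N) rate, which is why the algorithms in the text can operate on finite data samples in place of distributions.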

Contributions
The main contributions of this study are the following. The study introduces a novel information theoretic unified approach (as represented in Figure 1) to address:
1. Issues I1 and I2 of the beneficence principle by means of transfer and federated learning;
2. Issues I3 and I4 of the non-maleficence principle by means of privacy-preserving data release mechanisms;
3. Issue I5 of the autonomy principle by means of analytical machine and deep learning algorithms that enable the user to quantify model uncertainties and hence to decide the level of autonomy given to AI systems;
4. Issue I6 of the justice principle by means of federated learning;
5. Issue I7 of the explicability principle by means of interpretable machine and deep learning models.

Information Theoretic Quantification of Privacy, Interpretability, and Transferability
The most important feature of our approach is that the notions of privacy, interpretability, and transferability are quantified by information theoretic measures, allowing for the study and optimization of trade-offs (such as the trade-off between privacy and transferability, or between privacy and interpretability) in a practical manner.

Computation of Information Theoretic Measures without Requiring the Knowledge of Data Distributions
It is possible to derive analytical expressions for the defined measures provided that knowledge of the data distributions is available. However, in practice, the data distributions are unknown, and thus a way to approximate the defined measures is required. Therefore, a novel method that employs the recently introduced membership mappings [14-22] is presented for approximating the defined privacy leakage, interpretability, and transferability measures. The method relies on inferring a variational Bayesian model that facilitates an analytical approximation of the information theoretic measures through the variational optimization methodology. A computational algorithm is provided for practically calculating the privacy leakage, interpretability, and transferability measures. Finally, an algorithm is presented that provides:
1. An information theoretic evaluation of privacy leakage, interpretability, and transferability in a semi-supervised transfer and multi-task learning scenario;
2. An adversary model for estimating private data and for simulating privacy attacks; and
3. An interpretability model for estimating interpretable parameters and for providing an interpretation of the non-interpretable data vectors.

Organization
This text is organized into sections as follows. The proposed methodology relies on membership mappings for data representation learning; therefore, Section 2 is dedicated to a review of membership mappings. An application of membership mappings to solve an inverse modeling problem by developing a variational membership-mapping Bayesian model is considered in Section 3. Section 4 presents the most important result of this study on the variational approximation of information leakage and the development of a computational algorithm for calculating information leakage. The measures for privacy leakage, interpretability, and transferability are formally introduced in Section 5, which further provides an algorithm to study the privacy, interpretability, and transferability aspects in a unified manner. The application of the proposed measures to study the trade-offs is demonstrated through experiments on the widely used MNIST and "Office+Caltech256" datasets in Section 6, which further considers a biomedical application concerned with the detection of mental stress in individuals using heart rate variability analysis. Finally, concluding remarks are provided in Section 7.

Mathematical Background
This section reviews membership mappings and transferable deep learning from [14,15,22]. For a detailed mathematical study of the concepts used in this section, the readers are referred to previous works [14,15,22].
• For convenience, the values of a function f ∈ F(X) at the points in a collection x are denoted by f(x). Let ζ_x : R^|x| → [0, 1] be a membership function satisfying the following properties:
Nowhere Vanishing: ζ_x is non-zero everywhere on its domain.
Positive and Bounded Integrals: The functions ζ_x are absolutely continuous and Lebesgue integrable over the whole domain such that, for all x ∈ S, the integral of ζ_x is positive and finite.

Consistency of Induced Probability Measure:
The membership-function-induced probability measures P_ζ_x, defined on any A ∈ B(R^|x|) by normalizing ζ_x, are consistent in the sense that, for all x, a ∈ S, the measure induced by ζ_x coincides with the corresponding marginal of the measure induced by ζ_(x,a). The collection of membership functions satisfying the aforementioned assumptions is denoted by Θ.

Review of Variational Membership Mappings
Definition 1 (Student-t Membership Mapping [14]). A Student-t membership mapping, F ∈ F(X), is a mapping with input space X = R^n and a membership function ζ_x ∈ Θ that is Student-t like, where ν is the degrees of freedom, m_y ∈ R^|x| is the mean vector, and K_xx ∈ R^|x|×|x| is the covariance matrix with its (i, j)-th element given by a positive definite kernel function kr : R^n × R^n → R defined as in (10), where x^i_k is the k-th element of x^i, σ² is the variance parameter, and (w_1, · · · , w_n) are the kernel parameters. Under modeling scenario (11), [22] presents an algorithm (stated as Algorithm 1) for the variational learning of membership mappings.
With reference to Algorithm 1, we have the following:

• The degrees of freedom associated with the Student-t membership mapping, ν ∈ R+ \ [0, 2], is chosen as in (12).
• The auxiliary inducing points a are suggested to be chosen as the cluster centroids (13): a = cluster_centroid({x^i}^N_{i=1}, M), where cluster_centroid({x^i}^N_{i=1}, M) represents the k-means clustering of {x^i}^N_{i=1} into M clusters.

• The parameters (w_1, · · · , w_n) for kernel function (10) are chosen such that w_k (for k ∈ {1, 2, · · · , n}) is given by (14), where x^i_k is the k-th element of vector x^i ∈ R^n.
• K_aa ∈ R^M×M and K_xa ∈ R^N×M are matrices whose (i, j)-th elements are given by evaluations of the positive definite kernel function kr : R^n × R^n → R defined as in (10).
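The choice of auxiliary inducing points as k-means cluster centroids can be sketched as follows. This is a minimal numpy-only stand-in for the cluster_centroid routine; the data, the number of inducing points M, and the random initialization are illustrative assumptions:

```python
import numpy as np

def cluster_centroids(x, m, iters=50, seed=0):
    """Plain Lloyd's-iteration k-means returning m centroids of the rows
    of x -- a minimal stand-in for cluster_centroid({x_i}, M) in the text."""
    rng = np.random.default_rng(seed)
    a = x[rng.choice(len(x), size=m, replace=False)]  # initial centroids
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = ((x[:, None, :] - a[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # recompute centroids; keep the old one if a cluster went empty
        a = np.array([x[labels == j].mean(0) if np.any(labels == j) else a[j]
                      for j in range(m)])
    return a

# Hypothetical data: N = 200 points in R^2, M = 5 auxiliary inducing points.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 2))
a = cluster_centroids(x, m=5)
print(a.shape)  # (5, 2): one inducing point per cluster
```

Using centroids as inducing points keeps M small relative to N, which is what makes the O(M³) complexity noted later in the text tractable.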

• The scalar-valued function τ(M, σ²) is defined in terms of a (given by (13)), ν (given by (12)), and the parameters (w_1, · · · , w_n) (which are required to evaluate the kernel function for computing the matrices K_xx, K_aa, and K_xa and are given by (14)).
• α ∈ R^M×p is a matrix with its j-th column computed by Algorithm 1.
• The disturbance precision value β is iteratively estimated in terms of the estimated membership-mapping output F̂_j(x^i), where G(x) ∈ R^1×M is a vector-valued function whose elements are evaluations of the kernel kr : R^n × R^n → R defined as in (10).
Definition 2 (Membership-Mappings Prediction [22]). Given the parameters set M = {α, a, M, σ, w} returned by Algorithm 1, the learned membership mappings can be used to predict the output corresponding to any arbitrary input data point x ∈ R^n as in (22), where G(•) ∈ R^1×M is the vector-valued function (21).

Review of Membership-Mappings-Based Conditionally Deep Autoencoders
Definition 3 (Membership-Mapping Autoencoder [15]). A membership-mapping autoencoder, G : R^p → R^p, maps an input vector y ∈ R^p to G(y) ∈ R^p, where P is a matrix such that the product Py is a lower-dimensional encoding of y.
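A minimal sketch of such a projection-based encoding, assuming (consistent with step 3 of Algorithm 2 below) that the rows of P are the leading eigenvectors of the sample covariance; the dataset and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 10))     # hypothetical dataset, p = 10

# Rows of P = transposed eigenvectors of the sample covariance for the
# n largest eigenvalues (mirroring step 3 of Algorithm 2 in the text).
n = 3
cov = np.cov(Y, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
P = eigvecs[:, ::-1][:, :n].T           # (n, p) projection matrix

y = Y[0]
encoding = P @ y                        # lower-dimensional encoding of y
print(encoding.shape)  # (3,)
```

The decoder half of the autoencoder (the membership mapping reconstructing y from Py) is learned variationally via Algorithm 1 and is not shown here.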
Definition 4 (Conditionally Deep Membership-Mapping Autoencoder (CDMMA) [15,22]). A conditionally deep membership-mapping autoencoder, D : R^p → R^p, maps a vector y ∈ R^p to D(y) ∈ R^p through a nested composition of a finite number of membership-mapping autoencoders, where G^l(•) is a membership-mapping autoencoder (Definition 3).
CDMMA discovers layers of increasingly abstract data representations, with the lowest-level data features being modeled by the first layer and the highest-level data features being modeled by the last layer [15,22]. An algorithm (stated as Algorithm 2) has been provided in [15,22] for the variational learning of CDMMA.

3: Define P^l ∈ R^{n_l×p} such that the i-th row of P^l is equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of a sample covariance matrix of dataset Y.

4: Define a latent variable x^{l,i}, where ŷ^{l−1} is the estimated output of the (l − 1)-th layer computed using (22) for that layer's parameters set. Define M^l_max, and compute the parameters set M^l = {α^l, a^l, M^l, σ^l, w^l}, characterizing the membership mappings associated with the l-th layer, using Algorithm 1 on the dataset {(x^{l,i}, y^i) | i ∈ {1, · · · , N}} with the maximum possible number of auxiliary points.

Definition 5 (CDMMA Filtering [15,22]). Given a CDMMA with its parameters represented by a set M = {{M^1, · · · , M^L}, {P^1, · · · , P^L}}, the autoencoder can be applied for filtering a given input vector y ∈ R^p as follows: x^l(y; M) = P^l ŷ^{l−1}, l = 1, · · · , L. Here, ŷ^{l−1} is the output of the (l − 1)-th layer estimated using (22). Finally, CDMMA's output, D(y; M), is given by (30).

For a big dataset, the computational time required by Algorithm 2 for learning will be high. To circumvent the problem of large computation time for processing big data, it is suggested in [15,22] that the data be partitioned into subsets and that, corresponding to each data subset, a separate CDMMA be learned. This motivates defining a wide CDMMA as in Definition 6. For the variational learning of a wide CDMMA, Algorithm 3 follows from [15,22], where the choice of the number of subsets as S = N/1000 is driven by the consideration that each subset contains around 1000 data points, since processing up to 1000 data points with a CDMMA is not computationally challenging.

Definition 6 (A Wide CDMMA [15,22]). A wide CDMMA, WD : R^p → R^p, maps a vector y ∈ R^p to WD(y) ∈ R^p through a parallel composition of S (S ∈ Z+) CDMMAs, where D_s(y) is the output of the s-th CDMMA.
Algorithm 3 Variational learning of a wide CDMMA [15,22]: for each s ∈ {1, · · · , S}, build a CDMMA, M_s, by applying Algorithm 2 on Y_s, taking n as the subspace dimension, the maximum number of auxiliary points equal to r_max × #Y_s (where #Y_s is the number of data points in Y_s), and L as the number of layers; return the parameters set P = {M_s}^S_{s=1}.
Definition 7 (Wide CDMMA Filtering [15,22]).Given a wide CDMMA with its parameters being represented by a set P = {M s } S s=1 , the autoencoder can be applied for filtering a given input vector y ∈ R p as follows: where D(y; M s ) is the output of the s-th CDMMA estimated using (30).

Membership Mappings for Classification
A classifier (i.e., Definition 8) and an algorithm for its variational learning (stated as Algorithm 4) follow from [15,22]. Definition 8 (A Classifier [15,22]). The classifier assigns to an input vector y the label of that class whose associated autoencoder best reconstructs the input vector, where WD(y; P_c), computed using (34), is the output of the c-th wide CDMMA.
Algorithm 4 Variational learning of the classifier [15,22]: for each class c ∈ {1, · · · , C}, build a wide CDMMA, P_c = {M^s_c}^{S_c}_{s=1}, by applying Algorithm 3 on Y_c for the given n, r_max, and L; return the parameters set {P_c}^C_{c=1}.

Review of Membership-Mappings-Based Privacy-Preserving Transferable Learning
A privacy-preserving semi-supervised transfer and multi-task learning problem has recently been addressed in [22] by means of variational membership mappings. The method, as suggested in [22], involves the following steps:

Optimal Noise Adding Mechanism for Differentially Private Classifiers
The approach suggested in [22] relies on a tailored noise adding mechanism to achieve a given differential privacy loss bound with the minimum perturbation of the data. In particular, Algorithm 5 is suggested for a differentially private approximation of data samples, and Algorithm 6 is suggested for building a differentially private classifier.
Algorithm 5 Differentially private approximation of data samples [22]: a differentially private approximation of the data samples is obtained by perturbing each element y^{+i}_j (the j-th element of y^{+i}) with noise calibrated to the privacy parameters. Algorithm 6 Variational learning of a differentially private classifier [22]: given the differentially private approximated dataset Y^+, the classifier is built by applying Algorithm 4 on Y^+ for the given n, r_max, and L.
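The paper's optimized noise-adding mechanism is specified in [22]; purely as a generic illustration of noise-based differentially private data release (a standard Laplace mechanism, NOT the tailored mechanism of Algorithm 5), consider:

```python
import numpy as np

def laplace_private_approximation(y, d, eps, seed=0):
    """Standard Laplace mechanism: for a release with L1 sensitivity d,
    adding independent Laplace(d / eps) noise to each element yields
    eps-differential privacy. Generic illustration only; the paper's
    Algorithm 5 uses an optimized, tailored noise distribution instead."""
    rng = np.random.default_rng(seed)
    return y + rng.laplace(loc=0.0, scale=d / eps, size=y.shape)

y = np.linspace(0.0, 1.0, 8)              # hypothetical data vector
y_plus = laplace_private_approximation(y, d=1.0, eps=0.5)
print(y_plus.shape)  # same shape as y
```

The trade-off visible even in this simple sketch (smaller eps means larger noise scale d/eps, hence more perturbation) is exactly what the optimal mechanism of [22] minimizes for a given privacy loss bound.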

Semi-Supervised Transfer Learning Scenario
The aim is to transfer the knowledge extracted by a classifier trained using a source dataset to the classifier of the target domain such that the privacy of the source dataset is preserved. Let {Y^{sr}_c}^C_{c=1} denote the labeled source dataset.

Latent Subspace Transformation Matrices
For a given subspace dimension n_st ∈ {1, 2, · · · , min(p^{sr}, p^{tg})}, the source domain transformation matrix V^{+sr} ∈ R^{n_st×p^{sr}} is defined with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of the sample covariance matrix computed on differentially private approximated source samples. The target domain transformation matrix V^{tg} ∈ R^{n_st×p^{tg}} is defined with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of the sample covariance matrix computed on target samples.

Subspace Alignment
A target sample is mapped to the source-data space via the transformation (40). Both labeled and unlabeled target datasets are transformed to define the sets in (41)-(43), where {n|_1, n|_2, · · · } is a monotonically non-decreasing sequence.
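The transformation matrices and one possible subspace-alignment step can be sketched as follows. The eigenvector construction mirrors the definitions above; the alignment composition itself (project into the shared latent space, then back through the source basis) is an assumption for illustration, since the exact transformation is given in [22]:

```python
import numpy as np

def top_eigvec_rows(data, n_st):
    """Rows = transposed eigenvectors of the sample covariance for the
    n_st largest eigenvalues, as in the definitions of V^{+sr} and V^{tg}."""
    cov = np.cov(data, rowvar=False)
    _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return vecs[:, ::-1][:, :n_st].T        # (n_st, p)

rng = np.random.default_rng(0)
Y_sr = rng.standard_normal((300, 12))      # hypothetical source samples, p_sr = 12
Y_tg = rng.standard_normal((200, 8))       # hypothetical target samples, p_tg = 8

n_st = 4
V_sr = top_eigvec_rows(Y_sr, n_st)         # (n_st, p_sr)
V_tg = top_eigvec_rows(Y_tg, n_st)         # (n_st, p_tg)

# Assumed alignment form: encode a target sample in its own subspace,
# then decode through the source basis so it lands in R^{p_sr}.
y_tg = Y_tg[0]
aligned = V_sr.T @ (V_tg @ y_tg)
print(aligned.shape)  # (12,)
```

Whatever the exact alignment form, the essential point is that heterogeneous source and target dimensions (p_sr ≠ p_tg) become comparable through the shared n_st-dimensional latent subspace.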

source2target Model
The mapping from source to target domain is learned by means of a variational membership-mappings-based model, where N^{tg} = |D| is the total number of target samples, WD(•; •) is defined as in (34), and the transformed target sets are defined as in (40) and (43).
Here, ŷ(•; M^{sr→tg}) is the output of the source2target model computed using (22).

Variational Membership-Mapping Bayesian Models
We consider the application of membership mappings to solve the inverse modeling problem related to x = f_{t→x}(t), where f_{t→x} : R^q → R^n is a forward map. Specifically, a membership-mappings model is used to approximate the inverse mapping f^{−1}_{t→x}.

A Prior Model
Given a dataset {(x^i, t^i) | i ∈ {1, · · · , N}}, Algorithm 1 can be used to build a membership-mappings model characterized by a set of parameters, say M^{x→t} = {α^{x→t}, a, M, σ, w} (where x → t indicates that the mapping from x to t has been approximated by the membership mappings). It follows from (22) that the membership-mappings model's predicted output corresponding to an input x is given in terms of G(•) ∈ R^{1×M}, a vector-valued function defined as in (21). The k-th element of t is given in terms of α^{x→t}_k, the k-th column of matrix α^{x→t}. Expression (49) allows estimating, for any arbitrary x, the corresponding t using a membership-mappings model. This motivates introducing the following prior model for k ∈ {1, · · · , q}.
The variational Bayesian approach minimizes the difference (in terms of KL divergence) between the variational and true posteriors by analytically maximizing the negative free energy L over the variational distributions. However, the analytical derivation requires the following widely used mean-field approximation: q(θ, γ) = q(θ_1) · · · q(θ_q)q(γ). (71) Applying the standard variational optimization technique (as in [23-29]), it can be verified that the optimal variational distributions maximizing L are those whose parameters (Λ̂_k, m̂_k, â_γ, b̂_γ) satisfy the coupled fixed-point conditions (cf. (75)). Algorithm 7 is suggested for the variational Bayesian inference of the model.

Algorithm 7 Variational membership-mapping Bayesian model inference.
The functionality of Algorithm 7 is as follows:
• Step 1 builds a variational membership-mappings model using Algorithm 1 from previous work [22].

•
The loop between steps 5 and 7 applies variational Bayesian inference to iteratively estimate the parameters of the optimal distributions until convergence.
Remark 1 (Computational Complexity). The computational complexity of Algorithm 7 is asymptotically dominated by the inversion of the M × M matrix Λ̂_k in (75) to calculate m̂_k. Thus, the computational complexity of Algorithm 7 is O(M³), where M is the number of auxiliary points.
The optimal distributions determined using Algorithm 7 define the so-called Variational Membership-Mapping Bayesian Model (VMMBM) as stated in Remark 2.

Evaluation of the Information-Leakage
Consider a scenario where a variable t is related to another variable x through a mapping f_{t→x} such that x = f_{t→x}(t). The mutual information I(t; x) measures the amount of information obtained about variable t through observing variable x. Since x = f_{t→x}(t), the entropy H(t) remains fixed independently of the mapping f_{t→x}, and thus the quantity I(t; x) − H(t) is a measure of the amount of information about t leaked by the mapping f_{t→x}. Definition 9 (Information Leakage). Under the scenario that x = f_{t→x}(t), a measure of the amount of information about t leaked by the mapping f_{t→x} is defined as IL_{f_{t→x}} := I(t; x) − H(t). The quantity IL_{f_{t→x}} is referred to as the information leakage.
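Since I(t; x) = H(t) − H(t|x), Definition 9 can equivalently be written as a negative conditional entropy:

```latex
\mathrm{IL}_{f_{t \to x}}
\;=\; I(t; x) - H(t)
\;=\; \bigl( H(t) - H(t \mid x) \bigr) - H(t)
\;=\; -\, H(t \mid x),
```

so a larger leakage value corresponds to a smaller residual uncertainty H(t|x) about t once x has been observed.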
This section is dedicated to answering the question: how can the information leakage be calculated without knowing the data distributions?

Variational Approximation of the Information Leakage
The mutual information between t and x is expressed in terms of the marginal and conditional entropies, H(t) and H(t|x), where ⟨g(x)⟩_{p(x)} denotes the expectation of a function g(x) of a random variable with respect to the probability density function p(x). Consider the conditional probability of t, given as in (88), where θ is a set defined as in (64). Let q(θ, γ) be an arbitrary distribution. The log conditional probability of t can be expressed as
log(p(t|x)) = ∫ dθ ∫ dγ q(θ, γ) log(p(t|x)) (89)
= ∫ dθ ∫ dγ q(θ, γ) log( p(θ, γ, t|x) / q(θ, γ) ) + KL(q(θ, γ) ∥ p(θ, γ|t, x)), (90)
where KL is the Kullback-Leibler divergence of p(θ, γ|t, x) from q(θ, γ). Using (87), this allows expressing (91) as (95). Since the Kullback-Leibler divergence is always non-negative, it follows from (95) that ⟨L⟩_{p(t,x)} provides a lower bound on IL_{f_{t→x}}, i.e., (96) holds. Our approach to approximating IL_{f_{t→x}} is to maximize its lower bound with respect to the variational distribution q(θ, γ); that is, we seek to solve (97).
Result 1 (Analytical Expression for the Information Leakage). Given the model (78)-(81), IL_{f_{t→x}} is given by (98). Here, ψ(•) is the digamma function and the parameters (Λ̄_k, m̄_k, ā_γ, b̄_γ) satisfy (99)-(102). It follows from (78) and (80) that (105) holds. Using (105) and (70)-(71) in (103) yields (106)-(107). Now, ⟨L(q(θ, γ), t, x)⟩_{p(t,x)} can be maximized with respect to q(θ_k) and q(γ) using variational optimization. It can be seen that the optimal distributions maximizing ⟨L(q(θ, γ), t, x)⟩_{p(t,x)} are those whose parameters (Λ̄_k, m̄_k, ā_γ, b̄_γ) satisfy (99)-(102). The maximum attained value of ⟨L(q(θ, γ), t, x)⟩_{p(t,x)} includes the term −KL(q*(γ) ∥ p(γ; â_γ, b̂_γ)), where ψ(•) is the digamma function. After substituting the maximum value into (97) and calculating the Kullback-Leibler divergences, we obtain (98).
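The decomposition used above can be written out compactly: for any distribution q(θ, γ),

```latex
\log p(t \mid x)
= \underbrace{\int \! d\theta \, d\gamma \;
    q(\theta, \gamma) \,
    \log \frac{p(\theta, \gamma, t \mid x)}{q(\theta, \gamma)}}_{\mathcal{L}(q(\theta,\gamma),\, t,\, x)}
\;+\;
\mathrm{KL}\!\bigl( q(\theta, \gamma) \,\big\|\, p(\theta, \gamma \mid t, x) \bigr),
```

and because the KL term is non-negative, L(q(θ, γ), t, x) lower-bounds log p(t|x) for every q; maximizing over q therefore tightens the bound, which is what justifies the variational approximation of the leakage.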

An Algorithm for the Computing of Information Leakage
Result 1 forms the basis of Algorithm 8, which computes the information leakage using available data samples.
Algorithm 8 Estimation of information leakage, IL_{f_{t→x}} = I(t; x) − H(t), using variational approximation. Algorithm 7 is applied with the maximum possible number of auxiliary points M_max constrained below 1000 for computational efficiency to obtain the variational membership-mapping Bayesian model.
The functionality of Algorithm 8 is as follows:
• Step 1 applies Algorithm 7 for the inference of a variational membership-mapping Bayesian model.
Remark 4 (Computational Complexity). The computational complexity of Algorithm 8 is asymptotically dominated by the inversion of the M × M matrix Λ̄_k in (100) to calculate m̄_k. Thus, the computational complexity of Algorithm 8 is O(M³), where M is the number of auxiliary points.

Example 1 (Verification of Information Leakage Estimation Algorithm).
To demonstrate the effectiveness of Algorithm 8 in estimating information leakage, a scenario is generated where t ∈ R^10 and x ∈ R^10 are Gaussian distributed such that x = t + ω, t ∼ N(0, 5I_10), ω ∼ N(0, σI_10), with σ ∈ [1, 15]. Since the data distributions in this scenario are known, the information leakage can be theoretically calculated as IL_{f_{t→x}} = (10/2) log(1 + 5/σ) − (10/2) log(2πe · 5). For a given value of σ, 1000 samples of t and x were simulated, and Algorithm 8 was applied for estimating the information leakage. The experiments were carried out at different values of σ ranging from 1 to 15. Figure 3 compares the plots of the estimated and theoretically calculated values of information leakage against σ. The close agreement between the two plots in Figure 3 verifies the effectiveness of Algorithm 8 in estimating information leakage without knowing the data distributions.
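Under the standard Gaussian-channel formulas (assuming, as in the example, that σ is the noise covariance scale), the theoretical curve of Example 1 can be reproduced with a short sketch:

```python
import numpy as np

def theoretical_leakage(sigma, n=10, var_t=5.0):
    """Closed-form leakage IL = I(t; x) - H(t) for x = t + w with
    t ~ N(0, var_t * I_n) and w ~ N(0, sigma * I_n), as in Example 1."""
    mutual_info = 0.5 * n * np.log(1.0 + var_t / sigma)        # I(t; x)
    entropy_t = 0.5 * n * np.log(2.0 * np.pi * np.e * var_t)   # H(t)
    return mutual_info - entropy_t

sigmas = np.arange(1, 16)
il = np.array([theoretical_leakage(s) for s in sigmas])
# Leakage decreases monotonically as the noise variance sigma grows,
# matching the downward trend of the theoretical curve in Figure 3.
print(il[0] > il[-1])
```

Since H(t) is constant in σ, the entire σ-dependence of the leakage comes from the mutual information term, which vanishes as σ → ∞.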

Definitions
To formally define the information theoretic measures for privacy leakage, interpretability, and transferability, a few variables and mappings are introduced in Table 2. Definitions 10-12 provide the mathematical definitions of the information theoretic measures. Since the defined measures are in the form of information leakages, Algorithm 8 can be directly applied for practically computing the measures, provided data samples are available.

A Unified Approach to Privacy-Preserving Interpretable and Transferable Learning
The presented theory allows us to develop an algorithm that implements the privacy-preserving interpretable and transferable learning methodology in a unified manner. Algorithm 9 is presented for a systematic implementation of the proposed privacy-preserving interpretable and transferable deep learning methodology. The functionality of Algorithm 9 is as follows:
Algorithm 9 Algorithm for privacy-preserving interpretable and transferable learning. Require: the source and target datasets, and the differential privacy parameters d ∈ R+, ε ∈ R+, δ ∈ (0, 1).
1: A differentially private approximation of the source dataset, Y^{+sr} = {Y^{+sr}_c}^C_{c=1}, is obtained using Algorithm 5 on Y^{sr}.
2: A differentially private source domain classifier, {P^{+sr}_c}^C_{c=1}, is built using Algorithm 6 on Y^{+sr}, taking the subspace dimension equal to min(20, p^{sr}) (where p^{sr} is the dimension of the source data samples), the ratio r_max equal to 0.5, and the number of layers equal to 5.
3: Taking the subspace dimension n_st = min(⌊p^{sr}/2⌋, p^{tg}), the source domain transformation matrix V^{+sr} ∈ R^{n_st×p^{sr}} is defined with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of the sample covariance matrix computed on differentially private approximated source samples. The target domain transformation matrix V^{tg} ∈ R^{n_st×p^{tg}} is defined with its i-th row equal to the transpose of the eigenvector corresponding to the i-th largest eigenvalue of the sample covariance matrix computed on target samples.
4: For the case of heterogeneous source and target domains, the subspace alignment approach is used to transform target samples via (40) and (41).
• Step 2 builds the differentially private source domain classifier following Algorithm 6 from previous work [22].

• Step 6 builds the target domain classifier using the method of [22].

• An information theoretic evaluation of privacy leakage, interpretability, and transferability is undertaken at Steps 8, 9, and 10, respectively.

• Step 8 also provides the adversary model BM_{y+sr → x_sr}, which can be used to estimate private data and thus to simulate privacy attacks.
• Step 9 also provides the interpretability model BM_{y+sr → t_sr}, which can be used to estimate interpretable parameters and thus to provide an interpretation of the non-interpretable data vectors.
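The eigen-decomposition underlying Step 3's transformation matrices can be sketched as below. This is a minimal illustration on stand-in Gaussian data, not the full Algorithm 9; the helper name and sample shapes are assumptions:

```python
import numpy as np

def subspace_transform(samples, n_st):
    # Rows of V are the transposed eigenvectors of the sample covariance
    # matrix, ordered by decreasing eigenvalue (as in Step 3 of Algorithm 9).
    cov = np.cov(samples, rowvar=False)        # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_st]   # indices of top-n_st eigenvalues
    return eigvecs[:, order].T                 # V in R^{n_st x p}

rng = np.random.default_rng(0)
Y_sr_private = rng.normal(size=(500, 20))      # stand-in for DP source samples
Y_tg = rng.normal(size=(300, 12))              # stand-in for target samples

p_sr, p_tg = Y_sr_private.shape[1], Y_tg.shape[1]
n_st = min(p_sr // 2, p_tg)                    # shared subspace dimension
V_sr = subspace_transform(Y_sr_private, n_st)  # n_st x p_sr
V_tg = subspace_transform(Y_tg, n_st)          # n_st x p_tg
print(V_sr.shape, V_tg.shape)                  # (10, 20) (10, 12)
```

Both matrices project into a common n_st-dimensional subspace, after which the subspace alignment of Step 4 can relate the two domains.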

Experiments
Experiments have been carried out to demonstrate the application of the proposed measures (for privacy leakage, interpretability, and transferability) to privacy-preserving interpretable and transferable learning. The methodology was implemented using MATLAB R2017b, and the experiments were run on an iMac (M1, 2021) machine with 8 GB of RAM.

MNIST Dataset
The MNIST dataset contains 28 × 28 sized images divided into a training set of 60,000 images and a test set of 10,000 images. The pixel values of the images were divided by 255 to normalize them to the range from 0 to 1. The 28 × 28 normalized pixel values of each image were flattened to an equivalent 784-dimensional data vector.
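The preprocessing described above amounts to a division and a reshape; a minimal sketch (the `preprocess` helper and batch shape are illustrative, not part of the paper's MATLAB implementation):

```python
import numpy as np

def preprocess(images):
    # images: uint8 array of shape (N, 28, 28) with pixel values in 0..255.
    # Normalize to [0, 1] and flatten each image to a 784-dimensional vector.
    return images.reshape(len(images), -1).astype(np.float64) / 255.0

batch = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)  # stand-in images
vectors = preprocess(batch)
print(vectors.shape)                                   # (5, 784)
print(bool(vectors.min() >= 0.0 and vectors.max() <= 1.0))   # True
```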

Interpretable Parameters
For the MNIST digits dataset, there exist no additional interpretable parameters other than the pixel values. Thus, corresponding to a pixel-values vector y ∈ [0, 1]^784, we defined an interpretable parameter vector t ∈ {0, 1}^10 such that the j-th element t_j = 1 if the j-th class label is associated with y, and t_j = 0 otherwise. That is, in our experimental setting, the interpretable vector t represents the class label assigned to the data vector y.
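The interpretable vector t is thus a one-hot encoding of the class label; a short sketch (function name is illustrative):

```python
import numpy as np

def interpretable_vector(label, num_classes=10):
    # t in {0, 1}^10 with t_j = 1 iff the j-th class label is assigned to y.
    t = np.zeros(num_classes, dtype=int)
    t[label] = 1
    return t

print(interpretable_vector(3))   # [0 0 0 1 0 0 0 0 0 0]
```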

Private Data
Here, we assume that pixel values are private, i.e., x sr = y sr .

Semi-Supervised Transfer Learning Scenario
A transfer learning scenario was considered in the same setting as in [22,30], where the 60,000 training samples constituted the source dataset, a set of 9000 test samples constituted the target dataset, and the classification performance was evaluated on the remaining 1000 test samples. Out of the 9000 target samples, only 10 samples per class were labeled, and the remaining 8900 target samples remained unlabeled.
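The split described above can be sketched as follows (random stand-in labels instead of actual MNIST labels; the index names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
test_labels = rng.integers(0, 10, size=10_000)   # stand-in for MNIST test labels

# 9000 test samples form the target dataset; the remaining 1000 are held out
# for evaluating classification performance.
perm = rng.permutation(10_000)
target_idx, eval_idx = perm[:9000], perm[9000:]

# Within the target dataset, only 10 samples per class are labeled.
labeled_idx = np.concatenate([
    target_idx[test_labels[target_idx] == c][:10] for c in range(10)
])
unlabeled_idx = np.setdiff1d(target_idx, labeled_idx)
print(len(labeled_idx), len(unlabeled_idx), len(eval_idx))   # 100 8900 1000
```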

Experimental Design
Algorithm 9 is applied with the differential privacy parameters d = 1 and δ = 1 × 10^−5. The experiment involves six different privacy-preserving semi-supervised transfer learning scenarios with privacy-loss bound values ε = 0.1, ε = 0.25, ε = 0.5, ε = 1, ε = 2, and ε = 10. For the computation of the privacy leakage, interpretability measure, and transferability measure in Algorithm 9, a subset consisting of 5000 randomly selected samples was considered.

Results
The experimental results are plotted in Figure 4. Figure 4a-c display the privacy-accuracy, privacy-interpretability, and privacy-transferability trade-off curves, respectively. The following observations are made:

• As expected and observed in Figure 4f, the transferability measure is positively correlated with the accuracy of the source-domain classifier on target test samples.
• Since we have defined the interpretable vector associated with a feature vector as representing the class label, positive correlations of the interpretability measure with the source domain classifier's accuracy and with the transferability measure are observed in Figure 4e and Figure 4f, respectively.

• The results also verify the robust performance of Algorithm 9 under the transfer and multi-task learning scenario: unlike the performance of the source domain classifier, the classification performance in the transfer and multi-task learning scenario is not adversely affected by a reduction in the privacy leakage, interpretability measure, and transferability measure, as observed in Figure 4a,e,f.

Table 3 reports the results obtained by the models that correspond to the minimum privacy leakage, the maximum interpretability measure, and the maximum transferability measure. The robustness of the transfer and multi-task learning scenario is further highlighted in Table 3. To achieve the minimum value of privacy leakage, the accuracy of the source domain classifier must be decreased to 0.1760; however, the transfer and multi-task learning scenario achieves the minimum privacy leakage value with an accuracy of 0.9510. As observed in Table 3, the maximum transferability-measure models also correspond to the maximum interpretability-measure models. As a visualization example, Figure 5 displays noise-added data samples for different values of the information theoretic measures.

Office and Caltech256 Datasets
The "Office+Caltech256" dataset has 10 categories common to both the Office and Caltech256 datasets. The dataset has four domains: amazon, webcam, dslr, and caltech256. This dataset has been widely used [31][32][33][34] for evaluating multi-class accuracy performance in a standard domain adaptation setting with a small number of labeled target samples. Following [32], 4096-dimensional deep-net VGG-FC6 features are extracted from the images. However, for the learning of classifiers, the 4096-dimensional feature vectors are reduced to 100-dimensional feature vectors using principal components computed from the data of the amazon domain. Thus, corresponding to each image, a 100-dimensional data vector is constructed.

Interpretable Parameters
Corresponding to a data vector y ∈ R^100, an interpretable parameter vector t ∈ {0, 1}^10 is defined such that the j-th element t_j = 1 if the j-th class label is associated with y, and t_j = 0 otherwise. That is, in our experimental setting, the interpretable vector t represents the class label assigned to the data vector y.

Private Data
Here, we assume that the extracted image feature vectors are private, i.e., x sr = y sr .
The semi-supervised transfer learning setting is as follows:
1. The number of training samples per class in the source domain is 20 for amazon and 8 for the other three domains;
2. The number of labeled samples per class in the target domain is 3 for all four domains.

Experimental Design
Taking one domain as the source and another domain as the target, 12 different transfer learning experiments are performed on the four domains associated with the "Office+Caltech256" dataset. Each of the 12 experiments is repeated 20 times by creating 20 random train/test splits. In all of the 240 (= 12 × 20) experiments, Algorithm 9 is applied three times with varying values of the privacy-loss bound: first with differential privacy parameters (d = 1, ε = 0.01, δ = 1 × 10^−5), second with (d = 1, ε = 0.1, δ = 1 × 10^−5), and third with (d = 1, ε = 1, δ = 1 × 10^−5). As Algorithm 9 with different values of the privacy-loss bound results in different models, the transfer and multi-task learning models that correspond to the maximum interpretability measure and the maximum transferability measure are considered for evaluation.
The following methods are considered for comparison:
1. ILS (1-NN) [32]: This method learns an Invariant Latent Space (ILS) to reduce the discrepancy between domains and uses Riemannian optimization techniques to match statistical properties between samples projected into the latent space from different domains.
2.
3. MMDT [34]: The Maximum Margin Domain Transform (MMDT) method adapts max-margin classifiers in a multi-class manner by learning a shared component of the domain shift as captured by the feature transformation.
4. HFA [36]: The Heterogeneous Feature Augmentation (HFA) method learns a common latent subspace and a classifier under the max-margin framework.
5. OBTL [33]: The Optimal Bayesian Transfer Learning (OBTL) method employs a Bayesian framework for transfer learning through the modeling of a joint prior probability density function for the feature-label distributions of the source and target domains.

Results
Tables 4-15 report the results, in which the two best performances have been marked. Finally, Table 16 summarizes the overall performance of the top four methods. As observed in Table 16, the maximum transferability-measure model remains the best performing in the largest number of experiments. The most remarkable result is that the proposed methodology, despite being privacy-preserving, ensuring the differential privacy-loss bound to be less than or equal to 1, and not requiring access to source data samples, performs better than even the non-private methods.

An Application Example: Mental Stress Detection
The mental stress detection problem is considered as an application example of the proposed privacy-preserving interpretable and transferable learning approach. The dataset from [17], consisting of heart rate interval measurements of different subjects, is considered for the study of an individual stress detection problem. In [17], a membership-mappings-based interpretable deep model was applied for the estimation of a stress score; the current study, however, deals with the application of the proposed privacy-preserving interpretable and transferable deep learning method to solve the stress classification problem. The problem is concerned with the detection of stress in an individual based on the analysis of a recorded sequence of R-R intervals, {RR_i}_i. The R-R data vector at the i-th time index, y_i, is defined as y_i = (RR_{i−d}, ..., RR_{i−1}, RR_i); that is, the current interval and the history of the previous d intervals constitute the data vector.
Assuming an average heart rate of 72 beats per minute, d is chosen equal to 72 × 3 = 216 so that an R-R data vector covers, on average, a 3-min-long sequence of R-R intervals. A dataset, say {y_i}_i, is built by (1) preprocessing the R-R interval sequence {RR_i}_i with an impulse rejection filter [37] for artifact detection and (2) excluding the R-R data vectors containing artifacts from the dataset. The dataset contains a stress score on a scale from 0 to 100. A label of either "no-stress" or "under-stress" is assigned to each y_i based on the stress score. Thus, we have a binary classification problem.
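The construction of R-R data vectors is a sliding window over the interval sequence; a minimal sketch follows (the impulse rejection filter [37] and artifact exclusion are omitted; `rr_data_vectors` and the synthetic sequence are hypothetical):

```python
import numpy as np

def rr_data_vectors(rr, d=216):
    # y_i = (RR_{i-d}, ..., RR_{i-1}, RR_i): the current interval plus the
    # d preceding intervals, i.e., a (d+1)-dimensional sliding window.
    rr = np.asarray(rr, dtype=float)
    return np.stack([rr[i - d:i + 1] for i in range(d, len(rr))])

# Stand-in R-R sequence: ~72 bpm corresponds to intervals of about 60/72 ≈ 0.83 s.
rng = np.random.default_rng(0)
rr_seq = 60.0 / 72.0 + 0.05 * rng.standard_normal(500)

Y = rr_data_vectors(rr_seq)
print(Y.shape)   # (284, 217): 500 - 216 vectors, each 217-dimensional
```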

Interpretable Parameters
Corresponding to an R-R data vector, there exists a set of interpretable parameters: mental demand, physical demand, temporal demand, own performance, effort, and frustration. These are the six components of stress acquired using the NASA Task Load Index [38]. The NASA Task Load Index provides a subjective assessment of stress in which an individual provides a rating on a scale from 0 to 100 for each of the six components of stress. Thus, corresponding to each 217-dimensional R-R data vector, there exists a six-dimensional vector of interpretable parameters acquired using the NASA Task Load Index.

Private Data
Here, we assume that heart rate values are private. Since the instantaneous heart rate is given as HR_i = 60/RR_i, information about the private data is directly contained in the R-R data vectors.

Semi-Supervised Transfer Learning Scenario
Out of the total subjects, a randomly chosen subject's data serve as the source domain data. Considering every other subject's data as the target domain data, the transfer learning experiment is performed independently on each target subject, where 50% of the target subject's samples are labeled, and the remaining unlabeled target samples also serve as test data for evaluating the classification performance. However, only target subjects whose data contain both classes and at least 60 samples were considered for experimentation. There are in total 48 such target subjects. The application requirements are as follows:
1. The private source domain data must be protected while transferring knowledge from the source to the target domain; and
2. The interpretability of the source domain model should be high.
In view of the aforementioned requirements, the models that correspond to the minimum privacy leakage and the maximum interpretability measure, amongst all the models obtained for 10 different choices of the differential privacy-loss bound, are considered for detecting stress.

Results
Figure 6 summarizes the experimental results, where the accuracies obtained by both the minimum privacy-leakage models and the maximum interpretability-measure models are displayed as box plots. It is observed in Figure 6 that the transfer and multi-task learning considerably improves the performance of the source domain classifier. Table 17 reports the median values (of privacy leakage, interpretability measure, transferability measure, and classification accuracy) obtained in the experiments on 48 different subjects. The robust performance of the transfer and multi-task learning scenario is further observed in Table 17. As a visualization example, Figure 7 displays the noise-added source domain heart rate interval data for different values of the information theoretic measures.

Concluding Remarks
This paper has introduced information theoretic measures for privacy leakage, interpretability, and transferability in order to study the trade-offs among them. This is the first study to develop an information theory-based unified approach to privacy-preserving interpretable and transferable learning. The experiments have verified that the proposed measures can be used to study the trade-off curves (between privacy leakage, interpretability measure, and transferability measure) and thus to optimize the models for given application requirements, such as minimum privacy leakage, maximum interpretability measure, or maximum transferability measure. The experimental results on the MNIST dataset showed that the transfer and multi-task learning scenario remarkably improved the accuracy from 0.1760 to 0.9510 while ensuring the minimum privacy leakage. The experiments on the Office and Caltech256 datasets indicated that the proposed methodology, despite ensuring a differential privacy-loss bound less than or equal to 1 and not requiring access to source data samples, performed better than even existing non-private methods in six out of 12 transfer learning experiments. The stress detection experiments on real-world biomedical data led to the observation that the transfer and multi-task learning scenario improved the accuracy from 0.3411 to 0.9647 (while ensuring the minimum privacy leakage) and from 0.3602 to 0.9619 (while ensuring the maximum interpretability measure). The considered unified approach to privacy-preserving interpretable and transferable learning involves membership-mappings-based conditionally deep autoencoders, albeit other data representation learning models could be explored. The future work includes the following:

• Although the text has not focused on federated learning, the transfer learning approach could easily be extended to a multi-party system, and the transferability measure could be calculated for any pair of parties.
• The explainability of the conditionally deep autoencoders follows, similarly to [17], via estimating interpretable parameters from non-interpretable data feature vectors using the variational membership-mapping Bayesian model.
• Furthermore, the variational membership-mapping Bayesian model quantifies uncertainties in the estimation of the parameters of interest, which is also important for a user's trust in the model.

Figure 1 .
Figure 1. An information theoretic unified approach to "privacy-preserving interpretable and transferable learning" for studying the privacy-interpretability-transferability trade-offs while addressing the beneficence, non-maleficence, autonomy, justice, and explicability principles of TAI.

Figure 2 .
Figure 2. The proposed methodology to evaluate privacy leakage, interpretability, and transferability in terms of the information leakages.

1.4.1. Defining Measures in Terms of the Information Leakages
The privacy, interpretability, and transferability measures are defined in terms of the information leakages:
• Privacy leakage is measured as the amount of information about private/sensitive variables leaked by the shared variables;
• Interpretability is measured as the amount of information about interpretable parameters leaked by the model;
• Transferability is measured as the amount of information about the source domain model output leaked by the target domain model output.

1.5.1. A Unified Approach to Study the Privacy, Interpretability, and Transferability Aspects of Trustworthy AI

Figure 3 .
Figure 3. A comparison of the estimated information leakage values with the theoretically calculated values.

Require:
The labeled source dataset Y_sr = {Y_sr,c}_{c=1}^C, where Y_sr,c = {y_sr^{i,c} ∈ R^{p_sr} | i ∈ {1, ..., N_sr,c}} represents the set of c-th class labeled samples; the set of private data X_sr = {X_sr,c}_{c=1}^C, where X_sr,c = {x_sr ∈ R^{n_sr} | x_sr = f^{-1}_{x_sr → y_sr}(y_sr), y_sr ∈ Y_sr,c}; the set of interpretable parameters T_sr = {T_sr,c}_{c=1}^C, where T_sr,c = {t_sr ∈ R^q | t_sr = f^{-1}_{t_sr → y_sr}(y_sr), y_sr ∈ Y_sr,c}; the set of a few labeled target samples {Y_tg,c}_{c=1}^C, where Y_tg,c = {y_tg^{i,c} ∈ R^{p_tg} | i ∈ {1, ..., N_tg,c}} is the set of c-th class labeled target samples; the set of unlabeled target samples; and the differential privacy parameters d ∈ R+, ε ∈ R+, and δ ∈ (0, 1).

Figure 4 .
Figure 4. The plots between privacy leakage, interpretability measure, transferability measure, and accuracy for the MNIST dataset.

Figure 5 .
Figure 5. An example of a source domain sample corresponding to different levels of privacy leakage, interpretability measure, and transferability measure.

Figure 6 .
Figure 6. The box plots of the accuracies obtained in detecting mental stress on 48 different subjects.

Figure 7 .
Figure 7. A display of source domain R-R interval data corresponding to different levels of privacy leakage, interpretability measure, and transferability measure.

Table 1 .
Core issues with TAI principles and solution approaches.
• Let B(R^N) denote the Borel σ-algebra on R^N, and let λ^N denote the Lebesgue measure on B(R^N).

Table 2 .
Introduced variables and mappings.
• f_{x_sr → y+_sr} : R^{n_sr} → R^{p_sr} — the mapping from private variables to the noise-added data vector, i.e., y+_sr = f_{x_sr → y+_sr}(x_sr);
• f_{t_sr → y+_sr} : R^q → R^{p_sr} — the mapping from interpretable parameters to the noise-added data vector, i.e., y+_sr = f_{t_sr → y+_sr}(t_sr);
• y_{tg→sr}^{sr} ∈ R^{p_sr} — the representation of a target domain data vector y_tg in the source domain via transformation (39);
• {P_tg,c}_{c=1}^C — the target domain autoencoders, representing the data features of each of the C classes, obtained via Algorithm 6.

Table 2 .
Cont.
• f_{y_tg → c} : R^{p_tg} → {1, ..., C} — the mapping assigning a class label to a target domain data vector y_tg via (47), i.e., f_{y_tg → c}(y_tg) = ĉ(y_{tg→sr}(y_tg); {P+sr_c}_{c=1}^C);
• ŷ_tg^{sr} ∈ R^{p_sr} — the transformation of y_tg to the source domain and filtering through the autoencoder that represents the source domain feature vectors of the same class as that of y_tg, i.e., ŷ_tg^{sr} = WD(y_{tg→sr}(y_tg); P+sr_{f_{y_tg → c}(y_tg)});
• ŷ_tg^{tg} ∈ R^{p_sr} — the transformation of y_tg to the source domain and filtering through the autoencoder that represents the target domain feature vectors of the same class as that of y_tg, i.e., ŷ_tg^{tg} = WD(y_{tg→sr}(y_tg); P_tg,{f_{y_tg → c}(y_tg)}).

Definition 10 (Privacy Leakage). Privacy leakage (by the mapping from private variables to the noise-added data vector) is a measure of the amount of information about the private/sensitive variable x_sr leaked by the mapping f_{x_sr → y+_sr} and is defined as

IL(f_{x_sr → y+_sr}) := I(x_sr; f_{x_sr → y+_sr}(x_sr)) − H(x_sr)  (110)
= I(x_sr; y+_sr) − H(x_sr).  (111)

Definition 11 (Interpretability Measure). Interpretability (of the noise-added data vector) is measured as the amount of information about the interpretable parameters t_sr leaked by the mapping f_{t_sr → y+_sr} and is defined as

IL(f_{t_sr → y+_sr}) := I(t_sr; f_{t_sr → y+_sr}(t_sr)) − H(t_sr)  (112)
= I(t_sr; y+_sr) − H(t_sr).

Table 3 .
Results of experiments on MNIST dataset for evaluating privacy leakage, interpretability, and transferability.

Table 4 .
Accuracy (in %, averaged over 20 experiments) obtained in amazon→caltech256 semisupervised transfer learning experiments.The first and second best performances have been marked.

Table 5 .
Accuracy (in %, averaged over 20 experiments) obtained in amazon→dslr semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 6 .
Accuracy (in %, averaged over 20 experiments) obtained in amazon→webcam semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 7 .
Accuracy (in %, averaged over 20 experiments) obtained in caltech256→amazon semisupervised transfer learning experiments.The first and second best performances have been marked.

Table 8 .
Accuracy (in %, averaged over 20 experiments) obtained in caltech256→dslr semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 9 .
Accuracy (in %, averaged over 20 experiments) obtained in caltech256→webcam semisupervised transfer learning experiments.The first and second best performances have been marked.

Table 10 .
Accuracy (in %, averaged over 20 experiments) obtained in dslr→amazon semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 11 .
Accuracy (in %, averaged over 20 experiments) obtained in dslr→caltech256 semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 12 .
Accuracy (in %, averaged over 20 experiments) obtained in dslr→webcam semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 13 .
Accuracy (in %, averaged over 20 experiments) obtained in webcam→amazon semisupervised transfer learning experiments.The first and second best performances have been marked.

Table 14 .
Accuracy (in %, averaged over 20 experiments) obtained in webcam→caltech256 semisupervised transfer learning experiments.The first and second best performances have been marked.

Table 15 .
Accuracy (in %, averaged over 20 experiments) obtained in webcam→dslr semi-supervised transfer learning experiments.The first and second best performances have been marked.

Table 16 .
Comparison of the methods on "Office+Caltech256" dataset.

Table 17 .
Results (median values) obtained in stress detection experiments on a dataset consisting of heart rate interval measurements.