1. Introduction
Quantum technologies are increasingly being integrated into various disciplines, ranging from Quantum Key Distribution (QKD) [
1], which enables secure key exchanges, to the advancement of quantum information processing technologies like quantum computers [
2]. Furthermore, Quantum Machine Learning (QML) emerges as a promising field, leveraging quantum computational advantages to address complex problems [
3]. Although the physical realization of quantum computers and quantum circuits currently trails behind that of their classical counterparts, the theoretical and conceptual frameworks of quantum technologies have demonstrated promising potential across a broad spectrum of applications. Here, QML represents a frontier in computational science, blending quantum computing’s potential with classical machine learning’s algorithmic precision. The promise of QML lies in its capacity to process and analyze complex, high-dimensional datasets beyond the reach of current classical methodologies [
3]. Central to these quantum information technologies are feature maps, which are instrumental in encoding classical data onto quantum circuits or qubits. These processes rely on the algebraic principles underlying the symmetries of the SU(2) Lie group [
4], and these underlying algebraic structures of information encodings are the exact focus of this article.
Lie groups, which are continuous transformations, are instrumental in numerous physical and mathematical theories, providing a rich lexicon for describing symmetries and corresponding transformations [
5,
6].
This paper introduces a novel approach to data obfuscation using the exponential map of Lie-group generators, tested for publicly available medical data. Similar to how quantum computing aims to harness the multidimensional and symmetrical qualities of quantum states, our method applies these same properties—specifically, the symmetries found in Lie groups—to alter data within a high-dimensional space. This connection highlights our methodology’s interdisciplinary approach, effectively merging concepts from quantum mechanics, algebra, and machine learning in a novel way.
Central to our investigation is the following question: Can Lie-group theory effectively obfuscate sensitive data while retaining its essential properties for successful machine learning applications, thereby preserving the accuracy and informational value of the original dataset?
Large-scale healthcare analytics exemplify the pressing need for scalable privacy mechanisms in big-data pipelines. Federated learning studies show that sensitive imaging records can be queried without raw data exposure [
7] and that verifiable aggregation further mitigates poisoning threats in distributed settings [
8]. These findings motivate the Lie-group obfuscation presented here, positioning it as a complementary layer for secure, high-dimensional data processing within the journal’s core areas of big data and cognitive computing.
Our research builds on the foundational work of Schuld et al. [
9,
10,
11] and IBM’s Qiskit [
12] in the field of quantum machine learning (QML) and their exploration of feature maps for projecting data onto qubits.
Our main contributions are outlined as follows:
We develop a novel data obfuscation framework using the exponential map of Lie-group generators, tailored for privacy-preserving processing of medical data used in machine learning approaches.
We show where and how the invertibility of our obfuscation technique breaks down by injecting noise into the exponential map of Lie-group generators, thereby making it impossible to recover the original data.
We demonstrate the efficacy of this approach in maintaining and occasionally surpassing the predictive accuracy of machine learning models compared to non-obfuscated datasets.
We establish a conceptual link between the principles of quantum machine learning and our obfuscation methodology, highlighting the potential for cross-disciplinary innovation in leveraging symmetries for data privacy, thereby showing the applicability of quantum mechanical concepts in this context.
The remainder of this article is organized as follows: We provide a collection of related work in
Section 2.
Section 3 provides our methodology, i.e., a background on quantum feature maps, Lie groups and how to use them for data obfuscation, and where invertibility of the exponential map breaks down by injecting noise.
Section 4 describes our experimental setup and the employed datasets.
Section 5 presents our results.
Section 6 discusses our findings and their implications. Finally,
Section 7 concludes and gives an outlook on future applications.
2. Related Work
The protection of patient privacy is paramount in medical data processing, making data obfuscation a critical area of research. Data obfuscation techniques aim to mask sensitive information while maintaining the utility of the data for machine learning applications. This ongoing research is vital, as it addresses the dual challenge of protecting patient confidentiality and enabling the extraction of actionable insights from medical data [
13].
One common approach is data anonymization, where identifiers such as names and social security numbers are removed or replaced with pseudonyms. For instance, the k-anonymity model [
14] ensures that each record is indistinguishable from at least k-1 others regarding certain attributes. However, ref. [
15] highlighted the vulnerability of k-anonymity to re-identification attacks, leading to the development of more sophisticated methods. Studies by Lu et al. [
16] have applied homomorphic encryption to medical datasets, enabling secure analysis without compromising patient confidentiality.
Deep learning (DL)-based algorithms for image classification have demonstrated remarkable results in improving healthcare applications’ performance and efficiency. To address privacy concerns, especially in cloud-based solutions, data obfuscation techniques like variational autoencoders (VAEs) combined with random pixel intensity mapping can be used to enable DL model training on secured medical images while ensuring privacy [
17].
Olatunji et al.’s comprehensive review [
13] of healthcare data anonymization techniques underscores the delicate balance between privacy and utility in the context of modern big data and machine learning challenges, which also applies to data obfuscation.
Quantum information processing presents novel opportunities for advancing machine learning, particularly through quantum machine learning (QML). The integration of quantum computing with machine learning algorithms has the potential to revolutionize data processing, offering significant improvements in speed and efficiency.
A key concept in QML is the use of quantum feature maps, which embed classical data into high-dimensional quantum states. This process can enhance the representational capacity of machine learning models. Havlíček et al. [
4] demonstrated that quantum feature maps could enable the classification of complex datasets that are challenging for classical models.
Quantum algorithms can potentially provide new methods for data privacy. For example, Lloyd et al. [
18] proposed quantum algorithms for principal component analysis (PCA), which can be applied to obfuscate data while preserving essential features for machine learning tasks. Such approaches leverage the principles of quantum mechanics to enhance data security and utility simultaneously.
Synthetic data generation [
19] involves creating artificial datasets that resemble real data but do not contain actual patient information. Techniques such as the use of generative adversarial networks (GANs) have been employed to generate realistic medical data for machine learning. While synthetic data can effectively preserve privacy, ensuring the fidelity and utility of such data is an area of active investigation. For a similar purpose but to create exemplary test classification datasets, Raubitzek et al. [
20] showed that one can use Lie algebras to create synthetic and artificial data. This approach was tested using both quantum machine learning and classical machine learning algorithms.
3. Methodology
We start this section by discussing the fundamentals of quantum information processing and quantum machine learning necessary to understand our ideas, which we then expand to present our novel approaches. Overall, the presented approach is based on ideas from [21], where researchers employed arbitrary Lie groups and their respective generators to construct kernel matrices to be used in a quantum kernel estimator. The current approach differs in that we break these symmetries and exploit this both to obfuscate data and to increase the overall amount of data. In the following, we first reiterate the initial steps of feature-map construction from [21] and then show how noise is inserted and how the employed feature maps and symmetry groups are altered.
Quantum machine learning consists of two main steps: the feature-encoding step and the actual quantum computation. In this work, we focus exclusively on the first step.
Standard quantum feature encoding can be described as follows:

$$ |\phi(x)\rangle = \mathcal{U}_{\phi(x)}\,|0\rangle^{\otimes n}, \qquad (1) $$

where
$|\phi(x)\rangle$ denotes the quantum state resulting from the application of a feature map ($\mathcal{U}_{\phi(x)}$) to the initial state ($|0\rangle^{\otimes n}$);
$|\phi(x)\rangle$ represents a state in the complex Hilbert space ($\mathcal{H}$);
$\mathcal{U}_{\phi(x)}$ is a unitary operator encoding the classical data ($x$) into a quantum state;
$|0\rangle^{\otimes n}$ is the initial state of the system before encoding.
Thus, Equation (1) describes the process of encoding classical data ($x$) into a quantum state ($|\phi(x)\rangle$) via the unitary feature map ($\mathcal{U}_{\phi(x)}$).
These feature maps, particularly those in the Pauli class, exploit the symmetry properties of SU(2). The behavior of Pauli-class feature maps is governed by the Pauli matrices, which consist of three $2 \times 2$ complex matrices:

$$ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (2) $$

These matrices represent the foundational elements of the Lie algebra associated with the SU(2) group. SU(2) refers to the set of all $2 \times 2$ unitary matrices with a determinant of 1. The corresponding Lie algebra, $\mathfrak{su}(2)$, is defined as the collection of all traceless Hermitian matrices. The Lie algebra ($\mathfrak{su}(2)$) is constructed using the Pauli matrices scaled by $i$:

$$ \mathfrak{su}(2) = \mathrm{span}\{\, i\sigma_x,\; i\sigma_y,\; i\sigma_z \,\}. \qquad (3) $$
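To make these algebraic objects concrete, the following minimal sketch (our illustration, not the authors' published code) builds the Pauli matrices and exponentiates a parameterized algebra element, checking numerically that the result lands in SU(2); the feature values are illustrative.

```python
# Minimal sketch: Pauli matrices and the exponential map exp(i * sum_j x_j sigma_j),
# assuming the physics convention of Hermitian, traceless generators.
import numpy as np
from scipy.linalg import expm

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sigma_x, sigma_y, sigma_z]

x = np.array([0.3, 0.7, 1.1])                       # example feature values
g = expm(1j * sum(xj * s for xj, s in zip(x, paulis)))

# g is an element of SU(2): unitary with determinant 1 (up to float error),
# since exp(i * H) with H Hermitian and traceless is unitary with unit determinant.
assert np.allclose(g.conj().T @ g, np.eye(2))
assert np.isclose(np.linalg.det(g), 1.0)
```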
We expand this concept to include the Lie groups SU($n$) and SL($n$), introducing noise to the generators to subtly disrupt these symmetries and obfuscate the original data, thereby rendering it non-reproducible. To illustrate this, we first discuss quantum feature maps, with particular attention to two commonly used in IBM’s Qiskit [12], demonstrating where the mechanics of Lie groups come into play. We then extend this approach to incorporate SU($n$) and SL($n$), as described in [5,6].
Among the many quantum feature maps, the Z and ZZ feature maps are standard options implemented in IBM’s Qiskit [
12]. These maps utilize the properties of Pauli matrices to generate rotations within a complex two-dimensional space, enabling the encoding of classical data into the quantum domain.
The core idea is akin to a standard rotation matrix parameterized by an angle ($\theta$); however, this approach extends to complex rotations parameterized by the Pauli matrices. This leads to the following feature maps, which are variations of Equation (
1) and are illustrated in
Figure 1:
The Z Feature Map:
The Z feature map utilizes the Pauli-Z operator to encode classical data into quantum states. For a given data point ($x$), it applies a phase rotation to each qubit in a quantum register, proportional to the corresponding feature value in $x$. This operation can be mathematically expressed as follows:

$$ \mathcal{U}_{\phi(x)} = \exp\Bigl( i \sum_{j} x_j Z_j \Bigr), $$

where $Z_j$ represents the Pauli-Z matrix acting on the $j$-th qubit and $x_j$ denotes the $j$-th component of the data vector ($x$). This results in a rotation around the Z axis of the Bloch sphere, encoding the data within the phase of the quantum state, as shown in
Figure 1.
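As a numerical sketch of this expression (our illustration; Qiskit's actual ZFeatureMap additionally interleaves Hadamard layers and scales the rotation angles), the diagonal unitary $\exp(i \sum_j x_j Z_j)$ can be computed directly:

```python
# Minimal numpy sketch (illustrative, not Qiskit's implementation) of the
# diagonal unitary exp(i * sum_j x_j Z_j) acting on n qubits.
import numpy as np
from functools import reduce

def z_feature_map(x):
    n = len(x)
    I, Z = np.eye(2), np.diag([1.0, -1.0])
    # H = sum_j x_j Z_j is diagonal, so its matrix exponential is entrywise.
    H = sum(xj * reduce(np.kron, [Z if k == j else I for k in range(n)])
            for j, xj in enumerate(x))
    return np.diag(np.exp(1j * np.diag(H)))

U = z_feature_map([0.4, 1.2])     # 2 features -> 4 x 4 diagonal unitary
plus = np.full(4, 0.5)            # |+>|+>, the usual post-Hadamard state
encoded_state = U @ plus          # the data now lives in the phases
```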
The ZZ Feature Map:
Extending the principles of the Z feature map, the ZZ feature map incorporates entanglement between qubits to enhance the richness of the feature space. It applies two-qubit gates that are modulated by the product of pairs of classical data features, further enriching the quantum representation. This operation is depicted in
Figure 1.
Efficient data encoding into a quantum circuit is crucial in quantum information processing. For each data feature, a corresponding manipulation is required. Our approach extends beyond standard quantum feature encoding, which utilizes transformations for individual qubits. For example, encoding 8 features necessitates at least 8 generators. This requirement is satisfied by groups such as SU(3), which provides 8 generators, specifically the Gell-Mann matrices.
We parameterize these matrices using normalized features within the exponential map, yielding a group element that is applied to a normalized vector. The result is a complex three-component vector encoding the information of the data sample. This method enables the use of arbitrary Lie groups for data encoding, provided the group’s generators are constructible. This concept, along with an illustrative example, is shown in
Figure 2.
In our approach, we select a Lie group from the families of SU($n$) or SL($n$) that is sufficiently large, ensuring it has a number of generators greater than or equal to the number of features. Using the normalized feature vector ($x$), we parameterize the generators to obtain the corresponding group element ($g$):

$$ g = \exp\Bigl( i \sum_{j} x_j T_j \Bigr), \qquad (4) $$

where $x_j$ represents the individual components of the feature vector, $T_j$ represents the generators of the chosen symmetry group, and $g$ is an $n \times n$ matrix representing the group element. If the number of generators exceeds the number of features, the parameters for the surplus generators are set to zero. This encoding maps our data samples or vectors into a new feature space and corresponding feature vector represented as follows:

$$ \tilde{x} = g\,v, \qquad (5) $$

where $v$ is a fixed, normalized base vector. For subsequent machine learning processes, we separate the real and complex components of the resulting feature vector, thereby obtaining $2n$ features.
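A compact sketch of Equations (4) and (5) for SU(3) follows. The Gell-Mann matrices are the standard generators of SU(3); the choice of base vector v and all variable names are our illustrative assumptions, not necessarily the authors' exact implementation.

```python
# Sketch of the encoding in Equations (4) and (5) with SU(3): eight Gell-Mann
# matrices as generators T_j, applied to a fixed normalized base vector v.
import numpy as np
from scipy.linalg import expm

s3 = 1.0 / np.sqrt(3.0)
gell_mann = [np.array(m, dtype=complex) for m in [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[s3, 0, 0], [0, s3, 0], [0, 0, -2 * s3]],
]]

def encode(x, generators, v):
    """Map a feature vector x to g @ v with g = exp(i * sum_j x_j T_j).
    If there are fewer features than generators, the surplus generators are
    effectively parameterized with zero (zip truncates the sum)."""
    A = sum(xj * T for xj, T in zip(x, generators))
    g = expm(1j * A)                            # group element, here in SU(3)
    y = g @ v                                   # complex 3-component vector
    return np.concatenate([y.real, y.imag])     # 2n real-valued features

x = np.random.uniform(0, np.pi, size=8)         # 8 features, normalized to [0, pi]
v = np.ones(3) / np.sqrt(3)                     # normalized base vector (assumed)
features = encode(x, gell_mann, v)              # 6 real-valued features
```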
Incorporating a noise term ($\epsilon$) into each set of generators ensures that the data is obfuscated. This is mathematically represented by a random uniform noise component added to each component of the summed-up set of parameterized generators; if the parameterized set of generators is complex, we add both a real and a complex noise component. This results in the following expression for our noisy group elements:

$$ g_{\text{noisy}} = \exp\Bigl( i \sum_{j} x_j T_j + \epsilon \Bigr). \qquad (6) $$

This addition of noise effectively perturbs each group element ($g$) generated by the exponential map, leading to a slightly altered encoded quantum state.
This expansion of feature maps to arbitrary Lie groups enhances our ability to represent and manipulate data. By using the diverse symmetries and structures of different Lie groups, we can design feature maps that are tailor-made for specific types of data or learning tasks.
Mathematically, adding a small noise vector ($\epsilon$) ensures that the perturbed quantum state remains within an $\epsilon$-vicinity of the original state near the Lie-group manifold, preserving the relative distances and geometric relationships crucial for machine learning algorithms. This proximity guarantees that, while the data is obfuscated enough to protect privacy, it retains sufficient structure for effective learning.
Finally, we can apply the feature map from Equation (6) to each sample several times—every time with a different noise component—and, thus, use our approach not only to obfuscate data but also to synthesize additional data, thereby multiplying the size of the dataset.
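Continuing the SU(3) sketch above (reusing gell_mann, x, and v), the following illustrates Equation (6) and the augmentation loop: each call draws fresh uniform noise, so repeated calls yield differently obfuscated copies of one sample. The placement of the noise and the name eps_max are our assumptions.

```python
# Sketch of the noisy map in Equation (6): a fresh random perturbation eps is
# added to the algebra element before exponentiation, once per synthetic copy.
import numpy as np
from scipy.linalg import expm

def encode_noisy(x, generators, v, eps_max, rng):
    n = generators[0].shape[0]
    A = sum(xj * T for xj, T in zip(x, generators))
    # Uniform real + imaginary noise on every matrix entry, as described above.
    eps = (rng.uniform(-eps_max, eps_max, (n, n))
           + 1j * rng.uniform(-eps_max, eps_max, (n, n)))
    y = expm(1j * A + eps) @ v
    return np.concatenate([y.real, y.imag])

rng = np.random.default_rng(42)
copies = [encode_noisy(x, gell_mann, v, eps_max=0.05, rng=rng)
          for _ in range(5)]      # five differently noised samples per data point
```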
3.1. Retrieving the Original Data
Given the previously outlined discussion on the construction of our data obfuscation based on the exponential map of a Lie group, we want to ensure that our original data is not retrievable, which we do by using the following construction of our noise component ($\epsilon$).
First of all, we need to make some assumptions about our discussion. We need to assume, first, that an attacker who wants to acquire the original data is familiar with our obfuscation approach and with Lie groups, corresponding algebras, etc. Then, we need to assume an attacker knows about our base vector ($v$), as discussed in Equation (5), and, finally, that the attacker is capable of reproducing the transformation matrix from our transformed feature vector, i.e.,

$$ \tilde{x} = g_{\text{noisy}}\, v \;\;\Rightarrow\;\; g_{\text{noisy}}, $$

thereby reproducing $g_{\text{noisy}}$. This starts the discussion on how to choose the noise such that one cannot retrieve the original features ($x$) from our transformation matrix ($g_{\text{noisy}}$).
First, we need to discuss if and when the exponential map of a Lie group is invertible, and thus the data retrievable:
3.1.1. Local Invertibility
The exponential map, denoted as $\exp: \mathfrak{g} \to G$, where $\mathfrak{g}$ is the Lie algebra of a Lie group ($G$), is locally invertible around the identity element of $G$. This follows from the Inverse Function Theorem, which applies because the differential of the exponential map at the identity (zero in the Lie algebra) is the identity map, making it a local diffeomorphism at this point.
3.1.2. Global Invertibility
Globally, the exponential map is generally not invertible, because the map may be neither injective (one-to-one) nor surjective (onto):
Injectivity: The exponential map is not injective if there exist elements $X, Y \in \mathfrak{g}$ such that $\exp(X) = \exp(Y)$ but $X \neq Y$. This can occur, for example, when $X$ and $Y$ differ by a multiple of $2\pi$ in certain directions in $\mathfrak{g}$, particularly for compact or periodic dimensions of $G$ (see the sketch after this list).
Surjectivity: The exponential map may not be surjective for some Lie groups, meaning not all elements of the group can be expressed as the exponential of some element in the algebra. A typical example is non-connected groups where the exponential map reaches only the connected component of the identity.
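The injectivity failure is easy to verify numerically. In $\mathfrak{su}(2)$, $X = 0$ and $Y = 2\pi\sigma_z$ are distinct algebra elements with identical exponentials:

```python
# Non-injectivity of exp on su(2): exp(i * 0 * sigma_z) == exp(i * 2*pi * sigma_z),
# since exp(i * 2*pi * diag(1, -1)) = diag(e^{2*pi*i}, e^{-2*pi*i}) = identity.
import numpy as np
from scipy.linalg import expm

sigma_z = np.diag([1.0, -1.0]).astype(complex)
X, Y = 0.0 * sigma_z, 2.0 * np.pi * sigma_z
print(np.allclose(expm(1j * X), expm(1j * Y)))   # True: both are the identity
```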
Given these arguments, we need to look at the most extreme case: Injectivity and surjectivity are given globally for a particular Lie group, and the attacker knows which Lie group we used to encode our data and, furthermore, knows the set of generators we used. Thus, we construct our noise in the following way to make our original data non-retrievable:
This noise ($\epsilon$) can be decomposed into two components:

$$ \epsilon = \epsilon_{\mathfrak{g}} + \epsilon_r, $$

where $\epsilon_{\mathfrak{g}}$ denotes noise that can be expressed as a linear combination of the generators of the Lie group (with a different parameterization vector ($\delta$)) and $\epsilon_r$ is a residual noise matrix that cannot be expressed as a linear combination of the generators. This results in the following cases: If the noise is $\epsilon_{\mathfrak{g}} \neq 0$ and $\epsilon_r = 0$, then the following occurs. As discussed before, given the most extreme case, one can reconstruct the generators. One can obtain a feature vector by assigning different parameterizations to these generators. However, one cannot retrieve the original feature vector exactly. The features will have a small deviation in each of their components. This means the noise injected into the exponential map slightly distorts the original features. Therefore, we construct $\epsilon_{\mathfrak{g}}$ such that

$$ \epsilon_{\mathfrak{g}} = \sum_{j} \delta_j T_j, \qquad \sum_{j} |\delta_j| = \epsilon_{\max}, $$

where $\epsilon_{\max}$ is a controllable parameter, i.e., the level of noise that we inject into our dataset. Furthermore, we distribute $\epsilon_{\max}$ randomly among the coefficients ($\delta_j$). In conclusion, one cannot retrieve the original feature vector unless one knows precisely the random numbers/coefficients ($\delta_j$).
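A sketch of this construction follows. The paper fixes the total budget $\epsilon_{\max}$ but the exact sampling scheme below (normalized uniform weights with random signs) is one plausible assumption for distributing it randomly among the $\delta_j$:

```python
# Sketch: on-manifold noise eps_g = sum_j delta_j * T_j with the total budget
# eps_max distributed randomly among the coefficients delta_j (assumed scheme).
import numpy as np

def on_manifold_noise(generators, eps_max, rng):
    w = rng.uniform(0.0, 1.0, len(generators))
    delta = eps_max * w / w.sum()                       # sum_j |delta_j| == eps_max
    delta *= rng.choice([-1.0, 1.0], len(generators))   # random signs
    eps_g = sum(dj * T for dj, T in zip(delta, generators))
    return eps_g, delta

# e.g., with the gell_mann list from the earlier SU(3) sketch:
# eps_g, delta = on_manifold_noise(gell_mann, 0.05, np.random.default_rng(0))
```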
The next case we need to discuss is the case in which our residual noise component is not zero ($\epsilon_r \neq 0$), assuming that $\epsilon_{\mathfrak{g}} = 0$. In this case, the matrix ($g_{\text{noisy}}$) resulting from applying the exponential map, i.e., $g_{\text{noisy}} = \exp( i \sum_{j} x_j T_j + \epsilon_r )$, is not part of the regarded symmetry group; thus, our initial symmetry is broken, and we leave the Lie group’s manifold. This means the following:
Loss of Group Structure: The resulting matrix is no longer guaranteed to satisfy the properties that define the group (closure, associativity, identity, and invertibility). Hence, it cannot be inverted within the context of the group.
Breaking Symmetry: The exponential map no longer maps elements of the Lie algebra to the Lie group, breaking/violating the symmetry and making the inverse mapping undefined.
Non-recoverability of Original Features: Since the transformation is no longer within the group, one cannot apply the inverse of the exponential map to recover the original features. The noise ($\epsilon_r$) introduces components that do not belong to the algebra; hence, the original structure and information are obfuscated beyond recoverability.
In conclusion, the introduction of residual noise that cannot be expressed as a linear combination of the generators fundamentally disrupts the structure and invertibility of the exponential map, ensuring that the original feature vector cannot be reconstructed from the transformed vector. Furthermore, the noise injected into the parameterizations of the regarded generators ensures a slight distortion of the original features, which further obfuscates the original data. Thus, we conclude that the obfuscated data cannot be reconstructed.
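The loss of group structure is easy to verify numerically. The following self-contained sketch (our illustration, using SU(2) for brevity) checks unitarity and unit determinant before and after adding a generic perturbation, which generically has a nonzero residual part:

```python
# Sketch: residual (off-algebra) noise breaks SU(n) membership. For SU(n),
# membership requires unitarity and determinant 1; a generically perturbed
# exponential fails both checks.
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)

def in_su_n(g, tol=1e-9):
    n = g.shape[0]
    return (np.allclose(g.conj().T @ g, np.eye(n), atol=tol)
            and np.isclose(np.linalg.det(g), 1.0, atol=tol))

A = 0.4 * sx + 1.1 * sz                    # element of the (Hermitian) algebra
rng = np.random.default_rng(1)
eps_r = (rng.uniform(-0.05, 0.05, (2, 2))
         + 1j * rng.uniform(-0.05, 0.05, (2, 2)))

print(in_su_n(expm(1j * A)))           # True: stays on the SU(2) manifold
print(in_su_n(expm(1j * A + eps_r)))   # False (generically): group structure lost
```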
4. Experiments
We performed experiments on four datasets to measure if the data obfuscated using our augmented noisy Lie-group approach can still be classified with a machine learning approach. This means we transformed all four datasets with varying amounts of noise and multipliers (i.e., synthetic data), performed machine learning classification with 80% training data and 20% test or validation data, and noted the accuracy of the machine learning prediction on the test data. We also compared this accuracy to the same machine learning approach but without obfuscating/transforming the data. Also,
Appendix A provides results for other obfuscation techniques with the same setup. This experimental design is depicted in
Figure 3. In the following, we discuss the details of our approach, such as the normalization, the employed machine learning algorithms, and the regarded datasets.
Normalization of Features: We normalized all features to the range of $[0, \pi]$ to effectively utilize the exponential map with our chosen Lie groups. Furthermore, all categorical features were projected into a numerical space such that we gave each category a distinct value between 0 and $\pi$.
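A short sketch of this preprocessing, assuming pandas columns; the exact spacing of the categorical values is our assumption beyond the stated requirement of distinct values in $[0, \pi]$:

```python
# Sketch: numeric features scaled to [0, pi]; categorical features mapped to
# distinct, evenly spaced values in [0, pi] (assumed spacing).
import numpy as np
import pandas as pd

def normalize_numeric(col: pd.Series) -> pd.Series:
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo) * np.pi

def normalize_categorical(col: pd.Series) -> pd.Series:
    cats = sorted(col.unique())
    lookup = {c: i * np.pi / max(len(cats) - 1, 1) for i, c in enumerate(cats)}
    return col.map(lookup)
```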
Datasets and Data Augmentation: We employed four distinct datasets, each subjected to five levels of noise and data augmentation, i.e., we varied the noise parameter ($\epsilon_{\max}$) that defines the interval from which the noise was sampled. Data synthetization was performed by multiplying the dataset size by factors ranging from 1 (no augmentation) up to 5, i.e., creating between one and five differently noised samples for each data point.
Bayesian Optimization and LGBM Classifier: For the classification tasks, we utilized a Light Gradient Boosting Machine (LGBM) classifier. LGBM is known for its efficiency and effectiveness in handling large datasets and high-dimensional feature spaces, making it an apt choice for our experiments [
22]. Bayesian optimization with 100 iterations was employed to search through the hyperparameter space, ensuring the optimal configuration for each experimental condition.
Evaluation Strategy: The datasets were split into training and testing sets using an 80/20 ratio. The performance of the LGBM classifier, trained on the feature-mapped data, was compared against the same LGBM implementation with standard preprocessing, scaling the data to the interval of [0, 1] for both numerical and categorical features. We chose the standard accuracy score as our primary metric for evaluation.
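The evaluation loop can be sketched as follows, assuming lightgbm and scikit-optimize's BayesSearchCV for the Bayesian optimization (the paper does not name its optimizer library, and the search space below is illustrative), using the OpenML dataset ID 37 listed below:

```python
# Sketch of the experimental pipeline: 80/20 split, 100 Bayesian-optimization
# iterations over LGBM hyperparameters, accuracy on the held-out test set.
from lightgbm import LGBMClassifier
from skopt import BayesSearchCV
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = fetch_openml(data_id=37, as_frame=True)    # Pima Indians Diabetes
X, y = data.data.to_numpy(), data.target.to_numpy()
# ... obfuscate X with the noisy Lie-group feature map before this point ...

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
search = BayesSearchCV(
    LGBMClassifier(),
    {"num_leaves": (8, 256),
     "learning_rate": (1e-3, 0.3, "log-uniform"),
     "n_estimators": (50, 500)},
    n_iter=100, cv=3, random_state=0)
search.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, search.predict(X_te)))
```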
Datasets
The four employed datasets are medical datasets with binary classification targets, i.e., the outcome indicates whether a disease was identified or not. All four datasets are publicly available and can easily be fetched from online databases.
Pima Indians Diabetes Database (OpenML: diabetes, ID: 37) https://www.openml.org/search?type=data&sort=runs&id=37&status=active (accessed on 24 August 2025): 768 samples, 9 numeric features (pregnancies, plasma glucose, blood pressure, skinfold thickness, serum insulin, BMI, pedigree, and age), and a binary target (diabetes: yes/no).
Breast Cancer Coimbra Dataset (OpenML: breast-cancer-coimbra, ID: 42900) https://www.openml.org/search?type=data&sort=runs&id=42900&status=active (accessed on 24 August 2025): 116 samples, 10 numeric features (age, BMI, glucose, insulin, HOMA, leptin, adiponectin, resistin, and MCP-1), and a binary target (cancer: yes/no).
5. Results
The experimental results highlight the efficacy of incorporating Lie group-based feature maps with noise for data obfuscation while maintaining the utility of machine learning models. Applying Bayesian optimization and LGBM classifiers across multiple datasets and conditions provides a robust evaluation framework for our methodology.
The performance of the LGBM classifier proved resilient to the levels of noise and the degree of data obfuscation applied, as reflected by the accuracy measurements. Injecting noise into the data, with the goal of making it more private, did not undermine the model’s ability to predict correctly, suggesting that our method is practical for privacy-preserving machine learning. The accuracy figures, alongside our baseline with original features, are listed in
Table 1. In
Figure 4 and
Figure 5, we show how our method compares with the baseline: enhancements are highlighted in light blue, and cases where the baseline is better are indicated in purple. We also chart the differences; when there is no change, we consider it a win for our method, as the goal is to at least maintain the baseline accuracy.
We also provide a benchmark comparison relative to other obfuscation methods in
Appendix A.
Each dataset shows at least one instance in which our method outdid the baseline in accuracy. In fact, for some datasets, our method held up well under most test scenarios. This indicates that, regardless of how much noise we added or how much we increased the dataset size—up to five times—the method was as good as or better than the baseline. We did not expect to beat the baseline in every case, as that is not the main goal of data obfuscation, but our findings confirm that transforming the data and shifting it into a different feature space preserves enough information for machine learning models to work effectively.
6. Discussion
In this article, we introduce a novel approach for data obfuscation based on the mathematical framework of Lie groups, where noise is injected in the exponential map to generate obfuscated feature vectors. Experiments were conducted using two Lie-group families—SU(
n) and SL(
n)—together with a Light Gradient Boosting Machine (LGBM) classifier. The results serve as a proof of concept, showing that this methodology can enhance data privacy while maintaining utility for machine learning tasks. As shown in
Table 1, the classifier performed well on transformed data, often achieving results close to or better than the benchmark with unobfuscated data. Additional comparisons with other obfuscation techniques are presented in
Appendix A, where accuracies of our approach are among the strongest across tested methods.
The utility of the method arises from several factors: the employed group remains unknown to potential attackers; the parametrization of the generators hides the original feature space; and the injected noise creates an $\epsilon$-ball around the true information both on and off the Lie manifold, preventing invertibility of the exponential map. This makes recovery of the original data infeasible.
This property is especially relevant for sensitive applications, such as medical data. The method renders raw data inaccessible while retaining the information content needed for reliable machine learning classification. For example, medical records can be obfuscated and still allow for accurate AI-based analysis without exposing patient-level data.
The presented experiments confirm that classification accuracy can remain high under Lie-group transformations, comparable to results obtained with unobfuscated data, particularly with appropriate parameterizations or data augmentation (
Figure 4 and
Figure 5). Furthermore, when evaluated against other obfuscation methods—such as PCA with noise, random projection, feature shuffling, and Gaussian mechanisms—our approach consistently matched or outperformed alternatives in terms of accuracy across multiple datasets.
When information leakage was evaluated via mutual information, Lie-group methods consistently achieved the lowest leakage across datasets, with the single exception of the ilpd dataset, where feature shuffling performed better. Overall, our method combines strong classification accuracy with consistently low leakage, showing favorable trade-offs compared to the baselines in
Appendix A and
Appendix B.
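The leakage evaluation can be sketched as follows; the text does not specify the estimator used, so the scikit-learn k-NN-based mutual information estimator and the aggregation below are our assumptions:

```python
# Sketch of a leakage proxy: for each original feature, the maximal mutual
# information any obfuscated feature carries about it, averaged over features.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def leakage(X_orig, X_obf):
    scores = [mutual_info_regression(X_obf, X_orig[:, j]).max()
              for j in range(X_orig.shape[1])]
    return float(np.mean(scores))   # lower means less recoverable information
```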
The appendix results further contextualize these findings. Random projections showed inconsistent performance across datasets. PCA with Gaussian perturbations yielded competitive results in some cases, but outcomes were highly sensitive to the chosen noise variance, raising concerns about robustness. Feature shuffling, when applied globally, severely reduced utility due to indiscriminate permutation. Gaussian mechanisms provided strong theoretical privacy guarantees but quickly degraded accuracy as noise increased. In contrast, Lie-group transformations offered a balanced middle ground: low leakage, higher stability than PCA + Gaussian, and better utility than feature shuffling and Gaussian mechanisms.
Beyond empirical results, the approach connects conceptually to quantum information processing. In quantum systems, noise often poses a challenge, yet in this context, injected noise enables privacy-preserving computation. By violating Lie-group symmetries slightly, we draw an analogy to noisy or non-Hermitian quantum evolutions, which break unitarity in a manner similar to how noise in our generators breaks strict invertibility. This suggests a natural alignment between Lie-group obfuscations and quantum-inspired machine learning.
An additional motivation for using noisy Lie-group exponential maps, rather than simpler methods such as random projection or differential privacy, lies in their hardware potential. In principle, one could build quantum or optical circuits implementing specific Lie-group symmetries, where unavoidable device-level noise (manufacturing imperfections, thermal fluctuations, quantum effects, etc.) could be used for obfuscation. Since noise profiles are device-specific, reconstruction would require access to the hardware (or at least substantial information about it), making reverse engineering drastically more difficult. Our results show compatibility across multiple symmetry groups, suggesting that platforms already implementing SU(2)-like transformations, such as quantum computers or quantum key distribution systems, could provide proof-of-principle hardware realizations.
These observations also align with the No Free Lunch theorem [
23], which states that no method is universally optimal across all tasks. Our experiments showed that performance depends on parameter choices such as multipliers and noise levels: in some cases, accuracy exceeded the baseline, while in others, it did not. This variability reflects the theorem’s principle according to which effectiveness depends on dataset and problem specifics.
In the broader landscape of privacy-preserving methods, Differential Privacy (DP) remains the most established standard [
24,
25]. DP guarantees bounded sensitivity by injecting calibrated noise into queries or model updates but often at the expense of model accuracy. For example, DP-SGD [
26] prevents memorization of training data but typically reduces performance in complex tasks. Earlier methods such as
k-anonymity [
14] and
l-diversity [
27] rely on grouping records but are vulnerable to linkage attacks. In contrast, Lie-group obfuscations achieve privacy through algebraic structure, dimensional hiding, and non-invertible noise injection while preserving competitive utility.
In terms of computational cost, the proposed method is not part of the machine learning training loop itself but, instead, consists of straightforward matrix manipulations during the feature transformation stage. These operations decouple cleanly from the learning process, as they are applied once to the data prior to training. In practice, the exponential map–based transformations run within a few seconds per dataset, making them lightweight preprocessing steps. Thus, no severe computational overheads are expected for our technique compared to standard obfuscation or augmentation methods.
Taken together, our findings indicate that Lie-group obfuscations are a promising addition to privacy-preserving machine learning. They combine structured, symmetry-based richness with the robustness of noise injection, offering both theoretical grounding and practical utility. While not a replacement for existing standards such as DP, they represent a complementary approach that can extend the design space of privacy-preserving techniques, with potential applicability in both software and hardware contexts.
7. Conclusions
This work presented a proof-of-concept framework for privacy-preserving data obfuscation based on Lie groups and noisy exponential maps. Input features from four biomedical datasets were obfuscated through Lie-group transformations from two families (SU(n) and SL(n)), and the resulting representations were classified using a Light Gradient Boosting Machine (LGBM) model. The experiments showed that Lie-group obfuscations can maintain competitive predictive performance while reducing the recoverability of original data. In addition to classification accuracy, we evaluated information leakage through mutual information, finding that Lie-group methods consistently achieved low leakage compared to other baselines, with only isolated cases where alternatives performed better.
The approach differs from conventional techniques such as random projection or Gaussian mechanisms by combining symmetry-based transformations with injected noise, which makes the exponential map non-invertible and the reconstruction of original data infeasible. This makes it suitable for sensitive applications such as medical analytics, where reliable model predictions are needed without exposing raw patient records.
The study underlines the proof-of-concept nature of the work. Results varied depending on dataset and parameter settings, in line with the No Free Lunch theorem, meaning that effectiveness depends on the problem context and configuration. The present contribution is a first evaluation intended to establish Lie-group transformations as a quantum-inspired foundation for privacy-preserving machine learning.
Most importantly, while the approach is quantum-inspired and draws conceptual links to noisy quantum feature maps, no quantum hardware implementation was used in this work. All experiments were purely classical, meaning that practical deployment on quantum or hybrid systems remains to be demonstrated.
Future work should address scaling of the obfuscation pipeline to high-throughput data streams; integration with federated learning architectures for secure, decentralized analytics; and application to heterogeneous clinical records where privacy-aware retrieval is required. Further benchmarking with additional group families, classifiers, and adversarial models will show the robustness and generality of the method.
Overall, this study demonstrates that Lie group-based obfuscation can serve as a mathematically grounded, quantum-inspired mechanism for balancing privacy and utility. By leveraging symmetry, dimensional obfuscation, and noise, the method opens a concrete research direction at the intersection of privacy, machine learning, and quantum-inspired computation.
We also acknowledge that the study relies on openly available datasets, which is a limitation, and that the approach should ideally be tested on a broader variety of datasets to further validate its general applicability, as is the case with most techniques presented at the proof-of-concept stage. The program code is available at
https://github.com/Raubkatz/Quantum_Data_Obfuscation (accessed on 24 August 2025).