An Understanding of the Vulnerability of Datasets to Disparate Membership Inference Attacks

Recent efforts have shown that training data is not secured through the generalization and abstraction of algorithms. This vulnerability to the training data has been expressed through membership inference attacks that seek to discover the use of specific records within the training dataset of a model. Additionally, disparate membership inference attacks have been shown to achieve better accuracy compared with their macro attack counterparts. These disparate membership inference attacks use a pragmatic approach to attack individual, more vulnerable sub-sets of the data, such as underrepresented classes. While previous work in this field has explored model vulnerability to these attacks, this effort explores the vulnerability of datasets themselves to disparate membership inference attacks. This is accomplished through the development of a vulnerability-classification model that classifies datasets as vulnerable or secure to these attacks. To develop this model, a vulnerability-classification dataset is developed from over 100 datasets, including frequently cited datasets within the field. These datasets are described using a feature set of over 100 features and assigned labels developed from a combination of various modeling and attack strategies. By averaging the attack accuracy over 13 different modeling and attack strategies, the authors explore the vulnerabilities of the datasets themselves as opposed to a particular modeling or attack effort. The in-class observational distance, width ratio, and the proportion of discrete features are found to dominate the attributes defining dataset vulnerability to disparate membership inference attacks. These features are explored in deeper detail and used to develop exploratory methods for hardening these class-based sub-datasets against attacks showing preliminary mitigation success with combinations of feature reduction and class-balancing strategies.


Introduction
Data, and more importantly, relevant, unique, and hard-to-acquire data, have become a valuable asset of the 21st century. Therefore, when these data provide some sort of competitive edge, whether that be commercial or military, the ability to protect these data from discovery becomes of the utmost importance. In addition, with the increase in legislation to protect data rights, such as with the European Union's General Data Protection Regulation, this protection becomes a requirement [1]. However, the ability to protect these data, even through the generalization and abstraction of machine-learning algorithms, is at risk [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16].
The use of AI and machine-learning solutions has increased greatly throughout industry and government; however, the understanding of the vulnerabilities and security issues within these solutions has not kept up with this trend. Recently, research groups have begun to demonstrate these weaknesses and to develop mitigation strategies. This relatively new area of research is a concentration of cybersecurity referred to as artificial intelligence (AI) security and focuses on the vulnerabilities of models and algorithms to attack.
Several key areas of attack within this field include model theft, data poisoning, evasion, and model inversion attacks. Model theft attacks seek to replicate the function of models and can lead to the loss of proprietary information, loss of revenue from deployed models, and the ability for an adversary to better predict potential actions given that they also have similar predictions as the victim. Data poisoning attacks inject malicious data into training datasets to cause general model performance degradation or directed misclassification or prediction to provide an adversarial advantage. General degradation of performance can cause a loss of trust in the system, while directed misclassification or prediction can provide calculated damage to larger organizational mission directives.
Evasion attacks, like data poisoning attacks, seek to degrade model performance or cause directed misclassification or prediction. However, instead of tainted training data, evasion attacks utilize model inputs that seem normal to general inspection but prey on model weaknesses for the disruption of input classification or prediction. Finally, model inversion attacks seek to gather information on the training data used for the development of the attacked model. This attack is divided into property inference attacks, introduced by Ateniese et al., and membership inference attacks, introduced by Fredrikson et al. [7,16].
Property inference attacks seek an understanding of a training dataset's statistical information. One example of an issue caused by this attack is the use of this information to understand competitor training datasets and, thus, build better classifiers and potentially violate intellectual property rights. Membership inference attacks seek to determine the inclusion of specific records within the training dataset of a model and can result in privacy-infringement issues, such as the discovery of personally identifiable information (PII) and personal health information (PHI) as well as identification of proprietary or confidential information.
This effort focuses on membership inference attacks and, in particular, explores disparate membership inference attacks. Disparate membership inference attacks differ from general membership inference attacks in that they focus on attacking individual classes instead of the entire dataset as a whole. As discussed in more detail in Introduction: Previous Work, recent efforts have shown increased attack success when targeting more vulnerable subgroups instead of the entire dataset. These studies found minority subsets of data to be more vulnerable to attack, even after models were trained with fairness constraints and differential privacy, unless these were applied to an extent that sacrificed the accuracy of the model. This increased vulnerability to attack of minority subsets of datasets can prove troublesome for both privacy and competition. Typically, smaller subsets of data within a dataset are less represented because they are harder to obtain. In the case of health classification algorithms, these could be observations of patients with rare diseases. In the case of commercial competition, these could be examples of rare findings within a manufacturing or marketing dataset of key competitive advantages. In either of these cases, the discovery of that information by an adversary can prove detrimental to the organizations and individuals involved, whether through loss of privacy, profit, or competitive advantage.

Previous Work
The following section details previous work on the vulnerabilities of minority subgroups of data to membership inference attack and on vulnerabilities to disparate membership inference attack. This work highlights some of the vulnerabilities that these subgroups face, shows improved attack performance when using pragmatic attacks, and sets the stage for the discussion of the need for an understanding of dataset vulnerability to these disparate membership inference attacks.
Long et al. utilized the disparate vulnerabilities to show a pragmatic approach to membership inference, in which they were able to show increases in precision over nondeterministic methods on the order of 44% for the MNIST dataset [17,18]. In particular, they were able to show an increase in precision from 51.7% to 95.05% by targeting the more vulnerable subgroups. Long et al. selected vulnerable records by first estimating the number of neighbors of a potential record within the sample space available to the adversary, and deemed those with fewer neighbors as more vulnerable due to their potential to uniquely influence the target algorithm. To determine the neighbors of a potentially vulnerable record, the group trained shadow models to mimic the behavior of the target model. These shadow models were then trained both with and without the target record in order to determine the influence of that record on the shadow model.
The group utilized the intermediate outputs of the shadow models on the record, which imply the record's influence, as a new feature vector for the record. For classification models without intermediate layers, the new feature vectors were created by concatenating the model's prediction vectors. The neighbor/not-neighbor classification was evaluated based on the cosine distance between the feature vectors in comparison to a neighbor threshold.
Tonni et al. provided a study on data and model dependencies of membership inference attacks [19]. In agreement with the studies discussed above, they found that class imbalance resulted in increased accuracy of membership inference attacks. They also found an increase in accuracy of the attack with more feature imbalance, and a decrease in accuracy with an increase in the entropy of the training dataset. For a dataset $D(X, y)$ with feature set $A$, where a single feature is denoted $a_j \in A$, the set of distinct feature vectors is $C = \{\bigcup_{x_i \in X} x_i\}$, and the set of distinct values of feature $a_j$ is $C_j = \{\bigcup_{x_i \in X} x_i \cdot a_j\}$. The feature balance is the probability ratio of one feature value versus all the other feature values in $C_j$. The entropy of the training dataset, $H_D$, was measured by taking the mean entropy over the $n$ features, where $a_j \in A$ are the features of the dataset $D$.
Truex et al. compiled a study that evaluated the importance of datasets, target models, and federated learning in relationship to the success of a membership inference attack [20]. This work indicated that the uniqueness of the class boundary definition is a main contributing factor to the vulnerability of an algorithm to membership inference attacks. The number of classes was deemed important through its characterization of the number of regions into which the input space $\mathbb{R}^m$ is divided, where $m$ is the number of features.
With more classes, each region is smaller, and therefore each region will more tightly surround the provided training instances, allowing for a more successful attack. The in-class standard deviation provided insight into the similarity of feature vectors within the dataset. The more similar a particular observation is to other observations, the less likely it is to significantly impact the decision boundary and, therefore, be inferred through the attack. Therefore, according to this study, the more complex the classification problem, the more likely the success of a membership inference attack.
Yaghini et al. also demonstrated the vulnerability to membership inference as a result of the size and distribution disparities of subgroups [21]. Further, they discovered that this problem continues even when models are trained with fairness constraints and differential privacy, unless these are applied to an extent that sacrifices the accuracy of the classifier. In a similar vein, Bagdasaryan et al. showed that the reduction in accuracy as a result of differential privacy measures disproportionately affects minority subgroup populations within the dataset [22]. Chang et al. showed that attempts to increase fairness in algorithms increase the privacy risk of those subgroups [23]. This is a result of forcing the models to equally fit the underprivileged subgroups. This forced equalization of fitting results in a memorization of the training data from the underprivileged subgroups and, therefore, a reduction in the privacy of these groups.

Contributions
The current work focuses on disparate membership inference attacks, which seek to single out individual classes within the training dataset that may be more vulnerable than others. In particular, this study separates itself from those listed above by exploring the vulnerability of datasets to this type of attack instead of the models. This results in the creation of a disparate vulnerability-classification dataset, a disparate vulnerability classifier, and an exploration of potential mitigation strategies. Dataset owners can use this information to determine the potential vulnerability of their datasets to this type of attack and use that understanding to make any necessary changes to that dataset (using insight from the provided mitigation exploration) or to determine any other security measures that should be taken in terms of eventual model deployment, such as API access restriction.
The remainder of the article is laid out as follows. Section 2 discusses the methodology associated with the development of a vulnerability-classification dataset, the attack process, the creation of the vulnerability-classification model, labeling and feature engineering of the vulnerability-classification dataset, and exploratory hardening. Section 3 discusses the results of the vulnerability classification and the associated features. Section 4 provides the results of the hardening exploration. Sections 5 and 6 provide detailed discussions on an understanding of the vulnerabilities of datasets to disparate membership inference attack and the exploratory hardening efforts, respectively. Finally, Section 7 provides a summary of this work and details future efforts to continue the progression of this research.

Methodology
This section discusses the methodology utilized for the development of the dataset used for the creation of the vulnerability-classification model, the methodology used to create victim models and their attacks, as well as the methodology used to generate the exploratory hardening procedures. The first subsection discusses the datasets that were utilized in the creation of the vulnerability-classification dataset. As this article studies the vulnerability of datasets to membership inference attack instead of model vulnerability, a collection of various datasets modeled in different ways was utilized to create this vulnerability-classification dataset.
The next subsection provides an overview of the membership inference attack process, including the development and standardization of victim, shadow, and attack models. Following is a discussion of the labeling ideology for determination of which datasets should be labeled as vulnerable or secure. Section 2.4 discusses the engineering of features to describe the evaluated datasets followed by a discussion on feature selection. Finally, the development of the vulnerability-classification model and exploratory hardening efforts are presented.

Data
To create the vulnerability metric, 105 different datasets from the UCI Machine Learning Repository and Kaggle dataset repository were utilized. To focus on the datasets themselves and remove the effects of the utilized classification algorithms and attack models, combinations of classification models and attack models were used for each dataset as described in Table 1. More information on the attack method is provided in Methodology: Membership Inference Attack.
All classification models created from the datasets (henceforth referred to as victim models, given that these are the attacked models) were developed using the default settings for each function as defined in the scikit-learn Python library [64]. Several metrics for the attack and victim models were collected, including the accuracy, F1-score, precision, and recall, and were then averaged over all 13 combinations of attacks to develop a singular observation for each dataset. This averaging of metrics allowed for the capture of the dataset response to a variety of attacks and a separation of the vulnerability metric from the type of victim/shadow/attack model combination utilized.
When combined with the features developed for the dataset (as discussed in Methodology: Feature Selection), this provided a vulnerability metric dataset of 118 features and 877 instances, since each class within a dataset is an individual observation. Of these 877 instances, 110 (12.5%) were held out as a test set while ensuring that original datasets remained entirely in the training or testing set in order to prevent data leakage. Before selecting a dataset to include in the study, each dataset was inspected to ensure minimal missing data. If a dataset included a feature with greater than 25% missing data, that feature was removed. However, if it was determined that removal of too many features was necessary for proper classification, the dataset was not included. If the number of missing data entries was small enough to allow for dropping observations with missing data while maintaining the dataset utility, this method was utilized to remove missing data. If not, but the feature had less than 25% missing values, then the missing values were imputed using the feature average.
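The dataset-level holdout described above can be sketched with a group-aware split. This is a minimal illustration rather than the study's procedure: the use of scikit-learn's `GroupShuffleSplit`, the function name `dataset_level_holdout`, and the toy `source_ids` array are all assumptions made here for demonstration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def dataset_level_holdout(source_ids, test_fraction=0.125, seed=0):
    """Split row indices so each source dataset lands wholly in train or test.

    Note: GroupShuffleSplit interprets test_size as a fraction of *groups*
    (source datasets), not of rows, so the row-level share is approximate.
    """
    source_ids = np.asarray(source_ids)
    splitter = GroupShuffleSplit(
        n_splits=1, test_size=test_fraction, random_state=seed
    )
    placeholder = np.zeros((len(source_ids), 1))  # features are not needed here
    train_idx, test_idx = next(splitter.split(placeholder, groups=source_ids))
    return train_idx, test_idx


# Toy example (hypothetical): 40 class-level rows from 8 source datasets.
source_ids = np.repeat(np.arange(8), 5)
train_idx, test_idx = dataset_level_holdout(source_ids)

# No source dataset contributes rows to both sides of the split.
shared = set(source_ids[train_idx]) & set(source_ids[test_idx])
print("datasets on both sides:", len(shared))
```

Because every class-based row inherits properties of its parent dataset, splitting by row alone would let near-duplicate descriptors leak across the train/test boundary.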
All categorical variables were one-hot encoded. Any binary features remained as such. Finally, prior to utilization, all datasets whose values were outside a zero-to-one range were standardized using the default settings of the MinMaxScaler function within the scikit-learn library. By maintaining consistency in data preparation across all utilized datasets, control was maintained in the process. With the exception of scaling, the same preprocessing steps were completed both before development of the victim model and before the development of the dataset features for vulnerability classification as discussed in Methodology: Development of Dataset Features.
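A minimal sketch of these preparation steps follows, assuming a pandas DataFrame as input. The function name `prepare_dataset`, the toy column names, and the choice to always impute (rather than first trying to drop incomplete observations, as the study did when few were affected) are simplifications introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def prepare_dataset(df, label_col, max_missing=0.25):
    """Drop sparse features, mean-impute, one-hot encode, and min-max scale."""
    features = df.drop(columns=[label_col])
    labels = df[label_col]

    # Remove any feature with more than 25% missing values.
    keep = features.isna().mean() <= max_missing
    features = features.loc[:, keep].copy()

    # Impute remaining numeric gaps with the feature average.
    numeric_cols = features.select_dtypes(include="number").columns
    features[numeric_cols] = features[numeric_cols].fillna(
        features[numeric_cols].mean()
    )

    # One-hot encode categorical columns; numeric columns pass through.
    features = pd.get_dummies(features, dtype=float)

    # Scale to [0, 1] with MinMaxScaler's default settings.
    scaled = MinMaxScaler().fit_transform(features)
    return pd.DataFrame(scaled, columns=features.columns, index=features.index), labels


# Toy data (hypothetical columns) to exercise each rule.
df = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 50.0],
    "color": ["r", "g", "r", "b"],
    "mostly_missing": [np.nan, np.nan, 1.0, np.nan],
    "label": [0, 1, 0, 1],
})
X_prepared, y_prepared = prepare_dataset(df, "label")
print(list(X_prepared.columns))
```

The sketch scales every column for simplicity; the study scaled only datasets whose values fell outside the zero-to-one range, which yields the same result for columns already spanning exactly that range.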
It should also be noted that any dataset that was too small (less than roughly 100 observations in the macro dataset) was difficult to attack using the methodology discussed later (depending on the number of classes into which the dataset was divided) and was not used in the study. This was a result of the neural-net-attack methodology needing to divide the dataset into subgroups for training attack models on individual classes as described below.

Membership Inference Attack
Membership inference attacks can be characterized based on adversarial knowledge of the model being attacked. This knowledge can be white-box, gray-box, or black-box, listed in order of increasing difficulty. This study follows the same adversarial knowledge conventions as Truex et al. [20]. White-box knowledge indicates that the adversary has access to some portion or version of the real training data, gray-box indicates that the adversary has some statistical information on the training data, and black-box indicates that the adversary has nothing more than publicly available information on the training data.
This study assumes white-box knowledge of the training data. By erring toward an easier attack by the adversary, the vulnerability-classification model developed will be based on the most vulnerable type of dataset. Anything other than white-box knowledge will result in a more secure dataset. Note that these definitions refer to the adversary's training data knowledge and not their access to or understanding of developed victim models. This study assumes black-box access to victim models, meaning that an adversary has access only to the inputs and result vectors of those models. This assumption is justified through the common use of black-box deployment for models when those models are open to access outside the parent organization.
The membership inference attack utilized to develop the vulnerability metric follows the shadow model methodology developed by Shokri et al. and was chosen due to its general acceptance as a valid membership inference attack methodology as well as its ability to successfully attack black-box models [65]. This attack begins with the development of a shadow model training dataset, which was developed using some knowledge of the original training dataset (white-box knowledge of the training data), providing an easier situation for the attacker and, thus, a more reliable vulnerability metric.
In this effort, the shadow model training data was developed from a random extraction of 60% to 80% from the original training dataset, dependent on the original size of the dataset. This left 20% to 40% of the original data to be used as test data. For this study, one half of the test data was used to train the victim model and labeled as Trained. The other half of the test data was simply processed through the already trained model and labeled as Not-Trained. In this way, a test dataset of observations labeled as Trained/Not-Trained was developed to evaluate the performance of the attack model. Figure 1 provides a visual description of the data split.
Continuing with Figure 1, the training dataset was then passed through the victim model in order to obtain proper classification. No knowledge of the victim model was required; only access for submitting inputs and receiving output probability vectors (a black-box model) was needed. Next, one-half of this now properly labeled shadow model training dataset was utilized to train an ensemble of shadow models that seeks to mimic the characteristics of the victim model. Through the utilization of an ensemble of shadow models made up of various model types and hyperparameter settings, the ensemble can account for different possibilities of victim model architectures and behaviors. This shadow model ensemble development is discussed in greater detail below.
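The split described above can be sketched as an index partition. The function name `split_for_attack`, the dictionary keys, and the fixed 70% pool fraction are illustrative assumptions; the study chose a fraction between 60% and 80% depending on dataset size.

```python
import numpy as np


def split_for_attack(n_rows, pool_fraction=0.7, seed=0):
    """Partition row indices into the four roles used by the attack.

    pool_fraction: share of rows set aside for shadow/attack model
    development (60-80% in the study; 0.7 here is an arbitrary midpoint).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)

    n_pool = int(round(pool_fraction * n_rows))
    pool, evaluation = order[:n_pool], order[n_pool:]

    half_pool = len(pool) // 2
    half_eval = len(evaluation) // 2
    return {
        # Attack-development pool: half trains the shadow ensemble (Trained),
        # half is only passed through it (Not-Trained).
        "shadow_train": pool[:half_pool],
        "shadow_holdout": pool[half_pool:],
        # Test data: half trains the victim model (Trained),
        # half is only passed through it (Not-Trained).
        "victim_train": evaluation[:half_eval],
        "victim_holdout": evaluation[half_eval:],
    }


parts = split_for_attack(1000)
print({name: len(idx) for name, idx in parts.items()})
```

Keeping the four index sets disjoint is what lets the Trained/Not-Trained labels in the test data serve as ground truth for evaluating the attack model.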
Once the shadow models were developed, the half of the training dataset that was utilized to train the shadow models was labeled as Trained, and the half that was not used was labeled as Not-Trained. The entire shadow model training dataset was then passed through the shadow model ensemble in order to obtain an output probability vector. This vector along with the label of Trained/Not-Trained and the original set of dataset features were utilized as an attack model dataset in order to create a binary classification model that can determine whether an observation was utilized in training of the victim model as shown in Figure 2.

Figure 1.
Diagram illustrating the flow of data through the attack process. The original dataset is divided into a training (60-80%) and testing dataset (20-40%). One half of the training data is used to develop the shadow model ensemble and is labeled as Trained data given that it is used to train the ensemble. The other half of the training data is labeled as Not-Trained data since it is not used to develop the ensemble but is instead simply passed through the ensemble for the retrieval of output vectors. These two halves are then recombined as a labeled Trained/Not-Trained dataset that is used to train the attack model, which can classify if an observation was used to train the ensemble or not. Finally, this attack model was tested on the victim model with the previous testing dataset, half of which was used to train the victim model and labeled Trained data, and the other half of which was simply passed through the victim model to obtain the output vectors.

Figure 2.
Visual description of the shadow model ensemble and attack model development. The training dataset shown in the upper half of the image was passed through the victim model prior to this process in order to obtain proper labeling as described in Figure 1. The upper portion of this image shows the split of the data so that 50% is for training the shadow model ensemble, and the other 50% is not. These two halves were then recombined into the attack model training dataset. The lower half of the image completes the series, showing the passing of the entire attack model training dataset through the shadow model ensemble to acquire the output probability vector from the ensemble. This finalized attack model dataset, consisting of the original dataset features, the output probability vector from the shadow model ensemble, and the Trained/Not-Trained label, was then used to train the attack model.
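The assembly of the attack model dataset can be sketched as follows. This is an illustration under stated assumptions: a single decision tree stands in for the shadow model ensemble, synthetic labels stand in for the labels obtained by querying the victim model, and `build_attack_dataset` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def build_attack_dataset(X_in, X_out, predict_proba):
    """Combine features, shadow output probabilities, and membership labels.

    X_in:  rows used to train the shadow model(s)   -> label 1 (Trained)
    X_out: rows withheld from shadow training       -> label 0 (Not-Trained)
    predict_proba: callable returning one probability vector per row
    """
    X_all = np.vstack([X_in, X_out])
    membership = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])
    attack_features = np.hstack([X_all, predict_proba(X_all)])
    return attack_features, membership


# Synthetic stand-in data; in the study the labels come from the victim model.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_in, y_in, X_out = X[:200], y[:200], X[200:]

# One tree stands in for the 20-model shadow ensemble.
shadow = DecisionTreeClassifier(random_state=0).fit(X_in, y_in)

attack_X, attack_y = build_attack_dataset(X_in, X_out, shadow.predict_proba)
attack_model = LogisticRegression(max_iter=1000).fit(attack_X, attack_y)
print(attack_X.shape)
```

The attack model's target is membership, not the original class label, so the class probabilities enter only as input features alongside the original columns.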

Development and Standardization of Victim Models
To maintain the concentration of the study on the vulnerability of the datasets themselves as opposed to the models, all victim models utilized in Table 1 were developed and standardized in the same way. The datasets utilized consisted of solely numerical and/or binary features, or if they contained categorical features, the categorical features were one-hot encoded. Prior to classification algorithm training, the datasets were standardized utilizing the default settings of the MinMaxScaler function of scikit-learn.
The following four classification algorithms were used, all from scikit-learn, and all utilizing their default settings with exceptions as noted in parentheses: RandomForest (100 estimators and no preset depth), LogisticRegression (L2 penalty and lbfgs solver), SVC (rbf kernel and gamma scale), and NaiveBayesGB (α = 1.0 and "True" priors). In addition, a neural network classifier was utilized, which was again kept standard to include a single hidden layer with 128 nodes activated by a ReLU activation. Given that all datasets were divided into individual classes within this study, all output layers were binary and were thus activated using a sigmoid activation. Adam optimization was utilized with a binary cross-entropy loss function and a 0.005 learning rate. The neural net classifier was implemented using the TensorFlow Python package.
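A minimal sketch of the four scikit-learn victim models with the listed settings is given below. Two points are assumptions made here: `MultinomialNB` is used for the Naive Bayes model because it accepts the stated α and prior settings (the exact class used in the study is not confirmed by the text), and the neural network is omitted to keep the example dependency-light. The helper name `make_victim_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def make_victim_models():
    """Return untrained victim models keyed by a short name."""
    return {
        "random_forest": RandomForestClassifier(n_estimators=100, max_depth=None),
        # L2 is LogisticRegression's default penalty, so only the solver is set.
        "logistic_regression": LogisticRegression(solver="lbfgs"),
        "svc": SVC(kernel="rbf", gamma="scale"),
        # Assumption: MultinomialNB stands in for the paper's Naive Bayes model.
        "naive_bayes": MultinomialNB(alpha=1.0, fit_prior=True),
    }


# Synthetic binary problem, min-max scaled as in the study's preprocessing.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X = MinMaxScaler().fit_transform(X)

accuracies = {}
for name, model in make_victim_models().items():
    model.fit(X[:200], y[:200])
    accuracies[name] = model.score(X[200:], y[200:])
print(sorted(accuracies))
```

Min-max scaling also keeps every feature non-negative, which `MultinomialNB` requires.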

Development and Standardization of Shadow Models
In their seminal work using the shadow model membership inference attack, Shokri et al. utilized neural networks for both the shadow model ensemble and the attack model [65]. In work building on these efforts, Truex et al. found that the shadow model type and attack model type had little effect on the success of the attack but showed promising results through the use of decision trees [20]. However, in order to remove the effect of shadow and attack model type selection from the vulnerability metric, this study utilized combinations of shadow and attack models as described in Table 1. In addition, each shadow model has several hyperparameters, which were chosen at random (visualized in Table 2) in order to develop an ensemble of models that can mimic the victim model.
As mentioned previously and depicted in Figures 1 and 2, a portion (60-80%) of the original dataset was set aside for attack model development. Of this, 50% was used to train the shadow model ensemble and was subsequently labeled as Trained data to indicate that it was used to train the ensemble. The ensemble consisted of 20 models, each with an evenly weighted vote, and with a random selection of hyperparameters as described in Table 2.
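A shadow ensemble of this kind can be sketched as below. Since Table 2 is not reproduced here, the hyperparameter ranges are hypothetical, as are the choice of decision trees as the model type and the helper names `fit_shadow_ensemble` and `ensemble_proba`. Averaging the members' probability vectors is one way to realize an evenly weighted vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_shadow_ensemble(X, y, n_models=20, seed=0):
    """Fit n_models trees, each with randomly drawn hyperparameters."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        model = DecisionTreeClassifier(
            max_depth=int(rng.integers(2, 11)),        # hypothetical range
            min_samples_leaf=int(rng.integers(1, 6)),  # hypothetical range
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        models.append(model.fit(X, y))
    return models


def ensemble_proba(models, X):
    """Evenly weighted vote: the mean of the members' probability vectors."""
    return np.mean([model.predict_proba(X) for model in models], axis=0)


X, y = make_classification(n_samples=300, n_features=8, random_state=1)
shadow_models = fit_shadow_ensemble(X[:150], y[:150])
probabilities = ensemble_proba(shadow_models, X)
print(len(shadow_models), probabilities.shape)
```

Varying the hyperparameters across members gives the ensemble a spread of fitting behaviors, from shallow and smooth to deep and memorizing, which is the property the study relies on to mimic an unknown victim model.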
The type of model used in the ensemble was determined based on the given combination as shown in Table 1. Using many different models with various, randomly selected hyperparameters in the ensemble provides for better capture of the intricacies of the victim model, thereby allowing for a better understanding of how that model may be incorporating the training data within its structure. For each dataset, 13 different combinations of victim, shadow, and attack models were created and evaluated to provide emphasis on the dataset instead of the model combination.
Figures 1 and 2 provide a visual description of the development of the attack model. Following the development of the shadow model ensemble, the 50% of data that was used to train the ensemble and labeled as Trained was recombined with the 50% of the data that was withheld from the ensemble training and labeled Not-Trained. This new dataset was then passed through the shadow model ensemble in order to obtain the output probability vector. This new dataset consisting of the original dataset features, the Trained/Not-Trained label, and the output probability vector was subsequently utilized to train the attack model with the Trained/Not-Trained label as the target variable and the remaining features as the input. The specific model type was determined based on the combination being evaluated as shown in Table 1.
To maintain experimental control over the attack to provide a more universal vulnerability metric, the attack models were kept standard across all datasets as was done for the victim model development. The neural net model utilized a single hidden layer of 64 nodes and a binary output node. The hidden layer was activated using a ReLU activation and the output layer by a sigmoid activation. The model was optimized using an Adam optimizer, a binary cross-entropy loss function, and a learning rate of 0.0001. The random forest model utilized 100 estimators, no predefined depth, and the remainder of the parameters set to the default settings from scikit-learn. The logistic regression model made use of an L2 penalty, an lbfgs solver, and the remainder of the parameters set to the default settings from scikit-learn. The support vector classifier model utilized an rbf kernel, a gamma scale, and the remainder of the parameters set to the default settings from scikit-learn. Finally, the Naive Bayes model was created using α = 1.0, "True" priors, and the remainder of the parameters set to the default settings from scikit-learn.
While the other attack models could be directly trained on the attack model dataset, the use of the neural net model required a set of two models-one for each binary outcome for the class-based sub-dataset. The neural net models are more capable of capturing the subtleties of the Trained/Not-Trained observations when focused on a particular class and, therefore, require a hard-coded class selection protocol in order to assign the observation to the correct attack neural net based on the predicted class [65]. Given that this study divided the individual classes of each dataset into individual sub-datasets for disparate attack evaluation, the classes of a given subset consisted of the positive and negative Boolean evaluation of membership within the given class.

Understanding of Labeling
To develop the vulnerability-classification model, the data needed to be labeled. Table 3 provides a statistical description of the average accuracy of attack found within the training dataset of this study, averaged across the various combinations of attacks as defined in Table 1. As the attack model was developed on an even class split of data (considering the Trained/Not-Trained label division) and with a binary target, accuracy was deemed to be the most relevant metric on which to develop the vulnerability label. Additionally, given the relatively narrow interquartile range of accuracy (stretching from 0.530 to 0.697) as shown in Table 3, we decided that a binary vulnerable/secure label would best suit the study with the threshold of vulnerability set to the mean of the average attack accuracy. Using an attack accuracy of 62%, we established the guessing percentage as 53%. The labels and training were all confined to the training set in order to maintain complete neutrality of the test set. The test set labels were based on the same threshold as found in the training set in order to maintain consistency.
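The labeling rule can be sketched as follows, using randomly generated accuracies as stand-in data (not results from the study). The function name `vulnerability_labels` and the strict "greater than" comparison at the threshold are assumptions made for illustration.

```python
import numpy as np


def vulnerability_labels(train_acc, test_acc):
    """Label rows vulnerable (1) or secure (0) from per-combination accuracies.

    train_acc, test_acc: arrays of shape (n_rows, n_combinations), one attack
    accuracy per class-based sub-dataset and victim/shadow/attack combination.
    """
    train_mean = train_acc.mean(axis=1)  # average over the combinations
    test_mean = test_acc.mean(axis=1)

    # The threshold comes from the training rows only and is reused for test.
    threshold = train_mean.mean()

    train_labels = (train_mean > threshold).astype(int)
    test_labels = (test_mean > threshold).astype(int)
    return train_labels, test_labels, threshold


# Synthetic accuracies for 767 training rows, 110 test rows, 13 combinations.
rng = np.random.default_rng(0)
train_acc = rng.uniform(0.45, 0.90, size=(767, 13))
test_acc = rng.uniform(0.45, 0.90, size=(110, 13))

train_labels, test_labels, threshold = vulnerability_labels(train_acc, test_acc)
print(train_labels.shape, test_labels.shape)
```

Deriving the threshold from the training rows alone means the test labels carry no information from the test set's own accuracy distribution.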

Development of Dataset Features
This study evaluated the vulnerability of datasets to disparate membership inference attacks, requiring a dataframe consisting of observations made of class-based sub-datasets and features describing those subsets. Therefore, 118 features were developed to describe each dataset and class-based sub-dataset within the study. The features developed were meant to capture as many statistical subtleties of the datasets as possible to explore what properties of a dataset could lead to disparate membership inference attack vulnerability. In addition to features developed by the team, inspiration for features was also derived from work by Brazdil et al. [66].
Among others, the features developed and integrated include measures of depth, width, entropy, correlation, skewness, kurtosis, mutual information, principal component explanation, number of classes, observation distances, and proportions of categorical, binary, and numerical features. Full descriptions of the developed and utilized features can be found in Table A1 located in Appendix A. These features were applied to the overall dataset and subsets of the macro dataset created for each class within the dataset as indicated by the feature description. For example, if a particular dataset had ten classes, then this study divided that dataset into ten separate datasets consisting of a binary label for the class being evaluated and then applied the features to that dataset.
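The division into class-based sub-datasets can be sketched as a one-vs-rest relabeling. The helper name `class_based_subdatasets` and the single descriptor computed at the end (the share of rows in the target class) are illustrative assumptions and do not reproduce the definitions in Table A1.

```python
import numpy as np


def class_based_subdatasets(X, y):
    """Return one binary sub-dataset per class: {class: (X, is_in_class)}."""
    y = np.asarray(y)
    return {
        label: (X, (y == label).astype(int))
        for label in np.unique(y).tolist()
    }


# Toy macro dataset with three classes of unequal size.
rng = np.random.default_rng(0)
X = rng.random((12, 4))
y = np.array([0] * 6 + [1] * 4 + [2] * 2)

subdatasets = class_based_subdatasets(X, y)

# Hypothetical descriptor: proportion of rows belonging to the target class.
class_share = {label: float(binary.mean()) for label, (_, binary) in subdatasets.items()}
print(class_share)
```

Each sub-dataset keeps all rows of the macro dataset, so descriptors computed on it can compare the target class against everything else.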
Several features within the training dataset had widely dispersed distributions, which allowed sub-samples of the dataset to fail Kolmogorov-Smirnov tests. However, given that the nature of the vulnerability metric requires such a diverse population of datasets, a methodology for standardizing these observations for modeling was required. We discovered that this dispersion of distributions was caused by several datasets having large outliers. Removing these outliers could produce misleading results, as these outliers and features could provide insight into potential vulnerabilities.
Therefore, to avoid the saturating effect of these features, the data was scaled using scikit-learn's MinMaxScaler with its default settings. The scaling factor was generated using the training dataset and then applied to both the training set prior to model development and to the holdout test set prior to testing.
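A minimal sketch of this scaling step follows. The synthetic, heavy-tailed arrays stand in for the vulnerability-classification features and are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Heavy-tailed stand-ins for features that contain large outliers.
X_train = rng.lognormal(size=(50, 4))
X_test = rng.lognormal(size=(10, 4))

scaler = MinMaxScaler()  # default feature_range=(0, 1)
# Fit on the training split only, then reuse the same scaling on the test split.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Because the scaler is fit on the training split only, scaled test values can fall outside the range of 0 to 1 when a test observation exceeds the training minimum or maximum.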

Feature Selection
As indicated below in Methodology: Vulnerability Classification, the resulting number of observations within the training dataset was 767. In order to avoid a wide dataset and potential overfitting or difficulty in classification, the feature set was reduced. Feature selection was performed using an ensemble methodology of Pearson correlation, χ², and recursive feature selection with logistic regression. Pearson correlation selection was implemented using NumPy's corrcoef function between each feature and the labels with default settings. χ² selection was implemented by first scaling the data using scikit-learn's MinMaxScaler with default settings. Then, scikit-learn's SelectKBest function with the χ² score function and other parameters set to default was fit to the data. Selections were returned using SelectKBest's get_support function. Recursive feature selection was implemented using scikit-learn's RFE (recursive feature elimination) function with a logistic regression estimator, the number of features to select set to ten, and the step set to ten. The Boolean result of keeping or removing the feature for each of these three methods was then placed in a dataframe in descending order based on the number of "keep" votes attributed to the feature.
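The voting scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the Pearson step is assumed to keep the ten features with the largest absolute correlation, RFE is assumed to run on the scaled data, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

def selection_votes(X, y, feature_names, k=10):
    """Collect keep/remove votes from Pearson, chi-squared, and RFE selection."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pearson: absolute correlation between each feature and the label.
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    corr = np.nan_to_num(corr)  # constant features would produce NaN
    pearson_keep = np.zeros(X.shape[1], dtype=bool)
    pearson_keep[np.argsort(corr)[-k:]] = True  # assumption: keep the top k

    # Chi-squared needs non-negative inputs, hence the min-max scaling.
    X_scaled = MinMaxScaler().fit_transform(X)
    chi2_keep = SelectKBest(score_func=chi2, k=k).fit(X_scaled, y).get_support()

    # Recursive feature elimination, removing ten features per step.
    rfe = RFE(estimator=LogisticRegression(), n_features_to_select=k, step=10)
    rfe_keep = rfe.fit(X_scaled, y).get_support()

    votes = pd.DataFrame({
        "feature": feature_names,
        "pearson": pearson_keep,
        "chi2": chi2_keep,
        "rfe": rfe_keep,
    })
    votes["total"] = votes[["pearson", "chi2", "rfe"]].sum(axis=1)
    return votes.sort_values("total", ascending=False, kind="stable").reset_index(drop=True)

# Synthetic stand-in for the vulnerability-classification dataset.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
votes = selection_votes(X, y, [f"f{j}" for j in range(30)])
```

Features at the top of the returned dataframe received the most "keep" votes across the three methods.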
This methodology was used to down-select the original 118 features to 15. These 15 features were then reduced to seven through an iterative modeling effort and are shown in Table 4 in order of importance based on feature importance ranking of the feature selection ensemble. This iterative modeling involved using the training data in various models with an array of hyperparameter settings in an effort to find the optimal mapping of observations and labels. This model then optimized the number of features by starting with the top 15 features and working down to the eventual seven features found as optimal as discussed in more detail below.

Vulnerability Classification
All methods up to this point were used to develop the vulnerability-classification dataset. This dataset, consisting of observations of class-based sub-datasets, features detailing those subsets, and the labels associated with disparate membership inference attack accuracies averaged over various combinations of victim/shadow/attack models, was then utilized to develop the vulnerability-classification model.
Several modeling methods were evaluated, including Random Forests, Logistic Regression, Decision Trees, Naive Bayes, and ensemble methodologies, to develop the vulnerability-classification model. These methods were evaluated with all top 15 features, as well as the top 10 and top five features. Ultimately, an ensemble model of a logistic regression model using a liblinear solver, a Naive Bayes model, and a random forest classifier using a minimum of five samples per leaf and 200 estimators was found to provide the best results.
Any hyperparameters not directly mentioned were set to the default scikit-learn settings. This model was then used to down-select the features to the seven shown in Table 4. The model was found utilizing leave-one-out cross validation (LOOCV) over the training dataset and resulted in the training and testing results as shown in Table 5. Given the small size of the vulnerability metric training dataset, ADASYN (Adaptive Synthetic) data sampling was utilized to develop additional data observations [67]. The synthetic data were developed within each training fold through the use of the default settings of the ADASYN function in the imbalanced-learn library [68].
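A minimal sketch of such an ensemble evaluated with LOOCV is given below. Several details are assumptions not stated in the text: the Gaussian variant of Naive Bayes, soft voting, the fixed random seed, and the synthetic data. The ADASYN resampling step is omitted here; in the study it was applied within each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Small synthetic stand-in with seven features.
X, y = make_classification(
    n_samples=30, n_features=7, n_informative=4, n_redundant=0, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(solver="liblinear")),
        ("nb", GaussianNB()),  # assumption: Gaussian Naive Bayes
        ("rf", RandomForestClassifier(
            n_estimators=200, min_samples_leaf=5, random_state=0)),
    ],
    voting="soft",  # assumption: the voting rule is not stated in the text
)

# Each observation is predicted by a model trained on all other observations.
pred = cross_val_predict(ensemble, X, y, cv=LeaveOneOut())
loocv_accuracy = float(np.mean(pred == y))
```

If resampling were added, wrapping it together with the ensemble in an imbalanced-learn pipeline would be one way to keep the synthetic observations confined to the training portion of each fold.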

Hardening Exploration
Based on the results found in the vulnerability study and discussed in more detail below, four different methods of hardening against membership inference attack were chosen for exploration. These methods focused on the reduction of the width ratio, increase of feature entropy, and reduction of disparities in class size. Unlike in a macro-level vulnerability study, each dataset in this study consisted of only "one class" given that the datasets were actually class-based sub-datasets as described above. However, this "one class" was represented with a binary label of "represented by this class" or "not represented by this class", and therefore class size disparity still existed and was still considered within the hardening process.
The first method was a feature reduction method, which removed features based on correlation. Features that were more than 80% correlated with other features were removed. The second was a feature reduction method based on manifold theory. Using isometric mapping across a scale of increasing number of components up to a count equal to the original number of features, an elbow plot was created to determine the optimal number of components to maintain. This number of components was then used again in the isometric mapping process to reduce the feature size of the dataframe. Isometric mapping was applied using the default setting from the scikit-learn library for Python.
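Both feature reduction methods can be sketched as follows. The greedy order in which correlated features are dropped and the use of the Isomap reconstruction error as the elbow-plot quantity are assumptions, as are the helper names and the synthetic data.

```python
import numpy as np
from sklearn.manifold import Isomap

def drop_correlated(X, threshold=0.8):
    """Return indices of features kept after a greedy correlation filter.

    A feature is dropped when its absolute Pearson correlation with any
    already-kept feature exceeds the threshold.
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep

def isomap_errors(X, max_components):
    """Reconstruction error for 1..max_components, as input to an elbow plot."""
    return [
        Isomap(n_components=k).fit(X).reconstruction_error()
        for k in range(1, max_components + 1)
    ]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
# The fourth column is nearly a scaled copy of the first one.
X = np.column_stack([base, 2 * base[:, 0] + 0.01 * rng.normal(size=100)])

kept = drop_correlated(X)
errors = isomap_errors(X[:, kept], max_components=len(kept))
# The component count would be read off the elbow; 2 is used here for illustration.
X_reduced = Isomap(n_components=2).fit_transform(X[:, kept])
```

Unlike the correlation filter, the Isomap output columns no longer correspond to original features, which bears on the interpretability of the reduced dataset.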
The third method was an oversampling method using a conditional tabular generative adversarial network (CTGAN) [69]. CTGAN was chosen for oversampling to provide an equal number of observations in classes through oversampling while maintaining the same data structure. Other methods, such as ADASYN and SMOTE, rely more on linear connections between observations, while CTGAN learns the original distribution of the subset of data to be sampled. The CTGAN method was implemented using the default settings with 100 epochs from the SDV library for Python.
The final approach was to use NearMiss version two undersampling implemented through the imblearn library for Python, with version two selected and "not minority" as the sampling strategy. NearMiss version two was selected to provide an undersampling strategy that maintains the original data structure. Based on the efforts of the original developers of the NearMiss strategy, version two provided the best results for the requirements of this study [70].

Results of Vulnerability Classification
Shown below are the results of the vulnerability classification process, presented through an understanding of the features utilized in the classification model. Interestingly, macro-level dataset features were found to show higher importance in the determination of individual class vulnerability than those developed and processed solely on the class-based sub-datasets. Therefore, within the tables of descriptive statistics based on these macro dataset features shown below, some values are found to be the same across vulnerable and non-vulnerable splits because, within a given dataset, some classes may be vulnerable and others safe. This section presents the results as found within the study. Further discussion of these findings is provided in the Discussion section.

In-Label Distance Measures
In-label distance measures compute the distances between each observation within a class. This metric follows from insight found in work, such as that by Truex and Yaghini discussed above [20,21]. These measures first group observations by class and then determine the distances between each observation within the class using a city block, also known as a Manhattan, distance measurement as shown in Equation (3). This distance metric was utilized in agreement with Aggarwal et al. who found that this L1 norm metric provides better results in high dimensional datasets [71].
As discussed in the Introduction and reiterated above, it is understood that sparse class boundaries can lead to vulnerabilities within a class, and this is a driving factor for disparate vulnerability. Therefore, it is reasonable that five of the seven top features are related to in-label distance measures for the disparate attack vulnerability-classification model. Table 6 provides a summary of the descriptive statistics for each of the included in-label distance features. It can be seen that, for the average of the label minimum, mean, and maximum distances, the distance is significantly higher for vulnerable datasets when compared to their non-vulnerable counterparts. Furthermore, included as significant features are the variance of the label minimum and mean distances, which were also significantly higher for vulnerable datasets. The variance of label distance features and the average of label minimum distances show the most divergence in the upper 50% of the data.
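The in-label distance features can be sketched as follows, using the city block distance, which sums the absolute coordinate differences between two observations. The function name and the use of the population variance are assumptions, and the data are a toy example.

```python
import numpy as np
from scipy.spatial.distance import pdist

def in_label_distance_features(X, y):
    """Mean and variance, across labels, of the per-label minimum, mean,
    and maximum pairwise city block distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    per_label = {"min": [], "mean": [], "max": []}
    for cls in np.unique(y):
        members = X[y == cls]
        if len(members) < 2:
            continue  # a single observation has no pairwise distance
        d = pdist(members, metric="cityblock")
        per_label["min"].append(d.min())
        per_label["mean"].append(d.mean())
        per_label["max"].append(d.max())
    features = {}
    for stat, values in per_label.items():
        features[f"mean_label_{stat}"] = float(np.mean(values))
        # Assumption: population variance (ddof=0).
        features[f"var_label_{stat}"] = float(np.var(values))
    return features

# Toy example: a tight class 0 and a two-member class 1.
X = [[0, 0], [1, 1], [3, 0], [10, 10], [10, 14]]
y = [0, 0, 0, 1, 1]
feats = in_label_distance_features(X, y)
```

In this toy example, the per-label minimum distances are 2 and 4, giving a mean of 3 and a variance of 1 across labels.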

Width Ratio
The width ratio of a dataset can provide insight into an overabundance of information. The hypothesis is that providing many features to describe a limited set of observations can facilitate an adversary's inference of training data membership. In this study, the width ratio was implemented as the ratio of the number of features to the number of observations, meaning that a higher width ratio indicates a wider dataset. The feature found to be the most prominent in this family of width ratios was calculated after one-hot encoding of categorical variables and computed on class-based data subsets. Table 7 shows that vulnerable datasets are significantly wider than their non-vulnerable counterparts, in agreement with the stated hypothesis.

Table 7. Descriptive statistics of the width ratio after one-hot encoding on the class-based data subset feature for the overall dataset, the non-vulnerable observations, and the vulnerable observations.
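One possible reading of this feature is sketched below. It assumes that one-hot encoding is applied to the full dataset and that the observation count is taken from the rows belonging to the class under evaluation; the function name and the toy dataframe are hypothetical.

```python
import pandas as pd

def width_ratio_after_one_hot(df, label_col, cls):
    """Number of one-hot encoded features divided by the class's observation count."""
    # Assumption: encode on the full dataset so every class shares one feature count.
    encoded = pd.get_dummies(df.drop(columns=[label_col]))
    n_observations = int((df[label_col] == cls).sum())
    return encoded.shape[1] / n_observations

df = pd.DataFrame({
    "color": ["r", "g", "b", "r", "g", "b"],  # categorical, becomes 3 columns
    "size": [1, 2, 3, 4, 5, 6],               # numerical, stays 1 column
    "label": ["a", "a", "a", "a", "b", "b"],
})
ratio_a = width_ratio_after_one_hot(df, "label", "a")  # 4 features / 4 rows
ratio_b = width_ratio_after_one_hot(df, "label", "b")  # 4 features / 2 rows
```

Under this reading, the less populated class "b" receives the larger width ratio, which is the pattern associated with vulnerability in the discussion.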

Proportion of Binary Features
The proportion of binary features was included in the dataset feature set to understand how different types of features and different ratios of feature types can affect dataset vulnerability to membership inference attacks. Understanding how these types and ratios of feature types relate to vulnerability can assist data owners in the development and setup of their datasets depending on security vs. utility needs. The proportion of binary features is the ratio of the number of binary features to the total number of features. Table 8 shows that vulnerable datasets have, on average, a higher proportion of binary features. In addition, the largest divergence occurs in the upper 50% of the data.
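A minimal sketch of this feature follows. Treating a column as binary when it has exactly two distinct non-missing values is an assumption, as are the function name and the toy dataframe. The calculation is applied to the original columns, before any one-hot encoding.

```python
import pandas as pd

def proportion_binary(df):
    """Share of original columns that take exactly two distinct values."""
    # Assumption: missing values are ignored when counting distinct values.
    n_binary = sum(df[col].dropna().nunique() == 2 for col in df.columns)
    return n_binary / df.shape[1]

features = pd.DataFrame({
    "smoker": [0, 1, 1, 0],          # binary
    "sex": ["m", "f", "f", "m"],     # binary
    "age": [30, 41, 52, 30],         # numerical
    "city": ["a", "b", "c", "a"],    # categorical with three levels
})
prop = proportion_binary(features)
```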

Results of Hardening Exploration
Following the discoveries from the vulnerability study above, exploratory efforts to harden the datasets based on these findings while maintaining utility were attempted. This exploration allowed for both a first approach to dataset hardening methodologies against disparate attacks and a deeper understanding of what methods work for different types of datasets. As mentioned in the Results of Vulnerability Classification section, this section provides the results as found in the study. Further explanation of these results is provided in the Discussion section. Table 9 provides information on the number of datasets hardened; the number of class-based subsets within these datasets; the percent of subsets that were unchanged, made more secure, and made more vulnerable; and information on the changes in the victim model and attack accuracies. From this table, it can be seen that the two combinational hardening methods and the feature reduction via manifold theory performed the best in reducing the vulnerability to disparate membership inference attacks. Of these three, the two combinational hardening methods provided better maintenance of the original (victim) model utility, as evinced through the low/insignificant changes in the victim model accuracy and F1 score on average.

Table 9. The results of hardening efforts showing the number of class-based subsets (along with the original number of datasets prior to class-based breakdown) and the results of each hardening method explored. These results include the number of subsets that remained the same, those that became more secure, and those that became more vulnerable, along with the changes in the victim model accuracy, F1 score, and attack accuracy on average.

Discussion of Vulnerabilities to Disparate Membership Inference Attack
This section discusses the findings associated with the vulnerability classification of datasets to disparate membership inference attacks. As shown in Table 5, the developed vulnerability-classification model can determine the vulnerability of a dataset to disparate membership inference attacks with an accuracy of 84.5%. This model provides data owners with the ability to evaluate their datasets' vulnerability to privacy leakage via this attack. Many datasets, including those with PII and PHI, such as medical datasets, and those that contain proprietary or confidential information, such as commercial and military datasets, can lead to individual, organizational, or national detrimental effects if their information is leaked. Therefore, having an understanding of this vulnerability to potential record discovery is of great importance to these data owners.
In addition to the vulnerability classification contribution, the features that make up this model contribute equally by providing an understanding of this vulnerability. This section discusses these features and their importance to that understanding.
Five of the seven features selected through the vulnerability-classification modeling effort were based on in-label observational distance measurements. This finding is in agreement with previous understandings that minority and sparsely populated classes tend to have higher vulnerability within a given dataset. Two main ideas are shown in the evaluation of the in-label distance features. The first is that the average label minimum, mean, and maximum distances are all greater for vulnerable datasets than for non-vulnerable datasets. This shows that sparse class regions are more prone to attack than their denser counterparts.
The second is that the variance of the minimum and mean distances (as well as that of the non-included variance of the maximum distance feature) is greater for vulnerable datasets. These variances show the most divergence in the upper 50% of the data. Therefore, it can be concluded that, in addition to the fact that more sparse class regions lead to more vulnerable data subsets, a lack of even distribution of observations within the class region can also lead to disparate vulnerability. This is an important observation and contribution to the current understanding of vulnerability to both macro-level and disparate membership inference attacks. The current literature indicates the contribution of sparse class regions to attack vulnerability [19–21]. However, this finding demonstrates that not only can a lack of supporting members lead to the vulnerability of those subclasses but also a lack of uniformity in the density of observations within a particular class boundary region can lead to vulnerability.
The width ratio of class-based sub-datasets after one-hot encoding was also found to be an important feature in classifying disparate vulnerability. A wider dataset can result in an over-explanation of observations by providing more information than is necessary for the feature-to-label mapping. This overabundance of information creates opportunities for membership inference attack. The fact that this feature was developed on the class subsets as opposed to the macro dataset, resulting in wider datasets for the less populated class-based subsets, agrees with previous understandings that under-represented portions of a dataset, such as less populated classes, are more vulnerable to attack.
Finally, the proportion of binary features was found to be an important factor in determining disparate vulnerability. While this feature was developed on the macro dataset, dividing the dataset into class-based subsets would not change its value, given that it is the proportion of binary features to the total number of features. When exploring the reasoning of importance behind the inclusion of this feature, it is interesting to consider the width ratio after one-hot encoding as discussed above. Binary features are, by nature, "on" or "off".
One-hot encoding of categorical features creates a set of binary features indicating "on" or "off" for each element within the encoded categorical variable. Therefore, one-hot encoding of categorical variables will increase the proportion of binary features within the dataset, while the method for calculating the proportion of binary features was coded to not include the categorically one-hot encoded features. The idea that both of these "on/off" features are included as important for determining vulnerability leads to an understanding of how these discrete attributes can lead to vulnerability.
When considering that, within the literature on algorithmic and model vulnerability as discussed in the Introduction: Previous Work, entropy and correlation themes were seen as the most important, it can be speculated that features that include more entropy within the observational attribute itself could be more secure. In other words, features that are binary and one-hot encoded categorical features provide only two states in comparison to the infinite number of states one may find when a feature can take on a continuous value, such as between 0 and 1.
This, as described above for the uniformity of class regions, is an important contribution to the current understanding of the vulnerability of datasets to not only disparate membership inference attack but also membership inference attack in general: namely, that datasets with a larger proportion of continuous variables as opposed to discrete variables are more secure against membership inference attack due to the increase in entropy within those features.

Discussion of Hardening Exploration
This section discusses the results associated with the hardening exploration efforts. While the impetus of the article focuses on the understanding of dataset vulnerability to disparate membership inference attack, an exploration into hardening techniques based on these discovered vulnerabilities can assist in this understanding while also offering an introduction to mitigation strategies for the dataset owner.
These hardening explorations were developed based on the features found to contribute to vulnerability, which can be summarized into an over-abundance of information relating to the width-ratio feature, sparse and unevenly distributed class regions relating to the in-label distance features, and a lack of entropy within individual feature realization possibilities as found in the proportion of binary features and one-hot encoded categorical variables. This study focused on the former two vulnerabilities through feature reduction efforts based on correlation and manifold theory and through class-balance methods based on oversampling via CTGAN and undersampling using NearMiss methodologies. In addition, combinations of feature reduction and oversampling methods were also explored given their more promising results to evaluate if further hardening could be accomplished.
Feature reduction based in manifold theory provided better results than using correlation thresholds. Given that manifold theory finds a lower dimensional representation of the information contained within the feature set, this reduction to a more base layer could provide a perturbation effect on the membership inference attack attempt. In addition, by condensing the feature set, this hardening method could have changed the uniformity of in-region observational distribution to be more uniform and thus provided protection in this manner.
However, end-users may find hardening via manifold theory to be less appealing due to the increased difficulty in understanding and explanation of the results of the classification exercise given the decreased definition of what is contained within a particular feature of importance within their ultimate algorithm.
Oversampling provided a larger increase in the percentage of secured datasets and a lower percentage of datasets that increased in vulnerability compared with undersampling. While both methods attempted to balance classes, oversampling increased the fortification of existing observations. Therefore, it is reasonable that this increase in supporting members, and thus entropy, would reduce the ability of an attack to discern whether an observation was part of the original training dataset. However, the oversampling method did not provide the level of improvement seen in the manifold theory-based feature reduction method. Given that the oversampling technique maintained the original distributions, a similar non-uniformity of the in-class region distribution of observations would remain, leaving this vulnerability-contributing factor unresolved.
However, the best results were found with a combination of the two best-performing feature reduction and class balancing methods: a manifold theory-based feature reduction with CTGAN oversampling. This provided an effectively insignificant change in the victim model performance while reducing the disparate attack accuracy by 0.02 on average. A total of 19% of class subsets were made more secure, and only 1% were more vulnerable. This elevated protection can be attributed to the perturbing effect and increase of uniformity of the in-class region observational distribution of the manifold theory-based feature reduction as well as the increase of supporting observations and entropy of the oversampling method.

Summary and Future Efforts
This study provides an in-depth look at the vulnerability of datasets themselves to disparate membership inference attacks (those that focus on attacking individual classes as opposed to the overall dataset), adding to the current literature that focuses on model vulnerability. This understanding was accomplished through the creation of a vulnerability-classification model based on over 100 datasets, including frequently cited datasets within the AI security literature. The vulnerability-classification dataset used to create this classification model consisted of 118 features and a set of victim, shadow, and attack model accuracies, all used to describe and understand the vulnerability of these datasets to disparate membership inference attacks.
The resulting ensemble model, consisting of a logistic regression model, Naive Bayes model, and a random forest model, obtained a testing accuracy of 84.5% in classifying datasets as vulnerable or secure to these disparate membership inference attacks. Of the seven features used in the classification model, five were based on in-label observational distance measurements. This heavy reliance on observational distances within class regions is consistent with other findings in the literature, which state that minority and sparsely populated classes tend to increase vulnerability.
In addition to the vulnerability-classification model, this study also provided an increased understanding of the vulnerability of datasets to these attacks. First, it was shown that the uniformity of the in-class region distribution is an important factor in dataset vulnerability. Those datasets with a less uniform distribution of in-class observational distances were found to be more vulnerable to attack. Second, it was shown that an increased proportion of binary features can result in an increase in vulnerability.
This finding was established through the width ratio after one-hot encoding and the proportion of binary features exclusive of categorical one-hot encoded features, both of which indicated an increase in vulnerability with an increase in either the width of the dataset after one-hot encoding or the proportion of binary features. Wider datasets in general can also contribute to an overabundance of information and, therefore, an adversarial advantage for membership inference. Of particular interest was the inclusion of the post-one-hot encoding aspect. One-hot encoding results in binary features indicating a categorical response and is a common preprocessing step for datasets.
Given understandings of entropic contributions to membership inference vulnerability and previous findings of low entropy features causing an increase in vulnerability, we concluded that these binary features, due to their low inherent entropy, result in an increase in attack success. Non-binary features can take on an infinite number of values and, thus, provide more entropy-influenced security compared with two-state binary features.
To further understand these vulnerabilities and to provide exploratory mitigation strategies, we investigated preliminary hardening strategies based on the vulnerabilities discovered in the vulnerability classification process. In particular, feature reduction methods were used to treat an overabundance of information, and intelligent over- and undersampling methods were used to treat class-region sparsity and imbalances. The best-performing method proved to be a manifold theory-based feature reduction combined with a CTGAN-based oversampling strategy. This hardening process resulted in a reduction in disparate attack accuracy of 0.02 on average and an effectively insignificant change in the victim model performance. Using this method, 19% of class-based sub-datasets were made more secure, and only 1% were more vulnerable.
We concluded that manifold theory-based feature reduction provided improved results over correlation-based feature reduction due to the perturbing effects resulting from the consolidation of the feature set into a lower dimension as well as the potential densification and increased uniformity of in-class observational distances due to the re-mapping of labels to this new, reduced feature set. CTGAN oversampling's increased success over NearMiss version two undersampling was attributed to the increase in fortifying observations in contrast to a general reduction in majority class size, as well as to an increase in the entropy of affected features and classes through the increase of observations.
This work provides data owners with the ability to classify their datasets' vulnerability to disparate membership inference attacks. In addition, it provides an understanding of this vulnerability and exploratory mitigation methods. Most notably, this is the first work to exclusively study dataset vulnerability to these attacks as opposed to model vulnerability. Through this effort, additional investigations, such as the in-class uniformity of observational distance and binary vs. continuous feature contributions to vulnerability, were provided for a broader understanding of membership inference attacks. Future development efforts should be focused on understanding how hardening at the class level affects hardening at the macro level, as well as deeper investigations into other hardening methods, which may provide even better results.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Dataset feature definitions. Features that were applied to both the macro and class-based subsets are defined with "(Macro and Disparate)". Those without this designation were applied only to the macro dataset.

Number of Observations (Macro and Disparate)
The quantity of observations within the original dataset.

Class Entropy
Entropy as defined through the number of observations in each class.

Number of Classes
The number of classes.

Number of Features
The number of features in the original dataset.

Number of Features After One Hot Encoding
The number of features after the dataset has been processed using one-hot encoding on categorical features.

Proportion of Categorical Features
The proportion of categorical features in respect to the original number of features.

Proportion of Binary Features
The proportion of binary features in respect to the original number of features.

Mean of Mean Label Distances
The distances of observations within each label were calculated using cityblock distances and then averaged within that label. This feature is the mean of those averages.

Variance of Mean Label Distances
The distances of observations within each label were calculated using cityblock distances and then averaged within that label. This feature is the variance of those averages.

Mean of Mean Label Minimum Distances
The distances of observations within each label were calculated using cityblock distances. This feature is the mean of the minimum of distances for each label.

Variance of Mean Label Minimum Distances
The distances of observations within each label were calculated using cityblock distances. This feature is the variance of the minimum of distances for each label.

Mean of Mean Label Maximum Distances
The distances of observations within each label were calculated using cityblock distances. This feature is the mean of the maximum of distances for each label.

Variance of Mean Label Maximum Distances
The distances of observations within each label were calculated using cityblock distances. This feature is the variance of the maximum of distances for each label.

The variance of the number of categories for each categorical feature in the original dataset.

Mean Feature-Feature Correlation Grouped by Label
The mean of the feature-to-feature correlation when grouped by label.

Maximum Feature-Feature Correlation Grouped by Label
The maximum of the feature-to-feature correlation when grouped by label.

Minimum Feature-Feature Correlation Grouped by Label
The minimum of the feature-to-feature correlation when grouped by label.

Mean of the Variance of Feature-Feature Correlation Grouped by Label
The average of the variance of feature to feature correlations when grouped by label.

Variance of the Means of Feature-Feature Correlation Grouped by Label
The variance of the means of the feature-to-feature correlations when grouped by label.

The geometric mean ratio of standard deviations of the individual populations to the pooled standard deviation.

Maximum Standard Deviation Ratio of Features by Label
The maximum of the standard deviation ratios of features as described above but grouped by label.

Minimum Standard Deviation Ratio of Features by Label
The minimum of the standard deviation ratios of features as described above but grouped by label.

Mean of the Standard Deviation Ratio of Features by Label
The mean of the standard deviation ratios of features as described above but grouped by label.

Variance of the Standard Deviation Ratio of Features by Label
The variance of the standard deviation ratios of features as described above but grouped by label.

The variance of the mutual information of features.

Mean Mutual Information of Features Grouped by Label
The mean mutual information of features grouped by label.

Maximum Mutual Information of Features Grouped by Label
The maximum mutual information of features grouped by label.

Minimum Mutual Information of Features Grouped by Label
The minimum mutual information of features grouped by label.

Variance of the Mutual Information of Features Grouped by Label
The variance of the mutual information of features grouped by label.

Equivalent Number of Attributes
Entropy of class divided by the mean mutual information of class and attributes.