Fair-CMNB: Advancing Fairness-Aware Stream Learning with Naïve Bayes and Multi-Objective Optimization

Abstract: Fairness-aware mining of data streams is a challenging concern in contemporary machine learning. Many stream learning algorithms are used to replace humans in critical decision-making processes, e.g., hiring staff and assessing credit risk. This calls for handling massive amounts of incoming information with minimal response delay while ensuring fair and high-quality decisions. Although deep learning has achieved success in various domains, its computational complexity may hinder real-time processing, making traditional algorithms more suitable. In this context, we propose a novel adaptation of Naïve Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance through multi-objective optimization (MOO). Class imbalance is an inherent problem in discrimination-aware learning paradigms. To deal with class imbalance, we propose a dynamic instance weighting module that gives more importance to new instances and less importance to obsolete instances based on their membership in a minority or majority class. We have conducted experiments on a range of streaming and static datasets and concluded that our proposed methodology outperforms existing state-of-the-art (SoTA) fairness-aware methods in terms of both discrimination score and balanced accuracy.


Introduction
Enormous collections of continuously arriving data require efficient mining algorithms to render fair and high-quality predictions with minimum response delay. Many automated online decision-making systems have been proposed to supplement humans in several critical application areas subject to moral equivalence, such as credit risk assessment, online advertising, recruitment, and criminal recidivism assessment [1]. These models have shown equivalent and in some cases better performance than humans. This argues for replacing human decisions with such models. However, such replacement has raised many challenging concerns regarding the fairness, transparency, and accountability of automated decision-making models [2].
Recent years have witnessed a number of state-of-the-art (SoTA) [3][4][5][6] methods aimed at mitigating discrimination, typically under the assumption that data characteristics remain static. However, many real-world applications, e.g., fraud detection, e-commerce websites, and stock market platforms, rely on real-time data streams. The real-time data evolve in a streaming fashion, and the statistical dependencies within the data also change over time (concept drift) [7]. Discriminatory outcomes have critical effects on current as well as future scenarios. For example, ref. [8] suggests that even second-generation immigrants in Europe face ethnic disadvantages in employment compared to equally qualified Europeans. Thus, we need to detect and offset discrimination cumulatively while considering the non-stationary nature of the data. Only a few SoTA methods tackle discrimination in streaming environments; however, they overlook the critical issue of class imbalance. Class imbalance is intrinsic to the fairness-aware learning paradigm and, if neglected, can mislead the assessment of a classifier's discrimination mitigation capability. The SoTA continual learning models focus only on optimizing the model's overall accuracy, which can lead to biased decision-making models. Consider a use case where a financial institution uses machine learning algorithms to determine creditworthiness for issuing loans. In such a scenario, if the underlying data stream is imbalanced, i.e., one class dominates the other class, then a classifier which always predicts a positive outcome would yield a discrimination score (statistical parity, defined as the difference between mean positive outcomes for protected (e.g., female) and non-protected (e.g., male) groups) of 0. This indicates that a mere focus on accuracy could lead to a misguided impression of the discrimination mitigation capability of a classifier. Therefore, we have used balanced accuracy instead of accuracy to measure the predictive performance of the proposed model.
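The loan example above can be checked numerically. The following is a minimal sketch on a made-up, imbalanced toy stream (90 positive and 10 negative labels, with an alternating binary sensitive attribute); the function names and the data are illustrative assumptions and not part of the proposed method.

```python
def statistical_parity(y_pred, sensitive):
    """Difference in positive-prediction rates; sensitive: 1 = non-protected, 0 = protected."""
    non_prot = [p for p, s in zip(y_pred, sensitive) if s == 1]
    prot = [p for p, s in zip(y_pred, sensitive) if s == 0]
    return sum(non_prot) / len(non_prot) - sum(prot) / len(prot)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


# Toy, imbalanced data (assumption): 90 positives, 10 negatives.
y_true = [1] * 90 + [0] * 10
sensitive = [i % 2 for i in range(100)]
y_pred = [1] * 100  # a classifier that always predicts the positive class

print(statistical_parity(y_pred, sensitive))  # looks perfectly "fair"
print(accuracy(y_true, y_pred))               # looks accurate
print(balanced_accuracy(y_true, y_pred))      # reveals the trivial classifier
```

On this toy data the trivial classifier appears both fair and accurate under plain accuracy, while balanced accuracy drops to chance level, which is the motivation for reporting balanced accuracy.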
Deep learning algorithms have achieved significant success in various domains, including image and speech recognition, natural language processing, and many others. However, in fairness-aware stream learning, deep learning may not always be the best choice due to its high computational complexity [9]. In contrast, traditional machine learning algorithms such as Naïve Bayes are often more efficient and require fewer computational resources than deep learning algorithms. This makes them more suitable for processing large volumes of data in real time, which is essential for stream learning applications. Naïve Bayes requires less training data than deep learning models, which makes it appropriate for small datasets where the number of instances is limited [10]. Additionally, Naïve Bayes can handle high-dimensional data well, where the number of input features is larger than the number of instances, while deep neural networks may suffer from the "curse of dimensionality". The results obtained by Naïve Bayes are more easily interpretable than those obtained by deep neural networks, which can be seen as "black boxes" that are difficult to interpret. Furthermore, Naïve Bayes is less prone to overfitting than deep learning, which can be an advantage in streaming environments where data distributions may change over time and models need to adapt quickly.
In this work, we propose a novel adaptation of Naïve Bayes that deals with class imbalance and attenuates discrimination simultaneously utilizing multi-objective optimization (MOO) in a non-stationary environment.The key contributions of this research work are the following:

• We challenge the deep learning dogma by presenting a novel adaptation of Naïve Bayes (Fair-CMNB: Fairness- and Class Imbalance-aware Mixed Naïve Bayes) to address fairness concerns in streaming environments where computational efficiency, model interpretability, and active learning are important.
• We mitigate discrimination as well as reverse discrimination (discrimination towards the privileged group) over the stream while simultaneously improving the predictive performance through multi-objective optimization.
• Fair-CMNB is also capable of dynamically handling concept drifts and class imbalances.
• Fair-CMNB is agnostic to the employed fairness notion (including the causal fairness notion FACE [11]).

Fairness-Aware Static Learning
The literature provides many approaches for detecting and then mitigating discrimination. For a detailed overview, please refer to [12]. We can divide discrimination mitigation strategies into three basic categories: pre-processing, in-processing, and post-processing techniques. This division depends on whether they modify the input training data, adapt the algorithm itself, or manipulate the outputs of the model to mitigate discrimination.

Pre-Processing Techniques
The origins of the data have a decisive influence on the outcomes of decision-making models. If the origin of the data is prejudiced, then the decision-making model trained with the biased data will also behave prejudicially. Massaging [4] is one of the most basic pre-processing techniques presented in the literature. It modifies class labels while keeping the impact on the accuracy of the model minimal. Reweighting [13] is another, less intrusive pre-processing method presented in the literature to reduce discrimination. This method reduces discrimination by removing the dependence of model predictions on a sensitive attribute (an attribute or feature of an individual that is considered protected or private and which should be protected from discrimination or bias in various settings) by assigning weights to samples in training data. The weights are set based on the difference between the observed and expected probability of a sample (with a particular sensitive attribute) being correctly classified to a class. If the observed probability of a sample is lower than the expected probability, the sample is reweighted with a higher weight. Preferential sampling [14] is a special form of reweighting. It re-samples borderline objects with higher probability to minimize the adverse effect on predictive accuracy. The authors used a ranker to identify borderline objects in the training data. Data augmentation [2] is another potential method to deal with fairness concerns. However, even if the training data are purely unbiased, discrimination can still exist in the predictions because pre-processing techniques cannot handle the bias introduced by the algorithm itself [15]. The authors of [16] proposed an adversarial learning-based instance reweighting method to achieve fairness. Similarly, an adaptive re-sampling-based discrimination mitigation method has been presented by ref. [17].

In-Processing Techniques
These techniques modify the classifier itself to obtain bias-free predictions. The authors of [5] presented a method to incorporate the condition of nondiscrimination into the objective function of their base model, i.e., a decision tree. The authors of [3] proposed a mixed-integer-programming-based framework to achieve fair decision trees for both classification and regression. Furthermore, ref. [6] provided a flexible convex-concave constraint-based framework for a fair margin-based logistic regression classifier. Another in-processing approach to achieve a fair neural network-based classifier is proposed by ref. [18]. In this framework, the convex surrogates of constraints are included in the loss function of the neural network classifier through Lagrangian multipliers to achieve fairness. The literature also provides adaptive reweighting schemes to achieve fairness. For example, Adafair [19] is an Adaboost-based fairness-aware classifier designed to update instance weights in each boosting round, while considering the cumulative notion of fairness based on all members of the current ensemble. A constraint optimization-based method to enhance fairness has been proposed by ref. [20].

Post-Processing Techniques
These techniques alter the decisions of the classifier itself to diminish bias. For example, the authors of [5] have proposed a method which relabels certain leaves of a decision tree model to reduce discrimination while maintaining high predictive performance. The authors of [21] provided a method to alter the probabilities of a Naïve Bayes classifier to tackle discrimination. In ref. [22], the authors removed discrimination by processing the fair patterns with k-anonymity. Ref. [23] proposed a method to alter the decision boundaries of an Adaboost classifier to achieve fairness. Ref. [24] presented a relabeling method based on the Gaussian process that achieves fairness while maintaining high predictive accuracy. The authors of [25] proposed a method to use causal reasoning for mitigating discrimination.

Stream Classification
The main challenge in stream learning is to account for concept drifts, i.e., the model should adapt efficiently to the changing data patterns in the stream. The literature provides many batch learning methods for stream learning. For example, ref. [26] proposed a semi-supervised clustering method. Similarly, ref. [27] presented a probabilistic adaptive windowing method for stream classification. The authors claim that their method improves the traditional windowing method because it includes older samples along with the new ones to maintain information regarding the previous concept drifts. These traditional batch learning methods lack the ability to continuously update the model with the arrival of each new sample.
Online learning avoids the cost of data accumulation. Moreover, online learning algorithms can converge more quickly than batch learning algorithms.
Ref. [28] presents an online boosting algorithm, i.e., OSBoost, for classification in nonstationary environments. This algorithm is an adaptation of the offline SmoothBoost. Another stream learning method is presented by ref. [29]. This method is developed to deal with concept recurrence with clustering. Whenever a concept recurs, the most appropriate model is retrieved from the repository and used for further classification. Ref. [30] proposes another lossless learning classifier based on online multivariate Gaussian distribution (OVIG). An online version of a semi-supervised Support Vector Machine (SVM) is proposed by ref. [31], which classifies newly arriving data based on a few labeled instances of the data. The authors of [32] proposed an ensemble learning approach named ElStream to detect concept drift in online streaming data. Similarly, ref. [33] proposed an ensemble classification method for heterogeneous stream data.

Fairness-Aware Stream Learning
This type of learning technique reduces discrimination in a streaming environment. A chunk-based pre-processing technique (massaging) is proposed by ref. [34] to mitigate discrimination. In this technique, the discrimination in each data chunk is removed and the chunk is then fed to the online classifier. FAHT (Fairness-Aware Hoeffding Tree) [35] is another fairness-aware stream learning method, based on a decision tree, which is proposed to handle fairness in data streams. In this method, the notion of fairness is included in the attribute selection criteria for splitting the decision tree. The underlying decision tree grows by utilizing both information gain and fairness gain. FABBOO [1] provides a method to change the decision boundary of the decision trees to achieve fairness. Massaging (MS), FAHT, and FABBOO keep the role of the protected group fixed over the stream. They lack the ability to handle reverse discrimination, i.e., discrimination towards the privileged group. Another data augmentation-based method has been proposed by ref. [36] for fairness-aware federated learning in a streaming environment. To address discrimination within streaming data, a method involving two swarms was proposed to incrementally build a classifier and reduce discrimination in the data [37].

Class Imbalance-Aware Stream Learning
Class imbalance is an inherent problem of model learning. If the learning algorithm does not tackle class imbalance appropriately, it mostly learns by simply ignoring the minority class instances [38]. Ref. [39] presented a cost-sensitive online learning algorithm based on bagging/boosting techniques for imbalanced data streams. Class imbalance can also be handled by instance weighting as proposed by FABBOO [1]. Data augmentation is another potential method for handling class imbalance. For example, ref. [40] proposed a batch learning method, i.e., CSMOTE, to re-sample the minority class in a defined window of instances based on SMOTE. Similarly, ref. [41] proposed a SMOTE-based method for class imbalance-aware learning in a federated environment.

Preliminaries
The proposed model is designed for binary classification. Binary classification problems are addressed in this research because they represent fundamental challenges that are widely applicable across many domains. Furthermore, we assume that the streaming data have only one sensitive attribute with binary values, i.e., it can take two potential values (protected and non-protected). For example, in a loan approval scenario, a financial institution uses a machine learning model to automate decision making, with gender as the sensitive attribute (classifying female applicants as the protected group ($S^-$) and male applicants as the non-protected group ($S^+$)), to address historical gender biases. We assume a binary sensitive attribute because most of our competing baselines provide solutions for binary sensitive attributes, which allows a fair comparison.
The rest of this section delineates the key concepts central to the proposed framework.

Prequential Evaluation
We are dealing with streaming data in this work, so we need to update the model continuously. At every time point $t$, the instance $x_t$ (without label) is presented to the model for prediction, and later the label $y_t$ of instance $x_t$ is revealed to the model for training. This type of evaluation is called prequential evaluation [42] or test-then-train evaluation.
Prequential evaluation can be pessimistic at the start of the stream, as the false positives and false negatives encountered at the beginning of the stream affect the overall performance of the learner throughout the stream. This pessimism makes it challenging to train the learner effectively.
Prequential evaluation with sliding windows is a technique that extends the basic prequential evaluation by considering only a subset of the most recent data instances for testing [43]. This approach helps to address the issue of concept drift, where the underlying data distribution changes over time, by focusing on the most recent data. The main advantage of prequential evaluation with sliding windows is that it provides a more robust evaluation of the model's performance in non-stationary environments. Given these advantages of windowed prequential evaluation over basic prequential evaluation, we have adopted the windowed approach in this work.
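The test-then-train loop with a sliding window can be sketched as follows. This is a minimal illustration only: `MajorityClassLearner` is a hypothetical stand-in learner, the window size and the toy stream are assumptions, and plain accuracy is used as the windowed metric for brevity.

```python
from collections import deque


class MajorityClassLearner:
    """Hypothetical stand-in learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        return 1 if self.counts[1] >= self.counts[0] else 0

    def learn(self, x, y):
        self.counts[y] += 1


def windowed_prequential(stream, model, window_size=1000):
    """Yield the accuracy over the last `window_size` predictions after each instance."""
    recent = deque(maxlen=window_size)  # old results fall out automatically
    for x, y in stream:
        y_hat = model.predict(x)   # 1) test on the still-unlabeled instance
        recent.append(y_hat == y)
        model.learn(x, y)          # 2) train once the true label is revealed
        yield sum(recent) / len(recent)


# Toy stream (assumption): the label flips from 0 to 1 halfway through.
stream = [((i,), 1 if i >= 5 else 0) for i in range(10)]
scores = list(windowed_prequential(stream, MajorityClassLearner(), window_size=3))
print(scores)
```

Because only the most recent results are kept, early mistakes stop affecting the reported score once they leave the window, and a drop in the score after the label flip becomes visible immediately.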

Multi-Objective Optimization (MOO)
In the context of multi-objective optimization (MOO), the goal is to optimize a $K$-dimensional vector-valued function $f(x) = (f_1(x), \dots, f_K(x))$, where $x \in \mathcal{X}$ and $\mathcal{X}$ is a bounded set of inputs. The MOO paradigm does not seek a single optimal solution; instead, the goal is to discover a set of Pareto optimal solutions, such that an improvement in one objective will inevitably lead to a deterioration in another. The underlying goal is to maximize all the objectives. A solution $f(x)$ dominates another solution $f(x')$, denoted as $f(x) \succ f(x')$, if $f_k(x) \geq f_k(x')$ for all $k = 1, \dots, K$ and there exists at least one $k$ where $f_k(x) > f_k(x')$. The Pareto optimal set of solutions and corresponding inputs can be mathematically represented as

$$\mathcal{P}^* = \{ f(x) : x \in \mathcal{X},\ \nexists\, x' \in \mathcal{X} \text{ such that } f(x') \succ f(x) \}, \qquad \mathcal{X}^* = \{ x \in \mathcal{X} : f(x) \in \mathcal{P}^* \}.$$

Because the Pareto frontier may consist of an infinite number of points, the objective is to find a finite approximation of this frontier.
In our proposed continual learning model, the main objectives of the MOO are (i) discrimination mitigation and (ii) enhanced predictive performance.
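The dominance relation and the resulting Pareto set can be illustrated with a short sketch. The candidate points are made up, and expressing fairness as `1 - |discrimination score|` so that both objectives are maximized is an assumption made only for this example.

```python
def dominates(a, b):
    """True if objective vector `a` dominates `b` when every objective is maximized."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (balanced accuracy, fairness) pairs for five candidate models.
candidates = [(0.80, 0.90), (0.85, 0.85), (0.78, 0.88), (0.85, 0.80), (0.70, 0.99)]
front = pareto_front(candidates)
print(front)
```

Each point kept in `front` trades one objective against the other; the discarded points are worse or equal in both objectives than at least one kept point.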

Fairness Notions
Statistical group fairness notion: There are many definitions of fairness in the literature [44], but no clear criteria have been presented for choosing a particular fairness definition. In this research, we use the notion of statistical parity [44] to assess the discriminating behavior of the proposed model. This notion ensures that each individual has an equal chance of being assigned to the positive class ($y^+$), regardless of its membership in the protected group $S^-$ or the non-protected group $S^+$, as shown in Equation (1). Statistical parity does not take into account the true label of the subject and thus may lead to reverse discrimination, i.e., discrimination towards the privileged group. In our proposed model, we also address this reverse discrimination problem; the details can be found in Section 4.4.

$$\text{St. Parity} = P(\hat{y} = y^+ \mid S^+) - P(\hat{y} = y^+ \mid S^-) \tag{1}$$
Discriminatory models have long-lasting consequences, affecting not only current outcomes but also future outcomes. Short-term discrimination detection methods fail to ameliorate discrimination over time because discrimination scores that are minor at a single time point may aggregate into considerable prejudice in the long run. Thus, in contrast to the short-term discriminatory measures applied by SoTA stream learning methods, it is necessary to consider discriminatory outcomes cumulatively. We use the notion of cumulative statistical parity proposed by [1] to detect and measure discrimination over the stream. Equation (2) illustrates the notion of cumulative statistical parity.
The cumulative statistical parity is updated after the arrival of each new instance in the stream. Here, $\gamma$ is the adjustment factor, which is used to adjust the discrimination score at the beginning of the stream to avoid division by zero. A St. Parity value of $1$ or $-1$ indicates complete unfairness, whereas St. Parity $= 0$ signifies a perfectly fair classifier.
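An incremental tracker for cumulative statistical parity might look like the following sketch. It assumes that the score is the difference between the cumulative positive-prediction rates of the non-protected and protected groups and that $\gamma$ is added to each denominator; the class and attribute names are hypothetical.

```python
class CumulativeStatisticalParity:
    """Running difference of positive-prediction rates between the two groups."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma  # adjustment factor; keeps denominators non-zero early on
        self.seen = {"prot": 0, "non_prot": 0}
        self.positive = {"prot": 0, "non_prot": 0}

    def update(self, y_pred, is_protected):
        """Register one prediction and return the updated cumulative score."""
        group = "prot" if is_protected else "non_prot"
        self.seen[group] += 1
        self.positive[group] += int(y_pred == 1)
        return self.value()

    def value(self):
        rate_non_prot = self.positive["non_prot"] / (self.seen["non_prot"] + self.gamma)
        rate_prot = self.positive["prot"] / (self.seen["prot"] + self.gamma)
        return rate_non_prot - rate_prot  # > 0: protected group is disadvantaged


tracker = CumulativeStatisticalParity(gamma=1.0)
for y_pred, is_protected in [(1, False), (1, False), (1, False),
                             (1, True), (0, True), (0, True)]:
    score = tracker.update(y_pred, is_protected)
print(score)
```

Because only four counters are stored, the score can be refreshed in constant time per instance, and a negative value signals reverse discrimination.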
Causal group fairness notion: Despite the simplicity and popularity of statistical fairness methods, they might overcorrect, struggle with paradox resolution, and be vulnerable to shifts in data distributions [45]. On the other hand, causal fairness considers underlying causal structures, decoupling predictions from sensitive attributes and providing a deeper insight into data biases. We have utilized the causal group fairness notion average treatment effect (ATE/FACE) [11] to gauge the discrimination embedded in the predictions of the proposed framework as presented in Equation (3). We modified FACE to consider predicted outcomes.
Here, $Y^{s^+}_{pot}$ and $Y^{s^+}_{pred}$ represent the potential and predicted outcomes when $S = s^+$. FACE quantifies the difference in the true positive outcomes (observed and potential) between the protected (treated) and non-protected (non-treated) groups. A FACE value of $1$ or $-1$ indicates complete unfairness, whereas FACE $= 0$ signifies a perfectly fair classifier. We modify the definition in Equation (3) to take into account the cumulative discrimination as:

Potential Outcomes
To incorporate causal fairness, we calculate potential outcomes using a matching technique. The objective is to compute the potential outcomes by finding the matched neighbors from the opposite group. For instance, in loan approval, the counterfactual outcome for a female $x_k$ as if she were a male is based on similar males' observed outcomes. To determine similarity between individuals $x_j$ and $x_k$, we use Propensity Score Matching (PSM) [45]. PSM is aimed at estimating the effect of a treatment by accounting for the covariates that predict receiving the treatment. The propensity score, $e(x_k)$, is the probability of receiving the treatment given observed covariates (the sensitive attributes (e.g., "gender") are unlikely to be influenced by any covariates). For the loan approval example, $S_k = 1$ denotes that individual $k$ received the treatment, i.e., the individual is female, and $S_k = 0$ otherwise. The propensity score of $x_k$ derived from observed covariates $C_k$ is:

$$e(x_k) = P(S_k = 1 \mid C_k) \tag{5}$$

The similarity between individuals $x_j$ and $x_k$ is determined through their propensity score difference. The logit version of this difference helps in reducing bias [46]:

$$d(x_j, x_k) = \left| \operatorname{logit}(e(x_j)) - \operatorname{logit}(e(x_k)) \right|, \qquad \operatorname{logit}(e) = \log\frac{e}{1 - e} \tag{6}$$

We match treated (protected) and control (non-protected) individuals using nearest neighbor matching with replacement, based on the aforementioned similarity metric.
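The matching step can be sketched as follows, assuming the propensity scores have already been estimated by some model (the estimator is not specified here, and all identifiers and numbers below are made up for illustration).

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def match_with_replacement(treated, control):
    """Map each treated id to the control id with the closest logit propensity score.

    `treated` and `control` are dicts: id -> propensity score in (0, 1).
    A control individual may be reused for several treated individuals.
    """
    matches = {}
    for t_id, e_t in treated.items():
        matches[t_id] = min(
            control, key=lambda c_id: abs(logit(e_t) - logit(control[c_id]))
        )
    return matches


def potential_outcomes(matches, observed_outcome):
    """Counterfactual outcome of each treated individual = outcome of its match."""
    return {t_id: observed_outcome[c_id] for t_id, c_id in matches.items()}


# Hypothetical propensity scores and observed loan decisions.
treated = {"f1": 0.30, "f2": 0.62, "f3": 0.58}   # protected group
control = {"m1": 0.28, "m2": 0.60, "m3": 0.90}   # non-protected group
observed = {"m1": 0, "m2": 1, "m3": 1}

matches = match_with_replacement(treated, control)
print(matches)
print(potential_outcomes(matches, observed))
```

In this toy run the control individual `m2` is matched twice, which is what matching with replacement permits.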

Proposed Model
An illustration of the proposed model is shown in Figure 1. In this study, we are using prequential evaluation with sliding windows; therefore, as soon as a new instance $x_t$ arrives, it is tested using the proposed model (A). After testing, the instance $x_t$ with its true class label $y_t$ is fed to the discrimination detector (B) and the online class imbalance monitor OCIM (C). The OCIM monitors the ratios of positive and negative classes throughout the stream and feeds the respective class ratios to the instance weighting module (D). The instance weighting module adjusts the instance weight ($w_i$) in accordance with the respective class ratio to ensure class imbalance-aware learning of the proposed model. The class ratios obtained from the OCIM are also used to keep track of the concept drifts using a concept drift detector (E) and to handle concept recurrence. The instance $x_t$, its true label $y_t$, and the respective weight $w_i$ are used to train online nominal Naïve Bayes (F) and online Gaussian Naïve Bayes (G). The discrimination detector monitors the discrimination over the stream using the employed fairness notion (cumulative statistical parity or cumulative FACE) and triggers the MOO-based discrimination mitigation module (H) if the cumulative discrimination value exceeds a user-defined threshold $\varepsilon$. The value of $\varepsilon$ depends on the fairness budget allowed by the user, i.e., how much discrimination in predictions is acceptable to the user. We set this value to 0.00001, which means that we limit our learner to keep the discrimination score in the range [−0.001%, 0.001%].
Further details about these modules are provided in the following subsections.

Mixed Naïve Bayes
In this work, we tailor the Naïve Bayes algorithm to process streaming data for which we do not have access to historical data. By default, Naïve Bayes is designed only for nominal data. However, in real life, datasets are usually a combination of nominal and continuous attributes. To accommodate continuous and nominal attributes, we propose Mixed Naïve Bayes (MNB), a combination of online nominal Naïve Bayes and online Gaussian Naïve Bayes. For each new instance, continuous attribute values are sent to online Gaussian Naïve Bayes and nominal attribute values are passed to online nominal Naïve Bayes. Online nominal Naïve Bayes and online Gaussian Naïve Bayes update independently. The following sections illustrate the algorithmic details of these two models.

Online Nominal Naïve Bayes
The proposed model is designed for binary classification only. Online nominal Naïve Bayes maintains a summary for each class that contains the count of each distinct value of each nominal attribute. Whenever a new instance arrives, the summary is updated for the class to which the instance belongs. Since we are using prequential evaluation, the online nominal Naïve Bayes model computes the posterior probabilities of each class with the arrival of each new instance using Equation (7) before updating the summaries.
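A count-based online learner of this kind can be sketched as below. The posterior is computed with the standard Naïve Bayes rule (prior times per-attribute likelihoods, then normalization); the Laplace smoothing term and the optional instance weight are assumptions added so that the example runs on unseen values, and the class name is hypothetical.

```python
from collections import defaultdict


class OnlineNominalNB:
    """Per-class value counts for nominal attributes, updated one instance at a time."""

    def __init__(self, classes=(0, 1), smoothing=1.0):
        self.classes = classes
        self.smoothing = smoothing  # Laplace smoothing (assumption)
        self.class_weight = {c: 0.0 for c in classes}
        # counts[c][attribute_index][value] -> accumulated instance weight
        self.counts = {c: defaultdict(lambda: defaultdict(float)) for c in classes}
        self.values = defaultdict(set)  # distinct values seen per attribute

    def predict_proba(self, x):
        total = sum(self.class_weight.values())
        scores = {}
        for c in self.classes:
            score = (self.class_weight[c] + self.smoothing) / (
                total + self.smoothing * len(self.classes))
            for i, v in enumerate(x):
                n_values = len(self.values[i] | {v})
                score *= (self.counts[c][i][v] + self.smoothing) / (
                    self.class_weight[c] + self.smoothing * n_values)
            scores[c] = score
        norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}

    def learn(self, x, y, weight=1.0):
        self.class_weight[y] += weight
        for i, v in enumerate(x):
            self.counts[y][i][v] += weight
            self.values[i].add(v)


nb = OnlineNominalNB()
for _ in range(3):
    nb.learn(("a",), 1)
    nb.learn(("b",), 0)
print(nb.predict_proba(("a",)))
```

In a prequential loop, `predict_proba` is called on the incoming instance first and `learn` is called afterwards, once its label is revealed.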

Online Gaussian Naïve Bayes
Online Gaussian Naïve Bayes maintains the running mean and variance of each continuous attribute. For this purpose, we use Welford's online algorithm [47]. The running mean of each attribute is computed using Equation (8). Here, $\bar{a}_n$ is the current mean of the attribute, $n$ is the number of instances, $\bar{a}_{n-1}$ is the previous mean, and $a_n$ is the current value of the attribute. To calculate the variance, we need to calculate an intermediate term $M_{2,n}$ as shown in Equation (9). Once we have $M_{2,n}$, we can determine the running variance by Equation (10). With the arrival of every new instance in the stream, the online Gaussian Naïve Bayes updates each continuous attribute's running mean and variance. The summaries of continuous attributes contain the running mean and variance of the respective attribute.
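The running statistics described above, together with the Gaussian likelihood they feed, can be sketched as follows. The update is the textbook form of Welford's algorithm; dividing $M_{2,n}$ by $n$ (population variance) rather than $n-1$, and the small `eps` added to the variance, are assumptions of this sketch.

```python
import math


class RunningGaussian:
    """Welford's online mean/variance for one continuous attribute of one class."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, a):
        self.n += 1
        delta = a - self.mean               # deviation from the previous mean
        self.mean += delta / self.n         # new running mean
        self.m2 += delta * (a - self.mean)  # uses both the old and the new mean

    @property
    def variance(self):
        return self.m2 / self.n if self.n > 0 else 0.0  # population variance (assumption)

    def likelihood(self, a, eps=1e-9):
        """Gaussian density of value `a`; `eps` guards against zero variance."""
        var = self.variance + eps
        return math.exp(-((a - self.mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


g = RunningGaussian()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    g.update(value)
print(g.mean, g.variance, g.likelihood(5.0))
```

Only three numbers per attribute and class are stored, so no past instances need to be kept, which matches the streaming setting.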
As we are using prequential evaluation, the online Gaussian Naïve Bayes model computes the posterior probability of each class using Equation (7) before updating the running mean and variance. The only difference is in computing the likelihood of each attribute $a_i$, which is calculated using the following equation, where $\mu_c$ and $\sigma_c^2$ denote the running mean and variance of the attribute for class $c$:

$$P(a_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left(-\frac{(a_i - \mu_c)^2}{2\sigma_c^2}\right) \tag{11}$$

In the next sections, we describe the details of the modules we propose to handle class imbalance and discrimination in data streams.

Module for Monitoring and Handling Class Imbalance
We use a class imbalance monitoring component that tracks the proportion of each class in the stream. The roles of majority and minority classes may swap as the stream evolves, i.e., a class that is in the minority at the current time may turn out to be the majority at a later time. We track the state of disequilibrium using the Online Class Imbalance Monitor (OCIM) [48] as shown in Equation (12). In this equation, $CP^+_t$ is the percentage of the positive class at time $t$ and $CP^-_t$ is the percentage of the negative class at time $t$. After the arrival of each new record, OCIM updates the percentage $CP^y_t$ of the respective class using Equation (13).
The state of imbalance needs to be updated based on the most recent examples from the stream, and the impact of previous examples needs to be reduced. Therefore, we include a temporal decay factor ($0 < \alpha < 1$) to quickly capture the change in disequilibrium. This decay factor limits the impact of historical data; therefore, $CP^y_t$ is adjusted based on the most recent records. In the limiting cases, $\alpha = 0$ means that the historical data do not influence $CP^y_t$ at all, and $\alpha = 1$ means that the complete effect of historical data is retained in $CP^y_t$. $I[y, y_t]$ is the indicator function that returns the value '1' if the label $y_t$ of the current instance equals class $y$; otherwise, it returns the value '0'.
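A time-decayed class monitor of this kind can be sketched as below. The sketch assumes the update $CP^y_t = \alpha\, CP^y_{t-1} + (1 - \alpha)\, I[y, y_t]$ and takes OCIM as the signed difference $CP^+_t - CP^-_t$; both forms, the zero initialization, and the class name are assumptions made for illustration.

```python
class OnlineClassImbalanceMonitor:
    """Exponentially decayed class percentages and their signed difference."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha           # decay factor: larger values retain more history
        self.cp = {1: 0.0, 0: 0.0}   # decayed percentages of positive / negative class

    def update(self, y_t):
        """Register the true label of the newest instance and return the OCIM value."""
        for y in self.cp:
            indicator = 1.0 if y == y_t else 0.0
            self.cp[y] = self.alpha * self.cp[y] + (1.0 - self.alpha) * indicator
        return self.ocim

    @property
    def ocim(self):
        # > 0: positive class currently dominates; < 0: negative class dominates
        return self.cp[1] - self.cp[0]


monitor = OnlineClassImbalanceMonitor(alpha=0.5)
first = monitor.update(1)   # a positive instance arrives
second = monitor.update(0)  # then a negative one
print(first, second)
```

With a small `alpha` the monitor reacts quickly to a swap of the majority and minority roles; with a value close to 1 it changes slowly and reflects the long-run class ratio.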
Once we have the class percentages (i.e., $CP^+_t$, $CP^-_t$), we can use them to find an appropriate weight for each new instance of the data stream. Algorithm 1 presents the complete methodology for computing the instance weights. $CW^+$ and $CW^-$ are the class weights of the positive and the negative class, respectively. We compute $CW^+$ and $CW^-$ using the class weight utility of scikit-learn (https://scikit-learn.org, accessed on 22 September 2023). This weighting procedure assigns higher weights to minority class instances than to majority class instances. The resulting weight distribution makes the minority class (positive class) more prominent during the training of the learner.

Algorithm 1 Computing instance weights
Require: true class labels $y$, positive class weight $CW^+$, negative class weight $CW^-$, $OCIM_t$
1: Initialize: current instance's weight $w_i = 1$;
2: if $y$ == negative label and $OCIM_t > 0$ then
3:
4: if $y$ == positive label and $OCIM_t < 0$ then
5:
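One way to read the weighting rule is sketched below. The conditions follow the two branches of Algorithm 1, but the weight update inside each branch is not recoverable from the listing, so multiplying by the class weight of the current minority class is an assumption. The helper `balanced_class_weights` is hypothetical and reimplements the "balanced" heuristic, `n_samples / (n_classes * class_count)`, in plain Python.

```python
def balanced_class_weights(n_pos, n_neg):
    """'Balanced' heuristic for two classes: n_samples / (2 * class_count)."""
    total = n_pos + n_neg
    return total / (2.0 * n_pos), total / (2.0 * n_neg)


def instance_weight(y, cw_pos, cw_neg, ocim_t):
    """Weight for one instance; only current minority-class instances are boosted."""
    w = 1.0
    if y == 0 and ocim_t > 0:   # negative instance while the positive class dominates
        w *= cw_neg             # assumed update
    if y == 1 and ocim_t < 0:   # positive instance while the negative class dominates
        w *= cw_pos             # assumed update
    return w


# Hypothetical window with 20 positive and 80 negative labels.
cw_pos, cw_neg = balanced_class_weights(n_pos=20, n_neg=80)
print(cw_pos, cw_neg)
print(instance_weight(1, cw_pos, cw_neg, ocim_t=-0.6))  # minority (positive) instance
print(instance_weight(0, cw_pos, cw_neg, ocim_t=-0.6))  # majority (negative) instance
```

Because the sign of the imbalance value decides which branch fires, the boosted class switches automatically when the minority and majority roles swap.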

Module for Handling Concept Recurrence
As shown in Figure 1, we use the Page-Hinkley [49] explicit drift detection method as our concept drift detector. Our concept drift detection method monitors the OCIM parameter. This method of drift detection works by comparing the current OCIM to $OCIM\_mean_t$, the mean value of the OCIM computed for a window of instances up to the current time as illustrated in Equation (14). We chose a window of 1000 instances to compute $OCIM\_mean_t$. In general, concept drift is detected when the observed $OCIM_t$ is above the mean $OCIM\_mean_t$ by a specified threshold $\eta$ at a given point in time. Through grid search, we chose the value of $\eta$ as 0.02. This value of $\eta$ gave us the best discrimination score and predictive performance. With $\eta = 0$, concept drift is detected when the mean class imbalance exceeds 0%. Furthermore, $\eta = 0.02$ allows the mean class imbalance to be in the range [−2%, 2%].
Concept recurrence is a special case of concept drift where concepts that have already been seen in the past reappear in the evolving stream. As soon as concept drift is detected, the MNB stores the summaries of the subsequent instances as a separate model. In the future, when a similar concept reoccurs (a similar concept drift recurs), the MNB retrieves the corresponding model and uses it for further prequential evaluation.
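The threshold check on the windowed mean can be sketched as follows. This is a simplified stand-in rather than a full Page-Hinkley test: it flags a drift when the newest imbalance value deviates from the mean of the previous window by more than $\eta$, and treating the deviation as two-sided is an assumption. The class name and the toy values are hypothetical.

```python
from collections import deque


class OcimDriftDetector:
    """Flag drift when the newest imbalance value leaves the eta-band around the window mean."""

    def __init__(self, window=1000, eta=0.02):
        self.history = deque(maxlen=window)  # most recent imbalance values
        self.eta = eta

    def update(self, ocim_t):
        drift = False
        if self.history:
            window_mean = sum(self.history) / len(self.history)
            drift = abs(ocim_t - window_mean) > self.eta  # two-sided check (assumption)
        self.history.append(ocim_t)
        return drift


detector = OcimDriftDetector(window=5, eta=0.02)
flags = [detector.update(v) for v in [0.10, 0.10, 0.10, 0.10, 0.10, 0.11, 0.30]]
print(flags)
```

A detected drift would be the point at which a new set of summaries is started, or a previously stored one is retrieved if the concept has been seen before.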

Online Discrimination Detection and Mitigation
We need to handle discrimination embedded in data streams. As the streams progress, the discriminated groups and the preferred groups do not remain the same. The group that was once discriminated against may turn out to be a preferred group later. Therefore, we need to develop a method that efficiently deals with this concept deviation. Also, we need to maintain the methodology that we developed to deal with the class imbalance problem.
Algorithm 2 illustrates our online discrimination mitigation procedure. To eliminate the discrimination, we change the probability distributions of the protected group $P(S^- \mid \text{class})$ and the non-protected group $P(S^+ \mid \text{class})$ after the arrival of each new example in the data stream. If the discrimination value is greater than a certain threshold $\varepsilon$, we add a factor ($\lambda$) of the number of samples belonging to the negative class with protected value $N(C^-, S^-)$ to the number of samples belonging to the positive class with protected value $N(C^+, S^-)$ (Algorithm 2: line 2). To avoid unnecessary data augmentation, we also subtract the same factor ($\lambda$) from the number of samples belonging to the negative class with protected value $N(C^-, S^-)$ (Algorithm 2: line 3).
Similarly, we add a factor ($\lambda$) of the number of samples belonging to the positive class with non-protected value $N(C^+, S^+)$ to the number of samples belonging to the negative class with non-protected value $N(C^-, S^+)$ (Algorithm 2: line 4). We also subtract the same factor ($\lambda$) from the number of samples belonging to the positive class with non-protected value $N(C^+, S^+)$ (Algorithm 2: line 5). $\lambda$ is actively tuned through MOO; the details can be found in the next section.
Since we want to deal with concept deviations in the evolving data streams, we also consider negative discrimination, i.e., when the learner starts discriminating against the samples with non-protected value. To remove the negative discrimination, we use the same method as described above, except that now we swap the roles of protected and non-protected groups (Algorithm 2: lines 6 to 10).

Algorithm 2 Online discrimination mitigation procedure.
Require: Summaries of the number of samples belonging to the positive class with protected value N(C+, S−); the number of samples belonging to the positive class with non-protected value N(C+, S+); the number of samples belonging to the negative class with protected value N(C−, S−); the number of samples belonging to the negative class with non-protected value N(C−, S+); discrimination score disc.
Ensure: The overall number of samples does not change.
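The count adjustments described for Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GroupCounts`, `statistical_parity`, and `mitigate` are hypothetical, the counts are allowed to be fractional, and the discrimination score is assumed to be statistical parity estimated directly from the four counts.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Running (possibly fractional) counts N(class, group); hypothetical container."""
    pos_prot: float      # N(C+, S-)
    pos_nonprot: float   # N(C+, S+)
    neg_prot: float      # N(C-, S-)
    neg_nonprot: float   # N(C-, S+)

    def total(self) -> float:
        return self.pos_prot + self.pos_nonprot + self.neg_prot + self.neg_nonprot


def statistical_parity(c: GroupCounts) -> float:
    """P(C+ | S+) - P(C+ | S-), estimated from the counts (assumed definition)."""
    p_nonprot = c.pos_nonprot / (c.pos_nonprot + c.neg_nonprot)
    p_prot = c.pos_prot / (c.pos_prot + c.neg_prot)
    return p_nonprot - p_prot


def mitigate(c: GroupCounts, disc: float, lam: float, eps: float) -> GroupCounts:
    """Shift a fraction lam of counts between classes; the total is unchanged."""
    if disc > eps:
        # Protected group is disadvantaged: move mass from C- to C+ for S-,
        # and from C+ to C- for S+.
        shift_prot = lam * c.neg_prot
        shift_nonprot = lam * c.pos_nonprot
        c.pos_prot += shift_prot
        c.neg_prot -= shift_prot
        c.neg_nonprot += shift_nonprot
        c.pos_nonprot -= shift_nonprot
    elif disc < -eps:
        # Negative discrimination: same procedure with the group roles swapped.
        shift_nonprot = lam * c.neg_nonprot
        shift_prot = lam * c.pos_prot
        c.pos_nonprot += shift_nonprot
        c.neg_nonprot -= shift_nonprot
        c.neg_prot += shift_prot
        c.pos_prot -= shift_prot
    return c
```

Each shift is computed once and then both added and subtracted, which is what keeps the overall number of samples constant, as the Ensure clause requires.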

Adaptive Hyperparameter Tuning through MOO
In our research, we employ the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [50] as a multi-objective optimization (MOO) method to actively tune the hyperparameter λ during windowed prequential evaluation of streaming data, simultaneously optimizing our two objectives, i.e., balanced accuracy and fairness. This MOO-based method helps select a λ that reduces discrimination while retaining, and even enhancing, the high predictive performance obtained by the class imbalance handling module. For every window of n instances, the MOO procedure, outlined in Algorithm 3, is invoked to optimize λ based on a trade-off between balanced accuracy and discrimination score. In each invocation, a population of M values of λ is initialized (Algorithm 3: line 1). The parent and child populations of λ are merged to form λ_h^g (Algorithm 3: line 3). Each λ in this merged population is then used to test Fair-CMNB (windowed prequential evaluation), and the corresponding solutions (pairs of balanced accuracy and discrimination scores) are found (Algorithm 3: line 4). The Pareto front represents the set of optimal trade-offs between the two objectives, i.e., balanced accuracy and discrimination score; each point on the Pareto front ([B.Acc., disc_score]) signifies a unique balance between the two. Some points may have a low discrimination score but lower balanced accuracy, while others may have high balanced accuracy but a higher discrimination score. Fast non-dominated sorting is then applied to sort the Pareto fronts P (Algorithm 3: line 5). The next generation's parent population (λ_p^{g+1}) is incrementally populated by including individuals from the sorted fronts, up to a size limit of M.
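To make the notion of dominance and Pareto fronts concrete, the following sketch groups candidate solutions into fronts. It is a simple peel-off variant written for clarity, not the bookkeeping-based fast non-dominated sort of NSGA-II, and it assumes each solution is a pair (balanced accuracy, absolute discrimination score) with the first maximized and the second minimized. The names `dominates` and `non_dominated_fronts` are hypothetical.

```python
from typing import List, Tuple

# (balanced accuracy, |discrimination score|); assumed objective layout
Solution = Tuple[float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def non_dominated_fronts(solutions: List[Solution]) -> List[List[int]]:
    """Return solution indices grouped into fronts; front 0 is the Pareto front."""
    remaining = set(range(len(solutions)))
    fronts: List[List[int]] = []
    while remaining:
        # A solution belongs to the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(solutions[j], solutions[i])
                       for j in remaining if j != i)
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts
```

For example, among the hypothetical solutions (0.80, 0.10), (0.75, 0.02), (0.70, 0.05), and (0.78, 0.12), the first two form the Pareto front because neither dominates the other, while each of the last two is dominated by one of them.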
Crowding distance is calculated within each front (P_j) to preserve diversity among the solutions (Algorithm 3: lines 7 to 11). The newly found Pareto fronts are sorted based on their dominance to determine their inclusion priority in the subsequent generation (Algorithm 3: line 12). Only the first (M − |λ_p^{g+1}|) elements of the final front (P_j) are added to the parent population (Algorithm 3: line 13) to keep the population size fixed at M. The child population for the next generation (λ_c^{g+1}) is formed using selection, crossover, and mutation operations on the newly formed parent population (Algorithm 3: line 14). After completing a generation, the generation counter g is incremented. The optimization stops when the maximum number of generations Z is reached or when the trade-off value (Equation (15)) has not improved over a fixed number of previous generations; the algorithm then sorts the final child population by the trade-off criterion to select the optimal λ value (λ_best) (Algorithm 3: lines 15 to 16). This trade-off measure is inspired by the F-score, where µ is a hyperparameter that makes the discrimination score (disc_score) µ times more important than the balanced accuracy (B.Acc.). With µ = 1, the trade-off becomes the harmonic mean of B.Acc. and 1 − abs(disc_score). The selected λ_best is then used for the windowed prequential evaluation of the subsequent t instances, after which the MOO procedure is invoked again.
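Equation (15) is not reproduced in this text, so the sketch below assumes the standard F-beta form, with 1 − abs(disc_score) in the role that is weighted µ times more heavily. Under that assumption, µ = 1 yields the harmonic mean described above. The function names `trade_off` and `select_best_lambda` and the candidate values in the usage note are hypothetical.

```python
from typing import Iterable, Tuple


def trade_off(b_acc: float, disc_score: float, mu: float = 1.0) -> float:
    """F-beta style combination of balanced accuracy and fairness (assumed form)."""
    fairness = 1.0 - abs(disc_score)
    denom = mu ** 2 * b_acc + fairness
    if denom == 0.0:
        return 0.0
    return (1.0 + mu ** 2) * b_acc * fairness / denom


def select_best_lambda(
    candidates: Iterable[Tuple[float, float, float]], mu: float = 1.0
) -> float:
    """Pick the lambda with the highest trade-off.

    Each candidate is a triple (lambda, balanced accuracy, discrimination score).
    """
    return max(candidates, key=lambda c: trade_off(c[1], c[2], mu))[0]
```

With the hypothetical candidates (0.1, 0.70, 0.00), (0.2, 0.78, 0.02), and (0.3, 0.80, 0.20) and µ = 1, the second candidate is selected: it gives up a little accuracy relative to the third but has a much lower discrimination score.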

Overall Computational Complexity
Assuming N total data points and hyperparameter tuning every T time steps:
• Online Naïve Bayes: Update and prediction complexity is O(Ndc).
The overall complexity is dominated by the most expensive of these operations, typically the hyperparameter tuning cost O(pE), if significant.

Evaluation Setup

6.1. Benchmark Baselines
We compare the proposed methodology against five baseline models: the class imbalance-aware CSMOTE [40], the non-stationary OSBoost [28], the fairness-aware massaging (MS) [34], the fairness-aware FAHT [35], and the class imbalance- and discrimination-aware FABBOO [1]. All the baselines are trained using the same hyperparameters as given in the respective research articles. We also evaluate different variants of MNB to stress the effectiveness of different modules of the proposed model.

1. CSMOTE [40]: This baseline is not fairness-aware, but it is designed to handle class imbalance in a non-stationary environment by re-sampling the minority class in a defined window of instances.
2. OSBoost [28]: This is a classification model for data streams. It is not capable of handling either class imbalance or discrimination.
3. Massaging (MS) [34]: This is a fairness-aware learning method. It is a chunk-based technique which handles discrimination in the current chunk by swapping labels. However, it does not account for cumulative effects of discrimination; it is designed to handle discrimination only on a short-term basis, i.e., for the current chunk. We use the default chunk size for training this baseline, i.e., 1000, as proposed by [34]. This method cannot handle class imbalance.
4. Fairness-Aware Hoeffding Tree (FAHT) [35]: This method is an adaptation of the Hoeffding tree that is designed to deal with discrimination. It incorporates the fairness gain along with the information gain into the partitioning criteria of the decision tree. This model is not able to deal with class imbalance and concept drifts.
5. FABBOO [1]: This is an online boosting approach that handles class imbalance by monitoring class ratios in an online fashion. It employs boundary adjustment methods to handle discrimination.
6. MNB (Mixed Naïve Bayes): This is a combination of online nominal Naïve Bayes and online Gaussian Naïve Bayes. It considers no notion of fairness and class imbalance while performing classification tasks.
7. Fair-CMNB (Discrimination- and Class Imbalance-Aware Mixed Naïve Bayes): This is a variant of MNB which mitigates discrimination (utilizing MOO) as well as handles class imbalance and concept drifts in the evolving stream.
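As an illustration of how a mixed Naïve Bayes can combine nominal and Gaussian likelihoods online, consider the following sketch. It is not the authors' MNB: the class `OnlineMixedNB`, the Laplace `smoothing` parameter, the variance floor, and the optional per-instance `weight` are all assumptions made for this example. Nominal attributes use smoothed counts, and numeric attributes use a weighted running mean and variance.

```python
import math
from collections import defaultdict


class OnlineMixedNB:
    """Hypothetical online Naive Bayes over nominal and numeric attributes."""

    def __init__(self, smoothing: float = 1.0):
        self.smoothing = smoothing               # Laplace smoothing (assumed)
        self.class_n = defaultdict(float)        # class -> weighted count
        self.nom = defaultdict(float)            # (class, index, value) -> count
        self.nom_values = defaultdict(set)       # index -> values seen so far
        self.num = defaultdict(lambda: [0.0, 0.0, 0.0])  # (class, index) -> [n, mean, M2]

    def update(self, nominal, numeric, y, weight: float = 1.0) -> None:
        self.class_n[y] += weight
        for i, v in enumerate(nominal):
            self.nom[(y, i, v)] += weight
            self.nom_values[i].add(v)
        for i, x in enumerate(numeric):
            s = self.num[(y, i)]
            s[0] += weight
            delta = x - s[1]
            s[1] += weight * delta / s[0]        # weighted running mean
            s[2] += weight * delta * (x - s[1])  # weighted sum of squared deviations

    def log_posterior(self, nominal, numeric, y) -> float:
        n_y = self.class_n[y]
        lp = math.log(n_y / sum(self.class_n.values()))
        for i, v in enumerate(nominal):
            k = len(self.nom_values[i])
            lp += math.log((self.nom[(y, i, v)] + self.smoothing)
                           / (n_y + self.smoothing * k))
        for i, x in enumerate(numeric):
            n, mean, m2 = self.num[(y, i)]
            var = max(m2 / n, 1e-9)              # floor avoids zero variance
            lp += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return lp

    def predict(self, nominal, numeric):
        # Requires at least one update for every class that should be predicted.
        return max(self.class_n,
                   key=lambda y: self.log_posterior(nominal, numeric, y))
```

Because the model keeps only counts and running moments, each update and prediction touches every attribute once per class, and the summaries can be stored or adjusted without revisiting past instances.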

Benchmark Datasets
The details of the datasets used to test the efficiency of the proposed model are shown in Table 1. The datasets differ in the number of attributes (#Att.), number of instances (#Inst.), sensitive attribute (Sens.Att.), and class ratio (positive to negative). Despite the growing interest in AI models that focus on fairness, there is still a lack of large streaming datasets in this domain. Therefore, we use static datasets along with streaming datasets to demonstrate the effectiveness of our proposed model. Since we are unaware of the temporal characteristics of the static datasets, we report the evaluation metrics as the average over 10 random shuffles of each static dataset passed through the model.

Evaluation Metrics
We use recall, balanced accuracy, gmean, cumulative statistical parity (St.Parity), and cumulative FACE to measure the predictive and fairness performance of the proposed framework and competing baselines. The mathematical representations of recall, balanced accuracy, and gmean are given in Equations (16), (18), and (19). The details of statistical parity and FACE have already been explained in Section 3.3.
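For reference, the three predictive metrics can be computed from confusion-matrix counts as in the sketch below. It uses the standard definitions (recall as the true positive rate, balanced accuracy as the mean of the true positive and true negative rates, and gmean as their geometric mean); the function names are hypothetical, and returning 0.0 for an empty class is a choice made for this example.

```python
import math


def recall(tp: int, fn: int) -> float:
    """True positive rate; 0.0 if there are no positive instances."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate; 0.0 if there are no negative instances."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Arithmetic mean of the true positive and true negative rates."""
    return 0.5 * (recall(tp, fn) + specificity(tn, fp))


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of the true positive and true negative rates."""
    return math.sqrt(recall(tp, fn) * specificity(tn, fp))
```

A large gap between recall and balanced accuracy indicates that one of the two rates is much lower than the other, which is how the discussion of the results interprets such gaps.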

Results and Discussion
The proposed models are trained and tested following the prequential evaluation with sliding windows (with a window size of 1000 instances) strategy, i.e., test first, then train. We tune the hyperparameters α and ϵ by grid search. To obtain the best results for all datasets, we choose values of 0.9 and 0.00001 for α and ϵ, respectively. As mentioned earlier, the non-streaming datasets lack temporal features; therefore, we use ten random shuffles of each static dataset and present the average of their results. All the baselines are also evaluated using the prequential evaluation with sliding windows method.
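The test-then-train protocol can be sketched as follows. This is a simplified illustration that assumes non-overlapping windows and plain accuracy as the reported metric; the function name `prequential_windows` and the `predict`/`update` callables are hypothetical stand-ins for whatever learner is being evaluated.

```python
from typing import Any, Callable, Iterable, List, Tuple


def prequential_windows(
    stream: Iterable[Tuple[Any, Any]],
    predict: Callable[[Any], Any],
    update: Callable[[Any, Any], None],
    window_size: int = 1000,
) -> List[float]:
    """Test first, then train; return the accuracy of each window."""
    accuracies: List[float] = []
    correct = seen = 0
    for x, y in stream:
        # Test: the instance is predicted before its label is used.
        if predict(x) == y:
            correct += 1
        seen += 1
        # Train: only now does the learner see the true label.
        update(x, y)
        if seen == window_size:
            accuracies.append(correct / seen)
            correct = seen = 0
    if seen:
        # Report the trailing partial window as well.
        accuracies.append(correct / seen)
    return accuracies
```

Because every instance is scored before it is learned from, no separate hold-out set is needed and the reported performance tracks the stream as it evolves.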

Comparison with Baselines
Table 2 presents the measures of fairness and predictive performance obtained by the proposed model (Fair-CMNB) and the competing baselines for a set of streaming datasets. Similarly, the evaluation measures obtained by Fair-CMNB and the baselines, averaged over 10 random shuffles of each static dataset, are shown in Table 3. Tables 2 and 3 show that Fair-CMNB achieves the best discrimination score (St.Parity) among all methods on every dataset. Compared to SoTA methods, Fair-CMNB achieves the best balanced accuracy for the Adult Census, KDD, Default, Law School, NYPD, and Loan datasets. Although CSMOTE is a class imbalance-aware baseline, Fair-CMNB outperforms it in terms of balanced accuracy for all datasets except the Bank Marketing dataset. For the Bank Marketing dataset, CSMOTE (a baseline without fairness interventions) reports the best balanced accuracy, but Fair-CMNB follows it with a close margin of 1.15%. However, the difference between the discrimination scores achieved by CSMOTE and the proposed model for the Bank Marketing dataset is substantial, i.e., 7.373%.
There is a noticeable disparity between the recall and balanced accuracy values obtained by fairness-aware baselines: MS, FAHT, and FABBOO.This suggests that these baselines attempt to alleviate discrimination by significantly sacrificing either the true positive rate or the true negative rate.In contrast, for most datasets, Fair-CMNB delivers recall and balanced accuracy values that are closely aligned.
Our research question is closely related to that of FABBOO. The predictive performance and discrimination scores achieved by Fair-CMNB are much better than those of FABBOO for both streaming and static datasets. Fair-CMNB achieved 3.32%, 2.14%, 6.56%, 3.23%, 12.06%, 17.1%, 5.9%, and 10.7% higher balanced accuracy values for the Adult Census, KDD, Compas, Default, Law School, NYPD, Bank Marketing, and Loan datasets, respectively, than FABBOO while maintaining low discrimination scores. A similar trend can be observed for gmean. FABBOO is able to reduce the discrimination score to a suitable value while maintaining balanced accuracy, but it is not able to handle negative discrimination. Also, FABBOO reports a significant difference between recall and balanced accuracy, which indicates that it achieves a low discrimination score at the cost of ignoring either the minority or the majority class. For example, for the Default dataset we observe a difference of 22.95% between the recall and balanced accuracy reported by FABBOO, whereas Fair-CMNB reports a difference of only 7.4%. This suggests that FABBOO struggles to handle class imbalance while mitigating discrimination. Similar behavior can be observed for other imbalanced as well as balanced datasets, i.e., Adult Census, Law School, Bank Marketing, NYPD, and Loan.
Figure 2 presents a comparison of the balanced accuracy and statistical parity values attained by Fair-CMNB and FABBOO for the Bank Marketing, Law School, and Default datasets.From this figure, it is evident that while both Fair-CMNB and FABBOO achieve similar statistical parity scores, Fair-CMNB consistently outperforms FABBOO in terms of balanced accuracy throughout the stream for all datasets.

Scalability
Fair-CMNB adapts well to large data volumes. Law School is the smallest dataset with approximately 18,000 instances, while KDD and NYPD are much larger, each containing around 300,000 instances. As is evident from Tables 2 and 3, Fair-CMNB's performance remains consistent across both small (Law School) and large (KDD, NYPD) datasets. This indicates that Fair-CMNB scales efficiently with increasing data volume.

Agnosticism to Fairness Notions
We are using windowed prequential evaluation; therefore, we have access to the most recent window of instances. We generate the potential outcomes for this window of instances using the method mentioned in Section 3.4 and determine the FACE value. The predictive and fairness performance measures obtained by Fair-CMNB under the causal fairness notion are presented in Table 4. The results indicate that Fair-CMNB consistently achieves high balanced accuracy alongside remarkably low FACE values across all datasets. This underscores Fair-CMNB's agnosticism to the specific fairness notion in use.

Impact Assessment of Naïve Bayes Modules
In this section, we compare the predictive and fairness performance of MNB (without fairness interventions) and Fair-CMNB (with fairness interventions) as shown in Table 5.

Hyperparameter Sensitivity
The most important hyperparameter in reducing discrimination is λ from Algorithm 2. We examined the effect of changing λ on the ability of our proposed model to reduce discrimination, as shown in Figure 3. We used the Adult Census dataset as a reference for this analysis. As can be seen in Figure 3a, when the value of λ is 0.01, the discrimination value immediately drops to zero, indicating that this value is too large. With this value of λ, we achieve a balanced accuracy of 75.13%. If we decrease λ to 0.001, the discrimination score decreases to a smaller and stable value after about 20,000 instances, as shown in Figure 3b. The balanced accuracy is also not much affected, with a value of 78.61%. If we further decrease λ to 0.0001, the discrimination score decreases but does not reach a stable value until the end of the stream, as shown in Figure 3c. This value of λ leads to a balanced accuracy of 79.93%. As shown in Figure 3d, if we set λ to 0.00001, the discrimination score does not decrease throughout the data stream, and the achieved balanced accuracy is 80.37%. Therefore, we chose the value 0.001 for λ, which provides a good trade-off between the balanced accuracy and the attenuation of the discrimination score.

Conclusions
The central prerequisite of a just and sustainable world is to ensure gender equality and realize the human rights, nobility, and competence of diverse groups of society. Deep learning, although successful in many domains, may not always be optimal for fairness-aware stream learning where computational efficiency and model interpretability are major concerns. Therefore, we propose a multi-objective optimization (MOO)-based discrimination- and class imbalance-aware online learning framework to achieve parity between favored and prejudiced groups of subjects.
We present a novel adaptation of Naïve Bayes for mining data streams with embedded discrimination and class imbalance. We have demonstrated the effectiveness of our methodology by conducting experiments on a range of static and streaming datasets. Our approach mitigates both discrimination and reverse discrimination by modifying the data distribution based on a cumulative fairness notion through an MOO method. Our approach outperforms existing SoTA methods in terms of both balanced accuracy and discrimination score. We have shown that our approach effectively learns both majority and minority classes and achieves a low discrimination score while maintaining high predictive performance. We have also shown the adaptability of Fair-CMNB to different fairness notions (including the causal fairness notion FACE). To the best of our knowledge, this is the first attempt where a causal fairness notion is used to assess the discriminating behavior of a framework in online settings.
In the future, we aim to thoroughly investigate the forgetting phenomena of the class imbalance handling module to make it adaptable to the nature of concept drift in the data. We also plan to analyze the theoretical aspects of our approach.


5.2. NSGA-II for Hyperparameter Tuning
• Population Initialization: Time complexity for initial population setup with p individuals is O(p).
• Fitness Evaluation: For p individuals, with E as the evaluation time, the complexity is O(pE).
• Non-dominated Sorting and Selection: The sorting process complexity is O(p^2).
• Genetic Operators: The complexity of crossover and mutation operations is O(p).

5.3. Page-Hinkley for Concept Drift Detection
• Drift Detection: The complexity for each incoming data point is O(1).
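The constant per-instance cost follows from the fact that the Page-Hinkley test keeps only a handful of running statistics. The sketch below is a generic textbook-style version that detects an increase in the mean of a monitored value (for example, the error rate); the class name and the default values of `delta`, `threshold`, and `min_instances` are assumptions and not the settings used in the paper.

```python
class PageHinkley:
    """Hypothetical Page-Hinkley detector for an upward shift in the mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0,
                 min_instances: int = 30):
        self.delta = delta                  # tolerated magnitude of change
        self.threshold = threshold          # detection threshold
        self.min_instances = min_instances  # warm-up before alarms are raised
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Process one value in O(1); return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_instances and self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False
```

Each call performs a fixed number of arithmetic operations regardless of how many instances have been seen, which matches the O(1) cost stated above.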

Figure 2. Comparison between balanced accuracy (B.Acc.) and St.Parity values achieved by Fair-CMNB and FABBOO for the Bank Marketing, Law School, and Default datasets. Notably, Fair-CMNB consistently outperforms FABBOO in terms of B.Acc. throughout the stream for all datasets while maintaining very low St.Parity.

Figure 3. Impact of varying λ on discrimination score (statistical parity) for the Adult dataset.

7.6. Deep Learning vs. Naïve Bayes
Deep neural networks (DNNs) can be computationally intensive due to their inherent structure and the iterative nature of their training process. On the other hand, Naïve Bayes, being based on straightforward probabilistic computations, is generally faster and more scalable. We evaluated the runtime of a four-layer online DNN, as proposed by [56], on the Law School dataset with windowed prequential evaluation. Our findings indicate that MNB (without fairness and class imbalance interventions) finished training in 130.624 s, while Fair-CMNB (with fairness interventions) took 360.183 s. Meanwhile, the DNN (without fairness and class imbalance interventions) required approximately 627.933 s to complete its training throughout the entire data stream. All tests were conducted on an Intel Core i7 CPU equipped with 64 GB RAM.

Table 1. Description of datasets.

Table 2. Comparison of fairness and predictive performance achieved by Fair-CMNB and the competing baselines for streaming datasets with windowed prequential evaluation; best and second-best results are in bold and underline, respectively.

Table 3. Comparison of fairness and predictive performance achieved by Fair-CMNB and the competing baselines for static datasets with windowed prequential evaluation; best and second-best results are in bold and underline, respectively.

Table 4. Fairness and predictive performance of Fair-CMNB under causal fairness notion. Fair-CMNB achieves high predictive performance along with very low FACE values, demonstrating its adaptability to different fairness notions.

Table 5. Comparison of fairness and predictive performance among MNB variants. Results show that Fair-CMNB effectively reduces discrimination while simultaneously enhancing predictive accuracy.