1. Introduction
Artificial intelligence systems may exhibit bias stemming from the data (data bias), and algorithmic design choices made during training can lead to erroneous decisions (algorithm bias) [1,2]. In general, neural networks often exploit spurious correlations in the training data as shortcuts to make predictions [1,3,4]. This leads to suboptimal performance on examples where the learned shortcuts do not apply [5,6,7]. This performance gap is observed across various applications such as medical imaging [8,9,10] and facial recognition [5,11]. Recent methods have sought to mitigate unintended biases in AI systems through interventions before (pre-processing), during (in-processing), or after (post-processing) training [12]. In-processing approaches directly target algorithmic design to alleviate biases by adjusting sample importance [7,13,14], employing adversarial learning [15,16], or incorporating invariant learning [3,17]. While these methods effectively address the problem, they rely on access to diverse environments (also known as domains) or prior knowledge of protected groups. Unfortunately, obtaining such information is usually infeasible due to expensive annotations, challenges in effectively grouping datasets, and privacy and ethical constraints [18]. An approximation therefore becomes necessary when the system does not have direct access to diverse environments or protected groups.
Our goal is to strategically partition a training dataset and estimate distinct environments (domains) to facilitate the use of invariant learning algorithms for bias removal. Invariant learning methods learn an invariant predictor that remains robust across environments [3,5,17], making them more effective than other debiasing approaches [3]. Similar efforts, such as EIIL [6], also explore environment estimation for invariant learning. However, these approaches rely heavily on the assumption that biased samples are easily identified through Empirical Risk Minimization (ERM) pre-training. Real-world scenarios, on the other hand, challenge this assumption, as the ERM approach might learn a combination of biased and causal features. Our paper builds on the observation that shortcuts are learned more easily due to their simplicity, which offers an opportunity for effectively partitioning the samples. To validate our intuitions, we conducted an experiment on the Colored MNIST (CMNIST) dataset [3], with the target attribute y indicating whether the digit is smaller than five and the protected attribute a representing digit color. The target label exhibits a strong correlation with digit color (with a probability of 90%). Hence, the model can easily use color as a spurious shortcut to make predictions during training. As shown in Figure 1, training an ERM classifier revealed that the loss rapidly decreases for biased samples, while for bias-conflicting samples it first increases and then decreases once the model starts to overfit on all training samples. Two key observations emerged. First, bias is learned faster from the early epochs, suggesting a profitable opportunity for a partitioning strategy. Second, given enough training, the ERM model can overfit even on bias-conflicting samples, confirming the limitations of naïve ERM-based approaches to separating biased samples. We propose to intentionally promote the features that are learned during the early epochs of training using the Generalized Cross-Entropy (GCE) loss function [4]. This reinforcement is followed by partitioning the training samples into two environments based on model performance. The discovered environments can then be used to train invariant learning algorithms. Despite its simplicity compared to more complex baselines, FEED effectively identifies environments with a high group sufficiency gap. Our contributions can be summarized as follows:
We present a novel environment discovery approach using the Generalized Cross-Entropy (GCE) loss function, ensuring the reference classifier leverages spurious correlations. Subsequently, we partition the dataset into two distinct environments based on the performance of the reference classifier and employ invariant learning algorithms to remove biases.
We study the environments in invariant learning from the perspective of the “Environment Invariance Constraint” (EIC), which forms the foundation for FEED.
We introduce the Square-MNIST dataset to evaluate our model in more challenging scenarios where the true causal features (strokes) and the spurious features (squares) closely resemble each other. Our evaluation demonstrates the superior performance of FEED compared to other environment discovery approaches.
Figure 1.
Training dynamics for the CMNIST benchmark. For bias-aligned samples, the label y can be easily predicted from the spurious associations; however, for other samples, this spurious correlation does not apply. While the loss for bias-aligned samples decreases quickly, for other samples the loss increases during the early epochs.
2. Related Works
Bias Removal without Environment Labels. Since obtaining environment or group annotations can be costly or infeasible, various methods have been proposed to remove biases by exploiting the mistakes of an ERM model (also known as a reference model). One line of work utilizes these mistakes to reweight the data for training the primary model [7,13,19,20,21,22]. For example, ref. [7] up-weights the error samples of the reference model, and ref. [13] determines importance weights based on the relative cross-entropy losses of the reference and primary models. These methods, however, differ from ours because instead of training a classifier with curated importance weights, we trained an invariant predictor. Another line of work leverages the mistakes to apply an invariant learning algorithm [6,23,24]. Refs. [23,24] both train a GroupDRO model by inferring subclasses from the representations learned by the reference model. The most closely related work to our paper is EIIL [6], which infers the environments for invariant learning by maximizing the regularization term of IRM. The main drawback of the above-mentioned methods is the assumption that the ERM model always learns the shortcut. This is the case in benchmarks like CMNIST, which are specifically created to frustrate ERM [25]. However, we show that these methods fail on simpler tasks that do not follow this assumption. Another group of works trains a separate network to find either sample weights or environment assignment probabilities. Ref. [26], for instance, extends DRO using an auxiliary model to compute the importance weights. However, rather than training an online fair model for accurate predictions within a given distribution, we aim to find data partitions that allow us to employ invariant learning techniques to address distribution shifts [6]. ZIN [27] also uses an auxiliary network to learn a partition function based on IRM; this structure cannot be generalized to provide environments for other robust algorithms. Ref. [28] likewise proposes a framework to partition the data; however, their method is limited to the case where the input can be decomposed into invariant and variant features. Other works create domains for adversarial training [29], but we focus on invariant learning due to the limitations of adversarial methods.
Invariant Learning. Recent studies have addressed biases by learning invariances in the training data. Motivated by causal discovery, IRM [3] and its variants [25,30,31,32,33] learn a representation such that the optimal classifier built on top of it is the same for all training environments. LISA [34] also learns invariant predictors via selective mix-up augmentation across different environments. Other methods such as Fish [35], IGA [36], and Fishr [37] introduce gradient alignment constraints across training environments. Another large class of methods for generalizing beyond the training data is distributionally robust optimization (DRO) [5,38,39,40]. REx [17] and GroupDRO [5] are notable instances of DRO methods, aiming to find a solution that performs equally well across all environments. The success of the above-mentioned methods depends on environment partitions or group annotations. However, these annotations are often unavailable or expensive in practice. Beyond the methods discussed above, adversarial training is another popular approach for learning invariant or conditionally invariant representations [15,16,29,41,42]. However, the performance of adversarial training degrades in settings where the distribution shift affects the marginal distribution of the labels [3,42]. Due to these limitations, recent works have focused on learning invariant predictors.
3. Frustratingly Easy Environment Discovery
In this section, we present our Frustratingly Easy Environment Discovery (FEED) framework for partitioning a dataset into environments (domains) tailored for invariant learning. Our approach does not require prior knowledge of environment assignments or protected groups. Instead, we assume that the training dataset is affected by a shortcut that the model might learn in order to accurately predict outcomes for the majority of samples [3,5,6,17]. This shortcut, however, does not apply to the remaining samples, which may be either bias-conflicting or bias-irrelevant. Formally, we consider a dataset $\mathcal{D} = \bigcup_{e \in \mathcal{E}_{tr}} \mathcal{D}^e$, where $\mathcal{D}^e = \{(x_i^e, y_i^e)\}_{i=1}^{n_e}$ denotes the observational data from training environment $e \in \mathcal{E}_{tr}$. In each environment, data are generated from the same input and label spaces $\mathcal{X} \times \mathcal{Y}$ according to some distribution. The environments differ in how the labels are spuriously correlated with the spurious attribute $a$. In an invariant learning problem, the goal is to find a predictor function $f: \mathcal{X} \rightarrow \mathcal{Y}$ that generalizes well across all possible environments in $\mathcal{E}_{all}$. However, the required environment assignments are not always available. In this paper, we aim to create useful environments to remove shortcuts and enhance generalization. After discovering the environments, we evaluate their efficacy by measuring the sufficiency gap [6] and their practical utility in mitigating biases using invariant learning.
We begin by defining the Environment Invariance Constraint (EIC) [6]. The EIC is an important condition that invariant predictors must satisfy. Assume $\mathcal{H}$ is a representation space and $\Phi: \mathcal{X} \rightarrow \mathcal{H}$ denotes the parameterized mapping or model that we optimize. We refer to $\Phi(x)$ as the representation of sample $x$. Invariant models learn a representation $\Phi$ that is simultaneously optimal for all environments; i.e., it has stable relationships with $y$ across environments. In addition, for regular loss functions such as cross-entropy and mean squared error, optimal classifiers can be expressed as conditional expectations of the output variable. Therefore, the data representation function $\Phi$ must satisfy the Environment Invariance Constraint (also known as the Invariance Property), defined as:
$$\mathbb{E}\left[y \mid \Phi(x), e\right] = \mathbb{E}\left[y \mid \Phi(x), e'\right] \quad \forall\, e, e' \in \mathcal{E}_{tr}.$$
This means that invariant models learn a set of features such that the conditional distribution of outcomes given the representation is invariant across all training environments. Our goal was to partition a training dataset into environments that promote effective invariant learning by making the EIC maximally restrictive. In other words, we sought environments such that the invariant learning method cannot satisfy the EIC unless it learns invariant associations and ignores shortcuts.
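To make this constraint concrete, the following minimal sketch (our own illustration, not part of the FEED pipeline) probes EIC violation for a scalar representation by binning it and comparing the per-environment conditional means of y; the function name and toy data are assumptions made purely for exposition.

import numpy as np

def eic_violation(phi, y, env, n_bins=10):
    """Empirical probe of the EIC for a scalar representation phi(x).

    Bins phi, estimates E[y | bin, e] in each environment, and returns the
    largest gap between environments over shared bins. Large values indicate
    that the conditional expectation of y given phi(x) shifts across environments.
    """
    edges = np.quantile(phi, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(phi, edges[1:-1])                 # bin index in [0, n_bins - 1]
    gaps = []
    for b in range(n_bins):
        means = [y[(bins == b) & (env == e)].mean()
                 for e in np.unique(env) if ((bins == b) & (env == e)).any()]
        if len(means) > 1:
            gaps.append(max(means) - min(means))
    return max(gaps) if gaps else 0.0

# Toy example: a shortcut feature whose relation to y flips across two environments.
rng = np.random.default_rng(0)
env = rng.integers(0, 2, size=5000)
y = rng.integers(0, 2, size=5000).astype(float)
shortcut = np.where(env == 0, y, 1 - y) + 0.1 * rng.normal(size=5000)  # violates the EIC
stable = y + 0.5 * rng.normal(size=5000)                               # roughly satisfies it
print(eic_violation(shortcut, y, env))  # close to 1: strong violation
print(eic_violation(stable, y, env))    # close to 0: invariant relationship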
Following [36], we defined the invariant set as $\mathcal{I}_{all} = \{\Phi \mid \mathbb{E}[y \mid \Phi(x), e] = \mathbb{E}[y \mid \Phi(x), e'],\ \forall e, e' \in \mathcal{E}_{all}\}$. Similarly, given the training environments, we can define $\mathcal{I}_{tr} = \{\Phi \mid \mathbb{E}[y \mid \Phi(x), e] = \mathbb{E}[y \mid \Phi(x), e'],\ \forall e, e' \in \mathcal{E}_{tr}\}$. $\mathcal{I}_{all}$ is the set of features that are invariant for all possible unseen environments $\mathcal{E}_{all}$. However, using the training environments $\mathcal{E}_{tr}$, we can only learn $\mathcal{I}_{tr}$. The learned predictor is only invariant to such limited environments, and it is not guaranteed to be invariant with respect to all possible environments $\mathcal{E}_{all}$ [28]. As a result, for a set of training environments $\mathcal{E}_{tr} \subseteq \mathcal{E}_{all}$, we have $\mathcal{I}_{all} \subseteq \mathcal{I}_{tr}$. Intuitively, the invariant set $\mathcal{I}_{all}$ is smaller because it has to generalize across all domains. Hence, not all environments help tighten the invariant set, and even available labeled environments may be insufficient for learning the optimal $\Phi$, as we will empirically demonstrate in the Experiment Section. Additionally, in many real-world applications, environments may not be available at all. This motivated us to study how to exploit the latent intrinsic variations in the training data to discover refined environments.
Since the spurious attribute a can introduce shortcuts for the labels y, it follows that there exist latent intrinsic spurious features in our input samples x, e.g., the digit color in CMNIST or the background in the Waterbirds dataset. However, these shortcuts can vary across domains and degrade generalization. To put it formally, for a pair $(\Phi(x), y)$ satisfying the EIC, there may exist spurious features $\Psi(x)$ such that $\mathbb{E}[y \mid \Psi(x), e]$ can arbitrarily change across environments. Higher variation of $\mathbb{E}[y \mid \Psi(x), e]$ among environments leads to a smaller $\mathcal{I}_{tr}$, since more variant (unstable) features can be excluded by leveraging invariant learning algorithms, thereby bringing us closer to $\mathcal{I}_{all}$. In this regard, we redefine our research question as "how can we effectively partition the dataset into environments with significant variations in $\mathbb{E}[y \mid \Psi(x), e]$?".
While, in general, we may require a large number of environments to tighten $\mathcal{I}_{tr}$, in most cases two environments may suffice to recover invariance [3,6,17]. These are situations where the EIC cannot be satisfied for two different environments, $\varepsilon_1$ and $\varepsilon_2$, unless $\Phi$ extracts the causal invariance [3]. To discover such environments, one approach is to partition the dataset into two opposite environments based on the agreement between the label y and the spurious attribute a. In one environment, the network can directly use the shortcut to make predictions (i.e., they agree). In the second environment, however, the association between the label and the shortcut does not apply, meaning that the network would have to use the shortcut in a reverse manner (i.e., they disagree) to make correct predictions. This setup creates two environments with diverse $\mathbb{E}[y \mid \Psi(x), e]$ because the association between the label and the spurious attribute exhibits significant variation.
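When the spurious attribute a is actually observed, such an opposite-association split is trivial to construct. The short sketch below is a hypothetical oracle illustration (not part of FEED, which never accesses a): it simply groups samples by whether y and a agree, which is the partition FEED aims to approximate.

import numpy as np

def oracle_partition(y, a):
    """Oracle split by label/shortcut agreement (requires observing the spurious attribute a).

    env1 collects bias-aligned samples (y and a agree); env2 collects bias-conflicting ones.
    FEED aims to approximate this partition without ever observing a.
    """
    agree = (y == a)
    return np.flatnonzero(agree), np.flatnonzero(~agree)

# Toy CMNIST-like setup: color agrees with the binary label 90% of the time.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
color = np.where(rng.random(1000) < 0.9, y, 1 - y)
env1, env2 = oracle_partition(y, color)
print(len(env1), len(env2))  # roughly 900 bias-aligned vs. 100 bias-conflicting samples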
We aimed to generate two environments with opposite associations between labels and shortcuts. To achieve this, we trained a neural network M as a reference classifier for partitioning the dataset. We then compared the performance of model M and a dummy model $\tilde{M}$ to separate bias-aligned and bias-conflicting samples. This way, we ensured that the two environments exhibited reverse associations. To guarantee that our reference classifier M utilized the shortcut for predictions, we intentionally forced M to make predictions based on the shortcut. Analyzing the training loss dynamics, we observed that the training loss of samples containing shortcuts decreases quickly, whereas the loss for other samples first increases and then decreases (Figure 1). Empirical evidence suggests that neural networks tend to rely on shortcuts that may exist in the dataset and memorize them during the early stages of training, as these concepts are often simpler than the main task [4,13,43]. Therefore, by deliberately reinforcing the predictions of model M in the early stages of training, we could encourage it to learn the intrinsic spurious features $\Psi(x)$. We accomplished this using the Generalized Cross-Entropy (GCE) [44] loss function:
$$\mathrm{GCE}(p(x; \theta), y) = \frac{1 - p_y(x; \theta)^q}{q},$$
where $p_y(x; \theta)$ is the softmax output for the element of $p(x; \theta)$ corresponding to the target y, and $q \in (0, 1]$ is a hyperparameter that controls the degree of amplification. Using L'Hôpital's rule, GCE is equivalent to the standard Cross-Entropy (CE) when $q \rightarrow 0$ [44]. Compared to the Cross-Entropy, GCE weighs the gradient of each sample by an additional factor of $p_y(x; \theta)^q$, i.e., $\frac{\partial\, \mathrm{GCE}(p, y)}{\partial \theta} = p_y(x; \theta)^q \, \frac{\partial\, \mathrm{CE}(p, y)}{\partial \theta}$, where $\theta$ denotes the model parameters. As a result, using the GCE loss, we could place more emphasis on samples for which the model has higher confidence (i.e., a higher softmax output). Since the shortcut is easier and learned from the early epochs, this encourages our reference classifier to focus more on such samples.
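As a reference, a minimal PyTorch-style sketch of this loss might look as follows (our own illustration following the standard GCE formulation; the class name and the default q = 0.7, a value commonly used in the literature, are not prescribed by this paper).

import torch
import torch.nn.functional as F

class GeneralizedCELoss(torch.nn.Module):
    """Generalized Cross-Entropy: GCE(p, y) = (1 - p_y^q) / q.

    Recovers the standard cross-entropy as q -> 0 and up-weights high-confidence
    (easy, typically bias-aligned) samples for larger q.
    """
    def __init__(self, q: float = 0.7):
        super().__init__()
        self.q = q

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(logits, dim=1)
        # Softmax probability assigned to the target class of each sample.
        p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        loss = (1.0 - p_y.clamp(min=1e-8) ** self.q) / self.q
        return loss.mean()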
Furthermore, it was crucial to ensure that, as we continued training model M for more epochs, the model did not overfit on bias-conflicting samples (Figure 1). This precaution guaranteed that our reference classifier made predictions solely based on the shortcut. In this regard, we proposed to train M only on bias-aligned samples. We began with two randomly assigned environments $\varepsilon_1$ and $\varepsilon_2$ (sampled from a discrete uniform distribution) of equal size. We then selected one of these two random environments, say $\varepsilon_1$, as the initialization of the biased environment used to train the reference classifier. After each training epoch, we updated both $\varepsilon_1$ and $\varepsilon_2$ based on a difficulty score that reflects how challenging each sample is. We chose to use the minimum of the per-sample Cross-Entropy loss of model M and model $\tilde{M}$, as it provides a continuous metric that can be easily compared. Since model M is intentionally biased, it exhibits superior performance (i.e., a lower Cross-Entropy loss) on biased samples, while model $\tilde{M}$ uses the shortcut in the opposite direction and performs better on bias-conflicting samples. Consequently, as we iteratively updated the environment partitions, $\varepsilon_1$ progressively contained more bias-aligned samples, while $\varepsilon_2$ comprised an increasing proportion of bias-conflicting samples. This approach ensures that model M continues training on an increasingly biased dataset without overfitting on all samples. Algorithm 1 provides the pseudocode for FEED. Following the partitioning of the training data into two environments, we can then apply invariant learning algorithms. Additionally, we empirically observed that we could use FEED to estimate groups based on the pair $(y, \varepsilon)$ (rather than $\varepsilon$ alone) for the GroupDRO algorithm and achieve favorable performance.
Algorithm 1 FEED Algorithm
Input: dataset $\mathcal{D}$, model M
Output: environments $\varepsilon_1$, $\varepsilon_2$
1: Randomly initialize $\varepsilon_1$ and $\varepsilon_2$ using a discrete uniform assignment
2: for epochs do
3:   train M on $\varepsilon_1$ by minimizing $\mathrm{GCE}(p(x; \theta), y)$
4:   for $(x_i, y_i) \in \mathcal{D}$ do
5:     if $\mathrm{CE}(M(x_i), y_i) < \mathrm{CE}(\tilde{M}(x_i), y_i)$ then
6:       Assign $(x_i, y_i)$ to $\varepsilon_1$
7:     else
8:       Assign $(x_i, y_i)$ to $\varepsilon_2$
9:     end if
10:   end for
11: end for
12: return $\varepsilon_1$, $\varepsilon_2$
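For concreteness, a compact PyTorch sketch of how Algorithm 1 might be realized is given below. Since the dummy comparison model is not fully specified in this excerpt, we stand in for it with a chance-level (uniform) predictor whose per-sample cross-entropy is log C; the helper names, batch sizes, and that substitution are assumptions for illustration, not the exact implementation.

import math
import torch
from torch.utils.data import DataLoader, Subset

def gce_loss(logits, targets, q=0.7):
    """Generalized Cross-Entropy: (1 - p_y^q) / q, averaged over the batch."""
    p_y = torch.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.clamp(min=1e-8) ** q) / q).mean()

def feed_partition(model, dataset, num_classes, epochs=10, lr=1e-3, device="cpu"):
    """Sketch of FEED (Algorithm 1): train a deliberately biased reference model
    with GCE on env1 only, then reassign every sample to env1/env2 after each epoch
    by comparing its CE loss against a chance-level baseline (a stand-in for the
    paper's dummy model)."""
    n = len(dataset)
    env1_idx = torch.randperm(n)[: n // 2]           # random equal-sized initialization
    chance_ce = math.log(num_classes)                # CE of a uniform (dummy) predictor
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(epochs):
        # Train the reference model on the (increasingly biased) env1 only.
        model.train()
        for x, y in DataLoader(Subset(dataset, env1_idx.tolist()), batch_size=128, shuffle=True):
            optimizer.zero_grad()
            loss = gce_loss(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()

        # Recompute per-sample CE losses and reassign the environments.
        model.eval()
        ce_all = []
        with torch.no_grad():
            for x, y in DataLoader(dataset, batch_size=256, shuffle=False):
                ce = torch.nn.functional.cross_entropy(
                    model(x.to(device)), y.to(device), reduction="none")
                ce_all.append(ce.cpu())
        ce_all = torch.cat(ce_all)
        env1_idx = torch.nonzero(ce_all < chance_ce, as_tuple=True)[0]

    mask = torch.ones(n, dtype=torch.bool)
    mask[env1_idx] = False
    env2_idx = torch.nonzero(mask, as_tuple=True)[0]
    return env1_idx, env2_idx

The returned index sets can then be treated as environment labels when training invariant learning methods such as IRM, REx, or GroupDRO.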
Leveraging FEED allowed us to partition the dataset into two environments with high variation in spurious correlations. In these environments, an invariant model cannot satisfy the EIC unless it ignores the shortcut. While FEED employs the Generalized Cross-Entropy (GCE) loss to promote the learning of spurious correlations, other methods such as EIIL [6] and JTT [7] use the Cross-Entropy loss to train their reference models. However, Cross-Entropy may not always recover a biased model. Furthermore, unlike prior approaches [6,7] that utilize the entire dataset to train the reference classifier, we exclusively used $\varepsilon_1$ to train ours. This prevents overfitting and keeps the focus solely on spurious correlations; overfitting on all training samples would make partitioning the samples impossible. Moreover, rather than defining an optimization problem for environment discovery, as seen in previous works [6], we proposed a simple yet effective approach for updating the environment assignments at each epoch. Employing an optimization problem is not easily scalable to the mini-batch training paradigm of neural networks.