1. Introduction
When comparing a phenomenon across populations, it is essential to account for differences in population composition in order to avoid drawing misleading conclusions. For example, in analyzing differences in unemployment duration between men and women, it is necessary to control for differences in human capital and job tenure to ensure comparability. This can be carried out by estimating the outcome of interest in both populations under a common composition, i.e., the unemployment duration distribution that would arise if men and women shared the same levels of human capital and tenure.
Neison (1844) introduced standardization techniques to compare crude death rates across districts in Great Britain, accounting for differences in age structure. In this context, standardization entails re-estimating counterfactual values of the outcome under a common distribution of covariates. Typically, this distribution corresponds to that of one of the populations, although it may also be defined as the pooled distribution or another external benchmark. Later, Kitagawa (1955) formalized this approach and proposed decomposing the total difference in mortality rates into a composition effect, driven by differences in population characteristics, and a structure effect reflecting differences in group-specific risks. The latter term captures disparities in returns to observable characteristics and is often interpreted as reflecting institutional factors, behavioral responses, or potential discrimination (Fortin et al., 2011; Neumark, 1988; Oaxaca & Ransom, 1999). The structural effect should therefore be interpreted as a statistical measure of unexplained heterogeneity rather than as evidence of a specific causal mechanism. Kitagawa (1964) extended this framework to decomposing the differences between the characteristics of the cumulative distribution functions (CDFs) of two populations.
In the context of unemployment duration gender gaps, the composition effect measures the share of the observed difference that arises solely from disparities in observable characteristics. In contrast, the structure effect captures the residual difference that would remain even if both groups shared identical covariate distributions. Moreover, the composition effect can be further decomposed to assess the relative contribution of each covariate (Oaxaca & Ransom, 1999; Rothe, 2015).
In economics, counterfactual decompositions became prominent through their application to the analysis of gender wage gaps. The Oaxaca–Blinder (OB) decomposition, introduced independently by Oaxaca (1973) and Blinder (1973), partitions the difference between sample means into two counterfactual components based on a linear specification of the regression function. In this framework, the counterfactual mean is obtained as the average of fitted values using coefficient estimates from one population and covariate observations from the other. The OB approach has been extended to other features of the cumulative distribution function and applied widely in the study of gender wage differentials (e.g., Cain, 1986; Blau & Kahn, 1992; Oaxaca & Ransom, 1999; Machado & Mata, 2005). Related work has analyzed other wage differentials, including racial (Reimers, 1983; Melly, 2005), union (Freeman, 1980), skill-related (Juhn et al., 1991), cross-country (Donald & Hsu, 2014), policy-induced (Rothe, 2015), and immigrant–native (Chiquiar & Hanson, 2005) gaps. Lemieux (2002) provides a comprehensive review of methodologies for decomposing changes in wage distributions.
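The construction of the OB counterfactual mean described above can be illustrated with a minimal sketch. It assumes a single regressor and ordinary least squares, and it takes group 1 as the standard; the function names `ols_fit` and `oaxaca_blinder` are hypothetical and chosen only for this illustration.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on one regressor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def oaxaca_blinder(x0, y0, x1, y1):
    """Decompose mean(y0) - mean(y1), taking the covariates of group 1 as the standard."""
    a0, b0 = ols_fit(x0, y0)
    # Counterfactual mean: group 0 coefficients evaluated at group 1 covariates.
    counterfactual = mean(a0 + b0 * xi for xi in x1)
    composition = mean(y0) - counterfactual   # same coefficients, different covariates
    structure = counterfactual - mean(y1)     # same covariates, different coefficients
    return {
        "total": mean(y0) - mean(y1),
        "composition": composition,
        "structure": structure,
        "counterfactual_mean": counterfactual,
    }
```

By construction the two components add up to the total difference in means. Which group serves as the standard, and the sign convention of each component, are choices of this sketch.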
Nonparametric and semiparametric extensions relax the linearity assumption to address mean and distributional decompositions, often relying on kernel regression or density estimation (e.g., Stock, 1989; DiNardo et al., 1996). For other distributional features, decompositions employ flexible nonparametric or semiparametric specifications of the conditional CDF. Fortin et al. (2011) provide an extensive overview of these methodological developments. More recent contributions include Guo and Basse (2021), who generalize the OB framework to nonlinear models, and Charpentier and Flachaire (2024), who propose a simultaneous decomposition of means and inequality.
Although these methods are widely used in applied work, relatively few studies focus on duration outcomes, such as unemployment spells. Analyzing duration data requires stronger identification assumptions, particularly concerning the censoring mechanism. We propose counterfactual decomposition techniques for right-censored duration outcomes that encompass alternative standardizations of the distribution function, using both nonparametric and semiparametric specifications of the underlying CDF. We also provide a detailed discussion of how to implement OB-type decompositions in this setting. The original OB proposal cannot be directly extended to distributional features that are nonlinear in the underlying conditional CDF. In particular, the conditional hazard rate (CHR) is the derivative of the conditional cumulative hazard, which is defined as the integral of the increments of the conditional CDF over the corresponding survival function. The averaged CHR (ACHR) is not directly related to the marginal hazard rate (HR), and the two can exhibit very different shapes, as shown in the empirical example. Furthermore, the interpretation of the ACHR is not obvious. The HR is the main research tool in duration analysis, providing valuable information on the underlying duration distribution. In the context of unemployment spells, it measures the instantaneous probability of finding a job after remaining unemployed for a period of length t. Nevertheless, as we show in this article, the counterfactual HR can be derived from the counterfactual CDF, and this estimator can be used to decompose the difference between the HR estimates in the two populations.
The Cox proportional conditional hazard rate (PCHR) model has been widely used to study gender gaps in unemployment duration, typically through the averaged conditional hazard rate (ACHR), i.e., the conditional hazard rate integrated with respect to the marginal distribution of covariates; see, e.g., Ham et al. (1999), Du and Dong (2009), and Tansel and Tasci (2010). Since the ACHR and the marginal hazard rate (HR) are generally unrelated, inference based on the ACHR may be misleading. To our knowledge, no studies have implemented direct standardization of the HR or other distributional features under right censoring, nor have these methods been applied beyond unemployment. Beyond unemployment analysis, the proposed methods can be applied to a wide range of outcomes that can be expressed as duration times. Examples include time to school dropout, time to credit default or repayment, duration until job promotion or contract termination, patient recovery times in medical studies, or the length of stay in migration. The counterfactual decomposition methodology presented in this article is directly transferable to these domains, offering a unified framework for analyzing differences in censored duration outcomes.
Because the mean duration cannot be consistently estimated in the presence of censoring, comparisons based on the restricted mean survival time (RMST), i.e., the expected duration within a fixed time window, are more appropriate. For example, comparisons over the first 12 months (short-run unemployment) or 24 months (long-run unemployment) are meaningful. We extend our framework to incorporate RMST-based standardization and counterfactual decomposition. Our analysis connects to this broader research agenda, complementing recent work on counterfactual survival analysis (Chapfuwa et al., 2021), on the role of heterogeneous survival expectations in shaping structural effects (de Bresser, 2024), and on semiparametric methods for counterfactual inference in duration models (Hausman & Woutersen, 2014).
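As an illustration of the RMST as the expected duration within a fixed window, the following minimal sketch computes it as the area under the Kaplan–Meier survival curve up to a horizon `tau`. The function names are hypothetical, and the sketch assumes `delta[i] = 1` for an uncensored duration and `0` for a censored one.

```python
def km_survival(y, delta):
    """Kaplan-Meier survival estimate as a list of (event_time, survival) steps."""
    data = sorted(zip(y, delta))
    n = len(data)
    steps, surv, i = [], 1.0, 0
    while i < n:
        t = data[i][0]
        at_risk = n - i            # individuals with duration >= t
        events = 0
        j = i
        while j < n and data[j][0] == t:
            events += data[j][1]   # count uncensored durations equal to t
            j += 1
        if events > 0:
            surv *= 1.0 - events / at_risk
            steps.append((t, surv))
        i = j
    return steps

def rmst(y, delta, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_survival(y, delta):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)
```

For example, with durations `[1, 2, 3, 4]` and indicators `[1, 0, 1, 1]`, `rmst(..., tau=4)` integrates a curve that equals 1 on [0, 1), 0.75 on [1, 3), and 0.375 on [3, 4).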
We evaluate the performance of the proposed methods through Monte Carlo simulations under alternative data-generating processes, and we illustrate their empirical application by decomposing gender gaps in unemployment duration in Spain during 2004–2007, using data from the European Union Statistics on Income and Living Conditions (EU-SILC). The results indicate that composition effects account for only a small fraction of the observed gender gap. Moreover, estimates based on ACHR and HR differ substantially, with nonparametric approaches providing more robust and reliable insights.
The remainder of this article is structured as follows. Section 2 introduces the main notation, standardization techniques, and counterfactual decomposition methods. Section 3 develops estimators based on both the PCHR model and nonparametric specifications of the conditional distribution. Section 4 evaluates the finite-sample performance through Monte Carlo simulations. Section 5 applies the proposed methods to Spanish unemployment duration data. Finally, Section 6 concludes with key findings and remarks.
2. Standardization Under Censoring
Consider the duration random variable for populations observed under right censoring according to variable , . Counterfactual decompositions must be performed from the observed random vector where is the observed vector of population components, or characteristics, with explanatory power for , and indicates whether or not the observed duration is censored with , the indicator function of event A. Censoring appears due to the lack of follow-up for the individuals. When individuals are observed over a fixed period, complete durations are not always available because the relevant event did not occur at the end of the observation period (administrative censoring) or because the individual drops out before completion (loss to follow-up).
Let
be the joint cumulative distribution function (CDF) of
and
the corresponding CDF of
given
, i.e.,
for all
and
where
is the corresponding CDF of a generic random variable, or random vector,
, and
is the upper bound of its support. Note that the sample does not provide information beyond
, which implies that
cannot be identified from the CDF of the observable random vector
. Thus, the consistent estimation of the sub-DF,
is the best we can hope for, where for any generic CDF
and
is the number of jumps in
Notice that
for all
, and also for all
when either,
.
The standardized version of
, taking population
s as standard, is
which represents the distribution that population
j would exhibit if it had the covariate distribution of population
s. For instance, in the context of gender differences in unemployment duration, this would correspond to the distribution of unemployment duration for women if they had the same observable characteristics as men. Henceforth, we take for granted that integrals in (
1) are well defined, which requires that the support of the components
in the standard population is contained in the support of the standardized population. Notice that
The counterfactual distribution
is the basis for identifying their moments or other characteristics. In particular, the standardized cumulative HR is
where
is the standardized HR. In the context of unemployment duration analysis, this is interpreted as the instantaneous probability of finding a job for a worker who has been unemployed for a period of length t, taken at random from population j, if population j had the same distribution of covariates as population s.
As mentioned in the introduction, the existing applied work on counterfactual analysis of unemployment duration gender gaps is based on the ACHR,
where
is the CHR, defined as (
2) in terms of the cumulative CHR
Obviously, the
and
shapes are not necessarily related.
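To see why the two shapes can differ, it may help to write both objects side by side. The sketch below uses notation assumed here for illustration only: $F_j(t \mid x)$ and $f_j(t \mid x)$ denote the conditional CDF and density of duration in population $j$, $\lambda_j(t \mid x)$ the CHR, and $F_{X_s}$ the covariate distribution in population $s$. It additionally assumes continuous durations.

```latex
\begin{align*}
F_j^{s}(t) &= \int F_j(t \mid x)\, dF_{X_s}(x), \\
\lambda_j^{s}(t) &= \frac{dF_j^{s}(t)/dt}{1 - F_j^{s}(t)}
  = \frac{\int f_j(t \mid x)\, dF_{X_s}(x)}{\int \{1 - F_j(t \mid x)\}\, dF_{X_s}(x)}
  = \int \lambda_j(t \mid x)\, \omega_j^{s}(t, x)\, dF_{X_s}(x), \\
\omega_j^{s}(t, x) &= \frac{1 - F_j(t \mid x)}{\int \{1 - F_j(t \mid u)\}\, dF_{X_s}(u)}, \\
\mathrm{ACHR}_j^{s}(t) &= \int \lambda_j(t \mid x)\, dF_{X_s}(x).
\end{align*}
```

Under these assumptions, the standardized HR is a survival-weighted average of conditional hazards, with weights that change with $t$, whereas the ACHR weights every covariate value equally at all durations. The two coincide only in special cases, for instance when the CHR does not depend on $x$.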
Alternative estimators can also be considered, relying on different specifications of the conditional CDF, which is equivalently characterized by the conditional hazard rate (CHR) for censored duration variables. Examples include semiparametric approaches such as the accelerated failure time (AFT) model (Kalbfleisch & Prentice, 1973), the proportional odds (PO) model (Clayton, 1976), or, more recently, the distributional regression framework for censored duration data proposed by Delgado et al. (2022). These alternatives to the Cox model may be preferred depending on the context. In this article, we focus on Cox’s specification because it is by far the most popular. The advantage of a nonparametric model is that the underlying CHR is left unspecified, and the regularity conditions needed to justify the consistency of the proposed partition estimator are minimal when the number of classes in the partition is fixed. Notice that, because we estimate integrals using KM weights, independence between the survival variable T and the censoring variable C is needed. Naturally, other nonparametric estimators could also be employed.
While the hazard rate remains the central distributional feature in duration analysis, mean-based comparisons are often of interest from a policy perspective. Since the unconditional mean of duration cannot be consistently estimated under censoring, a natural alternative is the restricted mean survival time (RMST), defined as,
The RMST is the average duration in the first
periods. For instance, if
, this is the average unemployment duration during the first 12 months, which is related to short-term unemployment.
Chen and Tsiatis (2001), Karrison (1987), Zhang and Schaubel (2011), and Zucker (1998) provide applications of RMST to different contexts. The parameter of the corresponding counterfactual CDF is
The crucial step to estimate the standardized quantities, either , , or requires the prior estimation of . To this end, we consider two possibilities. One is based on the PCHR specification, which assumes that the underlying CHR, belongs to the family of monotonically increasing functions where and .
When the CHR specification is correct, there exists a
and
such that
The identification of
and
requires that
- A.0.
is independent of conditionally on
The standardized
with respect to population
assuming that
, is
with corresponding cumulative HR
with HR
which is typically unrelated to the ACHR
The corresponding standardized RMST is,
Note that
for all
under a correct specification, but this is not necessarily the case under misspecification.
We can avoid specifying the CDF by noticing that standardizations can also be performed from any given partition
of
such that
for all
since
with
and
This suggests the standardization
Thus, it is not necessary to specify
to obtain the components in the counterfactual decomposition, but only
Notice that, unlike the semiparametric standardization
we always have that
despite the actual underlying CDF shape, but
for
In order to identify
for
we need to assume that,
- A.1
is independent of
- A.2
and are independent conditional on a.s.,
Condition A.1 is the standard identification condition for the nonparametric
which justifies the consistency of the Kaplan–Meier (KM) product limit estimator using censored data (
Kaplan & Meier, 1958). Assumption A.2 is the extra condition, provided by
Stute (
1993) to identify
, which establishes the relation between the covariates and the censoring mechanism. Notice that we have a different standardization
for each possible partition, which is the cost that one must pay for not imposing a more restrictive specification.
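The partition-based standardization can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator as implemented in this article: class labels are taken as precomputed, the KM weights follow the product form in Stute (1993), and the class shares in both populations are estimated by plain sample proportions, which is possible because covariates are observed for censored individuals as well. All function and argument names are hypothetical.

```python
def km_weights(y, delta):
    """Kaplan-Meier weights attached to each observation, in the original order."""
    n = len(y)
    # Sort by duration; among ties, uncensored observations come first.
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    weights = [0.0] * n
    surv = 1.0  # KM survival just before the current observation
    for rank, i in enumerate(order):
        at_risk = n - rank
        if delta[i] == 1:
            weights[i] = surv / at_risk
            surv *= (at_risk - 1) / at_risk
    return weights

def standardized_cdf(t, y_j, delta_j, class_j, class_s):
    """CDF of population j at t, reweighted to the class shares of population s."""
    w = km_weights(y_j, delta_j)
    n_j, n_s = len(y_j), len(class_s)
    total = 0.0
    for k in set(class_s):
        share_s = sum(1 for c in class_s if c == k) / n_s
        share_j = sum(1 for c in class_j if c == k) / n_j
        if share_j == 0:
            # Support condition: every class of the standard must occur in population j.
            raise ValueError(f"class {k} is empty in the standardized population")
        # KM-weighted estimate of P(T <= t, X in class k) in population j.
        joint = sum(wi for wi, yi, c in zip(w, y_j, class_j) if c == k and yi <= t)
        total += joint / share_j * share_s
    return total
```

With this choice of class shares, standardizing a population with respect to itself returns the KM estimate of its marginal CDF for any partition. In small samples the reweighted value is not guaranteed to stay below one.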
The standardized RMST using partitions
is
An alternative standardization for the RMST relies on a semiparametric specification of the conditional mean of the restricted version of the outcome
, i.e., imposing restrictions on
since
In particular, a linear specification of
produces the OB standardization analog for the RMST. That is, taking into account that
with parameters
such that
has zero mean and is uncorrelated with
the OB standardization of
is
.
The total CDF difference
can be written in terms of the counterfactual effects using
as
where
is the counterfactual structural effect,
is the counterfactual composition effect, and
. We can perform a similar decomposition using
but we must take into account that the decomposition is for
which differs from
under the misspecification of the conditional CDF
.
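For concreteness, one way of writing such a decomposition is sketched below. The notation is assumed for illustration: $F_j$ is the marginal duration CDF of population $j$ and $F_j^{s}$ its standardized version with population $s$ as the standard. The ordering and sign conventions of the two terms are choices of this sketch and need not match those of (6).

```latex
\begin{align*}
F_j(t) - F_s(t)
  &= \underbrace{\bigl[F_j(t) - F_j^{s}(t)\bigr]}_{\text{composition effect}}
   + \underbrace{\bigl[F_j^{s}(t) - F_s(t)\bigr]}_{\text{structure effect}},
  \qquad j \neq s .
\end{align*}
```

The first bracket holds the conditional distribution of population $j$ fixed and changes the covariate distribution from that of $j$ to that of $s$. The second bracket holds the covariate distribution of $s$ fixed and changes the conditional distribution from that of $j$ to that of $s$.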
The HR or the RMST differences can be decomposed differently from (
6). That is, if
represents a particular distributional feature of population
e.g.,
can be
,
, or
3. Estimation
The sample observed consists of as . Henceforth, ties within censored or uncensored duration times are ordered arbitrarily, and ties among uncensored and censored durations are treated as if the former precedes the latter.
Assuming a PCHR specification,
is estimated by
where
is the
Breslow (
1974) estimator of
and
is the Partial Maximum Likelihood (PML) estimator of
(
Cox, 1972). Henceforth, we avoid references to the sample size.
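The semiparametric standardization can be illustrated with a minimal sketch for a single covariate. It takes the coefficient `beta_j` as given, for instance a PML estimate computed by a separate routine that is not shown, computes the Breslow estimate of the baseline cumulative hazard, and averages the fitted conditional CDF over the covariates of the standard population. The function names are hypothetical.

```python
import math

def breslow_cumhaz(t, y, delta, x, beta):
    """Breslow estimate of the baseline cumulative hazard at t, for a given beta."""
    risk = [math.exp(beta * xi) for xi in x]
    total = 0.0
    for yi, di in zip(y, delta):
        if di == 1 and yi <= t:
            # Sum of exp(beta * x) over individuals still at risk at time yi.
            denom = sum(r for r, yl in zip(risk, y) if yl >= yi)
            total += 1.0 / denom
    return total

def cox_standardized_cdf(t, y_j, delta_j, x_j, beta_j, x_s):
    """Average the fitted conditional CDF of population j over covariates of population s."""
    base = breslow_cumhaz(t, y_j, delta_j, x_j, beta_j)
    fitted = [1.0 - math.exp(-base * math.exp(beta_j * xs)) for xs in x_s]
    return sum(fitted) / len(fitted)
```

When `beta_j = 0`, the Breslow estimate reduces to the Nelson–Aalen estimator and the standardized CDF no longer depends on the covariates of the standard population.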
The weak convergence of
is an immediate consequence of the well-developed asymptotic theory for
.
Tsiatis (
1981) showed that, under
and
Thus, applying Theorem 4.1 in
Chernozhukov et al. (
2013), for
when the PCHR specification is correct,
where
as
.
Asymptotic confidence intervals for
can be obtained by applying results in Andersen and Gill (1982). Bootstrap confidence intervals can also be obtained using techniques designed for the PCHR model, as in Burr (1994), which can be justified along the lines of Theorem 4.2 in Chernozhukov et al. (2013).
The Cox estimator adapted to the counterfactual context is consistent when the CHR is correctly specified but inconsistent under misspecification. The formal justification of inferences based on the proposed counterfactual decompositions is beyond the scope of this article. The results in Chernozhukov et al. (2013) can be used to derive a functional central limit theorem (CLT) for the counterfactual estimator of the HR based on Cox’s specification. Inferences using the nonparametric CDF estimator based on partitions and KM weights can be carried out using results in Stute (1993) and Stute et al. (2000). These partition estimators can also be implemented when the regressors are not random variables but other random objects, such as those arising with spatial, network, or functional data. Inference based on the counterfactual CDF estimator when the number of sets in the partition diverges and the size of the sets shrinks to zero as the sample size grows could be justified using universal consistency results (Stone, 1980), as implemented for partition estimators of conditional moments by Györfi et al. (2002).
Under A.1,
is consistently estimated by the KM estimator,
where
are KM weights, and for any generic sequence
is the
concomitant of the ordered
i.e.,
if
. Then,
is the mass attached to
. Likewise, under A.1 and A.2,
is consistently estimated by
Therefore,
in (
5) is estimated by
with
and
Any of the above DF standardizations results in alternative estimates of the cumulative HR. Let
denote either
or
and the
estimator is
which is a jump function in both cases. The corresponding
estimate is the value of the jump
or the smooth version
where
and
K is a kernel function integrating to one.
Each standardized DF results in alternative
estimates, i.e.,
using the PCHR specification, and
when
is nonparametric, using the estimator based on grouped data.
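The passage from an estimated step CDF to a hazard estimate can be sketched as follows. The sketch assumes the CDF estimate is supplied through its jump points and the CDF values at those points, computes the cumulative hazard jumps as the CDF increment over the survival just before the jump, and smooths them with an Epanechnikov kernel. The function names and the fixed `bandwidth` argument are hypothetical; bandwidth selection is not addressed here.

```python
def cumulative_hazard_jumps(cdf_values):
    """Jumps of the cumulative hazard implied by a step CDF: dF(t) / (1 - F(t-))."""
    jumps, prev = [], 0.0
    for f in cdf_values:
        jumps.append((f - prev) / (1.0 - prev) if prev < 1.0 else 0.0)
        prev = f
    return jumps

def epanechnikov(u):
    """Epanechnikov kernel, which integrates to one over [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def smoothed_hazard(t, times, cdf_values, bandwidth):
    """Kernel-smoothed hazard at t: kernel-weighted sum of cumulative hazard jumps."""
    jumps = cumulative_hazard_jumps(cdf_values)
    return sum(
        epanechnikov((t - ti) / bandwidth) * dj for ti, dj in zip(times, jumps)
    ) / bandwidth
```

The same two functions apply to any of the standardized CDF estimates, since they only require the jump locations and the CDF values at those locations.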
An OB estimator of
can be obtained after noticing that
with
and
with
This suggests applying the OB approach by estimating
as
which is inconsistent when
is nonlinear.
Stute (
1993) shows, under
and
that
and that for any function
and
such that
,
is a consistent estimator of
for
which shows that
and
are consistent estimators of
and
respectively. This justifies that
is a consistent estimator of
Stute (
1993) derives the asymptotic distribution of
. Bootstrap confidence intervals can be obtained following the procedure described in Stute et al. (2000). A formal justification of the inferential properties is beyond the scope of this article.
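An OB-type standardization of the RMST with KM weights can be sketched as a weighted least-squares fit of the restricted duration on a single covariate, evaluated at the covariate mean of the standard population. The sketch makes two assumptions that are labeled in the code: a duration observed to reach `tau` is treated as uncensored after restriction, and the weights are normalized to sum to one. The names `km_weights` and `ob_km_rmst` are hypothetical.

```python
def km_weights(y, delta):
    """Kaplan-Meier weights attached to each observation, in the original order."""
    n = len(y)
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    weights, surv = [0.0] * n, 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank
        if delta[i] == 1:
            weights[i] = surv / at_risk
            surv *= (at_risk - 1) / at_risk
    return weights

def ob_km_rmst(y_j, delta_j, x_j, x_s, tau):
    """OB-type standardized RMST of population j, using covariates of population s."""
    # Assumption: min(T, tau) is fully observed whenever the observed duration reaches tau.
    y_r = [min(yi, tau) for yi in y_j]
    d_r = [1 if yi >= tau else di for yi, di in zip(y_j, delta_j)]
    w = km_weights(y_r, d_r)
    sw = sum(w)  # assumption: normalize in case the weights do not sum to one
    mx = sum(wi * xi for wi, xi in zip(w, x_j)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y_r)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x_j))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x_j, y_r))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Linear fit from population j evaluated at the mean covariate of population s.
    return intercept + slope * sum(x_s) / len(x_s)
```

A polynomial specification could be handled in the same way by adding powers of the covariate as regressors, at the cost of solving a small weighted normal-equation system instead of the closed form used here.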
Regarding the decompositions, we estimate
by
and the corresponding decomposition is
The estimated counterfactual structural and composition effects,
and
are consistent estimates of
and
respectively. Decompositions based on
are performed in the same way, but consistent
estimation requires the correct specification of the underlying CHR. Likewise, we can decompose the differences between HR and RMST estimates.
4. Monte Carlo Simulations
This section provides evidence on the finite-sample performance of the alternative standardization methods for the RMST using only one component. Henceforth, PML stands for the method using the Partial Maximum Likelihood estimator assuming a PCHR specification; OB-KM and OB-KM-Pol3 for the methods based on the classical OB decomposition, using KM weights, under linear and third-order polynomial specifications, respectively; and NP(m) for the nonparametric method with m = 3 and m = 10 classes of equal size.
We consider the following designs,
- DGP1:
- DGP2:
- DGP3:
Monte Carlo experiments are based on 1000 replications with sample sizes of 200, 800, and 3200. We report the root mean squared error (RMSE) for the different estimators. Simulations are performed using as the target parameter and without exploiting the fact that, in the three designs, and are unbounded, i.e., and Parameter is consistently estimated by and the RMST is calculated with
Table 1 reports results under DGP1 design. Since the PCHR specification is incorrect, the PML estimator is inconsistent for
,
, which explains the large biases. The regression function is linear and hence both OB-KM and OB-KM-Pol3 are consistent. As expected, OB-KM, which is asymptotically the most efficient, performs best in finite samples, but there are no significant losses from using the overparameterized OB-KM-Pol3. As expected, the nonparametric NP(3) and NP(10) estimators are less efficient.
Table 2 reports results under the DGP2 specification. The OB-KM and OB-KM-Pol3 are still consistent estimators of
, but they are inconsistent estimators of
for
. In turn, since the PCHR specification is correct, the PML is consistent and efficient. This is confirmed by the simulations. Interestingly, OB-KM-Pol3 is a fairly robust alternative to OB-KM and performs similarly to NP(
m).
Table 3 reports results under the DGP3. In this case, NP(
m), OB-KM, and OB-KM-Pol3 are consistent for
but the PML is inconsistent. However, OB-KM and OB-KM-Pol3 are inconsistent for
but NP(
m) consistently estimates
NP(
m) performs much better than both PML and OB-KM in this case. However, the results are sensitive to the number of classes chosen, i.e., NP(3) versus NP(10). OB-KM-Pol3 proves to be a robust alternative to OB-KM that captures nonlinearities in the underlying regression reasonably well.
We also study the effect of ignoring censoring in two ways: first, we assess the effect on the estimates when censoring is not taken into account; second, we analyze the effect of trying to estimate the mean, which is not identified because
is bounded.
Table 4 provides the RMSEs under DGP1 based on OLS fits using censored and uncensored observations, i.e., assuming that
is the actual duration. These biased estimates are compared with the corresponding OB-KM estimator. Simulation results show serious biases when ignoring censoring.
Table 5 illustrates the effect of neglecting the fact that
when estimating the duration mean. In this case,
cannot be consistently estimated beyond
, and unlike previous experiments, where
it is not possible to estimate
or
for
We compare the PML estimators of
under design DGP2, i.e., PML is efficient, when
is censored, using
as the censoring variable, i.e.,
.
Table 5 confirms high biases for estimating the mean when
, but the PML estimator still performs well as an estimator of
. This illustrates the importance of focusing statistical inference on truncated or restricted parameters such as the RMST.
5. Unemployment Duration Gender Gaps in Spain
This section investigates the causes of unemployment duration gender gaps in Spain using counterfactual decompositions of HR and RMST differences. We also provide a comparison between ACHR and HR estimates using the alternative specifications. The Spanish case is particularly interesting because it has experienced one of the highest unemployment rates among OECD countries in recent decades. According to official statistics, for the period 1995–2005, the average unemployment rate was around 6.8% in OECD countries and 5% in the US while it was 14% in Spain. Moreover, the difference in unemployment rates by gender has also been important. For the same period, women exhibited an unemployment rate 9 percentage points (p.p.) higher, while in the US this gap was around 0.04 p.p.
The existing literature has mainly paid attention to gender gaps in the aggregated unemployment rate (Azmat et al., 2006; Johnson, 1983; Niemi, 1974; Queneau & Sen, 2007), but gender gaps in other unemployment features, like spells of unemployment duration, have received less attention. Research on unemployment duration gender gaps has almost exclusively focused on explaining the gender differences in the ACHR rather than the HR. See Section 1 for a discussion.
We implement the proposed methodology to perform counterfactual decompositions of the HR and RMST differences using data from the Survey of Income and Living Conditions (SILC) for the period 2004–2007. The data are publicly available on the website of the Spanish National Statistics Institute (INE),
https://www.ine.es/dyngs/INEbase/operacion.htm?c=Estadistica_C&cid=1254736176807&menu=resultados&idp=1254735976608#_tabs-1254736195153 (accessed on 15 October 2022). This survey, carried out by the European Commission, is a rotating household panel that collects information on socioeconomic characteristics, including the occupational status (monthly) for a period of 4 years. Our population consists of unemployed workers older than 25 starting a spell of unemployment during the period 2004–2007. We measure unemployment duration as the number of months that a worker is not employed, which is usually referred to as non-employment duration.
We consider as composition variables those commonly used in unemployment duration analysis such as age, educational level, tenure, marital status, whether the individual is the head of the household, the number of unemployed in the household, city size (according to three levels of urbanization in the SILC as big city, medium size city, and small city), and region (Addison & Portugal, 2003; Kuhn & Skuterud, 2004; Tansel & Tasci, 2010). The first three variables refer to human capital characteristics, while the others are related to the opportunity cost of being unemployed and the reservation wage.
Table 6 provides summary statistics for the discrete explanatory variables and Figure 1 the corresponding QQ plots for the continuous variables, i.e., age and tenure. Compositions are similar in the two populations, except for tenure. Therefore, the composition effect should not be particularly important. We observe censoring levels of 21.4% for women and 16.2% for men.
Henceforth, population 0 corresponds to women and population 1 to men. First, we analyze the HR differences between the two populations.
Figure 2 provides KM estimates of the marginal HR,
and the corresponding nonparametric estimates of
based on a partition
, as well as the decomposition of the HR difference,
into counterfactual effects. The partition is based on
classes and
is estimated using kernel smoothing,
, with an Epanechnikov kernel and the bandwidth chosen by the classical plug-in method. The number of classes in the partition was chosen according to the structure of the composition variables, using natural thresholds for the continuous variables. For instance, we grouped workers between 25 and 40 years old (prime-age workers) and workers with less than 10 years of tenure. If the number of classes is too high, some classes could contain very few observations, particularly when the covariates are strongly dependent; e.g., there are few observations in a class of young, married workers with more than 10 years of tenure and a high education level. Therefore, we construct eight classes using age, labor market size, and marital status. This method has the advantage that the classes in a partition can be chosen in a natural way, accounting for the observed relation between duration and covariates.
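As a purely illustrative sketch of how eight classes can arise from three binary groupings, the function below maps age, a labor market size indicator, and marital status to a class label. The name `partition_class`, the binary coding of labor market size, and the label ordering are assumptions of this sketch; only the prime-age threshold of 25 to 40 years is taken from the text.

```python
def partition_class(age, big_labor_market, married):
    """Map three binary groupings to one of 2 * 2 * 2 = 8 class labels (0 to 7)."""
    prime_age = 1 if 25 <= age <= 40 else 0
    # Hypothetical coding: each indicator contributes one binary digit of the label.
    return 4 * prime_age + 2 * int(big_labor_market) + int(married)
```

Labels of this kind are all that a partition-based standardization needs from the covariates: the class membership of each individual in both populations.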
The nonparametric KM estimates of the HR show an acceleration of the HR for women after 24 months of unemployment, which is consistent with the unemployment compensation rules in Spain during the analyzed period. Unemployment benefits expired after 24 months, and workers received 70% of their salary during the first 6 months and 60% for up to 24 months. Figure 2 suggests that, on average, women exhaust all unemployment benefits, producing a spike in the HR starting around the expiration date. That is, women and men exhibit different optimal delays in job acceptance. This phenomenon has also been documented by Roed and Zhang (2003) and Boone and van Ours (2012) using Norwegian and Slovenian data, respectively. We observe that the estimated composition effect is very small in every period. That is, HR differences between women and men do not seem to be explained by the socioeconomic characteristics considered in this study, which are almost identical across the two groups, but by other factors, such as labor market tightness and discrimination related to institutional factors, labor circumstances, or behavioral aspects (Bachmann & Sinning, 2016).
We have checked Cox’s model specification for the two populations using the popular Schoenfeld (1982) residual-based goodness-of-fit test, which results in p-values of 0.031 and 0.654 for women and men, respectively. Therefore, Cox’s specification is rejected for women but not for men, at 5% significance.
Figure 3 shows the smooth version of PML standardized HR estimates,
with the same kernel and bandwidths used in the nonparametric case. The estimates are quantitatively very different from the nonparametric ones in Figure 2, possibly because of misspecification, though they also show a composition effect close to zero in every period and an acceleration of the HR for women after 24 months of unemployment.
Figure 4 and Figure 5 provide the ACHR estimates and their standardizations using NP(8) and PML, respectively. The HR and ACHR shapes are very different for each estimator. However, the counterfactual composition effects for both the HR and the ACHR are close to zero. The ACHR estimates of the counterfactual decompositions using PML are particularly hard to interpret.
Next, we analyze the
and
estimates for
using OB-KM, PML, and NP(8), which produces estimates of the duration means when
and also for
, which are of interest when studying short- and medium-term unemployment differences. RMST estimates and their corresponding counterfactual decompositions can be found in
Table 7 and Table 8, respectively. The estimate of the nonstandardized RMST during the first 42 months is around 10.8 months for women and 7.9 months for men. The standardized RMST estimate, which is interpreted as the average unemployment duration during the first 42 months if women had the same components as men, is around 10.3 months, close to the corresponding nonstandardized value. Results across methods are qualitatively similar and reveal a reduction in the corresponding standardized RMST.
The counterfactual decompositions of RMST using OB-KM and PML for
(see Table 8) differ markedly from those obtained with NP(8), which may indicate misspecification of the assumed underlying structures. For instance, the OB-KM estimator might be biased because
is nonlinear, and the PML may also be biased because the underlying CDF does not follow a PCHR model, as suggested by Schoenfeld’s test for women.
The counterfactual composition effect for any
is close to zero for all three methods. This indicates that differences in worker characteristics only slightly increase women’s unemployment duration relative to men’s, which might be driven mainly by tenure, as suggested by the descriptive analysis above. Notice that the counterfactual effects using the OB-KM and PML methods are similar, but the composition effect using NP(8) is much smaller, as we have already seen for the HR in Figure 2 and Figure 3. Similar results are obtained for
and
. However, based on NP(8) and OB-KM, it seems that the composition effect is more important when explaining unemployment duration gender gaps in the first stages of unemployment, but this is always much smaller than the structure effect.
6. Conclusions
We have presented a new methodology for standardizing duration distributions using right-censored data. To this end, we introduced the counterfactual hazard function, as well as parameters that summarize the characteristics of the duration CDF, such as the RMST. These are nonlinear functionals of the underlying conditional CDF, and we have shown that the OB technique for obtaining counterfactual means cannot, in general, be extrapolated to this setting. In particular, the ACHR—obtained by averaging certain conditional hazard rates, typically under a proportional hazards specification—lacks a clear interpretation and may lead to misleading conclusions.
We proposed two types of counterfactual estimators of the conditional CDF. On the one hand, nonparametric estimators are based on partitioning the support of the explanatory variables into a fixed number of classes and consistently estimate the joint CDF of durations and covariates using Kaplan–Meier (KM) weights. On the other hand, semiparametric estimators are based on a Cox proportional hazard specification and are inconsistent under model misspecification.
The finite-sample properties of these estimators and the resulting decompositions—evaluated through Monte Carlo simulations and an application to gender gaps in unemployment duration using Spanish data—show that the nonparametric approach performs reasonably well in finite samples. Moreover, the OB approach using KM weights proves robust for RMST decompositions when a flexible functional form (such as a polynomial) is used for the underlying regression. In contrast, estimates based on the semiparametric PCHR specification perform very poorly under misspecification.
The empirical application shows that decompositions of hazard rate (HR) differences, rather than of averaged hazard rates (ACHR), yield more reliable results. The analysis suggests that the gender gap is mainly driven by the structure effect.
There are several potential applications and extensions of the proposed methods that are of interest. For example, the counterfactual estimators developed in this study can be applied across a broad range of disciplines such as health, labor and education economics, finance, management, and public policy. From a methodological standpoint, future work could refine the proposed estimators by incorporating more flexible frameworks for conditional distribution estimation and by developing data-driven partition selection rules in the nonparametric context.