Article

Incorporating Responsiveness to Marketing Efforts in Brand Choice Modeling

Econometric Institute, Erasmus University Rotterdam, H11-2, P.O. Box 1738, Rotterdam NL-3000 DR, The Netherlands
*
Author to whom correspondence should be addressed.
Econometrics 2014, 2(1), 20-44; https://doi.org/10.3390/econometrics2010020
Received: 21 October 2013 / Revised: 4 February 2014 / Accepted: 4 February 2014 / Published: 21 February 2014

Abstract

We put forward a brand choice model with unobserved heterogeneity that concerns responsiveness to marketing efforts. We introduce two latent segments of households. The first segment is assumed to respond to marketing efforts, while households in the second segment do not do so. Whether a specific household is a member of the first or the second segment at a specific purchase occasion is described by household-specific characteristics and characteristics concerning buying behavior. Households may switch between the two responsiveness states over time. When comparing the performance of our model with alternative choice models that account for various forms of heterogeneity for three different datasets, we find better face validity for our parameters. Our model also forecasts better.
Keywords: marketing-instrument effectiveness; heterogeneity; multinomial probit; finite mixtures

1. Introduction

The use of brand choice models has become standard practice in marketing research [1,2,3,4]. In many applications of these choice models, the random utility theory framework [5,6] is used to represent the choice process. A common assumption used to be the homogeneity of households. That is, it was assumed that all households have similar tastes, where tastes also include features such as price elasticity and promotion sensitivity. Differences in household behavior were only allowed to the extent that they could be fully explained by observable characteristics. This corresponds with so-called observed heterogeneity. Taste is in this case explicitly modeled, for example, by including demographic variables (see, e.g., [7]) or, as in [8], by including survey data in the brand choice model. Usually, however, such survey data are not available. Furthermore, many studies have shown that not all heterogeneity can be captured by available observed characteristics. Hence, there might be so-called unobserved heterogeneity; see, for example, [9,10], among others.
There are two popular techniques to deal with unobserved heterogeneity; see [11,12] for a discussion. These techniques are both based on the notion that when there is unobserved heterogeneity in tastes, there is a corresponding preference distribution in the population. One approach imposes a continuous distribution of a known form to capture the heterogeneity; see, for example, [10]. The other approach tries to approximate the unknown distribution by a discrete distribution with a fixed number of probability masses. A choice model using the latter approach is an example of a finite mixture model; see, for example [13]. The mixture components are usually interpreted as segments of households with similar preferences.
In the above-mentioned approaches, tastes are usually assumed to be constant during the observation period for each household. This assumption is needed to identify the random heterogeneity. Additionally, the imposed unobserved heterogeneity structure has a priori no direct interpretation. For example, the interpretation of segments following from a mixture approach is usually done once the parameters have been estimated.
In the present paper, we propose a new approach. Next to a flexible specification of possible heterogeneity in tastes, we introduce unobserved heterogeneity in a brand choice model, which a priori has a direct and meaningful interpretation. Furthermore, we allow heterogeneity to be different across purchase occasions within the same household [14]. Households, who choose amongst brands within a specific product category, may differ in their response to marketing efforts. For example, some households will spend more time and effort while making their choice than others do. If little time and effort is invested in the decision process, it is perhaps less likely that the household will respond to marketing instruments [15]. For example, to be able to respond to price changes, one, of course, needs to recall the previous prices of all brands. To be able to respond to advertising, one has to read the newspaper in which the advertisement is printed. It may be unrealistic to assume that all households show such a strong involvement with the product category at all purchase occasions, especially if we consider low involvement categories, such as various supermarket product categories. Hence, it is likely that households will differ in the extent to which they are responsive to marketing efforts. Within a household, there may also be differences in the responsiveness across purchase occasions, for example, due to different types of shopping trips [14].
One reason why some households are unresponsive to marketing efforts could just be a lack of interest in marketing efforts made by brand managers. On the other hand, economic motivations may also explain varying responsiveness across households and over time. For example, search costs play an important role in the decision process of a household or an individual. As mentioned before, to be responsive to price changes, one needs to remember the prices of each option at each purchase occasion. Additionally, people usually face time constraints. It takes time for a household to compare the prices of all options at a specific shopping occasion at the time of purchase. Consider a household planning to buy many different items during the same shopping trip. There is obviously a limited amount of time available for the trip, and therefore, it may be unrealistic to assume that the household will allocate much time to each item. Following this line of thought, the more items a household purchases at a shopping trip, the less responsive this household might be to marketing efforts. Hence, the monetary value of all products purchased at a shopping trip may be inversely related to the responsiveness to marketing efforts.
As the decision process differs across households and across purchase occasions, the above implies that the observed choices of different households are unlikely to be explained by the same variables. Choice behavior of responsive households can be explained by their base preferences, by marketing efforts and by their purchase history. Brand choice by unresponsive households may only be described by base preferences and purchase history. Moreover, household characteristics are rarely seen to significantly contribute to explaining brand choice, but these might be especially informative for the type of decision process used by the household. As such, household characteristics might influence brand choice, albeit perhaps only indirectly.
In this paper, we put forward a brand choice model that incorporates responsiveness to marketing efforts as an explicit form of heterogeneity. We introduce two latent segments. In the first segment, the households are assumed to respond to marketing efforts, while in the second segment, households are assumed not to do so. If households are not responsive, their brand choice may be influenced by their previous choice or they simply purchase their most preferred brand. Whether a specific household is a member of the first or the second segment at a specific purchase occasion is described by household-specific characteristics and characteristics concerning buying behavior. Additionally, to capture differences in responsiveness over time, households are allowed to switch between the two segments across purchase occasions.
The approach in the present paper is somewhat related to structural heterogeneity, where one allows individuals to have different decision strategies. For example, [16] examine brand choice within a product category, where the brands carry, say, different product sizes. A household might first choose a brand and then choose the specific size to purchase. Another household might first choose a specific size and only then consider the available brands. A third household might completely ignore all this and choose directly from all available brand and product size combinations. The authors of [17], for example, present a model in which households are allowed to differ in the reference point to which options are compared. These authors use a hierarchical Bayes model to model credit card adoption, where households are allowed to differ in their decision rule and where behavior can change over time. The authors of [18] consider structural heterogeneity with respect to framing in a prospect theory setting. For a given decision, some individuals may use a gain frame, while others may adopt a loss frame. In a sense, our model is also related to the work of [19]. They consider a two-state model of purchase incidence and brand choice, where they distinguish between households that plan their purchases and households that act opportunistically. The authors of [19], however, assume homogeneous preferences, while our model also incorporates preference heterogeneity.
The outline of the paper is as follows. In Section 2, we present our responsiveness model. In Section 3, we consider parameter estimation. We opt for a Bayesian approach; see, for example, [20]. We discuss prior specification and how to obtain posterior results using a Gibbs sampler. Furthermore, we discuss forecasting and model comparison. In Section 4, we apply our responsiveness model to three panel datasets concerning purchases of soft drinks, cereal and liquid detergent. We compare the performance of our model to two related choice models. In Section 5, we conclude with some remarks.

2. The Model

To describe our responsiveness model, we first introduce some notation. We assume that household $i = 1, \ldots, I$ chooses from $J$ brands at each purchase occasion $t = 1, \ldots, T_i$. The variable $y_{ijt}$ denotes the chosen alternative, that is,
$$y_{ijt} = \begin{cases} 1 & \text{if household } i \text{ purchases brand } j \text{ at occasion } t \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Furthermore, we will use $y_{it} \in \{1, \ldots, J\}$ to denote the index of the chosen brand at time $t$.
Each household is, at any point in time, either responsive or unresponsive to marketing efforts. In case a household is unresponsive to marketing efforts, the choice can only be attributed to base preference, habit, lagged choice and random influences. In the responsive state, the household will also be affected by marketing efforts. We introduce a latent indicator variable, $Z_{it}$, to denote the responsiveness state of household $i$ at purchase occasion $t$, that is,
$$Z_{it} = \begin{cases} 1 & \text{if household } i \text{ is responsive to marketing efforts at purchase occasion } t \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Over time, households may switch between responsiveness states. For example, the responsiveness of a household may differ according to the type of shopping trip. The type of shopping trip may be measured by the size of the shopping basket; see [21]. Of course, we do not observe the responsiveness state of a household over time, and hence, these have to be inferred from the data.
To model the responsiveness, we consider a binary probit model, which relates $Z_{it}$ to an intercept and household characteristics, like, for example, family income, collected in a $k$-dimensional vector, $W_{it}$. These characteristics may also include variables concerning the shopping trip itself, like the recency of the last purchase and the monetary amount spent on the shopping trip. The specification of the probit model for the responsiveness state thus becomes:
$$Z_{it} = \begin{cases} 1 & \text{if } Z_{it}^* = W_{it}'\gamma + \eta_{it} \geq 0 \\ 0 & \text{if } Z_{it}^* = W_{it}'\gamma + \eta_{it} < 0 \end{cases} \tag{3}$$
where $\gamma$ is a $k$-dimensional parameter vector and $\eta_{it} \sim N(0, 1)$. Hence, the probability that household $i$ is responsive at purchase occasion $t$ is given by:
$$\Pr[Z_{it} = 1 \mid \gamma] = \Phi(W_{it}'\gamma) \tag{4}$$
where $\Phi(\cdot)$ is the cdf of a standard normal distribution. It is possible to include the lagged value of $Z_{it}$ in $W_{it}$. This would result in a hidden Markov type model with two states; see, for example, [22].
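As a minimal sketch of Equation (4), the probit probability can be evaluated with the Python standard library alone. The covariate values and coefficients below are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
from statistics import NormalDist


def responsiveness_prob(w, gamma):
    """Return Phi(w' gamma), the probit probability that Z_it = 1."""
    index = sum(w_k * g_k for w_k, g_k in zip(w, gamma))
    return NormalDist().cdf(index)


# Hypothetical covariates (intercept, household size, income category)
# and illustrative coefficients; neither is taken from the paper.
w_it = [1.0, 2.0, 5.0]
gamma = [0.5, -0.1, 0.02]
print(responsiveness_prob(w_it, gamma))
```

A linear index of zero gives a probability of exactly one half, which is a convenient sanity check for any implementation.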
In case a household is responsive to marketing efforts, then marketing instruments, such as price and promotion, can have an effect on the choice made by this household. We collect the marketing instruments for brand $j = 1, \ldots, J$, as experienced by household $i$ at purchase occasion $t$, in the $m$-dimensional vector, $X_{ijt}$. To model the choice process of a marketing-responsive household, we use a multinomial probit (MNP) model. Conditional on responsiveness, the utility of brand $j$ for household $i$ at purchase occasion $t$ is:
$$U_{ijt} = \mu_{ij}^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i + \varepsilon_{ijt} \tag{5}$$
for $j = 1, \ldots, J$, where $\varepsilon_{it} = (\varepsilon_{i1t}, \ldots, \varepsilon_{iJt})' \sim N(0, I_J)$ and $I_J$ denotes a $J$-dimensional identity matrix. The $\mu_{ij}^{(r)}$ parameters are individual-specific brand intercepts, where we impose that $\mu_{iJ}^{(r)} = 0$ for identification. The $\alpha^{(r)}$ parameter measures the effect of state dependence in brand choice, as $y_{ij,t-1} = 1$ if household $i$ purchased brand $j$ at purchase occasion $t-1$. State dependence refers to a dynamic property of the choice process, as it incorporates the household's tendency to currently buy the same brand as purchased at the previous occasion; see, among many others, [23]. The household-specific effects of the marketing-mix instruments are measured by the individual-specific parameters, $\beta_i$. We allow for heterogeneity in these effects by assuming that:
$$\beta_i \sim N(\beta, \Sigma_\beta) \tag{6}$$
such that $\beta$ and $\Sigma_\beta$ denote the population mean and covariance matrix of the effects of the marketing-mix on the brand utilities.
Of course, in case a household is unresponsive to marketing activities, the marketing instruments will not have an effect on its choice behavior. On these purchase occasions, the brand choice will be mainly determined by base preferences, lagged choice and random effects. This type of behavior can be modeled by the utility specification:
$$U_{ijt} = \mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1} + \varepsilon_{ijt} \tag{7}$$
with $\mu_{iJ}^{(u)} = 0$, where, obviously, the $X_{ijt}$ are not included and where we allow the brand intercepts, $\mu_{ij}^{(u)}$, and the lagged choice parameter, $\alpha^{(u)}$, to be different from the responsive case.
The base preferences of households are usually assumed to be constant over long periods of time. To do that here, we need to make sure that the base preference for a given household does not depend on the responsiveness state of the household on a particular purchase occasion. Note that we cannot simply restrict the utility intercepts of the two utility models, (5) and (7), to be equal, as the intercepts also correct for the means of the explanatory variables. Furthermore, for the unresponsive model, the brand intercepts also capture the differences in the baseline prices across brands. To allow for constant individual-specific preferences over time independent of the responsiveness state, we therefore have to follow another strategy.
Denote the deviation of the base preference of household $i$ from the population mean by the $(J-1)$-dimensional vector $\omega_i = (\omega_{i1}, \ldots, \omega_{i,J-1})'$. For the model with continuous heterogeneity, we model the population distribution of these deviations by $N(0, \Sigma_\omega)$. Furthermore, for ease of notation, we define $\omega_{iJ} = 0$, $i = 1, \ldots, I$. As brand intercepts for the utilities conditional on responsiveness (5), we now have $\mu_{ij}^{(r)} = \mu_j^{(r)} + \omega_{ij}$, and for the utilities conditional on unresponsiveness (7), we use $\mu_{ij}^{(u)} = \mu_j^{(u)} + \omega_{ij}$. Hence, the household-specific vector, $\omega_i$, measures the deviation of household $i$'s preferences from the population mean for both responsiveness states.
Household $i$ purchases brand $j$ at purchase occasion $t$ when $U_{ijt}$ is the maximum utility among all $U_{ikt}$, $k = 1, \ldots, J$. Strictly speaking, the unresponsive specification does not correspond to a proper utility maximization problem. Under standard utility maximization, prices must enter the (reduced-form) utility model, as prices are obviously part of a household's budget restriction. Our implicit assumption in Equation (7) is that households that are unresponsive to marketing efforts maximize utility without considering the actual price differences among the brands. Instead, they aim at an approximate utility maximization that costs less effort. In this case, the baseline or regular price for each brand is used instead of the actual price. This implies that although "unresponsive" households do not take into account price promotions, they will react to permanent changes in price; see, for example, [24] for an overview of the impact of different price components. The utility specification in Equation (7) actually reads $U_{ijt} = \mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1} + \delta_i \bar{p}_j + \varepsilon_{ijt}$, where $\bar{p}_j$ denotes the regular price of brand $j$. However, in practically available data, the long run price does not vary over time; therefore, we cannot separately identify $\delta_i$ and $\mu_{ij}^{(u)}$; see [25] for similar arguments. The utility specification we use for the unresponsive case therefore does not include prices, and the brand intercepts in Equation (7) give a combination of base preferences and price effects.
If household i is responsive at time t, then the probability of purchasing brand j is given by:
$$\Pr[y_{ijt} = 1 \mid Z_{it} = 1; \mu_i^{(r)}, \beta_i, \alpha^{(r)}] = \Pr\left[\varepsilon_{ikt} - \varepsilon_{ijt} < (\mu_{ij}^{(r)} - \mu_{ik}^{(r)}) + \alpha^{(r)} (y_{ij,t-1} - y_{ik,t-1}) + (X_{ijt} - X_{ikt})'\beta_i \;\; \forall k \neq j\right] \tag{8}$$
which follows from requiring $U_{ijt} > U_{ikt}$ in Equation (5) for all $k \neq j$. This is the choice probability of a multinomial probit model. There is no closed form expression for this probability. For small values of $J$, one can use numerical integration methods to evaluate the probability. For large values of $J$, one can use the GHK simulator; see [26]. If the household is unresponsive at $t$, the probability of purchasing brand $j$ is:
$$\Pr[y_{ijt} = 1 \mid Z_{it} = 0; \mu_i^{(u)}, \alpha^{(u)}] = \Pr\left[\varepsilon_{ikt} - \varepsilon_{ijt} < (\mu_{ij}^{(u)} - \mu_{ik}^{(u)}) + \alpha^{(u)} (y_{ij,t-1} - y_{ik,t-1}) \;\; \forall k \neq j\right] \tag{9}$$
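The choice probabilities in both states can be sketched numerically. Because the errors are independent standard normal, the probability that brand $j$ has the highest utility reduces to a one-dimensional integral over $\varepsilon_{ijt}$ of $\phi(e) \prod_{k \neq j} \Phi(V_j + e - V_k)$, where $V_j$ is the deterministic part of the utility. The sketch below evaluates that integral with a trapezoid rule. It is an illustration under this independence assumption, not the GHK simulator of [26], and the function names, grid settings and input values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def systematic_utility(mu, alpha, lagged_choice, x=None, beta=None):
    """Deterministic utility per brand; omit x and beta for the unresponsive state."""
    v = [m + alpha * y for m, y in zip(mu, lagged_choice)]
    if x is not None:
        v = [v_j + sum(a * b for a, b in zip(x_j, beta)) for v_j, x_j in zip(v, x)]
    return v


def mnp_choice_probs(v, lo=-8.0, hi=8.0, n=2001):
    """Choice probabilities for U_j = v[j] + eps_j with eps_j iid N(0, 1)."""
    step = (hi - lo) / (n - 1)
    probs = []
    for j, v_j in enumerate(v):
        total = 0.0
        for s in range(n):
            e = lo + s * step
            term = STD_NORMAL.pdf(e)
            for k, v_k in enumerate(v):
                if k != j:
                    term *= STD_NORMAL.cdf(v_j + e - v_k)
            weight = 0.5 if s in (0, n - 1) else 1.0  # trapezoid end points
            total += weight * term
        probs.append(total * step)
    return probs


# Hypothetical inputs: three brands, last brand's intercept fixed at zero,
# two marketing instruments (for example, price and a display dummy).
mu = [0.4, 0.1, 0.0]
lagged = [0, 1, 0]
x = [[-0.2, 1.0], [0.3, 0.0], [0.0, 0.0]]
p_resp = mnp_choice_probs(systematic_utility(mu, 0.5, lagged, x, [-1.0, 0.3]))
p_unresp = mnp_choice_probs(systematic_utility(mu, 0.5, lagged))
```

With two brands, the result can be checked against the closed form $\Phi((V_1 - V_2)/\sqrt{2})$.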
Finally, as we do not observe whether a household at purchase occasion t belongs to the responsive segment or not, the probability that it purchases brand j at purchase occasion t is obtained by summing the conditional probabilities over the segments, that is,
$$\begin{aligned} \Pr[y_{ijt} = 1 \mid \theta, \theta_i] = {} & \Pr[Z_{it} = 1 \mid \gamma] \, \Pr[y_{ijt} = 1 \mid Z_{it} = 1; \mu_i^{(r)}, \beta_i, \alpha^{(r)}] \\ & + (1 - \Pr[Z_{it} = 1 \mid \gamma]) \, \Pr[y_{ijt} = 1 \mid Z_{it} = 0; \mu_i^{(u)}, \alpha^{(u)}] \end{aligned} \tag{10}$$
where $\Pr[Z_{it} = 1 \mid \gamma]$ is given in Equation (4), $\theta$ collects the parameters common to all households and $\theta_i$ collects the individual-level parameters, that is, $\theta = (\gamma, \alpha^{(r)}, \alpha^{(u)})$ and $\theta_i = (\beta_i, \mu_i^{(r)}, \mu_i^{(u)})$.
An interesting by-product of our model concerns the possibility to calculate the conditional probability of responsiveness given the brand choice at purchase occasion t. Furthermore, conditioning on the parameters, this probability equals:
$$\begin{aligned} \Pr[Z_{it} = 1 \mid y_{ijt} = 1, \theta, \theta_i] & = \frac{\Pr[Z_{it} = 1, y_{ijt} = 1 \mid \theta, \theta_i]}{\Pr[y_{ijt} = 1 \mid \theta, \theta_i]} \\ & = \frac{\Pr[y_{ijt} = 1 \mid Z_{it} = 1, \mu_i^{(r)}, \beta_i, \alpha^{(r)}] \Pr[Z_{it} = 1 \mid \gamma]}{\Pr[y_{ijt} = 1 \mid Z_{it} = 1, \mu_i^{(r)}, \beta_i, \alpha^{(r)}] \Pr[Z_{it} = 1 \mid \gamma] + \Pr[y_{ijt} = 1 \mid Z_{it} = 0, \mu_i^{(u)}, \alpha^{(u)}] \Pr[Z_{it} = 0 \mid \gamma]} \end{aligned} \tag{11}$$
This expression gives the probability that household i is responsive to marketing efforts at purchase occasion t, given the parameters and the fact that brand j is purchased. In the applications, we will display a histogram of the posterior means of these conditional probabilities for each purchase occasion to give an impression of the average value and the dispersion of the responsiveness in the population; below, we will specify this posterior mean in more detail.
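The mixture probability in Equation (10) and the conditional responsiveness probability in Equation (11) are simple to compute once the state-specific choice probabilities are available. The sketch below takes those probabilities as given inputs; the numbers are hypothetical and serve only to make the arithmetic concrete.

```python
def mixture_choice_prob(p_resp, p_choice_resp, p_choice_unresp):
    """Unconditional probability of the observed choice, as in Equation (10)."""
    return p_resp * p_choice_resp + (1.0 - p_resp) * p_choice_unresp


def responsive_given_choice(p_resp, p_choice_resp, p_choice_unresp):
    """Bayes' rule for Pr[Z = 1 | observed choice], as in Equation (11)."""
    joint_resp = p_choice_resp * p_resp
    return joint_resp / mixture_choice_prob(p_resp, p_choice_resp, p_choice_unresp)


# Hypothetical values: prior responsiveness 0.6, and the chosen brand has
# probability 0.5 in the responsive state and 0.2 in the unresponsive state.
print(mixture_choice_prob(0.6, 0.5, 0.2))      # 0.6*0.5 + 0.4*0.2
print(responsive_given_choice(0.6, 0.5, 0.2))  # 0.30 / 0.38
```

In this example, observing a choice that is more likely under the responsive state raises the responsiveness probability above its prior value of 0.6.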

3. Inference

In this section, we discuss inference within the responsiveness model. We opt for a Bayesian approach. In Section 3.1, we derive the likelihood function of the model. Section 3.2 deals with the prior specification. In Section 3.3, we discuss how to compute posterior results, while in Section 3.4, we focus on model comparison.

3.1. Likelihood Function

The likelihood function of our responsiveness model is the joint density of the purchases of the $I$ households denoted by $Y = \{Y_i\}_{i=1}^{I}$, where $Y_i = \{\{y_{ijt}\}_{t=1}^{T_i}\}_{j=1}^{J}$:
$$p(Y \mid \Theta) = \prod_{i=1}^{I} p(Y_i \mid \Theta) \tag{12}$$
where $\Theta = (\theta, \beta, \mu^{(r)}, \mu^{(u)}, \Sigma_\beta, \Sigma_\omega)$ and where $p(Y_i \mid \Theta)$ equals the likelihood contribution of household $i$ given by:
$$p(Y_i \mid \Theta) = \int \int p(Y_i \mid \theta, \theta_i) \, \phi(\omega_i; 0, \Sigma_\omega) \, \phi(\beta_i; \beta, \Sigma_\beta) \, d\omega_i \, d\beta_i \tag{13}$$
where $\theta_i = (\beta_i, \mu^{(r)} + \omega_i, \mu^{(u)} + \omega_i)$. The density, $p(Y_i \mid \theta, \theta_i)$, denotes the likelihood contribution conditional on the unobserved heterogeneity parameters, $\theta_i$, that is,
$$p(Y_i \mid \theta, \theta_i) = \prod_{t=2}^{T_i} \prod_{j=1}^{J} \Pr[y_{ijt} = 1 \mid \theta, \theta_i]^{y_{ijt}} \tag{14}$$
where $\Pr[y_{ijt} = 1 \mid \theta, \theta_i]$ is given in Equation (10). We omit the first observation in Equation (14), as we need this observation to initialize the lagged choice dummy.
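The conditional likelihood contribution in Equation (14) is usually evaluated on the log scale to avoid numerical underflow. The sketch below assumes that the responsiveness probabilities and the state-specific choice probabilities have already been computed for occasions $t = 2, \ldots, T_i$; the argument names and the example values are hypothetical.

```python
import math


def household_loglik(chosen, p_resp, probs_resp, probs_unresp):
    """Log of Equation (14) for one household.

    chosen[t] is the 0-based index of the brand bought at occasion t,
    p_resp[t] is Pr[Z_it = 1], and probs_resp[t] / probs_unresp[t] hold the
    brand choice probabilities in the two states. The first purchase is
    assumed to be dropped already, since it only initializes the lag.
    """
    loglik = 0.0
    for y, pz, pr, pu in zip(chosen, p_resp, probs_resp, probs_unresp):
        loglik += math.log(pz * pr[y] + (1.0 - pz) * pu[y])
    return loglik


# Hypothetical household with two usable purchase occasions and two brands.
ll = household_loglik(
    chosen=[0, 1],
    p_resp=[0.6, 0.5],
    probs_resp=[[0.5, 0.5], [0.2, 0.8]],
    probs_unresp=[[0.2, 0.8], [0.4, 0.6]],
)
print(ll)
```

Only the probability of the brand actually chosen enters each term, which is what the exponent $y_{ijt}$ in Equation (14) expresses.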

3.2. Prior Specification

For our Bayesian analysis, we define independent priors for the parameters in Θ. We opt for a conjugate prior specification. For the probit parameters, we take a normal prior specification, that is,
$$\gamma \sim N(\gamma_0, S_\gamma) \tag{15}$$
where $\gamma_0$ and $S_\gamma$ are prior parameters. For the lagged choice parameters, $\alpha^{(r)}$ and $\alpha^{(u)}$, the $\beta$ parameters and the $(J-1)$-dimensional vectors of brand intercept parameters $\mu^{(r)} = (\mu_1^{(r)}, \ldots, \mu_{J-1}^{(r)})'$ and $\mu^{(u)} = (\mu_1^{(u)}, \ldots, \mu_{J-1}^{(u)})'$, we also take a normal prior specification:
$$\begin{aligned} \alpha^{(r)} & \sim N(\alpha_0^{(r)}, s_{\alpha^{(r)}}^2) \\ \alpha^{(u)} & \sim N(\alpha_0^{(u)}, s_{\alpha^{(u)}}^2) \\ \mu^{(r)} & \sim N(\mu_0^{(r)}, S_{\mu^{(r)}}) \\ \mu^{(u)} & \sim N(\mu_0^{(u)}, S_{\mu^{(u)}}) \\ \beta & \sim N(\beta_0, S_\beta) \end{aligned} \tag{16}$$
For the covariance matrices in our model, we take inverted Wishart priors:
$$\begin{aligned} \Sigma_\beta & \sim IW(Q_\beta, \lambda_\beta) \\ \Sigma_\omega & \sim IW(Q_\omega, \lambda_\omega) \end{aligned} \tag{17}$$
where $Q_\beta$ and $Q_\omega$ are fixed scale prior parameters and $\lambda_\beta$, $\lambda_\omega$ are fixed degrees of freedom prior parameters.
The joint prior, $p(\Theta)$, of the model parameters, $\Theta$, follows from the product of the priors implied by Equation (15), Equation (16) and Equation (17).

3.3. Posterior Results

If we combine the prior specification, $p(\Theta)$, with the likelihood function, $p(Y \mid \Theta)$, given in Equation (12), we obtain the posterior density:
$$p(\Theta \mid Y) \propto p(\Theta) \, p(Y \mid \Theta) \tag{18}$$
To obtain posterior results, we implement the Gibbs sampler of [27] with data augmentation [28]; see also [29]. The Gibbs sampler is applied to the prior times the complete data likelihood function. Hence, the latent utilities $U = \{\{\{U_{ijt}\}_{j=1}^{J}\}_{t=2}^{T_i}\}_{i=1}^{I}$, $Z^* = \{\{Z_{it}^*\}_{t=2}^{T_i}\}_{i=1}^{I}$ and the latent parameters $B = \{\beta_i\}_{i=1}^{I}$ and $\Omega = \{\omega_i\}_{i=1}^{I}$ are sampled alongside the model parameters, $\Theta$; see [30,31,32] for similar approaches in choice models. The complete Gibbs sampling scheme is as follows:
• $\gamma \mid \{U, Z^*, B, \Omega, \Theta\} \setminus \gamma$
• $\Omega \mid \{U, Z^*, B, \Theta\}$
• $B \mid \{U, Z^*, \Omega, \Theta\}$
• $\beta \mid \{U, Z^*, B, \Omega, \Theta\} \setminus \beta$
• $\Sigma_\beta, \Sigma_\omega \mid \{U, Z^*, B, \Omega, \Theta\} \setminus \{\Sigma_\beta, \Sigma_\omega\}$
• $\alpha^{(r)}, \alpha^{(u)} \mid \{U, Z^*, B, \Omega, \Theta\} \setminus \{\alpha^{(r)}, \alpha^{(u)}\}$
• $\mu^{(r)}, \mu^{(u)} \mid \{U, Z^*, B, \Omega, \Theta\} \setminus \{\mu^{(r)}, \mu^{(u)}\}$
• $U \mid \{Z^*, B, \Omega, \Theta\}$
• $Z^* \mid \{U, B, \Omega, \Theta\}$
In Appendix A, we derive the full conditional posterior distributions of the model parameters in $\Theta$ and the latent variables, $U$, $Z^*$, $B$ and $\Omega$.
The Gibbs simulation scheme generates a Markov chain. After the chain has converged, one can use the simulated values to compute posterior results. For example, the posterior probability of household i being responsive at purchase occasion t is given by:
$$\Pr[Z_{it} = 1 \mid y_{ijt} = 1, Y] = \int \int \Pr[Z_{it} = 1 \mid y_{ijt} = 1; \theta, \theta_i] \, p(\Theta, \theta_i \mid Y) \, d\Theta \, d\theta_i \tag{19}$$
where $p(\Theta, \theta_i \mid Y)$ denotes the posterior density of $\Theta$ and $\theta_i$. The posterior probability (19) is equal to $\frac{1}{M} \sum_{m=1}^{M} I[Z_{it}^{*(m)} \geq 0]$ for large $M$, where $Z_{it}^{*(m)}$ denotes the $m$-th draw of $Z_{it}^*$ in the Markov chain and $I[\cdot]$ denotes an indicator function, which is one in the case the argument is true and zero otherwise.
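The Monte Carlo estimate of the posterior probability (19) is the share of retained draws of $Z_{it}^*$ that are non-negative. A minimal sketch follows; the `burn_in` argument and the example draws are hypothetical and stand in for actual Gibbs output.

```python
def posterior_responsive_prob(z_star_draws, burn_in=0):
    """Share of post-burn-in draws of Z*_it that are >= 0."""
    kept = z_star_draws[burn_in:]
    if not kept:
        raise ValueError("no draws left after discarding the burn-in")
    return sum(1 for z in kept if z >= 0) / len(kept)


# Hypothetical draws for one household and purchase occasion.
draws = [-0.8, 0.3, -1.2, 0.1, 2.0]
print(posterior_responsive_prob(draws, burn_in=1))  # uses the last four draws
```

Discarding an initial stretch of the chain reflects the requirement that the draws be taken after convergence.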

3.4. Model Comparison

To judge the added value of introducing the responsiveness to marketing efforts, we compare our model with two alternative model specifications. The first specification is a standard MNP model, where the utilities are given by:
$$U_{ijt} = \mu_j + \omega_{ij} + \alpha y_{ij,t-1} + X_{ijt}'\beta_i + \varepsilon_{ijt} \tag{20}$$
with Equation (6), $\omega_{iJ} = 0$, $\omega_i = (\omega_{i1}, \ldots, \omega_{i,J-1})' \sim N(0, \Sigma_\omega)$ and $\varepsilon_{it} = (\varepsilon_{i1t}, \ldots, \varepsilon_{iJt})' \sim N(0, I_J)$.
The second specification is an MNP model, where we relate the $\beta_i$ parameters to the explanatory variables, $W_{ijt}$, in a direct way, that is, $\beta_i = \Gamma W_{ijt} + \eta_i$ with $\eta_i \sim N(0, \Sigma_\beta)$. Using the Kronecker product, we compactly define the $(km)$-dimensional vector of the cross-terms of $X_{ijt}$ and $W_{ijt}$ by $(W_{ijt} \otimes X_{ijt})$. The model can then be written as:
$$U_{ijt} = \mu_j + \omega_{ij} + \alpha y_{ij,t-1} + (W_{ijt} \otimes X_{ijt})'\beta + X_{ijt}'b_i + \varepsilon_{ijt} \tag{21}$$
with $b_i \sim N(0, \Sigma_\beta)$, $\omega_i = (\omega_{i1}, \ldots, \omega_{i,J-1})' \sim N(0, \Sigma_\omega)$ and $\varepsilon_{it} = (\varepsilon_{i1t}, \ldots, \varepsilon_{iJt})' \sim N(0, I_J)$. The $(km)$-dimensional parameter vector, $\beta$, captures the cross-effects between $W_{ijt}$ and $X_{ijt}$, that is, $\beta = \mathrm{vec}(\Gamma)$.
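The step from $\beta_i = \Gamma W + \eta_i$ to the cross-term formulation rests on the identity $X'\Gamma W = (W \otimes X)'\mathrm{vec}(\Gamma)$. The sketch below checks it numerically in plain Python, assuming that $\Gamma$ is an $m \times k$ matrix and that $\mathrm{vec}$ stacks its columns; the numbers are hypothetical.

```python
def kron_vec(w, x):
    """Kronecker product of two vectors: (w_1 * x, w_2 * x, ..., w_k * x)."""
    return [w_a * x_b for w_a in w for x_b in x]


def vec(matrix):
    """Stack the columns of a matrix given as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


def dot(a, b):
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


# Hypothetical sizes: m = 2 marketing instruments, k = 3 household variables.
x = [1.0, 2.0]
w = [3.0, 4.0, 5.0]
gamma_matrix = [[1.0, 0.0, 2.0],
                [0.0, 1.0, -1.0]]

direct = dot(x, [dot(row, w) for row in gamma_matrix])  # X' (Gamma W)
stacked = dot(kron_vec(w, x), vec(gamma_matrix))        # (W kron X)' vec(Gamma)
print(direct, stacked)
```

Both expressions give the same value, so the interaction model is linear in the $km$ cross-terms.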
We use the same prior specification as for our proposed model, that is, a normal prior for the mean parameters and inverted Wishart priors for the covariance matrices. Posterior results of these two alternative models can be obtained using a simplified version of the Gibbs sampler in Section 3.3.
Model comparison is based on out-of-sample performance. We compare the predictive likelihoods of our model and the two alternative specifications mentioned above. The predictive likelihoods are computed for purchase $T_i + 1$ of each household collected in $Y_F = \{Y_{i,T_i+1}\}_{i=1}^{I}$. These out-of-sample purchases are not used to compute posterior results. The predictive likelihood of our model is given by:
$$p(Y_F \mid Y) = \prod_{i=1}^{I} \int \int \prod_{j=1}^{J} \Pr[y_{ij,T_i+1} = 1 \mid \theta, \theta_i, Y]^{y_{ij,T_i+1}} \, p(\theta_i, \Theta \mid Y) \, d\theta_i \, d\Theta, \tag{22}$$
where $p(\theta_i, \Theta \mid Y)$ denotes the posterior density of $\theta_i$ and $\Theta$. The predictive likelihoods of the other models are defined in a similar way. Model comparison based on predictive likelihoods is closely related to regular model comparison using marginal likelihoods; see Geweke [33] (p. 63) for the relation between marginal and predictive likelihoods. The advantage of predictive likelihood comparison is that the comparison is also valid in the case that uninformative priors are used.
The predictive likelihood (22) can easily be computed using the Gibbs output. Given the posterior draws, $\theta_i^{(m)}$ and $\Theta^{(m)}$, we simulate $Z_{i,T_i+1}^{*(m)}$ given $W_{ij,T_i+1}$ according to Equation (4) for $i = 1, \ldots, I$. If $Z_{i,T_i+1}^{*(m)} \geq 0$, we simulate $U_{ij,T_i+1}^{(m)}$ according to Equation (5), and if $Z_{i,T_i+1}^{*(m)} < 0$, we use Equation (7) for $j = 1, \ldots, J$. The maximum utility value determines brand choice. The product of the average value of the brand choices over the simulations converges to Equation (22).
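The simulation just described can be sketched for a single household as follows. Each posterior draw is represented by a hypothetical dictionary that holds $\gamma$ and the deterministic parts of the utilities in the two states; this data layout, the function names and the fixed seed are assumptions made for illustration only.

```python
import random


def simulate_choice(rng, w, gamma, v_resp, v_unresp):
    """Simulate one brand choice: draw Z*, pick the state, then draw utilities."""
    z_star = sum(a * b for a, b in zip(w, gamma)) + rng.gauss(0.0, 1.0)
    v = v_resp if z_star >= 0 else v_unresp
    utilities = [v_j + rng.gauss(0.0, 1.0) for v_j in v]
    return max(range(len(utilities)), key=utilities.__getitem__)


def predictive_prob(draws, w, observed, seed=0):
    """Share of posterior draws whose simulated choice equals the observed brand."""
    rng = random.Random(seed)
    hits = sum(
        simulate_choice(rng, w, d["gamma"], d["v_resp"], d["v_unresp"]) == observed
        for d in draws
    )
    return hits / len(draws)


# Hypothetical posterior draws for one household and two brands.
draws = [{"gamma": [0.2], "v_resp": [0.8, 0.0], "v_unresp": [0.1, 0.0]}] * 2000
p_hat = predictive_prob(draws, w=[1.0], observed=0)
print(p_hat)
```

Multiplying these per-household averages over $i = 1, \ldots, I$, or summing their logarithms, gives the simulated predictive likelihood.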

4. Illustrations

We apply our model to three different categories of fast moving consumer goods. The same data are analyzed in [21] for other purposes. This dataset contains individual scanner panel data across 24 categories. The data cover a two-year period from June 1991 to June 1993 for two separate markets in a large U.S. city. The market we choose for our analysis concerns a suburban area. From the 24 available categories, we have randomly chosen three rather dissimilar categories, that is, soft drinks, cereal and liquid detergent.
For each category, we have selected households purchasing only the top brands, where the top brands are defined as having a market share of about 5% or more. In Table 1, we summarize the number of households and purchases in the three datasets and the selected brands together with their choice shares.
Table 1. Characteristics of the three datasets.

| | Soft drinks | Cereal | Liquid detergent |
|---|---|---|---|
| Selected brands with choice shares | | | |
| brand 1 | Canfield (0.14) | General Mills (0.28) | All (0.26) |
| brand 2 | Schweppes (0.12) | Kellogg's (0.42) | Cheer (0.10) |
| brand 3 | Coca Cola (0.23) | Philip Morris (0.10) | Purex (0.06) |
| brand 4 | Dr. Pepper (0.10) | Quaker (0.09) | Surf (0.04) |
| brand 5 | Pepsi (0.14) | Ralston (0.05) | Tide (0.27) |
| brand 6 | Private Label (0.13) | Nabisco (0.05) | Wisk (0.23) |
| brand 7 | Royal Crown (0.15) | | Yes (0.04) |
| Number of observations | | | |
| #households | 88 | 244 | 79 |
| #purchases | 3513 | 6496 | 642 |
| Average household/shopping characteristics | | | |
| household size | 1.95 | 2.53 | 2.80 |
| family income | 4.89 | 6.42 | 7.21 |
| dollars spent | 41.70 | 57.65 | 64.24 |
| inter-purchase time | 2.32 | 3.39 | 8.26 |
We perform a Bayesian analysis on the three datasets, and we consider three models. First of all, we consider our responsiveness model. The explanatory variables, $W_{it}$, in responsiveness Equation (2) are an intercept, household size, family income, amount of dollars spent on the shopping trip and weeks since last purchase in the product category. The motivation for the first two variables is based on the results of [14], who show that differences in household size and income can be related to differences in shopping trips. The last two variables are linked to inventory levels; see, among others, [34]. For the family income, we only know the income category. The marketing-mix instruments are normalized, so that the coefficients can be compared across the three product categories. Table 1 shows the average values of these variables across the shopping trips. To explain brand choice, we include brand intercepts, $\mu_{ij}$, and a lagged choice dummy variable in both utility specifications (5) and (7). We allow for unobserved heterogeneity on the brand intercepts, which is the same across the responsive and non-responsive utility specification, as discussed before. For the responsiveness utility specification (5), we also include price, feature and display ($X_{ijt}$). Again, the marketing-mix variables are normalized for ease of comparison. We allow for a continuous unobserved heterogeneity specification in the parameters explaining the effect of the marketing-mix variables.
The prior distribution for the $\gamma$ parameter is given by Equation (15) with $\gamma_0 = 0$ and $S_\gamma = I$. For the two covariance matrices in the model, $\Sigma_\beta$ and $\Sigma_\omega$, we take inverted Wishart priors (17) with $Q_\beta = 10I$, $Q_\omega = 10I$ and $\lambda_\beta = \lambda_\omega = 10$. The prior specification of the other parameters is normal, as stated in Equation (16), where the prior means are set at 0 and the prior covariance matrices are equal to identity matrices.
The two other models are a standard MNP model (20) and an MNP model with cross effects (21). The MNP model contains brand intercepts, a lagged choice dummy, price, feature and display. The MNP model with cross effects contains, on top of that, the cross effects of price, feature and display with household size, family income, amount of dollars spent on the shopping trip and weeks since last purchase in the product category, as stated in Equation (21). The prior specifications for parameters of the MNP with and without cross effects are similar to the responsiveness model. Again, we allow for continuous unobserved heterogeneity on the brand intercepts and on the effect of the marketing-mix variables.
The Bayesian analysis is performed on all purchases, except for the last purchase of each household. The last purchases are used for out-of-sample validation using predictive likelihoods, as described in Section 3.4.
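The hold-out scheme can be sketched as follows. This is only a minimal illustration and not the code used for the paper: the record layout `(household_id, week, brand)` and the function name `split_last_purchase` are assumptions made for the example.

```python
from collections import defaultdict


def split_last_purchase(purchases):
    """Split purchase records into an estimation and a validation sample.

    purchases: iterable of (household_id, week, brand) tuples (hypothetical layout).
    The last purchase of each household goes to the validation sample;
    all earlier purchases go to the estimation sample.
    """
    by_household = defaultdict(list)
    for record in purchases:
        by_household[record[0]].append(record)

    estimation, validation = [], []
    for records in by_household.values():
        records.sort(key=lambda r: r[1])  # order purchases by time
        estimation.extend(records[:-1])
        validation.append(records[-1])
    return estimation, validation


# Made-up records for two households.
example_purchases = [
    ("h1", 1, "brand 1"),
    ("h1", 4, "brand 3"),
    ("h2", 2, "brand 2"),
    ("h1", 2, "brand 1"),
    ("h2", 6, "brand 2"),
]
example_estimation, example_validation = split_last_purchase(example_purchases)
```

Note that a household with a single purchase would contribute only to the validation sample under this sketch.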

4.1. Responsiveness Model

First of all, we focus on the inferred probabilities of being responsive to marketing efforts for each product category. Figure 1 displays histograms of the posterior means of the responsiveness probabilities per purchase occasion (11). We see considerable differences across the three categories. For soft drinks, we find a relatively large proportion of purchase trips at which the household was responsive with a probability of almost one. We also find a cluster of observations with around a 0.2 probability of being responsive. For the cereal category, the probabilities are more centered around 0.5. For the liquid detergents, the distribution of the posterior probabilities is much more skewed to the right. Out of the three categories, the households act most responsive here.
Figure 1. Histograms of the posterior means of the responsiveness probabilities.
The differences in these graphs may be explained from the characteristics of the shopping trips. The data show that the average inter-purchase times for liquid detergents are more than two times higher than for the other categories. A higher inter-purchase time may imply that households are relying less on routine to make the choice. Therefore, they may not be able to remember past prices and may be more actively involved with the purchase. As a result, households may be more likely to search for price information. The average inter-purchase time in the soft drinks category is the smallest. Households are more likely to rely on memory, and they are less responsive. Additionally, we see that the average amount of dollars spent on the shopping trips involving soft drinks is smaller than for purchases in the other categories. This may imply that households have more time to compare prices when their shopping basket is smaller. The combination of both effects may explain the bimodality in the responsiveness distribution in the soft drinks category, as shown in the first panel of Figure 1. The results for the cereal category are somewhere in between with respect to the average inter-purchase times and the average amount of dollars spent.
To compare the cross-sectional and the temporal variation in responsiveness, we compute the ratio of the average of the variance of the responsiveness probabilities per household to the variance of the average responsiveness probability per household. This ratio is 0.34 for the soft drinks category. For the cereal and detergent categories, the ratios equal 12.32 and 1.51, respectively. Hence, for the latter two categories, the within-household spread in response probabilities is larger than the spread across households. The difference between the smallest and largest responsiveness probability per household shows that for 87% of the households in the cereal category, this difference is larger than 50%, indicating that there is within-household variation in the responsiveness state. For the soft drinks and liquid detergent categories, these percentages are 60% and 40%, respectively.
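The variance ratio described above can be illustrated with a short sketch. The probabilities below are made up, and the use of population variances (`pvariance`) is an assumption; the paper does not state which variance estimator was used.

```python
from statistics import mean, pvariance


def variance_ratio(probs_by_household):
    """Average within-household variance divided by the variance of household means.

    probs_by_household: dict mapping a household id to the list of its
    responsiveness probabilities, one per purchase occasion.
    """
    within = mean(pvariance(p) for p in probs_by_household.values())
    between = pvariance([mean(p) for p in probs_by_household.values()])
    return within / between


# Hypothetical posterior-mean responsiveness probabilities.
example_probs = {
    "h1": [0.2, 0.8, 0.5],
    "h2": [0.4, 0.6, 0.5],
    "h3": [0.1, 0.9, 0.8],
}
example_ratio = variance_ratio(example_probs)
```

A ratio above one indicates that the spread over time within households exceeds the spread across households; a ratio below one indicates the opposite.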
The histograms in Figure 1 do not provide direct information on which type of household is responsive at which type of shopping trip. Such information can, however, be obtained from the parameter estimates related to responsiveness Equation (2). Table 2 displays the posterior results for the complete responsiveness model. We carried out a limited model selection exercise to fine-tune the models. Based on overwhelming support of a Bayes factor, we have restricted the display variable to zero in the responsive choice part of the model for soft drinks and cereal.
The final line of Table 2 again confirms that the average responsiveness probability is close to 50%. This again stresses the importance of the unresponsive segment, as in fact, about half of the purchases can be associated with this segment. Furthermore, it shows that for liquid detergents, the households tend to be most responsive.
The first panel of Table 2 shows the parameters that influence the responsiveness state. The household size is positively related to the responsiveness to marketing efforts for soft drinks and liquid detergents. For cereals, this effect is also positive, but the effect is close to zero. For cereals, family income is a more important driver, where here, a higher family income implies less responsiveness. Note that the influence of lagged choice for cereal is small in the brand choice model for the unresponsive state. Hence, households with a higher income are more likely to go for their favorite brand without considering price. For soft drinks, we find that a longer inter-purchase time leads to a higher responsiveness probability. A longer inter-purchase time implies a higher need to actively compare the brands, as the last purchase cannot be remembered easily. This contradicts the idea that frequent buyers do more price checking [15]. Time since the last purchase is also positive for the cereal and liquid detergent category, but the posterior means are about the same as the posterior standard deviation. The average inter-purchase time in these categories is in general higher than for the soft drinks category, and a longer or shorter than average inter-purchase time possibly does not influence the probability of being responsive anymore.
Table 2. Posterior results for the responsiveness model ᵃ.

| Variable | Soft drinks mean | Soft drinks SD | Cereal mean | Cereal SD | Liquid detergent mean | Liquid detergent SD |
|---|---|---|---|---|---|---|
| Probit equation being responsive (3) | | | | | | |
| intercept | 0.069 | 0.063 | 0.019 | 0.021 | 0.247 | 0.220 |
| household size | 0.916 *** | 0.384 | 0.062 | 0.086 | 0.625 *** | 0.215 |
| family income | 0.115 | 0.206 | -0.140 * | 0.076 | -0.061 | 0.221 |
| dollars spent | 0.131 | 0.131 | 0.058 | 0.052 | -0.310 * | 0.182 |
| weeks since last purchase | 0.229 ** | 0.110 | 0.048 | 0.042 | 0.153 | 0.157 |
| Utility equation being responsive (5) | | | | | | |
| brand 1 | -0.211 | 0.201 | 3.110 *** | 0.249 | 1.227 * | 0.622 |
| brand 2 | -0.205 | 0.233 | 2.759 *** | 0.271 | -0.353 | 0.758 |
| brand 3 | 0.258 | 0.217 | 2.117 *** | 0.262 | 0.804 | 0.560 |
| brand 4 | -0.562 ** | 0.288 | 1.832 *** | 0.314 | -0.540 | 0.748 |
| brand 5 | 0.134 | 0.172 | 0.217 | 0.367 | 1.255 ** | 0.579 |
| brand 6 | -1.327 *** | 0.513 | | | 0.113 | 0.771 |
| lagged choice | 0.089 * | 0.053 | 0.256 ** | 0.092 | 1.875 *** | 0.383 |
| price | -0.342 *** | 0.098 | -0.159 *** | 0.063 | -0.523 | 0.312 |
| feature | 0.126 ** | 0.057 | 0.165 *** | 0.043 | 0.565 ** | 0.223 |
| display ᵇ | | | | | 0.541 ** | 0.220 |
| Utility equation being unresponsive (7) | | | | | | |
| brand 1 | -0.037 | 0.336 | 0.961 *** | 0.341 | -0.369 | 0.618 |
| brand 2 | 0.552 * | 0.326 | 2.121 *** | 0.162 | 0.057 | 0.464 |
| brand 3 | 0.980 *** | 0.379 | 0.824 *** | 0.237 | -1.579 ** | 0.808 |
| brand 4 | 0.593 | 0.364 | 0.714 *** | 0.225 | -1.444 | 1.025 |
| brand 5 | 0.300 | 0.309 | 0.393 ** | 0.168 | -0.174 | 0.907 |
| brand 6 | 0.071 | 0.453 | | | 1.181 *** | 0.449 |
| lagged choice | 0.510 *** | 0.070 | 0.036 | 0.090 | 0.514 ** | 0.252 |
| average responsiveness probability | 0.475 | | 0.507 | | 0.582 | |

ᵃ ***, **, * denote that zero is not included in the 99%, 95%, 90% highest posterior density intervals, respectively; ᵇ Bayes factors provide overwhelming posterior support for the zero effect of display for soft drinks and cereal. This restriction is therefore imposed.
In [35], the authors find that larger basket sizes tend to be purchased at everyday, low-price stores, implying that households do respond to price. In this paper, however, we consider the influence of basket size on price responsiveness given store choice, which may have the opposite effect, as a large shopping basket may mean that less time can be devoted to each particular category, which, in turn, makes the household less responsive. In fact, for liquid detergents, we find that a larger basket size leads to a smaller probability of being responsive. For the other two categories, we find the opposite result, but the posterior means are about the same size as the posterior standard deviations. In general, the amount of dollars spent on shopping trips containing cereal and soft drinks turns out to be smaller than for shopping trips containing purchases of liquid detergents, which may explain the difference.
The second and third panel of Table 2 present the parameter estimates for the brand utilities. The results indicate that there can be substantial differences in the baseline preferences across the responsive and the unresponsive segment. For example, for soft drinks, brand 3 has an average baseline preference within the responsive segment, but a relatively large baseline preference within the unresponsive segment.
The influence of lagged choice also differs substantially across the two segments. For cereal and detergents, we find that lagged choice is less important for unresponsive households than for responsive households; for soft drinks, however, we find the opposite. If we consider the posterior means of the brand intercepts, we see that the differences in values across the brands are large for the cereal and liquid detergent categories. Households seem to have a more distinct preference for a brand in these two categories. On unresponsive purchase occasions, they are more likely to choose their favorite brand, and lagged choice plays a less important role. For the soft drinks category, the differences in the posterior means of the brand intercepts are much smaller, and lagged choice seems to be more important if the actual price does not matter.
Finally, the posterior means of the marketing-mix variables have the expected sign for all three product categories. The effect of price is negative, and for feature and display, we find a positive effect.
Unreported estimates of the variance of the brand intercepts ($\Sigma_\omega$) and the variance of the marketing-mix variables ($\Sigma_\beta$) show that there is substantial variation in base preferences and marketing-mix parameters. Overall, the variation in the base preferences is largest. Comparing the different categories, we find that the heterogeneity is largest for detergents and that the degree of heterogeneity is about equal for cereal and soft drinks.

4.2. Standard MNP Model

The results in Table 2 already indicated that models for responsive and non-responsive households can differ, and the consequences of this finding are further articulated by the estimation results for the MNP model in Table 3. Most noticeable are the differences in the coefficients for price. While the posterior means of the price parameter for the responsive households are $-0.342$, $-0.159$ and $-0.523$ across the three categories, the MNP model for all households yields smaller price effects of $-0.140$, $-0.072$ and $-0.306$, respectively, which, on average, implies an underestimation of around $-0.2$. Note that in this MNP model, we also allow for heterogeneity across households. The responsiveness model clearly allows us to additionally separate the responsive from the unresponsive purchase occasions. Of course, when we do not make this split, the average price elasticity will be severely affected.
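The size of this gap can be checked directly from the posterior means reported in Table 2 and Table 3. The sketch below only redoes the arithmetic; the variable names are ours.

```python
# Posterior means of the price coefficient, copied from Table 2 (responsive
# part of the responsiveness model) and Table 3 (standard MNP model).
price_responsive = {"soft drinks": -0.342, "cereal": -0.159, "liquid detergent": -0.523}
price_mnp = {"soft drinks": -0.140, "cereal": -0.072, "liquid detergent": -0.306}

# Gap per category: responsive-segment estimate minus pooled MNP estimate.
price_gap = {c: price_responsive[c] - price_mnp[c] for c in price_responsive}

# Unweighted average over the three categories.
average_gap = sum(price_gap.values()) / len(price_gap)
```

The per-category gaps are $-0.202$, $-0.087$ and $-0.217$, with an unweighted average of about $-0.17$, which is in line with the rounded figure quoted in the text.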
Table 3. Posterior results for the multinomial probit (MNP) model (20) ᵃ.

| Variable | Soft drinks mean | Soft drinks SD | Cereal mean | Cereal SD | Liquid detergent mean | Liquid detergent SD |
|---|---|---|---|---|---|---|
| brand 1 | -0.100 | 0.177 | 1.920 *** | 0.168 | 0.647 * | 0.389 |
| brand 2 | 0.068 | 0.189 | 2.211 *** | 0.169 | -0.116 | 0.493 |
| brand 3 | 0.470 *** | 0.162 | 1.249 *** | 0.171 | -0.270 | 0.462 |
| brand 4 | -0.078 | 0.181 | 0.996 *** | 0.173 | -0.750 | 0.496 |
| brand 5 | 0.151 | 0.148 | 0.403 ** | 0.190 | 1.067 ** | 0.449 |
| brand 6 | -0.463 | 0.307 | | | 1.065 *** | 0.413 |
| lagged choice | 0.298 *** | 0.028 | 0.109 *** | 0.019 | 0.615 *** | 0.127 |
| price | -0.140 ** | 0.068 | -0.072 ** | 0.037 | -0.306 * | 0.164 |
| feature | 0.077 | 0.047 | 0.045 * | 0.024 | 0.255 ** | 0.116 |
| display | | | | | 0.290 ** | 0.123 |

ᵃ ***, **, * denote that zero is not included in the 99%, 95%, 90% highest posterior density interval, respectively.
The parameter concerning lagged choice also differs across the models in Table 2 and Table 3. The parameter values in the MNP model are smaller than the values in the responsive part for the cereal and liquid detergent categories and larger for the soft drinks category.

4.3. MNP Model with Cross Effects

One of the main differences between our responsiveness model and the “standard MNP model with heterogeneity” is that, in the latter, household variables do not interact with the marketing instruments. Of course, one could extend the MNP model with these interaction effects. Table 4 reports the posterior results for such an MNP model.
Clearly, the results indicate that only a very minor additional contribution can be observed from these cross effects, except for a weak effect of the weeks since last purchase and feature for liquid detergent. The fact that we allow for unobserved heterogeneity in the effects of the marketing-mix variables already seems to sufficiently describe the differences in response to marketing-mix variables.
Taking the outcomes in Table 2, Table 3 and Table 4 together, we can conclude that the responsiveness model seems to add an important feature to the choice model. Ignoring this feature leads to substantively different results. For example, the MNP model with or without cross effects underestimates price effects relative to our model.
Next, we will focus on the fit of the different models to see whether the responsiveness model indeed fits the data better.
Table 4. Posterior results for the MNP model with cross effects ᵃ.

| Variable | Soft drinks mean | Soft drinks SD | Cereal mean | Cereal SD | Liquid detergent mean | Liquid detergent SD |
|---|---|---|---|---|---|---|
| brand 1 | -0.072 | 0.178 | 1.848 *** | 0.145 | 0.806 | 0.460 |
| brand 2 | 0.106 | 0.181 | 2.130 *** | 0.145 | -0.252 | 0.544 |
| brand 3 | 0.493 *** | 0.158 | 1.203 *** | 0.144 | -0.178 | 0.562 |
| brand 4 | -0.084 | 0.187 | 0.962 *** | 0.149 | -0.655 | 0.535 |
| brand 5 | 0.183 | 0.146 | 0.357 ** | 0.156 | 1.004 ** | 0.501 |
| brand 6 | -0.553 * | 0.320 | | | 1.153 ** | 0.537 |
| lagged choice | 0.296 *** | 0.028 | 0.114 *** | 0.019 | 0.606 *** | 0.134 |
| price | -0.133 ** | 0.064 | -0.063 * | 0.037 | -0.278 | 0.184 |
| feature | 0.081 * | 0.046 | 0.063 ** | 0.025 | 0.272 ** | 0.133 |
| display | | | | | 0.320 ** | 0.135 |
| Cross effect with household size | | | | | | |
| price | 0.012 | 0.066 | -0.023 | 0.041 | 0.078 | 0.215 |
| feature | -0.018 | 0.043 | -0.026 | 0.026 | 0.101 | 0.159 |
| display | | | | | 0.048 | 0.141 |
| Cross effect with family income | | | | | | |
| price | 0.008 | 0.066 | 0.054 | 0.038 | 0.200 | 0.196 |
| feature | -0.008 | 0.047 | 0.039 | 0.026 | -0.094 | 0.126 |
| display | | | | | -0.072 | 0.128 |
| Cross effect with dollars spent | | | | | | |
| price | -0.053 | 0.037 | 0.018 | 0.026 | 0.178 | 0.145 |
| feature | -0.014 | 0.020 | -0.024 | 0.017 | 0.110 | 0.103 |
| display | | | | | 0.016 | 0.107 |
| Cross effect with weeks since last purchase | | | | | | |
| price | -0.017 | 0.027 | 0.009 | 0.016 | 0.029 | 0.143 |
| feature | 0.000 | 0.016 | 0.006 | 0.012 | 0.134 * | 0.080 |
| display | | | | | -0.036 | 0.092 |

ᵃ ***, **, * denote that zero is not included in the 99%, 95%, 90% highest posterior density interval, respectively.

4.4. Model Comparison

In Table 5, we present three fit measures for our three models, that is, the log predictive likelihood (22), the hit rate and the mean squared prediction error (MSPE). The predictive likelihood functions of the responsiveness model are clearly larger than for the MNP model and the MNP model with cross effects for all product categories. Predictive odds ratios are clearly larger than 50 in favor of the responsiveness model, except for the cereal category, where we find an odds ratio of about three in favor of the responsiveness model compared to the MNP specification. Note that because this comparison is based on observations not used during estimation, we do not need to penalize for the number of parameters. Recall that we take the final purchase occasion for each household to form the test sample. Furthermore, note that the “standard MNP model” outperforms the MNP model with cross effects. Based on the results in Table 4, this was to be expected, as hardly any cross effects turned out to be relevant.
Table 5. Model comparison.

| Model | Soft drinks | Cereal | Liquid detergent |
|---|---|---|---|
| Log predictive likelihood (22) | | | |
| responsiveness | -106.698 | -279.878 | -79.250 |
| MNP (20) | -111.491 | -281.031 | -83.167 |
| MNP + cross (21) | -111.920 | -288.515 | -85.874 |
| Hit rate | | | |
| responsiveness | 0.580 | 0.533 | 0.709 |
| MNP (20) | 0.557 | 0.533 | 0.709 |
| MNP + cross (21) | 0.557 | 0.541 | 0.658 |
| Mean squared prediction error ᵃ | | | |
| responsiveness | 7.808 | 9.723 | 6.467 |
| MNP (20) | 8.203 | 9.966 | 6.573 |
| MNP + cross (21) | 8.237 | 10.008 | 6.751 |

ᵃ Mean squared prediction error is defined as $100 \times \frac{1}{I} \sum_{i=1}^{I} \left( I[y_{ij,T_i+1} = 1] - \Pr[y_{ij,T_i+1} = 1 \mid Y] \right)^2$.
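The predictive odds ratios quoted in the text follow from the log predictive likelihoods in Table 5 by exponentiating their difference. The sketch below recomputes the odds of the responsiveness model against the standard MNP model; the dictionary layout and function name are ours.

```python
import math

# Log predictive likelihoods copied from Table 5.
log_pred_lik = {
    "soft drinks": {"responsiveness": -106.698, "MNP": -111.491, "MNP + cross": -111.920},
    "cereal": {"responsiveness": -279.878, "MNP": -281.031, "MNP + cross": -288.515},
    "liquid detergent": {"responsiveness": -79.250, "MNP": -83.167, "MNP + cross": -85.874},
}


def predictive_odds(log_lik_a, log_lik_b):
    """Predictive odds of model A against model B from their log predictive likelihoods."""
    return math.exp(log_lik_a - log_lik_b)


odds_vs_mnp = {
    category: predictive_odds(values["responsiveness"], values["MNP"])
    for category, values in log_pred_lik.items()
}
```

For cereal, the difference in log predictive likelihood is $1.153$, which gives odds of about three; for soft drinks and liquid detergent, the differences of $4.793$ and $3.917$ give odds above 50.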
Looking at the (out-of-sample) hit rate, we find that, except for the cereal category, the responsiveness model also gives the highest hit rate. A closer look at the higher hit rate of the MNP model with cross effects for the cereal category shows that the higher hit rate is accompanied by higher prediction probabilities in the case that the model produces a mishit. This explains why the log predictive likelihood of the responsiveness model of cereal is higher. However, the differences in the hit rate across all models are negligible.
As a final performance measure, we consider the MSPE. In general, the MSPE of the responsiveness model is smallest. Note that the MSPE also indicates that the MNP model with cross effects performs worst.
The illustrations in this section have indicated quite convincingly that allowing for a possibly large fraction of non-responsive households leads to a better fit and to a more appropriate interpretation of the effects of marketing efforts, like price.

5. Concluding Remarks

Households may not respond to marketing-mix instruments at each purchase occasion. To be able to respond to these efforts, one needs to invest time and effort in, for example, remembering price changes and reading newspapers and leaflets to notice advertisements. Households differ in the amount of effort they wish to invest in a particular purchase, and therefore, they will most likely also differ in their responsiveness to marketing efforts.
The choice model we developed in this paper incorporates the responsiveness of a household at a specific purchase occasion as a form of unobserved heterogeneity. Households differ in their purchasing process. In essence, we assume there are two processes. Households either take marketing efforts into account, or they base their choice on base preferences and their past experiences. The specific decision process used can differ across households and across purchase occasions. To explain and forecast the decision process used by a specific household at a specific purchase occasion, household characteristics can be used together with information on buying behavior. To take into account this form of heterogeneity, we extended a standard brand choice model. Basically, we introduced two segments of households: one segment is unresponsive to marketing efforts, whereas the other segment does respond to these efforts. The segment membership is separately modeled using a binary probit model. Households are allowed to switch over time between being responsive or not.
The application of our new model to three distinct categories shows that our model and related MNP models can yield quite different results. Some of the differences can be related to the circumstances and characteristics of the shopping trips in these product categories, such as inter-purchase times and the size of the shopping basket. Even though there were only three cases, we can draw a generalizing conclusion: the effects of marketing efforts will be underestimated in MNP models, as in these models, both responsive and non-responsive households are jointly treated as a single sample. Allowing for this specific form of heterogeneity in our model thus leads to better insights into the effects of the marketing mix. One can also identify the characteristics that result in higher probabilities of being responsive for households, and this has immediate managerial consequences. One key result of our model is that it leads to better targeting of marketing instruments, which, at the same time, also yields less irritation and waste. Further research should examine if the responsiveness fraction of around 0.5 in the three studied categories is a fraction that could commonly be found for fast-moving consumer goods or whether such a fraction could differ across different types of products.

Acknowledgments

We thank three anonymous reviewers, Michel Wedel and Peter Rossi for their comments on earlier versions of this paper.

Conflicts of Interest

The author declares no conflict of interest.

A. Full Conditional Posterior Distributions

Sampling of γ

To simulate $\gamma$, we consider:
$$Z_{it}^* = W_{it}'\gamma + \eta_{it} \quad \text{for } i = 1,\ldots,I,\; t = 2,\ldots,T_i$$
with $\eta_{it} \sim N(0,1)$. Define $Z^* = (Z_1^{*\prime},\ldots,Z_I^{*\prime})'$, where $Z_i^* = (Z_{i1}^*,\ldots,Z_{iT_i}^*)'$ and $W = (W_1',\ldots,W_I')'$ with $W_i = (W_{i1},\ldots,W_{iT_i})'$. As we have a normal prior specification (15) on the regression parameter, $\gamma$, the full conditional posterior distribution of $\gamma$ is normal with mean $(W'W + S_\gamma^{-1})^{-1}(W'Z^* + S_\gamma^{-1}\gamma_0)$ and covariance matrix $(W'W + S_\gamma^{-1})^{-1}$; see, for example, Zellner [36] (Chapter III).

Sampling of $Z^*$

The full conditional distribution of $Z_{it}^*$ is given by:
$$p(Z_{it}^* \mid \cdot) \propto \phi(Z_{it}^*; W_{it}'\gamma, 1) \prod_{j=1}^{J} \phi(U_{ijt}; m_{ijt}, 1)$$
where $m_{ijt} = (\mu_{ij}^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i)\, I[Z_{it}^* \geq 0] + (\mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1})\, I[Z_{it}^* < 0]$. If we define $\kappa_1 = \prod_{j=1}^{J} \phi(U_{ijt}; \mu_{ij}^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i, 1)$ and $\kappa_0 = \prod_{j=1}^{J} \phi(U_{ijt}; \mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1}, 1)$, we can write:
$$p(Z_{it}^* \mid \cdot) = \frac{1}{\kappa} \left( \kappa_1\, I[Z_{it}^* \geq 0]\, \phi(Z_{it}^*; W_{it}'\gamma, 1) + \kappa_0\, I[Z_{it}^* < 0]\, \phi(Z_{it}^*; W_{it}'\gamma, 1) \right)$$
where:
$$\kappa = \kappa_1 \Phi(W_{it}'\gamma) + \kappa_0 \Phi(-W_{it}'\gamma)$$
The cdf of $Z_{it}^*$ is given by:
$$P(Z_{it}^* \mid \cdot) = I[Z_{it}^* < 0]\, \frac{\kappa_0}{\kappa}\, \Phi(Z_{it}^* - W_{it}'\gamma) + I[Z_{it}^* \geq 0] \left( \frac{\kappa_1}{\kappa} \left( \Phi(Z_{it}^* - W_{it}'\gamma) - \Phi(-W_{it}'\gamma) \right) + \frac{\kappa_0}{\kappa}\, \Phi(-W_{it}'\gamma) \right)$$
To sample $Z_{it}^*$, we use the inverse cdf technique, which leads to:
$$Z_{it}^* = \begin{cases} \Phi^{-1}\left( \dfrac{\kappa u}{\kappa_0} \right) + W_{it}'\gamma & \text{if } u < \dfrac{\kappa_0}{\kappa} \Phi(-W_{it}'\gamma) \\[2ex] \Phi^{-1}\left( \dfrac{\kappa u}{\kappa_1} + \dfrac{\kappa_1 - \kappa_0}{\kappa_1} \Phi(-W_{it}'\gamma) \right) + W_{it}'\gamma & \text{if } u \geq \dfrac{\kappa_0}{\kappa} \Phi(-W_{it}'\gamma) \end{cases}$$
where $u$ is a draw from a uniform distribution.
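The inverse cdf step can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions rather than the implementation used for the paper: the function names are ours, the index $W_{it}\gamma$ and the two likelihood weights $\kappa_1$ and $\kappa_0$ are passed in as plain numbers, and the example values are made up.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def cdf_z_star(z, w_gamma, kappa1, kappa0):
    """Cdf of Z*_it given the index w_gamma and the weights kappa1, kappa0."""
    Phi = STD_NORMAL.cdf
    kappa = kappa1 * Phi(w_gamma) + kappa0 * Phi(-w_gamma)
    if z < 0:
        return kappa0 / kappa * Phi(z - w_gamma)
    return (kappa1 / kappa * (Phi(z - w_gamma) - Phi(-w_gamma))
            + kappa0 / kappa * Phi(-w_gamma))


def sample_z_star(w_gamma, kappa1, kappa0, u):
    """Inverse-cdf draw of Z*_it for a uniform draw u in (0, 1)."""
    Phi, Phi_inv = STD_NORMAL.cdf, STD_NORMAL.inv_cdf
    kappa = kappa1 * Phi(w_gamma) + kappa0 * Phi(-w_gamma)
    threshold = kappa0 / kappa * Phi(-w_gamma)  # probability that Z* < 0
    if u < threshold:
        return Phi_inv(kappa * u / kappa0) + w_gamma
    return Phi_inv(kappa * u / kappa1
                   + (kappa1 - kappa0) / kappa1 * Phi(-w_gamma)) + w_gamma


# Hypothetical inputs: index 0.3, responsive weight 0.02, unresponsive weight 0.05.
rng = random.Random(0)
example_draws = [sample_z_star(0.3, 0.02, 0.05, rng.random()) for _ in range(5)]
```

In practice, $\kappa_1$ and $\kappa_0$ are products of $J$ normal densities and can become very small, so it may be safer to compute them on the log scale and rescale both by their maximum; only their ratio matters for the draw.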

Sampling of U

To sample $U_{ijt}$, we note that:
$$U_{ijt} = (\mu_{ij}^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i)\, I[Z_{it}^* \geq 0] + (\mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1})\, I[Z_{it}^* < 0] + \varepsilon_{ijt}$$
with $\varepsilon_{ijt} \sim N(0,1)$ for $i = 1,\ldots,I$, $t = 2,\ldots,T_i$, $j = 1,\ldots,J$. Hence, we can sample $U_{ijt}$ from a truncated normal distribution with mean $(\mu_{ij}^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i)\, I[Z_{it}^* \geq 0] + (\mu_{ij}^{(u)} + \alpha^{(u)} y_{ij,t-1})\, I[Z_{it}^* < 0]$ and variance of one on the region $(\max_{k \neq j} U_{ikt}, \infty)$ if $y_{ijt} = 1$ and $(-\infty, U_{ikt})$ if $y_{ikt} = 1$ with $k \neq j$; see [31,32] for a similar approach.
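A minimal sketch of this step for a single purchase occasion is given below. It uses an inverse-cdf draw from the truncated normal, which is one common way to do this and is our choice for the illustration; the function names and the example means are assumptions.

```python
import math
import random
from statistics import NormalDist


def truncated_normal(mean, lower, upper, u):
    """Inverse-cdf draw from N(mean, 1) restricted to (lower, upper), u in (0, 1)."""
    dist = NormalDist(mean, 1.0)
    p_low = 0.0 if lower == -math.inf else dist.cdf(lower)
    p_high = 1.0 if upper == math.inf else dist.cdf(upper)
    p = p_low + u * (p_high - p_low)
    # Guard against p hitting 0 or 1 exactly; this naive approach is not
    # reliable when the truncation region lies far in the tails.
    p = min(max(p, 1e-15), 1.0 - 1e-15)
    return dist.inv_cdf(p)


def update_utilities(utilities, means, chosen, rng):
    """One Gibbs sweep over the latent utilities of one purchase occasion.

    utilities: current values U_1..U_J (modified in place)
    means:     conditional means of the utilities
    chosen:    index of the brand that was actually bought
    """
    for j in range(len(utilities)):
        if j == chosen:
            # Chosen brand: utility must exceed all other utilities.
            lower = max(utilities[k] for k in range(len(utilities)) if k != j)
            utilities[j] = truncated_normal(means[j], lower, math.inf, rng.random())
        else:
            # Other brands: utility must stay below that of the chosen brand.
            utilities[j] = truncated_normal(means[j], -math.inf, utilities[chosen], rng.random())
    return utilities


rng = random.Random(1)
example_utilities = [1.0, 0.0, -0.5]  # starting values consistent with brand 0 chosen
for _ in range(50):
    update_utilities(example_utilities, means=[0.2, 0.1, -0.3], chosen=0, rng=rng)
```

After every sweep, the utility of the chosen brand remains the largest, so the draws stay consistent with the observed choice.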

Sampling of Ω

To sample $\omega_i$ ($i = 1,\ldots,I$), we consider the system of $J - 1$ equations:
$$U_{ijt} - (\mu_j^{(u)} + \alpha^{(u)} y_{ij,t-1})\, I[Z_{it}^* < 0] - (\mu_j^{(r)} + \alpha^{(r)} y_{ij,t-1} + X_{ijt}'\beta_i)\, I[Z_{it}^* \geq 0] = \omega_{ij} + \varepsilon_{ijt} \tag{30}$$
for $j = 1,\ldots,J-1$ and $t = 2,\ldots,T_i$. Define the left-hand side of Equation (30) as $\tilde{U}_{ijt}$, and let $\tilde{U}_{it} = (\tilde{U}_{i1t},\ldots,\tilde{U}_{i,J-1,t})'$. If we combine this system of $J - 1$ equations with the unobserved heterogeneity specification $\omega_i \sim N(0, \Sigma_\omega)$, it is easy to show that the full conditional posterior distribution of $\omega_i$ is normal with mean $(T_i I_{J-1} + \Sigma_\omega^{-1})^{-1} \left( \sum_{t=2}^{T_i} \tilde{U}_{it} \right)$ and covariance matrix $(T_i I_{J-1} + \Sigma_\omega^{-1})^{-1}$.

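Sampling of $\Sigma_\omega$

The full conditional posterior of $\Sigma_\omega$ is given by:
$$p(\Sigma_\omega \mid \cdot) \propto |\Sigma_\omega|^{-(I + \lambda_\omega + J)/2} \exp\left( -\frac{1}{2} \operatorname{tr}\left( \Sigma_\omega^{-1} \left( \sum_{i=1}^{I} \omega_i \omega_i' + Q_\omega \right) \right) \right)$$
and hence, we can sample $\Sigma_\omega$ from an inverted Wishart distribution with scale parameter $\left( \sum_{i=1}^{I} \omega_i \omega_i' + Q_\omega \right)$ and degrees of freedom $I + \lambda_\omega$.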

Sampling of B

To sample $\beta_i$ ($i = 1,\ldots,I$), we collect for each household, $i$, the equations:
$$U_{ijt} - \mu_{ij}^{(r)} - \alpha^{(r)} y_{ij,t-1} = X_{ijt}'\beta_i + \varepsilon_{ijt} \quad \text{with } Z_{it}^* \geq 0 \tag{32}$$
for $j = 1,\ldots,J$, $t = 2,\ldots,T_i$. Define $\tilde{U}_{ijt} = U_{ijt} - \mu_{ij}^{(r)} - \alpha^{(r)} y_{ij,t-1}$. If we combine regression Equation (32) with the heterogeneity specification $\beta_i \sim N(\beta, \Sigma_\beta)$, we can easily show that we have to sample $\beta_i$ from a normal distribution with mean:
$$\left( \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} X_{ijt} X_{ijt}' + \Sigma_\beta^{-1} \right)^{-1} \left( \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} X_{ijt} \tilde{U}_{ijt} + \Sigma_\beta^{-1} \beta \right)$$
and covariance matrix:
$$\left( \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} X_{ijt} X_{ijt}' + \Sigma_\beta^{-1} \right)^{-1}$$

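Sampling of $\Sigma_\beta$

The full conditional posterior of $\Sigma_\beta$ is given by:
$$p(\Sigma_\beta \mid \cdot) \propto |\Sigma_\beta|^{-(I + \lambda_\beta + J)/2} \exp\left( -\frac{1}{2} \operatorname{tr}\left( \Sigma_\beta^{-1} \left( \sum_{i=1}^{I} (\beta_i - \beta)(\beta_i - \beta)' + Q_\beta \right) \right) \right)$$
and hence, we can sample $\Sigma_\beta$ from an inverted Wishart distribution with scale parameter $\left( \sum_{i=1}^{I} (\beta_i - \beta)(\beta_i - \beta)' + Q_\beta \right)$ and degrees of freedom $I + \lambda_\beta$.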

Sampling of β

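The full conditional posterior density of $\beta$ is given by:
$$p(\beta \mid \cdot) \propto \exp\left( -\frac{1}{2} \sum_{i=1}^{I} (\beta_i - \beta)' \Sigma_\beta^{-1} (\beta_i - \beta) \right) \exp\left( -\frac{1}{2} (\beta - \beta_0)' S_\beta^{-1} (\beta - \beta_0) \right)$$
Hence, we can sample $\beta$ from a normal distribution with mean $(I \Sigma_\beta^{-1} + S_\beta^{-1})^{-1} \left( \sum_{i=1}^{I} \Sigma_\beta^{-1} \beta_i + S_\beta^{-1} \beta_0 \right)$ and covariance matrix $(I \Sigma_\beta^{-1} + S_\beta^{-1})^{-1}$, where $I$ denotes the number of households.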

Sampling of $\mu^{(u)}$ and $\mu^{(r)}$

To sample $\mu^{(u)}$, we consider the system of $J - 1$ equations:
$$U_{ijt} - \alpha^{(u)} y_{ij,t-1} - \omega_{ij} = \mu_j^{(u)} + \varepsilon_{ijt} \quad \text{with } Z_{it}^* < 0 \tag{35}$$
for $j = 1,\ldots,J-1$, $i = 1,\ldots,I$, and $t = 2,\ldots,T_i$. Define $\tilde{U}_{ijt} = U_{ijt} - \alpha^{(u)} y_{ij,t-1} - \omega_{ij}$ and $\tilde{U}_{it} = (\tilde{U}_{i1t},\ldots,\tilde{U}_{i,J-1,t})'$. If we combine the system of Equations (35) with the prior specification (16), it is easy to show that we have to sample $\mu^{(u)}$ from a normal distribution with mean:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0] + \Sigma_{\mu^{(u)}}^{-1} \right)^{-1} \left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0]\, \tilde{U}_{it} + \Sigma_{\mu^{(u)}}^{-1} \mu_0^{(u)} \right)$$
and covariance matrix:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0] + \Sigma_{\mu^{(u)}}^{-1} \right)^{-1}$$
see, for example, Zellner [36] (Chapter VIII).
To sample $\mu^{(r)}$, we consider the system of $J - 1$ equations:
$$U_{ijt} - \alpha^{(r)} y_{ij,t-1} - X_{ijt}'\beta_i - \omega_{ij} = \mu_j^{(r)} + \varepsilon_{ijt} \quad \text{with } Z_{it}^* \geq 0 \tag{36}$$
for $j = 1,\ldots,J-1$, $i = 1,\ldots,I$, and $t = 2,\ldots,T_i$. Define now $\tilde{U}_{ijt} = U_{ijt} - \alpha^{(r)} y_{ij,t-1} - X_{ijt}'\beta_i - \omega_{ij}$ and $\tilde{U}_{it} = (\tilde{U}_{i1t},\ldots,\tilde{U}_{i,J-1,t})'$. If we combine the system of Equations (36) with the prior specification (16), it is easy to show that the full conditional distribution of $\mu^{(r)}$ is normal with mean:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] + \Sigma_{\mu^{(r)}}^{-1} \right)^{-1} \left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0]\, \tilde{U}_{it} + \Sigma_{\mu^{(r)}}^{-1} \mu_0^{(r)} \right)$$
and covariance matrix:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] + \Sigma_{\mu^{(r)}}^{-1} \right)^{-1}$$

Sampling of $\alpha^{(u)}$ and $\alpha^{(r)}$

To sample $\alpha^{(u)}$, we consider the equation:
$$U_{ijt} - \mu_{ij}^{(u)} = \alpha^{(u)} y_{ij,t-1} + \varepsilon_{ijt} \quad \text{with } Z_{it}^* < 0$$
for $j = 1,\ldots,J$, $i = 1,\ldots,I$, and $t = 2,\ldots,T_i$. Define $\tilde{U}_{ijt} = U_{ijt} - \mu_{ij}^{(u)}$. If we combine the regression equation with the prior specification (16), it is easy to show that the full conditional posterior distribution of $\alpha^{(u)}$ is normal with mean:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0] \sum_{j=1}^{J} y_{ij,t-1}^2 + s_{\alpha^{(u)}}^{-2} \right)^{-1} \left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0] \sum_{j=1}^{J} y_{ij,t-1} \tilde{U}_{ijt} + s_{\alpha^{(u)}}^{-2} \alpha_0^{(u)} \right)$$
and variance:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* < 0] \sum_{j=1}^{J} y_{ij,t-1}^2 + s_{\alpha^{(u)}}^{-2} \right)^{-1}$$
To sample $\alpha^{(r)}$, we consider the equation:
$$U_{ijt} - X_{ijt}'\beta_i - \mu_{ij}^{(r)} = \alpha^{(r)} y_{ij,t-1} + \varepsilon_{ijt} \quad \text{with } Z_{it}^* \geq 0 \tag{39}$$
for $j = 1,\ldots,J-1$, $i = 1,\ldots,I$, and $t = 2,\ldots,T_i$. Define now $\tilde{U}_{ijt} = U_{ijt} - X_{ijt}'\beta_i - \mu_{ij}^{(r)}$. If we combine Equation (39) with the prior specification (16) for $\alpha^{(r)}$, it is easy to show that the full conditional distribution of $\alpha^{(r)}$ is normal with mean:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} y_{ij,t-1}^2 + s_{\alpha^{(r)}}^{-2} \right)^{-1} \left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} y_{ij,t-1} \tilde{U}_{ijt} + s_{\alpha^{(r)}}^{-2} \alpha_0^{(r)} \right)$$
and variance:
$$\left( \sum_{i=1}^{I} \sum_{t=2}^{T_i} I[Z_{it}^* \geq 0] \sum_{j=1}^{J} y_{ij,t-1}^2 + s_{\alpha^{(r)}}^{-2} \right)^{-1}$$
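Because the lagged-choice coefficient is a scalar, its conditional posterior is easy to compute by hand. The sketch below illustrates the update for either state; the function names and the flat data layout (one row per purchase occasion, one column per brand) are assumptions for the example, and the default prior mean of 0 and prior variance of 1 follow the prior settings described in Section 4.

```python
import random


def alpha_posterior(y_lag, u_tilde, in_state, prior_mean=0.0, prior_var=1.0):
    """Posterior mean and variance of a lagged-choice coefficient.

    y_lag[n][j]:   lagged choice dummy for purchase occasion n and brand j
    u_tilde[n][j]: utility minus all other terms of the mean (hypothetical input)
    in_state[n]:   True if occasion n is in the state whose coefficient is drawn
    """
    precision = 1.0 / prior_var           # prior contribution s^{-2}
    weighted_sum = prior_mean / prior_var  # prior contribution s^{-2} * alpha_0
    for y_row, u_row, selected in zip(y_lag, u_tilde, in_state):
        if not selected:
            continue  # occasions in the other state carry no information
        for y, u in zip(y_row, u_row):
            precision += y * y
            weighted_sum += y * u
    return weighted_sum / precision, 1.0 / precision


def draw_alpha(y_lag, u_tilde, in_state, rng, prior_mean=0.0, prior_var=1.0):
    """One Gibbs draw of the coefficient from its normal full conditional."""
    mean, var = alpha_posterior(y_lag, u_tilde, in_state, prior_mean, prior_var)
    return rng.gauss(mean, var ** 0.5)


# Made-up data: three purchase occasions, three brands, two occasions in the state.
example_y_lag = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
example_u_tilde = [[0.9, -0.2, 0.1], [0.3, 0.6, -0.4], [0.0, 0.2, 1.5]]
example_in_state = [True, True, False]
example_mean, example_var = alpha_posterior(example_y_lag, example_u_tilde, example_in_state)
example_draw = draw_alpha(example_y_lag, example_u_tilde, example_in_state, random.Random(0))
```

Since exactly one lagged dummy equals one per purchase occasion when the sum runs over all brands, the data part of the precision is simply the number of occasions in the relevant state.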

References

  1. P.M. Guadagni, and J.D.C. Little. “A logit model of brand choice calibrated on scanner data.” Mark. Sci. 2 (1983): 203–238.
  2. P.K. Chintagunta, D.C. Jain, and N.J. Vilcassim. “Investigating heterogeneity in brand preferences in logit models for panel data.” J. Mark. Res. 28 (1991): 417–428.
  3. M.P. Keane. “Modeling heterogeneity and state dependence in consumer choice behavior.” J. Bus. Econ. Stat. 15 (1997): 310–327.
  4. K. Hansen, V. Singh, and P. Chintagunta. “Understanding store-brand purchase behavior across categories.” Mark. Sci. 25 (2006): 75–90.
  5. D.L. McFadden. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics. Edited by P. Zarembka. New York, NY, USA: Academic Press, 1973, Chapter 4; pp. 105–142.
  6. D.L. McFadden. “Econometric Models of Probabilistic Choice.” In Structural Analysis of Discrete Data: With Econometric Applications. Edited by C. Manski and D. McFadden. Cambridge, MA, USA: MIT Press, 1981, pp. 197–272.
  7. G. Maddala. Limited Dependent and Qualitative Variables in Econometrics. Cambridge, UK: Econometric Society Monographs; Cambridge University Press, 1983, Volume 3.
  8. D. Horsky, S. Misra, and P. Nelson. “Observed and unobserved preference heterogeneity in brand-choice models.” Mark. Sci. 25 (2006): 322–335.
  9. D.C. Jain, N.J. Vilcassim, and P.K. Chintagunta. “A random-coefficients logit brand-choice model applied to panel data.” J. Bus. Econ. Stat. 12 (1994): 317–328.
  10. P.E. Rossi, and G.M. Allenby. “A Bayesian approach to estimating household parameters.” J. Mark. Res. 30 (1993): 171–182.
  11. G.M. Allenby, and P.E. Rossi. “Marketing models of consumer heterogeneity.” J. Econom. 89 (1999): 57–78.
  12. M. Wedel, W.A. Kamakura, N. Arora, A. Bemmaor, J. Chiang, T. Elrod, R. Johnson, P.J. Lenk, S.A. Neslin, and C.S. Poulsen. “Discrete and continuous representations of unobserved heterogeneity in choice modelling.” Mark. Lett. 10 (1999): 219–232.
  13. M. Wedel, and W.A. Kamakura. Market Segmentation: Conceptual and Methodological Foundations. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1999.
  14. B. Kahn, and D. Schmittlein. “Shopping trip behavior: An empirical investigation.” Mark. Lett. 1 (1989): 55–69.
  15. P. Dickson, and A. Sawyer. “The price knowledge and search of supermarket shoppers.” J. Mark. 54 (1990): 42–53.
  16. W.A. Kamakura, B.D. Kim, and J. Lee. “Modeling preference and structural heterogeneity in consumer choice.” Mark. Sci. 15 (1996): 152–172.
  17. S. Yang, and G.M. Allenby. “A model for observation, structural, and household heterogeneity in panel data.” Mark. Lett. 11 (2000): 137–149.
  18. M. Wang, and P. Fischbeck. “Incorporating framing into prospect theory modeling: A mixture-model approach.” J. Risk Uncertain. 29 (2004): 181–197.
  19. R.E. Bucklin, and J.M. Lattin. “A two-state model of purchase incidence and brand choice.” Mark. Sci. 10 (1991): 24–39.
  20. P.E. Rossi, G.M. Allenby, and R.E. McCulloch. Bayesian Statistics and Marketing. Hoboken, NJ, USA: John Wiley & Sons, 2005.
  21. D.R. Bell, and J.M. Lattin. “Shopping behavior and consumer preference for store price format: Why “Large-Basket Shoppers Prefer EDLP”.” Mark. Sci. 17 (1998): 66–88.
  22. O. Netzer, J. Lattin, and V. Srinivasan. “A hidden markov model of customer relationships management.” Mark. Sci. 27 (2008): 185–204.
  23. P. Seetharaman, A. Ainslie, and P. Chintagunta. “Investigating household state dependence effects across categories.” J. Mark. Res. 36 (1999): 488–500.
  24. A. Krishna, R. Briesch, D. Lehmann, and H. Yuan. “A meta-analysis of the impact of price presentation on perceived savings.” J. Retailing 78 (2002): 101–118. [Google Scholar] [CrossRef]
  25. H. Van Heerde, P. Leeflang, and D. Wittink. “Semiparametric analysis to estimate the deal effect curve.” J. Mark. Res. 38 (2001): 197–215. [Google Scholar] [CrossRef]
  26. A. Börsch-Supan, and V.A. Hajivassiliou. “Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models.” J. Econom. 58 (1993): 347–368. [Google Scholar] [CrossRef]
  27. S. Geman, and D. Geman. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984): 721–741. [Google Scholar] [CrossRef] [PubMed]
  28. M.A. Tanner, and W.H. Wong. “The calculation of posterior distributions by data augmentation.” J. Am. Stat. Assoc. 82 (1987): 528–550. [Google Scholar] [CrossRef]
  29. L. Tierney. “Markov chains for exploring posterior distributions.” Ann. Stat. 22 (1994): 1701–1762. [Google Scholar] [CrossRef]
  30. J.H. Albert, and S. Chib. “Bayesian analysis of binary and polychotomous response data.” J. Am. Stat. Assoc. 88 (1993): 669–679. [Google Scholar] [CrossRef]
  31. R. McCulloch, and P.E. Rossi. “An exact likelihood analysis of the multinomial probit model.” J. Econom. 64 (1994): 207–240. [Google Scholar] [CrossRef]
  32. R.E. McCulloch, N.G. Polson, and P.E. Rossi. “A Bayesian analysis of the multinomial probit model with fully identified parameters.” J. Econom. 99 (2000): 173–193. [Google Scholar] [CrossRef]
  33. J.F. Geweke. Contemporary Bayesian Econometrics and Statistics. Hoboken, NJ, USA: John Wiley & Sons, 2005. [Google Scholar]
  34. R. Briesch, P. Chintagunta, and E. Fox. “How does assortment affect grocery store choice?” J. Mark. Res. 46 (2009): 176–189. [Google Scholar] [CrossRef]
  35. D. Bell, T.H. Ho, and C. Tang. “Determining where to shop: Fixed and variable costs of shopping.” J. Mark. Res. 35 (1998): 352–369. [Google Scholar] [CrossRef]
  36. A. Zellner. An Introduction to Bayesian Inference in Econometrics. New York, NY, USA: Wiley, 1971. [Google Scholar]
  • 1 We thank David Bell for generously sharing the data with us.
  • 2 Detailed results are available upon request.