Marketing Mix Modeling Using PLS-SEM: Bootstrapping the Model Coefficients

Abstract: Partial least squares structural equation modeling (PLS-SEM) uses sampling bootstrapping to calculate the significance of the model parameter estimates (e.g., path coefficients and outer loadings). However, when the data are time series, as in marketing mix modeling, sampling bootstrapping shows inconsistencies that arise because the series has an autocorrelation structure and contains seasonal events, such as Christmas or Black Friday, especially in multichannel retailing, making the significance analysis of the PLS-SEM model unreliable. The alternative proposed in this research uses maximum entropy bootstrapping (meboot), a technique specifically designed for time series, which maintains the autocorrelation structure and preserves, in the bootstrapped series, the timing of the seasonal events or structural changes that occurred in the original series. The results showed that meboot outperformed sampling bootstrapping in terms of the coherence of the bootstrapped data and the quality of the significance analysis.


Introduction
Marketing mix models use multiple regression to measure marketing effectiveness and efficiency [1]. For multichannel retailers that sell online and offline and advertise in both offline and Internet media, a common approach to marketing mix modeling is chaining multiple regression models (based on conversations with consulting experts), i.e., first modeling the impact of advertising on online sales and then using this information to model offline sales. Recent research [2] proposed using partial least squares structural equation modeling (PLS-SEM) to measure the simultaneous impact of advertising in multichannel retailing contexts and to measure the effectiveness of the different advertising campaigns on web and store sales [3].
PLS-SEM has some desirable properties for marketing mix modeling because it is a causal modeling approach aimed at maximizing the explained variance of the dependent constructs, and because it is similar to multiple regression analysis, it is appropriate for prediction [4]. Moreover, and very relevant, PLS-SEM avoids the problem of indeterminacy and displays the factor scores [5], allowing the use of latent variable scores measured by one or several indicators in subsequent analyses [6]. Consequently, PLS-SEM is particularly useful for measuring the efficiency of marketing campaigns by attributing sales to each of the advertising channels and calculating marketing ROI [3].
However, because PLS-SEM does not assume normality, the absence of extreme values, or symmetry in the sample data [7], the parametric significance tests usually employed in linear models cannot be applied to test whether outer loadings and path coefficients are significant. Instead, PLS-SEM relies on a nonparametric sampling bootstrapping procedure [8] to test the significance of the estimated coefficients. This bootstrapping methodology involves repeated random sampling with replacement from the original sample to create bootstrap samples. It is a good procedure for estimating sampling distributions of independent and identically distributed (i.i.d.) random variables [9], even in situations in which the i.i.d. setup is slightly violated [10], as in cases in which there might be changes in the mean or variance (e.g., when the survey is conducted in different countries or with heterogeneous respondents) [11,12].
Although sampling bootstrapping is a proper method for measuring the significance of the coefficients in most PLS-SEM applications, it is not recommended for marketing mix time series because the data have an internal structure: the sampling bootstrapping method can change the dates of events, such as Black Friday or Christmas, or introduce several additional events, or none at all, in a given year. It also does not respect the time intervals of any structural changes the series may contain.
As an alternative to sampling bootstrapping, we propose maximum entropy bootstrapping (meboot) [13], which maintains the individual basic shapes of time series and their time dependence structures, such as the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Additionally, when applying meboot bootstrapping, the results inherit this structure while respecting the dates of special events such as Black Friday as well as possible structural changes.
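The loss of time dependence under case resampling can be illustrated with a short simulation. The AR(1) series below is a synthetic stand-in for a weekly sales series, not the paper's data: resampling observations with replacement destroys the series' lag-1 autocorrelation.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(42)

# Simulate an AR(1) series with strong positive autocorrelation
# (a synthetic stand-in for a weekly sales series).
n, phi = 120, 0.8
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal()

# One i.i.d. bootstrap draw: observations resampled with replacement,
# ignoring their time ordering.
resample = rng.choice(series, size=n, replace=True)

print(lag1_autocorr(series))    # strongly positive for the AR(1) series
print(lag1_autocorr(resample))  # near zero: the dependence structure is gone
```

The same shuffling that breaks the ACF also relocates spikes such as Black Friday to arbitrary weeks, which is the core problem the paper addresses.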
Despite its importance, little research has been done in the area of time series significance analysis using PLS-SEM models, especially with regard to marketing mix analysis. Furthermore, current research does not highlight the relevance and importance of the application of consistent bootstrap methodologies for solving these types of problems; this research makes important contributions by filling this void. For these reasons, the overall aim of this paper is to provide a detailed empirical demonstration of the advantages of the suggested meboot bootstrapping procedure in comparison with sampling bootstrapping to calculate the significance of PLS-SEM model parameter estimates in a time series or marketing mix modeling context. To this end, we based our analysis on standardized data from a European consumer electronics multichannel company [2] containing web and store sales and online and offline advertising activities.
Given this aim, the remainder of this paper is structured as follows. First, the theoretical foundations are explained. Then, the data used in this research are described; next, both bootstrapping methods are applied; and finally, the results are discussed.

PLS-SEM
PLS-SEM is a technique appropriate for solving marketing mix problems even when very complex relationships exist [14] because the optimization algorithm maximizes the explained variance of the model's endogenous constructs, making it especially appropriate for identifying key variables in situations of weak theory [15] or for verifying whether hypothesized relationships are empirically acceptable [16], for example, those involving marketing mix model variables. Regarding its statistical properties, PLS-SEM admits single-item constructs without identification or convergence problems [17]; moreover, PLS-SEM models can handle extremely non-normal data with asymmetries and very high levels of skewness, for example, data corresponding to marketing events such as Black Friday. PLS-SEM is also appropriate for the typically small sample sizes of marketing mix models, such as our 120 weekly observations, corresponding to approximately 2.5 years of data.
Earlier applications of PLS-SEM to solve marketing mix problems focused on better understanding the direct and cross effects of advertising on sales. Early research [18] studied the impact of the interaction of radio and print advertising in the opening of checking and savings accounts at a commercial bank, finding evidence of direct and cross effects between both media. More recent research [19] added Internet advertising variables to measure the impact of print advertising and paid search on a service company, finding a crossover effect on online conversions.
Recently, PLS-SEM applied to the marketing mix [2] showed evidence of the amplifying effect of organic search queries on advertising and, consequently, on the sales of a multichannel retailer. Additionally, a PLS-SEM model [3] was used to calculate the ROI of offline and Internet advertising campaigns.
To verify the statistical significance of the PLS-SEM model parameters, the literature proposes using sampling bootstrapping; the next section discusses the reasons.

Sampling Bootstrapping
The term bootstrapping is inspired by the story of the Baron of Munchausen [20], who explained how he pulled himself and his horse out of a swamp by his own hair, meaning that the Baron saved himself by his own means. In this sense, the homonymous statistical technique developed by Efron [9] is similar because bootstrapping draws conclusions about the characteristics of a population using the sample itself; in other words, given the absence of information about the population, the sample is assumed to be the best estimate of the population [21], making this method very appropriate when, as is the case with PLS-SEM, there is no knowledge about the distribution of the parameters.
To find the empirical sampling distribution of a parameter, bootstrapping generates a number of samples with replacement (5000 is recommended) [4], each containing the same amount of data as the original series, to ensure that the samples obtained have the same statistical properties as the original sample; i.e., if the data contain 120 observations, as in the present research, 5000 samples of 120 observations each are generated. In this way, each resample has the same number of elements as the original sample, and the replacement method transforms the finite sample into an infinite population. For each sample, a PLS-SEM model is calculated, and the coefficients of interest are stored, creating a distribution of 5000 values for each path coefficient or outer loading of interest. For example, when analyzing the loading of the indicator λ, we obtain 5000 values of the estimate λ*, which are then ordered from smallest to largest. Next, the lower and upper bounds of the confidence interval are identified; i.e., if the desired confidence level is 95%, the interval runs from observation 5000 × 0.025 to observation 5000 × 0.975, that is, from the 125th to the 4875th ordered observation. The resulting confidence interval (CI) suggests that the population value of λ lies between the lower and upper bounds with 95% probability. Once the confidence interval is calculated, if it does not include 0, we may consider the coefficient significant at the 95% level. However, as stated previously, in many cases, because of the nature of the data, the distribution of the parameters is asymmetric, and the percentile method is subject to coverage error [7], meaning that, for example, a 95% confidence interval may actually be a 90% confidence interval. Hence, it is recommended to construct bias-corrected percentile confidence intervals when making statistical inferences with PLS-SEM.
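The percentile procedure described above can be sketched as follows. The sample and the statistic (a mean) are illustrative stand-ins for a PLS-SEM loading estimate; the paper itself computes these intervals with R tooling.

```python
import numpy as np

def percentile_ci(data, statistic, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, compute the
    statistic on each resample, sort, and read off the quantiles."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.sort([statistic(rng.choice(data, size=n, replace=True))
                         for _ in range(n_boot)])
    lower = estimates[int(n_boot * alpha / 2) - 1]        # 125th smallest value
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]  # 4875th smallest value
    return float(lower), float(upper)

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.4, scale=1.0, size=120)  # stand-in for a loading estimate
lo, hi = percentile_ci(sample, np.mean)
# If 0 lies outside [lo, hi], the coefficient is considered significant at 95%.
print(lo, hi)
```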
Using bias-corrected and accelerated (BCa) bootstrap confidence intervals solves this problem by adjusting for biases and skewness in the bootstrap distribution [22]; for a detailed step-by-step explanation of the methodology in a PLS-SEM context, see [23].
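As a hedged illustration of BCa intervals, SciPy provides an implementation (the paper's own computations use R; the skewed sample below is synthetic):

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(7)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=120)  # skewed synthetic sample

# BCa adjusts the percentile interval for bias and skewness in the
# bootstrap distribution of the statistic (here, the mean).
res = bootstrap((sample,), np.mean, n_resamples=5000,
                confidence_level=0.95, method='BCa', random_state=0)
print(res.confidence_interval.low, res.confidence_interval.high)
```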
In the case of time series data such as marketing mix model variables, this methodology has a major drawback: by definition, resampling preserves neither the order of the data, nor the autocorrelation structure, nor the exact timing of marketing-associated events such as Black Friday. To solve these problems, the present research proposes the maximum entropy bootstrapping methodology for analyzing the significance of time series coefficients, explained next.

Maximum Entropy Bootstrapping
Carlstein [24], aware that time series do not satisfy the i.i.d. hypothesis required by bootstrapping and of the problems generated by breaking the internal structure of time series by shuffling the data, proposed a solution suitable for stationary time series consisting of bootstrapping nonoverlapping blocks of observations instead of individual observations; on the basis of this idea, the methodology was improved with the proposal of overlapping moving blocks [25,26]. However, even after these improvements, the methods faced the same problems with respect to violations of the required stationarity property and therefore did not provide a remedy.
As a solution to time series bootstrapping, Vinod and López-de-Lacalle [13] proposed the application of the principle of maximum entropy (ME), explained in depth by [27]. According to Vinod [28], ME is a powerful tool to avoid unnecessary distributional assumptions, such as i.i.d. or stationarity assumptions. ME constructs a population of time series, called the ensemble Ω, which can include regime switches, gaps, or jump discontinuities. With f(x) being the density function of xt, the entropy H (Equation (3)) is defined as H = −E[log f(x)] = −∫ f(x) log f(x) dx. Maximizing the entropy H of a density f(x), defined in terms of Shannon information [29], means finding the smoothest possible probability distribution that meets the constraints derived from prior knowledge about the mean and variance of the original series. The meboot algorithm constructs segments of the ME density f(x) subject to certain mass- and mean-preserving constraints.
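As a numerical illustration of the quantity being maximized, the following sketch computes Shannon entropy for a discrete distribution (the probabilities are illustrative, not derived from the paper's data); the uniform distribution, the "smoothest" one on a given support, attains the maximum.

```python
import numpy as np

def shannon_entropy(p):
    """H = -sum p_i * log(p_i) for a discrete distribution (0 * log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Among distributions on the same four-point support, the uniform one has
# maximum entropy; concentrating mass lowers H.
uniform = np.full(4, 0.25)
peaked = np.array([0.85, 0.05, 0.05, 0.05])
print(shannon_entropy(uniform))  # log(4) ≈ 1.386, the maximum
print(shannon_entropy(peaked))   # strictly smaller
```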
The meboot algorithm [13] generates a large number of replicates of the original series, e.g., 5000, which can be used for statistical inference; it applies a "blocking" technique that breaks the time series into nonoverlapping blocks such that the grand mean of all the simulated samples equals the time average of the original, constructing bootstrap samples, or ensembles, that retain the basic shape and dependence structure of the original data. Figure 1 shows the actual series of web sales used in this research, explained in the next section, as well as two random ensembles generated with the meboot algorithm. Moreover, the approach can be applied in the presence of structural breaks, such as economic crises or recoveries, as well as jumps due to Black Friday sales, in which both offline and online sales may "jump" sharply above the mean. For more information on meboot, Vinod [30] provides extensive Monte Carlo evidence supporting its use in empirical work and suggesting that meboot confidence intervals are reliable.
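The rank-preserving idea behind meboot can be sketched in simplified form. This is not the full algorithm of the meboot R package, only an illustration of why replicates keep peaks such as Black Friday at their original time points: new values are drawn from the interpolated empirical distribution and then rearranged to follow the rank ordering of the original series.

```python
import numpy as np

def meboot_like_replicate(x, rng):
    """Simplified sketch of a maximum-entropy-style replicate: draw new
    values from the interpolated empirical quantile function, then
    rearrange them to the rank order of the original series so the
    replicate keeps its shape (peaks stay at the same time points)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sorted_x = np.sort(x)
    # Interpolated empirical quantile function evaluated at uniform draws
    draws = np.interp(rng.uniform(size=n), np.linspace(0, 1, n), sorted_x)
    draws.sort()
    ranks = np.argsort(np.argsort(x))  # rank of each original observation
    return draws[ranks]                # replicate follows the original ordering

rng = np.random.default_rng(3)
sales = np.array([1.0, 1.2, 0.9, 1.1, 5.0, 1.0, 1.3])  # week 5 is a "Black Friday" spike
replicate = meboot_like_replicate(sales, rng)
print(int(np.argmax(replicate)))  # → 4: the spike stays in the same week
```

Because only the values change while the rank structure is retained, the replicate's ACF and the timing of its jumps mirror the original series, which is exactly the property the paper exploits.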

Data
To conduct the present research, we used data from Méndez-Suárez and Monfort [2], which contain a 120-week time series from a European consumer electronics multichannel retailer, including information on investment in offline, Internet, and paid search advertising, Google queries containing the name of the retailer, and the online and offline sales. Table 1 depicts the descriptive statistics of the standardized values of the original data; some variables, such as online sales, queries, and retargeting, show high levels of skewness and excess kurtosis.

Methods
To compare the results of sampling versus meboot bootstrapping, we used the PLS-SEM model from [2], depicted in Figure 2. The online and offline media in which the multichannel retailer advertised during the period are represented as two reflective latent constructs; the rest of the exogenous variables included in the structural model are single-item constructs. The latent variable online advertising included display, Facebook, retargeting, Twitter, and YouTube, and the latent variable offline advertising contained store flyers and TV advertising (Equation (4)).
The structural model contained four endogenous variables (Equation (5)): queries, explained by online and offline advertising; web and store sales, both explained by online and offline advertising, paid search, and Christmas; and paid search, explained by queries.
The PLS-SEM model from Figure 2 was used to bootstrap the latent variable outer loadings and the path coefficients using sampling and meboot; the results are presented in the following section.

Empirical Results
To compare the results of sampling and meboot, we bootstrapped 5000 subsamples of the PLS-SEM model and calculated the bias-corrected and accelerated (BCa) confidence intervals [7]. Bootstrapping of the structural model employed the R [31] packages plspm [32] and meboot [13]. The BCa confidence interval calculation in R followed Streukens and Leroi-Werelds [23]. The discriminant validity of the model, the heterotrait-monotrait (HTMT) ratio of correlations, was computed with the R package semTools [33].

Correlations
The correlations of the original series and two random draws of the meboot and sampling bootstrap are shown in Table 2a-c. The results showed similar correlations between the original and the bootstrapped variables; there were no differences large enough to suggest that one method is better than the other or that either method has major flaws that preclude its use for assessing the significance of the results. Next, we analyze the results of the bootstrapped confidence intervals.

Reliability, Validity, Structural Model, and Fit Assessment
Following [7], to assess the reflective measurement model, we evaluated convergent validity using the average variance extracted (AVE), internal consistency reliability with Cronbach's α and Jöreskog's ρ, and discriminant validity using the HTMT ratio. The mathematical formulations are represented in Equation (6). The AVE for construct ξj is defined as the average of the explained variances λ² of its reflective indicators. In Cronbach's α, N is the number of lower-order components (i = 1, …, N), and c̄ is the average correlation between the lower-order components. In Jöreskog's ρ, li is the loading of the lower-order component i on a particular higher-order construct, and var(ei) is the variance of the measurement error of the lower-order component i. As explained by [34], the HTMT of constructs ξi and ξj with Ki and Kj indicators, respectively, is the average of the correlations of indicators across constructs measuring different phenomena relative to the average of the correlations of indicators within the same construct. Table 3 shows the BCa confidence intervals of the reflective measurement model assessment using both bootstrapping methodologies. For the outer loadings of the latent variables (Table 3a), the two methods agreed on the significance of the loadings, but the width of the intervals is consistently larger under sampling bootstrapping, which means that the results are much more dispersed when this methodology is used.
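The reliability and validity formulas described above can be illustrated numerically. The loadings and the average inter-component correlation below are hypothetical values, not the paper's estimates.

```python
import numpy as np

# Hypothetical standardized outer loadings of one reflective construct.
loadings = np.array([0.82, 0.75, 0.68, 0.71])

# AVE: average of the squared loadings of the construct's indicators.
ave = float(np.mean(loadings ** 2))

# Cronbach's alpha: N * c_bar / (1 + (N - 1) * c_bar), with N components
# and c_bar the average inter-component correlation.
N, c_bar = 4, 0.55
alpha = N * c_bar / (1 + (N - 1) * c_bar)

# Joreskog's rho (composite reliability):
# (sum l_i)^2 / ((sum l_i)^2 + sum var(e_i)), with var(e_i) = 1 - l_i^2.
errors = 1 - loadings ** 2
rho = float(loadings.sum() ** 2 / (loadings.sum() ** 2 + errors.sum()))

print(round(ave, 3), round(alpha, 3), round(rho, 3))
```

Common rules of thumb require AVE > 0.5 and α, ρ > 0.7 for a construct to be validated, which is the standard against which Table 3b's intervals are judged.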
However, the problems become especially severe when assessing the reflective constructs (Table 3b) because of the width of the sampling bootstrap intervals, which in all cases are at least three times as wide as the meboot intervals; consequently, the latent variables are not validated in terms of AVE, Cronbach's α, and Jöreskog's ρ, and the HTMT is validated only by hundredths of a percent. The confidence intervals of the regression coefficients (Table 4a) had similar widths and showed similar results with respect to significance for all the paths except that from offline advertising to web sales, for which the sampling bootstrap method indicated a non-significant coefficient; in other words, under sampling bootstrapping, offline advertising does not impact the sales of the web store.

Indicators
As [35] stated, the different meanings of the term fit do not depend on whether covariance-based SEM or variance-based SEM is used but on whether confirmatory or explanatory research is performed (see [36]). Since explanatory research, as in this case, aims to explain as much variation as possible in a dependent variable, R² is the natural measure of fit; however, as occurred in the assessment of the reflective construct outer loadings, the confidence intervals of R² (Table 4b) from the sampling bootstrapped values were widespread and invalidated the model, contrary to the meboot values, which showed high levels of fit in line with the results of the model application shown in Figure 3.

To understand what really explains the differences between the bootstrapping methodologies, we need to visually inspect the entire time series. Figure 3 shows the original series and two random paths of the sampling and meboot series for both online and offline sales. The sampling bootstrapped series added jumps to sales corresponding to events such as Christmas and Black Friday but at very different times from those in the original series; for example, in the case of offline sales (Figure 3a), it included up to 10 jumps, only one of which corresponded to the date on which the event occurred, while at the times these events actually occur, the sampling bootstrapped series did not reflect them. In the meboot series, by contrast, the jumps occurred at the same times as in the original series, although, as expected from maximum entropy modeling, some replicates of the original series were more pronounced than others.

Figure 3. (a) Offline and (b) online weekly sales series and their respective sampling and meboot counterparts; the horizontal axis represents time in weeks, and the vertical axis represents the standardized sales in standard deviation units.

Discussion
PLS-SEM methodology using i.i.d. data has been very successful in areas such as marketing, strategic management, management information systems, production and operations, and accounting [37], and it is a promising methodology for time series, especially marketing mix modeling [2,3]. However, to succeed in these areas, the traditional method of measuring the significance of the structural model and the outer loadings using sampling bootstrapping should be reconsidered because this method shuffles the data without considering their internal structure or respecting the order of the sequence, the autocorrelation structure, and the moments of occurrence of special events.
The present research presents a detailed analysis of the consequences of using sampling bootstrapping for time series, especially marketing mix series, showing the risks of the decision to trust in sampling bootstrap because the method destroys the internal structure of the series and shows wider confidence intervals for the outer loadings of the models. As a solution for these types of time-series analyses in PLS-SEM contexts, when the exact colocation of the bootstrapped data is essential, as in marketing mix analyses, this study recommends using meboot bootstrapping as an alternative and proves its suitability for time series or marketing mix modeling with PLS-SEM.
Additionally, this research contributes to the development of PLS-SEM methodology by providing a technique free of the risks associated with sampling bootstrapping in time series analysis, broadening the scope and accuracy of the methodology in other areas of research. Taken as a whole, the contributions of the present research provide valuable insights into how the evaluation of time series dependencies can be effectively performed using PLS-SEM analysis and why it is so relevant to apply a bootstrapping technique specifically adapted to time series and a technique that is compatible with their time structure to measure the significance of external loadings and path coefficients.
The managerial implications of this work are twofold: (1) practitioners must be very careful when analyzing time series using PLS-SEM if the data are not i.i.d. and smooth in shape, because sampling bootstrapping shuffles the time series and destroys its integrity; in this respect, the present research shows that using sampling bootstrapping for time series involves very high risks, especially in assessing the significance of the reflective constructs and path coefficients, which constitutes one of the main contributions of this article. (2) The meboot bootstrapping procedure respects the internal structure of the data and maintains the colocation of special marketing events, making it a trustworthy technique for time series analysis.
The methodology proposed in the present research can be an excellent source of innovation for PLS-SEM methodology, extending its possible application to all areas that use time series analysis for explanatory purposes and are susceptible to the potential application of PLS-SEM predictive analysis, such as those related, for example, to quality control in industrial processes or the evolution of natural ecosystems.
Three limitations of the present research may become avenues for future research. The model in this study is limited to marketing mix series, and the proposed methodology has not been tested in other time series contexts in which PLS-SEM models may be used. Additionally, the proposed model and meboot bootstrap methodology were tested on a time series of only 120 observations and not on a series with a larger number of observations. In addition, since the model only uses reflective constructs, evaluation of time series models with formative constructs would complement the results of this research.