Estimating the Market Share for New Products with a Split Questionnaire Survey

Abstract: When designing a new product, conjoint analysis is a powerful tool to estimate the value perceived by prospects. However, it has a drawback: when the product has too many attributes and levels, it may be difficult to administer the survey because respondents will be overwhelmed by the number of questions. In this paper, we propose an alternative approach that bypasses this problem. Contrary to conjoint analysis, which estimates respondents' utility functions, our approach directly estimates market shares. This enables us to split the questionnaire among respondents and, therefore, to reduce the burden on each respondent as much as desired. However, this new method has two weaknesses that conjoint analysis does not have: first, inferences on a single respondent cannot be made; second, the competition's product profiles have to be known before administering the survey. Therefore, our method should be used when traditional methods are less easily implementable, i.e., when the number of attributes and levels is large.


Introduction
Starting from the seminal work of Green and Rao [1], conjoint analysis has become a central tool for the study of marketing problems involving multiattribute alternatives (see [2][3][4][5][6][7][8][9] for some review articles). Conjoint analysis is a method used in market research to estimate how people value the different attributes of a product. The term attribute refers to a characteristic of a product (e.g., colour) which can take different values (e.g., blue, red, green), also called the attribute's levels. Using a survey in which respondents have to state their preference among different potential products, it is possible to estimate with regression techniques the implicit value of each single attribute level. These implicit values, known as part-worths or utilities, can be used to estimate, for example, the market share or revenue of a new product. Ordinary least squares (OLS) regression has been among the most widely used techniques in conjoint analysis to reconstruct the individual utility from the survey data [5,10,11]. In this case, data can be collected in the form of ratings or pseudo-interval-scaled rankings (the robustness of the latter option being shown by [11] and discussed, for instance, by [12]).
The problem of dealing with a large number of attributes and levels has always been among the main issues in conjoint analysis. Among the principal difficulties, we may distinguish two (albeit partially related) aspects. First, if too many profiles have to be ranked or rated, then the respondent may be unable to provide accurate estimations. The number of experimental profiles can be reduced by employing fractional factorial designs and orthogonal plans [13], or by using an algorithmic construction to pursue some optimality criterion (such as D-optimality), which allows for more flexibility in the choice of the number of profiles. However, the use of OLS regression imposes a lower limit on the design size, since it requires the number of profiles to be larger than the total number of [...] the individual nor at the segment level. Instead, it directly estimates the market share. This enables us to split the questionnaire among respondents and therefore to reduce the burden on each respondent as much as desired. In essence, the proposed method is composed of three steps. The first step consists of designing the split questionnaire. For this purpose, a D-optimal basis is computed from the set of all possible product profiles. Then, for each respondent, a subset of the basis is chosen using a random block algorithm. The survey is conducted in the second step. Roughly speaking, each respondent has to state her preference between each product in the subset and the competition's products. In the third step, using the stated preferences of each respondent, the share of preference is approximated with a logistic regression. The software provided as Supplemental Material allows practitioners to implement the method in a user-friendly way. We note that other software packages are available for the design of experiments, such as Ngene (which, however, is not free), and could be used to construct the split questionnaire in a very efficient way.
Our software does not have the flexibility of Ngene, but serves a single purpose, i.e., the estimation of the share of preference, and integrates in a single, simple framework both computer-based steps (split questionnaire construction and share of preference estimation) that make up our method. Notice that, by contrast, Ngene does not perform the preference estimation on the basis of the survey data.
Our method fully overcomes the first of the aforementioned problems regarding large numbers of attributes and levels, i.e., the risk of presenting too many profiles to each respondent. In contrast, because we propose to conduct full-profile questionnaire administration, the respondents may still be confused by the complexity of products in the case of very large numbers of attributes and levels. However, notice that the complexity of the products also affects real-world decisions. Because we simply ask respondents to state pairwise preferences, our approach does not deviate too far from the situation in a real market.
To assess the statistical performance of this new approach, we compare it with the popular full-profile conjoint analysis with rank orders and with part-worths estimated by OLS regression. For this purpose, we conduct Monte Carlo simulations and we show that the accuracy of our method is comparable to that of conjoint analysis. We are aware that more recent conjoint analysis methods would be, in principle, more appropriate for a comparison. However, the choice to compare our method with standard, full-profile conjoint analysis is motivated by the aim of keeping the comparison as clean and simple as possible, so as to better highlight the features of our new method. By contrast, making a direct comparison against hybrid methods would be difficult because of the self-explicated part, which would need ad hoc assumptions to model respondent error. We note that standard full-profile conjoint analysis has often been recognised as a robust method even when compared with hybrid models (see, e.g., [16]). Moreover, as discussed below, in the simulations performed in this paper we assume no respondent error, and we assign to respondents of the conjoint analysis simulations tasks that would be too demanding for a real survey (in particular, we ask each respondent to rank up to 64 different profiles). As a consequence, we expect the performance of full-profile conjoint analysis to be significantly better in our simulations than in a real-case study. We therefore expect that this assumption reduces the gap between the somewhat dated full-profile conjoint analysis and the more recent conjoint models, such as hybrid or choice-based models. We leave for future research further comparisons against more sophisticated approaches (for instance, the methods provided by the Sawtooth software), which address more explicitly the issue of coping with a large number of attributes and levels.
The present paper proposes a new approach to estimate the market share for new products. Thanks to a split questionnaire design, our approach permits us to reduce the burden on each respondent as much as desired and can be used even when the number of attributes and levels is large. The paper is organized as follows. Section 2 summarizes the method we propose for estimating the market share of a new product. Section 3 discusses some variants we considered and motivates the choices we made. Section 4 presents the Monte Carlo simulations we use for testing the efficiency of different methods in estimating market shares. In Section 5, we investigate the efficiency of the proposed method. First, we show how it outperforms candidate alternatives. Then, we investigate how the experimental design can be optimized. Subsequently, we compare the efficiency of the proposed method against standard conjoint analysis. Finally, in Section 6, we conclude and indicate further research directions.

The Proposed Method
This section presents the method that we propose in this paper. An explanation of the rationale behind the choice of its features among some considered alternatives is left to Sections 3 and 5. The method consists of three main steps preceded by a preliminary product analysis, which is in turn composed of three tasks (see Figure 1). In the preliminary analysis, we first identify the salient attributes of the product. This can be achieved by using classical methods; for instance, a semi-structured interview followed by a self-explicated conjoint analysis will be perfectly adequate. The second task consists of choosing, for each salient attribute, the levels to be considered. The last task consists of identifying the products that are proposed by the competition.
We will now describe the three main steps building the method, namely: the construction of the experimental design, the questionnaire administration and the estimation of the demand function. Please note that for the sake of clarity, we begin with the second step.

Questionnaire Administration
The aim of the survey is to reveal whether a given product profile is preferred over the products proposed by the competition. The survey relies on a traditional full profile analysis and is composed of two stages. In the first stage, we identify from the products that are proposed by the competition the one that is preferred by the respondent. If the number of products in the competition is too large, then we can proceed with multiple pairwise comparisons rather than a single choice-based analysis. In the second stage, the respondent has to state her preference or indifference between a set of profiles that are presented to her and the preferred competition's product. This stage consists of a series of pairwise comparisons and it should not be too long because it has to keep the attention of the respondent.

Experimental Design
We now outline the procedure for selecting the profiles that will be administered to each respondent. Notice that it is beyond the scope of this paper to discuss, by contrast, the methodology for choosing a representative sample of respondents.
Given the salient attributes identified in the preliminary product analysis and the desired split questionnaire size (i.e., the desired number of profiles to be addressed to each respondent and the desired number of respondents to be interviewed), our method constructs a split questionnaire, the size of which stays as close as possible to the desired one. All of the distinct profiles that are found in the split questionnaire form a D-optimal basis. To maximize the performance (as we found with the numerical experiments discussed in Section 5), the number of profiles composing the basis is chosen to be as large as possible, insofar as this is compatible with the questionnaire size. Once the basis has been obtained, a random block algorithm completes the construction of the split questionnaire by assigning to each respondent a subset of the basis profiles. The D-optimal basis is computed with a modified Fedorov algorithm [39] whereas the random block experimental design is constructed with the same algorithm that was used to calculate the starting design in [40].

Demand Function Estimation
The demand function is estimated by using a logistic function of main effects, the coefficients of which are determined by a logistic regression procedure. The resulting demand function estimation allows the user to predict the market share for every possible product profile, provided that the latter can be described by the attributes and levels that have been identified in the preliminary analysis.

Considered Methods
This section summarizes all methods that are taken into account in the present study, and which are tested for efficiency in the Monte Carlo simulations presented in Sections 4 and 5.
For the demand function estimation, we consider the following two models:
• Linear function
• Logistic function
In both cases, we also investigate the two options of modeling main effects alone, or including first-order interactions.
For the regression design, we use D-optimal designs (the choice of D-optimality among other optimality criteria is discussed in Section 3.2).
Regarding the block experimental design for splitting the questionnaire, we study the following options:
• Random block
• MS-optimal
• A-optimal
• Homogeneous groups
The rest of the section motivates these preliminary choices and contains technical points. It can be skipped at first reading.

Demand Function Model
We consider a family of products with $m$ salient attributes, each with a finite number of possible levels. Let us denote by $a = 1, \ldots, m$ the attributes and by $L_a$ the number of levels for attribute $a$. To describe the different product profiles, we use the standard dummy coding for regression designs. We denote by $x = (x_1, x_2, \ldots, x_p)$ the column vector describing the product profile. In this coding, the first element is associated with the intercept of the regression and is equal to 1. Then, the following elements describe the product profile with a binary coding: $x_i = 1$ if the corresponding attribute level is encountered and $x_i = 0$ otherwise. As usual, to avoid collinearity, the levels of attribute $a$ are coded using only $L_a - 1$ binary variables. Consequently, the size of the vector $x$ is

$$p = 1 + \sum_{a=1}^{m} (L_a - 1). \tag{1}$$

As mentioned earlier, two classical options are considered, namely the logistic and the linear function. Given that the method we propose makes use of the logistic approach, and because the linear approach is well known, we limit ourselves to the description of the former case. To predict the share of preference for the product profile $x$, we use the logistic model

$$\hat{D}(x) = \frac{1}{1 + \exp(-\hat{\beta}^T x)},$$

where $\hat{\beta}^T$ is the transpose of the column vector of inferred parameters $\hat{\beta}$. The choice of this function is mainly motivated by the natural way in which it fits the unit interval. As already mentioned, the parameters $\hat{\beta}$ are estimated using data obtained from a split questionnaire survey. For this purpose, a regression design containing $v$ different profiles $x_1, \ldots, x_v$ is selected. The transposes of these vectors form the design matrix $X$ of size $v \times p$. During the survey, $b$ respondents are interviewed but, because we use a split questionnaire, not all of the profiles are presented to a given respondent. To describe this particularity, we use the following assignment matrix:

$$n_{ij} = \begin{cases} 1 & \text{if profile } x_i \text{ is presented to respondent } j, \\ 0 & \text{otherwise,} \end{cases}$$

where $i = 1, \ldots, v$ and $j = 1, \ldots, b$.
We represent the results of the survey through the preference variables

$$y_{ij} = \begin{cases} 1 & \text{if respondent } j \text{ prefers profile } x_i \text{ over the competition's offer,} \\ 0.5 & \text{if respondent } j \text{ is indifferent,} \\ 0 & \text{otherwise.} \end{cases}$$

Finally, the estimation of the demand function is achieved by means of a logistic regression. According to the standard logistic regression procedure, the parameter $\hat{\beta}$ is found by maximizing the log-likelihood

$$\ell(\beta) = \sum_{i=1}^{v} \sum_{j=1}^{b} n_{ij} \left[ y_{ij} \log \frac{1}{1 + \exp(-\beta^T x_i)} + (1 - y_{ij}) \log \frac{1}{1 + \exp(\beta^T x_i)} \right]$$

for the variable $\beta$. At this point, let us make some comments about terms we will use throughout the rest of the paper. We call demand function the logistic function that describes the share of preference for all possible product profiles. We call market share the share of preference computed with the logistic function when the product profile is set to a given value.
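The model of this subsection can be illustrated with a minimal Python sketch. It assumes an assignment matrix `n` (1 when a profile is shown to a respondent), a preference matrix `y` with values 1, 0.5 or 0, and a plain gradient ascent as the optimizer. All names, the toy data and the choice of optimizer are illustrative assumptions and are not taken from the software accompanying the paper.

```python
import math

def dummy_code(profile, levels):
    """Dummy-code a profile: intercept, then L_a - 1 binaries per attribute.

    profile[a] is the 0-based level of attribute a; level 0 is the
    reference level and is coded by zeros only (an assumed convention).
    """
    x = [1.0]
    for level, n_levels in zip(profile, levels):
        x.extend(1.0 if level == l else 0.0 for l in range(1, n_levels))
    return x

def share(beta, x):
    """Logistic share of preference for the coded profile x."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta, X, n, y):
    """Log-likelihood over the (profile, respondent) pairs with n[i][j] == 1."""
    total = 0.0
    for i, x in enumerate(X):
        f = share(beta, x)
        for j in range(len(n[i])):
            if n[i][j]:
                total += y[i][j] * math.log(f) + (1 - y[i][j]) * math.log(1 - f)
    return total

def fit(X, n, y, steps=5000, rate=0.1):
    """Maximize the log-likelihood by gradient ascent (illustrative, not tuned)."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for i, x in enumerate(X):
            f = share(beta, x)
            for j in range(len(n[i])):
                if n[i][j]:
                    for k, xk in enumerate(x):
                        grad[k] += (y[i][j] - f) * xk
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta

# Toy example: two attributes with 2 and 3 levels, hence p = 1 + 1 + 2 = 4.
levels = [2, 3]
profiles = [(0, 0), (1, 0), (0, 1), (1, 2)]
X = [dummy_code(pr, levels) for pr in profiles]
# Assignment matrix: v = 4 profiles (rows), b = 4 respondents (columns), k = 2.
n = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
# Stated preferences; entries where n[i][j] == 0 are ignored.
y = [[0, 1, 0, 0], [0, 1, 0.5, 0], [0, 0, 1, 0.5], [0, 0, 0, 0.5]]
beta_hat = fit(X, n, y)
```

In this toy case the number of profiles equals the number of parameters, so the fitted share of each surveyed profile simply reproduces the mean of its stated preferences. A real survey would use a dedicated logistic regression routine rather than fixed-step gradient ascent.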

Regression Design
In this subsection, we discuss possible ways to construct efficient regression designs. We start by reviewing optimality criteria for linear models, and consider the proper logistic problem only at a second step. Indeed, as discussed later on, our approach to construct efficient designs for logistic regression will borrow the optimality criterion from linear theory.
The methods of experimental design, first introduced by Fisher (see, e.g., [41]), soon focused on the problem of constructing fractional factorial experiments [42][43][44][45][46], and in particular on orthogonal designs, which ensure a minimal variance of the estimated parameters in the case of linear regression models, see, e.g., [47]. Later efforts led to catalogues of basic orthogonal plans for a wide variety of cases, see, e.g., [48]. However, in many practical situations, an orthogonal plan cannot be constructed unless the number of levels or plots is forced to accommodate the design requirements, which may of course imply a loss in the quality of the collected information. Consequently, the problem of constructing efficient (or optimal) designs for an arbitrary number of attributes, levels or profiles became central. As discussed in detail by Kiefer and Wolfowitz [49][50][51], possible criteria to define optimality are based on the eigenvalues of the covariance matrix $(X^T X)^{-1}$ of the inferred parameter vector $\hat{\beta}$. In particular, A-, D- and E-optimality require the trace, the determinant and the maximal eigenvalue of the covariance matrix to be minimal, respectively. Other possible criteria to define optimality are based on the variance of predictions. For instance, G-optimality requires the maximal variance among all design points of the linear estimator of the regression function to be minimal. An important result obtained by [52] established the equivalence between D-optimality and G-optimality. This is a strong argument in favor of D-optimality because it is easier to compute than G-optimality. Further motivations are based on the invariance of the D-optimality criterion under linear transformations [49], on the volume minimization of particular confidence regions in parameter space [49,53], or rely on Bayesian arguments [54]. These reasons led us to choose, for linear regression purposes, the D-optimality criterion rather than A- or E-optimality.
For algorithmic construction, D-optimality tends to be the preferred criterion [55]. In the 1970s, the first algorithms to achieve D-optimality were proposed [56][57][58][59]. In particular, Fedorov's algorithm is among the most important and most widely used methods to achieve D-optimality. A modified Fedorov algorithm, with similar precision but increased speed, was subsequently proposed by [39] and recommended by [55] for use in conjoint analysis.
The problem of constructing optimal designs for logistic regression (which belongs to the class of generalized linear models) is a rather complex one. In the standard approach, one approximates the variance-covariance matrix of the parameter vector $\hat{\beta}$ by the inverse of the observed Fisher matrix $I$, see, e.g., [60]. The main difficulty stems from the fact that, unlike in linear regression, the Fisher matrix $I$ depends on the maximum likelihood estimate $\hat{\beta}$, the determination of which is actually the final goal of the regression procedure. More precisely, denoting by $x_1, \ldots, x_v$ the $v$ row vectors of the regression matrix $X$, the Fisher matrix elements are defined by the expectation value, see, e.g., [61],

$$I_{kl} = -\frac{1}{v} \sum_{i=1}^{v} E\left[ \frac{\partial^2 \log L(\beta; Y, x_i)}{\partial \beta_k \, \partial \beta_l} \right],$$

where $E$ denotes the expectation value, $L$ is the likelihood of a single observation, and $Y$ is the random variable describing the consumers' choice:

$$Y = \begin{cases} 1 & \text{if the consumer prefers the product } x \text{ over the competition's products,} \\ 0.5 & \text{if the consumer is indifferent,} \\ 0 & \text{otherwise.} \end{cases}$$

Math. Comput. Appl. 2021, 26, 7

One can show that for generalized linear models, the Fisher matrix takes the form

$$I = X^T W X,$$

where $W$ is a diagonal "weight" matrix $W = \frac{1}{v} \, \mathrm{diag}\left( w(\beta^T x_1), \ldots, w(\beta^T x_v) \right)$, $w$ being a single-valued function that depends on the model (in the case of a logistic model, $w(x) = (2 + 2\cosh(x))^{-1}$). Various proposals have been made to estimate the Fisher matrix, despite the intrinsic difficulties. Chernoff [62] introduced the notion of locally optimal designs, the construction of which requires the parameter vector $\hat{\beta}$ to be initialized with a prespecified value. Later proposals involved a Bayesian estimation [61,63] that instead makes use of a probability distribution describing prior knowledge of the parameters to be estimated (see [64] for a review of optimal Bayesian designs). However, in the framework of our method, no prior information is assumed to be available. Criteria for the evaluation of Bayesian methods under uninformative priors have been recently discussed, for instance, by [65] (see also the discussion about uniform prior distributions by [66]). Nevertheless, modeling an uninformative prior still requires some assumptions, such as defining interval boundaries for the parameters. Because our method is meant to be used in a wide range of cases, it would be difficult to infer a universally valid, though uninformative, prior distribution for the maximum likelihood parameter vector, and we therefore prefer to avoid embarking on the construction of Bayesian designs. Opting again for simplicity, we decided to evaluate the Fisher matrix $I$ by prespecifying the "most neutral" parameter vector $\hat{\beta} = 0$. In this way, the Fisher information matrix equals, up to a constant factor, the Fisher information matrix for linear regression (i.e., $X^T X$). Therefore, with this assumption, the problem is reduced to a linear model optimization.
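The reduction to the linear case can be checked numerically. The short Python sketch below evaluates the logistic weight function and the Fisher matrix for a small, hypothetical design; the function names and the example design are assumptions made for illustration only.

```python
import math

def logistic(eta):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def weight(eta):
    """Logistic weight function w(eta) = 1 / (2 + 2 cosh(eta))."""
    return 1.0 / (2.0 + 2.0 * math.cosh(eta))

def fisher_matrix(X, beta):
    """Fisher matrix X^T W X with W = (1/v) diag(w(beta^T x_i))."""
    v, p = len(X), len(X[0])
    info = [[0.0] * p for _ in range(p)]
    for x in X:
        w = weight(sum(b * xi for b, xi in zip(beta, x))) / v
        for k in range(p):
            for l in range(p):
                info[k][l] += w * x[k] * x[l]
    return info

# Hypothetical design: intercept plus two dummy variables, v = 3 profiles.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
info_at_zero = fisher_matrix(X, [0.0, 0.0, 0.0])
```

Since w(0) = 1/4, the matrix `info_at_zero` equals X^T X divided by 4v, so maximizing its determinant over candidate designs of a fixed size is equivalent to maximizing det(X^T X).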
Relying on the previous discussion about optimality criteria for linear models, we decided to also adopt D-optimality as the criterion for building the experimental designs for logistic regression purposes. We implemented a modified Fedorov algorithm, according to the prescriptions of [39], and constructed D-optimal designs by taking the best result among 100 algorithm runs, each with a different random design initialization.
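To give a feeling for how such a search proceeds, the following Python sketch implements a plain single-point exchange heuristic with random restarts that maximizes det(X^T X). It is deliberately simpler than the modified Fedorov algorithm of [39] and should not be read as the implementation used in this paper; all function names, the candidate set and the number of restarts are illustrative assumptions.

```python
import itertools
import random

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result

def d_criterion(rows):
    """det(X^T X) for the design whose rows are the coded profiles."""
    p = len(rows[0])
    xtx = [[sum(x[k] * x[l] for x in rows) for l in range(p)] for k in range(p)]
    return det(xtx)

def exchange(candidates, v, rng):
    """One exchange search started from a random design of v distinct profiles."""
    design = rng.sample(range(len(candidates)), v)
    best = d_criterion([candidates[i] for i in design])
    improved = True
    while improved:
        improved = False
        for pos in range(v):
            for c in range(len(candidates)):
                if c in design:
                    continue
                trial = design[:pos] + [c] + design[pos + 1:]
                value = d_criterion([candidates[i] for i in trial])
                if value > best + 1e-9:  # accept only strict improvements
                    design, best, improved = trial, value, True
    return sorted(design), best

def d_optimal_basis(candidates, v, runs=100, seed=0):
    """Keep the best design found over several random restarts."""
    rng = random.Random(seed)
    return max((exchange(candidates, v, rng) for _ in range(runs)), key=lambda t: t[1])

def dummy_code(profile, levels):
    """Intercept followed by L_a - 1 binaries per attribute."""
    x = [1.0]
    for level, n_levels in zip(profile, levels):
        x.extend(1.0 if level == l else 0.0 for l in range(1, n_levels))
    return x

# Hypothetical product: three attributes with 2, 2 and 3 levels (p = 5).
levels = [2, 2, 3]
candidates = [dummy_code(pr, levels)
              for pr in itertools.product(*(range(n) for n in levels))]
basis, value = d_optimal_basis(candidates, v=6, runs=20)
```

The result is only a local optimum of the exchange procedure; taking the best of several restarts mirrors the multi-start strategy described above.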

Split-Questionnaire Design
Once the profiles composing the D-optimal basis have been selected, the next task is to split them among the different respondents. Let us denote by $v$ the size of the D-optimal basis. The problem can be formulated as follows: $v$ different profiles, each replicated $r$ times, have to be administered to $b$ respondents; then, every respondent is asked to state her preference about $k \le v$ different product profiles. In this subsection, we investigate the possible ways of distributing the totality of $vr$ questions among the $b$ respondents. Identifying each respondent with a block allows us to use the same formalism that is commonly employed in block design theory. Notice that the mathematical problem that we face substantially differs from the one that is typically found in the response of blocked experiments, and we do not expect to have any particular correspondence regarding, for instance, the efficiency of the block design.
Consider the number $\lambda_{ij}$ of times that both profiles $x_i$ and $x_j$ are assigned to the same respondent (i.e., fall within the same block). The larger $\lambda_{ij}$, the stronger will be the expected correlation among the preferences estimated by the questionnaire. It is difficult to estimate the effect of these correlations on the quality of the regression. In general, we may expect that if strongly correlated product profiles are administered to the same respondent, then this could increase the instability of the prediction error, and consequently lead to better or worse results, depending on whether the chosen respondent is more or less representative of the whole market. Without any a priori information about the underlying correlations, a reasonable approach may be that of constructing a questionnaire that stays as neutral as possible; that is, one in which the variance of the $\lambda_{ij}$'s is minimized. In particular, a design minimizing $\sum_{i,j} \lambda_{ij}^2$ is called MS-optimal among the designs of the same size (see [67]). An algorithm for MS-optimality can be simply obtained from the algorithm for A-optimality given by [40], by restricting ourselves to the first step (i.e., the minimization of the quantity $f_2 = \sum_{i,j} \lambda_{ij}^2$), and thus neglecting the second step, which minimizes a further quantity that is cubic in the $\lambda_{ij}$'s.
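As a minimal illustration, the Python sketch below counts the concurrences of a split questionnaire and evaluates the sum of their squares for two hypothetical designs of the same size. The function names and the toy designs are assumptions. The diagonal terms are left out of the sum, which is harmless when every profile is replicated the same number of times because they then contribute a constant.

```python
def concurrences(blocks, v):
    """lam[i][j] = number of respondents who receive both profile i and profile j."""
    lam = [[0] * v for _ in range(v)]
    for block in blocks:
        for i in block:
            for j in block:
                if i != j:
                    lam[i][j] += 1
    return lam

def f2(lam):
    """Sum of squared concurrences over all ordered pairs i != j."""
    return sum(value ** 2 for row in lam for value in row)

# Two hypothetical designs with v = 4 profiles, b = 4 respondents, k = 2, r = 2.
spread = [[0, 1], [1, 2], [2, 3], [3, 0]]      # pairs are spread over respondents
clustered = [[0, 1], [0, 1], [2, 3], [2, 3]]   # the same pairs are repeated

f2_spread = f2(concurrences(spread, 4))        # 8 ordered pairs with concurrence 1
f2_clustered = f2(concurrences(clustered, 4))  # 4 ordered pairs with concurrence 2
```

The spread design has the smaller value (8 against 16), which is the direction favoured by the MS criterion.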
For the sake of completeness, we also include in our study A-optimal designs that are calculated with the whole algorithm of Nguyen. Notice that every A-optimal design obtained with this algorithm is also MS-optimal. However, we believe that the notion of A-optimality, which is very useful in the analysis of blocked experiments, does not have a particular meaning in our context.
To further investigate the role of the $\lambda_{ij}$, we add designs that we call "homogeneous groups" which, in contrast to MS-optimal designs, maximize $\sum_{i,j} \lambda_{ij}^2$. We construct these designs with the further condition that $v$ must be divisible by $k$. In this case, we divide the $v$ profiles into $v/k$ groups of size $k$, and assign $r$ respondents to each group. It is easily verified that for a given pair of profiles $x_i$, $x_j$ in the resulting design, either $\lambda_{ij} = r$ or $\lambda_{ij} = 0$. Homogeneous groups are constructed very easily but are expected to be the most sensitive to correlations. Thus, when they are employed, the prediction error is expected to be more volatile than for the other methods.
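The construction of homogeneous groups is simple enough to be written in a few lines. The following Python sketch, in which the function names and the parameter values are illustrative assumptions, builds such a design and collects the concurrence values that occur.

```python
def homogeneous_groups(v, k, r):
    """Split v profiles into v/k groups of size k; r respondents per group."""
    if v % k != 0:
        raise ValueError("k must divide v")
    blocks = []
    for g in range(v // k):
        group = list(range(g * k, (g + 1) * k))
        blocks.extend(list(group) for _ in range(r))
    return blocks

def concurrence_values(blocks, v):
    """Set of concurrence counts over all pairs of distinct profiles."""
    values = set()
    for i in range(v):
        for j in range(i + 1, v):
            values.add(sum(1 for block in blocks if i in block and j in block))
    return values

# Hypothetical sizes: v = 6 profiles, k = 3 profiles per respondent, r = 2.
blocks = homogeneous_groups(v=6, k=3, r=2)
values = concurrence_values(blocks, 6)
```

With these sizes the design has b = vr/k = 4 respondents, and every pair of profiles is seen together either r = 2 times or never.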
Finally, we consider random block designs that are constructed with the same algorithm that is used to initialize the starting design in [40].

Numerical Experiments: Methodology
In Section 3, we selected various alternatives to set up our method. Further degrees of freedom that still need to be fixed include the experimental design parameters, and in particular, the choice of the D-optimal basis size v. To choose the best option, we resort to Monte Carlo simulations and test the various methods' efficiencies in estimating market shares. Once the final method is selected, simulations are also employed to compare the performance of the proposed method against conjoint analysis.
All of the Monte Carlo simulations were conducted for these four factorial designs.

Model Hypothesis
We suppose that the preference of each respondent is determined by a utility function that contains only main effects and no interactions. This is the classical assumption encountered in the majority of models in conjoint analysis. For a given respondent, the utility for a specified product profile is given by

$$U(x) = \sum_{i=1}^{p} u_i x_i,$$

where $x_i$ is the binary dummy coding describing the product configuration that was explained in Section 3.1, and $u_i$ is the respondent's part-worth. Please note that loyalty effects can be modelled using an attribute for the brand. We assume that each respondent will choose the product that gives her the highest utility. This excludes, for example, wrong answers that could be given during the survey because of a lack of concentration or the complexity of the product profile. The respondent's choice is modelled in such a deterministic way so as to better isolate the intrinsic estimation errors of the methods we compare. Although it is a popular model, and probably more realistic to describe the actual consumer's choice, the multinomial logit model is therefore discarded for the purposes of the present study.
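The deterministic choice rule can be sketched in Python as follows. The function names and the numerical part-worths are hypothetical, and coding indifference as 0.5 is an assumption that follows the convention used for the consumers' choice variable in Section 3.2.

```python
def utility(part_worths, x):
    """Main-effects utility: sum of part-worths weighted by the dummy coding."""
    return sum(u * xi for u, xi in zip(part_worths, x))

def preference(part_worths, x, competitor):
    """1 if x is preferred to the competitor, 0.5 if indifferent, 0 otherwise."""
    diff = utility(part_worths, x) - utility(part_worths, competitor)
    if diff > 0:
        return 1.0
    if diff < 0:
        return 0.0
    return 0.5

# Hypothetical respondent: intercept plus one three-level attribute.
part_worths = [0.0, 0.4, 0.7]
first = preference(part_worths, [1, 1, 0], [1, 0, 1])    # utility 0.4 against 0.7
second = preference(part_worths, [1, 0, 1], [1, 1, 0])   # utility 0.7 against 0.4
tie = preference(part_worths, [1, 1, 0], [1, 1, 0])      # identical profiles
```

Because no random term enters the comparison, repeating the same question always yields the same answer, which is what isolates the estimation error of the methods being compared.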
To describe the variety of consumers, we suppose that the market is composed of different segments within which all consumers are identical. Of course, in reality, market segments contain consumers that have more or less similar behaviors rather than strictly identical behaviors. However, this simplification is justified here because we do not attempt to describe a real market but rather want to compare some numerical methods to predict market shares.

Prediction Error Criterion
To measure the prediction error, we chose the RMSE of the market shares. This is the standard deviation of the error term (i.e., the difference between the estimated market share and the real market share) over all possible product profiles. Obviously, the value of this criterion decreases when the quality of the prediction increases and is zero for a perfect prediction.
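For concreteness, a minimal Python sketch of this criterion is given below, using the usual definition of the RMSE as the square root of the mean squared error. The function name and the example shares are illustrative assumptions.

```python
import math

def rmse(estimated, true):
    """Root mean square error between estimated and true market shares."""
    squared = [(e - t) ** 2 for e, t in zip(estimated, true)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical shares for three product profiles.
estimated_shares = [0.50, 0.25, 0.75]
true_shares = [0.50, 0.50, 0.50]
error = rmse(estimated_shares, true_shares)
```

In an actual experiment the two lists would run over all possible product profiles, not over the surveyed ones only.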

Monte Carlo Simulations
This procedure simulates the whole process of estimating the market shares and computing the prediction errors. The Monte Carlo simulations that we performed are organized in four stages that are summarized as follows.

1. Market simulation
• Number of market segments: The number of market segments is chosen randomly between 6 and 12, using a uniform distribution.
• Market segment size: The size of each market segment is chosen using a uniform distribution.
• Number of ordinal attributes: The number of ordinal attributes is chosen randomly using a uniform distribution, but is at least half of the total number of attributes.
• Part-worths for each segment: For each segment, the part-worths are chosen randomly, using a uniform distribution on the unit interval. In the case of ordinal attributes, the corresponding part-worths are rearranged to have the desired monotonic property.
• Competition's product profile: The competition's product profile is chosen randomly from all possible profiles.

2. Experimental design computation
• Computation of the D-optimal basis: The profiles selected for the survey form a D-optimal basis and are computed with a modified Fedorov algorithm.
• Splitting the questionnaire: The questionnaire is split according to the four considered methods: random block, MS-optimal, A-optimal and homogeneous groups.

3. Survey simulation
• Selection of respondents: The respondents are randomly selected according to a probability distribution that reflects the different market segment sizes.
• Questionnaire administration: For each respondent, the preference between each presented profile and the competition's profile is determined by comparing the respective utilities.

4. Computation of the prediction error
• Estimation of the demand function: From the respondents' preferences, the demand function is estimated using a linear regression or a logistic regression.
• Market share prediction error: The estimated market share and the true market share are evaluated for all possible product profiles to compute the prediction error.
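The market simulation stage and the computation of the true market share can be sketched in Python as follows. This is a simplified illustration under stated assumptions: segment sizes are obtained by normalizing uniform draws, one part-worth is drawn per dummy variable, and the monotonic rearrangement for ordinal attributes is omitted. None of these details, nor the function names, are taken from the programs accompanying the paper.

```python
import itertools
import random

def dummy_code(profile, levels):
    """Intercept followed by L_a - 1 binaries per attribute."""
    x = [1.0]
    for level, n_levels in zip(profile, levels):
        x.extend(1.0 if level == l else 0.0 for l in range(1, n_levels))
    return x

def simulate_market(levels, rng):
    """Draw segments, their sizes and part-worths, and a competitor profile."""
    n_segments = rng.randint(6, 12)            # uniform on 6, ..., 12
    raw = [rng.random() for _ in range(n_segments)]
    sizes = [s / sum(raw) for s in raw]        # assumed normalization to shares
    p = 1 + sum(n - 1 for n in levels)
    part_worths = [[rng.random() for _ in range(p)] for _ in range(n_segments)]
    all_profiles = list(itertools.product(*(range(n) for n in levels)))
    competitor = rng.choice(all_profiles)
    return sizes, part_worths, all_profiles, competitor

def true_share(profile, competitor, sizes, part_worths, levels):
    """Share of consumers preferring the profile (indifference counts as 0.5)."""
    x, c = dummy_code(profile, levels), dummy_code(competitor, levels)
    total = 0.0
    for size, u in zip(sizes, part_worths):
        diff = sum(ui * (xi - ci) for ui, xi, ci in zip(u, x, c))
        total += size * (1.0 if diff > 0 else 0.5 if diff == 0 else 0.0)
    return total

# Hypothetical product with three attributes of 2, 3 and 4 levels.
levels = [2, 3, 4]
rng = random.Random(1)
sizes, part_worths, all_profiles, competitor = simulate_market(levels, rng)
shares = [true_share(pr, competitor, sizes, part_worths, levels)
          for pr in all_profiles]
```

The list `shares` plays the role of the true market shares against which the estimated demand function is compared in the last stage.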
At this point, let us say a few words about the choice of the uniform distribution for market simulations. The Monte Carlo simulation is designed to take into account a variety of different possible markets. For market segments, a uniform distribution is clearly not limiting, as the uniform distribution is used to choose the number of segments. For the same reason, a uniform distribution is clearly not limiting for choosing the number of ordinal attributes. For the part-worths, a uniform distribution is not limiting because the part-worths describe an ordinal utility function and, consequently, can be calibrated as desired as long as the order is respected.

Numerical Experiments: Results
For practitioners who desire to implement the method proposed in the present paper, the programs are given as supplementary materials. The programs and data used for the numerical experiments can be downloaded freely from the website https://drive.switch.ch/index.php/s/ASoI3YFmT46VB5j. Instructions to reproduce the numerical experiments can be found in the file readme.txt.

Best Method to Estimate the Demand Function
As explained previously, we study both the linear and the logistic function to estimate the demand function. We ran exactly the same series of tests for both cases, and we found that the linear model leads to an increase of the prediction error of between 10% and 15%. The result is clearly in favour of the logistic approach. To avoid too lengthy an exposition, we only discuss the numerical results for the logistic case; the results for the linear case are given as Supplemental Materials. For both cases, we also investigated the possibility of including first-order interactions. However, a first run of numerical experiments showed that this approach was less efficient for usual sample sizes. We leave the case of huge sample sizes open for further research.

Best Method to Split the Questionnaire
In this subsection, we establish the best method for splitting the questionnaires among the different respondents. Let us recall that we consider four experimental block designs, namely: random block, MS-optimal, A-optimal and homogeneous groups. For this purpose, we construct a numerical experiment plan that reflects the variety of the possible case studies.
The problem of splitting the questionnaire can be formulated as follows: v different profiles, each replicated r times, have to be administered to b respondents and, subsequently, every respondent is asked to state her preference about k different product profiles. Obviously, the relation b = vr/k must hold. Furthermore, for homogeneous groups, k must be a divisor of v. Notice that these two constraints restrict the range of allowed parameters. Table 1 shows the experimental plan that we used. We consider each combination of parameters but, to avoid questionnaires that are too small (and perhaps unrealistic), we imposed the condition that the total number of administered profiles must be equal or greater than 300. For each set of parameters, we ran 1000 Monte Carlo scenarios and computed the prediction error for the four methods. Please note that 1000 simulations of the whole process means that number of cards, i.e., the sample size, is 300,000 for the smallest model and 12,000,000 for the biggest one. This is rather a big sample size for Monte Carlo simulation. Then, using statistical tests, we made pairwise comparisons of the methods' performance. We used a paired t-test for the average and a Pitman-Morgan test for the variance. For the first three methods-namely random block, MS-optimal and A-optimal-the tests did not show any statistically significant difference at a p = 5% level. For the fourth method-namely the homogeneous groups, except for a few parameter combinations-the tests did not show any statistically significant difference with respect to the other ones. More specifically, the Pitman-Morgan test showed that the variance of the prediction error is slightly higher for homogeneous groups in only two parameter combinations out of 28. 
Although two cases are largely insufficient to draw general rules, this supports the expectation as mentioned earlier, that the correlations present in the homogeneous groups design can imply larger variations in the prediction error. Meanwhile, the fact that the t-test for the average does not detect any statistical difference is not surprising because we expect correlations to primarily affect the variance and not the average prediction error.
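The two paired tests used in this comparison can be sketched as follows. This is a minimal illustration and not the program distributed with the paper; the function names are hypothetical, and only the test statistics are computed. The paired t statistic is referred to a Student t distribution with n - 1 degrees of freedom. The Pitman-Morgan statistic exploits the identity Cov(X + Y, X - Y) = Var(X) - Var(Y): the variances of two paired samples are equal exactly when the sums and the differences are uncorrelated, and the resulting statistic is referred to a Student t distribution with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t_statistic(x, y):
    """Paired t statistic for the mean difference (n - 1 degrees of freedom)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def pitman_morgan_statistic(x, y):
    """Pitman-Morgan statistic for equal variances of paired samples.

    Tests whether the correlation between sums and differences is zero
    (n - 2 degrees of freedom). Undefined if that correlation is exactly +/-1.
    """
    sums = [a + b for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    r = pearson_r(sums, diffs)
    n = len(x)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Here x and y would hold the prediction errors of two block designs over the same Monte Carlo scenarios. A negative Pitman-Morgan statistic indicates that the second sample has the larger variance.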
Given that MS-optimal and A-optimal designs did not show any particular advantage with respect to random block designs, and because of the high computational cost needed to run the respective algorithms, we decided to discard them. Among the easy-to-compute block designs (i.e., random block designs and homogeneous groups), random block designs have the small advantage of allowing values of k that are not necessarily divisors of v. While also considering the earlier results concerning the variance of the prediction error, we decided to adopt random block designs throughout the rest of this study.
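A random block design satisfying the constraint b = vr/k can be sketched as follows. This is a minimal illustration under our own assumptions, not the algorithm distributed with the paper: the function name random_block_design and the balancing rule (each respondent receives the k profiles with the most replications still to be handed out, with ties broken at random) are hypothetical choices. They guarantee that every respondent sees k distinct profiles and that every profile is administered exactly r times.

```python
import random


def random_block_design(v, r, k, seed=None):
    """Split v profiles, each replicated r times, among b = v*r/k respondents.

    Returns a list of b blocks; each block is a sorted list of k distinct
    profile indices in range(v).
    """
    if k > v:
        raise ValueError("k must not exceed v")
    if (v * r) % k != 0:
        raise ValueError("b = v*r/k must be an integer")
    rng = random.Random(seed)
    remaining = [r] * v  # replications still to hand out, per profile
    blocks = []
    for _ in range(v * r // k):
        # Shuffle first so that ties are broken at random, then use a stable
        # sort to bring the profiles with the most replications left to the front.
        order = list(range(v))
        rng.shuffle(order)
        order.sort(key=lambda p: -remaining[p])
        block = order[:k]
        for p in block:
            remaining[p] -= 1
        blocks.append(sorted(block))
    return blocks


if __name__ == "__main__":
    # Example values chosen for illustration: v = 30, r = 10, k = 10, so b = 30.
    design = random_block_design(v=30, r=10, k=10, seed=1)
    print(len(design), "respondents;", "first block:", design[0])
```

Unlike homogeneous groups, this construction does not require k to be a divisor of v; it only requires that vr be a multiple of k.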

Number of Profiles to Be Presented to Each Respondent
We investigate here the effect of the number of profiles presented to each respondent on the prediction error using Pearson correlation tests. We use an experimental plan combining different values of the parameters (see Table 2). As earlier, we impose that the total number of administered profiles is equal to or greater than 300. For each set of parameters, we performed 1000 Monte Carlo simulations, which corresponds to a sample size of 300,000 for the smallest model and 19,200,000 for the biggest one. To separate the effect of k (the number of profiles presented to each respondent) from those of v and r, we use a normalized prediction error built from s(v, r, k, o), the prediction error for scenario o, and its expected value E. As Figure 2 shows, the normalized prediction error has a very small dependence on k. This is statistically confirmed by the Pearson correlation tests (see Table 3). This result can be explained as follows. For fixed v and r, a smaller k corresponds to a larger number of respondents b. Obviously, a larger respondent sample better approximates the real market, and consequently leads to better accuracy. However, the differences are quite mild: although k = 1 represents the best choice, it would only lead to a decrease of about 1% in the prediction error compared to k = 10. This small gain has to be put in perspective with the price to pay: 10 times more respondents have to be interviewed. For this reason, we do not formulate a strict recommendation for the choice of k. We suggest that the number of profiles presented to each respondent should be determined only by the load placed on her. For simple products, it can be relatively high (15 to 20) and for complex products relatively low (about 5). Throughout the rest of the present study, we keep an intermediate value for the number of profiles presented to each respondent, namely k = 10, which is reasonable from the point of view of the respondent load.

Number of Profiles in the Basis
We investigate now the effect of the number v of profiles in the basis on the prediction error. Given that the number of profiles presented to each respondent has been fixed (k = 10), we can directly associate the total number of administered profiles vr with the sample size b by means of the relation vr = 10 b. Throughout this subsection, the role of v and r is investigated when the sample size b (and thus vr) is held constant. The experimental plans are selected according to Figure 3, where horizontal lines correspond to v, and vertical lines to r. Diagonals from the lower left to the upper right correspond to lines of constant b and contain, within the same factorial design, a constant number of experimental points.
Table 4. Experimental plan to study the interplay between v and r. Diagonals from the lower left to the upper right correspond to subsets of constant vr.
Please note that in the case of the three smallest factorial designs, the largest v among those included in the plan is relatively close to the total number of product design combinations. In contrast, a maximal value v = 512 is selected in the case of the largest, 4^5 × 3^5 factorial. The choice of this value, which is far smaller than the total number of product design combinations, is mainly motivated by the computational cost of D-optimality.
As before, 1000 Monte Carlo scenarios are performed, which corresponds to a sample size of 240,000 for the smallest model and 409,960,000 for the biggest one. To separate the effect of b from that of v and r, we compute a normalized quantity s̃, which allows us to superpose the four lines of constant sample size in the numerical analysis. The results are plotted against v in Figure 4. The figure clearly shows that, for fixed b, it is better to increase v rather than r. The gain is stronger in the range of small v (and large r), while it is much weaker, and almost equal to zero, for the largest values of v. Because the variation of v and r, given a constant b, has no impact on the questionnaire cost and feasibility, we can formulate a general and simple rule: once b (the number of respondents) and k (the number of profiles presented to each respondent) are selected, take v (the number of profiles in the basis) as large as possible. However, this means that the number of replications is as small as possible. If, for logistical reasons, it is preferable to have more replications, it is still possible not to follow this rule strictly. The price to pay is not very high, as we clearly see in Figure 4 that a plateau is quickly attained.
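The rule stated above can be illustrated with a short sketch. It is only one possible reading of the rule under our own assumptions and not the algorithm shipped with the supplementary programs: the function name choose_basis_size and the cap v_max (for instance the number of possible product profiles, or a computational limit on the D-optimal search) are hypothetical. Given b and k, the total number of administered profiles is vr = bk, so the sketch returns the largest v not exceeding v_max that divides bk, together with the corresponding r.

```python
def choose_basis_size(b, k, v_max):
    """Return (v, r) with v*r == b*k, k <= v <= v_max and v as large as possible."""
    total = b * k  # total number of administered profiles, equal to v*r
    for v in range(min(v_max, total), k - 1, -1):
        if total % v == 0:
            return v, total // v
    raise ValueError("no basis size between k and v_max divides b*k")


if __name__ == "__main__":
    # 30 respondents, 10 profiles each, at most 3^5 = 243 possible profiles.
    print(choose_basis_size(b=30, k=10, v_max=243))  # (150, 2)
```

If more replications are preferred for practical reasons, a smaller v_max can be passed; according to the results above, the loss in accuracy should be small once v is in the plateau region.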

Comparison with Standard Conjoint Analysis
In this section, we compare our method against the popular full-profile conjoint analysis with rank orders and with part-worths estimated by OLS regression. More precisely, we will consider two variants of the experimental design for conjoint analysis, namely: orthogonal and D-optimal designs. Let us recall that conjoint analysis permits us to estimate utility functions for each respondent, which in turn allows us to compute the estimated market share needed for our numerical experiment. We have to compare two methods where the questionnaire administration is slightly different. To perform this comparison, we take the same total number of profiles presented to the respondents during the whole survey. Assuming that we have only one competition product, this total number, denoted N, is equal to b(k + 1) for our method and to b_c k_c for the conjoint model, where b_c denotes the number of respondents and k_c the number of profiles administered to each respondent in the conjoint survey. As an important remark, notice that N does not really represent the total effort made to administer the questionnaire. Indeed, in the conjoint case, the respondent must rank the profiles according to her preference, whereas in our method she has to compare each of them with the competition profile. Consequently, the conjoint model requires more mental operations from the respondent. Despite this, because it is difficult to provide a precise measure of the individual effort required by rank-ordering profiles, we prefer to take the quantity N as the comparison criterion. Please note that this choice clearly disadvantages our method.
In the first comparison, for conjoint analysis, we use orthogonal plans as calculated by the SPSS software. For this purpose, we use the command ORTHOPLAN (ORTHOPLAN computes a main-effects orthogonal factorial plan) without specifying the minimal number of profiles contained in the design. The number k_c is automatically chosen by the SPSS routine, and is equal to 16 for the 3^5 and 4 × 3 × 2^7 factorials, to 32 for the 4 × 3^4 × 2^3 factorial and to 64 for the 4^5 × 3^5 factorial. In addition, with k = 10 fixed, the requirement of comparing both methods at the same value of N puts a constraint on the possible values of v. Under this further condition, and following the rule we found, v is taken to be as large as possible. The experimental plan is given in Figure 5, and the results are shown in Figure 6. Please note that a slight horizontal shift has been added to the figure to improve visualization but, for both methods, the values of N are exactly those indicated by the axes.
Math. Comput. Appl. 2021, 1, 0 17 of 24
Table 4. Experimental plan to compare our method with standard conjoint analysis. Orthogonal plans generated by SPSS are used for conjoint analysis.
We notice that both methods show a similar performance. More precisely, for the two smallest factorial designs, our method behaves a little better than conjoint analysis, while for the two largest factorial designs, conjoint analysis performs a little better. However, in these two latter cases, the conjoint analysis approach is very demanding in terms of respondent effort, so that it is unrealistic to assume that such a questionnaire can be accurately evaluated.
For this reason, a second comparison is made, in which the plans used for the conjoint analysis are D-optimal designs with a fixed number k_c = 16 of profiles. This choice has the advantages of both imposing a realistic respondent load and allowing a comparison at constant k_c. Consequently, we are obliged to omit the largest factorial design, which would require 26 regression parameters to be estimated. The experimental plan for the simulations is identical to the one given in Figure 5 as far as the 3^5 and the 4 × 3 × 2^7 factorial designs are concerned. By contrast, the plan for the 4 × 3^4 × 2^3 factorial design is modified according to Table 4. The results are shown in Figure 7. As in the previous figure, a slight horizontal shift has been added to this figure to improve visualization.
Figure 7. Comparison between our method and conjoint analysis with D-optimal plans. The prediction error is plotted as a function of N. Dots mark the average value and bars denote one standard deviation. Round dots correspond to our method, and square dots to standard conjoint analysis.
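The count of 26 regression parameters can be checked with a few lines of code. The sketch below assumes a main-effects model with an intercept and dummy coding, so that an attribute with L levels contributes L - 1 parameters; the function name main_effect_parameters and the dictionary of designs are illustrative only. Since an individual OLS regression needs at least as many ranked profiles as parameters, a design is feasible with k_c = 16 only if its parameter count does not exceed 16.

```python
import math


def main_effect_parameters(levels):
    """Intercept plus (L - 1) dummy-coded parameters per attribute with L levels."""
    return 1 + sum(n_levels - 1 for n_levels in levels)


# The four factorial designs used in the numerical experiments.
designs = {
    "3^5": [3] * 5,
    "4 x 3 x 2^7": [4, 3] + [2] * 7,
    "4 x 3^4 x 2^3": [4] + [3] * 4 + [2] * 3,
    "4^5 x 3^5": [4] * 5 + [3] * 5,
}

K_C = 16  # profiles ranked by each respondent in the D-optimal comparison

for name, levels in designs.items():
    n_params = main_effect_parameters(levels)
    n_profiles = math.prod(levels)  # total number of possible product profiles
    print(name, n_params, n_profiles, "feasible" if n_params <= K_C else "omitted")
```

Under these assumptions, the four designs require 11, 13, 15 and 26 parameters respectively, so only the largest one exceeds k_c = 16.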
Here again, both methods show similar performance. More precisely, conjoint analysis is a little better for the two smallest designs, but performs slightly worse for the largest design. We also see in both figures that our method presents a smaller standard deviation of the prediction error, which means that it leads to more stable predictions. Therefore, we can confidently conclude that the method we propose in the present paper is at least as good as standard conjoint analysis. The price to pay for lowering the amount of information presented to respondents while maintaining performance is a larger sample size. In our example, for the small models, we have to take a sample size 1.6 times larger than that of standard conjoint analysis. Please note that we could have chosen to present as many profiles as standard conjoint analysis in order to have the same sample size. Finally, for the biggest model, standard conjoint analysis cannot be implemented in practice, as the burden on respondents is far too heavy.
From the numerical experiments, we draw an additional conclusion: because the slope of the curves gets less and less steep with increasing N for all methods, it is probable that the prediction error will not fall below a certain value even if N is arbitrarily large. As far as conjoint analysis is concerned, we attribute this behaviour to the fact that the respondent burden k_c must be rather small (by contrast, if we allowed k_c to equal the total number of possible profiles, the prediction error found in our simulations would approach zero for N large enough). This behaviour also shows a structural limit of the method we propose in this paper. A logistic function of main effects does not possess enough degrees of freedom to describe the complicated demand function beyond a certain degree of accuracy. In any case, from the numerical experiments we performed, we observe that the prediction error (which, we recall, measures one standard deviation from the true market share of a product profile) lies around 0.1, whatever the method or the factorial design. This value is valid for reasonable sample sizes and should be kept in mind when conducting a real case survey. Including interactions could allow us to lower this structural limit in the case of our method. For conjoint analysis, lowering the limit may be achieved by increasing the number of questions administered to each respondent, but this is hardly an option because respondents would be unable to provide an accurate evaluation of the questionnaire.
As a side remark, we also notice that the accuracy of conjoint analysis improves when using D-optimal designs rather than SPSS-generated orthogonal plans, provided that the number k_c of presented profiles is the same. Indeed, we found that the orthogonal plans that we used are not always balanced and hence not optimal in terms of efficiency (see for instance [55]). D-optimal designs are therefore more suitable for comparing the efficiencies of the different methods.

Discussion and Conclusions
The present paper proposes a new approach to estimate the market share for new products. Thanks to a split questionnaire design, our approach permits us to reduce the burden on each respondent as much as desired. Hence, our approach can be used even when the number of attributes and levels is large. However, if one wants to maintain the performance, the price to pay is an increase of the sample size. The software provided as Supplemental Material allows practitioners to implement the method in a user-friendly way. It can be employed to design the split questionnaire and then, once the survey is conducted, to compute the market share estimations.
The method was selected among various alternatives by employing both theoretical arguments and Monte Carlo simulations. As a result, we chose to use a modified Fedorov algorithm to construct a D-optimal basis, and a random block algorithm to administer subsets of the basis to each respondent. Given the user's desired questionnaire size, we provide an algorithm that stays close to the desired values and at the same time maximizes the size of the D-optimal basis so as to increase efficiency. Indeed, for a given questionnaire size, we found that the method performs better when the same profiles are replicated less and the size of the basis is larger. In contrast, the number of questions administered to each respondent has been shown to have little or no influence on the overall performance, and for this reason the user is free to specify the value she wishes. Finally, we found that a logistic function of main effects, while being analytically simple, is well-suited for estimating the demand function.
To assess the statistical performance of our new approach, we compared it with standard full-profile conjoint analysis with rank orders and with part-worths estimated by OLS regression. For this purpose, we conducted further Monte Carlo simulations and can confidently conclude that the method we propose is at least as good as standard conjoint methods. These numerical experiments also show that both methods exhibit a structural limit. The prediction error does not fall below a certain value, which lies around 0.1, even if the sample is taken arbitrarily large. This accuracy is attained for relatively small sample sizes: one hundred respondents seem to be sufficient. To our knowledge, even for conjoint analysis, this is the first time that this structural limit has been exhibited and quantified.
The method proposed in this paper presents a weakness that is shared with all full-profile approaches. Indeed, a practical size limit may still be imposed by the problem we mentioned in the introduction, i.e., the risk of presenting product descriptions that are too complex to be accurately evaluated by the respondent. However, the choice task demanded of the respondent (which simply consists of stating her own preference between two products) may be regarded as a simulation of what happens in a real market, in which consumers can still be confused by product complexity. Our method presents two further weaknesses that conjoint analysis does not have. First, inferences on a single respondent cannot be made. This disqualifies the method if the main goal is to study market segmentation, but is not an issue otherwise. Second, the competition's product profiles have to be known prior to administering the survey and, if they change, a new survey must be conducted. This is obviously penalizing, but we are currently extending our method to fix this problem by using a multinomial logit approach. Our method is not the only one that tries to reduce the burden on each respondent. For instance, adaptive choice-based conjoint [26], hybrid methods [29] or bridging methods [14] do the same. However, contrary to these methods, our method permits total control of the burden and can therefore be used even when the other methods are no longer implementable.
To conclude, taking everything into account, we recommend that our method be used when the competitors' profiles are not expected to change in the short term, and when the main advantages of our method can be better exploited, i.e., when the number of attributes and levels is large.