Article

Test Me If You Can—Providing Optimal Information for Consumers Through a Novel Certification Mechanism †

by Ulrike Vollstädt 1,*, Patrick Imcke 2, Franziska Brendel 2 and Christiane Ehses-Friedrich 3
1 Faculty of Economics and Management, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
2 Faculty of Business Administration and Economics, University of Duisburg-Essen, 45127 Essen, Germany
3 Faculty of Economics and Business Administration, Friedrich Schiller University Jena, 07743 Jena, Germany
* Author to whom correspondence should be addressed.
† A previous version of this paper circulated as Ruhr Economic Paper #887 with the title “Increasing consumer surplus through a novel product testing mechanism”.
Games 2025, 16(5), 44; https://doi.org/10.3390/g16050044
Submission received: 4 July 2025 / Revised: 23 August 2025 / Accepted: 26 August 2025 / Published: 29 August 2025

Abstract

Certifiers such as Stiftung Warentest (Germany), Which? (UK), and Consumer Reports (US) reduce asymmetric information between buyers and sellers by providing credible information about product quality. However, due to their limited testing capacities, they face a set selection problem and test only a subset of all available product models. We show theoretically that, under any current mechanism to select product models for testing, buyers always end up buying suboptimal product models, unless all product models which would be sold under complete information (or all but the overall cheapest one) happen to be tested. Instead, we propose a novel mechanism based on voluntary disclosure, but with the same testing capacity, which always yields the maximum possible consumer surplus and thus weakly dominates any current mechanism. Furthermore, we confirm in a controlled laboratory experiment that our mechanism significantly increases consumer surplus.
JEL Classification:
C72; C91; D82; L15

1. Introduction

From technical products such as smartphones to insurance or consumer products such as foodstuffs, toothpaste, and strollers, sellers are often better informed about product quality than buyers. Unfortunately, this asymmetric information may lead to a fundamental economic problem: if sellers of high-quality products are not able to credibly convey the high quality of their products, not all buyers are able to choose the quality they would like to buy. Consequently, information asymmetry about vertical product quality1 can reduce consumer (and producer) surplus (Akerlof, 1970).2
Independent consumer organizations such as Stiftung Warentest (Germany), Which? (UK), and Consumer Reports (US) reduce asymmetric information between buyers and sellers by providing credible information about product quality. These organizations are third-party certifiers whose goal is to provide objective and comprehensive information about product quality. They are usually non-profit organizations, and neither require sellers to pay a fee for the rating service nor accept advertisements, in order to avoid conflicts of interest. Instead, sales of their own publications represent one of their main sources of financing (see Stiftung Warentest, 2019; Which?, 2023c; Consumer Reports, 2023; International Consumer Research & Testing, 2021).3 Usually, independent consumer organizations aim to provide a comprehensive rating of vertical product quality; i.e., they include ratings for a stroller’s weight, how waterproof the raincover is, and the level of toxic substances. However, they do not include ratings for horizontal quality, e.g., how tasteful a stroller’s color is. Independent consumer organizations often have employees whose task is to purchase products (for testing) while concealing that they are doing this on behalf of their organization. In order to obtain a comprehensive quality rating, e.g., very good, good, satisfactory, fair and poor, independent consumer organizations weight and add the ratings of all included quality dimensions. Consumers can access the testing results online or in print magazines. Moreover, in several European countries, certain sellers may print their own testing result on their product, so consumers are able to access this quality information directly while shopping online or in retail settings (see, for example, Stiftung Warentest, 2020; Which?, 2023b, and Appendix A.1 for screenshot examples). Independent consumer organizations are usually well-known and well-regarded. 
For instance, 96% (77%) of all German consumers know of (strongly trust) Stiftung Warentest (KantarEmnid & Verbraucherzentrale Bundesverband, 2018). In the US, Consumer Reports has more than 6 million paying members and their website receives an average of 11 million unique visits per month (Consumer Reports, 2023). Information provided by independent consumer organizations is thus very close to what Viscusi (1978) suggests in a reply to Akerlof (1970), namely to provide credible information to buyers. 4
While independent consumer organizations offer credible and precise information about product quality, they are hampered by limited testing capacities; i.e., they face a set selection problem as they need to select a subset of all available product models in the market for testing.5 Typically, they select product models based on which ones are perceived to be of greatest interest for consumers. Stiftung Warentest, for example, uses current sales numbers to select the bestselling product models for testing. Their subset of tested product models usually accounts for 2% to 33% of all available product models (as in the 09/2016 magazine; see GfK SE, Nuremberg, 2017). By contrast, Which? and Consumer Reports make their selections using a combination of sales numbers, price, and other criteria (see Appendix A.3 for details). To the best of our knowledge, no independent consumer organization currently allows sellers to directly suggest certain product models for testing.
The logic behind any of these current selection mechanisms is that consumers are likely to want information on the bestselling product models and/or on product models from different price ranges. However, our counterargument is that these are not necessarily the ones that buyers would have selected under complete information. It is not clear whether any of these current product model selection mechanisms do in fact provide optimal information for consumers, i.e., whether the selection of product models leads to optimal consumer surplus. In particular, there may be product models among the untested ones which dominate the tested product models, e.g., offer a higher quality at the same price, but have simply not been selected for testing (see Section 3.1 for formal definitions of dominated and non-dominated product models). Indeed, among a sample from Stiftung Warentest, there are many dominated product models (see Figure A4 and Figure A5 in Appendix A.2). Note that this observation is in line with numerous empirical studies which measure the correlation between product quality and price within the subsets of tested product models in different countries (see Note 4). Thus, the problem is not the limited testing capacity itself, but the consequences of this limited capacity for the provision of optimal information.
In this paper, we explore the mechanism by which a certifier solves the set selection problem, i.e., how it selects a subset of product models for testing. In particular, we propose a novel mechanism to select product models that yields greater buyer information. Importantly, our proposed mechanism uses the same number of testing slots. Only the selection process differs to take advantage of the sellers’ information (see Section 3.1.3 for a detailed description). To test the performance of our proposed selection mechanism, we develop a certification game to derive theoretical, testable predictions. Furthermore, we conduct a controlled laboratory experiment to test these theoretical predictions.
The rest of our paper proceeds as follows: Section 2 discusses the related literature. Section 3 introduces our theoretical framework and derives our theoretical, testable predictions. Section 4 presents our experimental design and hypotheses and reports our experimental results. Section 5 discusses our findings and concludes.

2. Related Literature

This paper is related to several strands of theoretical, empirical and experimental literature such as certification, information disclosure and unraveling, social choice under constraints and Bayesian persuasion.
Many papers on certification, including ours, analyze how certifiers influence market transparency (for an overview, see Dranove & Jin, 2010). The main contribution of our paper is that we focus on certifiers who face a set selection problem as they need to select a subset of product models for testing. To the best of our knowledge, we are the first to do so. For example, Stahl and Strausz (2017) explicitly mention independent consumer organizations in their paper, but they do not discuss any testing capacity constraints.
Our paper is, however, closely related to Stahl and Strausz (2017) as we also focus on different aspects of seller versus buyer certification. Following them, we define buyer (seller) certification as a mode of certification where buyers (sellers) (i) initiate and (ii) pay for certification. As to paying for certification, we view independent consumer organizations as containing mainly, but not only, elements of buyer certification because, in addition to selling their own publications to consumers, some consumer organizations also sell licenses to producers wishing to print their certification on their product (see, for example, Stiftung Warentest, 2020 and Which?, 2023b). While we do not suggest any changes in how consumer organizations finance themselves, we do suggest a change in who initiates certification. More specifically, we suggest including one aspect of seller certification by allowing sellers to apply for certification. However, we suggest keeping the aspect of buyer certification whereby the certifier has the final say in whom to test from the set of applicants. As we show below, this allows sellers to signal whether they are (non-)dominated, and it leads to more transparency as buyers are able to infer more about the quality of untested product models.
Our certification game represents a market with sellers, buyers, and a certifier (building on a model by Encaoua and Hollander (2007) without a certifier; see Section 3 for details). As our main research question is how a certifier can optimally solve its set selection problem, we need a game that allows for a certain number of sellers from whom the certifier then selects a subset for testing. Related papers such as Stahl and Strausz (2017) and Ispano and Schwardmann (2023) analyze models with one or two sellers, respectively. Importantly, our certification game allows for prices which may not be positively correlated with quality. As depicted in Figure A4 and Figure A5 in Appendix A.2, product models are represented by points in the quality–price space, and dominated product models may exist. This implies that, in our setting, it is not necessarily all the high-quality sellers who want to signal high quality, but it is all the sellers whose product models would be bought under complete information who want to signal that they are optimal.
Our proposed mechanism by which a certifier selects product models for testing allows sellers of non-dominated product models to voluntarily and credibly disclose their product quality which may lead to information unraveling (see Grossman, 1981; Milgrom, 1981). Note that, while our mechanism builds on Grossman (1981) and Milgrom (1981), it goes beyond their ideas by including, amongst other things, a certifier modeled as an algorithm which selects a subset of applicants for testing (see Section 3.1.3 for details). To the best of our knowledge, we are the first to do so.
Also, note that our study differs from Encaoua and Hollander (2007) in that we do not focus on sellers’ pricing and quality-setting choices. Instead, we investigate a situation in which the unknown variable has already been determined, similarly to Benndorf et al. (2015), and Jin et al. (2022). In other words, we take situations where qualities and prices have already been set, and focus on sellers’ decisions to apply for testing—and thus on the degree to which information unravels—as a first step in analyzing the performance of our mechanism.
Our proposed mechanism with unraveling and information disclosure is supported by previous theoretical, empirical, and experimental research (for overviews, see Dranove & Jin, 2010 and Brendel, 2021). On the theoretical side, several studies investigate the different conditions under which unraveling occurs (see, for instance, the seminal papers of Grossman, 1981 and Milgrom, 1981, as well as the overviews in Dranove & Jin, 2010 and Brendel, 2021) and find that complete unraveling requires several strong assumptions. By contrast, unraveling has been observed but to an incomplete degree in empirical studies (see, for instance, Mathios, 2000 and Jin & Leslie, 2003, amongst many others). The experimental papers also observe unraveling, sometimes to an incomplete degree (see, for example, Benndorf et al., 2015 and Benndorf, 2018), but sometimes to a complete degree when allowing for detailed feedback and learning (see, for example, Forsythe et al., 1989 and Jin et al., 2021). To the best of our knowledge, there is no study that investigates whether unraveling leads to improved information in contexts with limited information disclosure capacities. We contribute to this strand of literature by analyzing the first game with such limited information disclosure capacities.
While all third-party certifiers help to mitigate buyers’ information disadvantage, they may differ along several dimensions, e.g., seller versus buyer paid certification, for-profit or not-for-profit, limited versus unlimited testing capacities, and focus on subset (e.g., environmentally friendly product) versus comprehensiveness of quality information (overall high-quality product). As mentioned above, independent consumer organizations are usually not-for-profit, and they mainly rely on buyers paying for certification. By contrast, for-profit third-party certifiers such as Moody’s and PSA6 charge sellers fees for the rating service (see Dranove & Jin, 2010; List, 2006; Jin et al., 2010). Independent consumer organizations face limited testing capacities, and thus need to select subsets of product models for testing. By contrast, both for-profit third-party certifiers such as Moody’s and PSA and not-for-profit third-party certifiers such as USDA organic or Blauer Engel are typically able to consider and, if applicable, certify each seller. (Yet, sellers may have to wait a certain amount of time for certification.) Thus, these certifiers do not need to select a subset of sellers for certification, which is the main focus of this paper. As for the comprehensiveness of quality information, independent consumer organizations such as Consumer Reports, Stiftung Warentest and Which? aim to provide comprehensive quality information including search, experience and credence characteristics. By contrast, certifiers such as USDA organic (US) or Blauer Engel (Germany) provide information about a distinct subset of quality dimensions, more specifically, credence characteristics.7
Furthermore, this paper is also related to the literature on social choice under constraints (see, for example, Barberà et al., 2005; Bahel & Sprumont, 2020, 2021) as both our paper and the literature on social choice under constraints investigate mechanisms to solve certain set selection problems. Social choice under constraints analyzes mechanisms to solve social set selection problems when not all possible subsets of objects are feasible. We analyze mechanisms to solve seller set selection problems in which all subsets are, in principle, feasible, while the constraint is the number of sellers in the subset.
Finally, this paper is also related to the literature on Bayesian persuasion (Kamenica & Gentzkow, 2011) as both our paper and the literature on Bayesian persuasion tackle complex problems of information disclosure. Bayesian persuasion is concerned with optimal disclosure when a sender designs a disclosure signal to a receiver about the value of a prospect, sometimes through an intermediary (Bizzotto et al., 2021). In our paper, the intermediary (the certifier) reveals an always perfect signal, but has to select a subset of senders whose perfect signals it reveals. Senders may apply to be selected, but they do not design the disclosure signal.

3. Theory

We start by describing the theoretical framework (Section 3.1). Subsequently, we present theoretical results for three different versions of the game (Section 3.2). See Table A2 in Appendix A.6 for a list of symbols.

3.1. Theoretical Framework

In this subsection, we establish our theoretical framework. More specifically, we describe our certification game in Section 3.1.1. Subsequently, we establish a set of basic definitions in Section 3.1.2. Furthermore, we introduce three different versions of the game in Section 3.1.3 which later help us analyze different product model selection mechanisms. Last, we establish two assumptions about the distribution of market parameters in Section 3.1.4.

3.1.1. Certification Game

Our certification game represents a market with sellers, a certifier, and buyers. This one-shot game allows us to analyze how information about a limited subset of tested product models influences consumer surplus (and seller profits) in the short term. The sequence of the game occurs over three stages (see Table 1 and Section 3.1.3).
Sellers (He/His)
We consider a market with a non-empty set of rational sellers $F$, with $F = \{f_1, \ldots, f_n\}$, and $n \in \mathbb{N}$. These sellers offer heterogeneous product models. For simplicity, we assume each seller offers exactly one product model, but can sell as many units of that product model as demanded. For seller $f_t \in F$, with $t \in \{1, \ldots, n\}$, we call $q_t$ the quality of the corresponding product model, with $0 \le q_t \in Q \subset \mathbb{R}_+$, and with $Q$ denoting the set of all quality levels. We call $p_t$ seller $f_t$’s price, with $0 \le p_t \in \mathbb{R}_+$. Similarly to Benndorf et al. (2015), and Jin et al. (2022), we are mainly interested in product quality disclosure. Therefore, we analyze situations in which quality and price have already been set. Also, we assume the correlation between price and quality (when considering all product models) equals zero since certifiers are most helpful for buyers if buyers are unable to infer quality from price (see also Note 4).8 We further assume product quality comprises experience and credence characteristics, excluding search characteristics, according to Nelson (1970) and Darby and Karni (1973), and these quality characteristics are aggregated into the unidimensional, overall vertical quality measure $q_t$. Moreover, we assume there is no pair of sellers offering their product model at the same price, i.e., $\forall f_t, f_s \in F$ with $t \ne s$, we require that $p_t \ne p_s$. (Note that, in Appendix A.8, we present a generalized selection mechanism which allows us to relax, amongst other things, this assumption.) We define
$$f_Z^c := \arg\min_{f_k \in Z} p_k \tag{1}$$
with $f_Z^c \in Z$, and with $Z \subseteq F$, as the seller offering the cheapest product model within the set $Z$.9 Furthermore, we define
$$f_F^{c\underline{q}} := f_F^c \mid f_F^c \text{ with } q_F^c = \underline{q} \tag{2}$$
with $f_F^{c\underline{q}} \in F$, and with $\underline{q} \in Q$ as the overall lowest quality, as the seller who offers the overall lowest quality and the overall lowest price. Note that $f_F^{c\underline{q}}$ may or may not exist depending on the market parameters, while $f_F^c$ always exists. If $f_F^{c\underline{q}}$ exists, he is always identical to $f_F^c$.
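The two definitions above can be illustrated with a minimal Python sketch. The function names, the dictionary representation of a market (seller name mapped to a price and quality pair), and the two small markets are hypothetical and chosen only for illustration; prices are assumed to be pairwise distinct, as in the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: seller name -> (price, quality).
Market = Dict[str, Tuple[float, float]]


def cheapest_seller(sellers: Market) -> str:
    """Equation (1): the seller with the lowest price in a non-empty set Z."""
    return min(sellers, key=lambda name: sellers[name][0])


def cheapest_lowest_quality_seller(market: Market) -> Optional[str]:
    """Equation (2): the overall cheapest seller, but only if he also offers
    the overall lowest quality; otherwise no such seller exists (None)."""
    cheapest = cheapest_seller(market)
    lowest_quality = min(quality for _, quality in market.values())
    return cheapest if market[cheapest][1] == lowest_quality else None


# Illustrative markets (assumed numbers, not taken from the paper's data).
market_a = {"a1": (5.0, 2.0), "a2": (11.0, 1.0)}  # cheapest seller is not lowest quality
market_b = {"b1": (3.0, 1.0), "b2": (8.0, 2.0)}   # cheapest seller is also lowest quality
```

In `market_a` the cheapest seller exists but the seller combining the lowest price with the lowest quality does not, which mirrors the remark that the latter may or may not exist depending on the market parameters.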
For seller $f_t$, we assume the function $c(q_t)$ denotes the unit costs of production. Thus, the unit costs of production $c(q_t)$ depend only on the quality level, are independent of the total number of produced units and are identical for a given quality level. Furthermore, the cost function is assumed to be continuously differentiable, strictly increasing and strictly convex in quality, i.e., $c'(q_t) > 0$ and $c''(q_t) > 0$. Since we are not interested in analyzing market entry or exit decisions and since positive fixed costs would thus not influence equilibrium predictions, we assume all sellers’ fixed costs equal zero. For simplicity, we assume there are no sellers providing the same utility for any buyer. Finally, we assume sellers have complete information. We write seller $f_t$’s profit function as
$$\pi_t(p_t, q_t) = \bigl(p_t - c(q_t)\bigr)\, d(p_t, q_t) \tag{3}$$
where $d(p_t, q_t)$ represents the number of buyers buying seller $f_t$’s product model.
Certifier (It/Its)
Following the literature on honest certification, we assume the certifier provides credible information about product quality (see, e.g., Stahl & Strausz, 2017). However, it does so for a limited subset of product models which it selects according to its maximum testing capacity $k \in \mathbb{N}$, and according to its product model selection mechanism. Quality $q_t$ is perfectly revealed $\forall f_t \in K$, with $t \in \{1, \ldots, n\}$, and with $K \subseteq F$ as the set of sellers whose product models are tested.10 We assume the certifier has already determined its number of testing slots based on how precisely it wants to report quality. In particular, we assume the certifier provides at least as many certification slots as there are quality levels, i.e., $k \ge \#Q$. (Note that, in Appendix A.8, we present a generalized selection mechanism which allows us to relax, amongst other things, this assumption.) We model two different product model selection mechanisms: any current mechanism SellersMayNotApply and our new mechanism SellersMayApply (see Section 3.1.3 for details). We assume both mechanisms use the same number of testing slots.
We refrain from modeling the certifier’s surplus function since, as mentioned above (Section 1), it is a non-profit organization11 which does not rely on fees for the rating service. Since we model the certifier as an algorithm without its own surplus function, we do not call it a player.
Buyers (She/Hers)
We next identify a non-empty set of rational buyers $B$, with $B = \{b_1, \ldots, b_s\}$, and $s \in \mathbb{N}$.12 These buyers decide whether, and if so which, product model to buy (at most one). They are not able to resell. For buyer $b_h$, with $h \in \{1, \ldots, s\}$, we call $\theta_h$ her valuation of quality, with $0 < \theta_h \in \mathbb{R}_+$. Before having received any information from the certifier, buyers are assumed to have complete information about prices and valuations of quality. As to quality, we assume buyers only know the quality distribution. Moreover, we assume buyers believe that the correlation between price and quality equals zero (once more, see Note 4). We assume
$$E[u_h(p_t, q_t, \theta_h)] = \mathbf{1}_{f_t \in K}\,(\theta_h q_t - p_t) + \mathbf{1}_{f_t \in F \setminus K}\,(\theta_h E[q_t] - p_t) \tag{4}$$
is the expected utility function of buyer $b_h \in B$ buying seller $f_t$’s product model, with $K$ being the set of sellers whose product models have been tested.13 $\theta_h q_t$ is a buyer’s willingness to pay for $q_t$. If a buyer chooses a tested product model, her utility is deterministic since all buyers know the quality of a tested product model prior to purchase. Thus, if a buyer chooses to buy a tested product model, her utility simplifies to the first summand of Equation (4), i.e., to $u_h(p_t, q_t, \theta_h) = \theta_h q_t - p_t$. In this case, her utility equals the product of her valuation of quality and the selected product model’s quality, minus the selected product model’s price. On the other hand, if a buyer chooses to buy an untested product model, her expected utility is probabilistic since she does not know the true quality of this product model prior to purchase. Thus, her expected utility simplifies to the second summand of Equation (4), i.e., to $E[u_h(p_t, q_t, \theta_h)] = \theta_h E[q_t] - p_t$. In this case, her utility equals the product of her valuation of quality and the selected product model’s expected quality, minus the selected product model’s price. Finally, if a buyer chooses not to purchase a product model, her utility is zero. After having received information from the certifier, we assume buyers update the expected quality of all untested product models (also see Section 3.2.2 and Section 3.2.3).
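A buyer's decision under Equation (4) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the function names are hypothetical, an untested product model is marked with a quality of `None`, a single common expected quality is used for all untested product models, and the valuation, prices, and expected quality in the example are made-up numbers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: seller name -> (price, quality or None if untested).
Offers = Dict[str, Tuple[float, Optional[float]]]


def expected_utility(theta: float, price: float,
                     quality: Optional[float], expected_quality: float) -> float:
    """Equation (4): use the revealed quality for a tested product model,
    and the expected quality for an untested one."""
    q = quality if quality is not None else expected_quality
    return theta * q - price


def best_choice(theta: float, offers: Offers,
                expected_quality: float) -> Tuple[Optional[str], float]:
    """Return the utility-maximizing seller, or (None, 0.0) if not buying
    (utility zero) is at least as good as every offer."""
    best_name, best_utility = None, 0.0
    for name, (price, quality) in offers.items():
        utility = expected_utility(theta, price, quality, expected_quality)
        if utility > best_utility:
            best_name, best_utility = name, utility
    return best_name, best_utility


# Assumed example: two tested product models and one untested one.
offers = {"f1": (5.0, 2.0), "f2": (10.0, 3.0), "f5": (9.0, None)}
choice_high = best_choice(4.0, offers, expected_quality=2.5)  # buys f1: 4*2 - 5 = 3
choice_low = best_choice(0.5, offers, expected_quality=2.5)   # every offer yields negative utility
```

With a valuation of 4, the tested model `f1` yields utility 3, the tested model `f2` yields 2, and the untested model `f5` yields an expected utility of 1, so the buyer picks `f1`; with a valuation of 0.5 she does not buy at all.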

3.1.2. Local and Global Dominance, and Dominating Rivals

Before describing the different product model selection mechanisms in more detail, we need to establish a set of basic definitions. We begin by defining local and global dominance to distinguish if a product model is dominated within the whole market (as in Section 3.2.1 when analyzing a world of complete information), or within a certain submarket like the set of tested product models (as in Section 3.2.2 and Section 3.2.3 when analyzing worlds of incomplete information with different product model selection mechanisms).
Definition 1 
((Non-)Dominated products). Let $Z \subseteq F$ be a non-empty set of sellers. A seller $f_t \in Z$ offers a dominated product in $Z$ if $\exists f_j \in Z$ with $(p_j \le p_t \wedge q_j > q_t) \vee (p_j < p_t \wedge q_j \ge q_t)$. A seller $f_t \in Z$ offers a non-dominated product in $Z$ if $\forall f_j \in Z$
  • if $p_j < p_t$, then $q_j < q_t$,
  • if $q_j > q_t$, then $p_j > p_t$.
When referring to a true submarket of $F$, i.e., when $Z \subset F$, we call a product locally (non-)dominated. When referring to the whole market $F$, i.e., when $Z = F$, we call a product globally (non-)dominated.
Essentially, a product is dominated in a set (or market) if at least one seller in this set offers a strictly higher product quality without being more expensive, or a strictly lower price without offering a lower product quality. By comparison, a product model is non-dominated in a set if every seller in this set offering a strictly higher product quality also has a strictly higher price, and every seller offering a strictly lower price also offers a strictly lower quality. In the following, we use the terms “seller with (non-)dominated product” and “(non-)dominated seller” equivalently.
To illustrate Definition 1, consider the following local market (see Figure 1): $Z = \{f_1, f_2, f_4, f_5, f_6\}$, with
  • $q_1 = 2$, $p_1 = 5$,
  • $q_2 = 3$, $p_2 = 10$,
  • $q_4 = 1$, $p_4 = 11$,
  • $q_5 = 2$, $p_5 = 9$,
  • $q_6 = 4$, $p_6 = 28$.
Furthermore, consider the following global market: $F = Z \cup \{f_3\}$, with $q_3 = 5$ and $p_3 = 27$. While sellers $f_1$, $f_2$, and $f_6$ are locally non-dominated in market $Z$, sellers $f_1$, $f_2$, and $f_3$ are globally non-dominated in market $F$.
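The example can be checked mechanically. The following Python sketch applies the dominance condition of Definition 1 to the local market and the global market given above; the function names and the dictionary layout (seller name mapped to a price and quality pair) are illustrative assumptions, not part of the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: seller name -> (price, quality).
Market = Dict[str, Tuple[float, float]]


def is_dominated(seller: str, market: Market) -> bool:
    """Definition 1: some other seller offers strictly higher quality without
    being more expensive, or a strictly lower price without lower quality."""
    p_t, q_t = market[seller]
    return any(
        (p_j <= p_t and q_j > q_t) or (p_j < p_t and q_j >= q_t)
        for other, (p_j, q_j) in market.items()
        if other != seller
    )


def non_dominated(market: Market) -> List[str]:
    """Sorted names of all sellers that are non-dominated within the given set."""
    return sorted(name for name in market if not is_dominated(name, market))


# Local market Z and global market F from the example above.
local_market = {
    "f1": (5.0, 2.0),
    "f2": (10.0, 3.0),
    "f4": (11.0, 1.0),
    "f5": (9.0, 2.0),
    "f6": (28.0, 4.0),
}
global_market = {**local_market, "f3": (27.0, 5.0)}

locally_non_dominated = non_dominated(local_market)    # f1, f2, f6
globally_non_dominated = non_dominated(global_market)  # f1, f2, f3
```

Adding seller `f3` to the market turns `f6` from locally non-dominated into globally dominated, because `f3` offers a higher quality at a lower price.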
Having defined local and global dominance, we now partition any set of sellers $Z$ into two sets of sellers: $ND_Z \subseteq Z$, the set of locally non-dominated sellers in $Z$, and $D_Z \subseteq Z$, the set of locally dominated sellers in $Z$, with $ND_Z \cup D_Z = Z$. If $Z = F$, we use the notation $D$ instead of $D_F$, and $ND$ instead of $ND_F$. In the following, we assume all sellers are sorted according to these two disjoint sets. Without loss of generality, $F = \{f_1, \ldots, f_n\} = ND \cup D$, with $ND := \{f_1, \ldots, f_m\} \subseteq F$ as the set of globally non-dominated sellers, and with $D := \{f_{m+1}, \ldots, f_n\} \subseteq F$ as the set of globally dominated sellers, with $m \in \mathbb{N}$. We denote the subset of globally non-dominated sellers who have at least one buyer under complete information with $\overline{ND}$.
Next, we define a seller’s dominating rivals.
Definition 2 
(Dominating rivals of a certain seller). We call the set
$$R_t := \{\, f_j \in F \mid (q_j > q_t \wedge p_j \le p_t) \vee (q_j \ge q_t \wedge p_j < p_t) \,\} \tag{5}$$
the set of dominating rivals of seller $f_t \in D$, with $R_t \subseteq F$.
Definition 2 implies that seller $f_t$ is globally (and locally) dominated by every seller in $R_t$. We do not define sets of dominating rivals for globally non-dominated sellers as such sets would be empty.
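As a small illustration of Definition 2, the sketch below computes the set of dominating rivals for sellers in the global market of Figure 1 (prices and qualities as listed in the example for Definition 1). The function name and data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: seller name -> (price, quality).
Market = Dict[str, Tuple[float, float]]


def dominating_rivals(seller: str, market: Market) -> Set[str]:
    """Definition 2: all sellers with strictly higher quality at a weakly lower
    price, or weakly higher quality at a strictly lower price."""
    p_t, q_t = market[seller]
    return {
        other
        for other, (p_j, q_j) in market.items()
        if other != seller
        and ((q_j > q_t and p_j <= p_t) or (q_j >= q_t and p_j < p_t))
    }


# Global market F from Figure 1.
global_market = {
    "f1": (5.0, 2.0),
    "f2": (10.0, 3.0),
    "f3": (27.0, 5.0),
    "f4": (11.0, 1.0),
    "f5": (9.0, 2.0),
    "f6": (28.0, 4.0),
}

rivals_f4 = dominating_rivals("f4", global_market)  # f1, f2 and f5
rivals_f5 = dominating_rivals("f5", global_market)  # f1 only
rivals_f6 = dominating_rivals("f6", global_market)  # f3 only
```

Applied to a globally non-dominated seller such as `f1`, the function returns an empty set, in line with the remark that such sets would be empty.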

3.1.3. Three Versions of the Game

This section introduces three different versions of the game which later help us analyze different product model selection mechanisms.
Benchmark CompleteInformation
As a benchmark, we model an ideal world of CompleteInformation about product quality (see Table 1). Since information is complete, there is no need for a certifier.14 Therefore, stages 1 and 2 of the game are excluded, and the game reduces to stage 3. In this stage 3, buyers observe the quality and price of all product models, and buyers decide which product model to buy (if any). The theoretical results can be found in Section 3.2.1. Note that, while we include this game version of CompleteInformation as a theoretical benchmark, the following incomplete information game versions provide the basis for our experimental treatments.
Any Current Selection Mechanism SellersMayNotApply
To represent any current product model selection mechanism in which sellers are not able to influence directly which product models are selected for testing, we model a mechanism in which sellers are passive. We name this mechanism SellersMayNotApply (see Table 1). Since sellers are passive, stage 1 of the game is excluded, and the game reduces to stages 2 and 3. In stage 2, the certifier tests the set of product models $K$, with $K \subseteq F$. This set of tested product models could, in principle, be determined by any combination of currently used selection criteria. Note that, when analyzing the performance of any current mechanism SellersMayNotApply in Section 3.2.2 and Section 3.2.4, we consider any possible scenario of which product models the set of tested product models $K$ may contain. In stage 3, buyers observe the quality of the tested product models as well as the prices of all product models, and then decide which product model to buy (if any).
New Selection Mechanism SellersMayApply
Under our new mechanism, sellers are able to influence whether a certifier will test their product model. Therefore, the game occurs over all three stages (see Table 1). In stage 1, sellers decide whether, and if so, with which quality, to apply for testing of their product model. Note that sellers apply by submitting their product model number.15 Sellers who apply incur positive application_costs which we assume to be lower than the profit margin of every globally non-dominated seller who has at least one buyer under CompleteInformation, i.e., $p_t - c_t > \text{application\_costs} > 0 \;\; \forall f_t \in \overline{ND}$. Note that these application_costs could be very low, e.g., application paperwork, and are not a fee for the testing service. After all sellers have made their application decision, stage 2 starts. We assume the certifier has immediate access to each applicant’s price16 and uses a pre-specified algorithm to select product models from the set of applicants (see below). Eventually, all product models selected by the algorithm are tested, and the quality stated during the application may or may not be confirmed. If an applicant seller’s product model is chosen for testing and the submitted quality is found to be false, this seller must also pay a positive punishment_fee which is assumed to be higher than the maximum additional profits any globally non-dominated seller could make when being tested. (Note that, in Appendix A.8, we present a generalized version of SellersMayApply which uses a different method to deter lying, namely combining a lower punishment_fee with not revealing a liar’s quality. Furthermore, note that the punishment_fee exists under SellersMayApply, but not under SellersMayNotApply as, in our theoretical framework, sellers are only able to state a (true or false) quality when applying for testing. We abstract from quality statements made to buyers, e.g., marketing.) After the final test, the true qualities of all tested product models are published.
In stage 3, buyers observe the quality of the tested product models as well as the prices of all product models, and then decide which product model to buy (if any).
The SellersMayApply mechanism is based on the algorithm described in Figure 2. See Figure A7 in Appendix A.7.1 for a formal description. Note that the algorithm stops after algorithm step 2 in its first iteration if all sellers submit true qualities when applying to be tested (which they are predicted to do in equilibrium according to Proposition 2 below). To illustrate our algorithm, consider once again global market $F$ in Figure 1. Assuming all sellers applied stating their true qualities, the algorithm would pre-select seller $f_4$ ($f_1$, $f_2$, $f_6$, $f_3$) as the cheapest seller for quality level 1 (2, 3, 4, 5, respectively), and then exclude the locally dominated sellers $f_4$ and $f_6$ in algorithm step 1. In algorithm step 2, all untested, locally non-dominated product models selected at the end of the preceding algorithm step, i.e., sellers $f_1$, $f_2$, $f_3$, would be tested. Since we assumed all sellers applied stating their true qualities, no false quality statements would be detected, and the algorithm would stop. The theoretical results for our new mechanism SellersMayApply can be found in Section 3.2.3.
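The first iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formal algorithm of Figure A7: the prices and the extra seller `f5` are hypothetical (Figure 1's values are not reproduced here), the names `Application`, `preselect_cheapest`, `dominates`, and `first_iteration` are invented for this sketch, and "locally dominated" is taken to mean that another pre-selected applicant offers at least the same stated quality at no higher price, with one of the two strictly better.

```python
from typing import NamedTuple


class Application(NamedTuple):
    seller: str
    price: float
    stated_quality: int


def preselect_cheapest(apps):
    """Keep the cheapest applicant for each stated quality level (first one wins on ties)."""
    best = {}
    for a in apps:
        current = best.get(a.stated_quality)
        if current is None or a.price < current.price:
            best[a.stated_quality] = a
    return list(best.values())


def dominates(a, b):
    """Assumed dominance rule: a is weakly better on both dimensions and strictly on one."""
    weakly = a.stated_quality >= b.stated_quality and a.price <= b.price
    strictly = a.stated_quality > b.stated_quality or a.price < b.price
    return weakly and strictly


def first_iteration(apps):
    """Pre-select, drop locally dominated applicants (step 1), return those to test (step 2)."""
    pre = preselect_cheapest(apps)
    survivors = [a for a in pre if not any(dominates(o, a) for o in pre)]
    return sorted(survivors, key=lambda a: a.stated_quality)


# Hypothetical prices chosen so that the outcome matches the example in the text.
apps = [
    Application("f1", 5.0, 2),
    Application("f2", 10.0, 3),
    Application("f3", 18.0, 5),
    Application("f4", 6.0, 1),   # cheapest at quality 1, but dominated by f1
    Application("f5", 12.0, 3),  # hypothetical seller, not the cheapest at quality 3
    Application("f6", 20.0, 4),  # cheapest at quality 4, but dominated by f3
]
to_test = [a.seller for a in first_iteration(apps)]
print(to_test)  # ['f1', 'f2', 'f3']
```

If a false statement were detected during testing, further iterations would follow; those are not sketched here because their details are given only in Figure 2 and Appendix A.7.1.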

3.1.4. Two Distribution Assumptions

Last, we make two assumptions about the distribution of market parameters. (Note that, in Appendix A.8, we present a generalized version of SellersMayApply which does not require these distribution assumptions.) In the following, let $f_t$ be a seller from $\overline{ND} \setminus f_{F^c}$,$^{17}$ and let $b_h$ be any buyer who would select seller $f_t$ under CompleteInformation. Furthermore, let $f_k$ be the seller offering the next-lowest quality level in $\overline{ND}$, and let $f_g$ be any seller in $ND$ offering a cheaper product model than $f_t$. Moreover, let $E[q_t \mid \text{SellersMayApply}]$ denote seller $f_t$’s expected quality under SellersMayApply.$^{18}$
Assumption 1 
(Joint distribution of price and quality). Consider a hypothetical scenario in which seller $f_t$, seller $f_{F^c}^{\underline{q}}$ (if any), and all globally dominated sellers are untested; therefore, their qualities are known to buyers in expectation only. Furthermore, seller $f_k$ and all sellers in $\overline{ND}$ offering quality levels higher than $f_t$ are tested; therefore, their qualities are known to buyers. (All other sellers (if any) may or may not be tested; therefore, their qualities may or may not be known to buyers.) We assume price and quality are distributed such that, given this scenario, $E[q_t \mid \text{SellersMayApply}] \le q_k$.
To illustrate Assumption 1, consider seller $f_t = f_7$ in experimental market 1 (see Section 4.1 and Appendix A.10 for details). In this example, $f_7$ is the seller offering a quality of 3 in $\overline{ND}$. Furthermore, $f_k = f_4$ is the seller with the next-lowest quality level in $\overline{ND}$ compared to $f_7$, i.e., $q_4 = 2$. Last, $f_{F^c}^{\underline{q}} = f_1$. Therefore, we consider the following scenario. Sellers $f_7$ and $f_1$, and all globally dominated sellers, i.e., $f_2, f_3, f_5, f_6, f_8, f_9, f_{11}, f_{12}, f_{14}, f_{15}$, are untested; therefore, their qualities are known to buyers in expectation only. Furthermore, sellers $f_4$, $f_{10}$, and $f_{13}$ are tested; therefore, buyers know that $q_4 = 2$, $q_{10} = 4$, and $q_{13} = 5$. (There are no other sellers to be considered in this example.) In this scenario, $E[q_7 \mid \text{SellersMayApply}] = (3 \times 1 + 2 \times 2 + 3 \times 3)/8 = 2$.$^{19}$ Since $q_4 = 2$, it follows that $E[q_7 \mid \text{SellersMayApply}] \le q_4$, which implies that Assumption 1 holds.
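The expected-quality figure in this illustration can be reproduced with a short calculation. The sketch below assumes the belief rule described in Section 3.2.3 (buyers average over untested sellers only, after discarding every quality level at or above that of the next more expensive tested product model) and uses the experimental design feature of three sellers per quality level. The function name `expected_quality` and its arguments are hypothetical.

```python
from fractions import Fraction


def expected_quality(untested_counts, cutoff_quality):
    """Mean quality over untested sellers whose quality level lies strictly below the cutoff.

    Assumes at least one untested seller remains below the cutoff.
    """
    kept = {q: n for q, n in untested_counts.items() if q < cutoff_quality}
    return Fraction(sum(q * n for q, n in kept.items()), sum(kept.values()))


SELLERS_PER_LEVEL = 3           # experimental design: three sellers per quality level
tested_qualities = [2, 4, 5]    # f4, f10, f13 are tested in the scenario

# Number of untested sellers at each quality level 1..5.
untested_counts = {
    q: SELLERS_PER_LEVEL - tested_qualities.count(q) for q in range(1, 6)
}

# The cheapest tested model that is more expensive than f7 is f10 (quality 4),
# because a non-dominated model of higher quality must carry a higher price.
e_q7 = expected_quality(untested_counts, cutoff_quality=4)
print(e_q7)  # 2
```

Exact fractions are used so that the comparison with $q_4 = 2$ does not depend on floating-point rounding.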
Assumption 2 
(Joint distribution of price, quality, and valuation of quality). Consider a hypothetical scenario in which seller $f_g$, seller $f_{F^c}^{\underline{q}}$ (if any), and all globally dominated sellers are untested; therefore, their qualities are known to buyers in expectation only. Furthermore, seller $f_t$ and all sellers in $\overline{ND}$ offering quality levels higher than $f_t$ (if any) are tested; therefore, their qualities are known to buyers. (All other sellers (if any) may or may not be tested; therefore, their qualities may or may not be known to buyers.) We assume price, quality, and buyers’ valuation of quality are distributed such that, given this scenario, $u_h(p_t, q_t, \theta_h) > E[u_h(p_g, q_g, \theta_h) \mid \text{SellersMayApply}]$.
To illustrate Assumption 2, consider once more seller $f_t = f_7$ in experimental market 1 (see Section 4.1 and Appendix A.10 for details). In this example, $f_g \in \{f_1, f_4\}$, and $f_{F^c}^{\underline{q}} = f_1$. Furthermore, buyer $b_h = b_3$ (with $\theta_3 = 7$) would select seller $f_7$ (with $q_7 = 3$ and $p_7 = 10.9$) under CompleteInformation. Thus, we check Assumption 2 for two different cases: $f_g = f_1$ (with $q_1 = 1$ and $p_1 = 2.9$) and $f_g = f_4$ (with $q_4 = 2$ and $p_4 = 5$). In the former case, $u_3(p_7, q_7, \theta_3) > E[u_3(p_1, q_1, \theta_3) \mid \text{SellersMayApply}]$. In the latter case, $u_3(p_7, q_7, \theta_3) > E[u_3(p_4, q_4, \theta_3) \mid \text{SellersMayApply}]$.$^{20}$ Combining both inequalities implies that Assumption 2 holds.
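As a hedged check of the arithmetic behind this illustration, the following worked calculation applies the belief rule of Section 3.2.3 to the scenario of Assumption 2. It rests on two assumptions that are not spelled out in the passage above: expected utility is linear in expected quality (as in the buyers' maximization condition), and $f_4$ is untested in the case $f_g = f_1$ (Assumption 2 leaves this open). With $f_7$, $f_{10}$, and $f_{13}$ tested, only quality levels 1 and 2 remain plausible for any cheaper untested model, and three untested sellers remain at each of these levels.

```latex
\begin{align*}
u_3(p_7, q_7, \theta_3) &= 7 \cdot 3 - 10.9 = 10.1, \\
E[q_1 \mid \text{SellersMayApply}] = E[q_4 \mid \text{SellersMayApply}]
  &= \frac{3 \cdot 1 + 3 \cdot 2}{6} = 1.5, \\
E[u_3(p_1, q_1, \theta_3) \mid \text{SellersMayApply}] &= 7 \cdot 1.5 - 2.9 = 7.6 < 10.1, \\
E[u_3(p_4, q_4, \theta_3) \mid \text{SellersMayApply}] &= 7 \cdot 1.5 - 5 = 5.5 < 10.1.
\end{align*}
```

If $f_4$ were tested in the case $f_g = f_1$, only quality level 1 would remain plausible for $f_1$, giving $7 \cdot 1 - 2.9 = 4.1 < 10.1$, so the inequality would hold in that variant as well.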
Both Assumptions 1 and 2 imply that we rule out “too left-skewed” distributions of quality. Note that we need them in steps 3 and 6 of Proposition 2’s proof. Furthermore, note that both assumptions hold for all our experimental markets in which quality is uniformly distributed, and in which $ND$ starts from the highest quality level and does not have any “gaps”.$^{21}$

3.2. Theoretical Results

Having established our theoretical framework in the previous subsection, we are now able to analyze the three different versions of the game. We start with an ideal world that contains CompleteInformation about product quality (see Section 3.2.1) and then examine two worlds with incomplete information about product quality that therefore require a certifier. Specifically, Section 3.2.2 examines SellersMayNotApply, any current product model selection mechanism. Section 3.2.3 examines SellersMayApply, our proposed product model selection mechanism.
Since sellers are passive in our benchmark case and in any current mechanism SellersMayNotApply, we analyze only buyer behavior. By contrast, since sellers are active in our proposed mechanism SellersMayApply, we analyze both buyer and seller behavior by applying backward induction. We calculate equilibrium payoffs for both buyers and sellers in all three versions of the game (Nash equilibria in the games under CompleteInformation and under SellersMayApply, Bayesian Nash equilibria under SellersMayNotApply).$^{22}$

3.2.1. Benchmark CompleteInformation

In this subsection, we present our benchmark case of an ideal world of CompleteInformation about product quality (see Section 3.1.3). Buyers maximize their utility according to Equation (4). Because they have complete information about product quality, Equation (4) simplifies to $u_h(p_t, q_t, \theta_h) = \theta_h q_t - p_t$. Hence, a buyer’s maximization condition is given by:
$$\arg\max_{f_t \in F \cup f_0} \; \theta_h q_t - p_t \tag{6}$$
with $f_0$ representing a non-existing seller with $q_0 = 0$ and $p_0 = 0$, denoting a buyer’s choice not to purchase a product model. Equation (6) implies that a buyer will choose the product model among all those yielding a non-negative utility which maximizes her utility. If all available product models yield a negative utility, she will refrain from buying. Therefore, buyer $b_h$ receives her maximum possible utility in equilibrium
$$u_h\left(p^{*h}_{F \cup f_0}, q^{*h}_{F \cup f_0}, \theta_h\right) = \theta_h q^{*h}_{F \cup f_0} - p^{*h}_{F \cup f_0} = \theta_h q^{*h}_{\overline{ND} \cup f_0} - p^{*h}_{\overline{ND} \cup f_0} \tag{7}$$
with $f^{*h}_{Z}$ denoting the seller who maximizes buyer $b_h$’s utility within the set $Z$, and with $q^{*h}_{Z}$ and $p^{*h}_{Z}$ denoting the respective quality and price. Note that a globally dominated product model cannot maximize buyer $b_h$’s utility under CompleteInformation since each dominating rival product model would generate a higher surplus for buyer $b_h$. Therefore,
$$f^{*h}_{F \cup f_0} = f^{*h}_{\overline{ND} \cup f_0}. \tag{8}$$
In addition, under CompleteInformation, buyer $b_h$’s utility is always non-negative in equilibrium. See Appendix A.7.3 for a formal definition of the corresponding market areas.
Equilibrium profits of globally dominated sellers equal zero. If seller $f_t$ offers a globally non-dominated product model, his equilibrium profit can be calculated as:
$$\pi_t(p_t, q_t) = d(p_t, q_t)\,\bigl(p_t - c(q_t)\bigr) = \sum_{h=1}^{s} \mathbb{1}_{\left\{f_t = f^{*h}_{\overline{ND} \cup f_0}\right\}} \bigl(p_t - c(q_t)\bigr). \tag{9}$$
The indicator in the above equation equals one whenever seller $f_t$’s product model maximizes buyer $b_h$’s utility among all sellers who would sell at least one product model under CompleteInformation and the option of not buying.
The following definition summarizes aggregate consumer surplus and seller profits.
Definition 3 
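(Aggregate consumer surplus and seller profits under CompleteInformation). Aggregate consumer surplus equals
$$\sum_{h=1}^{s} \left( \theta_h q^{*h}_{\overline{ND} \cup f_0} - p^{*h}_{\overline{ND} \cup f_0} \right). \tag{10}$$
Globally dominated seller profits equal zero. Aggregate profits of globally non-dominated sellers equal
$$\sum_{t=1}^{m} \sum_{h=1}^{s} \mathbb{1}_{\left\{f_t = f^{*h}_{\overline{ND} \cup f_0}\right\}} \bigl(p_t - c(q_t)\bigr). \tag{11}$$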
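To make the benchmark concrete, the following sketch evaluates the buyer's maximization condition and the aggregates of Definition 3 for a small market. It borrows the experimental parameters reported in Section 4.1 (eight buyers with $\theta_h \in \{3, 7, 11, 15\}$ and $c(q_t) = q_t^2$) and the three prices quoted for experimental market 1 ($p_1 = 2.9$, $p_4 = 5$, $p_7 = 10.9$). The prices of `f10` and `f13` are hypothetical placeholders, and globally dominated sellers are omitted because they can never maximize a buyer's utility under CompleteInformation. The function name `best_choice` is invented for this sketch.

```python
THETAS = [3, 3, 7, 7, 11, 11, 15, 15]  # two buyers per valuation, as in the experiment


def cost(q):
    """Quadratic unit cost of production, c(q) = q^2."""
    return q ** 2


# (name, quality, price) of globally non-dominated sellers.
SELLERS = [
    ("f1", 1, 2.9),
    ("f4", 2, 5.0),
    ("f7", 3, 10.9),
    ("f10", 4, 18.5),  # hypothetical price, not taken from the source
    ("f13", 5, 31.0),  # hypothetical price, not taken from the source
]


def best_choice(theta, sellers):
    """Return the utility-maximizing seller and utility; f0 is the option of not buying."""
    best, best_u = ("f0", 0, 0.0), 0.0
    for name, q, p in sellers:
        u = theta * q - p
        if u > best_u:
            best, best_u = (name, q, p), u
    return best, best_u


choices = [best_choice(theta, SELLERS) for theta in THETAS]
chosen_qualities = [seller[1] for seller, _ in choices]

# Aggregate consumer surplus: sum of the buyers' maximal utilities.
consumer_surplus = sum(u for _, u in choices)

# Seller profits: margin times the number of buyers choosing that seller.
profits = {name: 0.0 for name, _, _ in SELLERS}
for (name, q, p), _ in choices:
    if name != "f0":
        profits[name] += p - cost(q)

print(chosen_qualities)  # [2, 2, 3, 3, 4, 4, 5, 5]
```

With these placeholder prices, no buyer selects quality 1, which matches the design property stated in Section 4.1; the surplus and profit levels themselves depend on the hypothetical prices and should not be read as results of the paper.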

3.2.2. Any Current Selection Mechanism SellersMayNotApply

In this subsection, we analyze a world with incomplete information about product quality where a certifier perfectly reveals information about a subset of product models under any current mechanism SellersMayNotApply (see Section 3.1.3). Buyers maximize their utility according to Equation (4). Hence, buyer $b_h$’s maximization condition is given by:
$$\arg\max_{f_t \in F \cup f_0} \; \theta_h E[q_t \mid \text{SellersMayNotApply}] - p_t = \arg\max_{f_t \in F \cup f_0} \left\{ \max_{f_t \in K \cup f_0} \theta_h q_t - p_t, \; \max_{f_t \in \{F \setminus K\} \cup f_0} \theta_h E[q_t \mid \text{SellersMayNotApply}] - p_t \right\}. \tag{12}$$
This maximization condition implies that, among all product models yielding a non-negative expected utility, a buyer will choose either the optimal tested one, or the optimal untested one, depending on which of these two yields the higher expected utility. If all available product models yield a negative expected utility, a buyer will refrain from buying.
Regarding the expected quality of untested product models, we assume in Section 3.1.1 that buyers take the information from the certifier into account. Under any current mechanism SellersMayNotApply, this implies that buyers calculate the expected quality of untested product models by mentally removing the set of tested product models $K$ from the complete set of sellers $F$; i.e., they calculate $E[q_t \mid \text{SellersMayNotApply}]$ based on the set of untested sellers $F \setminus K$. Note that $E[q_t \mid \text{SellersMayNotApply}]$ is the same for all untested product models. Since $\theta_h$ is given, it follows that price is the only decision parameter when choosing among all untested product models. Therefore, the cheapest untested product model maximizes the expected utility among all untested product models for all buyers.
Buyer and seller equilibrium payoffs vary, of course, depending on which product models are selected for testing. The respective equilibrium payoffs, given a certain set of tested product models $K$, can be found in Appendix A.7.4. Below, we compare aggregate buyer surplus and aggregate seller profits under SellersMayNotApply to those under CompleteInformation (see Definition 3) while considering all possible scenarios of which product models are selected for testing.
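The buyer's decision rule under SellersMayNotApply can be illustrated with a toy market. The sketch below is an assumption-laden example rather than one of the paper's markets: the six sellers, their prices, and the tested set are all hypothetical, and the helper names `choose` and `realized_utility` are invented. It compares the best tested model, the cheapest untested model (valued at the mean quality of all untested models), and the option of not buying.

```python
# Hypothetical six-seller market: (name, quality, price). Values are illustrative only.
SELLERS = [
    ("A", 1, 2.9), ("B", 2, 5.0), ("C", 3, 10.9),
    ("D", 1, 6.0), ("E", 2, 12.0), ("F", 3, 14.0),
]
TESTED = {"D", "F"}  # hypothetical test set K, chosen without seller applications


def choose(theta, sellers, tested):
    """Return (chosen seller, expected utility, expected quality of untested models)."""
    untested = [s for s in sellers if s[0] not in tested]
    options = [("f0", 0.0)]  # not buying yields zero utility
    options += [(n, theta * q - p) for n, q, p in sellers if n in tested]
    exp_q = None
    if untested:
        # Buyers average over the untested set only, i.e., F without K.
        exp_q = sum(q for _, q, _ in untested) / len(untested)
        # All untested models share the same expected quality, so only price matters.
        name, _, price = min(untested, key=lambda s: s[2])
        options.append((name, theta * exp_q - price))
    best = max(options, key=lambda o: o[1])
    return best[0], best[1], exp_q


def realized_utility(theta, name, sellers):
    """Utility actually obtained once the true quality is experienced."""
    if name == "f0":
        return 0.0
    q, p = next((q, p) for n, q, p in sellers if n == name)
    return theta * q - p


theta = 7
chosen, expected_u, exp_q = choose(theta, SELLERS, TESTED)
realized = realized_utility(theta, chosen, SELLERS)
best_complete = max(theta * q - p for _, q, p in SELLERS)
print(chosen, exp_q, round(realized, 1), round(best_complete, 1))  # A 2.0 4.1 10.1
```

In this hypothetical market the buyer with $\theta_h = 7$ picks the cheapest untested model and ends up with a lower realized utility than under complete information, which is the kind of suboptimal choice that Proposition 1 formalizes.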
Proposition 1 
(Consumer surplus and seller profits under any current mechanism SellersMayNotApply). Any current mechanism SellersMayNotApply always leads to a lower consumer surplus (and higher (lower) profits of globally dominated (globally non-dominated) sellers) compared to a world of CompleteInformation except for two scenarios. In only two rare scenarios, any current mechanism SellersMayNotApply leads to the same consumer surplus (and same seller profits) as under CompleteInformation. These two scenarios are as follows:
(i) In the first scenario, all product models which would be sold under CompleteInformation are selected for testing. In other words, $\overline{ND} \subseteq K$.
(ii) In the second scenario, all product models which would be sold under CompleteInformation except for the overall cheapest one are selected for testing. In other words, $f_{F^c} \in \overline{ND}$, $f_{F^c} \notin K$, and $\overline{ND} \setminus f_{F^c} \subseteq K$. Moreover, the overall cheapest product model is selected by every buyer who would have selected it under CompleteInformation. In other words, $\forall\, b_l \in \{b_1, \dots, b_s\}$ with $f_{F^c} = \arg\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l)$, the following holds: $f_{F^c} = \arg\max_{f_k \in \{F \setminus K\}^c \cup K \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}]$.
Proposition 1 implies that any current mechanism SellersMayNotApply always leads to a lower consumer surplus compared to a world of complete information unless all product models which would be sold under complete information (or all but the overall cheapest one) happen to be selected for testing. The proof consists of two steps. In the first step, we show that buyers select suboptimal product models in most scenarios of which product models are selected for testing which leads to a lower consumer surplus. In the second step, we show that buyers select their complete-information-optimal product models only in the two exception scenarios which leads to the optimal consumer surplus. The detailed Proof of Proposition 1 can be found in Appendix A.7.5.

3.2.3. New Selection Mechanism SellersMayApply

In this subsection, we analyze another world with incomplete information about product quality where a certifier perfectly reveals information about a subset of product models but with our new mechanism SellersMayApply (see Section 3.1.3). As in the previous subsection, buyers maximize their utility according to Equation (4), and buyer $b_h$’s maximization condition is given by Equation (12).
In contrast to the previous subsection, however, buyers can infer more from the certifier’s information since they know that sellers have the option to apply for testing under our new mechanism SellersMayApply, while this option does not exist under any current mechanism SellersMayNotApply. Regarding untested product models, buyers calculate their expected quality by mentally removing not only the set of tested product models $K$ from the complete set of sellers $F$ but also all quality levels larger than or equal to the quality of the next most expensive tested product model. In other words, buyers calculate $E[q_t \mid \text{SellersMayApply}]$ based on the set of untested sellers $F \setminus K$ while considering only the quality levels
$$q_t \in Q \setminus \left\{ q_i \;\middle|\; \exists f_k \in K \text{ with } p_k > p_t \wedge q_k \le q_i \right\}$$
due to the following reasoning.
Rational buyers know that, in equilibrium, no seller will apply to be tested stating a false quality (see step 1 in the Proof of Proposition 2). Buyers can also conclude that, in equilibrium, any tested seller must be globally non-dominated (see step 2 in the Proof of Proposition 2). It follows that the quality of an untested product model must be lower than that of any tested product model offered at a higher price. Otherwise, i.e., if the quality of an untested product model was higher than or equal to the quality of any tested product model offered at a higher price, this cheaper untested product model would dominate the tested one. However, all tested products must be globally non-dominated (see step 2 in the Proof of Proposition 2) which would not be the case if the untested product model dominated any of the tested ones.
Note that $E[q_t \mid \text{SellersMayApply}]$ may differ among untested product models. However, it is identical for all product models within a certain price range$^{23}$ between tested product models. Since $\theta_h$ is given, it follows that price is the only decision parameter when choosing among all untested product models within this price range. Therefore, the cheapest untested product model maximizes the expected utility among all untested product models within a certain price range for all buyers.
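The price-range property can be seen by listing, for several untested prices, which quality levels buyers still consider plausible. The sketch below assumes a hypothetical tested set consisting of $f_4$ ($p_4 = 5$, $q_4 = 2$) and $f_7$ ($p_7 = 10.9$, $q_7 = 3$) from experimental market 1; the untested prices are invented, and the function name `plausible_qualities` is not from the source.

```python
QUALITY_LEVELS = [1, 2, 3, 4, 5]


def plausible_qualities(price, tested):
    """Quality levels an untested model at this price may still have.

    tested is a list of (price, quality) pairs. Every level at or above the
    quality of some more expensive tested model is discarded; the lowest such
    quality is therefore the binding cutoff.
    """
    dearer = [q for p, q in tested if p > price]
    if not dearer:
        return list(QUALITY_LEVELS)
    cutoff = min(dearer)
    return [q for q in QUALITY_LEVELS if q < cutoff]


tested = [(5.0, 2), (10.9, 3)]  # hypothetical tested set: f4 and f7

ranges = {price: plausible_qualities(price, tested)
          for price in (3.0, 4.0, 7.0, 9.0, 12.0)}  # hypothetical untested prices
for price, levels in ranges.items():
    print(price, levels)
# 3.0 [1]
# 4.0 [1]
# 7.0 [1, 2]
# 9.0 [1, 2]
# 12.0 [1, 2, 3, 4, 5]
```

Untested models priced between the same pair of tested models face the same set of plausible quality levels and hence the same expected quality, so within such a range only price distinguishes them.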
Sellers maximize their profits according to Equation (3). Since quality and price are set, sellers make decisions regarding only the product test. Unlike under any current mechanism SellersMayNotApply where we need to consider all possible scenarios of which product models are selected for testing, there is a unique Nash equilibrium under our new mechanism SellersMayApply which we describe in the following Proposition 2.
Proposition 2 
(Unique Nash equilibrium, consumer surplus and seller profits under our new mechanism SellersMayApply). A unique Nash equilibrium
$$\left( \text{apply with } q_1, \dots, \text{apply with } q_{(\#\overline{ND})}, \text{do not apply}, \dots, \text{do not apply}, \text{buy product model of seller } f_{\tilde{F}_1}, \dots, \text{buy product model of seller } f_{\tilde{F}_s} \right)^{T}$$
exists, with $f_{\tilde{F}_1}, \dots, f_{\tilde{F}_s} \in \overline{ND}$.
In equilibrium, all sellers (or all but the overall cheapest one)$^{24}$ of product models that buyers would have bought under CompleteInformation apply to be tested stating their true quality, and are tested. All other sellers do not apply to be tested. All buyers select the product model they would have selected under CompleteInformation. Consumer surplus and seller profits are identical to those occurring under CompleteInformation, except that profits of sellers who applied for testing are reduced by the application_costs.
Proposition 2 implies that our new mechanism SellersMayApply always creates the maximum possible consumer surplus, equivalent to a world of complete information. The proof consists of six steps in which we sequentially eliminate strategy profiles either because they contain strictly dominated strategies, or because they cannot be a Nash equilibrium due to other reasons. In step 1, we show that all strategy profiles can be eliminated in which at least one seller applies to be tested stating a false quality. In step 2, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which seller $f_{F^c}^{\underline{q}}$ or at least one globally dominated seller applies to be tested stating his true quality. In step 3, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one seller from $\overline{ND} \setminus f_{F^c}$ does not apply for testing. In step 4, we show that, given all remaining strategy profiles, seller $f_{F^c} \neq f_{F^c}^{\underline{q}}$ may apply to be tested stating his true quality, or he may not apply to be tested, depending on the market parameters. In step 5, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one of the sellers from $ND \setminus \{\overline{ND} \cup f_{F^c}\}$ (if any) applies to be tested stating his true quality. In step 6, we show that, in equilibrium, all buyers select the same product model as under CompleteInformation. The detailed Proof can be found in Appendix A.7.5.

3.2.4. Comparing Consumer Surplus Resulting from Any Current and Our New Mechanism

We end this section with a comparison of the consumer surplus generated from the two mechanisms.
Proposition 3 
(Comparing consumer surplus resulting from any current mechanism SellersMayNotApply and from our new mechanism SellersMayApply). SellersMayApply outperforms SellersMayNotApply by leading to the optimal, higher consumer surplus in all possible scenarios but two. In the two exceptions stated in Proposition 1, both SellersMayApply and SellersMayNotApply lead to the same optimal consumer surplus.
Proposition 3 implies that our new mechanism SellersMayApply weakly dominates any current mechanism SellersMayNotApply.
Proof of Proposition 3. 
(Comparing consumer surplus resulting from any current mechanism SellersMayNotApply and from our new mechanism SellersMayApply). According to Proposition 1, any current mechanism SellersMayNotApply leads to a lower consumer surplus than a world of CompleteInformation for all but two rare scenarios of which product models are selected for testing. In those two exception scenarios, it leads to the same consumer surplus as a world of CompleteInformation. According to Proposition 2, our new mechanism SellersMayApply always leads to the optimal consumer surplus of a world of CompleteInformation. It follows that, for all but two possible scenarios of which product models are selected for testing under any current mechanism, our new mechanism SellersMayApply outperforms any current mechanism SellersMayNotApply by leading to a higher consumer surplus. In the two exception scenarios stated in Proposition 1, both mechanisms lead to the same, optimal consumer surplus. □

4. Experiment

4.1. Experimental Design and Hypotheses

Based on the certification game introduced in the previous section, we design a laboratory experiment to test our theoretical predictions and ascertain the extent to which these predictions are observed with human decision makers. In particular, unraveling has been shown to decrease in more complex experimental settings (see Hagenbach & Perez-Richet, 2018; Jin et al., 2022). Recall that our mechanism includes two product model dimensions (quality and price) as well as the option for sellers to state a false quality when applying to be tested. We design four experimental treatments. The first two represent two scenarios of any current mechanism SellersMayNotApply in which sellers cannot directly influence whether their product model is tested, while the latter two represent two versions of our new mechanism SellersMayApply (see Section 3). Note that we do not include an experimental treatment for our benchmark version of the game CompleteInformation since it is comparatively simple, and since there is no real-world equivalent.
SellersMayNotApply-WorstCase 
To model a scenario under any current selection mechanism in which the market functions extremely poorly, we design a worst-case scenario regarding the set of tested product models. In this worst-case scenario, the tested product models are the ones vertically furthest away from the globally non-dominated ones.$^{25}$
SellersMayNotApply-Random 
We also design an intermediate scenario under any current mechanism where the set of tested product models is chosen randomly among all available ones. In this random scenario, the share of globally non-dominated, CompleteInformation-optimal product models among all tested product models is almost identical to the share of globally non-dominated product models in all markets.$^{26}$ We include this treatment to investigate whether our new mechanism outperforms chance.
SellersMayApply-LyingPoss(ible) 
This treatment represents the version of our new mechanism where sellers may apply for testing and may provide a true or a false quality. While the option of providing a false quality does not change the equilibrium predictions (see Proposition 2), it makes the SellersMayApply mechanism more complex. Therefore, we consider it important to investigate this treatment in the lab.
SellersMayApply-Truth 
This treatment represents another version of our new mechanism where sellers may apply for testing and are not allowed to provide a false quality.
In our experiment, we use a between-subject design; i.e., per session, we conduct one treatment. Each session consists of twelve rounds/markets with different quality–price combinations. At the end of a session, one of the twelve rounds is chosen randomly for payment (4 ECU = 1 EUR). In each session, we include 15 sellers ($n = 15$), 8 buyers ($s = 8$), and one certifier, which is implemented as a computer algorithm rather than a participant. Player roles are assigned randomly at the beginning of a session and remain constant afterwards. The certifier selects at most five product models to be tested ($k = 5$).
During the experiment, sellers are assigned product models at one of five different quality levels, $q_t \in \{1, 2, 3, 4, 5\}$. This range allows for a balance between experimental simplicity and the ability to distinguish seller behavior across product model quality levels.$^{27}$ Furthermore, we choose one of the simplest possible unit cost functions fulfilling $c'(q_t) > 0$ and $c''(q_t) > 0$, namely quadratic unit costs of production, i.e., $c(q_t) = q_t^2$. In our experiment, buyers are assigned across four quality valuations, $\theta_h \in \{3, 7, 11, 15\}$, with two subjects per $\theta_h$. While buyers know that there are three sellers per quality level, they do not know if price is related to quality. They learn a product model’s quality only if the certifier has revealed this or if they have purchased the product model. If applicable, sellers incur application_costs of 0.5 ECU and a punishment_fee of 24 ECU if a false quality statement is detected.
To ensure that our experiment yields only positive total payoffs, each subject receives an initial endowment of 100 ECU. Sellers earn 0.5 ECU per correct answer when we ask them their beliefs about other sellers’ behavior. For a variation of SellersMayApply-Truth, we ask sellers for their beliefs about both seller and buyer behavior.
Since our experiment is conducted in Germany, where Stiftung Warentest mainly selects bestselling product models for testing, we use a bestsellers frame for the tested product models under SellersMayNotApply. Therefore, in all treatments, participants are informed in the instructions that they should act on the assumption that there are five bestselling product models per round. These product models are chosen for testing in the two SellersMayNotApply treatments (see Appendix A.9 and Appendix A.10 for details).
Among the 12 different markets, there are four different types: markets with 5, 4, 3, or 2 globally non-dominated product models (markets 1–3, 4–6, 7–9, and 10–12, respectively; see Appendix A.9 and Appendix A.10 for details). Within a session, one type of market is played for 3 rounds in the same random order to exclude learning about market types. Moreover, all 12 markets differ in their quality–price combinations, which are assigned exogenously. For all globally non-dominated product models in each market, prices are slightly higher than marginal costs. In addition, the price–quality combinations of these product models are chosen such that, under CompleteInformation, two buyers with the same $\theta_h$ would either buy the globally non-dominated product model with optimal quality if available ($\theta_1 = \theta_2 = 3 \Rightarrow q^* = 2$, $\theta_3 = \theta_4 = 7 \Rightarrow q^* = 3$, $\theta_5 = \theta_6 = 11 \Rightarrow q^* = 4$, $\theta_7 = \theta_8 = 15 \Rightarrow q^* = 5$), or refrain from buying otherwise. Thus, we ensure that, under CompleteInformation, no buyer would choose a product model with $q = 1$, which corresponds to Stiftung Warentest’s “poor” rating. This rating is given when a product model is considered unacceptable for all, as when it does not suit its claimed purpose and/or entails unacceptable risks such as high toxic material levels. Again to prevent learning across rounds, we choose quality–price combinations for globally dominated product models such that there are three product models per quality level and the correlation between quality and price is below 0.01.
We base our hypotheses on our theoretical results from Section 3.2. In particular, H1, H2, and H3 are consequences of Proposition 2, while H4 is a consequence of Proposition 3.
H1. 
Seller behavior. Under our new mechanisms SellersMayApply-LyingPoss and SellersMayApply-Truth:
globally dominated sellers will not apply to be tested;
with the exception of $q_t = 1$, globally non-dominated sellers will apply to be tested and will, if applicable, state their true quality.
H2. 
Content of the product test. The product test will contain the following:
no information on globally non-dominated product models under SellersMayNotApply-WorstCase,
information on 21.7% of globally non-dominated product models under SellersMayNotApply-Random, and
information on all globally non-dominated product models under both SellersMayApply-Truth and SellersMayApply-LyingPoss.
H3. 
Buyer behavior. Buyers will choose the least number of globally non-dominated product models under SellersMayNotApply-WorstCase, more under SellersMayNotApply-Random, and only globally non-dominated product models under both SellersMayApply-Truth and SellersMayApply-LyingPoss.
H4. 
Surplus and profits. Per capita, the following will hold:
consumer surplus as well as globally non-dominated seller profits are lowest under SellersMayNotApply-WorstCase, higher under SellersMayNotApply-Random, and highest under both SellersMayApply-Truth and SellersMayApply-LyingPoss.
globally dominated seller profits are highest under SellersMayNotApply-WorstCase, lower under SellersMayNotApply-Random, and lowest under both SellersMayApply-Truth and SellersMayApply-LyingPoss.
Our experiment comprised 25 sessions and was conducted between January 2017 and December 2018 at the Essen Laboratory for Experimental Economics (elfe), Germany. We conducted five sessions per treatment, with the exception of SellersMayApply-Truth, where we conducted an additional five sessions in which sellers were asked for their beliefs about buyer behavior. In total, 575 subjects participated in the experiment. On average, a session lasted two hours, and a subject earned 27.39 EUR. More details on the number of participants are displayed in Table 2. Participants were invited to participate in the experiment using ORSEE (Greiner, 2015). The experiment was programmed and conducted with z-Tree (Fischbacher, 2007). A translated version of the instructions can be found in Appendix A.11. Translated screenshots of the main decision situations in z-Tree can be found in Appendix A.12.

4.2. Experimental Results

We prepare and analyze the data with R 4.5.1 (R Core Team, 2023), including the R package zTree (Kirchkamp, 2019). Unless stated otherwise, we report the results of two-sided Mann–Whitney U tests for all treatment comparisons, conservatively counting one experimental session as one independent observation. Since we did not find any significant differences between SellersMayApply-Truth with and without asking for beliefs about buyer behavior, we pool these data in our subsequent analyses.
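Counting one session as one observation limits how small a p-value the treatment comparisons can produce. The sketch below computes the smallest attainable two-sided p-value of an exact Mann–Whitney U test without ties, which occurs when the two samples are completely separated. Whether the reported tests use the exact distribution or a normal approximation is not stated in this section, so the exact version is an assumption; the function name is hypothetical.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest two-sided p-value of an exact Mann-Whitney U test without ties.

    Under the null hypothesis every assignment of ranks to the first sample is
    equally likely, so complete separation in one direction has probability
    1 / C(n1 + n2, n1); the two-sided p-value doubles this.
    """
    return 2 / comb(n1 + n2, n1)


# Five sessions per treatment; the pooled SellersMayApply-Truth data comprise ten sessions.
p_5_vs_5 = min_two_sided_exact_p(5, 5)
p_5_vs_10 = min_two_sided_exact_p(5, 10)
print(round(p_5_vs_5, 4), round(p_5_vs_10, 5))  # 0.0079 0.00067
```

Under these assumptions, even perfectly separated session-level outcomes cannot yield a p-value below roughly 0.008 in a five-versus-five comparison, which is why this way of counting observations is described as conservative.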

4.2.1. Seller Behavior

Figure 3 depicts the share of sellers who do and do not apply to be tested, split by sellers with globally dominated versus globally non-dominated product models.$^{28}$ In line with our first hypothesis H1, Figure 3 shows that most globally dominated sellers do not apply to be tested (70.9% under SellersMayApply-LyingPoss, 83.1% under SellersMayApply-Truth). Also in line with H1, we see from Figure 3 that most globally non-dominated sellers do apply to be tested (70.5% under SellersMayApply-LyingPoss, 82.1% under SellersMayApply-Truth). Moreover, under SellersMayApply-LyingPoss, we see that only 7.4% (2.9%) of globally dominated (globally non-dominated) sellers apply stating a false quality, which is marginally significantly (not significantly) different from zero (sign test, p-values 0.06 and 0.12, respectively). Interestingly, we further see from Figure 3 that more globally dominated sellers apply to be tested under SellersMayApply-LyingPoss compared to SellersMayApply-Truth, but fewer globally non-dominated sellers do so. Thus, we see that SellersMayApply-LyingPoss reflects a greater degree of out-of-equilibrium behavior. We summarize our first main result as follows.
Result 1 
In line with H1, under SellersMayApply-LyingPoss and SellersMayApply-Truth:
most globally dominated sellers do not apply to be tested.
most globally non-dominated sellers do apply to be tested and, if applicable, state their true quality.
However, not in line with H1, there is more out-of-equilibrium behavior under SellersMayApply-LyingPoss than under SellersMayApply-Truth.
Result 1 is consistent with previous experimental findings (see, for example, Benndorf et al., 2015) which show that participants behave largely but not completely in line with theoretical predictions.
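The sign-test p-values reported in this subsection can be put in context with a small calculation. The sketch below implements a two-sided exact sign test against zero; the function name is hypothetical, and the session counts passed to it are illustrative inputs rather than data from the experiment.

```python
from math import comb


def sign_test_two_sided(n_positive, n_negative):
    """Two-sided exact sign test; observations equal to zero are assumed to be dropped."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_five_agree = sign_test_two_sided(5, 0)  # all five sessions differ from zero in one direction
p_four_agree = sign_test_two_sided(4, 0)  # only four non-zero sessions, all in one direction
print(p_five_agree, p_four_agree)  # 0.0625 0.125
```

With five session-level observations, 0.0625 is the smallest p-value this test can return, which is consistent with the reported marginal significance at 0.06. A value of 0.125 arises when only four sessions are non-zero; whether that explains the reported 0.12 is not stated in the text and is offered here only as a possibility.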
The finding that SellersMayApply-LyingPoss elicits greater out-of-equilibrium behavior means that the number of sellers with globally non-dominated product models who apply to be tested will impact the number that are tested. It further implies that if sellers with globally dominated product models apply stating false qualities to the extent that they create testing congestion, globally non-dominated product models may end up being squeezed out of the testing pool. The next subsection presents the degree to which this actually happens.

4.2.2. Content of the Product Test

Figure 4 shows the respective shares of globally dominated and non-dominated product models in the product test pool. In line with our second hypothesis H2, the product test yields the least globally non-dominated product model information under SellersMayNotApply-WorstCase (0%), more under SellersMayNotApply-Random (21.7%), more still under SellersMayApply-LyingPoss (79%), and most under SellersMayApply-Truth (96.1%). Note that the shares under SellersMayNotApply-Random and SellersMayNotApply-WorstCase are determined by design. In particular, under SellersMayNotApply-WorstCase, since the exogenously assigned bestsellers are all globally dominated, no globally non-dominated product models are tested. Also note that, by design, 21.7% of the product models tested under SellersMayNotApply-Random are globally non-dominated, as this approximately reflects the share of globally non-dominated product models in all markets (23.3%).
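The two design shares can be recomputed from the market structure described in Section 4.1. In the sketch below, the count of 42 globally non-dominated product models follows directly from the four market types; the figure of 13 globally non-dominated models among the 60 tested ones is an inference from the reported 21.7% and is not stated in the text. Variable names are hypothetical.

```python
MARKETS_PER_TYPE = 3
ND_PER_TYPE = [5, 4, 3, 2]   # globally non-dominated models in each market type
SELLERS_PER_MARKET = 15
TESTED_PER_MARKET = 5

n_markets = MARKETS_PER_TYPE * len(ND_PER_TYPE)   # 12 markets
nd_total = MARKETS_PER_TYPE * sum(ND_PER_TYPE)    # 42 non-dominated models overall
models_total = n_markets * SELLERS_PER_MARKET     # 180 models overall
share_nd_all = nd_total / models_total

tested_total = n_markets * TESTED_PER_MARKET      # 60 tested models
# Inferred, not reported: the integer count that matches the stated 21.7%.
implied_nd_tested = round(0.217 * tested_total)
share_nd_tested = implied_nd_tested / tested_total

print(f"{share_nd_all:.1%} {share_nd_tested:.1%}")  # 23.3% 21.7%
```

Both percentages match the values given in the text, which supports reading 21.7% as the share of globally non-dominated product models within the test pool.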
Figure 4 further shows that the product test pool contains fewer globally non-dominated product models under SellersMayApply-LyingPoss than under SellersMayApply-Truth (difference: 17.1 percentage points), but more than under SellersMayNotApply-Random (difference: 57.3 percentage points). Thus, we find that although out-of-equilibrium behavior under SellersMayApply-LyingPoss decreases the share of globally non-dominated product models in the test, this share remains higher than that under SellersMayNotApply-Random. We summarize our second main result as follows.
Result 2 
In line with H2, the product test provides the least information on globally non-dominated product models under SellersMayNotApply-WorstCase, more under SellersMayNotApply-Random, and more still under SellersMayApply-LyingPoss. However, not in line with H2, the product test provides even more information on globally non-dominated product models under SellersMayApply-Truth.

4.2.3. Buyer Behavior

We next examine the relation between buyer information and behavior under the different mechanisms. Figure 5 shows that, consistent with H3, buyers choose the fewest globally non-dominated product models under SellersMayNotApply-WorstCase (48.1%), more under SellersMayNotApply-Random (65.4%), more still under SellersMayApply-LyingPoss (82.1%), and the most under SellersMayApply-Truth (88.2%). Note that the relatively large shares of buyers choosing globally non-dominated product models under SellersMayNotApply-Random and SellersMayNotApply-WorstCase can be explained by the large share of buyers who choose the cheapest product model, which is always globally non-dominated (53.5% of all buyers who choose globally non-dominated product models under SellersMayNotApply-Random, 90.9% under SellersMayNotApply-WorstCase). Figure 5 further shows that buyers choose fewer globally non-dominated product models under SellersMayApply-LyingPoss (82.1%) compared to SellersMayApply-Truth, but more than under SellersMayNotApply-Random. Figure 5 also shows a positive share of non-buyers for each treatment, but no significant difference across treatments. We do find that buyers choose the highest number of globally dominated product models under SellersMayNotApply-WorstCase (35.2%), fewer under SellersMayNotApply-Random (24.2%), and the fewest under SellersMayApply-LyingPoss and SellersMayApply-Truth (5% and 3.8%, respectively). Thus, we summarize our third main result as follows.
Result 3 
In line with H3, buyers choose the fewest globally non-dominated product models under SellersMayNotApply-WorstCase, more under SellersMayNotApply-Random, and more still under SellersMayApply-LyingPoss. However, not in line with H3, buyers choose an even greater number of globally non-dominated product models under SellersMayApply-Truth.

4.2.4. Surplus and Profits

In our final set of experimental results, we examine participant payoffs across different treatments. From Figure 6, we see that, consistent with H4, per capita consumer surplus is lowest under SellersMayNotApply-WorstCase (5.7 ECU), higher under SellersMayNotApply-Random (12.4 ECU), and highest under SellersMayApply-LyingPoss and SellersMayApply-Truth (17 ECU and 17.8 ECU, respectively). Figure 6 further shows that globally non-dominated seller profits are lowest under SellersMayNotApply-WorstCase (2.3 ECU), higher under SellersMayNotApply-Random (5.1 ECU), and highest under SellersMayApply-Truth (8.3 ECU). Figure 6 also shows that globally non-dominated seller profits are lower under SellersMayApply-LyingPoss (7.2 ECU) compared to SellersMayApply-Truth, but still higher than under SellersMayNotApply-Random. Again in line with H4, Figure 6 shows that globally dominated seller profits are highest under SellersMayNotApply-WorstCase (5.4 ECU), lower under SellersMayNotApply-Random (3 ECU), and lowest under SellersMayApply-LyingPoss and SellersMayApply-Truth (0.3 ECU in both treatments). Thus, we summarize our fourth main result as follows.
Result 4 
In line with H4, per capita
  • consumer surplus is lowest under SellersMayNotApply-WorstCase, higher under SellersMayNotApply-Random, and highest under SellersMayApply-LyingPoss and SellersMayApply-Truth;
  • globally non-dominated seller profits are lowest under SellersMayNotApply-WorstCase, higher under SellersMayNotApply-Random, and highest under SellersMayApply-Truth and SellersMayApply-LyingPoss;
  • globally dominated seller profits are highest under SellersMayNotApply-WorstCase, lower under SellersMayNotApply-Random, and lowest under SellersMayApply-Truth and SellersMayApply-LyingPoss.

5. Discussion and Conclusions

In this study, we propose a novel mechanism to solve a certifier’s set selection problem given its limited testing capacity in order to provide more valuable information to consumers. Our mechanism relies on the unraveling prediction as it allows sellers to indicate a product model’s quality when applying for testing. We first develop a certification game to derive testable predictions for different mechanisms to select product models for testing. We show theoretically that a unique Nash equilibrium exists in which our proposed mechanism yields optimal buyer information equivalent to a world of complete information. We also show that any current mechanism always leads to a lower consumer surplus unless all product models which would be sold under complete information (or all but the overall cheapest one) happen to be selected for testing. Therefore, our mechanism weakly dominates any current mechanism.
We then use an experimental setting to test the predictions derived from our game. The results of our experiment show that, under our new mechanism, most sellers with globally non-dominated product models apply for testing, suggesting that information unraveling is sufficient for increasing the information provided on globally non-dominated product models. Buyers benefit from the superior information as they buy more globally non-dominated product models. Thus, our experimental results confirm that our mechanism increases consumer surplus compared to current mechanisms with limited testing capacities. Our results further show that globally non-dominated seller profits increase while those of globally dominated sellers decrease under our mechanism. Our experimental results are consistent with those of previous studies that find that unraveling does occur, yet usually to an incomplete degree, in complex settings (see Hagenbach & Perez-Richet, 2018; Jin et al., 2021, 2022). Furthermore, we show that even in our two-dimensional context where price does not necessarily equal quality, and where false quality statements are allowed, our mechanism outperforms any current mechanism. Importantly, our mechanism can be applied to any certifier with limited testing capacities.
We acknowledge the following limitations of our theoretical framework. First, while all product tests are based on a partly arbitrary set of weights in order to create a unidimensional overall quality measure, we acknowledge that our mechanism would be particularly sensitive to the set of weights used as it might determine which product models are globally dominated or non-dominated. However, test results published by Consumer Reports show that more than half of all tested product models are globally dominated on all quality sub-dimensions (Hjorth-Andersen, 1984).29 This implies that more than half of these tested product models are, in fact, insensitive to the set of weights; i.e., they remain globally dominated no matter which set of weights is used. In addition, our mechanism could be extended to include more than one set of weights, which would be an interesting research question. For example, a certifier could issue different calls for applications based on different sets of weights if it identifies separate groups of buyers with different preferences. Thus, a certifier could provide customized information for each group of buyers. Note that, irrespective of whether a certifier wishes to include one or more sets of weights, we consider it important that a certifier announces precisely how it will measure product quality. In particular, sellers need to know which quality characteristics the certifier includes, and how these are weighted to calculate an overall, unidimensional quality measure.
Second, we limit our analysis to the short term. More specifically, in situations in which quality and price have already been set and remain constant, we show that buyers are able to select more globally non-dominated product models under our new mechanism. However, we acknowledge that globally non-dominated sellers may have an incentive to increase their price after the product test if the game allowed for a longer time horizon. Yet, this concern also exists for any current mechanism since sellers who are locally non-dominated within the set of tested sellers may also increase their price after the product test. Therefore, we consider this as a general problem, i.e., a problem that both any current mechanism and our new mechanism face, and not as a specific disadvantage of our mechanism. In relation to this, while we assume in our theoretical framework that each seller can sell as many units of his product model as demanded, we acknowledge that this may empirically not always be the case, especially in the short term. In particular, one may fear that globally non-dominated, but previously less well-known sellers may not be able to satisfy the increased demand in the short term. However, a similar problem also seems to exist with current mechanisms.30 Also, since our mechanism is more transparent than current mechanisms (see Appendix A.3 for details), it may actually allow sellers to estimate their future demand more precisely, which may mitigate the concern about the production capacities of less well-known sellers.
Third, we assume a high degree of precision when measuring product quality. More specifically, we assume in our theoretical framework that sellers are perfectly informed about their own and others’ product quality. While we acknowledge that this may not always be the case, we would still argue that sellers have an informational advantage over buyers. In relation to this, we assume that the certifier is able to measure product quality perfectly. We think that relaxing either assumption provides interesting avenues for future research: extending our certification game to a case where (i) sellers are less precisely informed about product quality, and (ii) where the certifier may not be able to measure product quality perfectly.
Fourth, our theoretical framework does not allow us to investigate how real-world complexities, such as strategic misreporting, market power, or collusion among sellers, might impact the effectiveness of our mechanism in other settings. Strategic misreporting, for example, should theoretically not occur in our setting where one seller sells one product model. Yet, it would be interesting to analyze settings in which one seller offers several product models. Here, a different way to deter lying might be more appropriate. Similarly, in a more complex setting in which one seller offers several product models, it would be interesting to investigate if and, if yes, how, our mechanism needed to be modified when sellers have different degrees of market power. As to analyzing collusion, a repeated setting in which sellers can set quality and price before the testing occurs would be an interesting extension.
Finally, we also acknowledge that there may be concerns about adopting our mechanism in the field for two reasons. First, the role of sellers in the application process in our mechanism may raise the concern that independent consumer organizations will incur a reduction in their perceived credibility. To mitigate this concern, they could provide a transparent explanation of the product model selection process to buyers. This information could emphasize that the role of sellers does not present a conflict of interest since the final decision on which product models to test remains with the consumer organization. Moreover, they could emphasize that the testing process continues to include anonymous test buyers and objective testing methods. Note that other third-party certifiers, which do not face the same testing capacity constraints, do allow sellers to directly suggest certain product models to be tested. However, these other certifiers do not need to select a subset from their applicants, which is the main focus of this paper (see end of Section 2 for more details).
Second, independent consumer organizations may be concerned about the effect of our mechanism on their publication sales. Low ratings of (previous) bestsellers under current mechanisms may elicit surprise and media attention, leading to increased interest in consumer organization publications. By contrast, under our new mechanism, none of the bestsellers may be included in the test. However, it is also possible that our mechanism would generate increased interest in consumer organization publications since it yields a more helpful portrait of what to purchase.
Importantly, it is also possible for independent consumer organizations to use a hybrid of our new mechanism and any current mechanism depending on their objectives. For example, consumer organizations could reserve a certain number of testing slots for bestselling product models to ensure that they are included in the test, while all other testing slots are filled using our new mechanism. Overall, our paper presents an alternative to current product model selection mechanisms that reduces information asymmetry between buyers and sellers in order to increase consumer surplus.

Author Contributions

Conceptualization, F.B., C.E.-F. and U.V.; Theoretical methodology and analysis, P.I. (mathematical model, Notation, Assumptions, Definitions, Lemmas 1, 2, and 3, Propositions 1, 2, and 3 & respective Proofs), C.E.-F. (structure of the game, objective functions), F.B. and U.V. (SellersMayApply algorithm, Proposition 4 & proof); Experimental methodology, F.B., C.E.-F. and U.V.; Experimental software, validation, investigation, and analysis, F.B. and U.V.; Resources, F.B. and U.V.; Data Curation, F.B. and U.V.; Writing—Original Draft Preparation, U.V.; Visualization, F.B. and U.V.; Supervision, project administration, and funding acquisition, U.V.; All authors contributed to reviewing, editing and finalizing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

Funding from the University of Duisburg-Essen (Program to promote excellent junior scientists) and the German Research Foundation (Deutsche Forschungsgemeinschaft, grant VO 2419/1-1) is gratefully acknowledged.

Data Availability Statement

The data presented in this study are available in the Open Science Framework at https://osf.io/9rhu5/ (accessed on 18 July 2025).

Acknowledgments

We thank seminar participants at several institutions (University of Duisburg-Essen, ESA Mentoring Workshop Berlin, University of Jena, Max Planck Institute for Research on Collective Goods Bonn, ESA European Meeting Vienna, Thurgauer Wirtschaftsinstitut/University of Konstanz, and DICE/University of Düsseldorf), as well as Zhewei Song, Erwin Amann, Volker Benndorf, Jeannette Brosig-Koch, Uwe Cantner, Yan Chen, Laura Gee, Malte Griebenow, Hanna Hoover, Ginger Jin, Oliver Kirchkamp, Dorothea Kübler, David Miller, Xiaofei Pan, Katrin Schmelz, Jan Siebert, Franziska Then, Silke Übelmesser, and Willem Schuchardt for valuable comments. Philipp Allroggen, Veronique Bathke, Denise Drabas, Annika Gauda, Mara König, Franziska Michel, Nina Ostermaier, Martin Sklorz, and Maximilian Zinn provided excellent research assistance.

Conflicts of Interest

One party had the right to review the paper prior to its circulation: the GfK SE (listed among the references). In 2017, we purchased data about total numbers of available product models for certain products in Germany from the GfK SE, and we used this data to motivate our study (see Section 1/Introduction). At that time, the GfK SE had reserved the right to review how we would quote and publish its data. Other than that, the authors declare no conflicts of interest.

Appendix A. Online Appendix

Appendix A.1. Screenshots of Products with Stiftung Warentest Results

Figure A1. Vacuum cleaner screenshot (Amazon Europe Core Sàrl, 2023).
Figure A2. Milk screenshot (ALDI Nord Deutschland Stiftung & Co. KG, 2023).
Figure A3. Kids’ toothpaste screenshot (dm-drogerie markt GmbH + Co. KG, 2023).

Appendix A.2. Sample of Products

Figure A4. Sample of products tested by Stiftung Warentest 09/2016. Note: Each dot represents one product model. The numbers in parentheses denote the shares of tested product models relative to all available product models in Germany (source: GfK SE, Nuremberg, 2017).
Figure A5. Sample of products tested by Stiftung Warentest 09/2016 cont. Note: Each dot represents one product model. The numbers in parentheses denote the shares of tested product models relative to all available product models in Germany (source: GfK SE, Nuremberg, 2017).

Appendix A.3. Current Product Model Selection Mechanisms

Table A1. Current product model selection mechanisms.
Certifier | Product Model Selection Criteria | Product Model Selection Standardized?
Stiftung Warentest (Germany) | main criterion: sales numbers/bestsellers; if applicable, organic product models and/or product models with new features will be selected even if they are not among the bestsellers | yes
Which? (UK) | popularity, incl. sales numbers; brand reliability; price; if applicable, innovation | no
Consumer Reports (US) | spectrum of models: wide availability, incl. sales numbers; wide range of prices; if applicable, new features | not clear
Sources: https://www.test.de/unternehmen/testablauf-5017344-0/ (last accessed 5 December 2019); The website provides an overview of how Stiftung Warentest selects product models. In addition, we contacted them via telephone on 16 December 2014. Heike van Laak (head of the communication department) confirmed that they use a standardized product model selection mechanism. http://www.which.co.uk/about-which/research-methods/lab-testing/ (last accessed 5 December 2019); The website provides an overview of how Which? selects product models. However, it is not clear whether Which? uses a standardized product model selection mechanism. Therefore, we contacted them via telephone. On 30 September 2016, Kim Culver (corporate affairs) informed us that they do not use a standardized product model selection mechanism. http://www.consumerreports.org/cro/about-us/whats-behind-the-ratings/testing/appliances-home/index.htm (last accessed 5 December 2019); The website provides an overview of how Consumer Reports selects product models. It seems likely that they use a standardized product model selection mechanism, but it is not completely clear. Therefore, we contacted them via e-mail on 6 October 2016, but Nicole Sarrubbo (events and organizational communications manager; public affairs, events and outreach manager; associate editor) informed us that they do not provide additional information regarding their mechanism.

Appendix A.4. Out-of-Stock Product Models

Two product models recently rated “good” by Stiftung Warentest were temporarily not available for purchase soon after the ratings had been published: the frying pan “Gastro Sus Diamas Pro Industar” rated in the 01/2021 magazine, and the coffee machine “Philipps EP5447/90” rated in the 01/2022 magazine.
Tobias Neuhaus from the Josef Schulte-Ufer KG Metallwarenfabrik confirmed via telephone on 2 February 2022 that their company was surprised by the increased demand after the frying pan’s positive rating had been published. They needed approximately three months to return to regular delivery times.
The coffee machine “Philipps EP5447/90” was out of stock throughout February 2022 on the following major websites offering coffee machines (all accessed on 28 February 2022):

Appendix A.5. Visual Comparison of Bagwell and Riordan (1991)’s and Our Setup

Figure A6. Visual comparison of Bagwell and Riordan (1991)’s and our setup.

Appendix A.6. List of Symbols

Table A2. List of symbols (in alphabetic order).
Symbol | Meaning
$b_h$ | buyer, with $b_h \in B$ and $h \in \{1, \dots, s\}$
$B$ | set of buyers, with $B = \{b_1, \dots, b_s\}$ and $s \in \mathbb{N}$
$c(q_t)$ | seller $f_t$’s unit costs of production
$D$ | set of globally dominated sellers
$D_Z$ | set of locally dominated sellers within the set $Z$, with $Z \subseteq F$
$f_0$ | non-existing seller with $q_0 = 0$ and $p_0 = 0$, denoting a buyer’s choice not to purchase a product model
$f_F^{c\underline{q}}$ | seller offering the overall lowest quality and the overall lowest price
$f_t$ | seller, with $f_t \in F$ and $t \in \{1, \dots, n\}$
$f_t^{\text{false}}$ | non-existing seller with features $(p_t, q_t^{\text{false}})$
$f_Z^{c}$ | seller who offers the cheapest product model within the set $Z$, with $Z \subseteq F$
$f_Z^{*h}$ | seller who maximizes buyer $b_h$’s utility within the set $Z$, with $Z \subseteq F$
$f_Z^{\sim h}$ | seller who maximizes buyer $b_h$’s expected utility within the set $Z$, with $Z \subseteq F$
$F$ | set of sellers, with $F = \{f_1, \dots, f_n\}$ and $n \in \mathbb{N}$
$h$ | index of buyer $b_h$ and $\theta_h$
$k$ | certifier’s maximum testing capacity, with $k \in \mathbb{N}$
$K$ | set of applicant sellers, with $K \subseteq F$
$K^{\text{false}}$ | set of applicant sellers who state a false product model quality when applying
$K^{\text{true}}$ | set of applicant sellers who state their true product model quality when applying
$K'$ | (final) set of tested sellers, with $K' \subseteq F$
$K'_i$ | set of tested sellers in the $i$th algorithm iteration
$n$ | number of sellers $f_t \in F$, with $t \in \{1, \dots, n\}$
$ND$ | set of globally non-dominated sellers
$ND_Z$ | set of locally non-dominated sellers within the set $Z$, with $Z \subseteq F$
$\overline{ND}$ | globally non-dominated sellers who have at least one buyer under complete information
$p_t$ | seller $f_t$’s price, with $0 \le p_t \in \mathbb{R}^+$
$q_t$ | seller $f_t$’s quality, with $0 \le q_t \in Q \subseteq \mathbb{R}^+$
$q_t^{\text{false}}$ | seller $f_t$’s falsely stated quality
$\underline{q}$ | overall lowest quality
$Q$ | set of all quality levels $q_t$
$R_t$ | set of dominating rivals of seller $f_t \in D$, with $R_t \subseteq F$
$s$ | number of buyers $b_h \in B$, with $h \in \{1, \dots, s\}$
$t$ | index of seller $f_t$, quality $q_t$, and price $p_t$
$u_h$ | buyer $b_h$’s utility
$Z$ | subset of $F$
$\theta_h$ | buyer $b_h$’s valuation of quality, with $0 < \theta_h \in \mathbb{R}^+$
$\pi_t$ | seller $f_t$’s profit

Appendix A.7. Additional Formal Descriptions of Section 3

Appendix A.7.1. Formal Algorithm Description of Section 3.1.3

We first partition the set of applicants $K$ into two disjoint subsets: $K^{\text{true}} := \{ f_j \in K \mid \text{stating } q_j = q_j^{\text{true}} \}$, the set of applicants who state a true product model quality, and $K^{\text{false}} := \{ f_j \in K \mid \text{stating } q_j = q_j^{\text{false}} \}$, the set of applicants who state a false product model quality, with $K = K^{\text{true}} \cup K^{\text{false}}$. We denote a seller $f_t$’s falsely stated quality with $q_t^{\text{false}} = q_t + \epsilon$, with $q_t, \epsilon \in \mathbb{R}^+$ and $\epsilon \neq 0$. The variable $\epsilon$ indicates the extent to which the falsely stated quality deviates from the true product quality. The vector of features $(p_t, q_t^{\text{false}})$ is associated with the product model of a non-existing seller $f_t^{\text{false}}$. We define $K'_i$ with $i \in \{1, \dots, \arg\min_{l \in \{1, \dots, \#K\}} \{ l \mid K'_l \cap K^{\text{false}} = \emptyset \}\}$ as the set of all tested product models remaining after the algorithm step 3 (see below) has been performed for the $i$th time. Furthermore, we define
$$K'_1,\; K'_2,\; \dots,\; K'_{\arg\min_{l \in \{1, \dots, \#K\}} \{ l \mid K'_l \cap K^{\text{false}} = \emptyset \}} =: K'$$
as a sequence of these sets. We define $K'_0 := \emptyset$. The algorithm proceeds until either algorithm step 3 is reached without detecting a false quality statement or there are no more promising untested applicants, i.e., $i \in \{1, \dots, \min\{ \arg\max_{l \in \{1, \dots, \#K\}} \{ l \mid \{K'_l \cap K^{\text{false}}\} = \emptyset \}, \#K \}\}$. Note that the algorithm stops after the first iteration if all sellers submit true qualities when applying to be tested (which they are predicted to do in equilibrium according to Proposition 2), i.e., $K = K^{\text{true}}$.
Figure A7. Flowchart of SellersMayApply algorithm to select product models for testing from the applicants (verbal description in Figure 2).

Appendix A.7.2. Illustration of Assumption 2

In the first case ($f_g = f_1$), Assumption 2 has to hold for a scenario in which seller $f_1$ and all globally dominated sellers, i.e., $f_2, f_3, f_5, f_6, f_8, f_9, f_{11}, f_{12}, f_{14}, f_{15}$, are untested; therefore, their qualities are known to buyers in expectation only. Furthermore, sellers $f_7$, $f_{10}$, and $f_{13}$ are tested; therefore, buyers know that $q_7 = 3$, $q_{10} = 4$, and $q_{13} = 5$. Last, seller $f_4$ may or may not be tested; therefore, $q_4$ may or may not be known to buyers. Here, $E[q_1 \mid \text{SellersMayApply}] = (3 \times 1 + 3 \times 2)/6 = 1.5$ if seller $f_4$ is untested, and $E[q_1 \mid \text{SellersMayApply}] = (3 \times 1)/3 = 1$ if seller $f_4$ is tested.31 Therefore, it follows that $E[u_3(p_1, q_1, \theta_3) \mid \text{SellersMayApply}] \le 7 \times 1.5 - 2.9 = 7.6$. Since $q_7 = 3$, it follows that $u_3(p_7, q_7, \theta_3) = 7 \times 3 - 10.9 = 10.1$. Therefore, it follows that $u_3(p_7, q_7, \theta_3) > E[u_3(p_1, q_1, \theta_3) \mid \text{SellersMayApply}]$, which implies that Assumption 2 holds.
In the second case ($f_g = f_4$), Assumption 2 has to hold for a scenario in which sellers $f_1$ and $f_4$ and all globally dominated sellers, i.e., $f_2, f_3, f_5, f_6, f_8, f_9, f_{11}, f_{12}, f_{14}, f_{15}$, are untested; therefore, their qualities are known to buyers in expectation only. Furthermore, sellers $f_7$, $f_{10}$, and $f_{13}$ are tested; therefore, buyers know that $q_7 = 3$, $q_{10} = 4$, and $q_{13} = 5$. (There are no other sellers to be considered in this example.) Here, $E[q_4 \mid \text{SellersMayApply}] = (3 \times 1 + 3 \times 2)/6 = 1.5$.32 Therefore, it follows that $E[u_3(p_4, q_4, \theta_3) \mid \text{SellersMayApply}] = 7 \times 1.5 - 5 = 5.5$. Since $q_7 = 3$, it follows that $u_3(p_7, q_7, \theta_3) = 7 \times 3 - 10.9 = 10.1$. Therefore, it follows that $u_3(p_7, q_7, \theta_3) > E[u_3(p_4, q_4, \theta_3) \mid \text{SellersMayApply}]$, which implies that Assumption 2 holds.
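The arithmetic in both cases can be re-checked with a few lines of Python. The sketch below assumes the linear utility $u = \theta q - p$ used in this appendix and takes all numbers ($\theta_3 = 7$, prices 2.9, 5, and 10.9, $q_7 = 3$, expected quality of at most 1.5) from the illustration; the function and variable names are hypothetical and serve only this check.

```python
from fractions import Fraction as Fr

def utility(theta, quality, price):
    """Linear buyer utility u = theta * q - p (assumed functional form)."""
    return theta * quality - price

theta_3 = Fr(7)

# Expected quality of an untested product model, as computed in the text:
# (3*1 + 3*2)/6 if f_4 is untested, (3*1)/3 if f_4 is tested (first case).
exp_q_f4_untested = Fr(3 * 1 + 3 * 2, 6)
exp_q_f4_tested = Fr(3 * 1, 3)

# Known utility from the tested seller f_7 (q_7 = 3, p_7 = 10.9).
u_f7 = utility(theta_3, Fr(3), Fr("10.9"))

# Upper bound on the expected utility from f_1 (first case, p_1 = 2.9)
# and expected utility from f_4 (second case, p_4 = 5).
eu_f1_upper = utility(theta_3, max(exp_q_f4_untested, exp_q_f4_tested), Fr("2.9"))
eu_f4 = utility(theta_3, exp_q_f4_untested, Fr(5))

assumption_2_holds = u_f7 > eu_f1_upper and u_f7 > eu_f4
print(float(u_f7), float(eu_f1_upper), float(eu_f4), assumption_2_holds)
```

Exact fractions are used instead of floats so that the comparison does not depend on rounding.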

Appendix A.7.3. Definition of Market Areas of Section 3.2.1 (CompleteInformation)

We use Equation (7) to define the market areas, i.e., the sets of valuations of quality $\theta_h \in \{\theta_1, \dots, \theta_s\}$ whose buyers prefer the quality level of seller $f_t \in ND$ given the vector of prices $(p_1, \dots, p_n)^T$ with $c(q_i) < p_i \ \forall f_i \in F$ and vector of qualities $(q_1, \dots, q_n)^T$:
$$\Theta_t(p_t, q_t) = \left\{ \theta_h \;\middle|\; u_h(p_t, q_t, \theta_h) \ge \max\{0,\, u_h(p_j, q_j, \theta_h)\} \ \forall f_j \in F \right\}$$
with $h \in \{1, \dots, s\}$. Here, all globally non-dominated sellers $f_t$ with at least one buyer for the market area $\Theta_t(p_t, q_t)$ will sell at least one product model while no globally dominated seller will do so.
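To make the definition of market areas concrete, the following Python sketch computes, for a small made-up market, which valuations fall into each seller's market area. It assumes the linear utility $u = \theta q - p$; the seller names, prices, qualities, and valuations are hypothetical illustration values, not data from the experiment.

```python
def utility(theta, quality, price):
    """Linear buyer utility u = theta * q - p (assumed functional form)."""
    return theta * quality - price

def market_areas(sellers, thetas):
    """Return, per seller, the valuations theta for which buying from that
    seller is weakly better than every other seller and than not buying.

    sellers: dict mapping a seller name to a (price, quality) pair.
    thetas:  iterable of buyer valuations of quality.
    """
    areas = {name: [] for name in sellers}
    for theta in thetas:
        for name, (p_t, q_t) in sellers.items():
            u_t = utility(theta, q_t, p_t)
            weakly_best = all(
                u_t >= utility(theta, q_j, p_j) for p_j, q_j in sellers.values()
            )
            if u_t >= 0 and weakly_best:
                areas[name].append(theta)
    return areas

# Hypothetical market: C is globally dominated by B (higher price, lower quality).
sellers = {"A": (1, 1), "B": (5, 3), "C": (6, 2)}
areas = market_areas(sellers, thetas=[0.5, 1.5, 3])
print(areas)
```

In this example the dominated seller C ends up with an empty market area, and the buyer with the lowest valuation belongs to no market area because every product model would give her negative utility.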

Appendix A.7.4. Equilibrium Payoffs of Section 3.2.2 (SellersMayNotApply)

Under any current selection mechanism SellersMayNotApply, buyer equilibrium payoffs can be calculated as:
$$u_h\!\left(p^{\sim h}_{F \cup f_0},\, q^{\sim h}_{F \cup f_0},\, \theta_h\right) = \theta_h\, q^{\sim h}_{F \cup f_0} - p^{\sim h}_{F \cup f_0} = \theta_h\, q^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0} - p^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0}$$
with $f_Z^{\sim h}$ denoting the seller who maximizes buyer $b_h$’s expected utility in the set $Z$, and with $q_Z^{\sim h}$ and $p_Z^{\sim h}$ denoting the respective quality and price.
Regarding sellers’ equilibrium payoffs, we first examine the case where seller $f_t \in D$ offers a locally dominated tested product model in $K'$, which implies $f_t \in D_{K'}$. Since $f_t$’s product model is locally dominated in $K'$, it does not maximize any buyer’s expected utility. It follows that $f_t \notin \arg\max_{f_l \in F} E[u_h(p_l, q_l, \theta_h) \mid \text{SellersMayNotApply}] \ \forall b_h \in B$. Therefore, $d(p_t, q_t) = 0$, and $\pi_t(p_t, q_t)_{D_{K'}} = 0$.
In the second case, seller $f_t \in F$ offers a locally non-dominated tested product model in $K'$, which implies $f_t \in ND_{K'}$. Here, we make no restriction on whether our seller is globally dominated or globally non-dominated. Expected profits can be calculated as
$$E[\pi_t(p_t, q_t) \mid \text{SellersMayNotApply}]_{ND_{K'}} = E[d(p_t, q_t) \mid \text{SellersMayNotApply}]_{ND_{K'}}\,(p_t - c(q_t)) = \sum_{h=1}^{s} \mathbb{1}_{\left\{ f_t = f^{*h}_{K' \cup f_0} \right\} \cap \left\{ \theta_h q_t - p_t > \theta_h E[q^c_{F \setminus K'} \mid \text{SellersMayNotApply}] - p^c_{F \setminus K'} \right\}}\,(p_t - c(q_t)).$$
The above equation implies that a tested product model maximizes the buyer’s utility by taking into account both the full set of tested product models yielding a non-negative utility (first pair of braces in the indicator function) and the cheapest untested product model (second pair of braces).
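The two conditions inside the indicator function can be traced with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: utility is taken to be $u = \theta q - p$, the expected quality and price of the cheapest untested product model are passed in as given numbers, ties between tested product models are not broken, and all names and values are hypothetical.

```python
def expected_profit_tested(name, tested, thetas, p_untested, exp_q_untested, unit_cost):
    """Expected profit of a tested, locally non-dominated seller.

    tested:         dict mapping seller name to (price, quality) of every
                    tested product model.
    p_untested:     price of the cheapest untested product model.
    exp_q_untested: buyers' expected quality of that product model.
    """
    p_t, q_t = tested[name]
    demand = 0
    for theta in thetas:
        u_t = theta * q_t - p_t
        # First condition: f_t is the buyer's best tested option and yields
        # a non-negative utility (the outside option f_0 gives 0).
        best_tested = max(theta * q - p for p, q in tested.values())
        is_best_tested = u_t >= 0 and u_t == best_tested
        # Second condition: f_t beats the cheapest untested product model,
        # which buyers evaluate at its expected quality.
        beats_untested = u_t > theta * exp_q_untested - p_untested
        if is_best_tested and beats_untested:
            demand += 1
    return demand * (p_t - unit_cost)

# Hypothetical example with two tested sellers and three buyers.
tested = {"X": (10, 3), "Y": (16, 4)}
thetas = [3, 5, 7]
profit_x = expected_profit_tested("X", tested, thetas, 3, 1.5, unit_cost=6)
profit_y = expected_profit_tested("Y", tested, thetas, 3, 1.5, unit_cost=9)
print(profit_x, profit_y)
```

In this example the buyer with $\theta = 3$ buys neither tested product model, the buyer with $\theta = 5$ buys from X, and the buyer with $\theta = 7$ buys from Y, so each tested seller earns its margin on exactly one unit.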
In the third case, seller f t F offers an untested product model, which implies f t F K . Again, we make no restriction on whether our seller is globally dominated or globally non-dominated since buyers consider only the expected quality of untested product models. The expected demand and expected profits can be calculated similarly to Equation (A3), with the difference being that untested product models compete among themselves on price. A selected product model in this case must be the cheapest untested one yielding a non-negative utility (first pair of braces) with an expected utility higher than that of every tested product model (second pair of braces).
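$$E[\pi_t(p_t, q_t) \mid \text{SellersMayNotApply}]_{F \setminus K'} = E[d(p_t, q_t) \mid \text{SellersMayNotApply}]_{F \setminus K'}\,(p_t - c(q_t)) = \sum_{h=1}^{s} \mathbb{1}_{\left\{ f_t = f^{\sim h}_{f^c_{F \setminus K'} \cup f_0} \right\} \cap \left\{ \theta_h E[q_t \mid \text{SellersMayNotApply}] - p_t > \theta_h q^{*h}_{K'} - p^{*h}_{K'} \right\}}\,(p_t - c(q_t))$$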
Note that, if seller f t ’s product model is not tested and he does not offer the cheapest price of all untested product models (first pair of braces), his profit will equal zero, regardless of whether his product model is globally dominated or not.

Appendix A.7.5. Proofs of Section 3.2

Proof of Proposition 1. 
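(Consumer surplus and seller profits under any current mechanism SellersMayNotApply). The expected consumer surplus equals
$$\sum_{h=1}^{s} \left( \theta_h\, q^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0} - p^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0} \right).$$
The expected profits of globally dominated sellers equal
$$\sum_{t=m+1}^{n} \sum_{h=1}^{s} \mathbb{1}_{\left\{ f_t = f^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0} \right\}}\,(p_t - c(q_t)).$$
The expected profits of globally non-dominated sellers equal
$$\sum_{t=1}^{m} \sum_{h=1}^{s} \mathbb{1}_{\left\{ f_t = f^{\sim h}_{ND_{K'} \cup f^c_{F \setminus K'} \cup f_0} \right\}}\,(p_t - c(q_t)).$$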
  • Lower consumer surplus
    Let $\overline{ND} \not\subseteq K'$. Furthermore, let $f_j \in \overline{ND}$, with $f_j \notin K'$, and with $\exists\, b_l \in \{b_1, \dots, b_s\}$, with $f_j = \arg\max_{f_k \in F \cup f_0} u_l(p_k, q_k, \theta_l)$, and with $f_j \neq \arg\max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}]$. It follows that $\exists\, b_l \in \{b_1, \dots, b_s\}$ with
    $$\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l) > \max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}].$$
    This inequality holds since, under SellersMayNotApply, $b_l$ selects a product model different from her complete-information-optimal product model $f_j$, because we required that $f_j \notin K'$ and $f_j \neq \arg\max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}]$. Therefore, it follows that $b_l$’s utility is strictly lower under SellersMayNotApply compared to a world of CompleteInformation. It follows that the aggregate consumer surplus is also strictly lower.
  • Identical consumer surplus
(i) If $\overline{ND} \subseteq K'$, then, according to Assumption 2, $\forall b_l \in \{b_1, \dots, b_s\}$ the following holds: $\max\{ E[u_l(p^c_{F \setminus \overline{ND}}, q^c_{F \setminus \overline{ND}}, \theta_l) \mid \text{SellersMayNotApply}],\, 0 \} < \max_{f_j \in \overline{ND}} u_l(p_j, q_j, \theta_l)$. For each buyer, this leads to the identical optimization problem in both a world of CompleteInformation and under SellersMayNotApply, i.e., $\arg\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l) = \arg\max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}] \ \forall b_l \in \{b_1, \dots, b_s\}$. Therefore, $b_l$’s utility under SellersMayNotApply is identical to that under CompleteInformation. It follows that the consumer surplus is also identical.
(ii) Let $f_F^c \in \overline{ND}$, let $f_F^c \notin K'$, and let $\overline{ND} \setminus f_F^c \subseteq K'$. Moreover, $\forall b_l \in \{b_1, \dots, b_s\}$ with $f_F^c = \arg\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l)$, the following holds: $f_F^c = \arg\max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}]$.
In this case, it follows that, except for all $b_l \in \{b_1, \dots, b_s\}$ with $f_j = \arg\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l)$, every buyer selects the same optimal product model as under CompleteInformation by assumption. Therefore, the utility of these buyers under SellersMayNotApply is identical to that in a world of CompleteInformation. Since we assume $f_j = f^c_{F \setminus K'} = \arg\max_{f_k \in f^c_{F \setminus K'} \cup K' \cup f_0} E[u_l(p_k, q_k, \theta_l) \mid \text{SellersMayNotApply}] \ \forall b_l \in \{b_1, \dots, b_s\}$ with $f_j = \arg\max_{f_k \in \overline{ND} \cup f_0} u_l(p_k, q_k, \theta_l)$, it follows that these buyers also select the same optimal product model as under CompleteInformation. It follows that all buyers select the same optimal product model as under CompleteInformation, and that the consumer surplus is identical.
Lemma A1 
(Properties of locally non-dominated product models within submarkets). Let $f_t \in ND^{Z}$ with $f_t \in \{f_1, \dots, f_n\}$ and $Z \subseteq F$. It follows that $f_t \in ND^{Z'}$ $\forall\, Z' \subseteq Z$ with $f_t \in Z'$.
Proof of Lemma A1. 
(Properties of locally non-dominated product models in submarkets). Let seller $f_t \in \{f_1, \dots, f_n\}$ offer a locally non-dominated product model in $Z \subseteq F$. It follows that, $\forall\, f_j \in Z$ with $j \in \{1, \dots, n\}$ and $p_j < p_t$, it holds that $q_j < q_t$, and, $\forall\, f_j \in Z$ with $j \in \{1, \dots, n\}$ and $q_j \geq q_t$, it holds that $p_j > p_t$. It follows that, $\forall\, f_j \in Z' \subseteq Z$ with $j \in \{1, \dots, n\}$, $f_t \in Z'$ and $p_j < p_t$, it holds that $q_j < q_t$, and, $\forall\, f_j \in Z'$ with $j \in \{1, \dots, n\}$, $f_t \in Z'$ and $q_j \geq q_t$, it holds that $p_j > p_t$. It follows that seller $f_t$’s product model is locally non-dominated in $Z'$. Since we make no further requirements for $Z'$, our last conclusion holds $\forall\, Z' \subseteq Z$ with $f_t \in Z'$. It follows that seller $f_t$’s product model is locally non-dominated $\forall\, Z' \subseteq Z$ with $f_t \in Z'$. □
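Lemma A1 can be illustrated numerically. The following is a minimal sketch, assuming that sellers are represented as hypothetical `(price, quality)` pairs and that dominance means "weakly cheaper and strictly better, or strictly cheaper and weakly better"; the function names are illustrative and not part of the paper. It checks that every seller who is non-dominated in a set stays non-dominated in each subset containing him.

```python
from itertools import combinations


def dominates(a, b):
    """Return True if seller a dominates seller b (assumed dominance relation)."""
    (pa, qa), (pb, qb) = a, b
    return (pa <= pb and qa > qb) or (pa < pb and qa >= qb)


def non_dominated(sellers):
    """Sellers in the given set that no other member of the set dominates."""
    return {s for s in sellers if not any(dominates(r, s) for r in sellers)}


def lemma_a1_holds(Z):
    """Brute-force check: non-dominated in Z implies non-dominated in every subset with the seller."""
    Z = set(Z)
    for t in non_dominated(Z):
        others = Z - {t}
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                if t not in non_dominated(set(subset) | {t}):
                    return False
    return True
```

The check enumerates all subsets, so it is only meant for small illustrative markets.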
Lemma A2 
(Transitivity of being (locally or globally) dominated). Let $f_t$ be dominated by $f_j$ in $Z \subseteq F$, and let $f_j$ be dominated by $f_k$ in $Z$. It follows that $f_t$ is also dominated by $f_k$ in $Z$.
Proof of Lemma A2. 
(Transitivity of being (locally or globally) dominated). Let $f_t$ be dominated by $f_j$ in $Z$. It follows that $p_t \geq p_j$ and $q_t < q_j$, or $p_t > p_j$ and $q_t \leq q_j$. We also know that $f_j$ is dominated by $f_k$ in $Z$. It follows that $p_j \geq p_k$ and $q_j < q_k$, or $p_j > p_k$ and $q_j \leq q_k$. Combining both conclusions, we have four possible options:
  • $p_t \geq p_j \geq p_k$ and $q_t < q_j < q_k$,
  • $p_t \geq p_j > p_k$ and $q_t < q_j \leq q_k$,
  • $p_t > p_j \geq p_k$ and $q_t \leq q_j < q_k$, or
  • $p_t > p_j > p_k$ and $q_t \leq q_j \leq q_k$.
Thus, it follows that:
  • $p_t \geq p_k$ and $q_t < q_k$,
  • $p_t > p_k$ and $q_t < q_k$,
  • $p_t > p_k$ and $q_t < q_k$, or
  • $p_t > p_k$ and $q_t \leq q_k$.
In each case, it follows that $f_t$ is dominated by $f_k$ in $Z$. □
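The four cases above can also be checked by brute force. The sketch below assumes sellers are hypothetical `(price, quality)` pairs on a small grid and uses the same dominance conditions as the proof; it searches for triples that would violate transitivity.

```python
from itertools import product


def dominates(a, b):
    """Return True if seller a dominates seller b.

    a dominates b if a is weakly cheaper and strictly better,
    or strictly cheaper and weakly better.
    """
    (pa, qa), (pb, qb) = a, b
    return (pa <= pb and qa > qb) or (pa < pb and qa >= qb)


def transitivity_counterexamples(prices, qualities):
    """List all triples (t, j, k) where j dominates t, k dominates j, but k does not dominate t."""
    sellers = list(product(prices, qualities))
    return [
        (t, j, k)
        for t, j, k in product(sellers, repeat=3)
        if dominates(j, t) and dominates(k, j) and not dominates(k, t)
    ]
```

An empty result on a grid is of course only an illustration, not a substitute for the proof.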
Lemma A3 
(Relationship between local dominance and composition of a set of dominating rivals). Assume $f_t \in D^{Z}$. It follows that $\{R_t \cap ND^{Z}\} \neq \emptyset$.
Proof of Lemma A3. 
(Relationship between local dominance and composition of a set of dominating rivals). Assume $f_t \in D^{Z}$. Since $Z = ND^{Z} \cup D^{Z}$, $\exists$ seller $f_j \in \{R_t \cap Z\}$, with either $f_j \in \{R_t \cap ND^{Z}\}$, or $f_j \in \{R_t \cap D^{Z}\}$. If $f_j \in \{R_t \cap ND^{Z}\}$, our proof would be complete. Therefore, let us consider $f_j \in \{R_t \cap D^{Z}\}$. We know about the locally dominated seller $f_j$ that $\exists$ seller $f_k \in \{R_j \cap Z\}$, with either $f_k \in \{R_j \cap ND^{Z}\}$, or $f_k \in \{R_j \cap D^{Z}\}$. Again, if $f_k \in \{R_j \cap ND^{Z}\}$, our proof would be complete. Therefore, we consider $f_k \in \{R_j \cap D^{Z}\}$. We continue this procedure until no further locally dominated sellers in $Z$ remain. Recall from Section 3.1 that the number of locally dominated sellers in $Z$ is finite since the overall number of sellers is finite. At the maximum, the procedure consists of $(\#D^{Z} - 1)$ steps. Let $f_l \in D^{Z}$ be the last seller in this procedure to be dominated by a seller $f_x \in ND^{Z}$. It follows that $f_x \in \{R_l \cap Z\}$. Due to the transitivity of being dominated (see Lemma A2), we can conclude that $f_x \in \{R_t \cap Z\}$. It follows that $\{R_t \cap ND^{Z}\} \neq \emptyset$. □
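The chain-following procedure in this proof can be sketched as follows. This is an illustration under assumptions: sellers are hypothetical `(price, quality)` pairs, the dominance relation is the one used in Lemma A2, and the function name and the way hops are counted are choices made here, not taken from the paper.

```python
def dominates(a, b):
    """Return True if seller a dominates seller b (assumed dominance relation)."""
    (pa, qa), (pb, qb) = a, b
    return (pa <= pb and qa > qb) or (pa < pb and qa >= qb)


def find_non_dominated_rival(t, Z):
    """Follow dominating rivals from t until a seller without dominating rivals in Z is reached.

    Returns the seller reached and the number of hops taken.
    If t is itself non-dominated in Z, it is returned with zero hops.
    """
    current, hops = t, 0
    while True:
        rivals = [r for r in Z if dominates(r, current)]
        if not rivals:
            return current, hops
        current = rivals[0]  # any dominating rival works; take the first one
        hops += 1
```

The loop terminates because dominance is irreflexive and transitive, so no seller can be visited twice in a finite set. By Lemma A2, the seller reached also dominates the starting seller whenever at least one hop was taken.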
Proof of Proposition 2. 
(Unique Nash equilibrium, consumer surplus and seller profits under our new mechanism SellersMayApply).
Step 1: 
In this step, we show that all strategy profiles can be eliminated in which at least one seller applies to be tested stating a false quality.
Step 1.1: 
In this step, we show that applying to be tested stating a false quality is a strictly dominated strategy for any globally non-dominated seller.
If a globally non-dominated seller applies to be tested stating a false quality, this may either result in him being tested, or it may result in him not being tested. If he applies stating a false quality and is tested, his profits would be lower compared to applying stating his true quality and compared to not applying since we assume the punishment_fee is higher than the maximum additional profits any globally non-dominated seller could make when being tested (see Section 3.1.1).
If a globally non-dominated seller applies stating a false quality and is not tested, this does not change the set of product models that are tested. Therefore, the globally non-dominated seller could save the application_costs by not applying for a test.
Combining both results, it follows that applying stating a false quality is a strictly dominated strategy for all globally non-dominated sellers and can thus be eliminated for these sellers.
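The payoff comparison in step 1.1 can be made concrete with a small numeric sketch. All names and numbers below are hypothetical; the sketch only assumes, as in the text, that applying costs application_costs, that a detected liar additionally pays the punishment_fee, and that being tested can add at most some maximum additional profit.

```python
def seller_payoff(strategy, tested, base_profit, extra_profit,
                  application_costs, punishment_fee):
    """Profit of a seller for one strategy; strategy is 'not_apply', 'apply_true' or 'apply_false'."""
    if strategy == "not_apply":
        return base_profit
    payoff = base_profit - application_costs
    if tested:
        payoff += extra_profit
        if strategy == "apply_false":
            payoff -= punishment_fee  # a tested liar is detected and fined
    return payoff


def lying_is_strictly_worse(base_profit, max_extra_profit,
                            application_costs, punishment_fee, grid=11):
    """Check on a grid of additional profits that lying never beats (or ties) not applying."""
    not_apply = seller_payoff("not_apply", False, base_profit, 0.0,
                              application_costs, punishment_fee)
    for i in range(grid):
        extra = max_extra_profit * i / (grid - 1)
        for tested in (True, False):
            lie = seller_payoff("apply_false", tested, base_profit, extra,
                                application_costs, punishment_fee)
            if lie >= not_apply:
                return False
    return True
```

With a punishment_fee above the maximum additional profit and positive application_costs, the check succeeds; lowering the fee below the maximum additional profit makes it fail, which is why the assumption on the fee is needed.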
Step 1.2: 
In this step, we show that, when analyzing all remaining strategy profiles after step 1.1, a strategy profile in which at least one globally dominated seller applies stating a false quality cannot be a Nash equilibrium.
If a globally dominated seller applies to be tested stating a false quality, this may either result in him being tested, or it may result in him not being tested. Analogously to step 1.1, if a globally dominated seller applies stating a false quality and is not tested, this does not change the set of product models that are tested. Therefore, the globally dominated seller could save the application_costs by not applying for a test.
If a globally dominated seller, e.g., f t , applies stating a false quality and is tested, we can distinguish two cases: at least one of his dominating rivals is tested, too, or none of his dominating rivals is tested. In the former case, f t would not receive any demand. Therefore, he would receive the same profit as if he had not applied, and additionally would have to pay the punishment_fee compared to applying stating his true quality.
In the latter case (a globally dominated seller applies stating a false quality and is tested, while none of his dominating rivals is tested), we can furthermore distinguish two subcases: either the globally dominated seller incurs losses (or makes zero profit) after having been tested, or he receives a positive profit after having been tested. In the first subcase, the globally dominated seller could save the application_costs by not applying.
In the second subcase, we can again distinguish two subcases: either $f_t$ receives higher profits than he would have received had he applied stating his true quality, or he receives lower or equal profits. In the latter subcase, $f_t$ could have saved the punishment_fee by applying stating his true quality.
As to the former subcase, we show by contradiction that no strategy profile resulting in it can be a Nash equilibrium. To this end, we distinguish two further subcases: either at least one of $f_t$’s dominating rivals applies for a test stating his true quality, but is not tested (subcase 1), or none of $f_t$’s dominating rivals applies for a test stating his true quality (subcase 2).
Subcase 1
Recall that the first subcase contains all strategy profiles in which
  • seller $f_t \in D$ applies stating a false quality such that he is tested and makes positive profits which are higher than those had he applied stating his true quality, and
  • at least one of $f_t$’s dominating rivals applies stating his true quality, but neither he nor any other dominating rival is tested.
In order for this subcase to occur, all of $f_t$’s dominating rivals who applied stating their true quality would have to be locally dominated within the set of applicants given stated qualities, i.e., within $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, since they would have been tested otherwise. (Recall that we assume the following. There is no pair of sellers offering their product model at the same price (see Section 3.1.1). The certifier provides at least as many testing slots as there are quality levels (see Section 3.1.1). Therefore, there is at most one locally non-dominated applicant in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ per quality level, and the certifier is able to test all locally non-dominated sellers in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$.)
If all of $f_t$’s dominating rivals who applied stating their true quality were locally dominated within the set of applicants given stated qualities, at least one seller, e.g., $f_z$, would need to exist for each of these dominating rivals, for whom the following holds according to Lemma A3: He is locally non-dominated within the set $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, and he dominates all of $f_t$’s dominating rivals who applied stating their true quality, but is not tested. Is there any seller $f_z$ who would do so in equilibrium?
According to Lemma A2, seller $f_z$ would also dominate seller $f_t$ within the set $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$. Since we require, in this subcase, that none of $f_t$’s dominating rivals is tested, and since $f_z$ would be one of $f_t$’s dominating rivals if he applied stating his true quality, it follows that $f_z$ has to be a seller who applied stating a false quality. Moreover, according to step 1.1, $f_z$ cannot be a globally non-dominated seller; i.e., $f_z$ could only be a globally dominated seller.
According to Lemma A3, seller $f_z$ has at least one dominating rival since he is globally dominated. Moreover, $f_z$ would only have an incentive to apply stating a false quality if none of his dominating rivals is tested. This implies that either there are none of $f_z$’s dominating rivals in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, or, if there is at least one of $f_z$’s dominating rivals in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, all of them are locally dominated within this set.
If there were none of $f_z$’s dominating rivals in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, he could have saved the punishment_fee by applying stating his true quality. Therefore, there would have to be at least one of $f_z$’s dominating rivals in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$, and all of them would have to be locally dominated within this set. Note that these dominating rivals would also have to be dominated in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ by other sellers, which in turn would also have to be dominated in the same set, etc.
It follows that there would have to be a group of globally dominated sellers who apply stating a false quality, are tested, receive positive profits, congest the testing slots of each other’s dominating rivals, and who would not receive equal or higher profits when applying stating their true quality. However, within any of these groups of globally dominated sellers, there is no seller who would be able to congest the testing slot of the overall cheapest seller within this group since we assume there is no pair of sellers offering their product model at the same price (see Section 3.1.1). Therefore, the overall cheapest seller within any of these groups will not have an incentive to apply stating a false quality. This leads to a chain reaction causing sellers, whose dominating rivals in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ would be dominated by this seller, not to apply either, etc. This implies that seller $f_t$ would also not have an incentive to apply stating a false quality.
Subcase 2
If the globally dominated seller $f_t$ were locally non-dominated in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ when applying stating his true quality, he could have saved the punishment_fee by applying stating his true quality. It follows that he has to be locally dominated in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$. Since none of $f_t$’s dominating rivals is tested as none of them applied, $f_t$ has to be locally dominated in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ by another globally dominated seller who applied stating a false quality. Since none of this seller’s dominating rivals is tested as none of them applied, and according to Lemma A2, this seller has to be locally dominated in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ by another globally dominated seller who applied stating a false quality, etc. Note that the overall cheapest seller in this group of sellers cannot be dominated by any other seller in the group in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$. Therefore, the overall cheapest seller in this group does not have an incentive to apply stating a false quality. This leads to a chain reaction causing sellers who would have been dominated by this seller in $K^{\text{true}} \cup \{f_o^{\text{false}} \mid f_o \in K^{\text{false}}\}$ to refrain from applying stating a false quality, etc. This contradicts the requirement that $f_t$ applied stating a false quality.
It follows that, given all remaining strategy profiles after step 1.1, a strategy profile in which at least one globally dominated seller applies stating a false quality cannot be a Nash equilibrium.
Combining steps 1.1 and 1.2, it follows that all strategy profiles can be eliminated in which at least one seller applies to be tested stating a false quality.
Step 2: 
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which seller $f_{F}^{c,\underline{q}}$ or at least one globally dominated seller applies to be tested stating his true quality.
Step 2.1:
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which seller $f_{F}^{c,\underline{q}}$ applies to be tested stating his true quality.
By definition, seller $f_{F}^{c,\underline{q}}$ offers the overall lowest quality. Therefore, his true quality is always lower than or equal to his expected quality for any possible strategy profile. Thus, his demand would never increase if he applied to be tested stating his true quality (compared to not applying), but he would have to pay the application_costs. It follows that, given all remaining strategy profiles after step 1, a strategy profile in which seller $f_{F}^{c,\underline{q}}$ applies to be tested stating his true quality cannot be a Nash equilibrium.
Step 2.2:
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one globally dominated seller applies to be tested stating his true quality.
By definition, any globally dominated seller f t has at least one dominating rival. If both f t and his dominating rival applied to be tested stating their true quality, our selection algorithm would have excluded f t in algorithm step 1, and f t would not have been tested. In this case, f t could have saved the application_costs by not applying.
Seller f t would only be tested if none of his dominating rivals applied for testing. The only reason for f t ’s dominating rivals not to apply for testing could be that they expect the additional profit from applying to be lower than the application_costs . This could either be because they do not expect to be tested after having applied, or because they do not expect any additional demand even if they were tested after having applied. If any dominating rival did not expect to be tested after having applied, f t would also be excluded from testing in algorithm step 1. If a dominating rival did expect to be tested after having applied, but did not expect any additional demand, f t would also not expect any additional demand if tested, and thus, should not have applied. (By definition, any of f t ’s dominating rivals is cheaper and potentially also of higher quality compared to f t . If buyers preferred another seller (or refraining from buying) over f t ’s dominating rival, they would also prefer this other seller (or refraining from buying) over f t .)
It follows that all strategy profiles can be eliminated in which at least one globally dominated seller applies to be tested stating his true quality.
Combining steps 2.1 and 2.2, it follows that all strategy profiles can be eliminated in which seller $f_{F}^{c,\underline{q}}$ or at least one globally dominated seller applies to be tested stating his true quality.
Step 3: 
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one seller in $\overline{ND} \setminus f_{F}^{c}$ does not apply to be tested stating his true quality.
Let $f_t$ ($f_k$) denote the seller with the highest (second-highest) quality level in $\overline{ND}$. According to Assumption 1, it follows that, for any strategy profile in which $f_k$ applies to be tested (stating his true quality) and in which no globally dominated seller applies for testing, untested $f_t$ is dominated in expectation by tested $f_k$. For any strategy profile in which neither $f_k$ nor any globally dominated seller applies for testing, untested $f_t$ is not the cheapest untested seller within his price range. It follows that $f_t$ would not receive any demand if he were untested.
If seller $f_t$ applies for testing stating his true quality, he will be tested since
(i)
in equilibrium, no seller will apply stating a false quality,
(ii)
$f_t$ will be locally non-dominated on $K^{\text{true}} \subseteq F$ according to Lemma A1, and
(iii)
there are enough testing slots for all sellers in $\overline{ND}$ as
(a)
the certifier is assumed to provide at least as many testing slots as there are quality levels (see Section 3.1.1), and
(b)
there are at most as many sellers in $\overline{ND}$ as there are quality levels because we assume there is no pair of sellers offering their product model at the same price (see Section 3.1.1).
Moreover, if seller $f_t$ is tested, he will receive additional profits since any buyer $b_l$ who would have selected him under CompleteInformation will also select him if tested, irrespective of whether any seller in $\overline{ND} \setminus \{f_t \cup f_{F}^{c,\underline{q}}\}$ applies for testing or not. (By definition, buyer $b_l$ prefers tested $f_t$ over any tested seller in $\overline{ND} \setminus f_t$. According to Assumption 2, buyer $b_l$ also prefers tested $f_t$ over any cheaper untested seller.) Since application_costs are assumed to be lower than the profit margin of each seller in $\overline{ND}$ (see Section 3.1.3), $f_t$ will make positive, and therefore higher, profits when applying compared to not applying.
It follows that all strategy profiles can be eliminated in which the seller with the highest quality level in $\overline{ND}$ does not apply for testing. Therefore, the expected quality of all cheaper untested product models decreases.
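The slot-counting argument in (iii) above rests on the observation that, with pairwise distinct prices, each quality level can contain at most one non-dominated seller. A minimal sketch of this observation, assuming hypothetical `(price, quality)` pairs and using the full non-dominated set as a superset of the complete-information-optimal sellers:

```python
from collections import Counter


def dominates(a, b):
    """Return True if seller a dominates seller b (assumed dominance relation)."""
    (pa, qa), (pb, qb) = a, b
    return (pa <= pb and qa > qb) or (pa < pb and qa >= qb)


def non_dominated(sellers):
    """Sellers that no other seller in the list dominates."""
    return [s for s in sellers if not any(dominates(r, s) for r in sellers)]


def non_dominated_per_quality(sellers):
    """Count the non-dominated sellers at each quality level."""
    return Counter(quality for _, quality in non_dominated(sellers))
```

With distinct prices, every count is one, so a certifier offering one slot per quality level can test all non-dominated applicants. If two sellers shared the same price and quality, both would be non-dominated and the bound could fail, which is why the distinct-price assumption matters here.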
Given all remaining strategy profiles, the seller with the second-highest (third-highest, …, pre-lowest) quality level in $\overline{ND}$, e.g., seller $f_{t-1}$ ($f_{t-2}, \dots, f_2$), would not receive any demand if he were untested since any buyer would receive a higher utility when selecting the seller with the third-highest (fourth-highest, …, pre-lowest) quality level in $\overline{ND}$, irrespective of whether any seller in $\overline{ND} \setminus \{f_t \cup f_{F}^{c,\underline{q}} \cup f_{t-1}\}$ applies for testing or not, according to Assumption 1.
Analogously, if the seller with the second-highest (third-highest, …, pre-lowest) quality level in $\overline{ND}$ applies for testing, he will be tested since
(i)
in equilibrium, no seller will apply stating a false quality,
(ii)
$f_{t-1}$ will be locally non-dominated on $K^{\text{true}} \subseteq F$ according to Lemma A1, and
(iii)
there are enough testing slots for all sellers in $\overline{ND}$ as
(a)
the certifier is assumed to provide at least as many testing slots as there are quality levels (see Section 3.1.1), and
(b)
there are at most as many sellers in $\overline{ND}$ as there are quality levels because we assume there is no pair of sellers offering their product model at the same price (see Section 3.1.1).
Moreover, if seller $f_{t-1}$ is tested, he will receive additional profits since any buyer $b_h$ who would have selected him under CompleteInformation will also select him if tested, irrespective of whether any seller in $\overline{ND} \setminus \{f_t \cup f_{F}^{c,\underline{q}} \cup f_{t-1}\}$ applies for testing or not. (By definition, buyer $b_h$ prefers tested $f_{t-1}$ over any tested seller in $\overline{ND} \setminus \{f_t \cup f_{t-1}\}$. According to Assumption 2, buyer $b_h$ also prefers tested $f_{t-1}$ over any untested seller in $\overline{ND} \setminus \{f_t \cup f_{t-1}\}$.) Since application_costs are assumed to be lower than the profit margin of each seller in $\overline{ND}$ (see Section 3.1.3), $f_{t-1}$ will make positive, and therefore higher, profits when applying compared to not applying.
It follows that all strategy profiles can be eliminated in which the seller with the second-highest quality level in $\overline{ND}$ does not apply for testing. Therefore, the expected quality of all cheaper untested product models decreases, etc.
Combining the above results, it follows that all strategy profiles can be eliminated in which at least one seller in $\overline{ND} \setminus f_{F}^{c}$ does not apply for testing stating his true quality.
Step 4: 
In this step, we show that, given all remaining strategy profiles, seller $f_{F}^{c} \neq f_{F}^{c,\underline{q}}$ may apply to be tested stating his true quality, or he may not apply to be tested, depending on the market parameters.
Depending on the joint distribution of qualities, prices and valuations of quality, seller $f_{F}^{c} \neq f_{F}^{c,\underline{q}}$ may or may not belong to $\overline{ND}$. If he does not belong to $\overline{ND}$, he would not receive any additional profits if he were tested since he would not be selected by any buyer under CompleteInformation. He would, therefore, also not be selected by any buyer in any strategy profile in which all sellers in $\overline{ND}$ are tested since every buyer would select the product model she would have selected under CompleteInformation.
If seller $f_{F}^{c} \neq f_{F}^{c,\underline{q}}$ does belong to $\overline{ND}$, his equilibrium strategy depends on the joint distribution of qualities, prices and valuations of quality, and his resulting expected quality (if untested). If they are distributed such that seller $f_{F}^{c} \neq f_{F}^{c,\underline{q}}$ would not receive any additional demand if tested, he will not apply for testing. If they are distributed such that seller $f_{F}^{c} \neq f_{F}^{c,\underline{q}}$ would receive additional demand if tested, he will apply for testing.
Step 5: 
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one of the sellers in $ND \setminus \{\overline{ND} \cup f_{F}^{c}\}$ (if any) applies to be tested stating his true quality.
If seller $f_k \in ND \setminus \{\overline{ND} \cup f_{F}^{c}\}$ applied to be tested stating his true quality, he would be tested since
(i)
in equilibrium, no seller will apply stating a false quality,
(ii)
$f_k$ will be locally non-dominated on $K^{\text{true}} \subseteq F$ according to Lemma A1, and
(iii)
there are enough testing slots for all sellers in $ND$ as
(a)
the certifier is assumed to provide at least as many testing slots as there are quality levels (see Section 3.1.1), and
(b)
there are at most as many sellers in $ND$ as there are quality levels because we assume there is no pair of sellers offering their product model at the same price (see Section 3.1.1).
However, even if tested, no buyer would select his product model if all buyers’ complete-information-optimal product models are tested.
If all buyers’ complete-information-optimal product models except the overall cheapest one (which also belongs to $\overline{ND}$) are tested, the following strategy profile could not be an equilibrium: a strategy profile in which a seller from $ND \setminus \{\overline{ND} \cup f_{F}^{c}\}$ applied for testing and buyers who would have otherwise selected the overall cheapest seller would now select the tested seller from $ND \setminus \{\overline{ND} \cup f_{F}^{c}\}$. In this strategy profile, the overall cheapest seller would have an incentive to deviate from his not-applying-strategy because the application_costs are assumed to be lower than the profit margin of one additional buyer (see Section 3.1.3).
Therefore, buyers will select only sellers from $\overline{ND}$ or the overall cheapest seller.
It follows that $f_k$ would not receive any demand after applying to be tested stating his true quality, but would have to pay the application_costs. Therefore, $f_k$ will not apply.
It follows that all strategy profiles can be eliminated in which at least one of the sellers in $F \setminus \{\overline{ND} \cup f_{F}^{c}\}$ applies to be tested stating his true quality.
Step 6: 
In this step, we show that, in equilibrium, all buyers select the same product model as under CompleteInformation.
According to Assumption 2, all tested sellers are optimal for the same buyers who would have selected them under CompleteInformation. Also, all untested sellers receive the same demand as under CompleteInformation.
All untested sellers in $D$ or in $ND \setminus \{\overline{ND} \cup f_{F}^{c}\}$ receive zero demand. If $f_{F}^{c}$ does not belong to $\overline{ND}$ and is untested, he receives zero demand. If $f_{F}^{c}$ does belong to $\overline{ND}$ and is untested, he receives the same positive demand as under CompleteInformation. Otherwise, he would have had an incentive to deviate from his not-applying-strategy because the application_costs are assumed to be lower than the profit margin of one additional buyer (see Section 3.1.3).
It follows that, in equilibrium, all buyers select the same product model as under CompleteInformation.
To conclude, by sequentially eliminating strategy profiles either because they contain strictly dominated strategies, or because they cannot be a Nash equilibrium due to other reasons, we solve for the unique Nash equilibrium described in Proposition 2. □

Appendix A.8. SellersMayApplyGeneralized

In this section, we present a generalized version of SellersMayApply which could be applied if the certifier were not able to provide as many testing slots as there are quality levels. This version also allows for identical prices, does not require the two distribution assumptions about the market parameters (Assumptions 1 and 2), and uses a different method to deter lying. In the following, we describe which assumptions we do need, and how the SellersMayApplyGeneralized mechanism differs from SellersMayApply.
  • Sellers (He/His)
We assume that, if seller f t F is indifferent between applying stating his true quality and not applying, he will apply since he reasons as follows: If other sellers mistakenly did not apply for testing, f t could still have a chance to be tested. (In the following, we refer to this assumption as “indifference assumption 1”.) If seller f t is indifferent between applying stating a false quality and not applying, we assume he will not apply due to lying aversion (Abeler et al., 2019). (In the following, we refer to this assumption as “indifference assumption 2”.) Moreover, sellers may offer their product model at the same price.
  • Certifier (It/Its)
We assume the certifier provides at least one regular (i.e., not punishment-fee-financed) testing slot, i.e., $k \geq 1$. Furthermore, we assume both mechanisms, SellersMayNotApply and SellersMayApplyGeneralized, provide the same number of regular testing slots. However, under SellersMayApplyGeneralized, the certifier may add extra testing slots financed through the detected liars’ punishment fees. Also, we assume the certifier knows how the valuation of quality $\theta_h$ is distributed.
  • Buyers (She/Hers)
We assume buyers know whether $k \geq \#\overline{ND}$ or $k < \#\overline{ND}$. In other words, we assume buyers know whether there are at least as many (regular) testing slots as the number of unique complete-information-optimal product models, or whether there are fewer (regular) testing slots.
  • New Selection Mechanism SellersMayApplyGeneralized
Sellers who apply need to pay a positive application_deposit. This application_deposit will be returned to all locally non-dominated applicants (using stated qualities) unless it turns out that they stated a false quality when applying. If an applicant seller’s product model is chosen for testing and the submitted quality is found to be false, this seller must also pay a positive punishment_fee which is as high as the costs for one additional testing slot. For each liar that is detected, an extra testing slot is added to the number of regular testing slots $k$. After the final test, the true qualities of all tested and truthfully stated product models are published. The liars’ qualities are not published.
The SellersMayApplyGeneralized mechanism is based on the algorithm described in Figure A8. Note that it stops after algorithm step 2 in its first iteration (like the SellersMayApply algorithm) if all sellers submit true qualities when applying, which they are predicted to do in equilibrium according to Proposition A1. This also implies that, in equilibrium, no punishment-fee-financed testing slots will be added.
  • No Distribution Assumptions
Last, the two distribution assumptions about the market parameters (Assumptions 1 and 2) are not needed.
Proposition A1 
(Unique Nash equilibrium and consumer surplus under SellersMayApplyGeneralized). A unique Nash equilibrium
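$$\left(\text{apply with } q_1, \dots, \text{apply with } q_{(\#ND)}, \text{do not apply}, \dots, \text{do not apply}, \text{buy product model of seller } f_{\hat{F}_1}, \dots, \text{buy product model of seller } f_{\hat{F}_s}\right)^{T}$$
exists, with $f_{\hat{F}_1}, \dots, f_{\hat{F}_s}$, with
$$f_{\hat{F}_t} = \arg\max_{f_t \in F \cup f_0}\ \theta_h E[q_t \mid \text{SellersMayNotApply}] - p_t = \arg\max_{f_t \in F \cup f_0}\ \max\left\{\max_{f_t \in K \cup f_0} \theta_h q_t - p_t,\ \max_{f_t \in \{F \setminus K\} \cup f_0} \theta_h E[q_t \mid \text{SellersMayNotApply}] - p_t\right\}.$$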
In equilibrium, all globally non-dominated sellers apply to be tested stating their true quality. From these applicants, at most k sellers maximizing expected aggregate consumer surplus are tested and their qualities published. All other sellers, i.e., all globally dominated sellers, do not apply to be tested. All buyers select their optimal tested product model, or their optimal one from the cheapest untested ones per price range. Consumer surplus depends on the number of (regular) testing slots.
(i) 
If $k \geq \#\overline{ND}$, i.e., if there are at least as many (regular) testing slots as the number of unique complete-information-optimal product models, consumer surplus is as high as under CompleteInformation.
(ii) 
If $k < \#\overline{ND}$, i.e., if there are fewer (regular) testing slots than the number of unique complete-information-optimal product models, expected consumer surplus is at least as high as the expected consumer surplus under any current mechanism SellersMayNotApply.
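The buyers’ part of the equilibrium can be sketched as a simple choice rule. The sketch assumes linear utility (valuation times quality minus price), a utility of zero for the outside option `f_0`, and hypothetical seller names and numbers; tested sellers enter with their published quality, untested sellers with their expected quality. Which untested sellers are passed in (for example, only the cheapest one per price range) is left to the caller.

```python
def buyer_choice(theta, tested, untested, expected_quality):
    """Return the option with the highest (expected) utility for a buyer with valuation theta.

    tested:           dict mapping seller name -> (price, published true quality)
    untested:         dict mapping seller name -> price
    expected_quality: dict mapping untested seller name -> expected quality
    """
    utilities = {"f_0": 0.0}  # outside option: do not buy (assumed utility of zero)
    for seller, (price, quality) in tested.items():
        utilities[seller] = theta * quality - price
    for seller, price in untested.items():
        utilities[seller] = theta * expected_quality[seller] - price
    return max(utilities, key=utilities.get)
```

A buyer with a high valuation of quality picks the tested high-quality seller in the example used for the tests, a buyer with an intermediate valuation picks the cheap untested seller, and a buyer with a low valuation refrains from buying.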
Figure A8. SellersMayApplyGeneralized algorithm to select product models for testing from the set of applicants.
Proposition A1 implies that SellersMayApplyGeneralized yields the maximum possible consumer surplus, equivalent to a world of complete information, if there are “enough testing slots”. Otherwise, SellersMayApplyGeneralized yields at least the same expected consumer surplus as any other mechanism SellersMayNotApply. Note that, in this case, no selection mechanism is able to yield the maximum possible consumer surplus, equivalent to a world of complete information. Thus, SellersMayApplyGeneralized is still optimal as it, in expectation, weakly dominates any other mechanism.
Proof of Proposition A1. 
(Unique Nash equilibrium, consumer surplus and seller profits under our new mechanism SellersMayApplyGeneralized).
Step 1: 
In this step, we show that all strategy profiles can be eliminated in which at least one seller applies to be tested stating a false quality.
If any seller $f_t \in F$ applied to be tested stating a false quality, he would either end up belonging to the set of locally non-dominated applicants (using stated qualities), or he would not. In the latter case, $f_t$ would not be returned the application_deposit, nor would his application change the set of tested and published product models. Therefore, his profit would be higher if he did not apply for testing.
If $f_t$ ended up belonging to the set of locally non-dominated applicants (using stated qualities), we can furthermore distinguish two cases: either he ended up belonging to the subset maximizing aggregate consumer surplus, or he did not. In the former case, $f_t$ would be tested, but his quality would not be published (since detected liars’ qualities are not published). Furthermore, he would not be returned the application_deposit, he would have to pay the punishment_fee, and an extra testing slot financed through this punishment_fee would be added. Therefore, the set of tested and published product models would not be changed by $f_t$’s application, and his profit would be higher if he did not apply for testing.
If $f_t$ ended up not belonging to the subset maximizing aggregate consumer surplus, he would be returned the application_deposit, but would not be tested, and his application would not change the set of tested and published product models. Therefore, $f_t$ would be indifferent between applying stating a false quality and not applying, and is assumed to choose not to apply in this case (“indifference assumption 2”).
Combining these results, it follows that all strategy profiles can be eliminated in which at least one seller applies to be tested stating a false quality.
Step 2:
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one globally non-dominated seller does not apply to be tested stating his true quality.
All globally non-dominated sellers weakly prefer applying stating their true quality over not applying since they would be returned the application deposit when applying. According to the “indifference assumption 1”, all of them will apply (stating their true quality). It follows that all strategy profiles can be eliminated in which at least one globally non-dominated seller does not apply to be tested stating his true quality.
Step 3:
In this step, we show that, given all remaining strategy profiles, all strategy profiles can be eliminated in which at least one globally dominated seller applies to be tested stating his true quality.
Since all globally non-dominated sellers will apply (stating their true quality), no globally dominated seller would be returned the application deposit when applying (stating his true quality). Therefore, for a globally dominated seller, not applying dominates applying (stating his true quality). It follows that all strategy profiles can be eliminated in which at least one globally dominated seller applies to be tested stating his true quality.
Step 4:
In this step, we show that, in equilibrium, buyers select their optimal tested product model, or their optimal one from the cheapest untested ones per price range. Consumer surplus depends on the number of (regular) testing slots.
(i) If $k \geq \#\overline{ND}$, i.e., if there are at least as many (regular) testing slots as the number of unique complete-information-optimal product models, buyers can conclude that all unique complete-information-optimal product models are tested (and published) since buyers are assumed to know that $k \geq \#\overline{ND}$. Therefore, each buyer selects the optimal tested product model and consumer surplus is as high as under CompleteInformation.
(ii) If $k < \#\overline{ND}$, i.e., if there are fewer (regular) testing slots than the number of unique complete-information-optimal product models, buyers can conclude that not all unique complete-information-optimal product models are tested (and published) since buyers are assumed to know that $k < \#\overline{ND}$. Therefore, each buyer selects her optimal tested product model, or her optimal one from the cheapest untested ones per price range, and expected consumer surplus is at least as high as the expected consumer surplus under any current mechanism SellersMayNotApply. (Note that $k < \#\overline{ND}$ also holds for SellersMayNotApply. Therefore, in the best case, the tested product models would be identical to the ones under SellersMayApplyGeneralized.)
To conclude, by sequentially eliminating strategy profiles because they cannot be Nash equilibria, we solve for the unique Nash equilibrium described in Proposition A1. □
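As a numerical illustration of case (i) in Step 4, the following minimal Python sketch uses the prices and qualities of experimental market 4 (Table A3) and the four distinct buyer valuations from the experimental instructions (Appendix A.11). It assumes that a buyer's payoff from a purchase is valuation × quality − price, as in the instructions; the function names are hypothetical and not part of the paper. Under these assumptions, every buyer's complete-information choice lies in the set of globally non-dominated product models, so testing exactly that set would suffice.

```python
# Market 4 from Table A3: f_t -> (price p_t, quality q_t).
MARKET_4 = {
    1: (7.3, 1), 2: (13.1, 1), 3: (90.5, 1),
    4: (5.4, 2), 5: (6.1, 2), 6: (88.0, 2),
    7: (11.4, 3), 8: (35.1, 3), 9: (40.3, 3),
    10: (21.8, 4), 11: (23.5, 4), 12: (41.0, 4),
    13: (36.0, 5), 14: (44.9, 5), 15: (36.2, 5),
}
VALUATIONS = [3, 7, 11, 15]  # distinct buyer valuations (Appendix A.11)


def globally_non_dominated(market):
    """Sellers for whom no other seller offers at least the same quality at a lower price."""
    return {
        f for f, (p, q) in market.items()
        if not any(q2 >= q and p2 < p
                   for g, (p2, q2) in market.items() if g != f)
    }


def best_choice(valuation, market):
    """Seller maximizing valuation * quality - price (assumed payoff); None if no purchase pays."""
    best = max(market, key=lambda f: valuation * market[f][1] - market[f][0])
    price, quality = market[best]
    return best if valuation * quality - price > 0 else None


non_dominated = globally_non_dominated(MARKET_4)
choices = {v: best_choice(v, MARKET_4) for v in VALUATIONS}

print(sorted(non_dominated))  # [4, 7, 10, 13]
print(choices)                # {3: 4, 7: 7, 11: 10, 15: 13}
```

In this market, the computed set coincides with the "Globally Non-Dominated" column of Table A3, and each of the four product models in it is the best choice for one valuation level.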

Appendix A.9. Graphical Overview of Experimental Markets

Figure A9. Overview of experimental markets 1–6. Note: Each dot represents one product model. The tested product models in the worst-case scenario are marked with |, the ones in the random scenario are marked with /. Markets were played in the following random order: 3, 8, 6, 2, 11, 7, 1, 5, 9, 4, 12, 10.
Figure A10. Overview of experimental markets 7–12. Note: Each dot represents one product model. The tested product models in the worst-case scenario are marked with |, the ones in the random scenario are marked with /. Markets were played in the following random order: 3, 8, 6, 2, 11, 7, 1, 5, 9, 4, 12, 10.

Appendix A.10. Parameters of Experimental Markets

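Table A3. Parameters of experimental markets 1–4.

| Market | $f_t$ | Experimental Seller ID | $p_t$ | $q_t$ | Globally Non-Dominated | Tested Under WorstCase | Tested Under Random |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4 | 2.9 | 1 | 1 | 0 | 0 |
| 1 | 2 | 6 | 4.5 | 1 | 0 | 0 | 0 |
| 1 | 3 | 8 | 91.7 | 1 | 0 | 1 | 1 |
| 1 | 4 | 11 | 5 | 2 | 1 | 0 | 1 |
| 1 | 5 | 3 | 10.6 | 2 | 0 | 0 | 0 |
| 1 | 6 | 9 | 89.9 | 2 | 0 | 1 | 1 |
| 1 | 7 | 10 | 10.9 | 3 | 1 | 0 | 0 |
| 1 | 8 | 14 | 11.1 | 3 | 0 | 0 | 1 |
| 1 | 9 | 5 | 20.7 | 3 | 0 | 1 | 0 |
| 1 | 10 | 1 | 21 | 4 | 1 | 0 | 0 |
| 1 | 11 | 2 | 30 | 4 | 0 | 1 | 1 |
| 1 | 12 | 13 | 31 | 4 | 0 | 1 | 0 |
| 1 | 13 | 12 | 35.3 | 5 | 1 | 0 | 0 |
| 1 | 14 | 15 | 37 | 5 | 0 | 0 | 0 |
| 1 | 15 | 7 | 40.3 | 5 | 0 | 0 | 0 |
| 2 | 1 | 3 | 3.1 | 1 | 1 | 0 | 0 |
| 2 | 2 | 10 | 36.2 | 1 | 0 | 1 | 0 |
| 2 | 3 | 8 | 52.3 | 1 | 0 | 1 | 1 |
| 2 | 4 | 15 | 5.5 | 2 | 1 | 0 | 0 |
| 2 | 5 | 11 | 51.9 | 2 | 0 | 1 | 0 |
| 2 | 6 | 14 | 52 | 2 | 0 | 1 | 0 |
| 2 | 7 | 2 | 11.3 | 3 | 1 | 0 | 0 |
| 2 | 8 | 7 | 11.5 | 3 | 0 | 0 | 1 |
| 2 | 9 | 4 | 20.8 | 3 | 0 | 0 | 0 |
| 2 | 10 | 13 | 20.9 | 4 | 1 | 0 | 1 |
| 2 | 11 | 12 | 21 | 4 | 0 | 0 | 1 |
| 2 | 12 | 1 | 34.6 | 4 | 0 | 1 | 0 |
| 2 | 13 | 5 | 35.2 | 5 | 1 | 0 | 0 |
| 2 | 14 | 6 | 36.1 | 5 | 0 | 0 | 0 |
| 2 | 15 | 9 | 36.3 | 5 | 0 | 0 | 1 |
| 3 | 1 | 3 | 4 | 1 | 1 | 0 | 0 |
| 3 | 2 | 12 | 20.9 | 1 | 0 | 0 | 0 |
| 3 | 3 | 6 | 70.4 | 1 | 0 | 1 | 1 |
| 3 | 4 | 4 | 5.1 | 2 | 1 | 0 | 1 |
| 3 | 5 | 7 | 68.8 | 2 | 0 | 1 | 0 |
| 3 | 6 | 11 | 69.3 | 2 | 0 | 1 | 1 |
| 3 | 7 | 2 | 12 | 3 | 1 | 0 | 1 |
| 3 | 8 | 15 | 20.8 | 3 | 0 | 0 | 0 |
| 3 | 9 | 5 | 62.4 | 3 | 0 | 1 | 0 |
| 3 | 10 | 14 | 22 | 4 | 1 | 0 | 0 |
| 3 | 11 | 8 | 37.6 | 4 | 0 | 0 | 0 |
| 3 | 12 | 10 | 52.4 | 4 | 0 | 1 | 1 |
| 3 | 13 | 9 | 36.7 | 5 | 1 | 0 | 0 |
| 3 | 14 | 13 | 37.2 | 5 | 0 | 0 | 0 |
| 3 | 15 | 1 | 37.7 | 5 | 0 | 0 | 0 |
| 4 | 1 | 9 | 7.3 | 1 | 0 | 0 | 1 |
| 4 | 2 | 1 | 13.1 | 1 | 0 | 0 | 0 |
| 4 | 3 | 4 | 90.5 | 1 | 0 | 1 | 0 |
| 4 | 4 | 13 | 5.4 | 2 | 1 | 0 | 1 |
| 4 | 5 | 15 | 6.1 | 2 | 0 | 0 | 0 |
| 4 | 6 | 3 | 88 | 2 | 0 | 1 | 0 |
| 4 | 7 | 7 | 11.4 | 3 | 1 | 0 | 0 |
| 4 | 8 | 8 | 35.1 | 3 | 0 | 1 | 0 |
| 4 | 9 | 11 | 40.3 | 3 | 0 | 1 | 1 |
| 4 | 10 | 14 | 21.8 | 4 | 1 | 0 | 0 |
| 4 | 11 | 12 | 23.5 | 4 | 0 | 0 | 0 |
| 4 | 12 | 2 | 41 | 4 | 0 | 1 | 0 |
| 4 | 13 | 6 | 36 | 5 | 1 | 0 | 1 |
| 4 | 14 | 5 | 44.9 | 5 | 0 | 0 | 0 |
| 4 | 15 | 10 | 36.2 | 5 | 0 | 0 | 1 |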
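Table A4. Parameters of experimental markets 5–8.

| Market | $f_t$ | Experimental Seller ID | $p_t$ | $q_t$ | Globally Non-Dominated | Tested Under WorstCase | Tested Under Random |
|---|---|---|---|---|---|---|---|
| 5 | 1 | 4 | 7.2 | 1 | 0 | 0 | 0 |
| 5 | 2 | 10 | 38.2 | 1 | 0 | 1 | 1 |
| 5 | 3 | 12 | 58 | 1 | 0 | 1 | 0 |
| 5 | 4 | 15 | 5.8 | 2 | 1 | 0 | 1 |
| 5 | 5 | 2 | 57.3 | 2 | 0 | 1 | 0 |
| 5 | 6 | 8 | 42 | 2 | 0 | 1 | 0 |
| 5 | 7 | 5 | 12.2 | 3 | 1 | 0 | 0 |
| 5 | 8 | 7 | 21.2 | 3 | 0 | 0 | 0 |
| 5 | 9 | 14 | 50.7 | 3 | 0 | 1 | 0 |
| 5 | 10 | 1 | 21.4 | 4 | 1 | 0 | 1 |
| 5 | 11 | 11 | 34.9 | 4 | 0 | 0 | 0 |
| 5 | 12 | 9 | 35.1 | 4 | 0 | 0 | 1 |
| 5 | 13 | 6 | 35 | 5 | 1 | 0 | 0 |
| 5 | 14 | 13 | 36.5 | 5 | 0 | 0 | 0 |
| 5 | 15 | 3 | 37.8 | 5 | 0 | 0 | 1 |
| 6 | 1 | 10 | 13.2 | 1 | 0 | 0 | 0 |
| 6 | 2 | 14 | 15 | 1 | 0 | 0 | 0 |
| 6 | 3 | 4 | 98 | 1 | 0 | 1 | 0 |
| 6 | 4 | 5 | 4.9 | 2 | 1 | 0 | 1 |
| 6 | 5 | 9 | 6.1 | 2 | 0 | 0 | 1 |
| 6 | 6 | 15 | 36.3 | 2 | 0 | 1 | 0 |
| 6 | 7 | 11 | 11.5 | 3 | 1 | 0 | 0 |
| 6 | 8 | 7 | 69.9 | 3 | 0 | 1 | 0 |
| 6 | 9 | 13 | 79.8 | 3 | 0 | 1 | 1 |
| 6 | 10 | 1 | 20.7 | 4 | 1 | 0 | 1 |
| 6 | 11 | 8 | 32.7 | 4 | 0 | 0 | 0 |
| 6 | 12 | 12 | 35 | 4 | 0 | 1 | 0 |
| 6 | 13 | 6 | 35.1 | 5 | 1 | 0 | 0 |
| 6 | 14 | 3 | 35.9 | 5 | 0 | 0 | 1 |
| 6 | 15 | 2 | 36.3 | 5 | 0 | 0 | 0 |
| 7 | 1 | 15 | 13.9 | 1 | 0 | 0 | 1 |
| 7 | 2 | 6 | 65.6 | 1 | 0 | 1 | 1 |
| 7 | 3 | 2 | 70.5 | 1 | 0 | 1 | 1 |
| 7 | 4 | 1 | 14.6 | 2 | 0 | 0 | 0 |
| 7 | 5 | 3 | 15.8 | 2 | 0 | 0 | 0 |
| 7 | 6 | 8 | 25.6 | 2 | 0 | 0 | 1 |
| 7 | 7 | 4 | 10.6 | 3 | 1 | 0 | 0 |
| 7 | 8 | 10 | 55.6 | 3 | 0 | 1 | 0 |
| 7 | 9 | 13 | 57.2 | 3 | 0 | 1 | 0 |
| 7 | 10 | 11 | 18.8 | 4 | 1 | 0 | 0 |
| 7 | 11 | 12 | 55.4 | 4 | 0 | 1 | 0 |
| 7 | 12 | 5 | 35.2 | 4 | 0 | 0 | 0 |
| 7 | 13 | 14 | 30.3 | 5 | 1 | 0 | 0 |
| 7 | 14 | 9 | 43.5 | 5 | 0 | 0 | 0 |
| 7 | 15 | 7 | 50.5 | 5 | 0 | 0 | 1 |
| 8 | 1 | 7 | 10.7 | 1 | 0 | 0 | 0 |
| 8 | 2 | 3 | 48.9 | 1 | 0 | 1 | 0 |
| 8 | 3 | 11 | 57.9 | 1 | 0 | 1 | 1 |
| 8 | 4 | 8 | 22.3 | 2 | 0 | 0 | 1 |
| 8 | 5 | 4 | 30.3 | 2 | 0 | 0 | 0 |
| 8 | 6 | 12 | 42.5 | 2 | 0 | 1 | 0 |
| 8 | 7 | 14 | 9.7 | 3 | 1 | 0 | 1 |
| 8 | 8 | 10 | 42.6 | 3 | 0 | 1 | 1 |
| 8 | 9 | 6 | 45.8 | 3 | 0 | 1 | 0 |
| 8 | 10 | 13 | 17 | 4 | 1 | 0 | 0 |
| 8 | 11 | 9 | 34.7 | 4 | 0 | 0 | 1 |
| 8 | 12 | 1 | 36.7 | 4 | 0 | 0 | 0 |
| 8 | 13 | 2 | 28.5 | 5 | 1 | 0 | 0 |
| 8 | 14 | 5 | 40.2 | 5 | 0 | 0 | 0 |
| 8 | 15 | 15 | 50.7 | 5 | 0 | 0 | 0 |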
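Table A5. Parameters of experimental markets 9–12.

| Market | $f_t$ | Experimental Seller ID | $p_t$ | $q_t$ | Globally Non-Dominated | Tested Under WorstCase | Tested Under Random |
|---|---|---|---|---|---|---|---|
| 9 | 1 | 4 | 12 | 1 | 0 | 0 | 1 |
| 9 | 2 | 1 | 57.3 | 1 | 0 | 1 | 0 |
| 9 | 3 | 3 | 66.2 | 1 | 0 | 1 | 1 |
| 9 | 4 | 13 | 14.9 | 2 | 0 | 0 | 1 |
| 9 | 5 | 5 | 16.9 | 2 | 0 | 0 | 1 |
| 9 | 6 | 2 | 76.1 | 2 | 0 | 1 | 0 |
| 9 | 7 | 15 | 11 | 3 | 1 | 0 | 0 |
| 9 | 8 | 12 | 42.8 | 3 | 0 | 0 | 0 |
| 9 | 9 | 11 | 81 | 3 | 0 | 1 | 0 |
| 9 | 10 | 6 | 20.1 | 4 | 1 | 0 | 0 |
| 9 | 11 | 14 | 35.7 | 4 | 0 | 0 | 0 |
| 9 | 12 | 7 | 66 | 4 | 0 | 1 | 0 |
| 9 | 13 | 8 | 32.4 | 5 | 1 | 0 | 1 |
| 9 | 14 | 9 | 37.5 | 5 | 0 | 0 | 0 |
| 9 | 15 | 10 | 61.1 | 5 | 0 | 0 | 0 |
| 10 | 1 | 8 | 50.3 | 1 | 0 | 0 | 1 |
| 10 | 2 | 5 | 55.3 | 1 | 0 | 0 | 0 |
| 10 | 3 | 6 | 59 | 1 | 0 | 1 | 0 |
| 10 | 4 | 9 | 47 | 2 | 0 | 0 | 0 |
| 10 | 5 | 1 | 53.5 | 2 | 0 | 0 | 1 |
| 10 | 6 | 3 | 57.3 | 2 | 0 | 0 | 1 |
| 10 | 7 | 10 | 29.4 | 3 | 0 | 0 | 0 |
| 10 | 8 | 12 | 61.9 | 3 | 0 | 1 | 1 |
| 10 | 9 | 14 | 75 | 3 | 0 | 1 | 1 |
| 10 | 10 | 15 | 17.6 | 4 | 1 | 0 | 0 |
| 10 | 11 | 4 | 55.6 | 4 | 0 | 0 | 0 |
| 10 | 12 | 13 | 79.1 | 4 | 0 | 1 | 0 |
| 10 | 13 | 11 | 29.3 | 5 | 1 | 0 | 0 |
| 10 | 14 | 2 | 60.3 | 5 | 0 | 0 | 0 |
| 10 | 15 | 7 | 77.8 | 5 | 0 | 1 | 0 |
| 11 | 1 | 5 | 23.7 | 1 | 0 | 0 | 1 |
| 11 | 2 | 1 | 26.3 | 1 | 0 | 0 | 0 |
| 11 | 3 | 7 | 73 | 1 | 0 | 1 | 0 |
| 11 | 4 | 14 | 26.8 | 2 | 0 | 0 | 0 |
| 11 | 5 | 12 | 54.5 | 2 | 0 | 0 | 1 |
| 11 | 6 | 10 | 78.1 | 2 | 0 | 1 | 0 |
| 11 | 7 | 11 | 36.3 | 3 | 0 | 0 | 1 |
| 11 | 8 | 3 | 58.4 | 3 | 0 | 1 | 0 |
| 11 | 9 | 13 | 63.4 | 3 | 0 | 1 | 0 |
| 11 | 10 | 9 | 21.3 | 4 | 1 | 0 | 0 |
| 11 | 11 | 6 | 51.8 | 4 | 0 | 0 | 0 |
| 11 | 12 | 2 | 57.7 | 4 | 0 | 1 | 1 |
| 11 | 13 | 8 | 33 | 5 | 1 | 0 | 0 |
| 11 | 14 | 4 | 47.3 | 5 | 0 | 0 | 1 |
| 11 | 15 | 15 | 58 | 5 | 0 | 0 | 0 |
| 12 | 1 | 2 | 30.3 | 1 | 0 | 0 | 0 |
| 12 | 2 | 12 | 34.6 | 1 | 0 | 0 | 1 |
| 12 | 3 | 13 | 77 | 1 | 0 | 1 | 0 |
| 12 | 4 | 4 | 28.9 | 2 | 0 | 0 | 1 |
| 12 | 5 | 15 | 56.6 | 2 | 0 | 0 | 0 |
| 12 | 6 | 1 | 80.2 | 2 | 0 | 1 | 0 |
| 12 | 7 | 11 | 38.4 | 3 | 0 | 0 | 0 |
| 12 | 8 | 6 | 60.5 | 3 | 0 | 0 | 0 |
| 12 | 9 | 10 | 65.5 | 3 | 0 | 1 | 0 |
| 12 | 10 | 7 | 18.5 | 4 | 1 | 0 | 0 |
| 12 | 11 | 9 | 63.9 | 4 | 0 | 1 | 0 |
| 12 | 12 | 14 | 79.8 | 4 | 0 | 1 | 0 |
| 12 | 13 | 5 | 29.9 | 5 | 1 | 0 | 1 |
| 12 | 14 | 8 | 42.4 | 5 | 0 | 0 | 0 |
| 12 | 15 | 3 | 72.1 | 5 | 0 | 0 | 1 |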

Appendix A.11. Experimental Instructions

This is a translated version of the instructions for SellersMayApply-LyingPoss (original in German). Differences to the other treatments are included below.
  • Welcome to the Experiment
You are participating in a study on decision-making behavior in experimental economics. During the experiment, you and the other participants will be asked to make decisions. You can earn money in doing so. The amount you will earn depends on your and on the other participants’ decisions. At the end of the experiment, your earnings will be paid to you in cash. During the experiment, all amounts will be stated in the experimental currency “thaler” and will be converted into EUR at the end (4 thalers = 1 EUR). None of the other participants will receive information on your decisions or on your payoffs. All data will be used exclusively for research.
The experiment will last approximately 2.5 h. Please read the following instructions carefully. Should you have questions at any point in time, please raise your hand. (Participants who are in one of the cubicles with doors, please open the door so that we can see you raising your hand.) We will come to you and answer your question at your cubicle.
Participants’ roles 
The experiment consists of twelve rounds. There are two roles: sellers and buyers. First, it will be determined randomly which participants will be sellers and which will be buyers, and which sellers and buyers, respectively, receive which ID. You will keep your role and ID throughout the whole experiment. This means, if you were seller 1 in round 1, for example, you will remain seller 1 in all remaining rounds, or if you were buyer 1 in round 1, you will remain buyer 1 in all remaining rounds. There are 15 sellers and 8 buyers in total.
Sellers 
Sellers offer identical products each at a certain price, a certain quality and certain unit costs. A product belongs to one of five potential quality levels: 1 (poor), 2 (fair), 3 (satisfactory), 4 (good), 5 (very good); i.e., the higher the number, the higher a product’s quality. Sellers are not able to influence price, quality and unit costs. These will be assigned to them each round.
Buyers 
Buyers select one seller per round, from whom they can buy at most one product. (They also have the option not to buy a product.) Buyers value the quality of a product differently. You can find the buyers’ individual valuations in the following table. A buyer’s individual valuation is multiplied by the quality of the product purchased, thus influencing the earnings per round (for details, see paragraph “Earnings per round”). It remains the same for each buyer throughout the experiment. Once a buyer has selected a product, it is considered purchased; the seller’s consent is not required. It is possible for multiple buyers to buy from the same seller.
Available information 
At the beginning of each round, sellers are informed on the screen about their own price, unit costs and quality as well as the prices, unit costs and qualities of the other sellers. At the beginning of each round, buyers are informed about their own valuation of product quality (according to their buyer no. and the information in the table) and the prices of the products offered. It is also known that, in each round, 3 sellers offer products per quality level. This means that there are three sellers with poor product quality, three sellers with fair product quality, three sellers with satisfactory product quality, three sellers with good product quality and three sellers with very good product quality. In no round is it known whether there is a relationship between the price and quality of a product. In each round, assume there are five sellers from whom purchases were made most frequently in the past, but the reasons for this are unknown. At the beginning of each round, you will be informed which these five sellers are.
| Buyer ID | Individual valuation of quality |
|---|---|
| 1 | 3 |
| 2 | 3 |
| 3 | 7 |
| 4 | 7 |
| 5 | 11 |
| 6 | 11 |
| 7 | 15 |
| 8 | 15 |
Product testing 
The sellers are not able to influence price, quality and unit costs, but they have the opportunity to apply to a product testing organization such as Stiftung Warentest each round. The task of the product testing organization is to check the product quality and to disclose it to all buyers. This happens before the buyers decide which products to buy. If applicable, sellers must state the price and quality of their product (it is possible to lie about the quality) when applying, and pay an application fee of 0.50 thalers to the testing organization. Should the product test disclose that a seller stated a false quality, he will have to pay costs of 24.00 thalers for this false quality statement.
Testing capacity 
The capacity of the product testing organization is limited. Among the applicants, it selects a maximum of five sellers whose products it tests.
Step 1: The product testing organization first selects the sellers with the cheapest product per quality level for the test.
Step 2: Among these, products of a lower stated quality that cost the same as or more than a product of a better stated quality are excluded again. Only the remaining untested products will be tested. Products that have already been tested will never lose their testing slot.
Step 3: If, after the product test, it turns out that in this iteration at least one seller stated a false quality and if the maximum testing capacity has not yet been reached, the testing organization re-starts with step 1. If, after the product test, it turns out that all sellers stated the true quality in this iteration, no further products will be tested.
For step 1 and step 2, the testing organization uses the stated qualities of untested products (because their true quality is not yet known). From the second iteration onwards, if applicable, it uses the true qualities of the products already tested (because their true quality is then known). If fewer than five products are selected by step 1 and step 2, fewer products will be tested accordingly. Should there ever be more applicants selected than the number of remaining testing slots, a random selection will be made among these applicants. The application fee of 0.50 thalers must be paid regardless of whether a product will eventually be tested or not. You can find an overview on page 4.
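A single pass through step 1 and step 2 can be sketched in Python as follows. This is only an illustration, not the software used in the experiment: it assumes that all stated qualities are true, so the re-start in step 3 and the random selection when slots run out are not modelled. The names `Application` and `select_for_testing` are hypothetical, and the example assumes, purely for illustration, that all 15 sellers of market 4 (Table A3) apply.

```python
from typing import NamedTuple


class Application(NamedTuple):
    seller: int          # f_t in Table A3
    price: float
    stated_quality: int  # 1 (poor) to 5 (very good)


def select_for_testing(applications):
    """One pass of step 1 and step 2, based on stated qualities."""
    # Step 1: keep the cheapest applicant per stated quality level.
    cheapest = {}
    for app in applications:
        current = cheapest.get(app.stated_quality)
        if current is None or app.price < current.price:
            cheapest[app.stated_quality] = app
    candidates = list(cheapest.values())

    # Step 2: exclude candidates that cost the same as or more than
    # a candidate of a better stated quality.
    selected = [
        a for a in candidates
        if not any(b.stated_quality > a.stated_quality and b.price <= a.price
                   for b in candidates)
    ]
    return sorted(selected, key=lambda a: a.stated_quality)


# Market 4 (Table A3), hypothetically with every seller applying truthfully.
prices = [7.3, 13.1, 90.5, 5.4, 6.1, 88.0, 11.4, 35.1, 40.3,
          21.8, 23.5, 41.0, 36.0, 44.9, 36.2]
applications = [
    Application(seller=i + 1, price=p, stated_quality=i // 3 + 1)
    for i, p in enumerate(prices)
]

selected = select_for_testing(applications)
print([a.seller for a in selected])  # [4, 7, 10, 13]
```

In this example, the cheapest product of quality level 1 (price 7.3) is excluded in step 2 because a product of quality level 2 is cheaper (price 5.4), so only four products would be tested.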
Earnings per round 
Each participant receives an initial endowment of 100 thalers. The earnings are determined as follows:
  Earnings seller per round:
  initial endowment
+(price × number of products sold)
−(unit costs × number of products sold)
−if applicable, application fee
−if applicable, fee for stating false quality
  Earnings buyer per round:
  initial endowment
+(quality × ind. valuation of quality)
−price
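The two earnings rules above can be written as a short Python sketch. The fee levels and the endowment are taken from these instructions; the function and parameter names are hypothetical, and the unit costs in the usage example are an assumed value, since unit costs are not reported in Tables A3–A5.

```python
INITIAL_ENDOWMENT = 100.0   # thalers
APPLICATION_FEE = 0.50      # thalers, paid by every applicant
FALSE_QUALITY_FEE = 24.00   # thalers, paid if a false quality statement is detected


def seller_earnings(price, unit_costs, units_sold,
                    applied=False, false_quality_detected=False):
    """Seller earnings per round in thalers."""
    earnings = INITIAL_ENDOWMENT + (price - unit_costs) * units_sold
    if applied:
        earnings -= APPLICATION_FEE
    if false_quality_detected:
        earnings -= FALSE_QUALITY_FEE
    return earnings


def buyer_earnings(valuation, quality=None, price=None):
    """Buyer earnings per round in thalers; quality=None means no purchase."""
    if quality is None:
        return INITIAL_ENDOWMENT
    return INITIAL_ENDOWMENT + quality * valuation - price


# Buyer 5 (valuation 11) buys a product of quality 4 at a price of 21.8.
print(round(buyer_earnings(11, quality=4, price=21.8), 2))  # 122.2

# A truthful applicant sells two units at 21.8 with assumed unit costs of 10.
print(round(seller_earnings(21.8, 10.0, 2, applied=True), 2))  # 123.1
```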
In the course of the experiment, sellers will be asked how they think other participants will behave. For each answer that is correct, a seller will receive an additional 0.50 thalers in the corresponding round. Sellers only receive feedback on how many of their beliefs were correct for the payoff-relevant round at the end of the experiment.
Payment 
Once all twelve rounds have been completed, the computer randomly selects one of the twelve rounds to be payoff-relevant for all participants. The other rounds are not taken into account for the payment. At the end of the experiment, each participant will receive the amount of money they have earned in the payoff-relevant round, converted into EUR (4 thalers = 1 EUR). If applicable, the amount is rounded up to a multiple of 0.10 EUR.
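The conversion and rounding rule can be illustrated with a small Python sketch. The function name `payout_eur` is hypothetical; decimal arithmetic is used here only to avoid binary floating-point artefacts when rounding up to the next multiple of 0.10 EUR.

```python
from decimal import Decimal, ROUND_CEILING

THALERS_PER_EUR = Decimal(4)


def payout_eur(thalers):
    """Convert thalers to EUR (4 thalers = 1 EUR), rounded up to a multiple of 0.10 EUR."""
    eur = Decimal(str(thalers)) / THALERS_PER_EUR
    tenths = (eur * 10).to_integral_value(rounding=ROUND_CEILING)
    return tenths / Decimal(10)


print(payout_eur(122.2))  # 30.6  (30.55 EUR rounded up)
print(payout_eur(100))    # 25
```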
Comprehension questions 
Please click on “continue” on the screen when you have finished reading the instructions and have no further questions up to this point. The experiment starts on the screen with comprehension questions. These comprehension questions are intended to help you become familiar with the decision-making situation. If you have any questions, please raise your hand. (Participants who are in one of the cubicles with doors today, please open the door so that we can see you raising your hand.) We will then come to you and answer your question at your cubicle. Once all participants have correctly answered the comprehension questions, round 1 of the experiment will start.
Technical note 
For technical reasons, please enter a dot instead of a comma to separate decimal places in numbers if applicable.
[Figure: Testing capacity and selection of products to be tested]
  • Differences to other treatments
  • Treatment SellersMayApply-Truth
Product test 
The sellers are not able to influence price, quality and unit costs, but they have the opportunity to apply to a product testing organization such as Stiftung Warentest each round. The task of the testing organization is to check the product quality and to disclose it to all buyers. This happens before the buyers decide which products to buy. If applicable, sellers must state the price and quality of their product (it is not possible to lie) when applying, and pay an application fee of 0.50 thalers to the testing organization.
Testing capacity 
The capacity of the product testing organization is limited. Among the applicants, it selects a maximum of five sellers whose products it tests. The testing organization first selects the sellers with the cheapest product per quality level for the test. Among these, products of a lower stated quality that cost the same as or more than a product of a better stated quality will not be tested. If fewer than five products are selected via this method, fewer products will be tested accordingly. The application fee of 0.50 thalers must be paid regardless of whether a product will eventually be tested or not.
  Earnings seller per round:
  initial endowment
+(price × number of products sold)
−(unit costs × number of products sold)
The figure “Testing capacity and selection of products to be tested” was not included.
  • Treatments SellersMayNotApply-WorstCase and SellersMayNotApply-Random
Product testing 
Each round, a product testing organization such as Stiftung Warentest tests certain products. The task of the testing organization is to check the product quality and to disclose it to all buyers. This happens before the buyers decide which products to buy.
Testing capacity 
The capacity of the product testing organization is limited. It selects five sellers whose products it tests, namely each round the five sellers from whom the most frequent purchases were made in the past.
  Earnings seller per round:
  initial endowment
+(price × number of products sold)
−(unit costs × number of products sold)
The figure “Testing capacity and selection of products to be tested” was not included.

Appendix A.12. Screenshots of Main Decision Situations in z-Tree

These are translated z-Tree screenshots of the main decision situations for SellersMayNotApply-WorstCase and SellersMayApply-LyingPoss (original in German).
Figure A11. SellersMayNotApply-WorstCase seller’s screenshot.
Figure A12. SellersMayNotApply-WorstCase buyer’s screenshot.
Figure A13. SellersMayApply-LyingPoss seller’s screenshot.
Figure A14. SellersMayApply-LyingPoss buyer’s screenshot.

Notes

1
Product quality is a multidimensional construct comprising horizontal and vertical dimensions. Horizontal quality dimensions are subjective. More precisely, while it may be possible to objectively specify horizontal quality dimensions, consumers differ in their preferences about them (Hotelling, 1929). For example, horizontal dimensions of a stroller’s quality include its color. While it is possible to objectively specify a certain color, e.g., by using a spectrophotometer, consumers have different preferences over colors. In contrast, vertical quality dimensions are objectively rateable. To illustrate, vertical dimensions of a stroller’s quality include its weight, how waterproof the raincover is, and the level (if any) of toxic substances contained in its materials. Note that these vertical dimensions usually contain search, experience, and credence characteristics (Nelson, 1970; Darby & Karni, 1973). For a stroller, a search characteristic would be its weight since a stroller’s weight can be determined before purchasing it. An experience characteristic would be how waterproof the raincover is since this is usually observable only after use. A credence characteristic would be how many toxic substances are contained in the fabric since consumers are usually not able to observe this amount even after having purchased the stroller.
2
In our opinion, most products contain some amount of horizontal and some amount of vertical quality dimensions while the relevance of each one may differ. This paper focuses on products whose vertical quality dimensions are at least as relevant for buyers as its horizontal ones, e.g., toothpaste, strollers, or grills. We do not analyze markets for products whose horizontal quality dimensions are more relevant for buyers than its vertical ones, e.g., fiction movies or books. Note that, while online consumer ratings for such products can be found on websites like amazon.com or imdb.com, independent consumer organizations usually do not test fiction movies or books.
3
See https://www.international-testing.org/members.html?section=icrt_shareholders (accessed on 16 December 2023) for a detailed list of world-wide independent consumer organizations.
4
We are aware that buyers also use other proxies for quality, e.g., online consumer ratings (Rao & Monroe, 1989 and De Langhe et al., 2016). While online consumer ratings are often readily available, they are problematic since, most importantly, they usually do not include credence characteristics such as toxic substances in food, cosmetics, or clothing, or under which working conditions a product was manufactured. Second, online consumer ratings often include not only vertical, but also horizontal quality dimensions although the latter are, by definition, not objectively rateable. Third, fake ratings constitute a real problem, even among verified purchases (Mayzlin et al., 2014 and Which?, 2023a). Interestingly, online consumer ratings have also been shown to correlate poorly with ratings provided by independent consumer organizations (De Langhe et al., 2016 and Köcher & Köcher, 2018). Some buyers also use price as a proxy for quality. Yet, it seems to be a poor proxy as only moderately positive, zero, or even negative correlations between product quality and price have been found repeatedly (see Ratchford et al., 1996; Olbrich & Jansen, 2014 for overviews, and the following for country-specific studies: Oxenfeldt, 1950 and De Langhe et al., 2016; Diller, 1977, 1988; Yamada & Ackerman, 1984; Bodell et al., 1986; Steenkamp, 1988; Kirchler et al., 2010). We acknowledge that these correlational results are sensitive to the weights of quality dimensions which are, to some degree, arbitrary. Yet, testing results published by Consumer Reports show that more than half of all tested product models are dominated on all quality dimensions (Hjorth-Andersen, 1984).
5
We use “product” (“product model”) as the more general (specific) term. Usually, several product models belong to one certain type of product, e.g., several smartphone models belong to the product smartphone. Furthermore, we use “game” or “theoretical framework” instead of “theoretical model” in this paper to avoid confusion. Note that independent consumer organizations do not only face capacity constraints as to which product models, but also as to which products to select for testing. This study focuses on the problem of which product models to select.
6
Professional Sports Authenticator (PSA) is one of the largest card grading services world-wide (for more information, see https://www.psacard.com/services/tradingcardgrading, accessed on 6 December 2023).
7
USDA organic is a label issued by the US Department of Agriculture (for more information, see https://www.usda.gov/media/blog/2012/03/22/organic-101-what-usda-organic-label-means, accessed on 6 December 2023). Blauer Engel is the ecolabel of the federal government of Germany (for more information, see https://www.blauer-engel.de/en, accessed on 6 December 2023). Note that another difference is that their labels are binary (either USDA organic or Blauer Engel certified, or not) whereas independent consumer organizations provide ordinally scaled ratings which often include at least three to five levels.
8
Our setting differs from theoretical papers which find that price may signal product quality, e.g., Bagwell and Riordan (1991). We analyze situations in which all buyers have access to quality information about a share of product models. By contrast, Bagwell and Riordan (1991) analyze situations in which a share of buyers is informed about all product models’ quality (see Appendix A.5 for a visual comparison). Moreover, we assume buyers know that producing a higher quality is more expensive, but they do not know the precise production function. By contrast, Bagwell and Riordan (1991) assume the production function is common knowledge. Note that, when starting out from our markets in which quality and price are uncorrelated considering all product models, they will be positively correlated under our new mechanism when considering only the set of tested product models.
9
Note that there is only one seller $f_Z^c$ in the set of sellers $Z$ since identical prices are ruled out by assumption.
10
While the literature on certification consistently includes independent consumer organizations in the larger set of “certifiers”, it is less consistent in how a “certified product” is defined. Therefore, unless noted otherwise, we use “tested” when referring to a product model whose quality has been tested and publicly revealed. Note that a “tested product model” may also be of the lowest possible quality, whereas “certified product model” might imply that a certain quality threshold has been reached (see, for instance, Bizzotto & Harstad, 2023).
11
Note that, therefore, we also refrain from modeling a principal-agent relationship between buyers and the certifier like Carroll (2019) and Szalay (2005) do.
12
See Ispano and Schwardmann (2023) for a model with boundedly rational buyers.
13
Note that we use an indicator function to describe a buyer’s utility with a single function. If the condition in braces is true, the indicator variable equals one. If the condition is not fulfilled, the indicator variable equals zero.
14
Note that, from a buyer’s perspective, CompleteInformation is identical to a setting in which a certifier were able to test all existing product models, i.e., $K = F$. However, from an overall economic perspective, the latter setting would include certifying costs, which we abstract from and do not analyze in this paper.
15
Importantly, sellers do not send items of their product model to the certifier since these items might be of higher quality than those actually sold to buyers. As mentioned in Section 1, independent consumer organizations usually use their own test-buyers who purchase products anonymously, and this procedure remains unchanged under SellersMayApply. By contrast, sellers of baseball cards do send their actual cards to, e.g., PSA for rating.
16
Currently, Stiftung Warentest, Consumer Reports and Which? collect the prices of all product models to be tested. If prices vary for a certain product model, Stiftung Warentest calculates the mean price.
17
Note that the overall cheapest seller $f_F^c$ is always part of $ND$, but he may or may not belong to $\overline{ND}$. More precisely, we would have to write $\overline{ND} \setminus f_F^c$ if $f_F^c \in \overline{ND}$, and $\overline{ND}$ if $f_F^c \notin \overline{ND}$, respectively. For better readability, we still always write $\overline{ND} \setminus f_F^c$ in the following even if $f_F^c \notin \overline{ND}$.
18
In Section 3.2.3, we derive how buyers calculate the expected quality of an untested product model under SellersMayApply.
19
Note that E q 7 SellersMayApply neither contains the qualities of tested sellers, nor quality levels equal to or higher than 4. See Section 3.2.3 for a general explanation of why an untested seller f t ’s expected quality under SellersMayApply neither contains the qualities of tested sellers, nor quality levels equal to or higher than those of tested sellers in N D ¯ offering higher quality levels than f t (if any).
20
See Appendix A.7.2 for details.
21
More specifically, “no gaps within $ND$” implies that there is one seller for the highest quality level in $ND$, one seller for the second-highest quality level in $ND$, …, one seller for the second-lowest quality level in $ND$, and one seller for the lowest quality level in $ND$.
22
Note that only in the third version of the game decisions are interdependent. Therefore, strictly speaking, the first two versions of the game may not be defined as a “game”.
23
The lowest price range contains all prices lower than the price of the cheapest tested product model. The second-lowest price range contains all prices between those of the cheapest and the second-cheapest tested product model. …The second-highest price range contains all prices between those of the second most expensive and the most expensive tested product model. The highest price range contains all prices higher than the most expensive tested product.
24
In this case, the unique Nash equilibrium is slightly different:
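$(\text{apply with } q_1, \ldots, \text{apply with } q_{(\#\overline{ND}-1)}, \text{do not apply}, \ldots, \text{do not apply}, \text{buy product model of seller } f_{\tilde{F}_1}, \ldots, \text{buy product model of seller } f_{\tilde{F}_s})^T$,
with $f_{\tilde{F}_1}, \ldots, f_{\tilde{F}_s} \in \overline{ND}$, and with $\#\overline{ND}$ denoting the overall cheapest seller.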
25
We are aware that this presents an extreme scenario. However, in this scenario, sellers with the most dominated product models would earn relatively high profits per sold unit given that their prices are highest among product models of the same quality. This relatively greater profit could be spent on advertising to attract more buyers.
26
More precisely, the share of globally non-dominated product models among all tested product models is 21.7% in this random scenario, while the mean share of globally non-dominated product models in all markets is 23.3%.
27
Our experiment is conducted in Germany, where Stiftung Warentest uses five different verbal quality ratings (very good, good, satisfactory, fair and poor) for product models; thus, subjects are likely familiar with a five-item rating scale. In addition to these verbal ratings, Stiftung Warentest also publishes more precise numerical ratings ranging from 1.0 to 5.0.
28
Note that “globally non-dominated” includes sellers with $q_t = 1$ for markets 1 to 3 in which five globally non-dominated product models exist. While we do not expect these sellers to apply for testing, we include them in our analysis to be consistent with markets 4 to 12 in which all sellers with globally non-dominated product models are predicted to apply to be tested.
29
To the best of our knowledge, this question has not been analyzed again.
30
For example, two product models rated “good” by Stiftung Warentest were temporarily not available for purchase soon after the ratings had been published: the frying pan “Gastro Sus Diamas Pro Industar” rated in the 01/2021 magazine, and the coffee machine “Philipps EP5447/90” rated in the 01/2022 magazine (see Appendix A.4 for details).
31
Note that $E_{q_1}^{\text{SellersMayApply}}$ neither contains the qualities of tested sellers, nor quality levels equal to or higher than 3 (2) if seller $f_4$ is untested (tested). See Section 3.2.3 for a general explanation of why an untested seller $f_g$’s expected quality under SellersMayApply neither contains the qualities of tested sellers, nor quality levels equal to or higher than those of tested sellers in $\overline{ND}$ offering higher quality levels than $f_g$.
32
Note that $E_{q_4}^{\text{SellersMayApply}}$ neither contains the qualities of tested sellers, nor quality levels equal to or higher than 3. See Section 3.2.3 for a general explanation of why an untested seller $f_g$’s expected quality under SellersMayApply neither contains the qualities of tested sellers, nor quality levels equal to or higher than those of tested sellers in $\overline{ND}$ offering higher quality levels than $f_g$.

Figure 1. Example markets.
Figure 2. SellersMayApply algorithm to select product models for testing from the set of applicants (formal description in Figure A7).
Figure 3. Share of sellers who do or do not apply to be tested, stating a true or false quality (if applicable). Note: For all treatment comparisons, we report the results of two-sided Mann–Whitney U tests, conservatively counting one experimental session as one independent observation. We denote p-values as follows: *** < 0.01, ** < 0.05, and * < 0.1.
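The testing approach described in the figure notes (two-sided Mann–Whitney U tests with one session as one independent observation) can be illustrated with a minimal sketch. The session-level values below are hypothetical and are not data from the experiment; the function names are illustrative only, and an exact permutation p-value is assumed here, which may differ from the variant used for the reported results.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic of sample x: count of pairs with x_i > y_j (ties count 0.5)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating all group assignments."""
    pooled = list(x) + list(y)
    n_x = len(x)
    center = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count assignments at least as extreme as the observed one.
        if abs(mann_whitney_u(group_x, group_y) - center) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical session-level means, five sessions per treatment (NOT real data).
treatment_a = [0.62, 0.58, 0.71, 0.66, 0.60]
treatment_b = [0.41, 0.39, 0.52, 0.47, 0.44]

u_stat = mann_whitney_u(treatment_a, treatment_b)
p_value = exact_two_sided_p(treatment_a, treatment_b)
print(u_stat, round(p_value, 4))
```

With five independent observations per treatment, there are 252 possible group assignments, so the smallest exact two-sided p-value is 2/252 ≈ 0.0079. Under this assumed exact test, the 1% level is reached only when the session-level values of the two treatments do not overlap at all, as in the hypothetical data above.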
Figure 4. Share of globally (non-)dominated product models in the product test. Note: For all treatment comparisons, we report the results of two-sided Mann–Whitney U tests, conservatively counting one experimental session as one independent observation. We denote p-values as follows: *** < 0.01, ** < 0.05, and * < 0.1.
Figure 5. Share of buyers choosing nothing, globally dominated or globally non-dominated sellers. Note: For all treatment comparisons, we report the results of two-sided Mann–Whitney U tests, conservatively counting one experimental session as one independent observation. We denote p-values as follows: *** < 0.01, ** < 0.05, and * < 0.1.
Figure 6. Mean per capita consumer surplus and profits. Note: For all treatment comparisons, we report the results of two-sided Mann–Whitney U tests, conservatively counting one experimental session as one independent observation. We denote p-values as follows: *** < 0.01, ** < 0.05, and * < 0.1.
Table 1. Sequence of the game.
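| | CompleteInformation | SellersMayNotApply | SellersMayApply |
|---|---|---|---|
| Stage 1 | (Sellers are passive.) | (Sellers are passive.) | Sellers may apply for testing. |
| Stage 2 | (Certifier is passive.) | Certifier tests subset of product models. | Certifier tests subset of product models. |
| Stage 3 | Buyers decide which product model to buy (if any). | Buyers decide which product model to buy (if any). | Buyers decide which product model to buy (if any). |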
Table 2. Number of sellers and buyers per session, and number of sessions and participants per treatment.
| Treatment | Sellers per Session | Buyers per Session | Number of Sessions/Independent Observations | Participants |
|---|---|---|---|---|
| SellersMayNotApply-WorstCase | 15 | 8 | 5 | 115 |
| SellersMayNotApply-Random | 15 | 8 | 5 | 115 |
| SellersMayApply-LyingPoss | 15 | 8 | 5 | 115 |
| SellersMayApply-Truth | 15 | 8 | 5 | 115 |
| SellersMayApply-Truth (with beliefs about buyer behavior) | 15 | 8 | 5 | 115 |
| Total | | | 25 | 575 |
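The participant counts in Table 2 follow from (sellers + buyers) × sessions in each treatment. The short sketch below checks this arithmetic; the variable names are illustrative, and the numbers are taken directly from the table.

```python
# Rows of Table 2: (treatment, sellers per session, buyers per session,
# number of sessions, participants).
rows = [
    ("SellersMayNotApply-WorstCase", 15, 8, 5, 115),
    ("SellersMayNotApply-Random", 15, 8, 5, 115),
    ("SellersMayApply-LyingPoss", 15, 8, 5, 115),
    ("SellersMayApply-Truth", 15, 8, 5, 115),
    ("SellersMayApply-Truth (with beliefs about buyer behavior)", 15, 8, 5, 115),
]

# Each session has 15 sellers and 8 buyers, so 23 participants per session.
for name, sellers, buyers, sessions, participants in rows:
    assert (sellers + buyers) * sessions == participants, name

total_sessions = sum(row[3] for row in rows)
total_participants = sum(row[4] for row in rows)
print(total_sessions, total_participants)  # 25 575
```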

Share and Cite

MDPI and ACS Style

Vollstädt, U.; Imcke, P.; Brendel, F.; Ehses-Friedrich, C. Test Me If You Can—Providing Optimal Information for Consumers Through a Novel Certification Mechanism. Games 2025, 16, 44. https://doi.org/10.3390/g16050044

