1. Introduction
From the perspective of economics, car repairs, financial advice, tech support, and medical services have one thing in common. In these markets, consumers rely heavily on the reputation or expertise of the supplier providing goods and services. These types of services are referred to as credence goods. Credence goods are specialized services with inherent uncertainty, making it challenging for consumers to assess the quality of the services provided, even after consuming them [
1]. In such markets, suppliers/providers have the advantage of information and knowledge, which could be used to increase the utilization of services by customers, even beyond what the customer needs, a phenomenon known as supplier-induced demand [
2]. The interaction between supplier-induced demand and payment systems, such as fee-for-service (FFS) and salary-based systems, can significantly affect providers’ behavior and the quality and quantity of services provided. In this study, we use a controlled laboratory setting to examine the impact of different payment systems on supplier-induced demand in credence goods markets. Specifically, we aim to identify the effect that payment systems, mainly fee-for-service and salary, have on the quantity and the quality of care provided by experts.
The inherent information asymmetries in credence goods markets make certain inefficiencies unavoidable. These include overtreatment (experts provide more goods and services than required), undertreatment (experts provide fewer goods and services than required) and overcharging (charging for a higher quality than the goods and services provided) [
3,
4]. This experiment focuses on unpacking the connection between payment systems and the first two types of inefficiencies: overtreatment and undertreatment.
Payment systems are likely to impact the types of inefficiencies that arise from a credence good problem. The literature is rife with examples [
3,
5]. We investigate arguments regarding responses to two payment types—salary and fee-for-service—in this study. The literature on credence goods and contract theory argues that providers in a salary system may under-provide services or provide inferior-quality services due to the lack of financial incentives [
6,
7]. In this type of system, providers receive a fixed salary regardless of the number of services they provide, which may reduce their motivation to provide services to their fullest extent, particularly when the effort is costly. Providers have no financial incentive to increase the number of services or the quality of services they provide, as their salary is not directly tied to their output. Additionally, in a salary system, providers may have limited flexibility in the services they can provide, as their scope of services is defined by the salary agreement. This can result in a reduction in the quality of services provided, as providers may not have the ability to tailor their services to meet the needs of individual consumers. Hart, Shleifer, & Vishny [
7] argue that providers in a salaried system have little incentive to engage in activities that could either enhance efficiency or reduce costs. Arrow [
8], however, argues that doctors care about their patients and often work hard, despite the effort costs for treating them.
Arrow’s observations highlight a key source of interest for us. Arrow [
8] notes that doctors are agents for themselves (or for the companies that hire them) but, given the context, also agents for the patients. This dual-agent role can incentivize them to forgo their own profits if pursuing them would undermine the health outcome of the patient. Arrow notes that some doctors will even charge nothing to patients who cannot afford to pay for services. The potential life-or-death aspect of healthcare seemingly puts it into a separate category of credence goods. As described below, numerous experimental results show that subjects who make decisions under a medical framing tend to be more patient-oriented and less profit-oriented than those who see a neutral framing. Similarly, medical students tend to be less profit-oriented than non-medical students. To the best of our knowledge, economic lab experiments investigating the interplay between credence goods markets and payment systems have focused exclusively on healthcare markets, with the exception of Bejarano et al. [
9]. Such experiments, even when conducted with a neutral framing and a general subject pool, have been interpreted in the context of healthcare. This emphasis on healthcare is due to the unique characteristics of credence goods in the healthcare sector and the significant impact payment systems can have on provider behavior and patient outcomes. On the other hand, lab experiments investigating general credence goods tend to emphasize concerns such as verifiability, liability, competition, and fraud.
We are curious as to how much concern for the customer or job satisfaction might prompt actions that are less than profit-oriented outside of the healthcare context. Hart, Shleifer, and Vishny [
7] note that the more a salaried worker’s actions are associated with the worker (they note difficulties in replacing a worker with another as a factor in measuring this trait), the less likely that worker is to take actions that are not beneficial to the customer. We are interested in the extent to which providers in either payment system might be motivated to engage in costly effort to solve the problems their customers face. They may also be willing to do so because they enjoy the tasks or gain personal satisfaction from them. As one example, tenured professors continue to engage in teaching and research, which suggests that financial incentives may not be the sole decisive factor in explaining behavior.
1.
On the contrary, economic theories [
10,
11,
12] assume that providers in an FFS system may over-provide services due to the financial incentives to increase their revenue. In an FFS system, providers receive payment for each service they provide, creating a direct financial incentive to increase the number of services they provide. This can lead to the over-utilization of services, as providers have a financial incentive to provide services even when they may not be necessary. In addition, providers in an FFS system may be incentivized to recommend treatments or procedures that are not necessarily in the best interest of the patient but that generate additional revenue for the provider. This phenomenon, known as “supplier-induced demand”, can result in the over-utilization of services and increased costs—for example, in the healthcare sector [
2], but also in taxi services or car repair services [
3,
5].
Financial incentives and inefficiencies in credence goods markets are not solely driven by payment systems. In fact, the presence of insurance further exacerbates the economic inefficiencies in markets for credence goods [
4]. Studies have explored the role of insurance in providing financial incentives for providers to increase the volume of services they offer in these markets [
4,
13,
14,
15]. For example, in the healthcare industry, insurance coverage can give providers more freedom to increase their prices or the number of services they provide, as they are less bound by ethical and psychological considerations [
16]. Empirical evidence supports this notion, with studies showing that when patients are fully insured, providers tend to recommend more expensive treatments and drugs [
17,
18,
19]. Research has also found that providers are more likely to perform additional services when they are reimbursed by insurance or a third party compared to when patients are paying out of pocket [
13,
20]. However, the extent to which insurance affects provider behavior may differ between payment systems, as provider behavior is influenced by a multitude of factors, including financial incentives, institutional structures, and cultural norms. Our hypothesis is that the role of insurance in influencing provider behavior is greater in an FFS system as compared to a salaried system.
To assess the potential effect of different payment systems and of job interest or satisfaction on worker behavior in a non-healthcare context, we designed an experiment that was similar to the seminal Hennig-Schmidt et al. [
21,
22,
23] experiments but differed in that we use neutral framing and non-medical students. As described below, we also use a real-effort task for the providers to assess the effect of interest in the job, or job satisfaction, on provider behavior. Real-effort tasks allow us to test whether individuals will exhibit a degree of dedication towards tasks that are semi-enjoyable, even if doing so requires an investment of personal resources.
Additionally, in our experiment, hypothetical customers have two different types of insurance coverage. Our interest in service provision extends beyond the issues of over- and under-treatment. We are interested in the impact of the payment system and of semi-enjoyable tasks on the quality and quantity of services provided. We aim to assess whether any of the more altruistic behavior seen in a medical framing context can be seen in a non-medical context.
Our research demonstrates that payment systems have an impact on the behavior of providers in credence goods markets, in terms of both the quality and the number of services provided. We observed that the mean deviation from the optimal number of tasks required was lower in the salaried system than in the FFS system. When accounting for all customer types, in the FFS treatment, customers received optimal services only 9.6% of the time, compared to 25.2% under the salaried treatment. However, when not accounting for customer type, we found that providers in the salaried treatment were under-providing services, while those in the FFS treatment were significantly over-providing. After controlling for the customer type, we observed variation in provision decisions based on the customer type and the optimal number of services needed to maximize customer benefits only in the salary treatment. However, when a customer required a higher number of services, providers in the salaried treatment were generally unwilling to meet that need optimally in terms of quantity. Interestingly, under the FFS treatment, we did not observe a difference in provision decisions based on customers’ insurance status. In considering benefit losses and accounting for customer characteristics, we found significant differences between the two payment types, with losses for customers in the FFS treatment exceeding those in the salaried treatment. Interestingly, we find the quality of service to be higher under the salary system than under the FFS system. Quality under FFS deteriorates in the later stages of the game, whereas under salary, quality improves as the game progresses. We also note that, on average, experts in an FFS system perform more faulty procedures on customers without insurance. This was not possible by design in the experiment by Hennig-Schmidt et al. [
21].
Our research, conducted in a controlled laboratory environment, provides valuable insights into the management of credence goods markets, particularly regarding the impact of information asymmetries on market outcomes. Additionally, we examine the role that insurance and third-party payment systems play in shaping provider behaviors.
Balafoutas and Kerschbamer [
5] note that most of the early literature on credence goods was focused on the healthcare sector. They note that the general property of a credence good problem is one in which there is asymmetric information such that the provider knows more than the customer about what the customer truly needs. In these cases, there is the potential that either the provision of goods or services does not fit the needs of the customers or that the provider charges for goods or services that have not been provided [
5]. In this context, healthcare credence problems are not strongly different from other credence goods problems such as those involving auto mechanics or taxi drivers. A doctor, for instance, has the same profit incentive to engage in supplier-induced demand and prescribe unnecessary tests and treatments as a taxi driver may have to take a longer than necessary route to a destination. As noted, however, there is reason to suspect that ethical considerations in the medical profession may affect behavior and that, if left to make their own decisions, individual doctors might be less motivated by profit and more by patient wellbeing
2. Green [
25] notes that physicians are motivated to solve patient problems. Hennig-Schmidt et al. [
21] find, in experiments, that medical students are less inclined to focus on their own profit if doing so is detrimental to patient welfare. As such, our interest is to explore the reactions to a general service provider–customer credence good problem in order to compare the results to experiments based on a doctor–patient credence good problem.
Our findings hold relevance to general credence goods. They offer valuable information that can inform the design and assessment of credence goods systems and their impact on market outcomes and costs. By assessing the impact of differential payment systems on provider incentives, our study contributes to the credence good literature by isolating the impact of payment systems on service provision decisions and customer welfare. Methodologically, our study highlights the growing role that experimental economics plays in the economics of credence goods.
Our paper proceeds as follows. In
Section 2, we discuss the relevant literature briefly. In
Section 3, we introduce and explain our hypotheses.
Section 4 explains the experimental design.
Section 5 presents the results, and
Section 6 discusses our findings and offers concluding remarks.
3. Hypotheses
Hypothesis 1. Subjects assigned to fee-for-service will perform more tasks on average than those assigned to salary.
The payment models of fee-for-service and salary can influence the incentives and behavior of providers, leading to potential differences in the average number of tasks performed. In the fee-for-service system, providers’ profits are directly tied to the quantity of tasks performed. They receive financial compensation for each task, creating a financial incentive to increase the number of services provided. Providers may be motivated to maximize their earnings by performing more tasks, even if they are not strictly necessary or beneficial. This can result in an over-provision of services in the fee-for-service system, as providers aim to increase their revenue by performing as many tasks as possible. On the other hand, the salary payment model offers providers a fixed salary, regardless of the number of tasks performed. Providers receive a predetermined amount of compensation, independent of the quantity of services rendered. In this model, there may be less financial motivation for providers to increase the number of tasks, as their earnings are not directly linked to the quantity of services provided. This can potentially result in a lower average number of tasks performed in the salary system compared to in the fee-for-service system.
Hypothesis 2. Customers will face losses in both fee-for-service and salary systems.
In credence goods markets, where information asymmetry exists between providers and customers, customers face the risk of incurring losses due to inadequate or unnecessary services. Both fee-for-service and salary payment models can potentially result in losses for customers, albeit through different mechanisms.
In the fee-for-service system, providers are incentivized to maximize their profits by performing more tasks, as their earnings are directly tied to the quantity of services performed. This can create a risk of over-treatment, where providers may recommend or perform unnecessary tasks to increase their revenue. These unnecessary services can lead to losses for customers who bear the costs of the additional tasks without deriving corresponding benefits. On the other hand, in the salary system, providers receive a fixed salary regardless of the number of tasks performed. While this model may mitigate the risk of over-treatment, it does not guarantee protection against losses. Providers may be less motivated to go above and beyond their duties or invest additional effort into delivering high-quality services since their compensation is not directly tied to the quantity of tasks performed. This lack of incentive may result in the under-treatment or neglect of customer needs, which can also cause losses for customers. Accordingly, there will be no significant difference in the losses that customers face between fee-for-service and salary systems.
Hypothesis 3. Service provision decisions will not be influenced by customer needs and benefits.
Credence goods markets are characterized by information asymmetry, where providers possess specialized knowledge and expertise that customers may lack. In these markets, providers have the potential to overstate the need for treatments, which can be advantageous for them in terms of financial gain [
2,
38]. Arrow [
8] argued that medical ethics and pro-social motivations will constrain such behavior, but in a general framing context, we should not expect to see altruistic behavior. Considering the economic theory of agency and the potential for information asymmetry, our hypothesis speculates that service provision decisions in credence goods markets will not be significantly influenced by customer needs and benefits. Providers may prioritize their own interests and financial gains over aligning their decisions with the specific needs and benefits of individual customers. The issue of overstatement in credence goods markets arises from the inherent difficulty customers face in accurately assessing the quality and necessity of services. Providers, being more knowledgeable about the nature of the service, may exploit this information advantage to maximize their own profits. As a result, customer needs and benefits may not play a substantial role in shaping service provision decisions.
Hypothesis 4. The quality of services in the fee-for-service system will be no better or worse than that in the salary payment model.
The salary payment model offers providers a fixed salary regardless of the number of tasks performed. Without direct financial incentives tied to service quantity, providers may have less motivation to increase the quantity of services they provide. However, this does not automatically guarantee that the quality of services will be superior. Providers might lack the personal drive to go above and beyond or improve the quality of care if they do not perceive a direct benefit or reward for doing so. In contrast, providers in the FFS system have strong incentives to improve the quantity of services they offer. However, without repeated interaction between providers and customers, there is no incentive for quality. Furthermore, without a situation of personal benefit from the quality
10 of what they produce, providers have little internal motivation to increase quality [
7]. Additionally, in the fee-for-service payment model, providers are incentivized to maximize the quantity of services they offer in order to increase their profits. The more tasks performed, the higher their financial returns. However, this focus on quantity may not necessarily align with the quality of services provided. Providers may be motivated to prioritize the volume of tasks over the thoroughness or appropriateness of care, potentially leading to variations in quality. Considering these factors, our hypothesis suggests that the quality of services in the fee-for-service system will be comparable to that in the salary payment model.
Hypothesis 5. In the fee-for-service treatment, the mean deviation from the ideal number of tasks will be higher when experts serve customers with insurance coverage.
Credence goods markets, such as healthcare, often involve complex decision-making processes influenced by multiple factors, including insurance coverage. In healthcare, insurance coverage plays a crucial role in shaping provider decisions and patient outcomes [
4,
13,
20]. In the fee-for-service treatment, providers receive payment for each task or service they perform. The profit motive in this payment model incentivizes providers to increase the number of tasks performed in order to maximize their financial returns. However, when customers have insurance coverage, providers may perceive less financial risk and potential financial burden on the customers. As a result, providers may feel more inclined to recommend and perform additional tasks or services, even if they are not necessary [
16,
33]. In our experiment, customers have heterogeneous insurance coverage. Customers in the salary treatment pay a flat service fee regardless of the number of tasks, while customers in the fee-for-service system either bear the total cost of their services or have full insurance coverage. Accordingly, we hypothesize that, in the FFS treatment, the mean deviation from the ideal number of tasks will be higher when experts serve customers with insurance coverage.
4. Materials and Methods
The experiment was conducted at Tampere University DMLab between 24 October 2022 and 1 December 2022. A total of 160 students took part in the experiments. The subjects were recruited from Tampere University and Tammerkosken Lukio using the online recruiting system ORSEE. The subjects had the opportunity to choose a desired session from among 14 sessions. The experiments were designed and conducted with oTree [
39].
The experimental design was based on that of Karunadasa & Sieberg, which in turn builds on that of Hennig-Schmidt et al. [
21,
36]. However, our design makes some key changes to the theoretical model of Karunadasa & Sieberg and Hennig-Schmidt et al. [
21,
36]. Instead of squared differences between the number of treatments and the optimal value, in this experiment, payoffs to the customer monotonically decline with the distance from the optimal number. This was a conscious decision on our part to make the experiment similar to that of Hennig-Schmidt et al. [
21]. Additionally, to capture all elements of credence goods, we also allowed subjects to get paid, even when a task was performed incorrectly.
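The difference between a squared-difference benefit and a benefit that declines with the distance from the optimum can be sketched as follows. This is only an illustration: the function names, `max_benefit`, and `slope` are hypothetical and are not taken from the experiment, and the linear form is just one simple way a benefit can decline monotonically with distance.

```python
def quadratic_benefit(tasks, optimum, max_benefit=10.0, slope=1.0):
    """Benefit that falls with the squared distance from the optimum.

    max_benefit and slope are hypothetical illustration values.
    """
    return max_benefit - slope * (tasks - optimum) ** 2


def distance_benefit(tasks, optimum, max_benefit=10.0, slope=1.0):
    """Benefit that falls linearly with the absolute distance from the optimum.

    max_benefit and slope are hypothetical illustration values.
    """
    return max_benefit - slope * abs(tasks - optimum)


if __name__ == "__main__":
    optimum = 5  # example optimum; subjects could perform 0 to 10 tasks
    for tasks in range(0, 11):
        print(tasks,
              quadratic_benefit(tasks, optimum),
              distance_benefit(tasks, optimum))
```

Under both shapes the benefit peaks at the optimum and is symmetric around it; the squared form penalizes large deviations much more heavily than the distance-based form.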
As we recruited students from all disciplines, our experiment, as stated above, used a neutral framing. Neutral framing in economic lab experiments provides advantages in terms of increasing the validity and reliability of the results. Given findings that medical providers may have different motivations from non-medical subjects [
21,
25], a neutral frame allows us to assess a general credence good problem. We therefore describe a situation between a service provider and a customer, where the service provider has the advantage of information asymmetry. The instructions used in our experiment are provided in
Appendix A.
In our experiment, instead of having our subjects simply choose a level of effort, we use real effort tasks. A real-effort task requires participants to perform some actual work, which could either be psychological, manual, or both, to achieve a certain outcome [
40]. This was done to make the effort real and either potentially costly or potentially interesting, depending on how they individually viewed it. Jokipalo and Sieberg [
41] found that a majority of subjects engaging in a similar real-effort task found it to be interesting or enjoyable. This is an important consideration in terms of understanding provider behavior in credence goods situations. The assumption that any provider behavior is costly is reasonable, but it overlooks the possibility that a provider may choose to engage in the effort anyway because the provider is motivated to do so, either because it is an intellectual challenge or because the work is enjoyable
11.
Real effort tasks are an advantageous method in economic lab experiments because they provide a slightly more realistic representation of behavior in real-world situations. In these tasks, participants are asked to perform a task that requires genuine effort, rather than simply answering hypothetical questions about what level of effort would be chosen or making decisions based on simulated scenarios. Additionally, real effort tasks can provide a more accurate representation of preferences and decision making under conditions of uncertainty or risk. The use of real effort tasks in economic lab experiments can therefore lead to a deeper understanding of economic behavior and the factors that influence decision making in the real world. There is a variety of real-effort tasks used in economic lab experiments [
42]. We used a repetitive decoding task, where participants were given a grid of letters and a decoding key and were required to convert numbers into letters.
Appendix B provides a screenshot of the decoding task.
Our experiment included two treatments: Salary and Fee for Service (FFS). In the Salary treatment, regardless of the number of tasks performed, the subjects received a flat wage of 13 experimental currency units (ECUs) per round. In the FFS treatment, the amount the subjects earned was tied to how many tasks they performed and varied between 0 and 23 ECUs. The exchange rate was EUR 0.09 per ECU. In both treatments, each completed task had a cost of 1 ECU, which was deducted from their payments to determine payoffs per round
12. This was a deliberate decision on our part to identify if customer conditions and the number of tasks required to optimally serve the customer would impact the behavior of providers, especially in the salaried system.
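The per-round payoff rule described above can be sketched in a few lines. The flat wage of 13 ECUs, the cost of 1 ECU per task, and the 0 to 10 task range come from the text. The FFS fee schedule is an assumption: the text only states that FFS payments ranged from 0 to 23 ECUs, so the linear schedule below is hypothetical and the true intermediate values may differ.

```python
SALARY_PER_ROUND = 13  # ECUs, flat wage in the Salary treatment
COST_PER_TASK = 1      # ECUs deducted per completed task in both treatments
MAX_TASKS = 10         # subjects could perform 0 to 10 tasks per round


def _check_tasks(tasks):
    if not 0 <= tasks <= MAX_TASKS:
        raise ValueError("tasks must be between 0 and 10")


def salary_profit(tasks):
    """Per-round provider profit under Salary: flat wage minus task costs."""
    _check_tasks(tasks)
    return SALARY_PER_ROUND - COST_PER_TASK * tasks


def ffs_profit(tasks, fee_schedule):
    """Per-round provider profit under FFS: payment for the tasks minus task costs.

    fee_schedule[n] is the total payment for n tasks.
    """
    _check_tasks(tasks)
    return fee_schedule[tasks] - COST_PER_TASK * tasks


# ASSUMPTION: a linear schedule from 0 to 23 ECUs. Only the two endpoints
# are stated in the text; the values in between are hypothetical.
HYPOTHETICAL_FEES = [round(2.3 * n, 1) for n in range(MAX_TASKS + 1)]

if __name__ == "__main__":
    for n in range(MAX_TASKS + 1):
        print(n, salary_profit(n), ffs_profit(n, HYPOTHETICAL_FEES))
```

Under these rules, a salaried provider maximizes own profit by performing no tasks, whereas a provider paid per task earns more the more tasks are performed whenever the marginal fee exceeds the 1 ECU cost.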
In all treatments, the subjects took on the role of a service provider who would be solving a problem a hypothetical customer faced. Each subject was randomly matched with a hypothetical customer in each round. The experiment included three different types of customers: type 1 with moderate conditions, type 2 with mild conditions, and type 3 with severe conditions. Depending on the type of customer, the optimal number of tasks that needed to be performed to solve the problem was different. However, the subjects could choose how many tasks they wished to perform, between and including 0 and 10.
The decisions made by each subject determined not only their own earnings but also the benefits to the customers they were serving. Customer benefits varied according to the customer type but followed a concave pattern, and the point of benefit maximization depended on the customer type.
Figure 1 below shows customer benefits according to the number of tasks and customer type. Drawing from Karunadasa & Sieberg [
36], depending on the treatment, the subjects were informed that their customers will bear a cost depending on the number of tasks performed. In the salary treatment, the subjects were notified that the customers bear a flat fee cost regardless of the number of services provided. In FFS, the subjects were informed that customers bear a cost for each service provided, and the cost of services was either borne by the customer or was covered through insurance
13. While customers in our experiment were hypothetical, 60% of the benefits to the customers were allocated to a fund supporting experimental research at Tampere University
14.
4.1. Experimental Procedure
Each experimental session lasted approximately 60 min. Each participant was given a show-up fee of EUR 5. Each session was allocated to one treatment, and in each session, subjects had the opportunity to earn a maximum of 156 ECUs. While the researchers knew which treatment would be conducted in which session, this information was not available to the participants. This allowed for a controlled between-subject comparison. After the participants had taken their seats, instructions were read aloud. Participants also had the opportunity to read the instructions on their computer screens. Depending on the session, instructions were given in either Finnish or English; however, the instructions on the screen were available only in English. After the instructions were read, the participants were given the opportunity to ask clarifying questions.
The experiment included a total of 12 rounds of play. In each round, the subjects were matched with a different type of customer, and customer benefits varied depending on the type of the customer. Before the start of a new round, the subjects were informed of the customer type they were matched with and were shown a table containing the payment and costs per task, individual profits, and customer benefit. Throughout the experiment, the subjects could move to the next round once they had completed the number of tasks they wished to perform.
Table 1 below summarizes the order of matching and the optimal number of tasks required in each round.
During each round, the subjects chose and performed a number of real-effort tasks between 0 and 10, inclusive
15. In each round, the decisions the subjects made determined their own profit and customer benefits.
Table 2 below provides an overview of provider profits and customer benefits depending on the number of tasks performed.
At the end of 12 rounds of decision making, the subjects were asked to fill out a brief questionnaire before receiving the payments, explaining the factors influencing their decision making. The questionnaire is provided in
Appendix C.
4.2. Overview of the Dataset
The analyzed data covered 158 participants across 14 sessions. A total of 54% of the participants were assigned to the salary treatment. The experiment followed a between-subject design: each participant was assigned to either the salary treatment or the FFS treatment. Each session included 12 rounds of play, and in each round, participants performed between 0 and 10 real-effort tasks. A total of 1884 observations were used in the final analysis
16. Statistical analysis of the experimental data was carried out using the R statistical software [
43,
44,
45,
46,
47].
A total of 49.4% of the participants were female, while 41.1% were male
17. Among our subjects, 34% were from social sciences, 27% were from STEM fields, and 12.5% were high-school students
18.
Table 3 below provides an overview of the demographic characteristics of our sample.
On average, a participant earned 116.5 ECUs, an equivalent of EUR 15.6 (Mode 15.64). The average earnings in FFS were 131.77 ECUs, equivalent to EUR 17 (Mode EUR 19.2), and they were 103.4 ECUs (EUR 14.4) in Salary (Mode EUR 15)
19. A total of 60% of the total earnings made by the participants, equivalent to EUR 896, were transferred to a research fund supporting experimental research at Tampere University.
5. Results
5.1. The Mean Deviation from the Optimal Number of Tasks Required Is Less under Salary Than under FFS
On average, across all rounds, 6.3 services were provided
20. To identify the impact of the remuneration system on service provision, we separated the number of services provided by remuneration type.
Figure 2 below provides a snapshot of the number of services provided.
Across all customers, providers under FFS deviated substantially from the optimal number of services required. On average, providers under FFS performed 8.7 tasks (Mode 10, SD 2.7), which is significantly more than the average number of tasks required (p < 0.0001, Wilcoxon signed-rank test). Of the 888 decisions made under FFS, 769 involved more tasks than required, only 74 matched the required number exactly, and 45 involved fewer tasks than required.
On average, providers remunerated under the salaried system provided 4.4 tasks (Mode 4, SD 3.1). Testing across all customers, providers under the salary treatment significantly underprovided procedures (p < 0.0001, Wilcoxon signed-rank test). Of 1020 decisions made by salaried providers, 533 decisions had a negative deviation from the number of tasks required, while 230 decisions had a positive deviation. The deviation between the required number of procedures and the number of procedures provided was zero for 257 decisions.
We tracked subject behavior across 12 rounds of play and observed that only two subjects completed tasks precisely equivalent to the ideal number of tasks. One subject was in the salary treatment, and the other was in FFS.
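The over-, under-, and optimal-provision shares follow directly from the decision counts. As a minimal sketch, the Python snippet below classifies a single decision by the sign of its deviation and converts the FFS counts reported above into percentages. The function and variable names are illustrative assumptions and are not taken from the authors' analysis scripts.

```python
def classify(provided: int, required: int) -> str:
    """Label one decision by the sign of (tasks provided - tasks required)."""
    deviation = provided - required
    if deviation > 0:
        return "over-provision"
    if deviation < 0:
        return "under-provision"
    return "optimal"


def shares(counts: dict) -> dict:
    """Convert category counts into percentages of all decisions (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# FFS decision counts as reported in the text (888 decisions in total).
ffs_counts = {"over-provision": 769, "optimal": 74, "under-provision": 45}
ffs_shares = shares(ffs_counts)
print(ffs_shares)
```

Under these counts, roughly 87 in 100 FFS decisions exceed the required number of tasks.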
5.2. Unlike in FFS, Providers in Salary Increasingly Take Customer Needs into Account When Making Decisions
We notice that provision decisions under the salary system vary according to the customer type and the optimal number of services needed to maximize customer benefits, as shown in
Figure 3 below. Comparing the average number of services provided per customer type, providers in the salaried system optimally served customer type 1 (
p = 0.5, Wilcoxon signed rank test)
21 and significantly overserved customer type 2 (
p = 0.0006, one-sided Wilcoxon signed rank test)
22.
However, under FFS, customers were consistently overserved, regardless of their needs. Across all customer types, the average number of tasks provided was 8.7 (Mode 10), which was significantly higher than the average required number of tasks (p < 0.0001, Wilcoxon signed-rank test). We could observe the same effect when isolating individual customer types. For customer type 1, the ideal number of tasks required was 5; however, providers in our sample, on average, provided 8.7 tasks (p < 0.0001, one-sided Wilcoxon signed rank test). Customer type 2, which required three tasks to be served optimally, was provided 8.2 tasks on average (p < 0.0001, one-sided Wilcoxon signed rank test). For customer type 3, providers in our sample, on average, performed nine tasks when the required number was seven (p < 0.0001, one-sided Wilcoxon signed rank test).
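To make the logic of the signed-rank comparisons concrete, the sketch below implements a two-sided Wilcoxon signed-rank test on the differences between tasks provided and tasks required. It is a simplified illustration under stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, the function name is our own, and the example data are hypothetical rather than drawn from the experiment.

```python
import math


def wilcoxon_signed_rank(differences):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Zero differences are dropped and tied absolute values receive average
    ranks. Returns (W_plus, z, p_value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tied absolute differences.
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        block = j - i + 1
        tie_term += block ** 3 - block
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_value


# Hypothetical decisions for a customer who requires 5 tasks.
required = 5
provided = [10, 10, 9, 8, 10, 7, 5, 10, 6, 9, 10, 4]
w_plus, z, p_value = wilcoxon_signed_rank([x - required for x in provided])
print(w_plus, round(z, 2), round(p_value, 4))
```

A large positive rank sum indicates that decisions above the required number dominate, which is the pattern described for the FFS treatment.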
5.3. Under Salary, Providers Avoid Costly Effort
While subjects in our salaried treatment largely took the needs of the customer into account when choosing the number of tasks to perform, as shown in
Figure 4 below, they, on average, underserved customer type 3. The optimal number of tasks required by customer type 3 was seven; however, on average, subjects in our salaried treatment only provided 4.3 tasks (
p < 0.0001, one-sided Wilcoxon signed rank test). Of the 340 decisions made on customer type 3, only 19 decisions had a deviation of zero between the number of required tasks and the actual number of tasks provided, while 269 decisions had a negative deviation.
5.4. Unlike in Salary Treatment, Providers in FFS Strive for Profit Maximization
The decisions made in the experiment involved a trade-off between providers’ payoffs and customers’ benefits. We therefore compared how providers’ payoffs deviate from the maximum possible payoffs in each round, according to the payment system. As shown in
Figure 5, in each round, regardless of the payment system, the maximum payoff a provider could receive is 13 ECUs, which is equivalent to 156 ECUs in total over the 12 rounds. We notice that under FFS treatment, on average, a provider earned 128.25 ECUs (Mode 156, SD 39.7), and under salary treatment, the mean payoff was 103 ECUs (Mode 110, SD 27.7). The difference in mean payoffs between the FFS treatment and the salary treatment is statistically significant (
p < 0.0001, Mann–Whitney U Test). As we tracked subjects across 12 rounds, we observed that 38 subjects earned the maximum possible ECUs, and we found all of them in the FFS treatment.
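The between-treatment comparison of payoffs can be illustrated with a small Mann–Whitney U sketch. The implementation below is an assumption-laden simplification (normal approximation, tie-corrected variance, no continuity correction), the function name is hypothetical, and the payoff values are invented for illustration only.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Tied values receive average ranks and the variance is tie-corrected.
    Returns (U_a, z, p_value), where U_a counts pairs in which the value
    from sample_a exceeds the value from sample_b (ties count one half).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    pooled = sorted([(x, 0) for x in sample_a] + [(x, 1) for x in sample_b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tied values.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        block = j - i + 1
        tie_term += block ** 3 - block
        members_a = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_a += average_rank * members_a
        i = j + 1
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    variance_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(variance_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_value


# Hypothetical total payoffs (ECUs) per subject; not the experimental data.
ffs_payoffs = [156, 156, 140, 128, 150, 156, 110, 145]
salary_payoffs = [110, 103, 95, 120, 110, 88, 130, 100]
u_a, z, p_value = mann_whitney_u(ffs_payoffs, salary_payoffs)
print(u_a, round(z, 2), round(p_value, 4))
```

Because the test relies only on ranks, it is unaffected by the ceiling at 156 ECUs that many FFS subjects reach.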
In each round of play, subjects were informed of how much they could earn based on the decisions they made. Throughout the game, subjects had the option to move to the next round once they had made the number of decisions they wanted to make
23, which allowed them to finish and leave earlier than others still playing the game. This meant that although they had 10 potential tasks to perform per round, they could choose to perform fewer per round and leave the experiment when they decided they were done. We therefore assume that subjects performed tasks either because they wanted to perform them or because they were motivated by their own gain or the customer’s gain.
Our assumption is consistent with statements made in the self-reported post-experiment questionnaire. A total of 65.7% of the subjects assigned to the salary treatment mentioned they attempted to balance personal payoffs and customer benefits, while only 41.5% of the subjects in the FFS treatment mentioned the same. A total of 20% of the providers in the salary treatment also recognized customer benefits as a decisive factor when making decisions; however, only 3.8% of the subjects in the FFS treatment highlighted this. A total of 43.4% of the respondents in the FFS treatment identified self-interest as a decisive factor. Interestingly, self-interest was not mentioned at all by those assigned to the salary treatment
24.
5.5. In a Fee-For-Service System, the Insurance Coverage of Customers Has No Impact on the Number of Treatments Providers Perform
If we combine all customer types and assume that the customers in the salary system have social insurance, then we notice that customers with insurance are significantly underserved compared to customers without insurance (
p < 0.0001). However, in our experiment, we describe the customers in the salaried system as paying a flat fee, and we otherwise never mention insurance. We therefore assume that if a customer is served by a subject in an FFS system, the customer pays a fee depending on how many services are provided. We make the assumption that insurance is only relevant when being served by a subject in the FFS system. In isolating for customers served under the FFS system, we do not see a significant association between how customers are served and their insurance coverage (
p = 0.15, Wilcoxon rank sum test). As shown in
Figure 6 below, the mean number of tasks performed on a customer with insurance is 8.7 (Mode 10, sd 2.3), and the mean number of tasks performed on a customer without insurance is 8.8 (Mode 10, sd 2.4).
5.6. Salary Treatment Produces Better-Quality Services
In the experiment, FFS participants were compensated even when they performed tasks that were not carried out correctly
25. This allows us to test if payment systems affect the quality of services. An analysis of the average number of incorrect tasks performed by subjects across both treatments revealed that the subjects performed 0.2 incorrect tasks, on average. However, further differentiation of the data based on the payment system revealed that subjects assigned to the FFS treatment performed an average of 0.33 incorrect tasks, while those under the salary treatment performed an average of 0.12 incorrect tasks. This difference was found to be statistically significant (
p < 0.0001, Wilcoxon rank sum test). To investigate the impact of payment systems on the quality of work, we compared the proportion of faulty tasks performed by subjects under two different payment systems: fee-for-service (FFS) and salary. Our analysis included a two-sample test for the equality of proportions, which revealed that the proportion of faulty tasks as a fraction of total tasks was 0.037 under the FFS system, compared to 0.027 under the salary system. This difference was found to be statistically significant (
p = 0.0033, Pearson’s Chi-squared test). A closer examination of the data, as per
Figure 7 below, showed that the rate of incorrect tasks increased as the game progressed under the FFS treatment, while the opposite effect was observed under the salary treatment. When differentiating between customer types, the observed effect persisted. However, the effect size was the greatest for customer type 3. Under the FFS treatment, customer type 3 received an average of 0.32 incorrect treatments, compared to 0.08 incorrect treatments under the salary treatment (
p < 0.0001, as determined by a Wilcoxon rank sum test).
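The comparison of faulty-task proportions can be sketched as a two-sample test for equality of proportions. The snippet below is an illustrative sketch rather than the authors' procedure: it computes Pearson's chi-squared statistic with one degree of freedom and no continuity correction, the function name is an assumption, and the counts are hypothetical values chosen only to match the reported proportions of 0.037 and 0.027.

```python
import math


def two_proportion_test(faulty_a, total_a, faulty_b, total_b):
    """Pearson chi-squared test (1 df) for the equality of two proportions.

    No continuity correction is applied. Returns
    (proportion_a, proportion_b, chi_squared, p_value).
    """
    proportion_a = faulty_a / total_a
    proportion_b = faulty_b / total_b
    pooled = (faulty_a + faulty_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (proportion_a - proportion_b) / standard_error
    chi_squared = z * z
    # Upper tail of chi-squared(1) at z^2 equals the two-sided normal tail at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return proportion_a, proportion_b, chi_squared, p_value


# Hypothetical counts: 370 faulty of 10,000 tasks versus 270 of 10,000 tasks.
p_ffs, p_salary, chi_squared, p_value = two_proportion_test(370, 10_000, 270, 10_000)
print(p_ffs, p_salary, round(chi_squared, 2), p_value)
```

The same difference in proportions can be significant or not depending on the number of tasks behind it, which is why the test works from counts rather than from the proportions alone.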
To determine the impact of insurance coverage on the quality of services, a comparison was made between the mean number of faulty tasks performed on customers with and without insurance in the FFS treatment. The results indicate a higher occurrence of faulty tasks when customers did not have insurance coverage. Specifically, customers with insurance received an average of 0.28 faulty treatments, while those without insurance received 0.37 faulty treatments. The difference was found to be statistically significant (p < 0.0001, Wilcoxon rank sum test).
5.7. Differences in Customer Benefits Losses under Different Payment Systems Are Significant
In analyzing profit maximization behavior and self-reported factors affecting providers’ decision making, we realize that customers’ benefits and benefit losses
26 are affected by the payment system. On average, across all customer types, the benefit loss is 1.3; however, in differentiating the payment types, we find significant differences in benefit losses, as shown in
Figure 8 below. The mean loss of benefits under salary treatment was 0.6 (Mode 0, sd 3.4), whereas under FFS, the mean loss was 3.7 (Mode 3, sd 2.7). In examining the difference in the mean benefit loss, we identify that significant differences exist between payment systems (
p < 0.0001, Wilcoxon signed-rank test).
By separating observations by round and payment system, we notice an unusual increase in welfare loss for customer type 3 in the final round of play (round 12), which could be an indication of an end-game effect [
48]. The end-game effect refers to a situation where individuals interacting in a set number of rounds substantially reduce their contributions towards the end of the game. This increase could also be attributed to the occurrence of a learning effect.
6. Regression Analysis
We ran a multinomial logistic regression model to predict service provision based on four predictor variables: customer type, insurance status, method of remuneration
27, and service quality
28. Service provision was categorized into three levels based on the difference between the number of tasks performed by the subject in each round and the actual number of tasks required. Observations where the difference was zero were categorized as “Optimal”, observations with negative differences as “under-provision”, and observations with positive differences as “over-provision”. We used the optimal level of service provision as the reference category for the dependent variable. This type of regression is appropriate when the dependent variable is nominal and has more than two categories that cannot be ranked in order of preference. The results are shown in
Table 4 below. By examining the results in
Table 4, we find that the service provision is significantly affected by the customer type and the remuneration method in our experiment. Additionally, we find a statistically significant relationship between over-provision and faulty tasks.
Based on the regression results, with all odds expressed relative to optimal provision (the reference category), we notice that, in comparison to a type 1 customer, the odds ratio for a type 2 customer being over-served is 0.6 (p = 0.001), while the odds ratio for the same type being under-served is 0.3 (p < 0.001). In comparison to a type 1 customer, the odds ratio for a type 3 customer being over-served is 1.9 (p = 0.005), whereas the odds ratio for being under-served is 5.1 (p < 0.001). Controlling for customer type, the odds of a customer being over-served are 10.6 times as high (p < 0.001) if the customer is served by a subject remunerated via FFS as opposed to a flat salary, while the odds of being under-served are multiplied by 0.2 (p < 0.001). Additionally, we find that faulty tasks are more likely to happen under over-provision. According to our results, faulty tasks increase the odds of over-provision by a factor of 3.8, and this effect is statistically significant (p < 0.001). For under-provision, the corresponding odds ratio is considerably lower; however, this effect is not statistically significant. Our model is statistically significant overall (χ² = 1204.3; p < 0.0001). Overall, we explain about 55% of the variance in service provision status using the customer type, remuneration type, and quality of services.
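To illustrate how such odds ratios are read, the sketch below computes an unadjusted odds ratio for over-provision (relative to optimal provision) from the raw decision counts reported in Section 5.1. This is a back-of-the-envelope illustration with a hypothetical function name. It does not control for customer type, insurance, or service quality, so it is not expected to reproduce the model-based estimate exactly.

```python
import math


def odds_ratio(event_a, reference_a, event_b, reference_b):
    """Odds of the event versus the reference outcome in group A,
    divided by the same odds in group B."""
    return (event_a / reference_a) / (event_b / reference_b)


# Counts from Section 5.1: FFS had 769 over-provision and 74 optimal decisions;
# salary had 230 over-provision and 257 optimal decisions.
unadjusted_or = odds_ratio(769, 74, 230, 257)

# A multinomial logit coefficient is the natural log of the odds ratio,
# so exponentiating a coefficient recovers the ratio.
log_odds_coefficient = math.log(unadjusted_or)
print(round(unadjusted_or, 2), round(log_odds_coefficient, 2))
```

An odds ratio above 1 means the outcome becomes more likely relative to optimal provision, while a ratio below 1 means it becomes less likely.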
Additionally, we used linear mixed-effect regression models to examine the effect of payment systems (FFS and salary) on a series of outcome variables (the total number of tasks completed, the deviation from the ideal number of tasks, and the rate of faulty tasks) while controlling for customer type. Our findings are summarized in
Table 5.
We found that providers in the FFS treatment performed 4.4 more tasks than those in the salary treatment, and this effect was statistically significant. However, the effect of customer type on the total number of tasks was not statistically significant, except for customer type 2. This leads us to argue that subjects complete more tasks when customers have a lower optimal service level.
Regarding the deviation from the ideal number of tasks, we found that the deviation under the FFS system was, on average, 4.4 tasks larger than under the salary system, and this difference was statistically significant. We also observed that type 2 customers received 1.2 more procedures than their ideal number, while type 3 customers received 2.2 fewer procedures than their ideal level, and these effects were statistically significant.
Finally, we examined the rate of faulty tasks and found that experts performed 0.2 more faulty tasks under the FFS system than under the salary system. However, we did not find any significant impact of customer types on faulty tasks. These results provide important insights into the impact of payment systems and customer types on provider performance and can inform policy decisions.
7. Discussion and Conclusions
In this controlled economic experiment, we used a real-effort task to test the behavior of providers in credence goods markets based on how they are paid: FFS and salary-based. In the case of FFS, we tested if the insurance coverage of a customer would impact provider–customer interactions. Our aim was to examine if subjects would be purely affected by personal profit or if factors such as a concern for the customers’ wellbeing or an enthusiasm towards tasks that are semi-enjoyable would provide incentives to provide more or less costly effort, even if such activity would decrease their own payoffs.
We find that customers served by providers in an FFS system face significantly higher benefit losses than those in a salaried system. We further notice that providers in the fee-for-service treatment consistently and significantly over-provided tasks. In fact, the over-provision rate in the FFS treatment was 86.6%, in comparison to 22.5% under salary. We did observe the under-provision rate to be higher under the salary system: 52.3% compared to 5.1% under FFS; however, when differentiating for customer types, providers in the salary system did serve customers with moderate conditions optimally. Interestingly, customers with mild conditions were significantly overserved. While we notice that providers in the salaried system are influenced by the needs of the customers, we do not see customer needs as having any significant impact on the decisions made by providers in the FFS system. Our observations are corroborated through the self-reported post-experiment questionnaire, in which the majority of the subjects assigned to the salary treatment listed balancing personal pay-offs and customer benefits as the decisive factor in decision making. In comparison, for the largest share of the subjects assigned to the FFS treatment, self-interest was the critical factor in decision making.
Our analysis of the quality of services provided shows that the quality is significantly better in the salary treatment than in the fee-for-service treatment. On average, subjects in the FFS treatment performed 0.33 tasks incorrectly, while subjects in the salary treatment performed only 0.12 tasks incorrectly. Our results also reveal that the largest difference in quality is for customer type 3. Under the FFS treatment, customer type 3 received an average of 0.32 incorrect treatments, while under the salary treatment, they received only 0.08 incorrect treatments. We also observed that the number of faulty tasks performed by subjects in the FFS treatment increased as the game progressed, while the number of faults decreased in the salary treatment. This finding is consistent with previous studies by Keser & Schnitzler and Lagarde & Blaauw [
30,
49].
Our findings differ from previous evidence primarily in terms of profit maximization and providers’ responsiveness to customer needs. Studies by Hennig-Schmidt et al. [
21] and Ahlert et al. [
31] demonstrate that, with medical framing, experts in both payment systems deviate from profit maximization to enhance customer welfare. However, in our study, without medical framing, we observe a different pattern. In the salaried treatment, providers appear to be influenced by customer-regarding behavior, while the opposite holds true in the case of FFS.
This discrepancy in results can be attributed to several differences in our experimental design. Again, unlike Hennig-Schmidt et al. [
21] and Ahlert et al. [
31], we employed a general subject pool and used neutral framing. That being said, our findings closely align with neutrally framed experiments conducted with similar subject pools. For instance, Ahlert et al. [
31], in their treatment involving economics students and neutral framing, did find subjects to be profit maximizers. Furthermore, Green [
25] demonstrated that retrospective payment systems such as FFS can diminish intrinsic motivations and promote profit-maximizing behavior. Similar to our approach, Green [
25] used a neutrally framed real effort task and a general subject pool. Bejarano et al. [
9], in their neutrally framed real-effort experiment, also found that, under an FFS system, providers tend to maximize profit and provide more services than under salary. Additionally, Karunadasa & Sieberg [
36] suggest that, in an FFS system, physicians will strive to maximize profit by increasing the number of services provided. They argue that a salaried system eliminates financial incentives for overtreatment, although overtreatment remains a possibility if a provider is highly concerned about patients’ welfare. The results of our experimental analysis further support these claims.
Surprisingly, our results do not show insurance to be a significant factor in provider–customer interactions in the FFS treatment. This finding largely contradicts earlier experimental and empirical research. Sülzle & Wambach [
15], Balafoutas et al. [
13], and Huck et al. [
20] confirm that, in credence goods markets, such as healthcare, a positive correlation exists between the insurance coverage of a customer and the number of services provided. While Balafoutas et al. [
13] and Huck et al. [
20] utilized a subject pool and framing similar to ours, their findings regarding insurance diverged significantly from ours. Consequently, we presume that the absence of an insurance effect in our study is not driven by the design modifications we implemented. Instead, it could potentially be attributed to an unaccounted random factor, or to ambiguities or a lack of clarity in the instructions provided to the subjects that led to misunderstandings. However, we do see interesting differences in the quality of services for customers with and without insurance. In the FFS treatment, we found that customers without insurance received lower-quality care, as evidenced by the higher number of faulty tasks. Moreover, we observed that customers in the FFS treatment experienced more benefit losses. Taken together, the increased benefit losses and the higher number of faulty tasks in the fee-for-service treatment for customers without insurance raise serious concerns about the accessibility and equity of services.
Our findings are relevant to the debate on public vs. private ownership and payment systems in markets for credence goods. The general consensus, especially in the non-medical service market, is that public ownership tends to be ineffective and inefficient in delivering credence goods. In a public ownership model, when providers are paid a flat fee, financial incentives tend to be weak or absent. Weak incentives, however, also have a benefit: there is little motivation for the overprovision of services or other costly actions [
6,
7]. The absence of financial incentives may also hinder providers from optimally responding to cases that require additional effort. Nevertheless, this type of ‘cream skimming’ could occur regardless of the ownership model and payment structures [
6,
7]
30.
Based on our results, we challenge the notion that sub-optimal treatment is a consequence of public ownership. Our experiment demonstrates that, even in the absence of medical framing, providers who operate under a salaried system are motivated to treat their customers optimally, even though doing so entails personal costs. However, we also observed that customers who require additional effort are often underserved by providers under the salaried system. Unlike previous studies [
21,
29], we are hesitant to claim that customers in need of intermediate or extensive attention are better served under an FFS system than under a salary. While it may seem that the number of tasks completed is the primary metric for evaluation, the quality of those tasks is just as important. Overprovision in an FFS system can lead to poor-quality services, particularly for customers with extensive needs. This highlights a significant disadvantage of incentives in private markets, where the drive to reduce costs can come at the expense of quality. This phenomenon has been noted previously by Hart, Shleifer, and Vishny [
5].
In closing, we are aware of the challenges associated with generalizing findings from lab experiments for the real world [
50]. We believe our experiment constructs a more realistic credence goods set-up, using, for instance, real-effort tasks and neutral framing. Yet, the findings of our experiment are limited by the constraints and abstractions of the lab. For instance, in a real-world situation, customer benefits may not be as explicit and obvious as in our experiment. In more delicate credence goods markets, it might be difficult to explicitly identify the optimal level of service. We suggest that future research on this matter should incorporate personal and contextual elements involved in customer–provider interactions and the framing effect to better understand the potential for supplier-induced demand in credence goods markets.