Payment Systems, Insurance, and Agency Problems in Healthcare: A Medically Framed Real-Effort Experiment

Background: This study aims to examine the impact of different healthcare payment systems, specifically salary and fee-for-service (FFS) models, on service provision, patient welfare, and quality of care. The influence of payment models on healthcare delivery and patient outcomes, as well as how these models affect doctors’ decision-making based on patients’ insurance coverage, is not well understood. Methods: A medically framed real-effort task experiment was conducted. This study compared two payment systems: salary and FFS models. Key outcomes measured included the level of service provision, patient welfare, and quality of care. The analysis focused on how financial incentives and patient insurance coverage influenced healthcare decisions. Results: This study found overtreatment in FFS models and undertreatment in salary-based models. Healthcare decisions are significantly influenced by financial incentives and patient needs. Specifically, in FFS models, decisions are driven by self-interest, while in salary models, they are guided by patient needs. Within the FFS model, insurance coverage affects doctors’ decisions and patients’ benefits. Insured patients often receive unnecessary or incorrect procedures, indicating a supply-side moral hazard. Conclusions: Financial incentives and patient insurance coverage significantly influence healthcare decisions, with FFS models promoting self-interested decision-making and salary models focusing more on patient needs. This study contributes to the literature on supply-side moral hazard and to health economics studies that use laboratory experiments to model medical decision-making.


Introduction
The healthcare industry's payment systems are a central pillar to the functioning of healthcare delivery, with vast implications for the quality-of-service provision and, ultimately, patient outcomes. Primarily revolving around fee-for-service (FFS) and salary models, these payment systems have the power to shape the direction, efficiency, and effectiveness of healthcare service provision. However, the intricate dynamics of these systems and their full impacts are not always clearly understood. While theoretical investigations of the impact payment systems have on healthcare provisions are well recorded, testing these hypotheses using real-life data has been tricky. This is mainly because it is not always immediately obvious what the ideal level of treatment for a given health problem should be; thus, it is harder to show whether a given treatment or test was necessary or not. Given such challenges, laboratory experiments offer a controlled environment where researchers can manipulate and observe various factors from payment systems to quality of care. This allows for a more nuanced understanding of how these payment systems influence healthcare provision. This paper follows a similar approach. This study is a medically framed real-effort task experiment to obtain a detailed understanding of FFS and salary payment systems in healthcare, particularly emphasizing their role in shaping service provision, patient welfare, and quality of care.
Healthcare is often viewed as a credence good from an economic perspective. Credence goods are intangible services, marked by inherent uncertainty, which makes it challenging for consumers to assess their quality, even after use [1]. In such markets, providers possess an information and knowledge advantage. They can leverage this to boost service utilization by customers, sometimes exceeding their actual need, a phenomenon known as supplier-induced demand [2].
The inherent information imbalances in credence goods markets, such as healthcare, are further exacerbated by the incentives that different payment systems create. Payment systems such as FFS or salary can in certain instances result in inefficiencies, which may manifest as overtreatment (providing more goods or services than necessary), undertreatment (providing fewer goods or services than needed), or overpricing (charging for a higher quantity than delivered) [3,4]. Further research on credence goods indicates that insurance can exacerbate these existing inefficiencies by further changing provider incentives.
Healthcare providers often stand out from other credence goods providers due to the unique, potentially life-or-death nature of their services. The Hippocratic oath's high ethical standards suggest that healthcare professionals should be more reliable than other credence goods market providers. However, Ellis and McGuire [2] portray healthcare providers as "double agents". They must balance the interests of patients and other stakeholders, considering patient benefits, fees, and service costs when making clinical decisions. In the healthcare industry, insurance coverage could allow providers to raise their prices or increase their services, as insurance could make them less constrained by ethical and psychological considerations [5]. Empirical evidence supports this idea, with studies showing that providers tend to recommend costlier treatments and drugs when patients are fully insured [6][7][8][9][10]. However, empirical research on the topic is mixed, given the complexity of the healthcare landscape.
As a result, a growing body of health economics research uses economic lab experiments to put this complex relationship between payment systems and healthcare decision-making under the microscope. That being said, experimental research on payment systems in healthcare is modest. Perhaps the most well-known among them is the experiment developed by Hennig-Schmidt et al. [11], where a chosen-effort experiment was used to understand how healthcare provision decisions differ under capitation and FFS systems. Their experiment was conducted within a medical context, using a subject pool comprising solely medical students. Although the patients in their experiment were hypothetical, the benefits earned were donated to charity. Many subsequent experimental healthcare payment system studies followed a similar design.
This experiment modifies the version presented by Karunadasa, Sieberg, and Jantunen [12], which itself builds upon Hennig-Schmidt et al. [11]. This updated version incorporates design elements from both Green [13] and Lagarde and Blaauw [14]. Specifically, to assess claims that a medical frame can impact behavior, this experiment uses a medically framed real-effort task with two treatments, FFS and salary, and subjects are randomly assigned to the role of either a patient or a doctor. Unlike in Green [13], patients in this experiment are passive, powerless recipients. Patients in the salary treatment are fully insured, while patients in the FFS treatment have heterogeneous insurance statuses.
Modifications made here are deliberate, not coincidental. In terms of methodology, one could argue that this experiment is similar to a dictator game where the doctor (dictator) decides the distribution of payoffs between themselves and a patient (recipient). The subgame perfect Nash equilibrium strategy for a dictator is to give nothing to a recipient, because, unlike in a similar ultimatum game (in which the recipient can either accept the proposed division or reject it, leaving both parties with nothing), the recipient cannot retaliate or otherwise affect the dictator's payoff. Research, however, shows that in most dictator games, dictators give roughly 29% of the total pie [15]. While there are fewer equal splits of the pie than are generally seen in ultimatum games, there is a clear tendency for at least some dictators to take the well-being of recipients into account. Eckel and Grossman's [16] study shows that while most people give a positive amount in a dictator game, they can be encouraged to give less if a double-blind treatment is used, where neither the recipient nor the experimenter knows the amount of money given. It also shows that more money is given if the recipient is perceived as "deserving".
As noted above, in other studies earnings have been donated to charity. A charity counts as a "deserving recipient" in the sense of Eckel and Grossman's [16] study. However, a crucial element here is expectation. The charity does not know if it does not receive money. It is not in the same room, so it cannot observe how individual subjects make decisions, and it potentially leaves with very little. In Eckel and Grossman's experiment, although 23 subjects kept all of the money for themselves, 13 gave some positive amount to their recipient counterparts, indicating some concern for their well-being or attempts to avoid completely disappointing them.
The dictator game gives important information about behavior, indicating that concern for counterparts can be possible even in the crudest of sharing scenarios. Although some subjects do take all of the money for themselves, not every subject is wholly motivated by personal profit maximization. It stands to reason that in a more intimate scenario, one can expect factors beyond profit maximization to play a role. Doctors do not operate in a dictator game scenario. They have real patients with genuine needs, whom they examine to discern their requirements. Patients know their doctors and can recognize at least some of the effort they put in.
This experiment attempts to create a more realistic situation by having half of the subjects assigned to the role of patients whose payoffs are purely determined by the decisions a doctor makes. This scenario forces the subjects acting as doctors to either strive for the best treatment or knowingly make someone else worse off to benefit themselves. They also know that the patient will discover the decision made, but not the identity of the doctor, after each round.
Medical framing adds context and emphasizes that healthcare is a unique type of credence good. Prior experimental research indicates that medical framing can reduce profit-maximizing behaviors [17][18][19][20]. However, most of these studies use medical students as subjects, except for Ahlert et al. [21]. This experiment applied medical framing to a general subject pool to understand its effects. The aim is to understand how payment systems influence decision-making in healthcare. The researchers acknowledge that making two alterations to the original design, as seen in Karunadasa, Sieberg, and Jantunen [12], complicates the establishment of a causal link between the design changes and their effects. This issue will be addressed in a future paper.
The presence of insurance further contributes to efforts aimed at creating a more realistic situation. The researchers are conscious of the fact that this experiment only considers two ends of the spectrum: fully insured or completely uninsured. In reality, most patients fall somewhere between these two extremes, with co-payments and deductibles moderating their level of insurance coverage. Nonetheless, this experimental design allows for the isolation and testing of the impact of these extremes on provider decision-making and the quality of care.
The modifications made to the design in Karunadasa, Sieberg, and Jantunen [12] make a more nuanced understanding of how payment systems influence healthcare provision possible. In this experiment, the researchers are not merely interested in over- and undertreatment. Rather, they also have an interest in the impact of the payment system and potentially enjoyable real-effort tasks on the quality and quantity of services provided. This study also aims to understand how the association between insurance and payment systems, specifically in an FFS system, impacts patient welfare.
This experiment contributes to the growing body of experimental research in health economics in multiple ways. The researchers introduced a medically framed real-effort game involving passive patients. This helps us understand how payment systems influence the quality and quantity of care within healthcare settings. By considering different insurance statuses, it also makes it possible to examine how insurance modifies the link between payment systems and service provision. Results show overtreatment was more prevalent in FFS, while the salary treatment saw more instances of undertreatment. Patients in the salary treatment received significantly more benefits than those under FFS. The average welfare loss per patient was higher in the FFS treatment. Also, quality of care was found to be higher in the salary treatment than in the FFS treatment. The number of tasks performed in the FFS treatment was influenced by the presence of insurance. This paper proceeds as follows: Section 2 provides a brief overview of the related literature and Section 3 discusses methodology and data. Section 4 presents the results and Section 5 offers a discussion and conclusion.
The health economics literature documents the role payment systems play in influencing physicians' behavior quite extensively. The association between income and activity is a key determinant of this behavior, with retrospective systems granting providers the autonomy to control their earnings, in contrast to fixed prospective systems [22,23]. A common type of retrospective system is the FFS payment system. The FFS system is often associated with a tendency for overtreating. Conversely, salaried and capitation (CAP) systems, where providers receive a fixed payment regardless of the level of care, can lead to undertreating, or the provision of fewer services than needed [24,25]. The literature also suggests that the patient's benefit can play a significant role in influencing physician decision-making. This implies that despite the financial incentives inherent in different payment systems, physicians still consider patient welfare when deciding on the level of care [20]. However, payment systems may inadvertently promote unintended behaviors. These include shorter visits, up-coding (billing for more expensive services than those provided), and patient selection based on health status. These behaviors can occur even when individual physicians do not directly benefit financially, demonstrating the indirect effects of payment systems on provider behavior [7,[26][27][28]. This can especially be the case under corporate investments in primary care, which tend to focus more on the triple bottom line than on patient welfare and healthcare costs.
The literature provides further insights into the impacts of FFS payments. A study by Vengberg et al. [26] found that this type of payment can stimulate shorter visits, up-coding of visits, and "skimming" of healthier patients, according to managers and salaried doctors at primary healthcare centers. In a study conducted by Gosden et al. [29], FFS payment resulted in more primary care visits, greater continuity of care, and higher compliance with the recommended number of visits. However, it was noted that patient satisfaction with access to their physician was lower compared to when salaried payment was used. Overall, the literature suggests that healthcare payment systems have significant implications for provider behavior, patient satisfaction, and the quality of care.

Overview of
A growing number of health economics studies have explored how payment systems affect healthcare provider behavior. Many of these studies draw on the pioneering experimental design used by Hennig-Schmidt et al. [11]. Their work found that the provision of medical services is greater under an FFS system than under a capitation system, suggesting that patients requiring more intensive medical services are better cared for under FFS. Similarly, Brosig-Koch et al. [18] observed overprovision to be common under FFS and underprovision under capitation. Another study by Keser and colleagues [30] found that physicians under an FFS system tend to offer more medical care to patients with similar conditions than those under capitation. Green [13] found that while FFS results in the highest number of services provided, salaried systems deliver the highest quality of care. Similarly, Karunadasa, Sieberg, and Jantunen [12] show the overtreatment and subpar quality of services to be widespread under FFS. Their study highlights the idea that providers in fee-for-service systems may prioritize quantity over quality in their services, leading to more mistakes. Lagarde and Blaauw [14], in their study, found that while salary treatments resulted in the lowest number of services provided, they offered the best quality of care.

Insurance and Provider Behavior
One focus of the health economics literature addresses how insurance impacts healthcare utilization and expenditure from the patient's perspective. There is ample empirical evidence suggesting that insured patients often consume more healthcare than necessary, leading to moral hazard. However, alternative explanations focusing on the role of providers in fostering moral hazard are rather limited. Patients rarely make healthcare decisions independently. In most cases, doctors and healthcare providers significantly influence what kind and how much healthcare is consumed. Therefore, estimates of moral hazard should, realistically speaking, consider the role that physicians and other suppliers play in driving unnecessary healthcare consumption among insured patients.
Factors such as a patient's budget, professional integrity, and concern for the patient's well-being can moderate a doctor's ability to influence the demand for certain procedures and drugs. However, when a patient is insured, the impact of these factors may be weaker, potentially leading to increased prices or service provision [5,31]. Limited empirical evidence suggests that doctors tend to prescribe expensive drugs and treatments to fully insured patients, leading to a higher incidence of fraudulent, or at least unnecessary, behavior [6,8,9].
Experimental evidence on insurance coverage and provider behavior is limited, primarily focusing on credence goods markets outside of medical care. While parallels can be drawn theoretically, it is important to note that healthcare is different from other credence goods, simply due to the role of medical ethics. Ignoring this difference momentarily, experimental evidence suggests the positive correlation between insurance coverage and realized expenditure could be due to second-degree moral hazard. Using evidence from computer repair markets, Kerschbamer and colleagues [4] show that insurance can induce second-degree moral hazard, resulting in inflated costs through fraudulent oversupply and overpricing. Similarly, a laboratory experiment by Balafoutas et al. [3] studying diagnostic uncertainty and insurance coverage in credence goods markets shows that insurance can increase instances of malpractice and decrease accuracy in diagnostics by experts. Karunadasa, Sieberg, and Jantunen [12], in their experiment, did not find any significant association between insurance and overprovision or quality of services.
To the best of our current knowledge, only two experiments have explored this issue from a healthcare perspective. Lu [9] conducted a natural field experiment to observe how a patient's insurance status impacts a doctor's behavior. The results showed that doctors expecting to receive a portion of patients' drug expenditures prescribed 43% higher-cost prescriptions to insured patients compared to uninsured ones, while doctors without a financial incentive did not respond to patients' insurance status. Huck and colleagues [32] conducted a lab experiment indicating that insured patients had more consultations and received more treatments than they ideally required. They hypothesized that this overtreatment is a result of doctors perceiving insured patients to be less concerned about the cost of treatments.
A large portion of the health economics experiments that look at payment systems and provider behavior in healthcare follows the seminal design presented by Hennig-Schmidt et al. [11]. Notable departures are the experiments conducted by Green [13], Lagarde and Blaauw [14], Green and Kloosterman [33], and Angerer et al. [17]. Angerer et al. [17] aim to understand the effect of framing and subject pool in healthcare credence goods, not necessarily how payment systems affect healthcare decision-making. Green [13] and Green and Kloosterman [33] used real-effort tasks to examine how different payment methods impact provider decision-making. In her 2014 study, Green used a proofreading task and had subjects take on two roles: that of an expert (doctor) and a customer (patient). In the first phase, customers were asked to proofread 10 essays. The second phase involved subjects in the role of physicians providing proofreading assistance to the patients from the first phase, correcting any errors that the customers had missed. Green and Kloosterman [33] used a series of math quizzes, composed of SAT/PSAT questions, to investigate labor supply and performance in mission firms that produce credence goods, such as healthcare. It is important to note that these experiments were conducted under neutral framing without any specific reference to a healthcare context. Lagarde and Blaauw [14] conducted a medically framed real-effort task, asking subjects to enter patients' blood test results into a computer. They aimed to test the impact of different payment systems (FFS and salary) on productivity and the quality of outputs. In their experiment, patients were hypothetical. Instead, for each task completed correctly, ZAR 0.50 was donated to a charity of the subject's choice. The current experiment closely resonates with the designs of these experiments.
The current experiment deviates from the previously mentioned experiments in several ways. Firstly, this experiment used a medically framed real-effort task to simulate a realistic situation. Similar to the task in Lagarde and Blaauw [14], this study focuses on a task that requires participants to perform actual work, either psychological or manual, to achieve a certain outcome. The goal was to examine whether a provider may choose to engage in the effort because they are motivated by an intellectual challenge or enjoyment in the work. The real-effort task also allows for the evaluation of the quality of care. Secondly, this experiment assigned subjects to the role of patients. However, unlike in Green [13], patients in this experiment are passive. It is acknowledged, again, that this is similar to a dictator game, but the researchers' concern in this study was how the presence of patients affected doctor behavior, in isolation from any market or reciprocity effects that could arise from active patients. Thirdly, this experiment used medical framing to emphasize that healthcare is a unique type of credence good, diverging from Huck et al. [32]. Fourthly, this experiment used patients with varying levels of insurance, specifically in the fee-for-service treatment, to better understand how insurance affects doctor-patient interactions. These differences highlight important contributions to the provider moral hazard debate in healthcare and to experimental economics in healthcare.

Experimental Design
To examine the impact of payment systems on the quality and quantity of care in healthcare markets, this experiment used a modified version of the experiment detailed in Karunadasa, Sieberg, and Jantunen [12]. Their experiment, which in turn builds on Hennig-Schmidt et al. [11], used a neutrally framed real-effort task to test how payment systems affect quantity and quality of care in credence goods markets. It included two treatments: salary and FFS. In the salary treatment, regardless of the number of tasks performed, subjects received a flat wage of 13 ECUs per round. In the FFS treatment, the amount subjects earned was tied to how many tasks they performed and varied between 0 and 23 ECUs. In both treatments, each completed task had a cost of 1 ECU, which was deducted from their payments to determine payoffs per round. In each round, subjects were randomly matched with a hypothetical customer. Before the start of a new round, subjects were informed of the customer type they were matched with and were shown a table containing payment and costs per task, individual profit, and customer benefit. Subjects had the option of doing any number of tasks between 0 and 10. Payoffs to the customer monotonically decline with distance from the optimal number. In the present experiment, by contrast, subjects in each treatment are randomly assigned to the role of either a doctor or a patient. Patients are passive and make no decisions that could affect their earnings. Doctors in each treatment make decisions that in turn affect their earnings and those of the patients.
This experiment follows changes adopted in Karunadasa, Sieberg, and Jantunen [12] when calculating payoffs, in that it uses the profit function that they used. However, several design changes in this experiment are worth noting. More specifically, this experiment used medical framing in the instructions, informing participants that they were part of a decision-making game in a healthcare context that would last several rounds. Furthermore, some of the participants also assumed the role of patients.
This experiment had two treatments: salary and FFS. In each treatment, participants were randomly assigned to the role of either a doctor or a patient for the entire experiment, which comprised 12 rounds. In the salary treatment, subjects assigned to the role of a doctor received a fixed wage of 13 experimental currency units (ECUs) per round, irrespective of the number of tasks performed. In the FFS treatment, earnings were dependent on the number of tasks completed, ranging from 0 to 23 ECUs. The ECU-to-euro exchange rate was 0.09. In both treatments, each task had a cost of 1 ECU, deducted from income to determine round payoffs.¹ Subjects, acting as doctors, were tasked with solving health problems of subjects taking on the role of patients.² Each subject was randomly paired with a new patient in each round. The experiment included three patient types: moderate conditions (type 1), mild conditions (type 2), and severe conditions (type 3). The optimal number of tasks needed to solve the problem varied depending on the patient type. However, subjects had the freedom to perform any number of tasks, from 0 to 10. Doctors' decisions influenced not only their earnings but also the benefits and earnings of their patients. Patient benefits, while dependent on their type, followed a concave pattern. The graph below illustrates patient benefits in relation to task number and patient type. Depending on the treatment, subjects were informed that their patients would incur costs based on the number of tasks performed. In the salary treatment, subjects were told that patients would pay a flat fee regardless of the services provided. In the FFS treatment, subjects were informed that patients would either pay for each service provided or have them covered via insurance.³
In each round, participants acting as doctors could choose to complete between 0 and 10 decoding tasks correctly or incorrectly. If they chose to perform fewer tasks than required, they could move to the next round. Each round had a maximum duration of five minutes, after which it would automatically progress to the next round. The completion of tasks by the doctors determined the payoffs for both them and the patients. Doctors were paired with different patients in each round. The decoding task required participants to convert numbers into letters using a given grid of letters and a decoding key. An example of this task is provided in Appendix A. The experiments were designed using oTree [34] and all analyses were carried out via RStudio version 4.2.2 for Windows [35][36][37]. Figure 1 below shows patient benefits and doctor's payoffs in each round depending on the number of tasks completed.
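The salary-treatment payoff rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' oTree implementation: it uses only the values stated in the text (a 13 ECU wage, a cost of 1 ECU per task, 12 rounds, and an exchange rate of 0.09), and the function names are hypothetical. The FFS fee schedule is not reproduced here because the text gives only its range (0 to 23 ECUs).

```python
ECU_TO_EUR = 0.09   # exchange rate stated in the text
SALARY_ECU = 13     # flat wage per round in the salary treatment
TASK_COST_ECU = 1   # cost deducted per completed task
ROUNDS = 12         # number of rounds per session
MAX_TASKS = 10      # doctors may complete 0 to 10 tasks per round


def salary_round_profit(tasks: int) -> int:
    """Doctor's profit in one salary round: fixed wage minus task costs.

    Hypothetical helper; the name is not taken from the paper.
    """
    if not 0 <= tasks <= MAX_TASKS:
        raise ValueError("tasks must be between 0 and 10")
    return SALARY_ECU - TASK_COST_ECU * tasks


def ecu_to_eur(ecu: float) -> float:
    """Convert experimental currency units to euros."""
    return ecu * ECU_TO_EUR


if __name__ == "__main__":
    for tasks in (0, 5, 10):
        print(tasks, "tasks ->", salary_round_profit(tasks), "ECU")
    best_case_ecu = ROUNDS * salary_round_profit(0)
    print("best case:", best_case_ecu, "ECU =",
          round(ecu_to_eur(best_case_ecu), 2), "EUR")
```

Under these values, a salaried doctor who performs no tasks in any round keeps 12 × 13 = 156 ECUs, which is consistent with the earnings ceiling reported in the Experimental Process section.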

Experimental Process
Experiments were conducted at Tampere University's DMLab from 27 September 2023 to 6 February 2024.The Ethical Review Committee of the Tampere Region granted ethics approval for this experiment.A total of 176 subjects participated in 13 sessions. 4 Eighty-nine subjects were assigned to the role of a doctor while eighty-seven were assigned as patients.The subjects, representing various disciplines within the Tampere University community, 5 were recruited using a convenience sampling method via the ORSEE online recruitment system.Subjects were given the choice to select a session from the 13 available options.In each session, subjects were randomly assigned the role of either a doctor or a patient.The researchers clarified at the start of each session that the experiment was simulating a medical decision-making context.In this context, the decisions made by the doctors would determine the payoffs for both them and their corresponding patients.This setup represents the concept of dual agency in healthcare, where doctors must balance the interests of patients and other stakeholders, considering patient benefits, fees, and service costs when making clinical decisions.To prevent doctors from compensating for losses in one round by altering their play in the next, doctors were randomly paired with a different patient in each round.Table 1 below summarizes the order of matching and the ideal number of tasks required per round.In the salary treatment, subjects were informed that patients would pay a flat fee, regardless of the number of services performed.In contrast, the FFS treatment involved patients paying a separate cost for each service provided.This cost was either paid directly by the patient or covered by insurance.Subjects were told in advance whether their patient would have insurance coverage.The presence of insurance coverage did not affect the optimal number of services required or doctor's profits but did influence patient benefits.
Each experimental session took about 90 minutes, with participants given a show-up fee of EUR 5. Each session was allocated to a single treatment, where subjects could earn up to 156 ECUs, equivalent to EUR 14 (in addition to the show-up fee). The researchers, but not the participants, knew which treatment was applied in each session beforehand, facilitating controlled comparisons between subjects. Once participants took their seats, instructions were read aloud and displayed on computer screens. Instructions were given in either Finnish or English depending on the session but were only available in English on the screens. After the instructions, participants had a chance to ask questions. Instructions for the Salary sessions can be found in Appendix C. Following 12 rounds of decision-making, subjects completed a brief questionnaire about their decision-making factors before receiving payment. The questionnaire can be found in Appendix B. Quantitative data generated through the experiments were analyzed using non-parametric bivariate methods, specifically the Mann-Whitney U test (also known as the Wilcoxon rank sum test, the name used when reporting the results), and regression analyses.
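The bivariate comparisons described above can be illustrated with a minimal sketch of the rank sum test. The study itself ran its analyses in R; the Python below is only an illustration that uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those produced by statistical packages. The function names and the example task counts are invented for this sketch and are not data from the experiment.

```python
import math
from typing import Sequence


def midranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U (Wilcoxon rank sum) test.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts: dict[float, int] = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all observations identical: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Invented task counts, for illustration only (not data from the experiment).
ffs_tasks = [8, 7, 9, 10, 6, 8]
salary_tasks = [4, 3, 5, 4, 2, 6]
u_stat, p_value = rank_sum_test(ffs_tasks, salary_tasks)
print(f"U = {u_stat:.1f}, two-sided p = {p_value:.4f}")
```

Because the test works on ranks rather than raw values, it suits bounded count outcomes such as the number of tasks per round, where normality cannot be assumed.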

Hypotheses
Hypothesis 1. Propensity for over(under)treatment is likely in FFS (salary) treatment.
As explained in the previous section, in both treatments, the number of tasks providers perform directly affects their profit. Profits increase with task volume in FFS, while in the salary model, profits decrease as tasks increase. This could lead to an overprovision of services in the FFS model and an underprovision in the salary model.

Hypothesis 2. Patient welfare losses may be equal in both fee-for-service and salary compensation structures.
Expanding on Hypothesis 1, both treatments could lead to patient welfare losses due to unnecessary treatments or unmet needs, as providers may be motivated by profit. However, the researchers do not anticipate a difference in welfare losses between the two treatments.
Hypothesis 3. Physicians factor in customer needs and benefits in their decision-making process, regardless of their impact on profit optimization.
Information asymmetry in healthcare can lead to providers overstating treatment needs for their benefit. However, medical ethics and pro-social motivations may limit such behavior. The hypothesis is that customer needs and benefits influence providers, even when they might not align with the profit-maximizing choice.
Hypothesis 4. The quality of services in the fee-for-service model will not differ from the salary model.
Salary model providers receive a fixed income, providing no financial incentive to increase service quantity or quality. FFS model providers are incentivized to increase service quantity. However, without repeated doctor-patient interactions or personal benefit from their work's quality, doctors might lack motivation to improve quality.
Hypothesis 5. In the fee-for-service model, the mean deviation from the ideal number of tasks will be higher when serving insured customers.
The literature shows that insurance plays a significant role in shaping clinical decisions. When patients are insured, physicians might feel less ethically constrained and increase their prices or services. In this experiment, patients have varying insurance coverage. Salary model patients pay a flat service fee regardless of task number, while FFS model patients either bear their services' total cost or have full insurance coverage. The hypothesis is that in the FFS model, the mean deviation from the ideal number of tasks will be higher when serving insured patients.

Payment Systems and Quantity of Care
Table 2 below provides summary statistics of earnings for doctors and patients by payment system. Overall, regardless of the payment type, subjects assigned to the role of a doctor earned 108 ECUs, equivalent to EUR 9.8. Doctors in the FFS treatment earned 106 ECUs (EUR 9.6), while doctors in the salary treatment earned 111 ECUs (EUR 10). This difference was not statistically significant (Wilcoxon rank sum test, p = 0.24). On average, patients, regardless of the treatment, earned 69 ECUs (EUR 6.2). The results show patient earnings to be higher in the salary treatment. On average, patients in the salary treatment received 80.4 ECUs (EUR 7.3) in benefits, while patients in the FFS treatment received 61.85 ECUs (EUR 5.6). The difference was statistically significant (Wilcoxon rank sum test, p < 0.0005). Looking at the number of tasks performed under each payment system, the results show a tendency for overtreatment in FFS, while undertreatment was more prevalent in salary. On average, regardless of the patient type and needs, doctors in the FFS treatment performed 7.55 tasks in comparison to 3.75 by doctors in the salary treatment, and this difference was statistically significant (Wilcoxon rank sum test, p < 0.0005). Dissecting the number of tasks according to patient types gives a slightly different picture, as seen in Figure 2 below.
To be treated optimally, patient type 1 required five tasks; patient type 2, three tasks; and patient type 3, six tasks. On average, in the FFS treatment, patient type 1 received 7.6 tasks, type 2 received 6.6 tasks, and type 3 received 8.4 tasks. In the salary treatment, in comparison, patient type 1 received an average of 4 tasks, type 2 received 2.9 tasks, and type 3 received 4.2 tasks. In FFS, the overtreatment effect was significant for all patient types (Wilcoxon rank sum test, p < 0.0005). In salary, undertreatment for patient types 1 and 3 was statistically significant (Wilcoxon rank sum test, p < 0.0005), whereas the difference between the ideal number of tasks and the actual average for patient type 2 was not statistically significant (Wilcoxon rank sum test, p = 0.05). This indicates that overtreatment is prevalent in FFS while undertreatment is likely in salary (Hypothesis 1).
As shown in Figure 2, the average number of decisions per round for both treatments roughly follows the trendline for the ideal number of tasks needed. This finding is notable. Keep in mind that in this experiment, effort comes with a price. For each additional task a doctor performs, there is an extra cost deducted from their income. If doctors were focused solely on profit, those in the salary treatment would not perform any tasks. They receive a steady salary of 13 ECUs per round, with the cost of tasks performed subtracted from this salary to determine their profit. More tasks would mean higher expenses and lower profits. On the other hand, under the FFS model, doctors earn more by performing more tasks. Therefore, theoretically, regardless of patient needs, doctors under FFS should perform the maximum number of tasks per round. But that is not what can be seen in the data. Results show overtreatment in FFS and undertreatment in the salary model, but the number of tasks performed per round generally aligns with patient requirements. This suggests that service delivery is influenced by patients' needs (Hypothesis 3).
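The profit-maximizing benchmark described above can be made concrete with a small sketch. Only the salary of 13 ECUs per round and the 0 to 10 task range come from the text; the per-task cost and per-task fee below are hypothetical values chosen for illustration, and the function names are likewise assumptions of this sketch rather than part of the experimental software.

```python
SALARY_PER_ROUND = 13.0  # ECUs per round, as stated in the text
COST_PER_TASK = 0.5      # hypothetical effort cost per task (not from the paper)
FEE_PER_TASK = 1.8       # hypothetical FFS fee per task (not from the paper)
MAX_TASKS = 10           # doctors could complete between 0 and 10 tasks


def doctor_profit(system: str, tasks: int) -> float:
    """Per-round profit of a doctor under the given payment system."""
    if not 0 <= tasks <= MAX_TASKS:
        raise ValueError("tasks must be between 0 and 10")
    if system == "salary":
        # Fixed salary minus the cost of every task performed.
        return SALARY_PER_ROUND - COST_PER_TASK * tasks
    if system == "ffs":
        # Each task earns a fee and incurs a cost.
        return (FEE_PER_TASK - COST_PER_TASK) * tasks
    raise ValueError(f"unknown payment system: {system}")


def profit_maximizing_tasks(system: str) -> int:
    """Number of tasks a purely self-interested doctor would choose."""
    return max(range(MAX_TASKS + 1), key=lambda t: doctor_profit(system, t))


for system in ("salary", "ffs"):
    print(system, profit_maximizing_tasks(system))
```

As long as the fee exceeds the cost, the sketch reproduces the two corner predictions, zero tasks under salary and the maximum under FFS, regardless of patient need. The observed averages lie between these corners and track the ideal number of tasks, which is the pattern the text interprets as responsiveness to patients' needs.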

Payment Systems and Quality of Care
A key advantage of using a real-effort task in this experiment is that it enables the monitoring of failed tasks. Therefore, the number of faulty tasks per session was used as a proxy for quality of service when testing the relationship between payment systems and care quality. As seen in Figure 3 below, on average, subjects in the FFS treatment made 0.25 faulty tasks, while those in the salary treatment made 0.07 faulty tasks. This difference was statistically significant (Wilcoxon rank sum test, p < 0.0005). Within each treatment type, the researchers examined whether the patient type affected the number of faulty tasks. Essentially, the objective was to understand whether patients with higher needs received more faulty tasks. In the FFS treatment, there was no notable association between patient type and the number of faulty tasks (analysis of variance, p = 0.38). The same was observed in the salary treatment (analysis of variance, p = 0.12). However, the rate of incorrect tasks increased as the game progressed under the FFS treatment, while the opposite happened under salary.
It is important to note that in this experiment, doctors received payment even when they performed a faulty task. They were randomly paired with different patients in each round, eliminating the possibility of reputation-building or repeated interactions. The objective was to understand how doctors would behave under different payment systems even when there was no added benefit from the quality of their work or reputation building. Findings suggest that doctors on salary seem motivated to improve quality, even without repeated doctor-patient interactions or personal financial benefit from their work's quality (Hypothesis 4). This further reinforces the idea that doctors' decisions are influenced by patients' needs (Hypothesis 3).

Insurance and Quantity and Quality of Care
In this experiment, patients in the salary treatment paid a flat service fee regardless of the number of tasks they received. However, in the FFS system, patients had differing insurance coverage. They either covered all tasks out of pocket or were fully insured. Insurance presence did not alter the ideal number of services required or the doctors' profits, but it did affect customer benefits.
To test whether doctors felt less compelled to limit the number of tasks provided when a patient is insured, researchers compared the total number of tasks performed on insured patients in the FFS treatment with those on uninsured patients. The average number of tasks performed for insured patients, regardless of their type, was 7.8, while uninsured patients received 7.3 tasks. This difference was statistically significant (Wilcoxon rank sum test, p = 0.0005). This effect was observed for patient types 1 and 3. Type 1 insured patients (for whom the ideal number of tasks was 5) received an average of 8 tasks, compared to 7.14 for uninsured patients (Wilcoxon rank sum test, p = 0.005). Although type 2 insured patients (with an ideal task number of 3) received 6.7 tasks on average compared to 6.1 for uninsured patients, the difference was not statistically significant (Wilcoxon rank sum test, p = 0.7). Type 3 insured patients (ideal number of tasks 7) received an average of 8.6 tasks compared to 8.2 for uninsured patients, a statistically significant difference (Wilcoxon rank sum test, p = 0.04).
To understand how insurance affects overtreatment, researchers examined the number of additional tasks performed for patients with and without insurance. On average, patients with insurance had 3.1 additional tasks performed, while those without insurance had 2.8. Pooled across patient types, this difference was not significant (Wilcoxon rank sum test, p = 0.1). However, when differentiated by patient type, the results varied. Type 1 patients with insurance had 3.4 additional tasks performed versus 3 tasks for those without insurance. This difference was not statistically significant (Wilcoxon rank sum test, p = 0.1). Type 2 patients with insurance had 4.4 additional tasks performed compared to 3.4 for those without insurance. This difference was statistically significant (Wilcoxon rank sum test, p = 0.01). For type 3 patients, those with insurance were treated for an average of 2.1 additional tasks compared to 1.7 for those without. This difference was statistically significant (Wilcoxon rank sum test, p = 0.01). These findings indicate that under an FFS system, the deviation from the ideal number of tasks is likely to be higher when treating insured patients, supporting Hypothesis 5.
To examine the impact of insurance on precision and quality of care, the number of erroneous tasks performed on insured versus uninsured patients in the FFS treatment was compared. On average, insured patients experienced 0.28 faulty tasks, compared to 0.21 faulty tasks for uninsured patients. This difference was not statistically significant (Wilcoxon rank sum test, p = 0.25). Furthermore, there was no significant association between insurance status and quality of care when the analysis was differentiated by patient type.

Payment Systems, Insurance, and Patient Welfare
To test how payment systems impact patient welfare, patient benefits under different treatments were compared. Welfare loss was estimated by comparing the ideal number of tasks in each round with the actual number completed. In this experiment, participants could complete anywhere from 0 to 10 tasks. This meant that some subjects may have completed fewer than or exactly the required number of tasks. Any deviation from the ideal number was considered a loss. In instances where the actual number of tasks was less than the required number, the welfare loss was negative. These differences were then converted into absolute values to gauge the magnitude of the losses more accurately. In this experiment, patients are passive participants and do not make decisions. Their benefits, similar to a real-world healthcare setup, are entirely determined by doctors. Table 3 below summarizes welfare losses for each treatment. The average welfare loss per patient, irrespective of type, was 3.71 ECUs across all rounds. In the fee-for-service treatment, the average absolute welfare loss per patient was 4.85 ECUs, while for the salary treatment, it was 2 ECUs (Hypothesis 2). This difference was statistically significant (Wilcoxon rank sum test, p < 0.0005).
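The deviation-based loss measure described above can be sketched in a few lines. The paper reports welfare losses in ECUs, but the mapping from task deviations to ECUs is not spelled out in this section, so the sketch stays in task units; the function names and the example rounds are assumptions made for illustration.

```python
from statistics import mean


def signed_deviation(actual_tasks: int, ideal_tasks: int) -> int:
    """Positive for overtreatment, negative for undertreatment."""
    return actual_tasks - ideal_tasks


def welfare_loss(actual_tasks: int, ideal_tasks: int) -> int:
    """Magnitude of the deviation from the ideal number of tasks."""
    return abs(signed_deviation(actual_tasks, ideal_tasks))


def mean_welfare_loss(rounds: list[tuple[int, int]]) -> float:
    """Average loss over (actual, ideal) pairs, one pair per round."""
    return mean(welfare_loss(actual, ideal) for actual, ideal in rounds)


# Invented rounds: one overtreated and one undertreated patient, ideal = 5.
example_rounds = [(8, 5), (2, 5)]
print(mean(signed_deviation(a, i) for a, i in example_rounds))  # signed deviations cancel
print(mean_welfare_loss(example_rounds))                        # absolute deviations do not
```

The example shows why absolute values are used: an overtreated and an undertreated patient would otherwise offset each other, and the average would understate how far care departed from the ideal.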
Upon analyzing patient types, the results show that type 1 patients experienced an average welfare loss of 5.18 ECUs under the FFS treatment. Type 2 patients recorded an average loss of 4.76 ECUs, and type 3 patients 4.6 ECUs. Ideally, there should be no welfare loss in an optimal treatment scenario. When compared to this ideal scenario, all losses experienced by patients in the FFS treatment were statistically significant (Wilcoxon rank sum test, p < 0.0005).
In comparison, under the salary treatment, type 1 patients experienced an average welfare loss of 1.77 ECUs, type 2 patients 1.20 ECUs, and type 3 patients, who faced the largest loss, averaged 3 ECUs. These losses were also statistically significant (Wilcoxon rank sum test, p < 0.0005).
To test if insurance influences the average welfare loss, insured and uninsured patients in the FFS treatment were compared. Insured patients experienced an average welfare loss of 5.66 ECUs, while uninsured patients faced a loss of 4.69 ECUs. This difference was statistically significant (Wilcoxon rank sum test, p = 0.01). For type 1 patients, the welfare loss was 5.66 ECUs for the insured and 4.69 ECUs for the uninsured, a significant difference (Wilcoxon signed rank test, p = 0.009). For type 2 patients, the welfare loss was 4.95 ECUs for the insured and 4.57 ECUs for the uninsured, but this difference was not significant (Wilcoxon signed rank test, p = 0.26). Type 3 insured patients experienced a welfare loss of 5.07 ECUs, compared to 4.13 ECUs for uninsured patients, a significant difference (Wilcoxon signed rank test, p = 0.002).
Researchers also investigated if the order of treating insured and uninsured patients influenced providers' decisions. Specifically, they examined whether treating an uninsured patient before an insured one would make doctors more conscious of the number of tasks they performed. However, there was no significant order effect (Wilcoxon rank sum test, p = 0.38).

Regression Analysis
A series of multi-level models were estimated to understand how payment systems and patient types affect the number of tasks, welfare losses, and precision while controlling for individual characteristics. The initial analysis excluded insurance because it would not be applicable in the salary context. In the salary treatment, all patients had insurance, meaning they paid a flat rate irrespective of the number of tasks performed. The relevance of insurance status would only apply in the FFS context, where patients had varying insurance statuses. This analysis focused on three different outcome variables: the number of tasks, the number of faulty tasks, and the difference between the required and actual number of tasks. Given the nature of the outcome variables, a mix of Poisson multi-level models and a linear multi-level model were used. Specifically, Poisson multi-level models were used for the number of tasks and the number of failed tasks, as these are count variables [38]. For the difference between the ideal and actual number of tasks, a linear multi-level model was used. The Poisson regression estimates a panel data random-effect model in the following form:

$$\log(\lambda_{ij}) = \beta_0 + \beta_1 \mathrm{FFS}_{j} + \beta_2 \mathrm{PatientType}_{ij} + \beta_3 Z_{ij} + u_{0j}$$

Here, $\lambda_{ij}$ represents the expected count of the outcome of interest for the ith observation in the jth individual. FFS is a dummy variable for treatment, where 1 represents FFS and 0 stands for salary. Patient type is a categorical variable denoting the type of patient. $Z_{ij}$ represents individual-level controls, and $u_{0j}$ is the random intercept at the individual level.
The linear multi-level model, which uses the difference between the ideal and actual number of tasks, takes the following form:

$$Y_{ij} = \beta_0 + \beta_1 \mathrm{FFS}_{j} + \beta_2 \mathrm{PatientType}_{ij} + \beta_3 Z_{ij} + u_{0j} + \varepsilon_{ij}$$

where $Y_{ij}$ is the outcome for the ith observation in the jth individual and $\varepsilon_{ij}$ is the residual error term. This term represents the deviation of the ith observation from the predicted value, based on the fixed and random effects.
Table 4 presents the results of the regression models. Column 1 shows the effect of payment systems on the number of tasks performed. The rate ratio (RR) for tasks performed under the FFS system is estimated to be 2.3 compared to the salary system, supporting Hypothesis 1.
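A short sketch may help in reading the rate ratios. In a Poisson model with a log link, the rate ratio for the FFS dummy is the exponential of its coefficient, and it does not depend on the intercept or on an individual's random intercept. The coefficient below is simply the logarithm of the reported RR of 2.3; the intercept and random-intercept values are hypothetical, as are the function and variable names.

```python
import math


def expected_count(intercept: float, ffs_coef: float, ffs: int,
                   random_intercept: float = 0.0) -> float:
    """Expected count under a log-link model with a single FFS dummy."""
    return math.exp(intercept + ffs_coef * ffs + random_intercept)


ffs_coef = math.log(2.3)  # coefficient implied by the reported RR of 2.3
intercept = 1.0           # hypothetical baseline, not taken from Table 4

for u in (0.0, 0.7):      # two hypothetical individual random intercepts
    ratio = (expected_count(intercept, ffs_coef, 1, u)
             / expected_count(intercept, ffs_coef, 0, u))
    print(f"random intercept {u}: FFS / salary = {ratio:.2f}")

# Unadjusted comparison from the raw means reported in the text.
print(f"ratio of raw mean tasks: {7.55 / 3.75:.2f}")
```

The model-based ratio is adjusted for patient type and individual-level controls, so it need not coincide with the ratio of the raw means (7.55 versus 3.75 tasks), although both point in the same direction.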
Column 2 indicates that subjects in the FFS system make approximately 2.9 times as many faulty tasks as those in the salary system, contradicting Hypothesis 4. Also, doctors treating type 2 and type 3 patients are expected to make fewer faulty tasks compared to the reference patient type, regardless of the payment type.
Column 3 examines the impact of payment systems on the difference between the ideal and actual number of tasks. Subjects under the FFS treatment overtreated their patients by performing one additional task, confirming Hypothesis 1. Holding payment type constant, subjects serving type 2 patients performed 0.19 additional tasks; however, this effect was not significant. When serving type 3 patients, subjects performed 0.23 fewer tasks.
Column 4 explores the relationship between payment systems and welfare loss. Patients in the FFS system had an expected welfare loss that was 2.8 higher than that of patients in the salary system, contradicting Hypothesis 2. However, this does not mean that patients in the salary system did not experience any welfare loss. Patient type 2 experienced a 0.48 lower welfare loss than patient type 1, regardless of the payment system, while patient type 3 had a 0.17 higher welfare loss, although this was not significant.
The analysis was then repeated, focusing only on FFS to understand how insurance affects provider behavior and patient welfare. The results are summarized in Table 5 below. Column 1 shows the relationship between insurance status and the number of tasks completed in the FFS treatment. The data reveal that doctors with insured patients complete about 1.058 times as many tasks as doctors with uninsured patients, assuming other variables remain constant. Column 2 addresses the influence of insurance on the quality of tasks performed. It suggests that insured individuals have an expected count of failed tasks that is 1.323 times that of uninsured individuals, again holding other variables constant. Column 3 focuses on how insurance impacts the deviation from the ideal number of tasks. It shows that doctors with insured patients complete an additional 0.24 tasks compared to those with patients without insurance. This supports Hypothesis 5 that the mean deviation from the ideal number of tasks will be higher when treating insured patients. Finally, Column 4 presents the effect of insurance on welfare loss. The data indicate that insured patients experience an additional 0.76 loss compared to uninsured patients in the FFS treatment. These findings imply that a patient's insurance status affects both the quantity and quality of healthcare services offered under an FFS model. This could potentially lead to disparities in care. Providers might allocate more resources to insured patients, thereby impacting task completion and care quality.

Discussion and Concluding Remarks
This paper presents the results of a medically framed laboratory experiment with a real-effort task to understand how payment systems, specifically FFS and salary, affect service provision, patient welfare, and quality of care. This experiment also allowed us to test how insurance affects doctors' decision-making in an FFS system.
The experimental findings provided substantial evidence in favor of three out of five hypotheses that were built on theory and evidence presented in Karunadasa, Sieberg, and Jantunen [12]. Specifically, they show the prevalence of overtreatment within FFS models, while undertreatment is found to be common in salary-based models. This difference between the systems highlights the factors shaping healthcare delivery, with financial incentives playing a key role. In addition, the healthcare decision-making process appears to be, as it should, influenced by the needs and perceived benefits of the patient, and this observation is particularly evident in the salary system. While the results show providers in the salaried system to be influenced by the needs of the patients, they do not indicate that patient needs have any significant impact on the decisions made by providers in the fee-for-service system. These findings are confirmed by a post-experiment survey. In this survey, most participants from the salary treatment group stated that balancing personal rewards and patient welfare was their main consideration in decision-making. In contrast, most participants from the FFS treatment group identified self-interest as the key factor influencing their decisions. The results also show that doctors within the FFS system tend to perform more tasks than required when treating insured patients. Insurance was also found to play a pivotal role in shaping doctors' decisions and patients' benefits in the FFS treatment. Specifically, insured patients receive more unnecessary tasks and faulty tasks. These findings collectively highlight the complex interplay between payment systems, third-party payers, provider behavior, and patient welfare in a healthcare system.
These results fail to support two of the hypotheses. Firstly, they show that welfare losses to patients are significantly higher within the FFS model compared to those in salary. Welfare losses in the context of this experiment stem from over(under)treatment and performing tasks incorrectly. They also show significant discrepancies between welfare losses of different patient types, indicating that providers could be cream-skimming patients. The only type of patient who was optimally served during the whole experiment was type 2 patients (patients with mild conditions) by doctors in the salary system. These patients experienced the lowest welfare loss of all three patient types in this study. In contrast, insured type 3 patients suffered the highest welfare loss due to excessive treatment and compromised care quality. The observation that welfare losses in FFS exceed those in salary is particularly concerning, as in a real-world situation, this could compromise patient health outcomes and safety, leading to adverse effects, complications, or prolonged illness. These losses could also result in increased healthcare costs, exacerbating financial burdens on patients and healthcare systems. Secondly, there are significant disparities in the quality of care between the two systems. Particularly, subjects in the FFS treatment perform more incorrect tasks. Additionally, these results show that quality of care deteriorates when patients are insured. The implications of these results can be alarming. Suboptimal care and higher welfare losses may erode patient trust in healthcare providers and the healthcare system. Patients may feel dissatisfied with their care experiences, leading to decreased patient satisfaction and engagement in their healthcare management.
The findings of this experiment carry several implications that command careful consideration within the discourse surrounding healthcare policy and practices. Firstly, these results show that the quality of services is lower in an FFS system. This is particularly relevant at a time when recent trends of governments covering the bulk of healthcare costs are highly unlikely to continue. This highlights a significant disadvantage of incentives in private markets, where the drive to reduce costs can come at the expense of quality. This phenomenon has been noted previously by Hart, Shleifer, and Vishny [39], and it is particularly relevant in the context of healthcare. Secondly, these results show that providers in a salary system avoid costly effort and costly patients. In situations where medical professionals are paid a fixed monthly salary in a public healthcare system, a lack of financial incentives might lessen their motivation to go above and beyond. This could potentially result in hesitation to take on cases that require extra effort [39,40]. However, the tendency to selectively choose cases, or "cherry-picking", may happen irrespective of the ownership model or payment structure. Thirdly, insurance affects provider behavior, not only in terms of the quantity of care but also the quality of care.
Theoretically, this challenges the mainstream health economics perspective on moral hazard, which suggests that when insured, patients consume excessive healthcare as insurance makes the costs of healthcare negligible. Specifically, findings from FFS treatments for both insured and uninsured patients indicate a trend of overtreatment by individuals acting as doctors in our study. This aligns with previous concerns that FFS can motivate supplier-induced demand, leading to unnecessary treatments. This is not merely inefficient; it could also have adverse effects on patients. If this overtreatment is considered moral hazard, the patient, despite not having requested the treatment, bears the blame and deterrent costs rather than the doctor. Practically, these results raise serious concerns about the accessibility and equity of healthcare. In particular, the results presented here raise a question of the extent to which private markets can be relied upon for effective universal health coverage. This study's findings primarily diverge from prior evidence in regard to profit maximization and providers responding to customer needs. Unlike several past studies [11,13,21], which found doctors often prioritize patient welfare over profit, the findings of this study suggest otherwise. In the salary treatment, this seems to be the case, but doctors in the FFS treatment tend to overprovide services, hinting at profit-maximizing behavior. Karunadasa, Sieberg, and Jantunen [12] observed similar results.
The findings of this study also support earlier experimental evidence regarding provider moral hazard [32,41,42]. These findings reveal a positive correlation between insurance and the volume of services doctors provide in the FFS treatment. Additionally, these results suggest that insurance may increase instances of faulty tasks, aligning with previous experimental evidence from the credence goods literature on how insurance can reduce diagnostic precision. Applying these results to real-world contexts requires careful consideration. These results come from a highly controlled research context, where the researchers carefully controlled several variables. Patient benefits and payoffs were explicitly shown to both patients and doctors; however, given uncertainties in healthcare, these benefits might not be explicit in real-world conditions. The use of medical framing helps us simulate a situation that is closer to a real-world healthcare setup; however, this study used a subject pool that includes both medical and non-medical students, and this might limit how far these results generalize. The existing literature indicates that medical students and physicians generally behave differently than others. An advantage of this experiment is the additional element of insurance. To the best of the authors' knowledge, only a handful of economic experiments in healthcare have explicitly tested the impact of insurance on doctors' behavior. Adding different insurance statuses to this experiment allows for a closer look at how insurance affects healthcare utilization from a provider's perspective. Nevertheless, in this experiment, only two extremes are considered, fully insured or completely uninsured, while in reality, insurance levels vary significantly.
This study adds to the expanding health economics literature that uses economic experiments as a method. Using this medically framed lab experiment, this study offers a useful understanding of less discussed aspects of healthcare service delivery dynamics. Specifically, this research explores the concept of supply-side moral hazard in healthcare, addressing a relevant topic in ongoing healthcare policy discussions. This study's findings aim to deepen understanding of the relationship between economic incentives, provider behavior, and patient outcomes in healthcare.

Appendix B. Post-Experiment Survey
Thanks a lot for being part of our decision-making experiment! Your insights are valuable to our research, so we're curious to pick your brain a bit. Don't worry, your responses will be kept completely anonymous and will not affect your experiment earnings! The decisions that you make will determine your earnings and those of your patients.
Decisions: During the entire experiment you are in the role of a doctor. You will be randomly matched with a patient, who is experiencing a problem that you could solve. There will be 3 types of patients in this experiment, and the problems they each experience are different. For each patient type, there is an optimum number of tasks you should perform to solve the problem. You can choose how many tasks you want to perform: you can choose to do no tasks, fewer than the optimum number, the optimum number of tasks, or a maximum of 10 tasks.
Each patient benefits most from the optimal number of tasks and will lose benefits from tasks that are higher or lower than the optimal number. The patient bears a flat-fee cost regardless of how many tasks are performed. The patient's benefit is determined by the number of tasks you perform, relative to the patient's ideal number of tasks.
Earnings: You will earn the difference between the wage, which is given regardless of the number of tasks you perform, and a cost per task of 1 ECU.
You will receive a salary of 13 ECU for each round. You will choose a number of tasks to perform. That may be any number between (and including) 0 and 10.
If you have completed the number of tasks you wish to perform before the allotted time has run out, you may press the Next Round button to start a new round.
Earnings: More effort is costly in terms of effort costs. Each task completed will cost 1 ECU. Your earnings for the round are the difference between your salary and the effort cost (number of tasks times per-unit effort cost). Note: you do not have to complete the task successfully to earn money.
Interdependence: Half of the subjects in this experiment are patients. The number of tasks you decide to perform determines not only your own profit but also the benefit of your patient. The patient gets a benefit from each task you perform correctly, but the benefits decline if the number of tasks is less than or greater than the patient's optimal number of tasks. You will be randomly rematched with a new patient in each round.
Patients receive a benefit that is relative to the ideal number of tasks for the patient. They benefit differentially, based on type, from how many services are provided. Patients bear a flat-fee cost.
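The salary earnings rule described above can be written out as a short Python sketch. This is an illustration only, not the software used in the experiment: the function name `salary_round_earnings` and its parameter name are hypothetical, while the 13 ECU salary, the 1 ECU cost per task, and the range of 0 to 10 tasks are taken from the instructions.

```python
SALARY_ECU = 13      # fixed salary per round (from the instructions)
COST_PER_TASK = 1    # effort cost per task in ECU (from the instructions)
MAX_TASKS = 10       # maximum number of tasks per round (from the instructions)


def salary_round_earnings(tasks_performed: int) -> int:
    """Round earnings under salary: salary minus total effort cost.

    Hypothetical helper. Every performed task incurs the effort cost,
    whether or not it was solved correctly.
    """
    if not 0 <= tasks_performed <= MAX_TASKS:
        raise ValueError("tasks_performed must be between 0 and 10")
    return SALARY_ECU - COST_PER_TASK * tasks_performed


if __name__ == "__main__":
    # Print earnings for a few task counts used in the worked examples.
    for n in (0, 1, 5, 10):
        print(n, salary_round_earnings(n))
```

Because the effort cost applies to every task performed, a round with one solved and one failed task counts as two tasks in this sketch.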
Cumulative Earnings: The program will keep track of your total earnings for all rounds, and these will be shown as "cumulative earnings" on a results page.
The ideal number of tasks for the patient are detailed below: For each task that you perform, you will earn the indicated amount of profit (payment per task minus the cost per task), and the associated proceeds from successfully completed tasks will go to the patient.
For example, if you successfully solve one task, you will earn 13 ECU − 1 ECU for a profit of 12 ECU. The patient will benefit 1 ECU from your task.
If you successfully solve one task and fail one task, you will earn 13 ECU − 2 ECU for a profit of 11 ECU. The patient will benefit 1 ECU.
If you successfully solve 5 tasks, you will earn 13 ECU − 5 ECU = 8 ECU. The patient will benefit 10 ECU.
If you successfully solve 10 tasks, you will earn 13 ECU − 10 ECU = 3 ECU. The patient will benefit 5 ECU.
The ideal number of tasks for the patient are detailed below: For each task that you perform, you will earn the indicated amount of profit (payment per task minus the cost per task), and the associated proceeds from successfully completed tasks will go to the patient.
For example, if you successfully solve one task, you will earn 13 ECU − 1 ECU for a profit of 12 ECU. The patient will benefit 7 ECU from your task.
If you successfully solve 3 tasks, you will earn 13 ECU − 3 ECU = 10 ECU. The patient will benefit 10 ECU.
If you successfully solve 10 tasks, you will earn 13 ECU − 10 ECU = 3 ECU. The patient will benefit 1 ECU.
The ideal number of tasks for the patient are detailed below: For each task that you perform, you will earn the indicated amount of profit (payment per task minus the cost per task), and the associated proceeds from successfully completed tasks will go to the patient.
For example, if you successfully solve one task, you will earn 13 ECU − 1 ECU for a profit of 12 ECU. The patient will benefit 1 ECU from your task.
If you successfully solve 7 tasks, you will earn 13 ECU − 7 ECU = 6 ECU. The patient will benefit 10 ECU.
If you successfully solve 10 tasks, you will earn 13 ECU − 10 ECU = 3 ECU. The patient will benefit 7 ECU.

Matchings: Please remember that you will be randomly matched with different patient types in each round.
Wage: The wage is 13 ECU.
Effort: Each doctor sees the wage and then chooses an effort (number of tasks) that can be any amount between (and including) 0 and 10.
Doctor Earnings: The doctor earns the difference between the wage and the cost of that doctor's effort (per-unit effort cost times effort choice).
Patient Earnings: Each patient will receive a benefit based on his/her own need and the effort of the doctor.
When you have completed the experiment please raise your hand and an assistant will tell you where to go to get paid.

Notes
In the salary treatment, doctors' earnings were calculated as 13 ECUs minus the cost per task, based on the number of tasks performed. For example, if a doctor completed seven tasks in a round, their earnings for that round would be 13 ECUs − 7 ECUs = 6 ECUs. In FFS, round earnings were the amount earned per task minus the effort cost, based on the number of tasks performed. For example, if a doctor completed 10 tasks in a round, their earnings for that round would be 23 ECUs − 10 ECUs = 13 ECUs. Patients in this experiment were passive and did not make any decisions or perform any tasks.
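The two earnings rules in this note can be compared with a minimal Python sketch. The function names are hypothetical, and the per-task fee of 2.3 ECU is an assumption inferred from the example of 23 ECUs for 10 tasks (it presumes the fee is linear in the number of tasks); the note itself does not state the fee.

```python
from fractions import Fraction

SALARY_ECU = 13                           # fixed salary per round (from the note)
COST_PER_TASK = 1                         # effort cost per task (from the note)
ASSUMED_FEE_PER_TASK = Fraction(23, 10)   # ASSUMPTION: 23 ECUs / 10 tasks


def salary_earnings(tasks: int) -> int:
    """Salary treatment: fixed salary minus total effort cost."""
    return SALARY_ECU - COST_PER_TASK * tasks


def ffs_earnings(tasks: int, fee_per_task: Fraction = ASSUMED_FEE_PER_TASK) -> Fraction:
    """FFS treatment: fee revenue minus total effort cost.

    The default fee is an inferred, hypothetical value (see ASSUMPTION above).
    """
    return fee_per_task * tasks - COST_PER_TASK * tasks


if __name__ == "__main__":
    # Tabulate both rules for every feasible task count.
    for n in range(0, 11):
        print(n, salary_earnings(n), float(ffs_earnings(n)))
```

Under these assumptions, salary earnings fall with every additional task while FFS earnings rise, which is the incentive contrast between the two treatments.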
3 Subjects were informed if they were serving an insured or uninsured patient. Insurance coverage did not affect the optimal number of services required, but it did influence patient benefits.
4 This experiment had 14 sessions, but one was cancelled due to recurring errors. Data from this session were removed from the final analysis.
Tampere University was chosen given the authors' affiliation to the university.
6 This experiment unfortunately was frustratingly slow and involved rather long waiting times for both doctors and patients. It is possible that towards the end of the experiment, subjects simply wanted to maximize payoffs without paying much attention to accuracy.

Related Literature
2.1. Payment Systems, Provider Behavior, and Quality of Services
Figure 1 below shows patient benefits and doctors' payoffs in each round depending on the number of tasks completed.

Figure 1. Profit parameters per round in FFS and salary.


Figure 2. Average number of tasks per round.


Figure 3. Average number of faulty tasks per round.


Table 1. Order of matching and ideal task count per round.

Table 3. Average welfare loss for each patient type.

Table 4. Impact of payment systems on service provision and patient benefits. Coefficients are exponentiated to be in the rate ratio scale. CIs and robust standard errors are in parentheses. Individual controls include controls for nationality, sex, and whether a subject is a medical/nursing student.

Table 5. Impact of FFS on service provision and patient benefits. Coefficients are exponentiated to be in the rate ratio scale. CIs and robust standard errors are in parentheses. Individual controls include controls for nationality, sex, and whether a subject is a medical/nursing student.
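
As a brief illustration of what reporting coefficients "in the rate ratio scale" involves, the Python sketch below converts a log-scale coefficient and its standard error from a count regression into a rate ratio with a Wald-type interval. The input numbers are made up for illustration and are not taken from Tables 4 or 5. The function name and the 1.96 critical value (a 95% normal-approximation interval) are assumptions, since the captions do not state how the CIs were computed.

```python
import math


def rate_ratio_with_ci(coef: float, std_err: float, z: float = 1.96):
    """Exponentiate a log-scale coefficient into a rate ratio.

    Hypothetical helper. The interval is built on the log scale and then
    exponentiated, so it is asymmetric around the rate ratio.
    """
    rate_ratio = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return rate_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical coefficient and robust standard error, for illustration only.
    rr, lower, upper = rate_ratio_with_ci(coef=0.25, std_err=0.10)
    print(f"rate ratio = {rr:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

A rate ratio above 1 corresponds to a higher expected count than in the reference category, a value below 1 to a lower one, and a coefficient of zero maps to a rate ratio of exactly 1.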