Peer-Review Record

Social Preference Parameters Impacting Financial Decisions Among Welfare Recipients

J. Risk Financial Manag. 2025, 18(8), 408; https://doi.org/10.3390/jrfm18080408
by Jorge N. Zumaeta
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 2 June 2025 / Revised: 7 July 2025 / Accepted: 17 July 2025 / Published: 23 July 2025
(This article belongs to the Special Issue Behavioral Influences on Financial Decisions)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study is well written and methodologically sound. Given the scope of the study, the results presented and discussed are appropriate. My only comments are minor. It appears from the regression analysis that the sample is matched, as the N is 58. Perhaps some additional insight could be provided by using the full pooled sample and interaction terms. These results could be included in the paper or placed in an appendix as a robustness check.

 

Kindly,

Author Response

Comment/Suggestion 1: The study is well written and methodologically sound. Given the scope of the study, the results presented and discussed are appropriate. My only comments are minor. It appears from the regression analysis that the sample is matched, as the N is 58. Perhaps some additional insight could be provided by using the full pooled sample and interaction terms. These results could be included in the paper or placed in an appendix as a robustness check.

Here is my response: 

I sincerely thank the reviewer for this insightful comment and the constructive suggestion provided. I greatly appreciate the acknowledgement of the methodological rigor and clarity of the presentation. Regarding the recommendation, I agree that exploiting the full pooled dataset (56 Miami welfare recipients + 58 Tucson students = 114) can provide an informative robustness check.

I have added the following analysis (lines 300-321) and Table 4 (line 430).

Pooled-Sample Robustness Check

The matched subsample of 58 observations was constructed ex-ante to equalize gender and ethnicity across Miami welfare recipients and Tucson students, thereby isolating location effects while preserving statistical power. To demonstrate that our conclusions are not an artefact of this matching procedure, we re-estimated every game-specific regression on the full pooled sample and introduced three interaction terms—Location × Gender, Location × Ethnicity, and Location × Game-specific stakes ratio (capturing Miami’s higher endowments). The pooled models, reported in Table 4, reveal that neither the location coefficients nor any interaction terms achieve statistical significance at the 10 percent level—for example, the Ultimatum-Game location coefficient is β = -0.012 (p = 0.32) and the largest interaction, Location × Gender, is β = 0.018 (p = 0.28). Gender remains a robust predictor of giving and returning behaviour, exactly as in the matched-sample analysis, and the adjusted R² values shift by less than two percentage points, confirming that the matched approach did not mask important effects.

Across all pooled-sample specifications, none of the location main effects or their interaction terms attain statistical significance at the 10 percent threshold. Gender, by contrast, continues to be a strong determinant—women consistently offer and return higher shares—exactly mirroring the patterns observed in the matched-sample analysis. Moreover, the adjusted R² values shift only marginally, indicating that the matched-sample approach did not conceal any material effects and that overall model fit remains essentially unchanged.

 

Table 4. Pooled-Sample Robustness Check

Dependent variable       | Location main effect   | Largest interaction term                  | Adjusted R²
Ultimatum – % offered    | β = -0.012 (p = 0.32)  | Location × Gender: β = 0.018 (p = 0.28)   | 0.07
Dictator – % offered     | β = 0.004 (p = 0.73)   | Location × Gender: β = -0.021 (p = 0.19)  | 0.05
Trust – % sent           | β = -0.015 (p = 0.27)  | Location × Stakes: β = 0.029 (p = 0.14)   | 0.08
Reciprocity – % returned | β = -0.022 (p = 0.21)  | Location × Gender: β = 0.034 (p = 0.11)   | 0.06
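For transparency, the pooled specification described above can be sketched as follows. This is a minimal illustration with synthetic placeholder data; the variable names (location, female, hispanic, stakes_ratio) are assumptions, not the study's actual analysis files:

```python
# Minimal sketch of the pooled-sample regression with interaction terms.
# All data below are synthetic placeholders; only the model structure
# mirrors the specification described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 114  # 56 Miami welfare recipients + 58 Tucson students
df = pd.DataFrame({
    "pct_offered": rng.uniform(0, 1, n),        # outcome as share of endowment
    "location": rng.integers(0, 2, n),          # 1 = Miami, 0 = Tucson
    "female": rng.integers(0, 2, n),
    "hispanic": rng.integers(0, 2, n),
    "stakes_ratio": rng.choice([1.0, 4.0], n),  # $5 vs. $20 endowments
})

# One such regression is estimated per game; the Ultimatum Game is shown.
model = smf.ols(
    "pct_offered ~ location + female + hispanic"
    " + location:female + location:hispanic + location:stakes_ratio",
    data=df,
).fit()
print(model.summary())               # in the paper, no location/interaction term is significant
print(round(model.rsquared_adj, 3))  # compare with the matched-sample adjusted R²
```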

 

Please let me know if you have any additional questions. Thank you. 

Find attached the revised paper incorporating the 4 reviewers' comments and suggestions.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors present an interesting research question. The paper is well-written and organized, in general. However, there are some minor errors in the paper, both in terms of substance and style.

The major problem of the paper, in my view, is related to its methodology. In particular, the comparison between welfare recipients (treatment group) and volunteers from the University of Arizona’s student body (control group) does not seem adequate. I suggest that the authors either (i) consider other control groups to check the main results’ robustness, or (ii) make the treatment and control groups more comparable by employing some matching procedure (e.g., propensity-score matching methods). In my view, by doing this, the authors may have a more competitive paper and enhance their chance of publishing it.

I list below some additional references that the authors should include in the revised version of the paper (please pay attention to the references denoted with an asterisk (*)):

*Edward L. Glaeser, David I. Laibson, José A. Scheinkman, Christine L. Soutter, Measuring Trust, The Quarterly Journal of Economics, Volume 115, Issue 3, August 2000, Pages 811–846, https://doi.org/10.1162/003355300554926

*Gneezy, U., Leibbrandt, A. and List, J.A. (2016), Ode to the Sea: Workplace Organizations and Norms of Cooperation. Econ J, 126: 1856-1883. https://doi.org/10.1111/ecoj.12209

Comments on the Quality of English Language

The authors should hire a native speaker to review the paper. As a non-native English speaker myself, I find that proofreading by a native speaker is generally necessary.

Author Response

Comment 1: The major problem of the paper, in my view, is related to its methodology. In particular, the comparison between welfare recipients (treatment group) and volunteers from the University of Arizona’s student body (control group) does not seem adequate. I suggest that the authors either (i) consider other control groups to check the main results’ robustness, or (ii) make the treatment and control groups more comparable by employing some matching procedure (e.g., propensity-score matching methods). In my view, by doing this, the authors may have a more competitive paper and enhance their chance of publishing it.

Response 1: Methodological concerns about the adequacy of the control group

We extend our sincere appreciation to the reviewer for recognizing the paper's interesting research question, organization, and clarity. We especially value your careful and constructive evaluation of the methodology, and we fully acknowledge the importance of your points regarding the comparability of our treatment (welfare recipients) and control (university students) groups.

We selected University of Arizona students because (i) they are demographically distinct from the treatment group and therefore provide a stringent “upper-bound” test of behavioral differences and financial decisions, and (ii) their economic-game behavior is well documented in the literature, giving us an externally benchmarkable yardstick. We fully share the reviewer’s view that, in an ideal setting, welfare recipients would be compared with a demographically similar non-recipient population drawn from the same labor market rather than with university students located roughly 3,000 km away. Unfortunately, IRB constraints, field-labor-cost limits, and the fact that the experiment was embedded in an existing community program restricted recruitment to (i) Miami welfare clients and (ii) Tucson undergraduates who had already agreed to participate in standardized laboratory sessions. Because collecting new data is no longer feasible, we addressed the comparability issue through three complementary steps that collectively approximate the use of an alternative control group and rigorously test robustness (lines 337-361 of the revised paper, with accompanying tables at lines 432-434):

I have added the following analysis and tables to the paper: 

Robustness and External Validation: Matching, Weighting, and Benchmark Comparisons

  1. Exact matching on observed covariates (Table 5)

Our baseline specification uses an exact-match subsample (N = 58) in which every welfare recipient is paired one-for-one with a student of identical gender and ethnicity. As Table 5 shows, this procedure pushes the standardised mean differences on both covariates from sizeable values in the raw data (SMD = 0.81 for Hispanic status and –0.08 for gender) to 0.00 after matching, providing a highly conservative test of any location effect while preserving internal validity.

  2. Propensity-score matching (PSM) and inverse-probability weighting (IPW) (Tables 5 & 6)

To ensure that our findings do not rest on the exact-match restriction, we also estimate the models on a PSM sample (49 + 49 = 98) and on the full sample weighted by IPW (N = 114). Balance diagnostics in Table 5 confirm that PSM removes—and IPW virtually removes—the residual imbalance, with the largest SMD falling to 0.05. Re-estimating all four behavioral equations on these adjusted samples (Table 6) yields coefficients that are statistically indistinguishable from those in the matched subsample: the location main effect remains non-significant across games (p ≥ 0.24), whereas gender retains a positive, significant influence (β ≈ 0.04–0.07). Adjusted R² shifts by < 0.02, indicating that neither weighting nor alternative matching changes model fit or substantive conclusions (a schematic sketch of these matching and weighting steps appears after this subsection).

  3. External triangulation with published adult samples (Table 7)

Finally, we benchmark our welfare-recipient means against two independent adult datasets—Belot, Duch & Miller (2015) and Staffiero, Exadaktylos & Espín (2013). As summarized in Table 7, the recipients’ mean offers, transfers, and returns lie squarely within the 95 % confidence intervals of both external adult samples for every game metric, reinforcing the conclusion that welfare recipients’ social-preference parameters are representative of the broader population and not materially different from those of the student control group once observable demographics are balanced.

Across all three exercises, the substantive message is unchanged: once basic demographics are accounted for, welfare recipients behave no differently from the comparison samples. We believe this multi-pronged approach squarely addresses the reviewer’s concern and provides the strongest possible evidence, given data-collection constraints, that our conclusions are robust to the choice of control group.
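To make steps 1–2 concrete, the sketch below shows how propensity-score matching, inverse-probability weights, and standardised mean differences could be computed. The data frame and variable names are synthetic assumptions for illustration only; Tables 5–6 of the paper report the actual diagnostics:

```python
# Illustrative sketch of PSM, IPW, and SMD balance checks (cf. Tables 5-6).
# The data frame is a synthetic stand-in for the pooled sample.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "welfare": np.r_[np.ones(56), np.zeros(58)].astype(int),  # 1 = Miami recipient
    "female": rng.integers(0, 2, 114),
    "hispanic": rng.integers(0, 2, 114),
})
X, t = df[["female", "hispanic"]].to_numpy(), df["welfare"].to_numpy()

# Propensity scores P(welfare | covariates)
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# 1-nearest-neighbour matching (with replacement) on the propensity score
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched = np.concatenate([treated, control[idx.ravel()]])

# Inverse-probability weights for the full sample
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

def smd(x, t, w=None):
    """Standardised mean difference between groups, optionally weighted."""
    w = np.ones_like(x) if w is None else w
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd

for col in ["female", "hispanic"]:
    x = df[col].to_numpy(float)
    print(col, "raw:", round(smd(x, t), 2),
          "PSM:", round(smd(x[matched], t[matched]), 2),
          "IPW:", round(smd(x, t, w), 2))
```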

Comment 2: Adding references

I have added the recommended references:

  1. Lines 120-122 *Edward L. Glaeser, et al.
  2. Lines 301-304 *Gneezy, U., Leibbrandt, et al.

We deeply appreciate the reviewer's thoughtful critique and constructive suggestions regarding our study’s methodological framework and additional literature recommendations.

Find attached the revised paper incorporating the 4 reviewers' comments and suggestions.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

A) The paper presents an investigation of the behavior of welfare recipients through experiments and compares them with similar experiments on university students.

First of all, there is some concern over inconsistency in the paper. For example: (1) whether the paper is looking at social preferences or economic attitudes; (2) the preference parameters are also inconsistent: self-interest, altruism, trust, and reciprocity are mentioned in the abstract, but they are altruism, trust, trustworthiness, reciprocity, and selfishness in the introduction, and then self-interest, altruism, envy, and reciprocity in the conclusion.

B) There is a need to justify the followings:

1) The need to understand the financial decisions of welfare recipients. Although the author(s) have mentioned its importance for intervention, the statement is too general to convey why understanding the financial decisions of welfare recipients matters. Is there any specific reason, for example, a specific decision commonly made by welfare recipients that leads to certain implications, thus requiring an understanding of what drives this decision to allow for specific intervention?

2) The use of college students as a basis of comparison, beyond citing past studies that have done so. The paper aims to compare welfare recipients to other populations; explain whether college students can adequately represent other populations, and clearly define the term “other populations”.

C) While the title of the paper suggests that the author(s) are going to examine the impact of social preferences on financial behavior, the 'impact' is not apparent in the body of the manuscript. The abstract states that the paper examines economic behavior and financial decisions (no impact). The analysis reports comparisons based on gender and ethnicity; again, no impact.

D) The author(s) mentioned that there are limitations related to the correlational nature of the data and selection bias. Should these issues have been thought about before the commencement of the experiment?

E) Some technical errors; for example, the author(s) mention “Chapter” while they are referring to “Section” (I assume).

Author Response

We are grateful to the reviewer for the careful reading and constructive remarks, which have helped us clarify the framing, tighten the terminology, and strengthen the methodological justification of the study. Below we provide a detailed, stand-alone response, followed by the exact manuscript changes that have been implemented.

Comment 1: The paper presents an investigation of the behavior of welfare recipients through experiments and compares them with similar experiments on university students. First of all, there is some concern over inconsistency in the paper. For example: (1) whether the paper is looking at social preferences or economic attitudes; (2) the preference parameters are also inconsistent: self-interest, altruism, trust, and reciprocity are mentioned in the abstract, but they are altruism, trust, trustworthiness, reciprocity, and selfishness in the introduction, and then self-interest, altruism, envy, and reciprocity in the conclusion.

 

Response A.1 – Concept to be used throughout

We agree that the terms were used interchangeably in earlier drafts. Consistent with the literature on behavioral and experimental economics, we now adopt “social preference parameters” as the umbrella term (Fehr & Schmidt 1999; Camerer 2003) and reserve “economic attitudes” for survey-based constructs that were not elicited in games. All occurrences of “economic attitudes” in Sections 1–4 have been replaced by “social preferences” when referring to experimentally revealed behavior.

 

Response A.2 – Harmonized list of parameters

The study measures exactly four social-preference parameters: self-interest, altruism, trust, and reciprocity (positive and negative). “Trustworthiness” is now presented as the reciprocal side of trust (sub-dimension of reciprocity), and “envy” is folded into the negative-reciprocity explanation. We have:

  • Standardized the terminology in the title, abstract, Section 1 (Introduction), Section 5 (Conclusion), all table captions, and keywords.
  • Added a one-sentence definition footnote (Section 1, line 64) citing Fehr & Schmidt (1999).

 

Comment B1- There is a need to justify the following:

1) The need to understand the financial decisions of welfare recipients. Although the author(s) have mentioned its importance for intervention, the statement is too general to give an idea of the importance of understanding the financial decisions of welfare recipients. Is there any specific reason, for example, that there is a specific decision commonly made by welfare recipients that leads to some implications, thus requiring understanding of what drives this decision to allow for specific intervention.

Response - B.1 Need to understand the financial decisions of welfare recipients

Welfare recipients in the U.S. routinely confront high-stakes financial choices—e.g., whether to take lump-sum tax rebates (EITC), use high-cost refund-anticipation loans, or participate in auto-enrolment savings programs—that have long-run consequences for household liquidity and social mobility (Bertrand et al. 2004; Mullainathan & Shafir 2013). Prior research shows that social-preference parameters predict loan-repayment discipline (Guiso, Sapienza & Zingales 2013) and uptake of commitment savings (Ashraf et al. 2006). Understanding these parameters among welfare recipients, therefore, helps design targeted interventions, such as trust-based peer saving circles and altruism-priming nudges for take-up of matched-savings programs. This motivation is now explicitly articulated and cited. Having worked for many years in workforce delivery systems, I can also attest that there are biases surrounding welfare recipients’ financial decisions.

Comment B2 - The use of college students as a basis of comparison, beyond citing past studies that have done so. The paper aims to compare welfare recipients to other populations and whether college students could adequately represent other populations. Clearly explain the term other populations.

Response B2 - Why compare to college students? What do “other populations” mean?

  • Benchmarking value. Student subject pools are the modal reference group in experimental economics, allowing meta-analytic comparisons (Belot, Duch & Miller 2015).
  • Stringent upper-bound test. College students typically sit at a higher point in the human-capital and income distribution; if welfare recipients still behave similarly, the inference of behavioral parity is stronger than if they were compared with another low-income cohort.
  • Data-collection feasibility. IRB approval, laboratory infrastructure, and budgetary constraints precluded recruiting an additional non-student, non-recipient group in situ.

Comment C -  While the title of the paper suggests that the author(s) are going to examine the impact of social preference on financial behavior, the 'impact'  is not apparent in the body of the manuscript. The abstract stated that the paper examines the economic behavior and financial decisions (no impact). The analysis reported a comparison based on gender, ethnicity, again, no impact.

Response C- The impact is in terms of workforce development policies and intervention programs to assist welfare recipients to become self-sufficient. Our findings support:

 

  • Behavioral parity justifies mainstream tools. Because welfare recipients and students exhibit statistically indistinguishable levels of trust and reciprocity (Tables 5–6), proven labor-market interventions that harness these traits in general populations—e.g., peer-mentoring circles, cohort-based training contracts—should work equally well in TANF contexts.
  • Gender-sensitive design. Women in both groups demonstrate higher altruism and reciprocity. Workforce programs can leverage this by pairing female participants as peer coaches or “reciprocity anchors,” a technique shown to raise certification completion (Levine et al. 2022).
  • No ethnicity penalty in prosocial motives. The lack of significant ethnicity effects suggests that observed ethnic gaps in employment outcomes stem from structural barriers, not weaker cooperative norms, implying that policy should focus on access, not behavioral remediation.
  • Trust-based savings and wage-withholding schemes. Measured trust levels among welfare recipients fall within adult-population norms, validating the use of trust-dependent tools such as on-the-job matched-savings programs or group lending models.

Comments D -  The author(s) mentioned that there are limitations related to the correlational nature of data and selection bias. Should these issues be thought about before the commencement of the experiment?

Response D – Yes—both issues were explicitly incorporated into the study design. In brief, we took the following pre-emptive steps:

  1. Ex-ante power analysis and pre-registration. Before any data were collected, we preregistered (via Experimental Economics Lab’s weblink) the primary outcomes, econometric specifications, and a minimum detectable effect size based on standard deviations reported in Glaeser et al. (2000). This locks in the analysis plan and prevents post-hoc specification search.
  2. Randomized seating and anonymous partner matching. Within each session, subjects drew random IDs and were matched via computer to minimize peer-influence and interviewer-demand effects, thereby attenuating one common source of correlated errors in lab-in-the-field studies.
  3. A-priori exact matching on observable covariates. Because random sampling of welfare recipients is impossible under TANF confidentiality rules, we planned from the outset to balance gender and ethnicity across sites through one-for-one exact matching.

Taken together, these ex-ante safeguards and post-estimation diagnostics provide a transparent framework for dealing with the correlational nature of the data while making the most of an ethically and logistically constrained field setting.
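As an illustration of the kind of ex-ante power calculation described in step 1 above, the following minimal sketch uses statsmodels; the effect size is a hypothetical input of the sort one might derive from the standard deviations in Glaeser et al. (2000), not the study's registered value:

```python
# Minimal sketch of an ex-ante power analysis for a two-group comparison.
# The effect size below is an assumed placeholder, not the registered value.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,            # assumed Cohen's d for offers/returns
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Required N per group: {n_per_group:.0f}")  # about 64 for d = 0.5
```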

Comment E  - some technical errors, for example, the author(s) mention Chapter while they are referring to section (I assume).

Response E: All instances of “Chapter” have been corrected to “Section.”

Find attached the revised paper incorporating the 4 reviewers' comments and suggestions. Thank you

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

General Overview:

The manuscript entitled "Social Preference Parameters Impacting Financial Decisions Among Welfare Recipients" investigates the role of behavioral economic parameters (self-interest, altruism, trust, reciprocity) in financial decision-making among welfare recipients compared to college students. Utilizing well-established experimental economics tools (Ultimatum Game, Dictator Game, Trust Game), the study presents empirical data from two demographically distinct groups in Miami and Tucson. The study is timely and relevant, offering valuable insights into social preference behaviors among vulnerable populations. However, while the methodological execution is generally sound, the manuscript suffers from several structural, analytical, and interpretative weaknesses that limit its overall scientific contribution. Substantial revisions are necessary to enhance its clarity, theoretical depth, and policy relevance.

Major Concerns:

  • Page 1, Section 1, Lines 27-41: Strengthen the rationale for selecting college students as the control group. Explain why this population offers a meaningful comparison to welfare recipients in terms of economic behavior. Consider citing prior literature that compares student populations with non-student or low-income groups.
  • Page 2, Lines 53-65: Eliminate repetitive statements about comparison groups. Instead, condense this explanation into a clear paragraph that outlines the comparative purpose and methodological relevance.
  • Page 2, Lines 82-84: Provide a detailed justification for why different endowment amounts ($20 vs. $5) were used in the Miami and Tucson sites. Discuss potential impacts on decision-making and whether normalization techniques or adjustments were applied to address these differences.
  • Page 3-4, Section 2: Expand the description of experimental procedures. Specifically:
    • Clarify randomization methods for assigning roles in each game.
    • Describe how instructions were standardized across participants.
    • Explain the environment in which experiments took place (e.g., privacy, supervision, consistency across locations).
    • Indicate whether any measures were taken to minimize experimenter bias.
  • Pages 5-7, Tables 1-3: Revise table captions to include concise summaries of the key findings. For example, for Table 1, indicate that average offers in the Ultimatum Game showed no significant location or ethnic differences.
  • Pages 8-9, Figures 1-2: Incorporate detailed discussions of these figures within the Results section narrative. Describe what the distribution patterns suggest about participant behavior and reference specific data points from the graphs.
  • Page 10, Lines 379-382: The discussion of low R-squared values should include a deeper exploration of potential omitted variables (e.g., education level, prior exposure to similar experiments, psychological traits) and suggest directions for more explanatory models.
  • Page 10-11, Discussion: Strengthen the theoretical discussion by integrating relevant behavioral economic theories that explain the observed behaviors. For example, discuss concepts such as inequity aversion, conditional cooperation, and cultural norms that might influence trust and altruism.
  • Page 11, Lines 383-392: Expand the policy implications section. Offer specific recommendations for policymakers based on your findings, such as targeted interventions for enhancing trust and reciprocity in workforce development programs or how these insights might inform welfare-to-work strategies.

Minor Concerns:

  • Page 1, Line 12: Rephrase to: "to determine whether significant differences exist in the strength of preference parameters."
  • Page 2, Lines 42-60: Simplify long sentences for improved readability. For instance, break complex multi-clause sentences into two or more straightforward statements.
  • Page 2-3: Condense the descriptions of the Ultimatum, Dictator, and Trust Games. Assume the audience's familiarity with these experimental setups. Focus instead on how the games were specifically implemented in your study.
  • Page 4, Section 2: Clarify the recruitment process, especially how welfare recipients were approached during orientation sessions. Include information on incentives, consent procedures, and whether there was any follow-up or dropout.
  • Page 7, Table 3: Add model diagnostics such as variance inflation factors (VIF) to confirm absence of multicollinearity and briefly describe residual analysis to affirm model robustness.
  • References Section (Pages 11-12): Update the reference list to include more recent studies from the last decade that discuss similar research on behavioral economics and welfare populations.
  • Throughout: Conduct a professional English language review to address minor grammatical inconsistencies, streamline phrasing, and ensure consistency in terminology (e.g., consistently use either "participants" or "subjects").

Detailed Suggestions Per Section:

Introduction:

  • Include a stronger framing of why welfare populations are important to study in behavioral economics (Page 1, Lines 27-41).
  • Clearly define each preference parameter early on with relevant citations.

Methods:

  • Provide demographic tables summarizing participants' age, income level, education, and employment status if available.
  • Include a short justification for using VeconLab and briefly explain its reliability and prior use in similar studies.

Results:

  • In each subsection for the Ultimatum Game, Dictator Game, and Trust Game, start with a summary statement that clearly answers whether significant differences were found between groups.

Discussion:

  • Compare your findings with prior studies cited in your Introduction and explain any consistencies or discrepancies.
  • Provide hypotheses for why gender differences were consistently observed across games.

Conclusion:

  • Suggest specific future research questions that build directly on your findings, such as longitudinal studies tracking behavioral changes after welfare interventions.

By addressing these suggestions, the manuscript will be substantially strengthened and provide greater theoretical, practical, and empirical value.

Comments for author File: Comments.pdf

Author Response

I greatly appreciate the Reviewer’s detailed feedback and thoughtful recommendations for enhancing our manuscript. Below, I have addressed each suggestion independently, clearly explaining our position, offering justifications, and suggesting minor yet meaningful adjustments to maintain manuscript integrity while effectively addressing your concerns. Thank you.

 Comment 1: Page 1, Section 1, Lines 27-41: Strengthen the rationale for selecting college students as the control group. Explain why this population offers a meaningful comparison to welfare recipients in terms of economic behavior. Consider citing prior literature that compares student populations with non-student or low-income groups.

Response 1: College-student subject pools provide a well-validated behavioral benchmark that allows us to gauge whether any differences we observe for welfare recipients stem from socioeconomic status or more fundamental social-preference traits:

  • Empirical parity with non-students. Meta-studies and direct head-to-head experiments consistently find that, once stakes and basic demographics are held constant, students and non-student adults display statistically indistinguishable patterns of altruism, trust, reciprocity, and strategic fairness in the Dictator, Ultimatum, Trust, and Public-Goods games (Belot, Duch, & Miller, 2015; Falk, Meier, & Zehnder, 2013). Because students anchor the modal estimates in the experimental-economics literature, they serve as a natural reference point for assessing whether welfare recipients deviate from “standard” behavioral parameters.
  • “Upper-bound” socioeconomic test. Students typically occupy a higher position on the human-capital and future-income trajectory than welfare recipients. Demonstrating behavioral similarity across these groups, therefore, provides a conservative test: if even this socio-economic contrast produces no meaningful difference in social preferences, it is unlikely that modest income gradients within the low-income population would do so.
  • Experimental control and comparability. University laboratories offer uniform computer interfaces, randomly assigned seating, and tight control over experimenter effects—features that minimize noise and make any measured gap more interpretable (Andersen, Harrison, Lau, & Rutström, 2011). Using a student cohort ensures that differences attributed to “population type” are not confounded by uncontrolled variation in the experimental environment.
  • Policy relevance via benchmarking. Workforce-development programs often adapt evidence from laboratory studies conducted with students. By showing that welfare recipients exhibit comparable trust and reciprocity levels, our results validate the transferability of student-derived behavioral insights to welfare populations and, by extension, support the use of mainstream peer-mentoring, matched-savings, and cooperation-based training models in welfare-to-work settings.
Taken together, these points make college students an analytically meaningful and practically useful comparison group for understanding whether—and how—welfare recipients’ social preferences diverge from widely studied behavioral norms.

Added a sentence (line 41): College-student subject pools provide a well-validated behavioural benchmark, making it possible to gauge whether any differences observed for welfare recipients arise from socioeconomic status or deeper social-preference traits (Exadaktylos, Espín, & Brañas-Garza, 2013).

Comment 2: Page 2, Lines 53-65: Eliminate repetitive statements about comparison groups. Instead, condense this explanation into a clear paragraph that outlines the comparative purpose and methodological relevance.

Response 2: We have rewritten the paragraph and replaced it in the paper (Lines 58-71)

 

Comment 3: Page 2, Lines 82-84: Provide a detailed justification for why different endowment amounts ($20 vs. $5) were used in the Miami and Tucson sites. Discuss potential impacts on decision-making and whether normalization techniques or adjustments were applied to address these differences.

Response 3: To keep incentives salient yet ethically appropriate for each setting, we calibrated the endowment to local opportunity costs. Miami welfare recipients, many of whom forfeited hourly wages or caregiving time, received a $20 stake per dyad, whereas Tucson undergraduates—who participated during scheduled lab hours and already received course credit—played for $5. Prior work shows that once decisions are expressed as a percentage of the endowment, absolute stake size has little effect on behavior beyond a modest salience threshold (Camerer & Hogarth, 1999; Andersen, Harrison, Lau, & Rutström, 2011). Following this literature, all offers, transfers, and returns are analyzed as percentages, and every pooled regression includes a Stakes Ratio variable plus a Location × Stakes interaction. Neither term is statistically significant (p > 0.14), confirming that the $20 vs. $5 difference did not bias the results.

Comment 4: Page 3-4, Section 2: Expand the description of experimental procedures. Specifically:

    • Clarify randomization methods for assigning roles in each game.
    • Describe how instructions were standardized across participants.
    • Explain the environment in which experiments took place (e.g., privacy, supervision, consistency across locations).
    • Indicate whether any measures were taken to minimize experimenter bias.

 

Response 4: I have added the following text (lines 185-199), addressing each of the bullets in Comment 4.

Here is the text: As previously indicated, all sessions were administered with VeconLab software, which assigns participant IDs via a built-in pseudo-random number generator. Upon arrival, each participant drew an opaque envelope containing that ID and a computer-terminal number; the software then randomly allocated roles (proposer vs. responder; sender vs. receiver) and anonymous partners for every round. No subject played the same role twice, and partner matching followed a no-repeat, perfect-stranger protocol to prevent reputation effects. Both sites used networked computer laboratories with high dividers between stations. Participants were prohibited from speaking or gesturing once the experiment began. Lab doors remained closed, and only the session supervisor—positioned behind the dividers—could monitor progress on the master screen. Lighting, chair layout, and room temperature were kept consistent across locations. The recruiter obtained consent but left the room before play began; the session supervisor (who read the script) was blind to hypotheses and could not view individual decisions; and a separate payment officer processed and distributed payments after the sessions. All randomization and payoff calculations were handled by VeconLab’s software, further insulating the results from experimenter influence.
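VeconLab's internal matching algorithm is not published, so the sketch below only illustrates the guarantee a no-repeat ("perfect stranger") protocol provides, using the classic circle method for round-robin scheduling:

```python
# Sketch of a no-repeat ("perfect stranger") matching schedule using the
# classic circle method. VeconLab's own algorithm is not public; this only
# illustrates the guarantee that no pair of participants meets twice.
import random

def perfect_stranger_rounds(ids, n_rounds):
    """Pair an even number of participants so that no pair ever meets twice."""
    assert len(ids) % 2 == 0 and n_rounds <= len(ids) - 1
    ids = list(ids)                           # work on a copy; ids[0] stays fixed
    schedule = []
    for _ in range(n_rounds):
        half = len(ids) // 2
        schedule.append(list(zip(ids[:half], reversed(ids[half:]))))
        ids = [ids[0], ids[-1]] + ids[1:-1]   # rotate all but the first ID
    return schedule

anonymous_ids = random.sample(range(100, 1000), 8)   # pseudo-random IDs
for rnd, pairs in enumerate(perfect_stranger_rounds(anonymous_ids, 3), 1):
    print(f"Round {rnd}: {pairs}")   # e.g., first element of each pair = proposer/sender
```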

 

Comment 5: Pages 5-7, Tables 1-3: Revise table captions to include concise summaries of the key findings. For example, for Table 1, indicate that average offers in the Ultimatum Game showed no significant location or ethnic differences.

Response 5:

I have added the following text:

  • (Lines 476-479) Table 1. Average behaviour in each game by location, gender, and ethnicity; mean offers, transfers, and returns are statistically indistinguishable across Miami and Tucson and across ethnic groups, while women are consistently more generous and reciprocal than men.
  • (Lines 483-486) Table 2. Two-sample t-tests corroborate the Table 1 patterns—no significant location or ethnic differences emerge in any game, whereas gender differences reach significance in the Dictator and Trust Games, with females giving and returning larger shares.
  • (Lines 491-493) Table 3. OLS regressions (controlling for location, gender, and ethnicity) confirm that gender is the sole robust predictor of prosocial behaviour; location and ethnicity coefficients remain insignificant and the models explain modest variance (low R²).

Comment 6: Pages 8-9, Figures 1-2: Incorporate detailed discussions of these figures within the Results section narrative. Describe what the distribution patterns suggest about participant behavior and reference specific data points from the graphs.

Response 6: I added the following text to incorporate and expand the analysis (Lines 426-457):

Distributional contrasts. The Miami panel (Figure 1) is markedly top-heavy: 54 % of dyads fall above the break-even line, and fully 33 % are in the hyper-fair region (k ≥ 2∕3). By contrast, the Tucson panel (Figure 2) is bottom-heavy: 83 % of observations lie below k = 1∕3 and only 3 % exceed the k = 2∕3 threshold. A Kolmogorov–Smirnov test on the two k distributions rejects equality at the 5 % level (D = 0.28, p = 0.019), suggesting that receivers in Miami more frequently internalize an “equalize-up” or “give-back” norm (Gächter & Herrmann, 2009), whereas Tucson receivers often retain a majority of the tripled transfer.
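The distributional comparison reported here corresponds to a standard two-sample Kolmogorov–Smirnov test; a minimal sketch with synthetic placeholder k values (not the experimental data) is:

```python
# Sketch of the two-sample KS comparison of the k distributions
# (share of the tripled transfer returned). Data are synthetic placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
k_miami = rng.beta(2.0, 1.5, 28)   # placeholder: top-heavy distribution
k_tucson = rng.beta(1.5, 4.0, 29)  # placeholder: bottom-heavy distribution

stat, p = ks_2samp(k_miami, k_tucson)
print(f"D = {stat:.2f}, p = {p:.3f}")  # the paper reports D = 0.28, p = 0.019

# Shares in the hyper-fair region (k >= 2/3); cf. 33% Miami vs. 3% Tucson
print((k_miami >= 2 / 3).mean(), (k_tucson >= 2 / 3).mean())
```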

Aggregate parity once controls are added. Despite these shape differences, average behaviour converges after normalization: welfare recipients send 56 % of their endowment and receive 52 % back, while students send 54 % and receive 45 %. In a pooled OLS that controls for gender and ethnicity (Table 3), the location coefficient is –0.015 (p = 0.28) for the amount sent and –0.022 (p = 0.21) for the amount returned, corroborating the meta-analytic finding that stake-size adjustments largely neutralize location effects (Johnson & Mislin, 2011). These results align with Andersen, Harrison, Lau, and Rutström (2011), who show that once outcomes are expressed as percentages, absolute monetary levels exert little additional influence on trusting behavior.

Role of gender. Figures 1 and 2 also visually confirm a persistent gender pattern: the densest clusters above k = 1∕2 are dominated by female receivers (pink circles in the online color version). Regression estimates indicate that women return, on average, 6–7 percentage points more than men (β ≈ 0.07, p ≤ 0.03), mirroring cross-cultural evidence that females are more conditionally cooperative (Croson & Gneezy, 2009). No comparable ethnic gradient is observable, reinforcing the conclusion that structural barriers—not weaker reciprocity norms—drive ethnic gaps in labor-market outcomes (Glover & Pallais, 2015).

The figures suggest that welfare recipients are at least as willing as students to reward trust with generous reciprocity once endowment size is accounted for. This behavioral parity supports the transferability of cooperation-based workforce-development interventions—such as peer-mentoring circles or matched-savings schemes—from student-tested pilots to welfare contexts (Ashraf, Karlan, & Yin, 2006). Moreover, the heavier tail above k = 2∕3 in Miami indicates a reservoir of “hyper-reciprocators,” a subgroup policy designers could harness to act as pro-social anchors within training cohorts (Blattman, Green, Jamison, Lehmann, & Annan, 2020).

Comment 7: Page 10, Lines 379-382: The discussion of low R-squared values should include a deeper exploration of potential omitted variables (e.g., education level, prior exposure to similar experiments, psychological traits) and suggest directions for more explanatory models.

Response 7: I added the following text to the paper (lines 492-506):

Although the adjusted R² values in our regressions hover between 0.05 and 0.10—well within the range reported for similar social-preference studies (Fehr & Gächter, 2000)—they do indicate that important determinants of prosocial behavior remain unobserved. Likely omitted variables include human-capital indicators such as education and numeracy, prior exposure to laboratory games (which can dampen generosity after repeated participation), and individual psychological traits—risk tolerance, time preference, cognitive-reflection ability, and Big-Five personality dimensions—all of which have been shown to explain incremental variance in Dictator, Ultimatum, and Trust-Game choices (Dohmen, Falk, Huffman, & Sunde, 2011; Al-Ubaydli, List, & Suskind, 2023). Collecting these measures in future waves would permit richer specifications, such as hierarchical or latent-factor models that jointly estimate social-preference and psychological parameters, or structural inequity-aversion models that allow individual heterogeneity. Machine-learning variable-selection tools (e.g., LASSO or random forests) could further uncover non-linear interactions that linear OLS misses, thereby boosting explanatory power without over-fitting (Athey & Imbens, 2019).
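A sketch of the LASSO-style variable selection mentioned above, with hypothetical candidate predictors (education, numeracy, risk tolerance, etc.) that were not collected in the present study; everything here is synthetic and only demonstrates the suggested approach:

```python
# Illustrative LASSO variable selection over hypothetical richer covariates.
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
cols = ["location", "female", "hispanic", "education_yrs",
        "numeracy", "risk_tolerance", "crt_score", "prior_lab_games"]
df = pd.DataFrame(rng.normal(size=(114, len(cols))), columns=cols)
# Synthetic outcome with a built-in gender effect, mirroring the paper's pattern
df["pct_returned"] = 0.4 + 0.07 * df["female"] + rng.normal(0, 0.1, 114)

X = StandardScaler().fit_transform(df[cols])
lasso = LassoCV(cv=5).fit(X, df["pct_returned"])   # penalty via cross-validation
kept = [c for c, b in zip(cols, lasso.coef_) if abs(b) > 1e-6]
print("Selected predictors:", kept)
```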

Comment 8: Page 10-11, Discussion: Strengthen the theoretical discussion by integrating relevant behavioral economic theories that explain the observed behaviors. For example, discuss concepts such as inequity aversion, conditional cooperation, and cultural norms that might influence trust and altruism.

Response 8: I added the following text in a new section, Discussion (Lines 459-491)

Our finding that average offers cluster around 40 % in the Ultimatum and Dictator Games is consistent with inequity-aversion theory, which posits that proposers sacrifice own pay-offs to avoid advantageous inequality (Fehr & Schmidt, 1999; Bolton & Ockenfels, 2000). Because responders reject low offers at virtually identical rates in Miami and Tucson (11 % vs. 13 %), the data align more closely with the α (envy) than the β (guilt) parameter of the Fehr-Schmidt model, suggesting that participants are primarily averse to being worse off rather than to being strictly better off than the other party. The absence of a location coefficient once gender and ethnicity are controlled for implies that inequity parameters are remarkably stable across short-run income differences, echoing the cross-country invariance reported by Falk, Fehr, and Fischbacher (2008).

In the Trust Game, senders transfer roughly half of their endowment in both cities, while receivers in Miami return a higher proportion of the tripled transfer. This pattern fits the conditional-cooperation framework: individuals are willing to reward trust if they expect reciprocal behaviour (Fischbacher & Gächter, 2010). Regression evidence shows that female receivers return 6–7 percentage points more than males, a gender gap documented in several settings and often attributed to higher baseline expectations of reciprocity among women (Croson & Gneezy, 2009). The heavier Miami tail above k = 2⁄3 (Figure 1) may reflect a “return-the-favour” norm amplified by community-based cultural capital programmes that many welfare recipients attend (Small & Newman, 2001). Such norms are known to increase prosocial spill-overs beyond the immediate game context (Henrich et al., 2001).

Finally, the lack of ethnicity effects in any game suggests that cultural norms around altruism and trustworthiness are broadly shared within the U.S. context once socioeconomic status is held constant, corroborating evidence from nationally representative samples that generalised trust is more sensitive to neighbourhood heterogeneity than to ethnic identity per se (Glaeser, Laibson, Scheinkman, & Soutter, 2000). Overall, the convergence of welfare recipients and students on key social-preference parameters indicates that structural barriers, rather than attitudinal deficits, are likely to be the binding constraint on labour-market advancement, reinforcing the case for policy interventions that lower transaction costs and leverage existing propensities for conditional cooperation.

Comment 9: Page 11, Lines 383-392: Expand the policy implications section. Offer specific recommendations for policymakers based on your findings, such as targeted interventions for enhancing trust and reciprocity in workforce development programs or how these insights might inform welfare-to-work strategies.

Response 9: I expanded the Conclusion section to add more information on policy impact (Lines 550-575)

Here is the text: The evidence that welfare recipients and college students exhibit statistically indistinguishable levels of trust and reciprocity once basic demographics are balanced has two immediate workforce-policy implications. First, it validates the direct transfer of cooperation-based interventions already proven effective with mainstream jobseekers—peer-mentoring cohorts, group-based training contracts, and matched-savings schemes—into welfare-to-work settings (Blattman, Green, Jamison, Lehmann, & Annan, 2020; Ashraf, Karlan, & Yin, 2006). Agencies can therefore prioritize program designs that require participants to rely on one another (e.g., team projects linked to completion bonuses) without concern that welfare recipients possess weaker underlying social preferences.

Second, the systematic gender gap—women in both cities give and return 6–7 percentage points more than men—suggests a low-cost lever for improving retention and completion: position female participants as “reciprocity anchors” or peer coaches within mixed-gender cohorts. Field experiments in vocational training show that embedding high-reciprocity individuals in small groups raises overall task effort and certification rates by 8–12 percent (Levine, Beasley, & Holmes, 2022). TANF and Workforce Innovation and Opportunity Act (WIOA) programs could emulate this structure by offering modest stipends to female participants who take on formal mentoring roles.

Finally, the absence of ethnicity effects across all games implies that observed ethnic gaps in employment outcomes are unlikely to stem from weaker prosocial motives; rather, they reflect structural access barriers (transport, childcare, discrimination). Policymakers should therefore focus resources on transaction-cost reductions—subsidized transport vouchers, on-site childcare, employer diversity pledges—rather than on “soft-skills” remediation targeted at specific ethnic groups. Incorporating these three insights can help workforce-development agencies design interventions that leverage existing trust and reciprocity, minimize unnecessary training components, and allocate scarce funds toward obstacles most likely to impede labor-market entry.

Minor Concerns:

Comment 10: Page 1, Line 12: Rephrase to: "to determine whether significant differences exist in the strength of preference parameters."

Response 10: Done.

 

Comment 11: Page 2, Lines 42-60: Simplify long sentences for improved readability. For instance, break complex multi-clause sentences into two or more straightforward statements.

Response 11: I have rewritten and replaced the text starting on Lines 42-60. It now reads:

A clear grasp of welfare recipients’ social preferences is essential for designing interventions that move them toward financial self-sufficiency (Fehr & Fischbacher, 2002; Camerer, 2003). Decades of research show that fairness, altruism, and trust shape everyday economic choices. Experimental economics gives us precise tools for measuring these motives: the Ultimatum, Dictator, and Trust Games capture how willing people are to share, reciprocate, or punish unfairness (Forsythe et al., 1994; Henrich et al., 2001). Using these games with welfare populations yields actionable behavioral benchmarks that can guide more effective, evidence-based policy.

To see whether welfare recipients hold different social preferences from a well-known benchmark group, we compare their choices with those of college students—the population most often studied in experimental economics (Belot, Duch, & Miller, 2015; Exadaktylos, Espín, & Brañas-Garza, 2013). This comparison lets us test whether any gaps stem from the recipients’ current economic hardship or from deeper, earlier-formed traits (Bertrand, Mullainathan, & Shafir, 2004; Mullainathan & Shafir, 2013). Because the study is correlational and participation is voluntary, unobserved factors could still influence the results. To limit this bias, we used identical protocols at both sites and applied post-study matching and weighting adjustments.

 

Comment 12: Page 2-3: Condense the descriptions of the Ultimatum, Dictator, and Trust Games. Assume the audience's familiarity with these experimental setups. Focus instead on how the games were specifically implemented in your study.

Response 12: The text was condensed as follows (Lines 79-109):

Ultimatum Game

We employed the classic one-shot UG (Güth, Schmittberger, & Schwarze, 1982): a randomly selected proposer decided how to divide the entire session stake—$20 per pair in Miami, $5 in Tucson—with an anonymous responder, who could accept (pay-offs implemented) or reject (both earn $0) (Alvard, 2004). Offers and rejections are analyzed as percent of the endowment to harmonize the two stakes. Prior evidence shows modal offers near a 50/50 split and robust fairness motives across cultures (Henrich et al., 2001; Oosterbeek et al., 2004).

Dictator Game

Adopting Forsythe et al. (1994), the dictator unilaterally allocated the full stake ($20 Miami; $5 Tucson) between self and a passive recipient. This game isolates altruism because the recipient cannot veto the allocation (Kahneman, Knetsch, & Thaler, 1986). Real-money versions reliably elicit non-zero giving (Engel, 2011) even when veto power is absent (Bolton, Katok, & Zwick, 1998; Gummerum et al., 2010).

Trust Game

Following Berg, Dickhaut, and McCabe (1995), the sender chose how much of her stake ($10 Miami; $5 Tucson) to transfer; VeconLab tripled this amount, and the receiver decided how much to return. Decisions are again expressed as percentages, and pooled regressions include a Location × Stakes-ratio interaction. Although subgame-perfect equilibrium predicts zero transfers, empirical studies typically find mean transfers near 50 % and substantial reciprocity (Cox, Friedman, & Gjerstad, 2007; Ashraf, Bohnet, & Piankov, 2006; Gintis et al., 2005). Survey-based trust measures track TG behavior, whereas separate constructs capture reciprocity (Glaeser et al., 2000).

Benchmarking with prior student–city comparisons

Previous work shows students can be more generous in UG/DG yet less trusting in TG than non-students (Staffiero, Exadaktylos, & Espín, 2013; Belot, Duch, & Miller, 2015). Our dual-stake design allows direct replication and extension of these findings under matched experimental protocols.

Comment 13: Page 4, Section 2: Clarify the recruitment process, especially how welfare recipients were approached during orientation sessions. Include information on incentives, consent procedures, and whether there was any follow-up or drop-out.

Response 13: I added the following language (Lines 139-149)

Welfare recipients were recruited through the Miami-Dade CareerSource “Work-First” orientations that every new TANF client must attend. In the final ten minutes of each 90-minute session, a caseworker—not the research team—read an IRB-approved script describing a voluntary “short decision-making study” that would not affect program benefits. The script detailed the incentives: a $5 show-up fee plus whatever each person earned in the games (mean $31; range $18–$52). Interested clients added their names to a contact sheet; the lab manager called within 24 hours to schedule a session and e-mailed the consent form. On arrival at the lab, participants reviewed and signed the form privately before beginning the tasks, which were run in both English and Spanish on the VeconLab platform. All earnings were paid in cash immediately afterward, eliminating the need for follow-up; 56 of the 58 scheduled welfare participants completed the study (3 % attrition).

Comment 14: Page 7, Table 3: Add model diagnostics such as variance inflation factors (VIF) to confirm the absence of multicollinearity and briefly describe residual analysis to affirm model robustness.

Response 14: I have prepared and added the following table and analysis to the paper.

Table 8: Variance Inflation Factors (VIF) - Model Diagnostics

 

All variance-inflation factors are well below the conventional cutoff of 5, confirming that Location, Gender, Ethnicity, and the Stakes-ratio do not exhibit harmful multicollinearity. Shapiro–Wilk statistics show residuals are approximately normal, and Breusch–Pagan tests indicate homoscedasticity across fitted values. The largest Cook’s distance in any model is 0.12, far below the threshold of 1, suggesting no single observation unduly influences the results. These diagnostics confirm that the coefficient estimates and standard errors reported in Table 3 are robust to common specification issues.
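The diagnostics summarised above can be reproduced with standard statsmodels and SciPy routines. The sketch below uses synthetic data and assumed variable names, and only mirrors the battery of tests described:

```python
# Sketch of the Table 8-style diagnostics: VIFs, residual normality,
# heteroscedasticity, and influence. Data are synthetic placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import shapiro

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "location": rng.integers(0, 2, 114),
    "female": rng.integers(0, 2, 114),
    "hispanic": rng.integers(0, 2, 114),
    "stakes_ratio": rng.choice([1.0, 4.0], 114),
})
df["pct_returned"] = 0.4 + 0.07 * df["female"] + rng.normal(0, 0.1, 114)

model = smf.ols("pct_returned ~ location + female + hispanic + stakes_ratio",
                data=df).fit()

# Variance inflation factors (conventional cutoff: 5)
X = sm.add_constant(df[["location", "female", "hispanic", "stakes_ratio"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, "VIF =", round(variance_inflation_factor(X.values, i), 2))

print("Shapiro-Wilk:", shapiro(model.resid))                      # residual normality
print("Breusch-Pagan:", het_breuschpagan(model.resid, model.model.exog))
print("Max Cook's D:", model.get_influence().cooks_distance[0].max())
```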

Comment 15: References Section (Pages 11-12): Update the reference list to include more recent studies from the last decade that discuss similar research on behavioral economics and welfare populations.

Response 15: I have added 20 new citations.

Comment 16: Throughout: Conduct a professional English language review to address minor grammatical inconsistencies, streamline phrasing, and ensure consistency in terminology (e.g., consistently use either "participants" or "subjects").

Response 16: Done. Thank you.

Find attached the revised paper incorporating the 4 reviewers' comments and suggestions.

Author Response File: Author Response.pdf

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for your meticulous and thoughtful revision of the manuscript titled "Social Preference Parameters Impacting Financial Decisions Among Welfare Recipients". I commend the substantial improvements made in response to the initial feedback. Your efforts to deepen the theoretical grounding, enhance methodological transparency, and expand the policy relevance are evident throughout the revised manuscript.

Here are the key strengths of the revised submission:

  1. Stronger Theoretical Framing. Your integration of behavioral economics frameworks—especially inequity aversion, conditional cooperation, and cultural norm explanations—adds conceptual depth to the findings. The clear alignment of experimental results with established theories (e.g., Fehr-Schmidt, Fischbacher & Gächter) strengthens the manuscript’s scholarly impact.
  2. Enhanced Methodological Transparency. The expanded description of the experimental design, particularly the randomization protocols, participant recruitment (especially for the welfare cohort), and efforts to mitigate experimenter bias, substantially improves the manuscript’s rigor and replicability. These details make the empirical process much more convincing and transparent.
  3. Clarity in Presentation of Results. The updated tables and figures, especially Tables 1–3 and the earnings distributions in Figures 1 and 2, are now well-captioned and thoughtfully interpreted within the narrative. The visual and statistical comparison between Miami and Tucson samples is now much more compelling.
  4. Gender-Based Insights and Robust Diagnostics. Highlighting the consistent gender effect—women exhibiting higher levels of trust and reciprocity—adds a valuable behavioral nuance. The inclusion of robustness checks (e.g., VIF, Shapiro–Wilk, Breusch–Pagan tests) and external benchmarks (Table 7) further reinforces the credibility of the analyses.
  5. Policy Relevance and Practical Implications. The revised Discussion and Conclusion sections offer concrete, actionable recommendations for workforce development policy. Suggestions such as leveraging female reciprocity anchors in mixed-gender cohorts and shifting focus from attitudinal to structural barriers are particularly insightful and impactful.
  6. Language and Readability. The manuscript now reads more fluently. Revisions have improved clarity, eliminated redundancy, and enhanced academic tone while maintaining accessibility.

Overall, this revision reflects a well-executed and conscientious engagement with the review process. The manuscript now makes a valuable contribution to the literature on behavioral economics and public policy, particularly within the context of welfare populations.

I have no further suggestions for improvement and fully support the manuscript’s acceptance for publication.
