Credit Constraints and Beginning Farmers’ Production in the U.S.: Evidence from Propensity Score Matching with Principal Component Clustering

Abstract: Beginning Farmers and Ranchers (BFRs) in the U.S. represent a diverse and important subset of family farms. Understanding their financial needs is of paramount importance for supporting the future of American farmers. The focus of this work is on evaluating to what extent credit constraints affect the BFRs’ production. We use propensity score matching to show that credit constraints are associated with significantly lower production levels. To address the highly heterogeneous nature of BFRs, we complement the matching procedure with Principal Components Analysis and clustering to extract more information from the available Agricultural Resource Management Survey data. The results show losses in the total and per acre production values attributed to being credit-constrained, ranging between 14–77% and 24–72%, depending on the matching method, which has important policy implications.


Introduction
Beginning Farmers and Ranchers (BFRs) represent the future of the family farm model that is firmly rooted in American culture and history. Beginning farmers (i.e., those with less than 10 years of experience) are a heterogeneous group in terms of size and value of production. Recent data show that two-thirds of them produce less than $10,000 in output, while about two percent have an annual production value of at least $1,000,000 [1]. The literature has found that the output of highly leveraged farms is significantly higher than that of farms without credit [2]. Compared to established farms, BFRs need more credit to fund their expansion, but they still borrow less: agricultural lending remains collateral-driven, and BFRs have limited collateral and are thus more credit-constrained [1,3].
Previous work has established a negative relation between credit constraints and productivity, especially in developing countries [4][5][6][7]. In the US, credit constraints are found to account for a 3% loss in the total value of production for farm sole proprietorships and a 13% loss for non-farm sole proprietorships [8]. While these differences suggest that a well-developed agricultural credit market meets most of the needs of established farmers and rural residents, it is not clear whether it meets the needs of the beginning farmers, who will progressively play a more important role in US agriculture [9,10]. They are an increasingly important cohort that comprises 27% of all operators on any farm according to the latest 2017 Census of Agriculture data. This article focuses on evaluating to what extent the more severe credit constraints faced by less experienced and more vulnerable beginning farming operations affect their production. The group of BFRs comprises heterogeneous operations and, to deal with this challenge, we offer a novel empirical approach that uses propensity score matching enhanced by principal component clustering.
While previous work establishes that BFRs have limited access to credit [3,11], the potential production losses due to BFRs' inability to get credit remain largely unknown, and this work offers insights into the magnitudes of these losses. The literature has established that beginning farmers face numerous additional challenges [1,12,13], but credit constraints are likely to substantially limit BFRs' ability to produce at an optimal level and to contribute to the low survival rates of about 30 percent for new farmers [14].
Access to credit is also important in the context of the pending transition of farm assets from retiring operators to the new generations of farmers, as well as in the context of the agricultural downturn and the economic consequences of the COVID-19 pandemic [15][16][17][18]. For example, BFRs are at a greater risk of financial stress and the agricultural downturn has negatively affected their profitability, liquidity, and solvency, although it had no effect on their repayment capacity [19]. As the share of BFRs in the US agriculture increases, evidence of the impact of limited access to credit on agricultural output would inform future policy interventions as well as agricultural lenders in designing policies to improve credit access.
BFRs are a heterogeneous group that includes niche farming operations, hobby farmers, widows, young farmers and others. This requires methodological innovation to properly evaluate the impact of credit constraints and account for the excessive heterogeneity and difficulty of separating access to credit from the value of production. Our proposed approach is to enhance the propensity score matching (PSM) by using the principal components of the explanatory variables to cluster the data and include cluster dummies in the PSM analysis. This helps alleviate the issues of endogeneity, multi-collinearity, and dimensionality inherent in the small-size dataset. To briefly preview the results, we find that, unlike established farmers, BFRs experience significant losses of production that can be attributed to being credit-constrained.
The rest of the paper is structured as follows. Section 2 outlines the theoretical model that justifies the empirical approach described in Section 3. The data are described in Section 4. Results are described in Section 5. Section 6 concludes.

Analytical Framework
Following Briggeman [8] and Jappelli [20], we define a two-period farm household consumption and production model with borrowing and repayment. In the household's objective function, Equation (1), $c_0$ and $c_1$ are consumption at the beginning and the end of a borrowing period, and $z_h$ is a set of exogenous household characteristics. Output is determined by a standard production function, Equation (2), where $x$ is a vector of variable inputs purchased at the beginning of the period at prices $w$ and used to produce output vector $y$ by the end of the period, $z_y$ is a vector of fixed exogenous inputs (land, machinery, etc.), and the input prices and the value of the marginal product are normalized by the output price $P$. Household consumption and production are subject to the first- and second-period budget constraints, Equations (3) and (4), where $B$ is the intertemporal borrowing and $A$ is income from off-farm revenue-generating activities.
The beginning and end of the period correspond to the beginning and the end of the harvest cycle (seasonal borrowing) but can also refer to the beginning and end of an investment cycle. Both types of borrowing can result in credit constraints. Equation (3) states that money borrowed during period 0 is used for both consumption and the purchase of variable inputs, while Equation (4) indicates that funds borrowed in period 0 must be repaid with interest in period 1 through production and other sources of income in $A$. $W_0$ can be interpreted as the part of the income from the previous harvest cycle set aside for input purchase, in which case $W_{0+1}$ should be subtracted from the left-hand side of Equation (4) in a multi-period version of the model; this does not change the comparative statics from the first-order conditions (FOC) of the theoretical model. The borrowing constraint, Equation (5), can be lender- or self-imposed and may or may not be binding; its limit $\bar{B}$ is a function of observed household and production characteristics. In the first-order necessary condition of the Lagrangian for Equations (1)-(5) with respect to output, µ and λ are the Lagrangian multipliers of the borrowing constraint and the first-period budget constraint, the former being positive when the borrowing constraint is binding and 0 when it is not. This implies that, ceteris paribus, for a concave production function the output of credit-constrained farmers is lower, which also negatively affects their consumption.
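The numbered equations are referred to in the text by number only. One formulation consistent with the verbal definitions is sketched below; it is an illustration under assumed notation, not necessarily the exact form used in [8] or [20]. The utility function $U$, the production function $f$, the interest rate $r$, and the second-period multiplier $\phi$ are assumed symbols, and the first-order condition is taken with respect to the variable input $x$ that determines output.

```latex
\begin{align}
\max_{c_0,\, c_1,\, x,\, B}\; & U(c_0, c_1; z_h) \tag{1}\\
y &= f(x; z_y) \tag{2}\\
W_0 + B - c_0 - w x &= 0 \tag{3}\\
f(x; z_y) + A - (1 + r) B - c_1 &= 0 \tag{4}\\
\bar{B} - B &\ge 0 \tag{5}\\
\frac{\partial f(x; z_y)}{\partial x} &= \frac{w\,(1 + r)}{1 - \mu / \lambda} \tag{6}
\end{align}
```

Under these assumptions, Equation (6) follows from the Lagrangian conditions $\lambda - \phi(1 + r) - \mu = 0$ (with respect to $B$) and $\phi\, \partial f / \partial x - \lambda w = 0$ (with respect to $x$). When the borrowing constraint does not bind, $\mu = 0$ and the marginal product equals $w(1 + r)$; when it binds, $\mu > 0$ raises the required marginal product, which for a concave $f$ means lower input use and lower output.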

Empirical Approach
We use propensity score matching (PSM) to identify the impact of credit constraints on agricultural output. By creating matched control and treatment (credit-constrained) groups, the PSM algorithm disambiguates the difference in covariates between them from the difference in outcomes of interest [21]. The method uses the estimated likelihood (i.e., propensity) of receiving treatment as a function of explanatory variables to match the treated and control observations. The mean difference in outcomes (value of production) between the matched pairs is then attributed to the treatment (being credit-constrained).
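The matching step described above can be sketched as follows. This is a minimal illustration, assuming one-to-one nearest-neighbor matching with replacement on an already-estimated propensity score; the function name `nearest_neighbor_atet` and all numbers are hypothetical and are neither ARMS data nor the authors' code.

```python
from statistics import mean


def nearest_neighbor_atet(treated, controls):
    """Mean outcome difference between treated units and their nearest controls.

    Each unit is a (propensity_score, outcome) pair. Matching is one-to-one
    with replacement on the absolute difference in propensity scores.
    """
    diffs = []
    for score_t, outcome_t in treated:
        # Closest control by propensity score; ties go to the first one found.
        _, outcome_c = min(controls, key=lambda c: abs(c[0] - score_t))
        diffs.append(outcome_t - outcome_c)
    return mean(diffs)


# Hypothetical (propensity score, value of production) pairs.
treated = [(0.30, 20.0), (0.60, 15.0)]
controls = [(0.28, 30.0), (0.55, 35.0), (0.90, 80.0)]
atet = nearest_neighbor_atet(treated, controls)
```

A negative value indicates that the treated (credit-constrained) units produce less than their matched controls.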
We follow Briggeman [8] and write the difference between unconstrained and constrained performance, Y1,0, as E(Y0) − E(Y1) and test whether this difference is statistically different from zero. We can only observe Y1 when the dummy for being credit-constrained is D = 1 and Y0 when D = 0. E(Y1 − Y0) is referred to as the average treatment effect (ATE). We are also concerned with E(Y1 − Y0 | D = 1) = E(Y1 | D = 1) − E(Y0 | D = 1), which is the Average Treatment Effect of the Treated (ATET). The second expression on the right-hand side is a counterfactual: it can be thought of as the average outcome of the treated had they not been treated. The two are related by ATE = ATET + bias, as described in Angrist and Pischke [22]. Defining the probability of being credit-constrained, P(D = 1), as P(B* > 0) where B* = γZ + ui, one can write: where Y = X + ε.
To estimate the probability of being credit-constrained, we use the Probit model, where Φ is the cumulative standard normal distribution function. Solving for the probability, we get E[P(D = 1|X)], which is substituted into Equation (8) for Y1 and Y0. The PSM method is based on two assumptions that are met in our case. The first assumption is that we do not have unobservable explanatory variables in X that affect treatment (being credit-constrained). We believe that we satisfy this assumption because we follow the literature in controlling for the characteristics that previous work has identified as determinants of being credit-constrained and use high-quality data [8,23].
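As an illustration of the Probit step, the sketch below evaluates the propensity Φ(γZ) for one observation. It assumes the coefficients have already been estimated; the function name `probit_propensity` and the coefficient and covariate values are hypothetical.

```python
from statistics import NormalDist


def probit_propensity(z, gamma):
    """Probability of treatment under a Probit model: Phi(gamma . z)."""
    index = sum(g * v for g, v in zip(gamma, z))
    return NormalDist().cdf(index)  # standard normal CDF


# Hypothetical coefficients (intercept first) and covariates for two farms.
gamma = [-1.5, 0.8, -0.4]
farm_a = [1.0, 1.0, 0.5]   # index = -1.5 + 0.8 - 0.2 = -0.9
farm_b = [1.0, 2.0, 0.5]   # index = -1.5 + 1.6 - 0.2 = -0.1
p_a = probit_propensity(farm_a, gamma)
p_b = probit_propensity(farm_b, gamma)
```

Because Φ is increasing, a larger linear index always maps to a higher estimated propensity, so `p_b` exceeds `p_a` here.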
The second requirement is the existence of common support, which we also find. BFRs are a heterogeneous group, and we have a relatively small sample of credit-constrained operators, and thus, the two groups may not have large overlapping common support. Lack of, or limited, common support can be addressed by discarding individual observations with propensity score values outside the range of the other group. Alternatively, it can be addressed by identifying the multidimensional space that allows interpolation rather than extrapolation [24,25]. Our approach is close to the second method in that we estimate a Probit regression with added dummy variables derived from clustering based on the principal components. This helps us to improve the properties of the resulting matched samples, specifically the covariates' balance in the matched groups, where balance is defined as the similarity of the empirical distributions of the full set of covariates in the matched treated and control groups [25].
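The first remedy mentioned above, discarding observations whose propensity scores fall outside the range of the other group, can be sketched as follows. This illustrates that alternative only, not the cluster-dummy approach adopted in this paper; the function name `trim_to_common_support` and the scores are hypothetical.

```python
def trim_to_common_support(treated_scores, control_scores):
    """Keep only scores inside the overlap of the two groups' score ranges."""
    lower = max(min(treated_scores), min(control_scores))
    upper = min(max(treated_scores), max(control_scores))
    kept_treated = [s for s in treated_scores if lower <= s <= upper]
    kept_controls = [s for s in control_scores if lower <= s <= upper]
    return kept_treated, kept_controls


# Hypothetical propensity scores.
kept_t, kept_c = trim_to_common_support([0.2, 0.5, 0.95], [0.05, 0.3, 0.6])
```

With a small treated group, trimming can discard a meaningful share of the treated observations, which is one reason to prefer an approach that improves overlap instead.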
Additional justification of our approach comes from another consequence of population heterogeneity. Since BFRs include niche farmers, hobby farmers, widows, young farmers, socially disadvantaged farmers, etc., it is important to identify these heterogeneous BFR groups, which would otherwise confound the treatment effect. Further, when populations are heterogeneous, the simple matching procedure of standard univariate propensity score matching can only partially account for covariates (i.e., farm proprietorship demographic and economic characteristics) being correlated with the treatment and with each other [26]. The principal component clustering serves as a robust way of estimating the likelihood of being treated and of supplementing that likelihood with a control for multivariate heterogeneity via Mahalanobis distance matching.
In addition, when it is difficult to separate the effect of the treatment (access to credit) from other differences between groups, a well-specified regression model with many interactions of the explanatory variables may be effective for estimating treatment effects [27]. With our approach, principal component analysis (PCA) extracts relevant additional information orthogonal to the original explanatory variables, which further helps to improve the functional form as well as the balancing of the two groups [26]. Specifically, if we define the threshold for being credit-constrained as B* = g(B), where B is a matrix of demeaned credit constraint determinants, then B′B, if it is positive definite, can be rewritten as the product EΛE′, where E is a matrix of normalized eigenvectors and Λ is a diagonal matrix of eigenvalues [26]. The eigenvectors in E define the orthogonal principal components of the data in B, and their ranking by the eigenvalues in Λ in descending order defines the order of their significance in explaining B*. With this approach, the PCA clusters "chose themselves" based on the empirically tested specifications of Jappelli [20] and Briggeman et al. [8].
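The decomposition described above can be written out as a short worked sketch. It assumes that the matrix being decomposed is the cross-product matrix of the demeaned determinants with $k$ columns; the score matrix $S$ and the eigenvalue symbols $\ell_j$ are notation introduced here for illustration.

```latex
\begin{align*}
B'B &= E \Lambda E', \qquad E'E = I, \qquad
\Lambda = \operatorname{diag}(\ell_1, \dots, \ell_k), \quad \ell_1 \ge \dots \ge \ell_k > 0,\\
S &= B E, \qquad S'S = E' B'B\, E = E'E\, \Lambda\, E'E = \Lambda,\\
\text{share}_j &= \frac{\ell_j}{\sum_{m=1}^{k} \ell_m}.
\end{align*}
```

Because $S'S$ is diagonal, the component scores are mutually orthogonal, and $\text{share}_j$ is the fraction of total variation attributed to the $j$-th component.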
We use the principal components of the variables that explain credit constraints to cluster the data by running a nonhierarchical clustering algorithm that randomly creates n centroids (cluster means), assigns each observation to the nearest centroid, recalculates the centroids, and then repeats until convergence or until the maximum number of iterations is reached [25] (Johnson and Wichern, 2007). Using this technique, we select three clusters based on the Cubic Clustering Criterion (CCC) and add two of them as orthogonal population groupings that may not be fully captured by the other explanatory variables in the Probit model. The clusters are real groups of BFRs whose different productivity and propensity for treatment would otherwise confound the treatment effect. While this does not eliminate the endogeneity of the production volume, it makes it less severe.
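The clustering loop described above (random centroids, assignment, centroid update, repeat) can be sketched in a few lines. This is a minimal k-means-style illustration, not the authors' implementation; the function names `kmeans` and `cluster_dummies`, the seed, and the two-dimensional scores are hypothetical.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Assign each point (a tuple of component scores) to one of k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assign every observation to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def cluster_dummies(labels, k):
    """Dummy variables for clusters 1..k-1; cluster 0 is the omitted base."""
    return [[1 if lab == j else 0 for j in range(1, k)] for lab in labels]


# Hypothetical principal-component scores for six farms, grouped into three clusters.
scores = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.2), (9.1, 0.0)]
labels, centroids = kmeans(scores, k=3)
dummies = cluster_dummies(labels, k=3)
```

The dummies would then enter the Probit model alongside the other explanatory variables. Because the initial centroids are random, the result can depend on the seed, so in practice one would typically compare several starts.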

Data
The data consist of individual sole-proprietor observations from the Agricultural Resource Management Survey (ARMS), Phase III, conducted by the US Department of Agriculture in 2005. This is the only survey year we are aware of that recorded self-identified credit-constrained BFRs (Q 26, also see [8]). Using these data may be helpful in understanding financial constraints before the credit market turmoil or the cyclical agricultural sector downturn. While more recent data are desirable, the question of self-identified credit constraints was not asked in later years. Furthermore, Ifft et al. [2] found that in the past 20 years, the use of credit has remained stable for younger farmers, suggesting that our evaluation remains relevant. Similarly, in a historical overview of financial conditions of the agricultural sector, Key et al. ([15], page 14) analyze BFR data starting in 2007 and show that the share of BFRs in extreme financial stress in 2017 (1.5%) was larger than the share of all U.S. farms (1.0%). This suggests that, as riskier borrowers, BFRs would have been more credit-constrained during the 2007-2017 period as well. The authors also find that farms with a lower level of production (sales less than $100,000), to which most BFRs belong, had a lower level of financial stress throughout the 2000-2018 period than larger farms, possibly because these farmers used less credit and produced less. This implies a continuous link between credit and production by BFRs for the past 15-20 years and supports the continued relevance of our work.
We describe the variables used in the Probit and discriminant models in Table 1. The variables are standard demographic and economic producer characteristics used in the literature to explain farmers' behavior. Table 2 shows the (population-weighted) means of farm characteristics and performance variables. The survey weights correspond to a total population of approximately 307,741 BFRs in 2005, while the 2002 and 2007 Censuses of Agriculture report 583,000 and 593,000 BFRs, respectively. The differences are likely due to the subset of farmers that both answered the Cost and Returns Report (CRR) and had less than 10 years of operation. The ARMS data show that there were close to 800 BFR sole proprietors and 5.82% of them were credit-constrained, i.e., answered "yes" to at least one of the credit denial questions. This is close to double the rate of 3.75% (of a total of 5184 sole proprietors) of all financially constrained operations reported by Briggeman et al. [8]. The number of farmers and ranchers comes from all farmers that answered the CRR in the ARMS Phase III survey. The sample contains only the farms that are registered as sole proprietorships. The means of the Total Value of Production and operated acres by BFRs were $33,125 and 155 acres, respectively, which puts them at less than half the average of the established farmers with more experience.
We also observe that approximately 51% of the BFRs in the sample are from the Southern region, while 21% and 19% are from the West and Midwest, respectively, and the remaining 9% are in the Northeast. The enterprise breakdown consists of 13%, 10.5%, 2%, and 25.5% of farmers being in GrainOil, Dairy, Hog, Poultry, and Beef, respectively, with the remaining ≈49% in other enterprises. "Other" enterprises include vegetables, fruit, tobacco, cotton/cottonseed, nurseries, Christmas trees, grasses, sheep, equine, aquaculture, bees, rabbit, and other niche enterprises. We also observe that only 85.75 acres are owned on average, implying that BFRs lease about as many acres as they own. Mean household income is $70,745, while farm net worth is $340,207. The average BFR has been operating for 4.71 years and is 47 years old, whereas the average age of all farmers in the sample is 55 years. Collectively, the principal operator and spouse spent an average of 74 weeks off-farm and have one dependent, possibly reflecting an earlier stage in the life cycle. The average number of outstanding loans is 0.73, and only 7% of farm households have neither operator nor spouse with at least some college education. The working capital to assets ratio is 0.55 overall but only 0.13 for the credit-constrained operators. The average full-time employee equivalent of all operator hours (FTE) is 0.81, and it is larger for the credit-constrained group. FTE only includes labor for the principal operator, spouse, and other operators. Non-operator seasonal labor and contract labor are not included.
After adding three clusters (two dummy variables, with the third cluster serving as a base) as explanatory variables based on the principal components analysis and the cubic clustering criterion of the predictors, we have a subset of 551 observations, of which only 35 are credit-constrained. This is still sufficient for unbiased results for the two-sample t-test of differences in outcomes.
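For reference, a two-sample t statistic for a difference in mean outcomes can be computed as sketched below. The text does not state whether equal variances are assumed; this sketch uses the unequal-variance (Welch) form as an assumption, and the function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (Welch form)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical outcomes for a small constrained group and a larger control group.
constrained = [12.0, 15.0, 9.0, 14.0]
unconstrained = [20.0, 18.0, 25.0, 22.0, 19.0, 24.0]
t_stat = welch_t(constrained, unconstrained)
```

With groups as unbalanced as 35 versus several hundred observations, the standard error is dominated by the smaller group, which is why precision rather than bias is the main concern.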

Clustering and Principal Components as Determinants of Credit Constraints
The Principal Component Analysis identifies 12 out of 22 principal components that explain approximately 80% of the variation in X. No small subset of components dominates the total variance of the system, so reducing the number of predictors does not add much in the way of model prediction. A separate analysis on the reduced set of 12 principal components produces essentially unchanged results: both the Probit results and the matching results were essentially the same, with the only notable difference in the fit statistics, i.e., percent concordant/discordant and error count estimates. Therefore, we proceed with all 22 principal components. We utilize the Cubic Clustering Criterion (CCC) for optimal cluster selection given 1-10 nonhierarchical groups clustered on all 22 principal components. The benefit of this technique is that it utilizes local maxima in the changes of CCC. One drawback is that the nonhierarchical methods require the number of clusters to be predetermined. To get around this, we chose a number that allows a diversity of clusters while still allowing for hypothesis testing within the constrained group. Figure A1 in Appendix A shows cumulative and percent change in CCC plotted against the number of clusters. The CCC is monotonically increasing before and after six clusters, but the percent change in CCC has local maxima at both three and six clusters, making them both plausible specifications. After running the analysis on both sets of clusters, we selected three as the best fit for the data and added two of them (Clusters 2 and 3) as dummies in the Probit regression (reported in Table A1 in Appendix A), which also saves scarce degrees of freedom.
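The statement that a given number of components explains about 80% of the variation corresponds to a cumulative eigenvalue share. A minimal sketch of that calculation follows; the function name `components_for_share` and the eigenvalues are hypothetical and are not the values from the ARMS data.

```python
def components_for_share(eigenvalues, target_share):
    """Smallest number of leading components whose eigenvalues reach target_share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are ranked from the largest eigenvalue to the smallest.
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target_share:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of four components.
needed = components_for_share([4.0, 3.0, 2.0, 1.0], 0.8)
```

When the eigenvalues decline slowly, as reported here, many components are needed to reach the target share, and dropping the remainder saves little.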
The three selected clusters are presented in Table A2 in Appendix A and illustrate the raison d'être of the clustering: to tease out any group contributing to the heterogeneity of treated or untreated BFRs. For example, one group of BFRs (Cluster 2) appears to be so different from the others that their cluster is completely untreated. This tiny subset of farmers (n = 19) is significantly older, wealthier, located in the Western U.S., and may represent thousands of similar beginning farmers nationwide. These BFRs are qualitatively different from the unconstrained BFRs in Clusters 1 and 3 but helpful in the matching procedure.

Matching Results
Table 3 summarizes the average treatment effects (ATE) for the main variables of interest, the total value of production (VProdTot) and the per-acre value of production (VProdPA), for the unmatched, nearest neighbor, and Mahalanobis matching procedures. We first observe that matched VProdTot means are lower than the unmatched means. The unmatched VProdTot is not significantly different between groups, likely due to outliers (results using the ARMS data cannot report the minimum and maximum values due to National Agricultural Statistics Service data use restrictions). VProdTot is significantly different only under the Mahalanobis (MN) matching criterion, not under the Nearest Neighbor (NN) criterion. The significant difference corresponds to a loss in production of $35,430, while the unmatched and NN differences are $4700 and $10,747.
The per-acre ATE is statistically significant across all specifications and is highest under Mahalanobis matching ($146). The matched difference shows more than double the loss of production for the credit-constrained BFRs when logarithms of VProdTot and VProdPA are used. The differences in the magnitude of the treatment effect are attributable to the differences in samples arising from the matching methods, namely the difference in criteria between Nearest Neighbor's exclusive use of the propensity score and Mahalanobis' addition of the Mahalanobis distance. Table 4 summarizes the average treatment effects of the treated (ATET). All the ATET estimates are negative and are also shown as percentage changes to emphasize the losses that credit-constrained BFRs face. Depending on the method, the loss in the total production value varies between 14% and 77%. The per-acre production loss is also significant, ranging from 24% to 72%. Based on the results, we tend to favor the Mahalanobis match. The treatment effect of the treated for BFRs is much larger than the effect reported in [8] for all operators. Our preferred method bifurcates the BFR sample into high and low producers while accounting for individual producer characteristics. We have to note that the VProdPA differences are all highly significant, suggesting that, despite the difficulty with small-sample precision, there still remains a substantial loss in farm revenue for constrained BFRs, whether one accounts for endogenous determinants of credit constraints or not (the unmatched groups have a statistically significant difference). The large difference in the production volume may be explained by the different reasons for entry. Operators who recently started farming in a niche market, who are exploring farming on inherited land, or who are hobby farmers might be less prone to invest in production, which adds variance to the performance of unconstrained BFRs.
Meanwhile, the constrained BFRs who had been farming for just a few years may still be in the growth stage and have a higher demand for credit and, thus, are denied the full amount of loans that they seek. The different natures of BFRs' reasons for farming, personal objectives, and growth stage likely exacerbate the between-group differences in the value of production.
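To illustrate why the two matching criteria can select different samples, the sketch below matches one treated unit on a two-dimensional Mahalanobis distance, using the propensity score plus one covariate. It is an assumption-laden illustration, not the authors' procedure: the function names, the covariance matrix, and the data are hypothetical, and the 2×2 inverse is written out by hand to keep the example self-contained.

```python
def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance between 2-D points u and v for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of the 2x2 matrix
    dx, dy = u[0] - v[0], u[1] - v[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)


def mahalanobis_match(treated_unit, controls, cov):
    """Control unit closest to the treated unit in Mahalanobis distance."""
    return min(controls, key=lambda ctrl: mahalanobis_sq(treated_unit, ctrl, cov))


# Hypothetical (propensity score, covariate) pairs and covariance matrix.
treated_unit = (0.30, 1.0)
controls = [(0.31, 5.0), (0.50, 1.1)]
cov = ((0.01, 0.0), (0.0, 1.0))
match = mahalanobis_match(treated_unit, controls, cov)
```

Here the first control is nearest on the propensity score alone, but the second is selected once the covariate enters the distance, so the two criteria produce different matched samples and hence different effect estimates.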

Conclusions
Beginning farmers and ranchers (BFRs) in the US represent a diverse and important subset of family farms. Understanding their financial needs is of paramount importance for supporting the future of American farmers. The focus of this work is on evaluating to what extent credit constraints affect BFRs' production output. We use propensity score matching to show that credit constraints are associated with significantly lower production levels. Since BFRs include a variety of groups such as hobby farmers, widows, socially disadvantaged farmers, etc., we address their excessive heterogeneity in a novel way by enhancing the Propensity Score Matching with Principal Components Analysis and clustering to extract more information from the available data. This approach could be applied to a variety of cases with heterogeneous treatment groups to improve the balancing properties of the matching procedure.
We find that the losses in the total and per-acre production values attributed to being credit-constrained are statistically significant and range between 14-77% and 24-72%, respectively, depending on the matching method. The implied credit shortage may force BFRs into leasing significantly more land than they own. Such decisions may improve short-run liquidity but, in the long run, rob BFRs of that "opportunity" rent and lower their return on investment.
The findings have several important policy implications. Expanding land purchase and offering production loans to new principal operators may improve the long-term performance and viability of these enterprises. Likewise, helping BFRs to acquire program acres may save farmers lost revenue share on their leases, provide leasing income, and serve as an appreciating asset to draw upon for future credit. Other targeted interventions, such as the Farm Service Agency microloans for small and beginning operators producing for local markets, can also improve their liquidity.

Conflicts of Interest:
The authors declare no conflict of interest.