Estimation of Determinants of Farmland Abandonment and Its Data Problems

Abandoned farmland is particularly problematic in developed countries where agriculture has a comparative disadvantage in terms of effective use of land resources invested over time. While many studies have estimated the causes of these problems, few have discussed in detail the impact of data characteristics and accuracy on the estimation results. In this study, issues related to the underlying data and the estimation of the determinants of farmland abandonment were examined. Most previous studies on farmland abandonment in Japan have used census data as the basis of their analyses. However, census data are recorded subjectively by farmers. To address this, surveys of abandoned farmland are being conducted by a third party, and the results are compiled into a geographic information system (GIS) database. Two types of datasets (subjective census data and objective GIS data) were examined for their estimation performance. Although the two sets of data are correlated, there are considerable differences between them. Subjective variables are compatible with subjective data, and objective variables are compatible with objective data (meaning that parameters are easily identified). Original data for analysis, such as policy variables, are compatible with objective data. In policy evaluation research, attention should be paid to objective data collection.


Introduction
Many developed countries, such as Japan and those in the European Union (EU), have been experiencing farmland abandonment problems for decades. The determinants of abandonment are a key concern for government policymakers and researchers [1][2][3][4][5][6][7]. This is because farmland has generally been cultivated over a long period of time, and once it is degraded, it not only incurs huge restoration costs but also has a significant impact on the surrounding ecosystem [8]. In addition, farmland is an essential factor of food production; therefore, the prevention of abandonment is an important component of food security [3,9,10,11].
There have been many analyses of the causes of farmland abandonment but relatively little discussion of what kind of data should be used in these analyses. In Japan, where the farmland abandonment problem is becoming increasingly serious, many researchers have analyzed the determinants of farmland abandonment [12][13][14][15][16][17][18][19]. They have based their analyses on data from the census surveys conducted by the Ministry of Agriculture, Forestry and Fisheries (MAFF) every five years [20]. However, these data are based on subjective abandoned areas as declared by farmers. Therefore, they may contain measurement errors arising from subjective judgments and farmers' social desirability biases [21].
Fortunately, in Japan, apart from these subjective data, field surveys are conducted under the initiative of agricultural committees (called "Nougyo iinkai") established in each municipality, with some of these surveys compiled into databases. This type of survey, called "farmland patrol," is regulated by the amended Agricultural Land Act and is conducted at least once a year on all farmland [22,23]. However, this information is not

As referenced, the prevention of farmland abandonment is closely related to payments for activities to enhance multifunctionality (PAEMF) and direct payments to farmers in hilly and mountainous areas (DPFHM) among the Japanese direct payment schemes. The DPFHM scheme, in particular, is designed to compensate for the difference in production costs between the flatlands and the group of hilly and mountainous areas, and it strongly supports agriculture in the disadvantaged areas in Japan. On the other hand, it imposes a rule that all subsidies received during the support period must be returned when abandoned farmland occurs, which effectively functions as a policy "conditionality" [7,34,35]. Thus, the accuracy of data used for farmland abandonment analysis is important to accurately estimate the causal effects of these policies.
A review paper by Huang et al. [36] points out that research on farmland abandonment has increased in the last decade. They argue that among the factors of farmland abandonment, socioeconomic factors are the most important and that research on them will continue to be important. Until now, how abandoned farmland affects the natural environment and ecosystems has been important in research fields such as landscape, and similar research has been conducted in Japan [37]. Terres et al. [29] examined the drivers of farmland abandonment in the EU using large-scale and extensively aggregated data. However, they acknowledged that farmland abandonment is a local phenomenon and that local data are needed to estimate its risk. Corbelle-Rico et al. [38] also conducted a causal analysis of long-term farm abandonment in Spain using categorical data and multinomial logistic models and found that the phenomenon is a complex local phenomenon that needs to be analyzed using at least municipal-level data. Shi et al. [39] used GIS-processed data and multiple regression analysis to analyze the factors of agricultural land abandonment in mountainous areas in China, but the analysis does not consider a sufficient number of socioeconomic factors and related policies. The community data used in this study were much more localized and allowed the current analysis to be tailored to local conditions (in this study, each community has only about 20 farmers on average).
In Japan, most studies on the farmland abandonment problem use data from the agricultural censuses. One study was conducted in 2011 by Takayama and Nakatani [16] using a rich dataset, which covered six prefectures in Japan. However, they used census data prior to 2000, with a binary dependent variable indicating whether a community had abandoned farmland. Now that almost every community has a sizeable amount of abandoned farmland, analyses need to focus on the percentage of abandoned farmland area rather than a simple binary variable.
Early studies in 1998 of farmland abandonment in Japan used data on individual farmers from agricultural census data, for example, Senda [12] and Senda [13]. Although individual-level data, now unavailable due to restrictions of the Personal Information Protection Law, are attractive, they have the same problem as that in the study by Takayama and Nakatani [16], as they are used as binary variables of farmland abandonment. In addition, these studies did not consider variables related to regional agricultural structure, so no implications for regional policies were obtained.
In 2018, Su [19] analyzed the determinants of farmland abandonment using GIS data, and in 2014 Matsui [17] developed an estimation model for the area of abandoned farmland using machine learning (generalized linear models, random forest, and multivariate adaptive regression splines). However, the data used in these studies were also census-based. The current study used data based on objective measurements by a third party rather than census data based on subjective statements about farming and farmland. The novelty of this study lies in its use of objective data and GIS data processing (ArcGIS 10.8) to accurately estimate the abandoned farmland rate model. The estimation results were then compared with those obtained from conventional data, ultimately deriving a new, more accurate estimation result.
Specifically, the survey-based data measured by a third party were defined as objective data (GBD), while the conventional data reported by farmers were treated as subjective data (CBD). After examining the characteristics of both types of data, their performance was compared using a statistical analysis of the determinants of farmland abandonment. The second objective was to refine the model of these mechanisms by adding new variables, such as altitude and slope maps, bird and animal damage data, and direct subsidy payment data in terms of monetary amounts, which were obtained from Yabu City. These data have not been used in previous similar analyses. The statistical models used to analyze the determinants of abandoned farmland were the ordinary least squares (OLS) method and a modified version of it, the Tobit model.
The structure of this article is as follows. Section 2 provides an overview of the study area, the estimation model, and the data, which are the most important element of this study. Section 3 presents the estimation results and a comparison of the models. The implications of the results and the limitations of this study are discussed in Section 4, and the conclusion summarizes the findings of this study.

Study Area
In this study, data from Yabu City, Hyogo Prefecture, Japan were used. Yabu City is located in the center of the Tajima area in Hyogo Prefecture, with 84% of its total area covered by forest [40]. Because of its proximity to the Sea of Japan, the region is characterized by high humidity and rainfall in summer and significant snowfall in winter. According to the MAFF's classification of agricultural regions, 8 of the 14 administrative districts in Yabu City are categorized as hilly areas and 6 as mountainous areas, making it a typical hilly and mountainous area. In addition, 12 of the 14 districts are categorized as paddy-based districts by the MAFF, with the majority of farmers engaged in rice farming. Figure 1 shows the location and topography of Yabu City. The western part of the city is mountainous, with some areas rising to over 1000 m in altitude. From here, several rivers flow eastward along the valleys, joining the Maruyama River, a major river in the eastern part of the city. Along the valleys around the tributaries and the Maruyama River, there are flat areas with good conditions for paddies, where large-scale rice farming takes place. The light blue area in the figure reflects a low altitude mainly dominated by rice paddies that is densely populated.

In 2015, the average area of farmland per farmer in Yabu City was 0.41 ha, which was quite small compared with the national average at the time of 1.42 ha (Hyogo Prefecture: 0.62 ha) [20]. The proportion of subsistence farmers was also high at 59%, compared with 38% nationwide (Hyogo: 42%) [20]. The abandoned farmland rate (AFR) reached 23.5% in terms of area, which was much higher than the national average of 11.4% [20]. In terms of AFR by farmer type, the rate was higher for subsistence farmers (32.2%) and landholding non-farmers (62.3%) than for commercial farmers (8.2%) [20]. The national AFR for landholding non-farmers was 31.1%; it is likely that more farmland abandonment occurred because farmers retired from farming due to small-scale and unfavorable conditions for agricultural production [20]. The average age of farmers was 61.9 years, which was higher than the national average of 60 years. In particular, 58.2% of farmers in Yabu City were over 70 years old. Since the national average for this age group was 46.9%, the farming community of Yabu City was considerably aged. The percentage of farm managers over 70 years old had reached 49.2%, which was likely to accelerate future farmland abandonment [20].

Data
The characteristics of the data and their handling were the most important elements of the study. In addition, to analyze the causes of farmland abandonment precisely, GIS data from 2010, which have not been used in many similar studies targeting Japan, were used. In this subsection, the details of the GIS data and then the characteristics and problems of the two types of AFRs, which are used as the dependent variables in the regression analysis, are introduced. Finally, the socioeconomic variables that affect farmland abandonment are explained, along with their hypotheses. The descriptive statistics of the data are summarized in Table 1. The detailed farmland GIS data were obtained under an academic agreement with Yabu City. It should be noted that since the GIS data were available only for 2010, the 2010 data were also used for the other variables.

GIS Data
First, the abandoned farmland rate, the altitude, and the slope of each agricultural land plot were calculated from the information on 18,600 agricultural land plots in the farmland ledger of Yabu City using GIS software (ESRI's ArcGIS). Each polygon represents an individual agricultural land plot and contains information on the results of farmland patrols conducted by the agricultural committee. All of this information has been compiled into a GIS database by Yabu City. Therefore, it is possible to calculate the AFR-G (abandoned farmland ratio from GIS) for each community (Table 1) by taking the ratio of abandoned farmland area to all farmland area derived from each polygon. The use of these databases is one of the innovations and objectives of this research, as there has been no research using such databases in Japan. Finally, a GIS database of the 138 agricultural communities was compiled and used for the analysis. Figure 2 shows an example of the GIS map using an aerial photograph. The plots surrounded by blue lines are paddy fields; the plots surrounded by green lines are farmland excluding paddy fields, and the red areas are abandoned farmlands. The yellow areas refer to plots that are not currently being cultivated but are expected to be cultivated in the near future. Therefore, these yellow areas were excluded from the list of abandoned farmlands.
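The community-level aggregation described above can be illustrated with a minimal sketch. The record layout, the status labels, and the function name `afr_by_community` are hypothetical and are not taken from the Yabu City database. The sketch also assumes that plots expected to return to cultivation (the yellow areas) still count as farmland in the denominator, which the text does not state explicitly.

```python
from collections import defaultdict

# Hypothetical plot records: (community_id, area_m2, status).
# The status labels are illustrative, not the codes used in the actual ledger.
plots = [
    ("A", 1200.0, "cultivated"),
    ("A", 800.0, "abandoned"),
    ("A", 500.0, "fallow"),      # expected to be cultivated again: not abandoned
    ("B", 2000.0, "cultivated"),
    ("B", 500.0, "abandoned"),
]

def afr_by_community(records):
    """Return abandoned area / total farmland area (in %) for each community."""
    total = defaultdict(float)
    abandoned = defaultdict(float)
    for community, area, status in records:
        total[community] += area          # assumption: every plot counts as farmland
        if status == "abandoned":
            abandoned[community] += area
    return {c: 100.0 * abandoned[c] / total[c] for c in total if total[c] > 0}

afr_g = afr_by_community(plots)
```

In this toy input, community A has 800 of 2500 square meters abandoned and community B has 500 of 2500, so the sketch yields one rate per community in the same way the AFR-G is aggregated from polygons.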
In addition, GIS data on the average altitude and slope of the plots, which represent the geographical conditions of each community, were created. For the average altitude, the 10 m mesh data of the "Fundamental Geospatial Information" published by the Geospatial Information Authority of Japan were used to create an altitude model in GIS as shown in Figure 1 [41]. The first step was to create a triangulated irregular network (TIN) surface and to interpolate the shape [42] (the TIN was created from digital elevation model data using the ArcGIS 3D Analyst function). Shape interpolation is the process of adding z-information, which is height information, to the x- and y-coordinates of polygons. All altitudes calculated for all plots in each community were aggregated and then used to calculate the average altitude (AVAL) (Table 1). A higher average altitude means that the farmland is likely to belong to a hilly and mountainous area, and its conditions for agriculture are more severe. Therefore, the sign of the coefficient of this variable was expected to be positive.
Other data indicating field conditions include the average slope (gradient) of the farmland (the unit is "degree"). Similar to the average altitude, it was calculated from the altitude model using GIS. It is calculated as the ratio of the vertical change to horizontal change between any two plots in the altitude model. In the case of this gradient variable (GRAD) (Table 1), the sign of the coefficient is expected to be positive because the burden on agricultural work is greater and it is more difficult for farmers to increase plot size.
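As a rough illustration of how a rise-over-run ratio relates to a slope expressed in degrees, and of how plot-level values can be averaged per community, consider the following sketch. The conversion through the arctangent is an assumption about how the degree unit is obtained; the function names and sample values are hypothetical and do not reproduce the ArcGIS procedure used in the study.

```python
import math

def slope_degrees(rise_m, run_m):
    """Slope angle in degrees from a vertical change and a horizontal distance."""
    return math.degrees(math.atan2(rise_m, run_m))

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Hypothetical plot-level values for a single community.
plot_altitudes_m = [210.0, 225.0, 240.0]
plot_slopes_deg = [slope_degrees(1.0, 10.0), slope_degrees(10.0, 10.0)]

aval = mean(plot_altitudes_m)   # community average altitude, analogous to AVAL
grad = mean(plot_slopes_deg)    # community average slope, analogous to GRAD
```

A rise equal to the run corresponds to 45 degrees, while a rise of 1 m over 10 m corresponds to roughly 5.7 degrees, which shows why a ratio and an angle should not be mixed when interpreting the coefficient of the gradient variable.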

Farmland Abandonment Rate: Objective vs. Subjective Data
This section describes two types of AFRs that served as the dependent variables for the regression. The AFR-G was calculated from the results of the farmland patrol survey and GIS, referred to as objective data. The conventional AFR-C (abandoned farmland ratio from CBD) (Table 1) is derived from census data, referred to as subjective data. The effect of the differences in the data on the estimation performance is this study's most important and interesting finding because it suggests a reconsideration of the data used. Figure 3 presents a comparison between two types of AFRs, where AFR-G is shown on the x-axis and AFR-C on the y-axis. Although these two AFRs have a positive correlation, there is a considerable difference between them. There are two major reasons for this. First, while the AFR-C is based on people, the AFR-G is based on territory. In the case of AFR-C, the abandoned farmland is not necessarily located in the community where the farmer who reported it lives. Therefore, the AFR-C differs from the AFR-G, which is calculated from observed abandoned farmland that is definitely located in the community.
The second reason is related to Japan's Agricultural Land Act. Under this act, unless there is a special reason, those who own agricultural land are required to "ensure the proper and efficient use of agricultural land" [22]. In addition, Japanese farmers have a norm that they must not disturb other farmers by neglecting their own farmland [43]. This is because, for example, paddy fields with many weeds attract pests and wildlife. Hence, farmers have an incentive to manage their farmland, even if they have no intention of cultivating it for production. However, the AFR-G may be estimated as smaller than the AFR-C because it depends on whether the farmland is objectively uncultivable, regardless of the intention of the owning farmer. AFR-G is a more appropriate indicator from the perspective of food security because some of the abandoned areas in AFR-G include farmland that is managed properly. Figure 4 illustrates the two abandoned farmland rates (AFR-G and AFR-C). Generally, in northeastern Yabu, where there are many flat areas, the AFRs are low. However, some communities show significant differences between the two AFRs.
Until now, AFR-C data have been used in the analysis of farmland abandonment problems in Japan. The AFR-G, introduced in this study for the first time, was measured by a farmland patrol. Therefore, the AFR-G can be considered to be more objective than the declaration-based AFR-C. However, which dataset should be used depends on the purpose of the analysis. In recent years, many community-based agricultural policies have been implemented in Japan, such as the Japanese direct payment system [33]. In this context, it is important to pay attention to appropriate community-based data collection when conducting sophisticated policy evaluations. This study assumed that objective data are more desirable for such an evaluation and examined measurement error and estimation bias when using conventional data.

Community-Based Data
This subsection describes the explanatory variables in the estimation model and the hypotheses regarding their impact on the AFR. Since the AFR is aggregated on a community basis, the explanatory variables must also be based on community data. Many studies have used agricultural community-based data from Japanese censuses [44]. For more details on such data, please refer to previous studies [7,45,46,47]. Among these data, seven determinants shown in Table 1 that were likely to influence AFR were used for the analyses.
The first is the AREA, which shows the area of farmland in the community. As shown in Table 1, the average area is 16.2 ha, which is close to the average of 15 ha in the Kinki region where Hyogo Prefecture is located. If the average number of farmers is homogeneous among communities, it is expected that the larger the area of farmland in a community, the greater the burden of collective action for farmland management, and thus the higher the abandonment rate. Next, PPUD, which represents the percentage of paddy field area to all farmland, is expected to have a negative impact on the AFR, because paddy fields are generally found in every community in Japan and, owing to high tariffs, labor-saving technology for rice farming has advanced considerably. ROUT is the percentage of rented-out farmland in the community. This variable is higher when the farmland market works well, and if so, the efficient use of farmland is expected to reduce farmland abandonment.
NFHH and NFHP are variables related to the owners and users of farmland in the community, respectively. NFHH indicates the percentage of landholding non-farmers who own farmland but do not cultivate it and rent out all their farmland. These are mainly people who have retired from farming and, if the transaction cost of renting out their farmland exceeds the land rent, they may abandon the farmland without renting it out. Hence, an increase in NFHH is expected to increase the AFR. NFHP indicates the number of agricultural management entities (non-farm household producers) other than family farms in the community. In recent years, they have increased their presence in Japan, and they are expected to become the main lessees of farmland, playing a role in reducing farmland abandonment [48].
TRAC indicates the number of tractors owned per farmer. The use of tractors enables labor-saving management and is expected to reduce farmland abandonment in areas where there is a shortage of agricultural labor force [17]. In addition, GINI is an indicator of the diversity of farm sizes in the community. The greater the diversity, the more diverse the farmers' motivation toward agriculture, and consequently, collective action for the management of farmland and irrigation may not work well. However, a stratified farm size may lead farmers to share roles more efficiently. Hence, the expected sign condition is not clear.
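The source does not state how GINI was computed. As a sketch only, the following assumes the standard Gini coefficient applied to the farm sizes within one community; the function name `gini` and the sample values are hypothetical.

```python
def gini(sizes):
    """Gini coefficient of non-negative values (0 means all values are equal)."""
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i is the 1-based rank of x_i in ascending order.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Hypothetical farm sizes (ha) in two communities.
equal_sizes = gini([0.4, 0.4, 0.4, 0.4])      # identical farms
unequal_sizes = gini([0.0, 0.0, 0.0, 4.0])    # one farm holds all the land
```

Under this definition, a community of identical farms scores 0, and the score approaches 1 as land becomes concentrated in a single farm, which matches the reading of GINI as a measure of diversity in farm sizes.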

Confidential Data
In addition to the data presented in the previous subsections, new data (animal damage and the amounts of agricultural subsidies) that had not been used in previous studies were used. As mentioned above, thanks to the agreement with Yabu City, GIS data of farmland at an individual level and confidential data could be accessed. To protect personal information, the published CBD aggregated for each community currently omit some information when the number of households in a community is less than two [49]. In other words, the CBD used in previous studies excluded confidential data (ECD). Under our agreement, however, all data from the 138 communities (ACD) were available (there are 144 agricultural communities in Yabu City, but the data of six communities were not available due to a lack of farmers or farmland). By contrast, ECD contain data on only 105 communities. The estimation bias due to ECD was also examined (see Appendix A).
In addition to CBD (ACD), two original data types (YOD) from Yabu City were used. The first were data on animal damage caused by wild birds and animals, mainly in hilly and mountainous areas, with a total of 15.8 billion yen in damage nationwide. This is one of the most important reasons why farmers abandon cultivation [50]. Specifically, the data obtained were the number of deer and wild boars captured in each community (DEER and BOAR). Deer and wild boar account for the most and second most damage, respectively; the data were therefore used as proxy variables for animal damage. Naturally, these variables are expected to be positively correlated with damage and, consequently, with the AFR.
Other new variables were the amounts of agricultural subsidies of the DPFHM and PAEMF. Several studies have analyzed the impact of these policies on AFR, but they have used only binary variables indicating whether they are implemented in a community. The DPFHM scheme is designed to fill the productivity gap between flatlands and geographically disadvantaged areas and to prevent abandonment of cultivation [51], while the PAEMF scheme is designed to maintain agricultural multifunctionality that involves external economies [52]. Both are community-based schemes, and two variables, DPFHM and PAEMF, were created by dividing the amount received in the community by the number of farmers (beneficiaries). Naturally, the expected effects of these variables are negative [7,47].

Estimation Method and Performance Check
A regression model was used to derive the determinants of farmland abandonment. The AFR percentage was used as the dependent variable. The unit of observation was the agricultural community within the administrative boundary [45]. However, in some communities, there is no farmland abandonment; therefore, the data inevitably contain many zeros. Hence, applying the usual OLS with these data will lead to a bias in the parameter estimates. To overcome this problem, in addition to the OLS estimation, the Tobit model was used for the censored data [53].
The Tobit model, which handles censored data, is an approach that uses latent variables, as in binary choice models such as the logit model. Denoting the observed continuous variable, AFR, as $y$, and its latent variable as $y^*$, the Tobit model can be expressed as Equation (1).
$$
y_i^* = x_i \beta + u_i, \qquad
y_i =
\begin{cases}
y_i^* & \text{if } y_i^* > 0 \\
0 & \text{if } y_i^* \le 0
\end{cases}
\tag{1}
$$
where $x_i$ is an explanatory variable that affects AFR, $\beta$ is the parameter to be estimated, and $u_i$ is the error term, which follows a normal distribution with mean 0 and variance $\sigma^2$. The other variables shown in Table 1 and in the previous subsections are used for $x_i$. By testing the coefficients $\beta$ of these variables, it is possible to identify the determinants of farmland abandonment. The likelihood function is formed as the product of the densities of $y_i$ for the uncensored observations ($y_i^* > 0$) and the probabilities that $y_i = 0$ for the censored observations, and the parameters are estimated using the maximum likelihood method.
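The likelihood described above can be sketched as follows. This is a minimal illustration of the log-likelihood of a Tobit model left-censored at zero, not the estimation code used in this study; the function name, argument layout, and example values are assumptions, and in practice the function would be passed to a numerical optimizer to obtain the maximum likelihood estimates.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the censoring probability


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model censored from below at zero.

    beta  : list of coefficients
    sigma : standard deviation of the error term (must be positive)
    X     : list of rows, each a list of explanatory variables
    y     : list of observed outcomes (zero where censored)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        if y_i > 0:
            # Uncensored observation: log of the normal density of y_i.
            z = (y_i - xb) / sigma
            ll += -0.5 * math.log(2 * math.pi) - 0.5 * z * z - math.log(sigma)
        else:
            # Censored observation: log of P(y* <= 0) = Phi(-x_i beta / sigma).
            ll += math.log(_STD_NORMAL.cdf(-xb / sigma))
    return ll


# Hypothetical data: an intercept-only model with one censored observation.
X = [[1.0], [1.0]]
y = [0.0, 1.0]
print(tobit_loglik([0.0], 1.0, X, y))
```

Writing the normal log-density analytically avoids underflow for large residuals; the censoring term can still underflow when the fitted value is very large relative to sigma, which a production implementation would need to guard against.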
Furthermore, the predicted values of the estimation results were used to compare the performance between the subjective and objective data. The root mean squared error (RMSE) and mean absolute error (MAE) were used as performance measures of the predicted values [54]. In addition, I used kernel density estimation of the predictive distribution to compare performance [55]. In the analysis of bias when using ECD instead of ACD, whether each coefficient differs between the two estimates was checked by conducting Welch's pairwise t-test. Finally, whether the estimation bias is larger for AFR-C or AFR-G was examined by calculating the ratio of the estimated coefficients and computing their root mean square (RMS).
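For reference, the two accuracy measures named above can be written out as a minimal sketch. The function names and the example values are hypothetical and are not taken from the data in this study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical abandonment rates (%) and model predictions for three communities.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both indicates whether a model's errors are dominated by a few communities.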

Estimation Results
The estimation results are shown in Table 2. The left side shows the results with AFR-C as the dependent variable, and the right side shows the results with AFR-G as the dependent variable. There is no difference in the coefficients for AFR-C between the OLS and Tobit models because there are no censored observations among the AFR-C dependent variables. However, due to the robust estimation of the Tobit model, the significance levels of the parameters differ. When the AFR-G is used as the dependent variable, the Tobit model fits better than OLS in terms of the Akaike information criterion (AIC). As a result, the Tobit models were adopted as the final models for both dependent variables.
Note: *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels, respectively.
First, for the AFR-C, the AREA (farmland area in a community) is positively significant. This is a reasonable result, indicating that the more farmland a community has, the more farmland is abandoned. PPUD (percentage of paddy fields in AREA) has a negative impact on the AFR, which is consistent with the findings of Takayama and Nakatani [16]. ROUT (percentage of rented out farmland) also has a negative impact, meaning that a well-functioning agricultural land market prevents farmland abandonment. Since NFHH (percentage of land tenure non-farm households) includes many retired farmers, it positively promotes farmland abandonment. NFHP (number of non-farm household producers) has a negative impact, suggesting the importance of non-farm household producers, such as community farms [56]. The coefficient of TRAC (number of tractors per farm household) is also negative and significant, indicating the effect of tractor ownership on reducing the AFR. These results are consistent with those of previous studies [13]. The results of BOAR (number of boars caught annually), the proxy variable for animal damage, also support the hypothesis.
For AFR-G, variables from CBD are hardly significant, while variables from GIS data and YOD tend to be significant. GINI (variation in farmland size) is positively significant, indicating that uneven or diverse farmland sizes tend to increase abandoned farmland. As expected, DEER (number of deer caught annually), AVAL (average altitude of farmlands in the village), and GRAD (average slope of farmlands in the village) show positive and significant effects on AFR. The policy variable DPFHM (direct payments to farmers in hilly and mountainous areas) is not significant for AFR-C but shows a significant policy effect for the AFR-G model, which is consistent with the results of Ito et al. [7].

Performance of Models and Data
First, under the assumption that the objective dataset AFR-G is a desirable measure of farmland abandonment rates, the predictive accuracy of the estimation models was evaluated. RMSE and MAE, both of which are shown in Table 3, were calculated. AFR-G is superior to AFR-C in terms of prediction accuracy. Figure 5 shows the results of the kernel density estimation of the raw data (AFR-G and AFR-C) and the predicted values from the models. For both the AFR-G and AFR-C data, the predictions appear to be overestimated compared with the raw data. It should be noted that the original AFR distribution was smaller for AFR-G than for AFR-C; therefore, by using the AFR-G data, estimation results can become conservative and robust.
Next, how the results change when ACD is used instead of ECD was examined. Table 4 shows the ratio of the estimated parameters when using the Tobit model with the ACD to the estimated results when using the ECD. The rightmost column shows the results of a pairwise t-test to determine whether there are significant differences in the estimated coefficients between ECD and ACD. While the AFR-C model has four coefficients that differ significantly at the 1% level, AFR-G has only two, indicating that the estimation using AFR-G is more robust regardless of the presence of confidential data. The bottom row of Table 4 shows the RMS of the distance from the ratio of each coefficient to 1. The RMS for AFR-C is almost twice as large as that for AFR-G, suggesting that the bias is relatively larger when subjective data are used. Note that although bias occurs in many variables, there are no results where the sign conditions of the significant coefficients are opposite.
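The RMS of the distance of each coefficient ratio from 1 can be sketched as below. The ratio is taken as the ACD estimate divided by the ECD estimate, following the description of Table 4; the function name and the coefficient values are hypothetical.

```python
import math


def ratio_rms(coef_acd, coef_ecd):
    """RMS of (ratio - 1), where ratio = ACD coefficient / ECD coefficient."""
    ratios = [a / e for a, e in zip(coef_acd, coef_ecd)]
    return math.sqrt(sum((r - 1.0) ** 2 for r in ratios) / len(ratios))


# Hypothetical coefficients: the first doubles under ACD, the second is unchanged.
coef_acd = [2.0, 1.0]
coef_ecd = [1.0, 1.0]
print(ratio_rms(coef_acd, coef_ecd))
```

A value of 0 would mean that excluding the confidential communities leaves every coefficient unchanged; larger values indicate a larger average distortion.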

Discussion
The results of the estimation (Table 2) support many of the hypotheses. However, the results show that the significant variables tend to depend on the type of data used. An important finding is that AFR-C data are compatible with CBD data, and AFR-G data are compatible with geographic information and other data. The meaning of the word "compatible" here is that the coefficients of a variable are likely to pass the test (be identified). Presumably, census-derived subjective data are highly correlated with census-derived data; namely, there is a possibility of a high correlation between the same types of data. For example, the AFR-C is likely to be effective when analyzing the relationship between community structure and farmland abandonment from the CBD. However, in the case of objective data, such as geographic information or original data, it is more efficient to use AFR-G.
The important point is the relationship between the AFR and the original data, such as DEER, BOAR, DPFHM, and PAEMF. The latter two are particularly important, as these kinds of data have recently been used for evidence-based policy research, which needs to capture causal relationships accurately. The results show that AFR-G is more accurate in capturing the causal relationships. In particular, previous studies have demonstrated that direct payments for hilly and mountainous areas reduce farmland abandonment [7]. In this study, the relationship between DPFHM and AFR was significant only in the AFR-G model. Hence, in the field of policy research, the use of objective data such as AFR-G and GIS is likely to be more effective.
The results also show that ECD can generate bias. In Japan, MAFF avoids publishing data on a community with two or fewer households to protect personal information. While such treatment is inevitable from the perspective of privacy protection, it may distort policy evaluation and judgments in policy decision-making. Therefore, it is necessary to establish a system in which comprehensive data (without selection bias), such as the ACD, can be used for policy purposes and academic research.
The implications of the estimation results of the AFR-G validated in this study were reviewed. Geographical conditions that are unfavorable to agriculture, such as high altitude and steep slope, naturally induce farmland abandonment. The occurrence of bird and animal damage is also likely to reduce farmers' motivation to cultivate. Therefore, strong measures to prevent birds and animals from entering the community are also required. The most important policy implication is that the direct payment for hilly and mountainous areas is effective against farmland abandonment. This is a finding that is not possible when using the binary AFR-C data.
Let us now discuss the position of the problem of abandoned farmland from international and policy perspectives: the international literature on abandoned farmland in the EU is vast, but most of it is concerned with the impact of abandoned farmland, and little of it identifies the drivers of abandonment. Some studies suggest that public policy exogenously influences farmland abandonment but that socioeconomic factors and ownership systems dominate its dynamics [57].
In the EU, many researchers have proposed amendments to the Common Agricultural Policy (CAP) in response to the increase in abandoned farmland [58]. However, there are two main opinions on the impact of abandoned farmland: one is that abandonment harms the environment, and the other is that abandonment improves the environment by returning farmland to the wilderness. Dolton-Thornton [58] argues that the CAP's agroenvironmental measures are still too productivist and neoliberal to address the problem of abandoned farmland as a part of them and that they need to be repositioned to address population decline, including in the rural non-farm sector.
Since the population decline in rural areas is also serious in Japan, it is believed that the problem of abandoned farmland needs to be considered as a part of regional economic policy, beyond the scope of mere agricultural policy. As Terres et al. [29] argue in the EU case, food security is an extremely important factor in Japan, where the grain self-sufficiency rate has fallen to 30%, and securing the potential of agricultural production is a policy that is not only materially but also psychologically important for the Japanese people.
Finally, this study has some limitations. It used 2010 data from Yabu City in Hyogo Prefecture, one of the many municipalities in Japan; therefore, the limitation of the external validity of the results should be noted. However, because 73% of the country's land and 41% of its farmland belong to hilly and mountainous areas such as the area of this study [59], and Japanese society and communities are relatively homogeneous in character [60], the results of this study should have some generalizability in Japan. Moreover, since the academic value of this study is in the estimation and use of data, the results are sufficient to point out the problems in a particular case (i.e., in at least one case, a problem arises when using conventional materials and methods). Although this study addresses the case of Japan, where rice farming dominates, it is only a matter of time before similar problems emerge in the rest of East Asia, which is experiencing rapid economic development. In fact, problems of farmland abandonment are beginning to arise in China according to a 2020 study by Zhou [61], and the analysis of Japan will provide useful insights for these regions.
Another shortcoming concerns the estimation method. When presenting the estimation results, the causal relationship between some variables and the AFR was mentioned. However, causal effects, such as policy and socially operative variables, require close attention to their identification. For example, the results show that damage from birds and animals tends to increase the amount of abandoned land, but the opposite causal relationship can also be assumed. Specifically, abandoned land tends to attract birds and animals to the community. Due to data limitations, it was not possible to examine these endogenous issues. These issues need to be examined based on more extensive data collection and advanced estimation methods, such as the instrumental variable (IV) method (if IV is available) [62].

Conclusions
This study mainly focused on data issues in analyzing the determinants of farmland abandonment, which is becoming a serious problem in developed countries such as Japan, where agriculture has lost its comparative advantage. The main findings are as follows.

• The estimation results differed depending on whether objective or subjective data were used.
• There is a possibility of bias in the estimation when conventional census data, which farmers provide subjectively, are used in the analysis.
• Correlations (coefficient parameters) between the same types of data (objective or subjective) are easy to identify.
• When using data restricted from the perspective of privacy protection, a bias occurred in some estimates, but it did not reach a serious level. (This is the issue of confidentiality of community-based data, which has been used in many studies on Japanese agricultural policy.)

The important implication of the third finding mentioned above is that while the variables from CBD data are more compatible with each other, variables from GIS are more compatible with the same type of data (GIS-related data). In addition, other original data, such as policy variables, were shown to be more compatible with objective data such as GIS data.
In recent years, there has been a rapid development of analytical methods in microeconomic analysis and policy research, but it is assumed that the data used in these analyses are accurately measured. However, as seen in this study, when there are differences in measures of land use conditions, or when the data are not fully available for social reasons, there may be differences or biases in the results. Therefore, when using land use data, the origin of the data should be carefully considered before conducting an analysis.
Funding: This research was supported by the Japan Society for the Promotion of Science (grant number: 21K05796).

Data Availability Statement: The data presented in this study are available on request from the author. The data are not publicly available due to privacy protection.
Acknowledgments: I would like to express appreciation to the Agriculture and Forestry Promotion Divisions, Yabu City, for providing data on issues such as agricultural status, land use, and agricultural subsidies in Yabu City.