The Role of Clustering in the Adoption of Organic Dairy: A Longitudinal Networks Analysis between 2002 and 2015

This paper uses network analysis to study the geo-localization decisions of new organic dairy farm operations in the USA between 2002 and 2015. Given a dataset of organic dairy certifications we simulated spatio-temporal networks based on the location of existing and new organic dairy farming operations. The simulations were performed with different probabilities of connecting with existing or incoming organic farmer operations, to overcome the lack of data describing actual connections between farmers. Calculated network statistics on the simulated networks included the average degree, average shortest path, closeness (centrality), clustering coefficients, and the relative size of the largest cluster, to demonstrate how the networks evolved over time. The findings revealed that new organic dairy operations cluster around existing ones, reflecting the role of networks in the conversion into organic production. The contributions of this paper are twofold. First, we contribute to the literature on clustering, information sharing, and market development in the agri-food industry by analyzing the potential implications of social networking in the development of a relatively new agriculture market. Second, we add to the literature on empirical social networks by using a new dataset with information on actors not previously studied analytically.


Introduction
A dairy farmer's decision to adopt organic farming practices can be considered in the following way: a dairy farmer chooses to adopt organic practices if the expected profits from organic farming, less the cost of conversion, exceed the expected profits from conventional dairy farming. If this is not true, the dairy farmer will continue conventional dairy production or, if expected profits are not sufficiently high, close up shop. The profit-maximizing model suggests that the decision is strictly based on economic considerations. Yet conversion to organic farming has been slow in the United States, despite the finding that organic farming is more profitable than conventional farming [1,2]. This suggests that other factors are equally important in the decision.
Our underlying hypothesis is that while organic is more profitable, it is not adopted widely for several reasons. First, the method is outside of the social norm in many regions of the country [3]. Anecdotal evidence suggests that a successful conversion from conventional to organic farming requires changing the way a farmer thinks about agriculture [4]. The importance of farmer mindset is supported by research indicating that the lack of peer support or existing social norms regarding organic farming are barriers to conversion [5][6][7]. Another contributing factor is the wide difference between organic and conventional farming practices, which may lead an individual farmer to perceive that participating in the organic market is too risky [8]. The lack of technical knowledge about organic practices may lead farmers to estimate low expected profits. Additionally, the costs of conversion, in terms of reduced yields during the transition period, may be too high for many farmers [9]. Some of these barriers may be reduced by the presence of university agricultural extension agents, private sector companies supporting the transition to organic, or certifiers conducting outreach to farmers [8,10].
Proximity to other organic farmers may overcome many of these perceived barriers. Knowing or being proximate to one or more certified organic farmers may normalize the production method. Locating near other organic farmers also facilitates sharing of technical information about organic farming practices, as new farmers learn from more experienced farmers. This is supported by theoretical evidence showing that economic growth is based on innovations that "spill over," which is further supported by empirical evidence showing that geographical proximity and knowledge spillovers are correlated [11][12][13][14]. This has been documented particularly well in the case of cities [15].
Theories on information sharing and clustering as determinants of economic growth distinguish between two different forms. The first considers information that spills between companies that belong to the same industry. An example of this is Silicon Valley, where information flows through the firms that comprise the technology industry. The second considers information that spills from one industry to another, such as from "the brassiere industry which grew out of dressmakers' innovations rather than the lingerie industry" [15,16]. Our work focuses on the flow of information within the organic dairy farm sector and does not consider the second type, where the information spills over to other industries.
The evidence linking geographical proximity and knowledge transmission is particularly important for agricultural and food processing industries, as it finds that knowledge diffusion of plant biotechnology research was facilitated by proximity, particularly for small firms [17]. Another important factor is proximity to consumers willing to buy organic products, which has been shown to be a key determinant in the retailing of organic products [18]. Proximity of the farm to buyers of farm-level organic products is another critical aspect, as it ensures there is a market for organic farm production [19]. Thus, over time, given the influence of these factors, we hypothesize that networks of organic farmers will emerge as more farmers transition to organic farming. We infer spatio-temporal networks [20] to account for these phenomena. This approach has been used in several domains such as predicting the formation of transportation networks [21,22], the diffusion of new words [23], predicting and recommending actual connections in social networks [23], and recommending new restaurants to social media users [24], among others [25].
This paper starts with a discussion of the literature on social networks with an emphasis on information exchange in the agricultural sector. Next, we discuss the evolution of the organic dairy industry from the inception of the National Organic Standards in 2002 until 2015, tracing changes in the supply and demand of organic milk to identify key moments that are important to the network analysis. After presenting the methodology used, we discuss the results of the network analysis. We conclude with a discussion of the potential for this method, as well as the implications for the organic dairy sector. The research adds to the literature on organic production by analyzing the potential implications of social networking in the decision of farmers to adopt organic dairy production practices, and to the literature on empirical social networks by using a new dataset with information on actors not previously studied analytically.

Background on Social Network Analysis
We assert that social network analysis is able to demonstrate how hypothesized organic farmer networks changed over time in the United States. Social network analysis is a methodology that reflects the relational nature of the object of study. This methodology has been used by social scientists since 1945 [26], and has contributed to an understanding of varied social phenomena, such as the identification of 'persons of interest' by law enforcement agents [27]. More specifically, in food-related research it has been used to study food vendor-supplier networks [28], the spread of obesity [29], food recipes and ingredient pairings [30,31], food choice by socially connected individuals, such as spouses and friends [32], the role of food retailers in the formation of cohesive local food supply networks [33], and the relationship between country obesity rates and food recipes [34], among others.
Furthermore, social network methodology has been used in agriculture to study the relationship between belonging to a network and the adoption of agricultural technology [35,36] and knowledge exchange [37][38][39]. Empirical studies demonstrate that "farmers do not act unilaterally; instead they collaborate, consult and negotiate. Embedded in these interactions is a flow of knowledge, ideas and information (...)" [36]. It is important to note that these collaborations in the agri-food industry are dependent on geographical proximity [17]. Hence, if farmers are connected to a greater number of farmers, due to geographical proximity, information flows more efficiently. This will benefit the farmers on a denser network, as information on prices, best practices, and the certification process can be shared more efficiently. In fact, lack of knowledge of the benefits derived from converting to organic has been identified as one of the main causes of failure to convert. This lack of information is addressed through the creation of social networks, as observed by researchers using qualitative techniques to study the transition into organic dairy in Southwestern Wisconsin [40].
Building on the literature on network analysis and knowledge exchange in agriculture, our analysis examines the evolution of the inferred network of certified organic farmers from 2002 to 2015. We assess key network statistics over time, calculated from the simulated networks that are built based on geographical distance. These network statistics and their development over time allow us to investigate whether new organic dairy operations locate closer to other existing operations, in other words, whether they form geographical clusters, or whether the choice of a new location, or the conversion into organic dairy, occurs in geographical areas not associated with existing and incoming organic dairy operations. Our hypothesis is grounded in the fact that geographical clusters bring benefits to new and already established organic farming operations. The benefits may be due to potential cooperation with others, which results in improved access to transportation, as a truck can pick up the dairy from several farmers located in the cluster, especially those that do not have enough volume to fill up a truck. While outside of the scope of this analysis, cooperation may also lead to establishing new handling facilities.
We use the case of organic dairy to explore the dynamics of organic farmer networks in the United States. The data on certified organic farmers come from the U.S. Department of Agriculture's Organic Integrity Database, which we used to identify organic dairy farmers by year of receiving organic certification since the inception of the federal organic regulations in 2002. Using this information, we developed a model to infer longitudinal social networks based on geographic coordinates, which are then analyzed using network analysis, where organic farmers are represented by nodes and the social connections amongst them by edges [41][42][43]. We assumed that producers connected with others located within a 50-mile radius, as theoretical and empirical evidence indicates that at least 66% of real-world social network connections are correlated with geographical proximity [44]. The methodology used allows us to infer social networks without data on actual connections. The simulations consider a variety of possibilities and degrees of agent sociability within the geographic radius.

Background on the US Organic Dairy Industry
The organic dairy sector consists of certified organic farms, where fluid milk is produced, and certified organic processors, who bottle fluid milk or create milk-based products such as cheese and yogurt. At the farm level, in 2016, there were 2,559 certified organic dairy farms, with 268 thousand cows at year end [45]. The organic and conventional markets are connected: the conventional market is the residual market for organic products, where organic milk can be sold as conventional in the absence of an organic buyer. Note that doing so will be costly, because, relative to conventional, organic production costs and prices are higher, with the farm level premiums estimated at 140 percent in 2017 [46]. Three large processors distribute organic milk nationally: Organic Valley, Horizon Organic and Aurora Dairy. In addition, there are smaller processors who work regionally.
Prior to 2002, organic certification was in the control of multiple private certifiers, each with different standards. Since 2002, the National Organic Standards dictate the certification rules, which are regulated by the US Department of Agriculture (USDA) but implemented by third party accredited certifiers. The standards regulate how certified organic cows are raised, and include rules about animal food and access to the outdoors. The organic dairy industry has experienced important milestones and influential events, which are outlined in Table 1. Organic Valley, a farmer owned cooperative, began producing and selling organic milk in 1988. Horizon began as an independent company in 1991, and was purchased by Dean Foods in 2004; the purchase was critical to the development of the organic dairy market, as Dean Foods had well established distribution channels that opened to Horizon with the sale. Horizon Organic contracts with farmers for their milk. Aurora Dairy's organic business started later, in 2003, and was developed by one of the founders of Horizon Organic Dairy. Aurora produces lower priced, store brand milk, and owns a handful of large-scale organic dairy farms.
Several changes in the national organic standards influenced the development of the organic dairy farm sector. The first key event was the Harvey case, in which a farmer sued USDA over specific aspects of the initial organic rule; in terms of dairy, the original rule allowed farmers to use a portion of conventional feed during the transition process. One result of the Harvey rule was a shift to 100 percent organic feed during the entire transition process; this change encouraged a bump in transition prior to the rule change in June 2007. In 2005, there were 87 thousand organic dairy cows, which increased to 250 thousand dairy cows in 2008. Between 2008 and 2011, there was a small increase in the number of certified organic dairy cows, to 255 thousand total organic cows [47]. The second significant event was the 2010 rule, which closed a loophole in the initial organic regulation regarding access to the outdoors for organic livestock. The pasture rule, as it is called, specified that cows must be grazed on pasture for at least 120 days of the year [48].
Other events, such as changing consumer preferences and increasing input costs, are significant in that they alter incentives to transition as well as increase costs of production. For example, the drought in California made it challenging for organic dairy farmers to meet the grazing requirement in 2014, as the lack of rain prevented the growth of grasses for grazing on the pastureland. Examples of optimal responses to some of these exogenous changes are to transition to organic more quickly, or to give up organic certification.
The organic dairy sector has experienced significant cycles over the past 20 years. While such boom and bust cycles are common for agriculture in general, the organic sector has largely been insulated from price volatility: this is likely due to continued growth of consumer demand for organic products. Shortages of organic milk were common prior to 2007, and empty shelves in the supermarket were not uncommon [49]. The recession of 2008, combined with the boost in organic milk supply caused by the increased number of farmers (in response to the regulatory changes effective June 2007), had a huge effect on the organic dairy market: consumer demand dropped while supply increased, and organic farm gate dairy prices fell for the first time. Processors bought less organic milk, and some failed to renew contracts with farmers. But by 2015, there were shortages of organic milk again. In 2017, there was a surplus of organic milk again, with processors cutting prices paid to farmers. Producers of organic milk, unable to find buyers, sold some product into the conventional market, at the lower conventional milk price, or just dumped the milk [50].

Materials and Methods
To study the role of peer influence on the distribution and location of certified organic dairy farmers, we analyzed simulated networks of producers. We assumed that buyers (processors such as Organic Valley) were located near existing farmers, and that the market could absorb the production of newly transitioned dairy farmers. These assumptions allowed us to focus on the formation of a farmer network.
The process of inferring the network consisted of four stages. First, we collected and processed the data that form the basis of the analysis. Second, nodes (farmers) were geolocated using the Google Maps API based on the address registered in the certification [51]. Third, we inferred networks in which farmers connected to other farmers by assuming different levels of sociability and placing restrictions on the geographical distance within which a connection between two farmers may be formed. Finally, we used network statistics to study the simulated networks on a year-to-year basis, at both the micro and macro levels, to assess clustering and connectivity [52,53].

Creation of the Dataset
The data were downloaded from a publicly available dataset of the USDA, the Organic Integrity Database, on June 8, 2017 (see Supplementary Materials for dataset and R code). This database is the most comprehensive source of information on organic production, covering all farms and businesses that are certified to the USDA organic standard. Included information was the address of the certified organic operation, the products grown or processed there, the business name, the scope of activity, the date first certified, and the date certification was revoked or surrendered. Next, the data were filtered so that only certified organic milk and dairy livestock operations located in the United States were included. All certification dates prior to 2002 were recoded to 2002, the first year of the national organic regulation in the US and the point in time when all farms were certified to the same standard. The most recent year in the data at the time of the download was 2016, but because the 2016 data were incomplete, that year was excluded from the analysis. Therefore, the analysis covered 2002 to 2015. The final dataset for that time period included 2433 producers, who held 5175 certifications. Producers may hold multiple organic certifications, and because certification is costly, producers usually select the ones required by their buyers.
The changes in the number of producers over time (shown in Figure 1) matched well with the industry shifts discussed in the previous section, reflecting exit and entry in the sector. For example, between 2004 and 2006, the number of producers and certifications increased, while these numbers decreased between 2008 and 2010, before increasing in 2011. The certifications and producers that were considered in the analysis are those represented in gray (Figure 1). These observations are for producers for whom we were able to obtain GPS coordinates using the Google Maps API and who have at least one certification with certified status. As the figure shows, filtering the data removed producers for whom we were unable to obtain GPS coordinates or whose certification status was surrendered or suspended.

Inferring Social Networks
The next step involved obtaining a pair of longitude and latitude GPS coordinates for the location of each certified dairy operation. The first step in this process was creating a vector concatenating the address fields, and then pushing the vector to Google Maps to retrieve latitude and longitude. Because the address vector contained address lines, city, state, country, and zip code, this minimized errors. The authors manually checked a random selection of the GPS coordinates using Google Maps to further scrutinize the results.
The next step was to infer social networks on the GPS coordinates. We assumed several scenarios of interconnectedness inside a geographical boundary based on a 50-mile radius. Producers were assumed to be connected to other producers located at most 50 miles away (Euclidean distance). Several models were studied, ranging from a random 5% probability of being connected to another farmer in the 50-mile radius to a 100% probability of being connected.
Furthermore, those connections were programmed to be time dependent. For the purpose of the network analysis, the producers were organized into yearly chronological periods, based on the date they were first certified. Thus, we constructed 280 different networks: one for each year between 2002 and 2015, and 20 for each year, as we modeled different probabilities of connecting with other farmers in the 50-mile radius, ranging from 5% to 100% in 5% increments. The first years consisted of the early adopters of organic dairy farming, and many of these farmers were in the organic business prior to the implementation of the federal organic program. We assumed that a producer in the first year (2002) would only connect with a producer in that same year. A new entrant producer in the following years would connect with the already established producers from previous years, or with other new entrants in the same year. The following pseudo-code summarizes the algorithm used to infer the networks:
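A minimal Python sketch of this procedure, written for illustration and not taken from the authors' R code, is given below. Several choices are assumptions of the sketch: the producer record layout, the function names `haversine_miles`, `infer_edges`, and `network_for_year`, the use of great-circle (haversine) distance in place of the Euclidean distance named in the text, and the decision to draw each eligible pair once so that a tie persists from the year the later farm enters.

```python
import math
import random
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def infer_edges(producers, p, radius=50.0, seed=0):
    """Draw each pair within `radius` miles once; return {(i, j): formation_year}.

    producers: dict id -> (first_certified_year, (lat, lon))  (hypothetical layout)
    p: probability that two producers within the radius are connected.
    """
    rng = random.Random(seed)
    edges = {}
    for i, j in combinations(sorted(producers), 2):
        (year_i, coord_i), (year_j, coord_j) = producers[i], producers[j]
        if haversine_miles(coord_i, coord_j) <= radius and rng.random() < p:
            # The tie can only exist once the later of the two farms has entered.
            edges[(i, j)] = max(year_i, year_j)
    return edges


def network_for_year(edges, year):
    """Edges of the cumulative network up to and including `year`."""
    return {pair for pair, formed in edges.items() if formed <= year}


# Toy data: three nearby farms and one distant farm (made-up coordinates).
producers = {
    "a": (2002, (43.0, -90.0)),
    "b": (2002, (43.2, -90.1)),
    "c": (2005, (43.1, -89.9)),
    "d": (2005, (36.0, -120.0)),
}
probabilities = [k / 20 for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
networks = {}
for p in probabilities:
    edges_p = infer_edges(producers, p)
    for year in range(2002, 2016):
        networks[(year, p)] = network_for_year(edges_p, year)
```

With 14 years and 20 probability levels, `networks` holds 280 edge sets, matching the count given in the text. Drawing ties once per probability level keeps the yearly networks nested, so a tie present in one year is also present in every later year.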

Network Statistics
We further studied the development of the simulated networks along with the corresponding network analysis statistics. To do so we used measures that described the network at different points in time, following Barabási et al. [54], who examined the development of scientific collaborations in mathematics and neuroscience using co-authorship networks. Their paper shows how the information exchange that results in scientific cooperation is achieved in communities of scientists that form tightly connected clusters. Although their analysis has a particular domain focus on scientometrics, its methodologies can be extended to other types of networks. Unlike Barabási et al. [54], we used a random graph model [55] instead of a scale-free network model, as our dataset provided no information that allowed us to gauge the importance of a node, such as the size of the operation or its revenue. In this sense we lost the most important component of a scale-free network: real data that would allow us to infer preferential attachment. Thus, we could not identify which nodes had more actual connections or which nodes could attract more connections.
Our analysis looked at five features of the network at the micro and meso levels. At the micro level, that is, the node or farmer level, we used three network statistics: average degree, average shortest path, and closeness. These measures revealed statistical properties of the nodes that pointed towards the role of being more connected to other nodes as well as the role that each node plays in the network. At the meso level, that is, the mid-level between nodes and network, we studied the clustering coefficient and the relative size of the largest cluster. With these measures, we studied the formation of clusters in the network and the size of the largest cluster.
It is important to note that our inferred network was not directed. Thus, an edge between two farmers lacks a directionality that could be interpreted as a farmer sending a communication, a friend request, or an advisory request. Instead, for our undirected network, we assumed two farmers were connected, which could mean that they know each other, that they are friends, or that they collaborate or exchange information.
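As an illustration of the meso-level measure named above, the relative size of the largest cluster can be read as the share of nodes that fall in the largest connected component of the undirected network. The following Python sketch assumes that reading; the function name `largest_cluster_share` and the union-find approach are choices made here, not details taken from the paper.

```python
def largest_cluster_share(nodes, edges):
    """Fraction of nodes in the largest connected component of an undirected graph."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # merge the two components

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes) if nodes else 0.0


# Five farms: one cluster of three, one cluster of two.
share = largest_cluster_share("abcde", [("a", "b"), ("b", "c"), ("d", "e")])
```

Here `share` is 0.6, since three of the five nodes belong to the largest component. Because edge direction is ignored, the sketch is consistent with the undirected networks described in the text.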

Average Degree
The average degree $\langle k \rangle$ is a measure of the average number of connections per node [42]. For an undirected network, it is computed as shown in Equation (2). That is, for each node one counts the number of connections it has and then takes the sum of those connections divided by the total number of nodes. As a network becomes more interconnected, nodes will, on average, have a greater number of connections. In our case, this shows how connections among farmers change over time as the organic dairy industry evolves.
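Written out from the verbal definition, the average degree takes the standard form below. This is offered as a reconstruction from the description rather than as the paper's typeset equation; the symbols $N$ (number of nodes), $k_i$ (degree of node $i$), and $E$ (number of undirected edges) are notation assumed here.

```latex
\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2E}{N}
```

The second equality holds because each undirected edge contributes to the degree of both of its endpoints.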

Average Shortest Path
One of the most intuitive statistics of network cohesion is the shortest path, which is a measure of separation between a random pair of nodes in the network. It can be interpreted in this network as the minimum number of farmers that a random farmer (i) who tries to reach another farmer (j) needs to go through. This measure is widely used and well known outside the network analysis community [56]. The popularized version of this concept, attributed to the writer Frigyes Karinthy, is that any two people in the world can connect to one another through only six degrees of separation; that is, to connect with anyone in the world you will need to go, on average, through only six other people, asking each to connect you with the person you have in mind. A study that used Facebook data to test whether this holds found that the actual degree may be smaller, closer to four degrees [57].
The average separation is formally defined as: "The ability of two nodes, i and j, to communicate with each other (that) depends on the length of the shortest path, $l_{i,j}$, between them. The average of $l_{i,j}$ over all pairs of nodes is denoted by $d = \langle l_{i,j} \rangle$ and we call it the average separation of the network, characterizing the networks interconnectedness" [54]. The length, in this case, is the minimum number of nodes that separate i and j. This measure reflects whether the organic dairy farmer network is becoming more tightly connected, which would result in a declining average shortest path over time. A decline would reflect that new entrants are clustering in the same regions.
Since the simulated networks place a geographical restriction on the connections between two nodes i and j, the average shortest path does not consider connections between nodes that belong to different connected components. That is, the shortest path between two nodes i and j is calculated only if they can connect either directly or by going through other connected nodes. It is important to note that the length of the shortest path, $l_{i,j}$, is not a measure of geographical distance. For example, if i connects to k and k connects to j, the length of the shortest path between i and j is 1, but in the simulated networks the geographical distance between i and j can be anywhere between 0 and 100 miles.
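The restriction to connected pairs can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `average_shortest_path` and the `count_intermediate` flag are introduced here, and path lengths are found with a plain breadth-first search. By default the function averages hop counts (number of edges); setting `count_intermediate=True` averages the number of nodes strictly between i and j, which is the convention used in the text.

```python
from collections import deque


def average_shortest_path(nodes, edges, count_intermediate=False):
    """Mean shortest-path length over pairs of nodes in the same component.

    Pairs in different connected components are skipped.
    Returns None when no two nodes are connected.
    """
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    total, pairs = 0, 0
    for source in adjacency:
        hops = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search outward from source
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in hops:
                    hops[w] = hops[u] + 1
                    queue.append(w)
        for target, d in hops.items():
            if target != source:
                total += d - 1 if count_intermediate else d
                pairs += 1  # each unordered pair is seen twice; the mean is unaffected
    return total / pairs if pairs else None


# The example from the text: i connects to k, k connects to j; z is isolated.
nodes = ["i", "k", "j", "z"]
edges = [("i", "k"), ("k", "j")]
mean_hops = average_shortest_path(nodes, edges)
mean_between = average_shortest_path(nodes, edges, count_intermediate=True)
```

In this toy network the three connected pairs have hop counts 1, 1, and 2, so `mean_hops` is 4/3, while `mean_between` is 1/3 because only the pair (i, j) has an intermediate node. The isolated node z contributes nothing to either average.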

Closeness (Centrality)
Another measure that is derived from the shortest path is closeness. This node-level measure counts the number of shortest paths that cross a given node. That is, for every node i we count the number of shortest paths $l_{i,j}$ that go through the given node. This measure is used to infer power or centrality. The notion behind this is that if one node is on the shortest path between others, it has a privileged position, as those two nodes will have to go through it to connect. The measure is computed using Equation (3), where "the nodes in a network N [are] labeled 1...n. For a node i, let $n_k(i)$ be the number of nodes at distance k from i" [58].
One challenge of this measure is that it assumes that every node can reach every other node, that is, that there are no isolated nodes, which is not the case for our network. For this reason, we used a variation of the closeness measure (Equation (4)) that is correlated with the original one and is suited to disconnected networks [58]. This variation calculates a score for each node and then inversely normalizes it to make comparisons among nodes possible. In this measure, "1 is a most central node in the network. The index can take values between zero and one, with the minimum value zero being obtained when all nodes have the same centrality value, for example if the network is a clique. A highly centralized network is one in which the most central node, 1, has high degree and is connected to relatively distinct groups" [58].

Clustering Coefficient
The clustering coefficient measures the average probability, over the entire network, that two neighbors of a given node are themselves connected. In a more intuitive way, it seeks to capture the formation of triads in a network, that is, groups of three nodes that are all connected to one another, reflecting the fact that communities form as their nodes become more connected with each other. We used a clustering measure that corrects for autocorrelation [59,60], defined in terms of tie variables: "$x_{ij}$, $x_{ik}$, $x_{kj}$, [are tie variables] defined as the dichotomous (0/1) indicators of the existence of the ties i → j, i → k, and k → j, respectively" [59].
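To make the idea of triads concrete, the following Python sketch computes the standard global clustering coefficient (transitivity) of an undirected network: the share of connected triples (a node with two neighbors) whose two outer nodes are also tied. This is an assumption made for illustration only; it is the textbook measure, not the autocorrelation-corrected measure of [59,60], and the function name and toy network are hypothetical.

```python
from itertools import combinations


def transitivity(adj):
    """Fraction of connected triples that are closed into triangles."""
    closed = 0
    triples = 0
    for node, neighbors in adj.items():
        # Every pair of neighbors of `node` forms a triple centered on it.
        for a, b in combinations(sorted(neighbors), 2):
            triples += 1
            if b in adj[a]:   # the two neighbors are tied: a closed triad
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network: a triad i, j, k plus farmer m tied only to k.
adj = {
    "i": {"j", "k"},
    "j": {"i", "k"},
    "k": {"i", "j", "m"},
    "m": {"k"},
}
coefficient = transitivity(adj)
```

In the toy network, three of the five connected triples are closed, so adding farmer m, who is tied to only one member of the triad, lowers the coefficient below one.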

Relative Size of the Largest Cluster
The relative size of the largest cluster is a highly intuitive network statistic that is obtained by dividing the number of nodes in the largest cluster (the group of connected nodes with the largest number of nodes) by the total number of nodes in the network. Intuitively, if the relative size of the largest cluster increases, the largest cluster is becoming more important in relation to the network. In a network that becomes more connected over time, one expects this number to increase as the largest cluster gains nodes, further attracting new nodes. The computation of the largest cluster, also known as the largest component, is a topic of research in computer science, as it can take a long time for extremely large graphs; new algorithms are therefore developed to make it faster [61,62]. Reference [63] provides the earliest algorithm implementation to find the largest cluster, dating back to 1972. The intuition is to start from a random node and move to another node that is connected to it, repeating this process and keeping count of the nodes visited until there are no more nodes to hop to. One repeats this process starting at every node of the graph, keeping the largest count found. Then one divides the largest number of nodes by the total number of nodes in the network.
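The procedure described above can be sketched in a few lines of Python. This is a minimal illustration using a breadth-first search, not the implementation of [63]; the function names and the toy network are hypothetical. Instead of restarting from every node, the sketch skips nodes that were already counted in an earlier cluster, which yields the same cluster sizes.

```python
from collections import deque


def component_sizes(adj):
    """Return the size of every connected cluster in the network."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue              # already counted in an earlier cluster
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sizes


def relative_largest_cluster(adj):
    """Largest cluster size divided by the total number of nodes."""
    if not adj:
        return 0.0
    return max(component_sizes(adj)) / len(adj)


# Hypothetical toy network: one cluster of three farmers, one of two.
adj = {"i": {"k"}, "k": {"i", "j"}, "j": {"k"}, "a": {"b"}, "b": {"a"}}
share = relative_largest_cluster(adj)
```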

Limitations
This study has four main limitations. First, given the limited nature of the dataset, which contains only a list of organic dairy certifications with information on the certifier and the certified operations for which an address was provided, our study does not intend to provide an exact estimate of the real connections and cooperation between farmers. We modeled those connections using geographical boundaries, based on empirical evidence that social networks are usually localized [44,57]; however, we are aware that some collaborations may occur beyond a 50-mile radius, particularly with the use of the internet. If data on real-world connections among organic dairy farmers were collected, we could test and compare our results; however, we are not aware of the existence of such datasets.
Second, our dataset does not reflect the size or revenue of the operations. Thus, we treated every node equally. This could bias our results towards areas in which there are larger numbers of smaller-scale operations. Areas in which there are large operations may potentially be less attractive for new entrants. Due to the nature of the dataset, we could not control for this.
Third, further research is needed to isolate the effect of networking on the adoption of organic dairy production by controlling for weather, land characteristics, and economic effects such as income. We recognize that there are other characteristics that may attract new farmers to an area; however, there is empirical and theoretical evidence that collaboration and knowledge sharing do in fact correlate with geographical distance [13,14,17]. This evidence is also found in studies using network analysis [64][65][66].
Fourth, the inclusion of dairy handler facilities, distribution centers, supermarkets, and consumer density may provide a better understanding of the geo-localization problem of new organic dairy operations, as these factors may contribute to this decision, particularly in the case of new operations as opposed to operations that transitioned from conventional to organic.

Network Visualizations
The maps shown in Figure 2 provide network visualizations for select years in the period of study. The first (a) reflects the producers holding an organic certification in 2005, three years after the implementation of the National Organic Standards. As can be seen, the largest networks of organic producers are in the Upper Midwest (where Organic Valley is headquartered) and the Northeast, with smaller networks appearing in other parts of the country. The visualizations shown in these figures were obtained using a 100% probability of connecting within the 50-mile radius. That is, every farmer, represented by a green dot, is connected by an edge, represented by a black line, to every other farmer within a 50-mile radius. These network representations reflect geographical proximity and capture the development of the communities of farmers, potentially providing evidence for the hypothesis that organic dairy farmers cluster together as a result of peer support. Although the visualizations appear to support our hypothesis, we examine it further by using formal network statistics to study the simulated networks.

Average Degree
The average degree, as applied to the organic dairy sector, measures the average number of connections that each farmer in the network has. As the results in Figure 4 show, the average degree decreased in the first years and increased in more recent years, especially after 2008. The decreases in the early years corresponded to changes in regulation and market conditions, as farmers entered or exited the organic dairy sector. The average degree started to pick up shortly before 2008. Not surprisingly, the effect is stronger the greater the sociability, as higher assumed sociability by definition connects more farmers to one another within the 50-mile radius. The overall increases were particularly strong after 2009, when the organic dairy sector started to recover from the 2008 crisis. These findings support the idea that new entrants cluster around established organic dairy farmers, since by the design of the study connections can only form within a 50-mile radius.

Average Shortest Path
The average shortest path is relatively stable over time, as shown in Figure 5, at approximately four degrees. Thus, even as the number of organic dairy farmers and the average degree increased, the shortest path varied little. This result is surprising, as most models of network evolution suggest that the average shortest path increases as the number of nodes grows [54]. This finding may be due to the fact that our simulated networks are disconnected because of the 50-mile geographical boundary. While in some regions (the Northeast) the network is continually growing larger and more connected as the average degree increases, in other parts the network remains relatively small.
Our results are similar to those of the Facebook study [57], which found that the average shortest path is close to four. This similarity is quite surprising, as people on Facebook can connect to others without the constraints of geographical distance, whereas we placed a 50-mile restriction on the maximum geographical distance allowed for forming connections. If a Facebook user wants to connect with any other user in the world, they need to go through, on average, a minimum of four people. In our model, a farmer in Wisconsin cannot reach another one in California because there is no path between them. But to reach someone else in the same cluster, a farmer also needs to go through only about four other farmers. This also suggests that, over time, organic farming peer networks within each of the connected regions remain important. Other possible explanations for this formation are that the proximity of organic dairy farmers to each other reduces the cost of transporting raw milk to the processor, or that dairy processors open new handling facilities close to the clusters of new farmers.

Closeness (Centrality)
For all levels of sociability, the average closeness decreases over time, as shown in Figure 6, which can be interpreted as an overall decrease in the number of shortest paths that go through each farmer. This is a characteristic of more connected networks, as the central nodes can be bypassed, revealing a more cohesive network. Note that after 2008 and 2012 there were increases in closeness for all levels of sociability. This timing coincides with negative stressors on the organic dairy sector, including rising production costs and overall decreased demand for milk. These stressors may have decreased the entry of new organic farmers or encouraged some to exit.
These results are intuitive: the negative stressors increased the level of closeness, reflecting that some nodes became more "powerful". This power in a geographically constrained area can be seen from two different perspectives. First, in a region in which there is a decrease in new entrants of organic dairy, a farmer i will be more central, particularly if other farmers trying to reach one another have to go through i (either by physically moving a truck to pick up milk, or by asking i for an introduction). Second, from a network or cluster perspective, when closeness decreases the network is becoming more cohesive. If farmer i can be bypassed by going through other nodes, the network is more cohesive, reflecting the formation of new connections, which make nodes, on average, less "powerful" or central. Negative macroeconomic stressors are likely to have an impact on the least powerful nodes and on the potential entry of new nodes, thus increasing the centrality of existing nodes. Our network simulations may be reflecting these negative macroeconomic stressors.

Clustering Coefficient
The clustering coefficient reflects the formation of triads of nodes, a structure that reflects cohesion in a network. In an organic farmers network, a dyad represents, for example, two farmers sharing information on best practices. A triad in which three farmers share information may lead to advantages, as information flows more efficiently and each node in the triad has two points of view to consider when making a decision. Intuitively, if the decision is whether to convert to organic dairy and a potential node i gets information from two successful nodes k and j, then i will be considering two points of view, which may have a larger impact on i's decision. Furthermore, triads are considered to be the fundamental building blocks of communities [67].
The simulated networks for organic dairy farmers show (Figure 7) several ups and downs for all levels of sociability. Up to 2004 the coefficient decreased. The dramatic changes in these years were possibly due to the nature of the dataset and do not reflect trends, since the dataset records all certifications that were in place on or before 2002 as occurring in 2002, the year in which the National Organic Standards became law. A more interesting behavior occurred between 2004 and 2007, when the clustering coefficient increased. This increase corresponded to the time when many conventional dairies converted prior to the rule change of 2007, as a result of the Harvey case. Between 2007 and 2011 there were no major changes, which corresponded to the period in which the organic dairy sector struggled, partly due to the recession. From 2011 to 2015 there was a tendency towards increasing clustering. This increase further reflected that a new organic dairy operation i was likely to be located in close proximity to two existing operations j and k. Given that the maximum distance between i and j and between i and k is 50 miles, operations i, j, and k form a triad when j and k are also connected.

Relative Size of the Largest Cluster
For the modeled networks (shown in Figure 8), the relative size of the largest cluster remained relatively stable over the 2002-2015 period, although it has decreased for most levels of sociability since 2009. This result was unexpected, as a more cohesive network that reflects clustering is typically characterized by a largest cluster that can become almost the entirety of the network. Thus, this is a remarkable finding for our network, given the geographical constraints.
Furthermore, our results show that in spite of new entrants, the largest cluster remains quite large, covering roughly 30-40% of the total number of farmers. This is particularly interesting, again due to the 50-mile radius imposed on the network. It means that around 30-40% of the farmers in the country can reach one another through a chain of connections in which each link spans at most 50 miles, provided they are in the largest cluster (the Northeast). This result may offer an explanation for the decreasing relative size of the largest cluster since 2009. We posit that the largest cluster, in the Northeast, may have reached a tipping point because the region is saturated. Looking back at the maps (shown in Figure 2), we speculate that at some point in the future the largest cluster west of Chicago is likely to merge with the cluster around Michigan-Indiana-Ohio.

Conclusions
This paper adds to the applied economics literature that examines how new agricultural markets develop over time, to the literature on organic markets, and to the literature on social network theory. The results provide evidence, from both a visual and a statistical point of view, that the location of new organic dairy operations reflects advantages from clustering around existing operations. These effects may partially explain the slow adoption of organic dairy in the United States despite it being more profitable than conventional dairy.
Following this idea, we believe that a successful food policy for furthering the growth of organic dairy operations would foster cooperation and information sharing between existing organic dairy farmers and prospective new organic dairy operations. There is some evidence of grassroots organizations and organic certifiers providing a forum for peer mentoring of new and potential organic farmers, but more formal policy or increased funding might facilitate the expansion of peer networks. An example of such a policy is improving the extension services for organic farmers, for instance through the appointment of agents who specialize in organic farming to assist farmers interested in transitioning to organic. Publicly funded policies enhancing peer mentoring and expanding organic extension would likely be complementary to existing private sector support of the organic sector, such as the current collaboration between General Mills and Organic Valley, which aims to increase the supply of organic milk [68].
This paper provides a new way of examining the evolution of the organic dairy farm sector and presents a novel application of social network theory. These two contributions to the literature are possible despite the limitation of having no data reflecting real connections among organic dairy farmers. We find that geo-location is a reasonable proxy for social phenomena arising from proximity to other social agents. For applied research examining the evolution of agricultural markets, obtaining a longitudinal dataset of connections among farmers is cost prohibitive, necessitating an alternative method such as the inferred networks simulated in this paper. We hope that this methodology encourages other researchers to study social networks by modeling relations between actors based on location, and thus to overcome the common problems arising from the lack of data on real-world connections between social actors.

Figure 1. Organic dairy certifications and producers 2002-2015. This figure shows the total number of organic dairy certifications (a) and the total number of producers that were awarded those certifications (b) between 2002 and 2015. Source: Author analysis of data from the US Department of Agriculture (USDA) Organic Integrity Database.

Figure 2. Simulated Organic Dairy Producer-Producer Networks 2005, 2010, 2015. This figure shows the maps of the organic dairy farmers network in 2005 (a), 2010 (b), and 2015 (c). Source: Author analysis of data from the US Department of Agriculture (USDA) Organic Integrity Database.

Figure 3. Organic dairy farmers around the Organic Valley Distribution Center, 2005, 2010, 2015. This figure shows the map of the organic dairy farmers network around Organic Valley's Distribution Center for 2005 (a), 2010 (b), and 2015 (c) (coordinates 43.731517, −90.801919). Source: Author analysis of data from the US Department of Agriculture (USDA) Organic Integrity Database.

In 2010, as the second map and network show (Figure 2b), clusters in the Midwest and Northeast were larger. New communities appeared in many parts of the country. The small cluster in Texas reflects the dairy farms of Aurora Dairy, one of the three large processors. The 2015 visualizations (Figure 2c) show that the Upper Midwest and Northeast clusters continue to grow larger, and sometimes connect to others, as new farmers locate on the periphery of the clusters. The maps shown in Figure 3 zoom in on the area surrounding Organic Valley's Distribution Center in the state of Wisconsin for 2005 (a), 2010 (b), and 2015 (c). The networks around Organic Valley's Distribution Center show how new producers arrive in the area over time. The network gets denser, from a visual perspective, as more connections appear.

Figure 4. Average degree in an organic dairy farmers network 2002-2015. Each line represents a modeled degree of sociability ranging from 5-100%, reflected by the gradient.

Figure 5. Average shortest path in an organic dairy farmers network 2002-2015. Each line represents a modeled degree of sociability ranging from 5-100%, reflected by the gradient.

Figure 6. Closeness (Centrality) in an organic dairy farmers network 2002-2015. Each line represents a modeled degree of sociability ranging from 5-100%, reflected by the gradient.

Figure 7. Clustering Coefficient in an organic dairy farmers network 2002-2015. Each line represents a modeled degree of sociability ranging from 5-100%, reflected by the gradient.

Figure 8. Relative size of the largest cluster in an organic dairy farmers network 2002-2015. Each line represents a modeled degree of sociability ranging from 5-100%, reflected by the gradient.

to infer networks
      - For each GPS coordinate create a 50 mile radius;
      - Connect to other coordinates belonging to t that are inside the radius with probability p;
    Else if t > 2002 and t <= 2015 then
      - For each GPS coordinate create a 50 mile radius;
      - Connect to other coordinates inside the radius that belong to t' <= t with probability p;
    end
  end
end
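The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the farm records, and the use of the haversine formula for distance are all hypothetical. The condition of the first branch is not visible in the listing, and the sketch assumes it covers the initial year, 2002. For simplicity, the sketch redraws every tie for the requested year instead of carrying ties over from earlier years, and it ignores operations that exit.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used by the haversine formula


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def infer_network(farms, year, p, radius=50.0, seed=0):
    """Simulate the network for one year.

    farms: dict mapping a farm id to (lat, lon, certification_year).
    Farms certified in or before `year` are nodes; each pair within
    `radius` miles is connected with probability p.
    """
    rng = random.Random(seed)
    active = sorted(f for f, (_, _, y) in farms.items() if y <= year)
    edges = set()
    for idx, u in enumerate(active):
        for v in active[idx + 1:]:
            close = haversine_miles(farms[u][:2], farms[v][:2]) <= radius
            if close and rng.random() < p:
                edges.add((u, v))
    return active, edges


# Hypothetical farms: A and B are roughly 15 miles apart, C is far away.
farms = {
    "A": (43.73, -90.80, 2002),
    "B": (43.90, -90.60, 2004),
    "C": (36.00, -119.00, 2003),
}
nodes_2002, edges_2002 = infer_network(farms, 2002, p=1.0)
nodes_2005, edges_2005 = infer_network(farms, 2005, p=1.0)
```

Running the function once per year and per probability p produces the family of simulated networks on which the network statistics are then calculated.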