1. Introduction: The Producer Share in Agrifood Value Chains
For people not directly involved in the agrifood system, the gap between the price at which farmers sell their products and the final retail price can be striking. The difference is often so large that it is hard to see how producers can keep operating, or why ordinary consumers pay so much for products that are inexpensive at origin [1]. Yet this situation is not new: it is a long-established trend that seems practically impossible to reverse. In certain social contexts, such as southern Europe, which concentrates a significant share of European production, this dynamic pushes producers to seek alternative ways of marketing their goods: local markets, direct sales, cooperative or associative channels, and other strategies aimed at reducing the often excessive margins imposed by the distribution chain. However, these initiatives remain insufficient, as much of the chain is now controlled by large market operators who dominate both sides of distribution, pressuring producers to sell at low prices while consumers continue to face high retail prices.
A substantial body of empirical literature has documented the existence of asymmetric adjustments in price transmission along supply chains, whereby downstream prices respond unevenly to upstream cost shocks. One of the most frequently observed patterns is the so-called “rockets and feathers” behavior, in which output prices increase rapidly following input price rises but adjust more slowly when input prices decline. Early empirical evidence of such asymmetries was reported in energy and petroleum markets [2,3,4], and later systematized in comprehensive surveys focusing on agricultural and food markets [5]. These studies suggest that asymmetric price responses may stem from market power, adjustment costs, inventory management strategies, or informational frictions along the value chain. More recent contributions have extended the analysis to increasingly complex and globalized food systems. Empirical evidence of persistent asymmetric price dynamics has been found in European food markets [6], as well as in sector-specific contexts such as aquaculture, where structural changes in the supply chain may amplify non-linear price adjustments [7]. Taken together, this literature indicates that asymmetric price transmission represents a structural feature of many agrifood markets, with significant implications for producer welfare, consumer prices, and policy design.
Following a current trend of combining new methodological contributions with local studies [8,9], this paper presents an integrated mathematical approach to analyze and quantify the price increases generated by the distribution chain. We compare the behavior of the marketing margin for different products using mathematical and artificial intelligence (AI) tools for data analysis. Our aim is to develop an analytical tool that supports farm management, particularly regarding the flexibility to choose among different products and to substitute one product for another for strategic reasons. By analyzing the time series of producer shares, we can identify similarities between products and determine which ones behave alike. In this way, farmers are informed about alternative products with comparable marketing characteristics, should they need to switch the specific crop they cultivate. A concrete interpretation of how cluster membership can support crop substitution and strategic farm decisions is provided at the end of this section and in Section 4.
The index on which we focus our attention, and which is a well-recognized measure of the relative marketing margin, is the so-called producer share: the ratio between the price at which farmers sell their products and the final price paid by consumers for those same products. Being a relative value, it clearly reflects how far the market price of vegetables is from the amount producers actually receive. Typically, a high producer share is interpreted as an indicator of a well-balanced market, although what “high” means depends on the specific context: transportation, logistics, and marketing naturally entail costs.
On the other hand, mathematical modeling has become a powerful tool for addressing complex management problems in agricultural planning, particularly those related to economic, market, and resource foresight [10,11,12]. In this context, the present paper seeks to contribute to this body of technical tools in order to support decision-making within these tasks. Far from being an abstract investigation, however, it uses direct data from Spanish wholesale markets to determine the validity of the model, as well as to draw some general conclusions about the state of these markets in the country. Beyond reporting concrete figures for these ratios, we use the time series of relative price ratios to define classes of similarity among certain horticultural products, showing global trends as well as subgroups of vegetables with similar behavior. Our goal is to identify each horticultural product with a vector representing the value of the time series of producer shares and to group products according to the trend they have shown over recent years. Since the first part of our analysis uses the values for a fixed month of the year, the behavior should be comparable across products, or at least it can be assumed that market distortions due, for example, to weather conditions have affected all products in the same way. In the second part of the study, however, we consider the entire historical dataset that we compiled from publicly available online sources in order to define the vectors representing each orchard product.
The calculated grouping provides us with a description of the intrinsic value of one of the main market properties of the vegetables observed, as well as their behavior over the years, which can result in a solid and easy-to-interpret similarity relationship that can help farmers change the product they grow in order to cope with unforeseen circumstances or simply to become more competitive. The results could be useful in themselves for a particular group of field managers, but they also provide scientific evidence of how products perform in national marketing, offering a reproducible methodology for researchers and managers.
This paper is primarily devoted to improving the application of mathematical tools for managing small-scale agricultural productive farms. While the mathematical techniques employed are well-established, our aim is to propose a unified methodology that integrates data management, data analysis, mathematical modeling of relevant agricultural variables, statistical and AI procedures, and comprehensive representation of results. Rather than enhancing a particular method, we present a holistic mathematical/AI framework, aligning with contemporary trends in applied mathematics that emphasize integrated tools combining data management, mathematics, statistics, AI, and easily interpretable graphics to provide complete and rapid insights into complex problems. This methodological contribution represents the core objective of our work.
1.1. Context and Related Literature
As explained above, the divergence between agricultural (origin) and retail (destination) prices, often referred to as the price differential between farms and retailers, and its complementary quantity, the producer share, are fundamental parameters for the agrifood economy and policy. An initial discussion of asymmetric price transmission was presented at the beginning of the Introduction. In developed countries, public statistics agencies routinely monitor these indicators to track how marketing, transportation, processing, storage, and retail costs influence the final prices paid by consumers. For example, the Economic Research Service of the United States Department of Agriculture provides long-range series and methodological documentation on farm-to-retail spreads and farm shares for a wide range of foods [13]. The European Union also provides coverage for vegetable markets, including indicators of production, trade, and prices that are relevant to margins between production and retail and farm shares in horticulture [14,15].
However, the scientific and technical literature on price transmission in this area is often incomplete and sometimes fails to adequately explain its complexity. Because this is a fundamental issue for agricultural development, considerable effort has been made to provide a theoretical framework and well-founded practical tools to aid in the management of agrifood systems [5,16]. In the European context, technical reports and applied studies are systematically produced to analyze how transmission occurs along specific supply chains (e.g., meat, vegetables, fruit), highlighting and updating lists of relevant indices on the performance of agriculture in countries at different stages of the chain [17,18].
International institutions such as the FAO also provide complementary information, connecting local transmission with global markets [19,20]. Case studies at the national/sectoral level (e.g., the sheep sector in Spain) show the complexity of the food supply chain, where wholesale and retail prices are often not linearly connected [21]. Recent reports on specific production sectors also show how the new international situation (including logistical factors, retail concentration, and sustainability requirements) has strong implications for margins and transmission dynamics [22]. Finally, systematic reviews conducted in the wake of recent global crises also highlight the vulnerability of supply chains for perishable goods [23].
In this context, our analysis focuses on horticultural products and analyzes time series of producer shares to compare the dynamics of different items. We use measures based on correlation (similarity of trends), Euclidean distances (proximity of level/shape), and clustering techniques to organize products into equivalence classes. In this way, we provide an interpretable map of items that show comparable price trajectories. Our goal is to provide a practical tool for understanding where the distribution tends to widen (or compress), with implications for producer welfare and the targeting of policies addressing unfair trade practices. We also relate our approach to recent data specific to fruit and vegetable supply chains: studies on channel selection and market performance for vegetables [24], costs and prices in global fruit and vegetable value chains [25], and micro-level case studies of cost transmission between nodes for horticultural products (e.g., carrots and leeks) [26]. Our cluster-based comparisons follow current methodological trends in price linkages and transaction costs within sustainable food value chains [27], and are aligned with EU statistics and monitoring frameworks on fruit and vegetable markets [14,15].
1.2. Main Technical Tools
Let us introduce the main technical parameters to be used in this paper. The first index to be considered is the farm-to-retail price spread (also called the marketing margin), which measures how much the retail (destination) price exceeds the farm (origin) price. Other directly related indices are also relevant. Let us show how they are defined.
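The marketing margin at time $t$ is given by
$$M_t = P^{D}_t - P^{O}_t,$$
where $P^{D}_t$ is the destination price and $P^{O}_t$ is the origin (farm) price.
In relative terms (percentage increase), we define
$$m_t = 100 \cdot \frac{P^{D}_t - P^{O}_t}{P^{O}_t}.$$
Analysts also make use of the log spread,
$$\ell_t = \ln P^{D}_t - \ln P^{O}_t.$$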
The spread reflects numerous operations along the value chain, such as transport, marketing, handling, storage, losses, processing, taxes, insurance, and commercial margins. Its complementary index is the producer share, defined below, which is in fact the index we analyze in this article.
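As a small illustration of the three spread measures, the following Python sketch computes them for a single pair of prices. The function name `spread_indices`, the example prices, and the choice of the origin price as the base of the percentage increase are assumptions made here for illustration; they are not taken from the dataset or from the paper's R workflow.

```python
import math

def spread_indices(p_origin: float, p_dest: float) -> dict:
    """Return the absolute margin, the percentage increase and the log spread."""
    if p_origin <= 0 or p_dest <= 0:
        raise ValueError("prices must be positive")
    margin = p_dest - p_origin                    # absolute marketing margin
    relative_pct = 100.0 * margin / p_origin      # increase relative to the origin price (assumed base)
    log_spread = math.log(p_dest) - math.log(p_origin)
    return {"margin": margin, "relative_pct": relative_pct, "log_spread": log_spread}

# Hypothetical prices in EUR/kg, chosen only to make the arithmetic easy to follow.
example = spread_indices(p_origin=0.50, p_dest=2.00)
print(example)
```

With these hypothetical prices the margin is 1.50 EUR/kg, the relative increase is 300%, and the log spread equals ln 4, roughly 1.386.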
Indeed, in the design of mathematical tools for analyzing agrifood markets, the producer share (agricultural share) is a key indicator for understanding the distribution of economic value throughout the supply chain. It is defined as the ratio between the price received by the producer on the farm, $P_f$ (which, depending on the context, corresponds to the farm price $P^{O}_t$ used above), and the price paid at the end of the marketing chain, $P_r$ (the retail price $P^{D}_t$ mentioned above). This ratio provides a direct measure of the proportion of consumer expenditure that remains with the primary producer [28].
Thus, the producer share at a time $t$ is given by
$$PS_t = \frac{P^{O}_t}{P^{D}_t}.$$
The interpretation of the producer share is straightforward. A high value is interpreted as an indicator of a balanced and equitable value chain, where farmers receive adequate compensation for their production costs and risks. However, if the share is low, it means that a significant portion of the final value is being absorbed by intermediaries, such as processors, distributors, and retailers. Although this may be due to justifiable causes, this fact is often interpreted as an imbalance of power that causes market inefficiencies [1].
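The sense in which the producer share is complementary to the spread can be made explicit with a short identity. Writing $P^{O}_t$ and $P^{D}_t$ for the origin and destination prices and $r_t$ for the relative spread expressed as a fraction of the origin price (notation assumed here for illustration), the definitions give:

```latex
\[
PS_t = \frac{P^{O}_t}{P^{D}_t} = \frac{1}{1 + r_t},
\qquad
r_t = \frac{P^{D}_t - P^{O}_t}{P^{O}_t},
\qquad
\ln P^{D}_t - \ln P^{O}_t = -\ln PS_t .
\]
```

For instance, with hypothetical prices $P^{O}_t = 0.50$ and $P^{D}_t = 2.00$, one obtains $r_t = 3$, $PS_t = 0.25$, and a log spread of $\ln 4 \approx 1.386$: a larger spread always corresponds to a smaller producer share, and vice versa.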
1.3. Experimental Data on Price Transmission in the Supply Chain
As mentioned above, we illustrate our farm management model through a contextual case study, working with the time series of the major Spanish wholesale market, using the data published by COAG (see the website listed in the References section [29], see also [30]). Empirical studies in various specific contexts of goods distribution (both in Spain [21] and internationally [26]) have shown that retail prices adjust more quickly to increases in costs at source than to decreases, given the greater flexibility of the factors affecting this part of the chain. This causes the producer share to be volatile and to tend to deteriorate over time, and is the main factor in the well-known vulnerability of producers within conventional market structures [1,13,16]. Thus, the empirical analysis of producer share is related to the analysis of price transmission along the supply chain. Standard models use mathematical methodologies in which time series play a central role in investigating how consumer price fluctuations are transmitted to the producer. Conclusions based on experimental data often indicate a strong asymmetry in profits, which, although adjusted at the retail level, are not usually adjusted for the producer. As noted in numerous studies [23], this situation, which is normal in most well-established markets, has a direct impact on the profitability of farms.
Therefore, monitoring producer share over time using time series analysis is a fundamental tool for all actors who can influence the process, such as farmers’ associations, policymakers, and scientific researchers, to identify structural problems in agrifood systems. This information can then be used to assess the effectiveness of policies aimed at creating fairer conditions for primary producers and to highlight significant asymmetries in order to promote such policies.
To finish this section, let us remark that the core objective of this study is to provide a structured framework that translates complex price dynamics into actionable market intelligence. By clustering products according to their producer share trajectories, the analysis moves beyond isolated observations and identifies stable behavioral patterns along the agrifood value chain. The first advantage for farm managers is that they can easily use this information to substitute products that are similar (i.e., belong to the same cluster) if needed, under the assumption that the producer share is a critical parameter for representing the market properties of a given product. This value summarizes, in a single number, key information about the characteristics of a product’s commercialization, which, together with the price at a given moment, constitutes the minimum information required for strategic decision-making.
On the other hand, the results can also be used for targeted monitoring. Once a product is assigned to a specific dynamic group, deviations from the group’s expected trajectory can be interpreted as early warning signals. These signals provide a technical basis for collective bargaining processes and support the design of evidence-based agricultural policies.
The integrated methodological approach proposed in this paper is summarized in the flow diagram below.
2. Methodology
Although many factors may influence the problem under study, we focus on two complementary ways of analyzing the time evolution of the producer share. The first approach examines the behavior of products in a fixed month (September) in the yearly series, comparing how similarly selected products behave in the same month across different years. The second approach considers each product as a whole, using all available time series data across all recorded months and years.
The rationale for selecting September for the initial stage of the analysis is twofold. First, it allows for a synchronized comparison across products by minimizing seasonal variance, as September represents a critical transition point in Spanish crop cycles where a wide range of products coexist. Second, this choice serves a pragmatic purpose: demonstrating that a single-month snapshot can yield clusters consistent with those derived from the full historical series. This validates a simplified monitoring tool for stakeholders and farmers, showing that conclusions drawn from a representative month can provide a reliable approximation of long-term market share dynamics without requiring the computational complexity of complete time-series processing.
Similarly, there are two levels of analysis regarding the type of products considered. In the first case, all products in the dataset for which sufficient information is available are included, such as olive oil, fruits, and vegetables. In the second case, the analysis is restricted to products typically produced in orchards, which is the primary context of this study. We primarily follow this orchard-focused approach, although data for all products are provided in Appendix A.
From a mathematical perspective, our methodology is divided into two parts. First, each product’s time series is represented as a vector in a Euclidean space, and correlations and norm distances among products are analyzed. This provides a one-to-many comparison tool: for a given product, we identify those that behave similarly in terms of correlations and distances between their coordinate values. Second, clustering techniques are applied to group products by similarity, offering information on sets of products that may be interchangeable within the same cluster while preserving their market characteristics.
Finally, two types of similarity information are considered. Correlations describe how coordinated the variations in producer share are between two products, independently of their absolute values. Distances, on the other hand, indicate how far the mean producer share of one product is from that of another. We explain this in clear terms in the next subsection.
2.1. Theoretical Background
Research on time-series clustering consistently shows that results depend strongly on three design choices: how the series are represented, which similarity notion is adopted, and how groups are constructed from that similarity information [31,32]. This is particularly relevant for economic time series, where analysts often need to preserve two distinct aspects of the signal: (i) whether two products exhibit coordinated temporal variations (co-movement), and (ii) whether their producer-share levels are comparable in economically meaningful terms.
Our approach adopts a deliberately simple, two-view perspective. One view captures synchronized dynamics across time (a pattern-oriented notion of similarity), and the other view captures differences in magnitude across the observation horizon (a level-sensitive notion of dissimilarity). These complementary views are then combined through clustering to obtain a small number of product families with homogeneous producer-share behavior. This aligns with the general recommendation, emphasized in the time-series clustering literature, that complementary criteria may be preferable to a single universal distance when different invariances are relevant [32].
It is worth noting that alternative similarity paradigms exist in time-series analysis. Dynamic Time Warping (DTW) is designed to handle temporal misalignment by allowing non-linear re-timing [33]. Likewise, correlation-based constructions have been widely used to build interpretable taxonomies in other domains [34], and correlation-normalized shape-based clustering has also been proposed for scalable time-series grouping [35]. Recent methodological advances have enriched the mathematical toolkit for time series comparison. Representation learning approaches now employ self-supervised contrastive frameworks to capture temporal dependencies without explicit supervision [36]. Shapelet-based methods identify discriminative subsequences that characterize different cluster structures [37]. Furthermore, kernel methods have been extended to time series through the Global Alignment Kernel, which combines dynamic programming with kernel theory to enable flexible similarity assessments [38]. Probabilistic model-based clustering using hidden Markov models and Gaussian mixture models provides statistically principled frameworks for temporal grouping [39]. We refer to these lines of work only to highlight a methodological point: the definition of similarity must match the application. In our context, producer-share levels are economically informative, so level-preserving comparisons remain central, while co-movement information is treated as a complementary diagnostic rather than a replacement.
While the literature documents increasingly sophisticated techniques for time series clustering [31,40], there is no universally optimal approach. Instead, clustering performance depends critically on the three interrelated design choices already mentioned: the representation of temporal data, the definition of similarity between series, and the algorithm used to form groups [31,41]. Modern developments demonstrate the breadth of these choices. Dictionary learning methods decompose time series into interpretable building blocks that facilitate both compression and clustering [42]. Tensor-based approaches leverage multi-way data structures to capture complex temporal patterns across multiple dimensions simultaneously [43]. Graph neural networks have been adapted to model temporal dependencies through learnable adjacency matrices that encode relationships between time points [44]. Additionally, ensemble methods that combine multiple distance measures or clustering algorithms have proven effective in handling the inherent diversity of temporal patterns [45]. In this sense, simple approaches that separate the comparison of temporal patterns and the comparison of levels can be as informative as more complex methods, especially when each criterion captures an economically relevant aspect of the phenomenon under analysis. This idea coincides with evidence that combining complementary criteria is often more appropriate than resorting to a single sophisticated measure when searching for different types of invariance [31].
The next subsections introduce the mathematical objects and notation used in the paper, formalize the two complementary similarity notions, and describe the clustering procedure used to derive product families.
2.2. Objects of Analysis and Notation
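Let $\mathcal{P} = \{1, \dots, n\}$ be a finite set of horticultural products (e.g., lettuce, carrot), and let $\mathcal{T} = \{1, \dots, T\}$ be a set of consecutive calendar years. We fix a month $m$ (e.g., September) and extract for each product $i \in \mathcal{P}$ and year $t \in \mathcal{T}$ two observed price levels,
$$P^{O}_{i,t} \ (\text{origin}) \quad \text{and} \quad P^{D}_{i,t} \ (\text{destination}).$$
As explained above, with this notation we define the producer share (PS) as the ratio of origin to destination prices in the selected month,
$$PS_{i,t} = \frac{P^{O}_{i,t}}{P^{D}_{i,t}}.$$
For each product $i$, we form the time-indexed vector
$$x_i = (PS_{i,1}, \dots, PS_{i,T}) \in \mathbb{R}^{T},$$
which summarizes the interannual dynamics of the producer share at the fixed month $m$. In the second part of our analysis, all producer shares for all the months of the entire time series are considered to represent the products, and so the representation is provided by vectors in $\mathbb{R}^{N}$, where $N$ is the total number of monthly observations.
When necessary, we have applied a stabilizing transformation for our internal calculations and analyses, specifically the logarithmic transformation
$$y_{i,t} = \ln PS_{i,t},$$
and standardized across $t$ (z-scores) to isolate pure temporal patterns from level effects,
$$z_{i,t} = \frac{y_{i,t} - \bar{y}_i}{s_i},$$
where $\bar{y}_i$ and $s_i$ denote the mean and standard deviation of $y_{i,t}$ over $t$.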
Log transformations and z-scoring were omitted in the final analysis because all vectors consist of ratios bounded between 0 and 1, and preserving their original scale allows the Euclidean distance to directly reflect meaningful differences in producer shares, which is essential for interpretability and practical decision support.
Thus, although alternative approaches have also been considered, for correlation-based analyses we have used
For distance-based similarity, we have computed the Euclidean distance directly between the representing vectors. Euclidean distance provides a meaningful measure of dissimilarity for ratio-based vectors, allowing the analysis to account for both the overall level and relative distribution of producer shares. Let us explain this below.
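The construction of the representing vectors can be sketched in a few lines. The record layout, the product names, the prices, and the function name `producer_share_vectors` below are hypothetical and serve only to illustrate the fixed-month representation; the paper's own processing was done in R on the COAG data.

```python
from collections import defaultdict

# Hypothetical records: (product, year, month, origin price, destination price), in EUR/kg.
records = [
    ("lettuce", 2021, 9, 0.30, 1.00),
    ("lettuce", 2022, 9, 0.36, 1.20),
    ("lettuce", 2022, 10, 0.33, 1.10),   # other months are ignored in the fixed-month view
    ("carrot", 2021, 9, 0.20, 1.00),
    ("carrot", 2022, 9, 0.25, 1.00),
]

def producer_share_vectors(rows, month, years):
    """Map each product to its vector of producer shares for one fixed month.

    Years without an observation for a product are kept as None.
    """
    shares = defaultdict(dict)
    for product, year, m, p_origin, p_dest in rows:
        if m == month and p_dest > 0:
            shares[product][year] = p_origin / p_dest
    return {prod: [by_year.get(y) for y in years] for prod, by_year in shares.items()}

vectors = producer_share_vectors(records, month=9, years=[2021, 2022])
print(vectors)
```

Using every month instead of a single one would simply lengthen each vector, which corresponds to the second representation described above.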
2.3. Two Complementary Notions of Similarity
The correlation and distance measures presented in this section serve a fundamentally different purpose than in classical statistical inference. We are not testing hypotheses about relationships between variables, nor do we require that correlations be “statistically significant” in the conventional sense. Rather, we use these measures as metric construction tools to define a geometric space in which products can be compared and grouped. In this framework, modest or near-zero correlations between certain products are not a weakness—they indicate genuine differences in temporal behavior that enable meaningful clustering. Our goal is to provide a complete metric structure that practitioners can use to identify substitutable products, not to establish statistically significant predictive relationships.
Thus, based on the mathematical elements described above, we have considered the following two types of similarity relationships.
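- (i) Trend similarity (Pearson correlation). For products $i, j \in \mathcal{P}$, the Pearson correlation of their standardized vectors is
$$\rho_{ij} = \frac{\langle z_i, z_j \rangle}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}.$$
As a dissimilarity measure derived from correlation, we have used the correlation distance to confirm certain arguments and conclusions,
$$d^{\mathrm{corr}}_{ij} = \sqrt{2\,(1 - \rho_{ij})},$$
which is a proper metric whenever $\rho_{ij}$ is a cosine similarity in $\mathbb{R}^{T}$ (here, after z-scoring).
- (ii) Level-and-shape proximity (Euclidean distance). To capture differences in magnitude and shape over time, we compute the Euclidean distance
$$d^{E}_{ij} = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{t=1}^{T} (x_{i,t} - x_{j,t})^2}, \qquad x_{i,t} = PS_{i,t}.$$
If the focus is on shape only, we replace $x_i$ with $z_i$ in the formula above. However, for all our final analyses, we have opted to compute the Euclidean distance directly.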
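The difference between the two notions can be seen on a toy example. The sketch below is plain Python with hypothetical producer-share vectors; the correlation distance is taken in the form sqrt(2(1 - rho)), which is an assumption about the exact expression used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined for constant vectors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Assumed form sqrt(2 * (1 - rho)); clipped at 0 to absorb rounding error."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))

def euclidean(x, y):
    """Plain Euclidean distance, sensitive to both level and shape."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical producer-share vectors over four years.
a = [0.20, 0.25, 0.30, 0.35]
b = [0.50, 0.55, 0.60, 0.65]   # same trend as a, higher level
c = [0.35, 0.30, 0.25, 0.20]   # same levels as a, opposite trend

print(pearson(a, b), euclidean(a, b))
print(pearson(a, c), euclidean(a, c))
```

Here `a` and `b` are perfectly correlated yet 0.6 apart in Euclidean terms, whereas `a` and `c` are perfectly anti-correlated but only about 0.224 apart. This is why the two measures are treated as complementary rather than interchangeable.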
The choice of Pearson correlation and Euclidean distance as interpretation tools over more sophisticated methods such as Dynamic Time Warping (DTW) or deep learning approaches is deliberate and grounded in the specific characteristics of our agricultural price data. While DTW has proven effective for handling temporal misalignments in time series [31], and recent deep learning methods have shown remarkable success in capturing complex patterns in financial time series [40,41], these approaches are most beneficial when dealing with irregular sampling, phase shifts, or highly non-linear dynamics. In contrast, our producer-share data exhibit relatively stable seasonal patterns with synchronized monthly observations across all products. The agricultural pricing mechanisms in wholesale markets operate under common external influences (weather, transport costs, regulatory frameworks), which tend to synchronize rather than desynchronize temporal responses across products. Under these conditions, correlation-based measures efficiently capture the essential co-movement structure without the computational overhead and potential overfitting risks associated with more complex methods.
Moreover, the interpretability of our chosen methods aligns with the practical decision-making context of farm management. Pearson correlation provides an intuitive measure of trend alignment that farmers and agricultural advisors can readily understand and act upon, while Euclidean distance offers a transparent measure of absolute differences in producer-share levels. Deep learning methods, while powerful for prediction tasks with large datasets [40], require substantial training data and computational resources, and often sacrifice interpretability, a critical requirement when the goal is to provide actionable strategic guidance to small-scale producers. Our approach thus prioritizes methodological transparency and operational simplicity over algorithmic sophistication, ensuring that the clustering results can be directly integrated into farm-level planning decisions without requiring specialized expertise or infrastructure.
2.4. Clustering and Equivalence Classes of Products
We induce equivalence classes of products via clustering on a chosen distance matrix $D = (d_{ij})$. In our case, we used the Euclidean distance matrix defined by the Euclidean distances between the vectors $x_i$. The clusters are computed with standard R functions: kmeans for the clustering itself and prcomp for the principal component analysis (PCA). They are visualized in two-dimensional plots derived from the PCA results, showing the first and second principal components of the vectors.
The result is a partition
$$\mathcal{C} = \{C_1, \dots, C_K\}$$
of products into equivalence classes
$C_k$
with similar dynamics. Comparisons between the Euclidean-based partitions obtained for a fixed month (September) and those based on the entire time series reveal stable families of products that exhibit similar behavior in both representations.
2.5. Interpreting Clusters in Terms of Producer Favorability
Recall that the producer share summarizes the relative incidence of origin versus destination prices. For any cluster $C_k$ and year $t$, we define within-cluster statistics
$$\mu_k(t) = \frac{1}{|C_k|} \sum_{i \in C_k} PS_{i,t}, \qquad \sigma_k(t) = \sqrt{\frac{1}{|C_k|} \sum_{i \in C_k} \bigl(PS_{i,t} - \mu_k(t)\bigr)^2},$$
and cluster-level summaries across time
$$\bar{\mu}_k = \frac{1}{T} \sum_{t=1}^{T} \mu_k(t), \qquad \bar{\sigma}_k = \frac{1}{T} \sum_{t=1}^{T} \sigma_k(t).$$
Thus, clusters with high $\bar{\mu}_k$ and low $\bar{\sigma}_k$ indicate product families that tend to be more favorable to producers in a stable manner, independently of absolute production costs. This complements trend alignment captured by correlation and the direct measurement of Euclidean distance. Since these mathematical elements have already been used to verify the validity of the clustering that was finally adopted (five groups), the associated detailed calculations are not shown in the Discussion section, where arguments are instead based on simpler numerical values.
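A cluster profile of this kind can be computed as in the following sketch. It assumes that the within-cluster statistics are the yearly mean and (population) standard deviation of the producer shares of the cluster members, and that the summaries across time are their averages over years; the product names, values, and cluster assignment are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical producer shares: product -> values over three years.
shares = {
    "lettuce": [0.30, 0.32, 0.34],
    "chard":   [0.28, 0.30, 0.36],
    "carrot":  [0.20, 0.18, 0.22],
}
# Hypothetical cluster assignment.
clusters = {"A": ["lettuce", "chard"], "B": ["carrot"]}

def cluster_profile(members, data):
    """Yearly mean and dispersion within a cluster, plus their averages over time."""
    n_years = len(data[members[0]])
    yearly_mean = [mean([data[p][t] for p in members]) for t in range(n_years)]
    yearly_sd = [pstdev([data[p][t] for p in members]) for t in range(n_years)]
    return {
        "yearly_mean": yearly_mean,
        "yearly_sd": yearly_sd,
        "level": mean(yearly_mean),        # average producer share of the cluster
        "dispersion": mean(yearly_sd),     # average within-cluster spread
    }

profiles = {name: cluster_profile(members, shares) for name, members in clusters.items()}
for name, prof in profiles.items():
    print(name, round(prof["level"], 4), round(prof["dispersion"], 4))
```

In this toy case cluster A has the higher average level (about 0.317 against 0.20 for cluster B), so under the interpretation above it would be read as the family more favorable to producers.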
2.6. Clustering of Producer-Share Vectors
K-means clustering is applied directly to the producer-share vectors ($x_i$) to divide the selected orchard products into equivalence classes. Our goal is to identify sets of products that exhibit similar producer-share behavior along the September time series. Principal Component Analysis (PCA) is used separately to report the dimensionality of the problem and to provide a two-dimensional visualization of the vectors. The optimal number of clusters for the k-means analysis is determined using the elbow method applied directly to the k-means results.
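The elbow criterion can be illustrated with a minimal, self-contained k-means. The sketch below is not the R `kmeans` routine used in the paper: it is a plain-Python Lloyd iteration with a deterministic farthest-point initialisation (an assumption made here so that the example is reproducible), applied to hypothetical producer-share vectors.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])   # keep the centre of an empty cluster
        if new_centers == centers:               # assignments are stable
            break
        centers = new_centers
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss

# Hypothetical producer-share vectors forming three visible groups.
vectors = [
    [0.20, 0.22, 0.21], [0.21, 0.20, 0.22],
    [0.40, 0.41, 0.39], [0.39, 0.40, 0.42],
    [0.60, 0.62, 0.61], [0.61, 0.60, 0.59],
]
wcss_by_k = {k: kmeans(vectors, k)[1] for k in range(1, 6)}
labels3, _ = kmeans(vectors, 3)
for k, w in wcss_by_k.items():
    print(k, round(w, 4))
```

The within-cluster sum of squares drops sharply up to k = 3 and changes little afterwards, which is the "elbow" that suggests three groups for this toy data.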
2.7. Robustness, Missing Data, and Sensitivity
When some $PS_{i,t}$ are missing, pairwise statistics use only the observations available for both products. For distances, we can compute for example
$$d_{ij} = \sqrt{\frac{T}{|\mathcal{T}_{ij}|} \sum_{t \in \mathcal{T}_{ij}} (x_{i,t} - x_{j,t})^2},$$
where $\mathcal{T}_{ij}$ is the set of years available for both $i$ and $j$; the multiplicative factor re-scales to the full horizon. In some cases, we have opted to preserve missing values, leaving the corresponding correlation or distance entries empty in the final matrix (see the heatmaps in Appendix A and Appendix B), as explained in the next subsection.
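A distance of this kind, restricted to jointly observed years and rescaled to the full horizon, might be implemented as follows. The rescaling factor (full length divided by the number of jointly observed entries, applied inside the square root) and the function name `rescaled_distance` are assumptions for illustration; missing values are represented by `None`.

```python
import math

def rescaled_distance(x, y):
    """Euclidean distance over jointly observed entries, rescaled to the full length.

    Returns None when no entry is observed in both vectors, so that the
    corresponding matrix cell can be left empty.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return None
    scale = len(x) / len(pairs)   # compensates for the dropped years
    return math.sqrt(scale * sum((a - b) ** 2 for a, b in pairs))

# Hypothetical vectors with one missing year each.
x = [0.30, None, 0.34, 0.36]
y = [0.20, 0.25, None, 0.26]
d = rescaled_distance(x, y)
print(d)
```

In this example only two of the four years are shared, each with a difference of 0.1, so the rescaled distance is 0.2, the value a complete series with the same average squared difference would give.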
To separate pattern from level, correlation uses the standardized series. Euclidean distances can be applied to the raw producer shares (level-sensitive) or to the standardized series (shape-sensitive). Clustering stability can be assessed, for example, by bootstrapping years, recomputing the partition, and computing the adjusted Rand index (ARI) between runs, which quantifies robustness. However, given the strong agreement between the fixed-month clustering and the whole-time-series clustering, we have decided to accept the resulting partition, as explained in Section 4.
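Although this stability check was not carried out here, its main ingredient is easy to sketch. The following Python function computes the adjusted Rand index between two partitions of the same products, using the standard pair-counting formula. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical labelings of four products.
same_up_to_names = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_names, crossed)
```

A value of 1 means that two runs produce the same partition up to the naming of the groups, while values near zero or below indicate agreement no better than chance.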
2.8. Data Preparation
The original dataset provided on the website [
29] was formatted for processing in R as a CSV file containing all the vectors for each product in the dataset, labelled by year and month. The product names were normalised, since they appeared in the dataset under different labels.
Data preprocessing involved a selective imputation strategy to ensure the continuity of the time series. Products with only a few recorded values were removed to maintain the robustness of the dataset. For products with only some missing data (at most three missing entries in the complete file), the label NA was preserved and the calculations were carried out under this restriction, so some results may still appear with this label. For the remaining series, where a numerical value was required, broader gaps were filled with the annual mean of the product and isolated missing points were filled by nearest-neighbor imputation (using the value to the right, if available). Although these techniques are standard for maintaining the structural integrity of agrifood series, we acknowledge that they may slightly smooth out extreme volatility. This approach represents a trade-off between data completeness and the preservation of original market signals, and its potential impact on clustering should be considered a limitation of the study.
When the number of missing values was small and the corresponding value was required to proceed with the calculations, the mean of the remaining elements in the row was used. The original data included producer prices, destination prices, and other marketing information. We used the first two values to compute the producer share for each month in the time series, and these monthly producer share values constitute the coordinates of our vectors.
The final outcome was a homogeneous dataset in terms of format, ready to be used for the analyses described above.
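The two basic operations of this preparation step can be sketched in Python as follows. The sketch assumes that the producer share is the ratio of the origin price to the destination price, that missing entries are encoded as `None`, and that a gap is filled with the next available value to the right, falling back to the mean of the observed entries of the row. It is one simplified reading of the rules described above, and all names and prices are hypothetical.

```python
def producer_share(origin_price, destination_price):
    """Ratio of origin price to destination price; None if not computable."""
    if origin_price is None or destination_price is None or destination_price == 0:
        return None
    return origin_price / destination_price

def impute_row(row):
    """Fill None entries with the next available value to the right;
    fall back to the mean of the observed entries when there is none."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)  # nothing to impute from
    row_mean = sum(observed) / len(observed)
    filled = list(row)
    next_value = None
    for i in range(len(filled) - 1, -1, -1):  # scan from right to left
        if filled[i] is None:
            filled[i] = next_value if next_value is not None else row_mean
        else:
            next_value = filled[i]
    return filled

# Hypothetical prices for one product and four periods.
origin = [0.30, None, 0.45, None]
destination = [1.50, 1.60, 1.80, 2.00]
shares = [producer_share(o, d) for o, d in zip(origin, destination)]
filled = impute_row(shares)
print(shares)
print(filled)
```

In this toy row, the second entry takes the value to its right, while the last entry has no right neighbor and takes the row mean.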
3. Results
We present the results in two separate sections. The first corresponds to the fixed-month analysis, while the second considers the whole-year description of the products.
3.1. Fixed Month Time Series
We fix the month of September to illustrate the implementation of the method and the resulting analysis. For methodological reasons, and given the objective of this work, using the whole set of products appearing in the dataset to define equivalence classes does not make sense, since products that belong to the same class must be, in some sense, interchangeable for our tool to be useful. For example, a Carrot field cannot be interchanged with a field of Lemon trees. On the other hand, for some products the number of missing values is so high that the resulting information is not solid enough to draw reliable inferences. As explained in
Section 2, this is why we have decided to work only with orchard-cultivable products, specifically the selection shown in the tables below (
Table 1 and
Table 2). However, the numerical data for a similar analysis of the whole set (with a large number of missing values) can also be found at the end of the paper, in Appendix A and Appendix B.
After preparing the dataset, we first focus on the calculation of the producer share. Missing values were imputed with the next available value in the row or, when only a few data points were missing, with the mean value.
Table 1 and
Table 2 show the values of the selected products. The reader can already observe some similarities among the rows of the matrices, which represent the products listed in the first column. For example, Watermelon and Melon exhibit similar behavior, while the time series of Tomato and Carrot are clearly different.
3.1.1. Pearson Correlation
In the next step, we first compute the (Pearson) correlation matrix and, in the next subsection, the distance matrix for the vectors representing each orchard product in the initial selection. The results are displayed later as heatmaps for the reader’s convenience, where examining the column corresponding to a given product reveals its similarity to the others. The full information is provided in the correlation and distance matrices. The correlation matrix shows the trends of each product regarding increases or decreases in the producer share, indicating the extent to which they coincide with those of the others. The Euclidean distance matrix represents the proximity between products by comparing the absolute values of their producer shares.
Table 3 and
Table 4 provide the values of the Pearson correlation between the orchard products. If the farmer wants to substitute any of these products, a look at the corresponding row in the matrix (or at the heatmap in
Figure 1) gives an idea of the alternative options that can be initially considered or, if the goal is to obtain a better producer share, of the products that are not going to increase the value of this index.
From the Pearson correlation matrix, we observe that there are no exceptionally high correlations (positive or negative) between the products, but certain trends are noticeable. For instance, the correlation between Potato and Onion is 0.5250, which indicates a moderate positive relationship. This suggests that these two products share similar trends in their producer shares. This alignment may imply that, under certain conditions, these two products could be considered complementary in agricultural decisions.
Another interesting relationship is the moderate positive correlation between Watermelon and Cabbage (0.4911), as well as between Watermelon and Tomato (0.4306). These correlations suggest that these products may exhibit similar patterns of production or marketing during the September selection period. Consequently, if a farmer is considering substituting one product for another, these crops might represent viable alternatives, especially if their producer shares align well in the market.
In contrast, some products show negative correlations, indicating weakly antagonistic relationships. For example, the correlation between Melon and Potato is −0.0944, and the correlation between Melon and Onion is −0.1177. These negative values, although small in magnitude, suggest that these products tend to move in opposite directions in terms of producer share, and substituting one for the other may not be effective in terms of improving overall producer share. However, this can provide a valuable tool for crop rotation.
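To make the difference between the two similarity notions concrete, the following Python sketch computes the Pearson correlation and the Euclidean distance for a pair of hypothetical series. The function names and the values are invented for the example, and the series are assumed to be complete and non-constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two complete, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def euclidean(x, y):
    """Euclidean distance between two complete series."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical producer shares over four years.
x = [0.20, 0.25, 0.22, 0.30]
shifted = [v + 0.30 for v in x]    # same trend, higher level
reversed_x = list(reversed(x))     # similar level, different trend

r_shifted = pearson(x, shifted)
dist_shifted = euclidean(x, shifted)
r_reversed = pearson(x, reversed_x)
print(r_shifted, dist_shifted, r_reversed)
```

The shifted series is perfectly correlated with the original although it lies far from it in distance, which is why both matrices are examined before two products are treated as similar.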
3.1.2. Euclidean Distance
A different type of similarity is explored in the next step. The Euclidean distance matrix shown in Table 5 and Table 6, together with the visualization provided in the heatmap of Figure 2, gives a clear picture of the distances between the orchard products. Although this information does not necessarily coincide with that of the Pearson correlation, both reinforce each other when a similarity relation is detected.
A smaller distance value indicates greater similarity. For instance, the distance between Watermelon and Cabbage is 0.3300, which is relatively small, indicating that these products are close in terms of producer share. Similarly, the distance between Tomato and Watermelon is 0.3667, reinforcing the idea of a moderate positive relation between these two products. On the other hand, products like Eggplant and Potato exhibit a large distance of 0.9188, suggesting that their producer share levels are markedly different. Such products are likely antagonistic in nature, making substitution a less suitable option.
The distance matrix also shows the relevant difference between Cucumber and Carrot, with a distance value of 0.6893, indicating a notable dissimilarity in their producer share patterns. This suggests that, despite some possible commonalities, these two products may not serve as effective substitutes in the same market context.
Thus, the correlation and distance matrices reveal clear relationships among the selected horticultural products. Watermelon, Cabbage, and Tomato show similar producer share patterns and may be considered potential substitutes, while Potato and Onion present a moderate positive association. In contrast, pairs such as Melon–Potato and Eggplant–Potato display antagonistic behavior, making them less suitable for substitution. A full interpretation is given in Section 4.
3.1.3. Clustering and PCA
We now address the second part of the analysis, which applies k-means clustering directly to the producer-share vectors to divide the selected orchard products into equivalence classes, using the producer share as the grouping criterion. Our goal is to identify sets of products that display similar producer-share behavior along the September time series. The optimal number of clusters is determined with the elbow method applied to the k-means results; Figure 3 presents the results of this procedure. Principal Component Analysis (PCA) is used separately, only to inform about the dimensionality of the problem and to provide a two-dimensional representation of the data. Table 7 summarizes the parameters of the PCA process, including the cumulative proportion of explained variance (PCs 1–8 account for 96% of the total variance). The optimal clustering with five groups, determined from the k-means procedure, is shown in Figure 4, and the elements of each group are listed in Table 8.
The clustering results are consistent with the patterns provided by the correlation and Euclidean distance matrices. Indeed, products grouped within the same cluster tend to show higher correlations and shorter distances, indicating similar producer share dynamics. For instance, Watermelon, Cabbage, Tomato, and Lettuce, grouped in Cluster 2 and Cluster 4, were previously shown to have moderate positive correlations and relatively small distances, suggesting comparable market behavior. Similarly, Potato and Onion, grouped together in Cluster 3, reflect the moderate positive association observed in the correlation analysis. In contrast, products such as Carrot, isolated in Cluster 5, or Eggplant in Cluster 4, exhibit larger distances and weaker or negative correlations with other products, reinforcing their distinct and more antagonistic behavior. Overall, the clustering structure provides a clear synthesis of the similarity and dissimilarity relationships previously highlighted by the correlation and distance analyses.
3.2. Complete Time Series Data: All Years and Months
Now, we turn to the task of identifying groups of products based on the similarity of their producer share patterns across the whole-year time series. Recall that each product is represented as a vector, where each coordinate corresponds to the producer share value for a given month over all the years in the time series. To avoid repetition, and given that the correlation and distance analysis for the fixed-month series aligns well with the clustering results presented in the previous subsection, we proceed directly to the clustering procedure. The elbow method again suggests that the optimal number of clusters is five (see Figure 5). As in the previous case, this approach is applied to the dataset focusing specifically on orchard-related products.
A visual inspection of the representation suggests that a clustering into five groups strikes a balance between maximizing the mathematical gain (variance reduction) and minimizing the complexity of the results, facilitating their interpretation in the next steps. As can be seen in Figure 6, the results are similar to those obtained for the September time series; we use a different layout to distinguish the two cases. For example, there is again a group formed by Onion and Potato, and Carrot again appears as an isolated product. We discuss the results in Section 4.
4. Discussion
In this section, we describe the main characteristics of each cluster obtained in our analysis and discuss their implications in terms of practical decision support for farmers, in line with the objective stated in the Introduction. In practical terms, the information provided by clusters allows farmers to identify alternative products whose producer share dynamics have historically been similar to those of their current crop. Given a reference product, the farmer can locate its cluster and examine the products that comprise it as viable candidates for substitution, under the assumption that these products will share comparable patterns of price transmission, margin stability, and exposure to the market power of intermediaries. In this way, the substitution decision is not based solely on agronomic yields or spot prices, but on a structural characterization of the producer’s position within the value chain, which reduces the risk associated with crop changes motivated by adverse conditions or strategic changes.
In addition, cluster-level summaries—in particular, the average and temporal variability of the producer share—provide an operational criterion for assessing the relative favorability of each group. Clusters with high average values and low variance identify product families that systematically offer a more stable and favorable share to the producer, regardless of short-term fluctuations. In this sense, the proposed procedure acts as a decision support tool that allows prioritizing substitutions towards products with historically more resilient and balanced profiles, complementing traditional price and cost information with a dynamic dimension of value distribution that is directly relevant to production planning.
The models for a fixed month (September) and for the complete series of records of the year show similar results, which supports the stability of the strategic information that can be obtained and highlights the coordinated behavior of some product groups. The ratios of origin price to destination price for 2024 are reported in
Table 9, while the mean ratios per product and month (2009–2024) are shown in
Table 10. These tables provide the basis for understanding the variations and patterns across products and time, and are key to the interpretation of the results presented in the previous section. The observation of the values in the tables allows us to understand the characteristics of each product with respect to its behavior in the two time series studied in
Section 3. Furthermore, a careful examination of these numerical data reveals the underlying economic dynamics that justify the clustering patterns obtained, offering practical insights for agricultural planning and market strategy.
Regarding the correlation and absolute distance among the producer shares for orchard products, clustering provides clear evidence of coordinated behavior in certain groups. Using both the September dataset and the whole-year time series, optimal clustering separates the products into multiple groups. Although the membership of each group does not fully coincide across both analyses, we focus on the groups that largely overlap for clarity (see
Table 10). The broad agreement between the results of the two clustering processes strengthens the conclusions and shows how our methodology can help farm decision-makers in strategic design.
Table 9 and
Table 10 are intended to be directly used for this purpose by the farm managers, together with the elements and descriptions of the computed clusters.
The general interpretation of the clustering results is as follows.
- (i) Potato and Onion consistently appear together (Group 3 in September and Group 3 in the whole-year analysis).
- (ii) Carrot remains separate in both the September and whole-year datasets.
- (iii) The central group is divided into two subgroups, depending on the time series used. For the September series, we get (1) Cabbage, Lettuce, Watermelon, Chard, Melon, Green pepper; (2) Broccoli, Cucumber, Red pepper. For the whole-year series, we get (1) Cabbage, Lettuce, Broccoli, Watermelon, Chard, Melon; (2) Zucchini, Cucumber, Eggplant. The second subgroup is more distant and does not coincide with the September grouping, so the grouping is less clear across the whole-year time series; this must be taken into account if the strategic decision involves these products.
- (iv) September Group 4 is less well-defined (Tomato, Cauliflower, Zucchini, Eggplant) and contains elements from whole-year Groups 2 and 4.
- (v) In the whole-year analysis, Red pepper and Green pepper form Group 4 with Tomato and Cauliflower.
More concretely, and summarizing all the information, the clusters derived from both the September and whole-year time series are described below in terms of the particular characteristics of the products involved.
- 1. September Groups 1–4, Yearly Group 2: Zucchini, Eggplant, and Cucumber. These products are mainly produced in summer and early October, with low ratios before production (around 0.18 in April–June), slightly higher during production (0.25 in July–October), and relatively high outside these months (0.4 in November–February), indicating that distribution costs capture most of the monetary gain.
- 2. September Group 2, Yearly Group 5: Cabbage, Melon, Watermelon, Chard, Broccoli. This group includes both winter (Cabbage, Chard, Broccoli) and summer (Melon, Watermelon) products, yet the producer share remains stable (around 0.22) throughout the year with minor variations.
- 3. September Group 3, Yearly Group 3: Onion and Potato, exhibiting low and stable producer shares (around 0.2), similar to the previous group.
- 4. September Group 5, Yearly Group 1: Carrot, showing a singular behavior with a high and stable average producer share (around 0.3) throughout the year.
- 5. September Groups 1–4, Yearly Group 4: Tomato, Cauliflower, Red pepper, and Green pepper. This group is scattered, showing high producer shares throughout the year without clear trends, effectively grouping products that cannot be classified elsewhere.
The clustering patterns observed reflect underlying structural characteristics of the Spanish agrifood distribution system rather than mere statistical artifacts. Products grouped together tend to share similar commercialization channels, storage requirements, perishability profiles, and market concentration levels, all of which directly influence the marketing margins captured by intermediaries. For instance, the stable low producer shares observed in the cluster described in item 3 (Onion and Potato) likely reflect the presence of well-established wholesale networks with significant economies of scale in storage and distribution. These products benefit from extended shelf life and can be stored in bulk for months, allowing intermediaries to accumulate market power through strategic inventory management and temporal arbitrage. While this reduces unit logistics costs, it also consolidates intermediary control over price formation, compressing farm-gate prices even as retail prices remain relatively stable. The persistence of low producer shares in this cluster suggests structural barriers to direct marketing, possibly reinforced by standardization requirements and quality grading systems that favor large-scale operators over individual farmers. The same argument applies to other groups with similarly low producer shares.
Conversely, Carrot’s isolation in the cluster described in item 4 with consistently higher producer shares (averaging 0.30) may indicate either more direct marketing channels (such as cooperative structures, regional supply contracts with supermarket chains, or participation in quality certification schemes) or intrinsic product characteristics that reduce intermediation costs. Carrots require less sophisticated cold-chain infrastructure than highly perishable products, can be marketed in various presentations (fresh, bagged, pre-cut), and face relatively inelastic consumer demand throughout the year. This combination of factors may enable producers to capture a larger share of the final price, either through reduced distribution margins or through stronger bargaining positions in supply negotiations.
The seasonal products in the cluster given in item 1 (Zucchini, Eggplant, Cucumber) exhibit pronounced volatility patterns consistent with supply-demand imbalances across the calendar year. These products show notably high producer shares during winter months when local production is minimal (Zucchini: 0.44 in January, 0.35 in November–December; Cucumber: 0.38–0.41 in January–February; Eggplant: 0.41 in January and December), but the share drops substantially to 0.16–0.19 during the April–June period, when production intensifies and market supply increases. During peak harvest months (July–October), producer shares recover partially to around 0.23–0.29. In relative terms, the share of the final price absorbed by distribution is therefore lowest during off-season scarcity (when supply relies on imports and cold-chain infrastructure), highest as local supply builds up, and only partially reduced during the main harvest. This pattern suggests that retail prices remain relatively stable throughout the year while origin prices fluctuate more dramatically in response to local supply availability, with intermediaries capturing higher margins precisely when producers face the greatest competitive pressure from abundant local supply.
The intermediate group (cluster in item 2: Cabbage, Melon, Watermelon, Chard, Broccoli) combines winter and summer products with moderate, stable producer shares around 0.22, suggesting a degree of market maturity where distribution costs are relatively predictable and intermediaries operate under moderate competition. This stability may also reflect the existence of medium-term supply contracts between producer organizations and retail chains, which smooth price volatility and provide some protection to farmers against spot-market fluctuations. Finally, the dispersed cluster of item 5 (Tomato, Cauliflower, Red pepper, Green pepper) groups products that resist clear classification, possibly due to the coexistence of multiple commercialization circuits—ranging from local markets and direct sales to export-oriented supply chains—that generate heterogeneous producer-share dynamics. Understanding these structural drivers is essential for designing targeted policy interventions: clusters with persistently low producer shares may benefit from measures promoting shorter supply chains, collective bargaining mechanisms, or cooperative marketing structures, while products with stable high shares suggest that existing market structures are relatively efficient and require less regulatory attention. Moreover, the identification of these clusters provides farmers with actionable intelligence for crop substitution decisions, allowing them to navigate toward products or market segments where producer welfare is better protected.
Overall, the analysis confirms that certain products exhibit coordinated patterns while others display unique behavior. The stability of the clusters across different temporal resolutions (September vs. whole year) suggests that these similarity relationships primarily reflect persistent patterns in producer share trajectories, rather than being driven solely by seasonal effects. While seasonal fluctuations are clearly visible in the time series and are important for characterizing individual products, they are intrinsically part of the series and do not dominate the clustering results. Thus, the results provide a clear, interpretable framework that can aid farmers in strategic decision-making, particularly in identifying substitutable products and managing production to maximize producer share.
5. Conclusions
We have presented a methodology for analyzing price dynamics in a national agrifood value chain, focusing on the producer share as the main indicator. It highlights the well-known trend that retail distribution generates a persistent gap between the price paid to the producer and the final consumer price. The average producer share for most vegetables analyzed is consistently low (often below 0.3), underscoring the limited portion of the final price reaching the farmer. As a smaller producer share reduces the possibility of fair compensation and farm viability, our model provides a technical tool to integrate this information into strategic decision-making.
5.1. Methodological Contributions
As this is an applied work, the methodology is deliberately simple to facilitate direct use by managers with standard mathematical training. It is based on basic time series analysis (Pearson correlation and Euclidean distance) and on clustering, which groups products with similar market dynamics. This allows farmers to identify substitutable products within clusters that have a higher producer share, suggesting potential production adjustments to maximize income, and to detect coordinated market behavior through the identification of product families moving together in terms of producer share. Empirical analysis revealed clear patterns: potato and onion are consistently grouped with low producer shares (around 0.2), reflecting high distribution margins, while carrot exhibits a higher and stable share (around 0.3), suggesting a more efficient distribution chain or a different commercialization process. The consistency of results between a fixed month (September) and the full time series supports the stability of these strategic insights, indicating that product similarity and market patterns are not merely seasonal but reflect enduring structures.
This work confirms the existence of wide distribution margins in the agrifood chain and provides producers with a reproducible, scientific methodology to understand and respond strategically to price dynamics, supporting greater competitiveness and a more equitable value distribution.
5.2. Methodological Limitations and Future Work
While our approach provides actionable insights for strategic decision-making, several limitations should be acknowledged. First, the analysis relies on historical producer share time series and does not explicitly account for extraordinary events or structural breaks (e.g., the COVID-19 pandemic, energy and transport shocks), which could temporarily affect market behavior. Second, the methodology does not explicitly separate seasonal effects from long-term trends; although clustering captures persistent patterns in producer share trajectories, seasonal fluctuations are inherently present in the time series and may influence cluster boundaries. Third, missing data were imputed using simple methods (filling with a neighboring value or averaging), which, although applied in only a few cases, could introduce minor distortions. Finally, the framework focuses on a single national value chain and a selected set of orchard products, and uses producer share as the main market descriptor. While this choice is informative, conclusions derived from this single metric are inherently limited and could benefit from complementary indicators.
Future work could address these limitations by incorporating models that explicitly account for structural breaks and shocks, applying robust seasonal adjustment techniques, exploring more sophisticated imputation methods, extending the analysis to other crops, regions, or international markets, and including additional market indices alongside producer share. Such extensions would enhance both the interpretability and applicability of the methodology, while preserving its practical usability for farm managers.