Figure 1.
Complete methodological pipeline of the study, with CRISP-DM phases mapped to the corresponding stages.
Figure 2.
Data preparation stage scheme, adapted from Koukaras and Tjortjis [36].
Figure 3.
Correlation matrix among the nine candidate variables for the productive scale clustering.
Figure 4.
Annual distribution of records in the aggregated departmental dataset, 2006–2024.
Figure 5.
Univariate distributions of sown_area and harvested_area. (a) sown_area in linear scale. (b) sown_area in logarithmic scale. (c) harvested_area in linear scale. (d) harvested_area in logarithmic scale.
Figure 6.
Univariate distributions of production and yield. (a) Production in linear scale. (b) Production in logarithmic scale. (c) Yield in linear scale. (d) Yield in logarithmic scale.
Figure 7.
Top 10 department–crop pairs by productive indicator, 2007–2024. (a) Cumulative production. (b) Average planted area. (c) Average harvested area. (d) Average yield.
Figure 8.
Temporal continuity distribution of department–crop pairs. (a) Observed conditions (36 periods, 2006B–2024B, excluding 2018B). (b) Theoretical scenario with imputation of 2018B (35 periods, 2007A–2024B).
Figure 9.
Frequency of transitory crops by number of department–crop pairs. (a) Ten most frequent crops. (b) Ten least frequent crops.
Figure 10.
Twenty transitory crops with the highest total cumulative production, 2007–2024.
Figure 11.
Spatial coverage of transitory crops, 2007–2024. (a) Top 20 crops with the greatest spatial coverage (number of producing departments). (b) Distribution of all crops by number of producing departments. The red dotted vertical line indicates the maximum number of producing departments (32).
Figure 12.
Spatial concentration of transitory crops measured by the Herfindahl–Hirschman Index (HHI), 2007–2024. (a) Top 10 most spatially concentrated crops (highest HHI). (b) Top 10 least spatially concentrated crops (lowest HHI).
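A minimal sketch of how a spatial concentration index of this kind can be computed, assuming the HHI is the sum of squared departmental production shares for a crop, with shares expressed on a 0 to 1 scale (the caption does not state the scale). The function name `hhi` and the sample figures are hypothetical.

```python
def hhi(production_by_department):
    """Herfindahl-Hirschman Index as the sum of squared shares (1/n to 1)."""
    total = sum(production_by_department.values())
    if total <= 0:
        raise ValueError("total production must be positive")
    return sum((value / total) ** 2 for value in production_by_department.values())


# Hypothetical production figures (tonnes per department), for illustration only.
concentrated = {"Dept A": 900.0, "Dept B": 100.0}
dispersed = {"Dept A": 250.0, "Dept B": 250.0, "Dept C": 250.0, "Dept D": 250.0}

hhi_concentrated = hhi(concentrated)  # 0.9**2 + 0.1**2
hhi_dispersed = hhi(dispersed)        # 4 * 0.25**2
```

Under this definition, a crop grown in a single department scores 1, and a crop split evenly across n departments scores 1/n.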
Figure 13.
Twenty departments with the highest total cumulative transitory crop production, 2007–2024.
Figure 14.
Crop diversity by department. (a) Ten departments with the highest number of distinct transitory crops. (b) Ten departments with the lowest crop diversity.
Figure 15.
Department–crop presence/absence coverage matrix (32 departments × 96 crops; overall density = 31.5%). (a) Crops with wider spatial coverage (greater number of producing departments). (b) Crops with narrower spatial coverage (fewer producing departments).
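The reported density can be checked with a short calculation, assuming that density is the number of observed department–crop pairs divided by the number of cells in the matrix, and that the observed pairs are the 969 pairs of the aggregated dataset (an assumption; the caption does not state the numerator).

```python
n_departments = 32
n_crops = 96
n_pairs = 969  # department-crop pairs in the aggregated dataset (assumed numerator)

possible_cells = n_departments * n_crops      # every department-crop combination
density_pct = round(100 * n_pairs / possible_cells, 1)
```

With these inputs the matrix has 3072 cells and the density rounds to the 31.5% given in the caption.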
Figure 16.
Colombian map showing transitory crop diversity by department, 2007–2024.
Figure 17.
National total transitory crop production, 2007–2024. (a) Annual production (millions of tonnes) with linear trend. (b) Year-on-year percentage variation in production. Green bars indicate positive year-on-year variation and red bars indicate negative variation.
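As a sketch of the year-on-year variation shown in panel (b), assuming it is the percentage change relative to the previous year in the series. The function name and the sample values are hypothetical and are not taken from the study.

```python
def year_on_year_pct(series):
    """Return {year: percentage change versus the previous year in the series}."""
    years = sorted(series)
    changes = {}
    for previous, current in zip(years, years[1:]):
        base = series[previous]
        if base == 0:
            continue  # percentage change is undefined for a zero base
        changes[current] = 100.0 * (series[current] - base) / base
    return changes


# Hypothetical annual totals (millions of tonnes), for illustration only.
annual_production = {2007: 10.0, 2008: 11.0, 2009: 9.9}
yoy = year_on_year_pct(annual_production)
```

Positive entries correspond to the green bars and negative entries to the red bars; the first year has no entry because it has no predecessor.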
Figure 18.
National total planted area of transitory crops, 2007–2024. (a) Annual planted area (millions of hectares) with linear trend. (b) Year-on-year percentage variation in planted area. Green bars indicate positive year-on-year variation and red bars indicate negative variation.
Figure 19.
Number of active department–crop pairs, 2007–2024. (a) Annual count of active pairs with linear trend. The shaded blue area represents the range of active pairs. (b) Year-on-year percentage variation in active pairs. Green bars indicate positive variation and red bars indicate negative variation.
Figure 20.
Distribution of productive variables by semi-annual period, 2007–2024. (a) Sown area (ha). (b) Harvested area (ha). (c) Production (t). (d) Yield (t/ha). Box plots show median, interquartile range, and mean (red circle).
Figure 21.
Cumulative totals of productive variables by semi-annual period, 2007–2024. (a) Total sown area (ha). (b) Total harvested area (ha). (c) Total production (t). (d) Mean yield (t/ha).
Figure 22.
Pearson correlation matrix between productive variables (sown_area, harvested_area, production, and yield).
Figure 23.
Residual analysis and linear regression between harvested_area and production. (a) Residuals versus fitted values. (b) Residual distribution. (c) Scatter plot with fitted regression line (y = 1249.71 + 6.03x). The red dotted line in (a) indicates the zero-residual reference; in (b), it marks the zero value.
Figure 24.
Residual analysis and linear regression between sown_area and harvested_area. (a) Residuals versus fitted values. (b) Residual distribution. (c) Scatter plot with fitted regression line (y = 105.61 + 0.868x). The red dotted line in (a) indicates the zero-residual reference; in (b), it marks the zero value.
Figure 25.
Determination of the optimal number of clusters. (a) Elbow method (within-cluster sum of squares, WCSS). (b) Mean silhouette coefficient for each candidate number of clusters k.
Figure 26.
Model validation for k = 3 (mean silhouette = 0.888). (a) Silhouette plot by productive scale cluster (Small-scale, Medium-scale, and Large-scale). (b) Two-dimensional PCA projection with clusters identified by productive scale; PC1 concentrates 99.3% of explained variance.
Figure 27.
Geographic distribution of the 490 department–crop pairs by productive scale, 2007–2024. (a) Small scale (459 pairs, 32 departments). (b) Medium scale (26 pairs, 14 departments). (c) Large scale (5 pairs, 5 departments).
Figure 28.
Comparative visualization of productive scale segmentation. (a) Quartile-based classification (fixed cut-offs at the 25th, 50th, and 75th percentiles of average production). (b) K-Means clustering-based classification (k = 3, boundaries at natural discontinuities; Small-scale, Medium-scale, and Large-scale). In (b), the green dashed vertical lines indicate the natural breakpoints at approximately 35,386 t and 275,959 t.
Table 1.
Characteristics of the two EVA source datasets prior to processing.
| Characteristic | EVA 2007–2018 | EVA 2019–2024 |
|---|---|---|
| Records | 206,068 | 141,073 |
| Variables | 17 | 18 |
| Temporal coverage | 2007–2018 | 2019–2024 |
| Source | [24] | [25] |
Table 2.
Consolidated imputation decision table for zero-value cases, organized by group.
| Case | Records | Agronomic Interpretation | Decision and Justification |
|---|---|---|---|
| A1 | 3720 | Failed sowing or total crop loss: sown_area > 0 but no harvest or production. Agronomically coherent. | Retain as valid. Contributes sown_area to aggregates without inflating production. |
| A2 | 221 | Same as A1 but yield recorded as NaN (undefined 0/0 operation). | Set yield = 0. Homogenizes with A1 and eliminates NaN in a variable whose value is known. |
| A3 | 24 | Failed sowing with positive yield: no harvest or production, yet yield is reported. Likely a data entry error. | Correct yield = 0. Adjusting the derived variable is more conservative than imputing harvest. |
| A4 | 104 | No area or production, yet yield > 0. Positive yield is incompatible with total absence of productive activity. | Correct yield = 0. Prevents spurious values from distorting summaries and models. |
| B | 112 | Positive sown_area, harvested_area, and yield, but production = 0. Production must exist; zero indicates an unreported value. | Impute production = harvested_area × yield. Restores the agronomic identity and prevents underestimation of production. |
| C2 | 436 | Positive production and yield, but both area variables are zero; areas were not entered. | Impute harvested_area = production / yield, then derive sown_area from harvested_area. Exploits the near-proportional relationship between both area variables. |
| C3 | 96 | Positive sown_area, production, and yield, but harvested_area = 0. Inconsistent: harvested area must be positive if production is reported. | Impute harvested_area = production / yield, retaining sown_area. Restores the agronomic identity. |
| C5 | 2278 | Positive harvested_area and production, but sown_area = 0. Planted area not recorded for the period. | Impute sown_area from harvested_area. Planted and harvested areas are strongly aligned in transitory crops; total losses are captured in A1–A2. |
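The decision rules for groups A, B, and C3 can be sketched as a single record-level function. This is a minimal illustration, not the study's implementation: it assumes the agronomic identity is yield = production / harvested_area, the function name `apply_zero_value_rules` and the sample records are hypothetical, and cases C2 and C5 are left out because their sown_area imputation formula is not recoverable here.

```python
import math


def apply_zero_value_rules(record):
    """Return a corrected copy of one record.

    Expected keys: sown_area, harvested_area, production, yield.
    """
    r = dict(record)
    sown, harvested = r["sown_area"], r["harvested_area"]
    production, yld = r["production"], r["yield"]
    yield_missing = isinstance(yld, float) and math.isnan(yld)

    if harvested == 0 and production == 0:
        # Cases A1-A4: no harvest and no production, so yield is set to 0
        # (this also replaces the NaN of case A2).
        r["yield"] = 0.0
    elif sown > 0 and harvested > 0 and production == 0 and not yield_missing and yld > 0:
        # Case B: recover production from the assumed identity
        # yield = production / harvested_area.
        r["production"] = harvested * yld
    elif sown > 0 and harvested == 0 and production > 0 and not yield_missing and yld > 0:
        # Case C3: recover harvested_area from the same identity; sown_area is kept.
        r["harvested_area"] = production / yld
    return r


# Hypothetical records, one per illustrated case.
fixed_a2 = apply_zero_value_rules(
    {"sown_area": 5.0, "harvested_area": 0.0, "production": 0.0, "yield": float("nan")}
)
fixed_b = apply_zero_value_rules(
    {"sown_area": 10.0, "harvested_area": 8.0, "production": 0.0, "yield": 2.5}
)
fixed_c3 = apply_zero_value_rules(
    {"sown_area": 10.0, "harvested_area": 0.0, "production": 30.0, "yield": 3.0}
)
```

The function returns a copy, so the original record stays available for auditing how many values each rule changed.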
Table 3.
Sensitivity analysis of the temporal continuity threshold: pairs retained and production coverage by candidate threshold value.
| Threshold (Periods) | Approx. Years | Pairs Retained | Pairs (%) | Production Coverage (%) |
|---|---|---|---|---|
| 10 | 5.0 | 688 | 71.0 | 99.91 |
| 12 | 6.0 | 648 | 66.9 | 99.68 |
| 15 | 7.5 | 513 | 52.9 | 90.65 |
| 18 | 9.0 | 477 | 49.2 | 90.46 |
| 20 | 10.0 | 458 | 47.3 | 90.42 |
| 25 | 12.5 | 397 | 41.0 | 89.60 |
| 30 | 15.0 | 350 | 36.1 | 88.94 |
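A sensitivity table of this shape can be produced with a short routine. The sketch below assumes that a pair is retained when its number of active periods is greater than or equal to the threshold, and that production coverage is the retained pairs' share of total production; both are assumptions, and the function name and sample data are hypothetical.

```python
def threshold_sensitivity(pairs, thresholds):
    """pairs: list of (active_periods, total_production), one per department-crop pair."""
    total_pairs = len(pairs)
    total_production = sum(production for _, production in pairs)
    rows = []
    for threshold in thresholds:
        kept = [(periods, prod) for periods, prod in pairs if periods >= threshold]
        rows.append(
            {
                "threshold": threshold,
                "pairs_retained": len(kept),
                "pairs_pct": 100.0 * len(kept) / total_pairs,
                "production_coverage_pct": 100.0
                * sum(prod for _, prod in kept)
                / total_production,
            }
        )
    return rows


# Hypothetical pairs, for illustration only: (active periods, total production in t).
sample_pairs = [(36, 900.0), (20, 80.0), (12, 15.0), (3, 5.0)]
sensitivity = threshold_sensitivity(sample_pairs, thresholds=[10, 15])
```

In the sample, raising the threshold from 10 to 15 periods drops one of four pairs but only 1.5 percentage points of production, the same pattern of pairs falling faster than coverage that the table reports.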
Table 4.
Global descriptive statistics of the aggregated dataset 2007–2024 (18,716 observations; 969 department–crop pairs).
| Variable | Min | Q1 | Median | Mean | Q3 | Max | SD |
|---|---|---|---|---|---|---|---|
| sown_area (ha) | 0.01 | 23.00 | 86.00 | 1738.56 | 450.00 | 207,751.00 | 6852.42 |
| harvested_area (ha) | 0.00 | 20.00 | 78.00 | 1055.67 | 400.00 | 186,971.00 | 4234.81 |
| production (t) | 0.00 | 115.00 | 577.00 | 11,018.02 | 3370.00 | 1,434,720.00 | 51,635.00 |
| yield (t/ha) | 0.00 | 3.00 | 7.00 | 10.50 | 13.00 | 102.15 | 12.30 |
Table 5.
Distribution of department–crop pairs by number of active periods (dataset of 969 pairs, 36 observed periods).
| Active Periods Range | N Pairs | Percentage (%) |
|---|---|---|
| 1–5 | 203 | 20.95 |
| 6–10 | 97 | 10.01 |
| 11–15 | 164 | 16.92 |
| 16–20 | 54 | 5.57 |
| 21–25 | 65 | 6.71 |
| 26–30 | 43 | 4.44 |
| 31–35 | 91 | 9.39 |
| 36 (complete) | 252 | 26.01 |
| Total | 969 | 100.00 |
Table 6.
Robustness analysis of the productive scale clustering solution: comparison of methods and scaling schemes.
| Method/Scheme | Silhouette | ARI vs. Original | Cluster Sizes (L/M/S) | Large-Scale Pairs |
|---|---|---|---|---|
| Alternative clustering methods | | | | |
| K-Means (original) | 0.888 | — | 5/26/459 | 5 (reference) |
| Hierarchical (Ward, k = 3) | 0.865 | 0.728 | 5/44/441 | 5 (identical) |
| GMM (k = 3) | −0.073 | −0.001 | Degenerate solution | |
| Scaling sensitivity (K-Means, k = 3) | | | | |
| Original (unscaled) | 0.888 | — | 5/26/459 | 5 |
| Log1p transformation | 0.472 | 0.079 | 102/167/221 | — |
| StandardScaler | 0.508 | 0.053 | 9/108/373 | — |
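For readers unfamiliar with the ARI column, the sketch below implements the standard Adjusted Rand Index from the contingency counts of two labelings. It is a plain-Python illustration with hypothetical labels and makes no claim about the software used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of equal length with at least 2 items")
    n = len(labels_a)
    # Pairs of items placed together in both labelings, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical in structure
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical labelings, for illustration only.
ari_relabelled = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same partition
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])     # no shared pairs
```

An ARI of 1 means the two partitions are identical up to label names, and values near 0 mean agreement no better than chance, which is how the 0.079 and 0.053 entries for the rescaled variants should be read.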
Table 7.
Silhouette statistics by cluster for the K-Means model with k = 3.
| Productive Scale | N Pairs | Min. Silhouette | Mean Silhouette | Max. Silhouette |
|---|---|---|---|---|
| Large | 5 | −0.474 | 0.553 | 0.640 |
| Medium | 26 | −0.050 | 0.346 | 0.547 |
| Small | 459 | −0.178 | 0.922 | 0.959 |
| Global | 490 | | 0.888 | |
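The global value can be cross-checked against the per-cluster rows, since the mean silhouette over all pairs equals the size-weighted mean of the cluster means. The calculation below uses only the figures in the table; the variable names are illustrative.

```python
# (number of pairs, mean silhouette) per cluster, as reported in the table.
clusters = {
    "Large": (5, 0.553),
    "Medium": (26, 0.346),
    "Small": (459, 0.922),
}

total_pairs = sum(n for n, _ in clusters.values())
weighted_mean = sum(n * mean for n, mean in clusters.values()) / total_pairs
global_silhouette = round(weighted_mean, 3)
```

The result reproduces the reported 0.888 and shows that the global figure is driven almost entirely by the Small cluster, which holds 459 of the 490 pairs.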
Table 8.
Comparative characterization of the three productive scale clusters obtained from the K-Means model (k = 3), including descriptive statistics of average production and average sown_area per cluster.
| Scale | N Pairs (%) | Production Mean (t) | Production Min (t) | Production Max (t) | CV (%) | Planted Area Mean (ha) | Avg. Periods |
|---|---|---|---|---|---|---|---|
| Large | 5 (1.0%) | 445,073.90 | 341,163.44 | 709,922.83 | 30.6 | 40,936.51 | 35.00 |
| Medium | 26 (5.3%) | 69,873.57 | 35,925.69 | 210,753.66 | 54.0 | 13,543.21 | 34.81 |
| Small | 459 (93.7%) | 2926.08 | 1.43 | 34,846.25 | 197.2 | 727.43 | 30.49 |

Natural breakpoints: ≈35,386 t (small/medium boundary) and ≈275,959 t (medium/large boundary).
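The two breakpoints are numerically consistent with the midpoints between the maximum of one cluster and the minimum of the next. This is an observation from the table's own figures, not a statement of how the boundaries were derived in the study.

```python
# Production extremes per cluster (t), taken from the table.
small_max, medium_min = 34_846.25, 35_925.69
medium_max, large_min = 210_753.66, 341_163.44

# Midpoint of the gap between adjacent clusters.
small_medium_break = (small_max + medium_min) / 2
medium_large_break = (medium_max + large_min) / 2
```

Rounded to the nearest tonne, the midpoints are 35,386 t and 275,959 t, matching the reported boundaries.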
Table 9.
Five large-scale department–crop pairs with clustering variables.
| Department | Crop | Avg. Production (t) | Avg. Planted Area (ha) | Avg. Yield (t/ha) | N Periods | Years |
|---|---|---|---|---|---|---|
| Cundinamarca | papa | 709,922.83 | 33,907.14 | 22.00 | 35 | 2007–2024 |
| Boyacá | papa | 437,039.39 | 24,816.73 | 18.67 | 35 | 2007–2024 |
| Casanare | arroz | 375,134.74 | 75,772.73 | 5.45 | 35 | 2007–2024 |
| Tolima | arroz | 362,109.09 | 51,910.69 | 7.05 | 35 | 2007–2024 |
| Nariño | papa | 341,163.44 | 18,275.26 | 19.45 | 35 | 2007–2024 |
Table 10.
Geographic distribution of department–crop pairs by productive scale.
| Scale | N Pairs | N Departments | Leading Departments |
|---|---|---|---|
| Large | 5 | 5 | Cundinamarca, Boyacá, Nariño, Tolima, Casanare (1 pair each) |
| Medium | 26 | 14 | Meta (4), Boyacá (3), Norte de Santander (3), Bolívar (2), Huila (2), Antioquia (2), Córdoba (2), Cesar (2) |
| Small | 459 | 32 | Cundinamarca (37), Cauca (33), Boyacá (30), Nariño (29), Norte de Santander (28), Santander (27), Valle del Cauca (27) |
Table 11.
Distribution of department–crop pairs by productive scale and crop.
| Scale | N Crops | Leading Crops (N Pairs) |
|---|---|---|
| Large | 2 | Papa (3: Cundinamarca, Boyacá, Nariño); arroz (2: Tolima, Casanare) |
| Medium | 9 | Arroz (7), maíz (7), papa (4), tomate (3); cebolla de bulbo, cebolla de rama, soya, patilla, zanahoria (1 each) |
| Small | 56 | Frijol (26), maíz (25), ahuyama (23), tomate (22), patilla (22), ají (20), others (301) |
Table 12.
Cross-tabulation between productive scale (clustering) and quartile-based classification.
| Productive Scale | Q1 (Low) | Q2 (Med-Low) | Q3 (Med-High) | Q4 (Large) | Total |
|---|---|---|---|---|---|
| Large | 0 | 0 | 0 | 5 | 5 |
| Medium | 0 | 0 | 0 | 26 | 26 |
| Small | 123 | 122 | 122 | 92 | 459 |
| Total | 123 | 122 | 122 | 123 | 490 |
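The cross-tabulation can be summarized with a quick calculation of how the top quartile is composed. The figures are those in the table; the variable names are illustrative.

```python
# Rows: productive scale; columns: Q1, Q2, Q3, Q4 (counts from the table).
crosstab = {
    "Large": [0, 0, 0, 5],
    "Medium": [0, 0, 0, 26],
    "Small": [123, 122, 122, 92],
}

row_totals = {scale: sum(counts) for scale, counts in crosstab.items()}
col_totals = [sum(column) for column in zip(*crosstab.values())]

# Share of the top quartile (Q4) made up of small-scale pairs.
q4_small_share_pct = round(100 * crosstab["Small"][3] / col_totals[3], 1)
```

About three quarters of Q4 consists of pairs that the clustering assigns to the small scale, which illustrates how fixed percentile cut-offs group the 31 medium- and large-scale pairs together with much smaller producers.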