1. Introduction
Garlic (Allium sativum L.) yield averaged 11 Mg ha−1 in Brazil, compared with 17 Mg ha−1 worldwide [1]. Garlic yield and quality depend mainly on temperature, precipitation, and photoperiod, and hence on local factors [2]. In addition, the irregular distribution of rainfall caused by climate change can limit garlic production [3,4]. However, the Brazilian yield gap could be closed through genetics [5,6,7], pest management [8], irrigation [9], and fertilization [1,10]. Several garlic cultivars of high commercial value [6,11] are grown under widely different local conditions in the Brazilian states of Minas Gerais, Goiás, Santa Catarina, and Rio Grande do Sul.
Fertilization programs can be guided by soil and plant testing methods [12,13]. Compared with soil tests, tissue tests are generally more closely related to crop performance because the plant integrates many site-specific abiotic factors [14,15] and nutrient interactions [16]. Cunha et al. (2016) developed regional nutrient standards for the noble garlic cultivar “Ito” using a dataset of 142 observations from Minas Gerais, Brazil. However, a larger and more diversified dataset is required to diagnose nutrient problems across various growing conditions in the southern Brazilian states.
To address nutrient interactions, tissue test interpretation based on pairwise ratios was thought to be universal [17], but this assumption proved to be wrong at the local scale [18,19,20]. Optimum tissue nutrient concentration ranges may also vary with early biomass production due to differential growth rates and environmental conditions [21]. The “dilution effect” and its inverse, the “concentration effect”, occur when the concentration of an element in plant tissue decreases or increases through time due to nutrient additions and to changing seasonal environmental conditions that affect plant growth rate and biomass production [22]. Hence, key growth factors should be considered to predict yield from tissue analysis. This is especially true for fast-growing annual crops such as garlic.
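The dilution effect is commonly summarized by an allometric power law in which tissue concentration declines as biomass accumulates. The sketch below only illustrates the shape of such a curve. The function name `diluted_concentration` and the coefficients `A_COEF` and `B_COEF` are hypothetical placeholders and are not values fitted to garlic.

```python
def diluted_concentration(biomass, a, b):
    """Tissue concentration predicted by the power law C = a * W**(-b).

    biomass: dry matter W (Mg ha-1), must be positive
    a: concentration at W = 1 (hypothetical coefficient)
    b: dilution exponent, b > 0 (hypothetical coefficient)
    """
    if biomass <= 0:
        raise ValueError("biomass must be positive")
    return a * biomass ** (-b)


# Hypothetical coefficients chosen only to illustrate the curve.
A_COEF, B_COEF = 4.0, 0.5

# Two crops sampled on the same date but with different biomass.
slow_crop = diluted_concentration(1.0, A_COEF, B_COEF)
fast_crop = diluted_concentration(4.0, A_COEF, B_COEF)
```

Under these assumed coefficients, the crop with four times the biomass shows half the concentration, although nothing in the nutrient supply has changed. This is why growth-related features are needed to interpret a tissue test.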
Factor-specific nutrient diagnosis of agroecosystems could be conducted following Alexander von Humboldt’s principles of biogeography, where facts and local knowledge are assembled and processed by machine learning methods such as decision trees to reach a comprehensive understanding of living systems [23]. On the other hand, the properties of living systems are often reported as concentrations or percentages. Analyzing raw compositions numerically is a very difficult and time-consuming task [24]. Compositional data analysis (CoDa) has been developed to address the limitations related to the dilution of concentrations among D components that are constrained to the unit sum and thus provide D − 1 degrees of freedom in statistical analyses [25].
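The unit-sum constraint can be illustrated with a log ratio transformation. This section does not name the transformation used in the study, so the sketch below assumes the centered log ratio (clr), one widely used CoDa option. The tissue values are hypothetical.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (unit-sum constraint)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical tissue composition in g kg-1: N, P, K, and a filling value
# (everything not measured) so that the parts sum to 1000 g kg-1.
tissue = [30.0, 3.0, 25.0, 942.0]

clr_from_g_per_kg = clr(tissue)
clr_from_fractions = clr(closure(tissue))
```

The two results are identical because log ratios do not depend on the scale of measurement. The D clr values also sum to zero, so only D − 1 of them are free to vary.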
Combining machine learning and compositional data analysis methods sequentially proved to be accurate for yield predictions and the detection of nutrient problems at the local scale [18,19,26,27,28,29]. Machine learning methods can predict crop yield in regression or classification mode. The classification mode allows growers to select realistic site-specific yield targets and provides risk analysis as the probability of exceeding a target yield. For nutritionally imbalanced plants, nutrients can then be ordered according to their limitation to yield using tools of compositional data analysis. The following hypotheses were tested: (1) optimum nutrient combinations of garlic tissues are factor specific, and (2) machine learning and compositional data analysis methods predict garlic yields and nutrient limitations differently using local conditions vs. regional averages. Our objective was to provide an accurate diagnosis of the nutrient status of garlic crops in southern Brazil.
3. Results
3.1. Model Accuracy
Machine learning methods can predict absolute yield from a regression equation or can provide risk analysis as the probability of exceeding the target yield. The training dataset (the 914 observations collected in 2015–2017) was evaluated in regression mode using AdaBoost and in classification mode using random forest. These two methods performed best among the machine learning methods tested (data not shown). Results are presented in Figure 2 and Table 1. Model accuracy increased as learners were informed with more features (Table 2).
Accuracy of the random forest classification model was 0.912, with 135 true negative specimens, when the yield cutoff was set at the Brazilian yield average of 11 ton ha−1 (Table 2). At a yield cutoff of 8 ton ha−1, classification accuracy reached its maximum value of 0.955, with 345 true negative specimens. There were significant differences between the regional nutrient standards obtained at yield cutoffs of 8 ton ha−1 and 11 ton ha−1 (Table 3). Because the yield cutoff is often selected arbitrarily but affects the number of true negative specimens, regional nutrient diagnosis should be interpreted in relation to yield target and local conditions.
3.2. Random Forest Yield Prediction for Observational Data in 2018–2019
Climatic indices in 2018 and 2019 differed markedly from those recorded in 2015–2017 (Table 4). From planting to the tissue sampling date, cumulative rainfall was low and rainfall distribution was uneven in 2019, while conditions in 2018 were closer to those observed during the 2015–2017 period. Tissue compositions in 2018 and 2019 often differed from those of the true negative specimens recorded during the 2015–2017 period. Despite contrasting features between the experimental (2015–2017) and observational (2018–2019) datasets, yield class was predicted tentatively by the random forest classification model with the yield cutoff set at the Brazilian average of 11 ton ha−1. These contrasts emphasize the need for larger and more diversified training datasets, in addition to greater control of management variables such as irrigation.
Where risk analysis showed more than 50% probability of reaching a yield above the target, the predicted yield was declared “high”. While 52 of the 61 specimens yielded more than 11 ton ha−1 during the 2018–2019 period, seven specimens were classified as true negative (high-yielding and nutritionally balanced) and 45 specimens as false positive (high-yielding and nutritionally imbalanced). The remaining nine specimens were classified as true positive specimens (low-yielding and nutritionally imbalanced).
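The decision rule and the four classes can be written out as a short sketch. It assumes that “nutritionally balanced” means the model predicts a high yield, and that a false negative specimen is low-yielding although predicted balanced. The function names, the specimens, and their probabilities are hypothetical.

```python
def yield_class(probability_high, threshold=0.5):
    """Declare the predicted yield 'high' above the probability threshold."""
    return "high" if probability_high > threshold else "low"


def quadrant(observed_yield, probability_high, cutoff=11.0, threshold=0.5):
    """Combine observed yield (ton ha-1) and predicted class into one label."""
    observed_high = observed_yield > cutoff
    predicted_high = yield_class(probability_high, threshold) == "high"
    if observed_high and predicted_high:
        return "true negative"   # high-yielding, nutritionally balanced
    if observed_high:
        return "false positive"  # high-yielding, nutritionally imbalanced
    if predicted_high:
        return "false negative"  # low-yielding, nutritionally balanced (assumed)
    return "true positive"       # low-yielding, nutritionally imbalanced


# Hypothetical specimens: (observed yield, probability of exceeding the cutoff).
specimens = [(14.0, 0.80), (16.0, 0.00), (9.5, 0.10), (9.0, 0.70)]
labels = [quadrant(y, p) for y, p in specimens]

# Accuracy is the share of specimens where prediction and observation agree.
accuracy = sum(label.startswith("true") for label in labels) / len(labels)
```

Changing `cutoff` moves specimens between the high-yielding and low-yielding groups, which is why the number of true negative specimens depends on the selected yield target.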
The 28 specimens in 2018 showed 100% probability of being classified as false positive, indicating luxury consumption, suboptimum concentration, or nutrient contamination and, therefore, unwise use of fertilizers. The nine specimens at site A in 2019 were low-yielding and imbalanced, showing less than 16% probability of being classified as true negative; the fertilization regime therefore appeared to be inappropriate. At site B in 2019, where yield always exceeded 11 ton ha−1, there were 4 true negative and 20 false positive specimens. The classification of median values across observational specimens indicated false positive specimens in 2018, true positive specimens at site A in 2019, and false positive specimens at site B in 2019.
3.3. The False Positive Specimens at Site in 2018
The yield of the diagnosed specimen was 15.5 ton ha−1 compared with 11.0–11.2 ton ha−1 for the closest reference specimens (Table 5). There was a large difference between regional and local diagnoses (Figure 3). Regional diagnosis indicated an apparent shortage of N, P, and K despite the high yield. Local diagnosis indicated an excess across nutrients. Judging from the closest compositional neighbor, the grower had fair control of N fertilization but failed to adjust other nutrients to crop needs. There were excessive accumulations of P and K in the soil. There was a large potential for economic and environmental gains by reducing fertilization.
3.4. The True Positive Specimens at Site A in 2019
The nearest compositional neighbor showed a marketable yield of 13.9 ton ha−1 compared with 9.7 ton ha−1 for the diagnosed specimen. Soil tests of P, K, Ca, and Mg at site A in 2019 and for the reference specimen, as well as other features, are reported in Table 6. Organic matter content was 4.5% for the diagnosed site and 3.8% for the reference specimen, indicating a possible need for site-specific management of nitrogen [43]. The grower had already taken action to reduce N fertilization from 400 to 130 kg total N ha−1. Both local and regional diagnoses indicated tissue P, K, Ca, and Mg excess compared with the three closest tissue compositional neighbors (Table 6). Factor-specific diagnosis at the local scale indicated relative excess of macronutrients, as well as Zn and Cu (Figure 4). However, Zn and Cu levels likely vary with the timing and rate of fungicide applications. Compared with its closest compositional neighbor, the producer reduced N application.
4. Discussion
4.1. Factor-Specific Optimum Nutrient Levels
At optimum yield, N requirements generally depend on crop yield and tissue N content. K bioavailability depends on the clay type, and P bioavailability depends on the plant rooting pattern and competition, root hairs, mycorrhizal and other microbial associations, as well as soil P sorption capacity, soil chemical properties such as pH, and physical barriers [44,45]. Considering those limitations, nutrient budgets were developed using a sequence of equations and plant- and soil-specific coefficients [34]. Although partially applied to garlic nutrition [1], such an approach could be inaccurate if not validated by fertilizer trials, as was the case for tomato (Solanum lycopersicum L.) in São Paulo state, Brazil [46]. Alternatively, tissue testing can integrate several growth-impacting factors [15]. However, there are currently no reliable guidelines for tissue testing that address the nutrient management of garlic at the local scale.
Seasonal variations in tissue nutrient concentrations depend on biomass accumulation at a given developmental stage under specific growing conditions [21]. Fast-growing crops take up more N than slow-growing crops during their vegetative phase, when most decisions on fertilization are made [21]. The N, P, and K concentration and dilution phenomena show allometric relationships with plant biomass [21,47]. Indeed, N, P, and K concentrations in plants tend to decrease simultaneously through time [17]. Because nutrient allocation patterns in response to the abiotic or biotic environments are plastic [48,49], tissue analytical results should not be interpreted in isolation. We showed that informing the machine learning model with an increasing number of yield-impacting abiotic and biotic features can increase model accuracy. Tissue sampling position, plant developmental stage, row spacing, plant population, fertilizer source, placement, rate and timing, and pest control were assumed to be similar between sites, in line with current practices in the region.
4.2. Prediction of Garlic Yields and Nutrient Limitations
Machine learning methods can address the specificity of factor combinations impacting crop nutrition and yield. A minimum dataset that garlic growers can easily acquire may include cultivar, the preceding crop, planting date, tissue test, soil test, fertilization, and climatic factors impacting plant growth during the vegetative phase up to tissue sampling. Because late-season tissue nutrient diagnosis is an ex ante approach that predicts final yield with uncertainty, growers may prefer the classification mode, which provides the probability of reaching the selected target yield.
Growers often report differential growth patterns for plants growing in the same field without any nutrient deficiency symptoms. Those plants are typically false negative specimens. In this case, the abnormally low growth is attributable to stress caused by factors such as pest damage or adverse soil physical conditions rather than to mineral nutrition. Plants may also be false positive specimens, high-yielding but nutritionally mismanaged [20]. The advantages of local over regional diagnoses are that (1) several climatic, edaphic, and managerial growth factors can be reliably assumed to be equal within the same ecological neighborhood, and (2) the yield target is realistic for that neighborhood. Another advantage of local diagnosis is that it avoids the over-representation of some ecologically uniform groups of true negative specimens that may not be representative of local conditions but are still used to compute nutrient standards at the regional scale. The focus is on running the diagnosis under local conditions comparable to those of the diagnosed specimen. The authors of [26,27] thus developed the concept of “enchanting islands” as successful environments where controllable factors can be optimized. Leitzke Betemps et al. (2020) also called such benchmark locations “Humboldtian loci” or “Ilhas Encantadas” in Portuguese. That is, nutritional diagnoses are simulated mathematically for geographically close locations, taking as reference the areas of adequate productivity (Humboldtian loci) that are surrounded by areas of low productivity. The objective was to address controllable factors in such a way as to re-establish nutrient balance or heal physically unhealthy soils in an economically and environmentally viable fashion.
On the other hand, the nutrient imbalance to be tackled should not be diagnosed by examining nutrients separately. Any nutrient level is affected by the level of other nutrients under Liebscher’s law of the optimum [50]. This phenomenon has been reported as “nutrient interactions” [16], “dual or pairwise interactions” [51], “multinutrient interactions” [52], “nutrient crosstalks” [53] or, in CoDa terms, “resonance within the simplex” [24]. Vahl de Paula et al. (2020) viewed soil and tissue compositions as unique combinations of nutrients that differ from “Frankenstein-built” regional standards, that is, standards averaged across contrasting growth-impacting factors and thus constructed from data that do not interact with each other and do not provide useful information for nutritional diagnosis.
The geometry of log ratio transformations facilitates the search for nutritionally close compositional neighbors growing under comparable conditions. Where nutrient imbalance is detected by the machine learning model, nutrients can be classified in a relative order of limitation to yield to assist in making wise decisions on how to adjust fertilization. The perturbation vector [24] used to classify nutrients resembles the deviation from optimum percentage [42], but the benchmark compositions can be tied to attainable yields among true negative specimens located in nearby “Ilhas Encantadas”.
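The neighbor search and the ordering of nutrients can be sketched as follows. The sketch assumes the centered log ratio (clr), the Aitchison distance (Euclidean distance between clr vectors), and the perturbation difference expressed in clr coordinates. The compositions, the reference names, and the function names are hypothetical and do not reproduce the study's computations.

```python
import math


def clr(parts):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def nearest_neighbor(diagnosed, references):
    """Name of the reference composition closest to the diagnosed one."""
    return min(references,
               key=lambda name: aitchison_distance(diagnosed, references[name]))


def clr_differences(diagnosed, benchmark, labels):
    """clr(diagnosed) - clr(benchmark), sorted from relative shortage to excess."""
    diffs = [d - b for d, b in zip(clr(diagnosed), clr(benchmark))]
    return sorted(zip(labels, diffs), key=lambda pair: pair[1])


# Hypothetical compositions in g kg-1: N, P, K, and a filling value (Fv).
labels = ["N", "P", "K", "Fv"]
diagnosed = [30.0, 6.0, 25.0, 939.0]
references = {  # hypothetical true negative specimens grown nearby
    "ref_1": [30.0, 3.0, 25.0, 942.0],
    "ref_2": [20.0, 3.0, 40.0, 937.0],
}

closest = nearest_neighbor(diagnosed, references)
ranking = clr_differences(diagnosed, references[closest], labels)
```

In this made-up case, the diagnosed tissue differs from its closest neighbor mainly by a higher P level, so P appears last in `ranking`, as the largest relative excess. Negative differences indicate relative shortage. The differences sum to zero, which reflects that one nutrient can only be in relative excess if others are in relative shortage.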
4.3. Need for Large and Diversified Datasets
Beaufils (1971) documented several features impacting yield parameters as follows: (1) vegetative conditions such as appearance (normal or abnormal plants or stands as luxurious, luxurious to medium, medium, medium to poor, or poor), tissue sampling position, plant age measured by the difference between planting and tissue sampling dates, hour of tissue sampling, leaf color, plant height, visual symptoms, disease, and insect infestation; (2) weather conditions such as rainfall, temperature, wind, and light intensity; (3) cultivation practices such as cultivar, date of planting, row spacing, plant population, fertilizer source, placement, rate and timing, pesticides, and site history; (4) soil chemical, physical, and mechanical quality; and (5) leaf analysis. Biological soil quality based on tools of metagenomics is also gaining acceptance [54]. Because the test dataset in 2018–2019 showed features that differed from those of the training dataset, yield class predictions from extrapolated features must be interpreted with caution. Nutrient diagnosis using the random forest classification model also depended on the selected cutoff yield and the number of factors documented in the dataset.
Garlic yield could be predicted accurately using a dataset that the grower can easily acquire, including cultivar, fertilization, tissue analysis, soil analysis, the preceding crop, and climatic conditions between planting and tissue sampling dates. Compositional log ratio methods provided an order of nutrient limitations to yield that is useful to customize fertilizer recommendations at the local scale. Nevertheless, large and diversified datasets should be documented by experimental and observational data in collaboration with stakeholders.
5. Conclusions
Optimum nutrient combinations to reach a high yield level under specific local conditions were shown to be factor specific. Tissue analysis alone returned an accuracy of 0.750 in regression mode, and of 0.891 in classification mode using a yield cutoff of 11 ton ha−1. However, accuracy reached 0.855 in regression mode and 0.912 in classification mode where models included all factors documented in the dataset. Because biomass production and other factors likely affect tissue compositions, it is recommended to include all factors that the grower can easily acquire when running nutrient diagnosis of garlic crops.
Garlic can grow successfully under various combinations of nutrients subject to a number of edaphic and managerial local factors. Local nutrient diagnosis may differ from regional diagnosis because several yield-impacting factors are taken into account and benchmark compositions are representative of local conditions. As datasets become larger in size and diversity, local diagnosis will provide effective and economical use of nutrient resources to reach a high yield of garlic. Increased economic and environmental concerns should cause many growers to re-assess nutrient management strategies so that inputs and costs are minimized while yield expectations are met. Collaboration among stakeholders is required to tackle the numerous combinations of yield-impacting factors and to address knowledge gaps through additional fertilizer trials.