Multi-Environment and Multi-Year Bayesian Analysis Approach in Coffee canephora

This work aimed to use the Bayesian approach to discriminate 43 genotypes of Coffea canephora cv. Conilon, which were cultivated in two producing regions to identify the most stable and productive genotypes. The experiment was a randomized block design with three replications and seven plants per plot, carried out in the south of Bahia and the north of Espírito Santo, environments with different climatic conditions, and evaluated during four harvests. The proposed Bayesian methodology was implemented in R language, using the MCMCglmm package. This approach made it possible to find great genetic divergence between the materials, and detect significant effects for both genotype, environment, and year, but the hyper-parametrized models (block effect) presented problems of singularity and convergence. It was also possible to detect a few differences between crops within the same environment. With a model with lower residual, it was possible to recommend the most productive genotypes for both environments: LB1, AD1, Peneirão, Z21, and P2.


Introduction
Worldwide, more than 80 tropical countries employ 25 million farmers that produce around 175 million bags of coffee every year. Commonly, Coffea arabica represents 60% and C. canephora 40% of the annual revenue of 172 million USD [1]. Brazil is an important player, due it is the main producer and exporter in the world, with an estimated increase in global demand in the 2021/22 cycle due to the advance of control of COVID-19 in important consumer centers, such as Europe and the United States [2]. In this scenario of strong C. arabica valuation due to the significant loss of production in Brazil (−25%), the industry tends to increase demand for C. canephora coffee to reduce the cost of producing blends. The Espírito Santo and Bahia states in Brazil are the largest producer with almost 17.5 thousand bags of already processed coffee, with an increase of 12% of C. canephora in the last harvest [2]. In this context, this is an important crop for the surrounding regions mainly due to the local climate that exerts a strong influence on production.
The climate in the region is predominantly within the ideal limits for the culture, mainly C.canephora which is temperature tolerant up to 37 • C by photoprotective and antioxidant mechanisms [3]. Outside the limits from 17 to 31 • C its development and production are reduced or impaired [4,5]. Although the limits are wide, it has been observed that coffee is highly responsive to small variations in temperature [6]. Breeding programs have developed new coffee genotypes, providing not only desirable agronomic characteristics, but also greater adaptability to the environment [7].
Even though breeding programs develop better genotypes available to environments, they cannot be tested in many locations. This impossibility is mainly due to the inherent characteristics of perennial plants, which have a long reproductive cycle, a high annual variation, differences in precocity between genotypes, and productive longevity [8]. So, biometric studies that fill these gaps are important to identify productive cultivars adapted to the environment, which is the main component of the competitiveness of coffee plantations. Difficulties inherent to perennial crops such as coffee, in addition to the lack of information aimed at important regions, also pose problems for statistical models [9].
The challenge for statistical modeling is focused on the quantity and quality of data obtained by this type of crop, which certainly reduces the predictive power of the models. Both from the point of view of genetic breeding and commercial orchards, this consequence comes from the use of the same genotype for several years, a low survival rate during the life of the orchards, and the need for repeated evaluations in each individual over time [9]. Therefore, using a more accurate modeling approach can undoubtedly save resources and improve the chance of success of orchards with perennial plants.
An approach that has been used more frequently in these types of problems is the Bayesian approach. This approach is widely used with genomic data in culture [10][11][12], but for phenotypic data, it has already been successfully applied to other perennial plants [13][14][15]. The Bayesian approach enables advantages compared to the frequentist (REML/BLUP). The main one is the possibility of using previous information (called informative priors) incorporated in the parameter's distribution of the model [16]. In the frequentist's approach, previous data are generally used in a joint analysis, which often generates hyper-parameterized models, with unplanned/incompatible experimental design or missing observations. Usually, it's become a source of variation in the model and does not add much information, beyond the possibility of identifying if the previous data had different averages/variances from the current experiment. Another advantage is that the credibility intervals tend to be smaller than the confidence intervals with a proper prior [17]. Given the above, this work aims to use the Bayesian approach to discriminate 43 genotypes of C. canephora genotypes cv. Conilon is the most widely cultivated cultivar in Brazil for the Robusta type of coffee in the main coffee-producing regions.

Results
Both genotypes, locations, and year effects incorporated into the model were significant (p < 0.05) by chi-square (Table 1). The models that include the block effect, in addition to increasing the number of parameters in the model, presented problems of singularity and convergence. It is also noted that the deviance of model 3 + 5 (which corresponds to the 3rd model plus the block effect) is close to model 3, thus, the block effect was disregarded in the models. The model chosen was the 3 + 8 model, in which the year models both the intercept and slope of the location, at the cost of one more parameter in the model (six to seven), but with less deviance. After choosing the model, it was adjusted following a Bayesian approach. The chain generated by the approach passed on the quality statistics of the Markov Chain convergence and sample size tests (Table A1). In addition to the tests, all chains were visually inspected (Figures 1 and A1, Figures A2-A5), looking for initial instability in the chain that could be due to a small burn-in or problems along the chain that could come from either the model or inadequate parameters in the prior. The distribution of residues was also checked to identify any possible bias in the model or non-incorporated effect ( Figure A6). No issues were found in either case. Observing the chains from parameters year inside the environment (Figure 1) of the genotypes in each year and location can be a good indication of how the genotypes are responding in these environments.  A biannually coffee yield was observed in Bahia, but it was not observed in Espírito Santo (Figure 1). Despite this biannually, the average yield in Bahia (100.35 bags of green coffee.ha −1 ) was not significantly different from Espírito Santo (84.05 bags of green coffee.ha−1), as well as both farmers who produce coffee in the two locations have the potential to produce similar amounts. In the years 2017 and 2018 in Bahia, the highest average  A biannually coffee yield was observed in Bahia, but it was not observed in Espírito Santo (Figure 1). Despite this biannually, the average yield in Bahia (100.35 bags of green coffee·ha −1 ) was not significantly different from Espírito Santo (84.05 bags of green coffee·ha −1 ), as well as both farmers who produce coffee in the two locations have the potential to produce similar amounts. In the years 2017 and 2018 in Bahia, the highest average production of coffee genotypes was observed, with 133.58 (lower CI = 126.60 and upper CI = 140.40) and 126.02 (lower CI = 118.96 and upper CI = 132.74) bags of green coffee·ha −1 . These two years were significantly higher than the others evaluated in Bahia. The averages and credibility intervals observed in Espírito Santo did not show significant differences between the years in that location. Subsequently, some variance components were estimated from the genotype chains, mainly obtaining heritability and genotype ranking by BLUP ( Table 2). Finally, a ranking of genotype production was obtained, allowing farms in these locations to choose the most productive genotypes among those evaluated for use in their orchards ( Figure 2). The heritability of the yield trait was about 0.28 with credibility intervals between 0.18 and 0.37. Yield average estimates for four harvests for 43 coffee (Coffea canephora) genotypes grown in Bahia and Espírito Santo. Means were grouped into values above one standard deviation from the mean, above the mean, below the mean, and below one standard deviation from the mean.

Discussion
Accurate estimates and production pattern for a crop is important for the regional development of the crop, mainly because it allows accurate planning and profit forecasting on the cultivation activity. This importance is easily seen when a farmer must choose between two slightly different locations to plant their orchard. If initial studies do not consider that there may be a different variance in yield between the two environments and obtain a single average estimate for both, yield may be reduced by some non-incorporated effect (such as unfavorable weather). This loss can be further attenuated if considered an orchard destined for export, which generally yields 40% more than conventional ones.
The coffee crop is planted in a relatively wide spacing, where to obtain an orchard with a few hundred plants, the producer easily needs a large area. Thinking about the size needed to experiment, it was planned to be installed in a block design, as we believe that the soil might not be very homogeneous. After collecting all the data and thinking about the dependent random variable and all the explanatory variables, we tried to keep the model as inflated as possible. Later, use some form of model selection to choose a model that balances predictive accuracy and overfitting/Type I error [18,19], removing only the terms necessary to allow for a non-singular adjustment [20]. So, looking at our model simplification results, mainly looking at how much each factor contributes to reducing the Yield average estimates for four harvests for 43 coffee (Coffea canephora) genotypes grown in Bahia and Espírito Santo. Means were grouped into values above one standard deviation from the mean, above the mean, below the mean, and below one standard deviation from the mean.

Discussion
Accurate estimates and production pattern for a crop is important for the regional development of the crop, mainly because it allows accurate planning and profit forecasting on the cultivation activity. This importance is easily seen when a farmer must choose between two slightly different locations to plant their orchard. If initial studies do not consider that there may be a different variance in yield between the two environments and obtain a single average estimate for both, yield may be reduced by some non-incorporated effect (such as unfavorable weather). This loss can be further attenuated if considered an orchard destined for export, which generally yields 40% more than conventional ones.
The coffee crop is planted in a relatively wide spacing, where to obtain an orchard with a few hundred plants, the producer easily needs a large area. Thinking about the size needed to experiment, it was planned to be installed in a block design, as we believe that the soil might not be very homogeneous. After collecting all the data and thinking about the dependent random variable and all the explanatory variables, we tried to keep the model as inflated as possible. Later, use some form of model selection to choose a model that balances predictive accuracy and overfitting/Type I error [18,19], removing only the terms necessary to allow for a non-singular adjustment [20]. So, looking at our model simplification results, mainly looking at how much each factor contributes to reducing the model's uncertainty, we believe that the block effect can be ignored. This factor contributed little to a better fit of the model, in addition to causing convergence problems.
After simplifying the model, we chose to use a Bayesian approach. The Bayesian approach has some advantages over frequentist analysis. The main one considered in our choice is the possibility of using informative priors on model parameters [16]. After several years of following farms and collecting data in this important coffee-producing region, in some cases with the same genotypes, we could include this prior information as a joint experiment for example. This would generate unbalanced data, sources of variation without enough repetitions, hyper parametrization, complexity, and perhaps increase model uncertainty. Another advantage is that credibility intervals are often reported as smaller than confidence intervals if an appropriate prior has been used, as observed in guava, another perennial fruit tree [14]. Due to the likelihood function, if a non-informative prior is used, a Bayesian performance at least similar to the frequentist is expected [21][22][23].
The observation of the aspect of the chain is also a quality control criterion frequently used in the adjustment of the Bayesian model to the data [24]. If any problems such as chain instability, lack of convergence, or instability were observed, we could revise our prior, chain size, or even the sources of variation in the model. If there is interest from other breeders or biometrics, our chains, and their quality tests, as well as algorithms, can be obtained from Appendix A.
Disregarding the blocking effect, all others presented at least one significant contrast (p < 0.05), moreover, looking at the residuals of the model, we did not observe any bias. This implies that the effects we considered in the model, in addition to being significant, were reasonably sufficient to control the errors. This information is particularly important to discriminate which genotypes are more promising, and the farmer in the region of the greatest importance for the crop in Brazil can better plan the harvest volume, even in regions relatively close to those evaluated in the experiment. All the genotypes studied here had already been studied for other traits related to grain quality [25], and a significant difference was observed between the genotypes, like as differences between the growth of the root system, which is directly correlated with yield, nutrient concentrations in leaves, our main source of variation of interest for recommendation to farmers.
This fluctuation in production observed in Bahia has already been observed in another trait, this is directly related to the best rainfall conditions concerning the state of Espírito Santo, the leaf concentrations of macro-and micronutrients are at better levels [26]. The rainfall and temperature as well as the reproductive periods influenced this trait, and as here the crop has closed the cycle for all genotypes, we believe that climatic data are strong candidates to enter as covariates in future models in cultivar recommendation systems. Here, our year/location as random provides a hint that something could vary between the interaction of these variables, but we don't know what it is, although a good guess would be the weather.
Although we did not detect a significant difference between the locations, in the meteorological data we observe that Bahia, for different periods, had an average temperature higher than Espírito Santo. This may have contributed to a greater sum of degree days in the culture at that location. On the other hand, a lack of consistency in Espírito Santo may have generated greater variability in production there, which led to greater credibility intervals than those observed in Bahia.
So, was proceeded with the rank of the individuals by BLUP obtained using the Bayesian approach. The Bayesian MCMC methods consider uncertainties in the parameters throughout the inference process. On the other hand, the frequentist models are predicted by punctual estimates of variance components and considered as true values, ignoring uncertainty in the variance parameters [27].
It was possible to observe great genotypic variability in the group of studied genotypes. The most relevant information here is the group of the first five genotypes, which stood out from the others and can be used by farmers who want high yields. The credibility intervals (HPD interval) found can be considered relatively high, but this great variability may come from the interaction between the environmental genotype, mainly in the Espírito Santo locations. This does not rule out the other genotypes as potential candidates to compose an orchard with characteristics that the most productive may not have, such as good stability between environments, and great responsibility to favorable environments. However, another study with the material is needed to discriminate these genotypes for these characteristics. The heritability found was also a value within the expected, which is not very high but is consistent with the polygenic nature of the studied trait. The five genotypes shown in Figure 2 are good recommendations for people interested in growing crops in the region, or any other environment with similar conditions, especially regarding altitude. The selection of the best families was performed to be recombined and to generate new populations.

Plant Material and Field Experiments
The genetic material consisted of 43 coffee (C. canephora) genotypes, both adapted to tropical environments, and widely cultivated by coffee farmers in the State of Espírito Santo-Brazil. The genotypes were evaluated for high-yield potential and agronomic traits of interest. The experiment was planted at a spacing of 3.5 m × 1.0 m, in a randomized block design with three replications, and seven plants as an experimental unit. All cultural treatments were made according to the crop requirement, and pruning kept around 12,000 orthotropic branches.
The trials were carried out in two environments in Brazil, in the south of Bahia (16 • 36 52 S, 39 • 30 33 W, and 140 m asl) and the north of Espírito Santo (18 • 39 43 S, 40 • 25 52 W, and 200 m asl). According to Köppen's classification, the regional climate is Aw tropical, with hot humid summers and dry winters, mean minimum temperatures of over 15 • C (July to August) and mean maxima of over 35 • C in January to February (Figure 3). credibility intervals (HPD interval) found can be considered relatively high, but this great variability may come from the interaction between the environmental genotype, mainly in the Espírito Santo locations. This does not rule out the other genotypes as potential candidates to compose an orchard with characteristics that the most productive may not have, such as good stability between environments, and great responsibility to favorable environments. However, another study with the material is needed to discriminate these genotypes for these characteristics. The heritability found was also a value within the expected, which is not very high but is consistent with the polygenic nature of the studied trait. The five genotypes shown in Figure 2 are good recommendations for people interested in growing crops in the region, or any other environment with similar conditions, especially regarding altitude. The selection of the best families was performed to be recombined and to generate new populations.

Plant Material and Field Experiments
The genetic material consisted of 43 coffee (C. canephora) genotypes, both adapted to tropical environments, and widely cultivated by coffee farmers in the State of Espírito Santo-Brazil. The genotypes were evaluated for high-yield potential and agronomic traits of interest. The experiment was planted at a spacing of 3.5 m × 1.0 m, in a randomized block design with three replications, and seven plants as an experimental unit. All cultural treatments were made according to the crop requirement, and pruning kept around 12,000 orthotropic branches.
The trials were carried out in two environments in Brazil, in the south of Bahia (16°36′52″ S, 39°30′33″ W, and 140 m asl) and the north of Espírito Santo (18°39′43" S, 40°25′52″ W, and 200 m asl). According to Köppen's classification, the regional climate is Aw tropical, with hot humid summers and dry winters, mean minimum temperatures of over 15 °C (July to August) and mean maxima of over 35 °C in January to February ( Figure  3). The edaphoclimatic conditions in these two locations characterize slightly different conditions, and therefore two distinct environments for the plants. At each location, four harvests were carried out, between the harvested period 2016 to 2019 at the experimental plot level, according to the maturation cycle of each genotype. The individual yield (bags of green coffee per hectare) was used as a random variable in the statistical analyses.

Model Choice
A statistical model was implemented that contemplates the initially planned experimental design, with all the identified effects. From this model called the complete model, a simplification of the model was performed, removing non-significative effects or their combinations. After that, a Bayesian approach was used. This simplification helps to reduce the number of model parameters and complexity, in addition to helping the processing time in Bayesian chains, reducing their numbers if there are fewer effects in the model. Complete model: where: y (ijk) is the vector of the phenotypic values of the yield in the i th genotype of j th location of k th year; β 0 is an associated model parameter (intercept); g [i=1,...,43] is the parametric vector of random effects of i th genotype associated with the vector y by the incidence matrix known β 1 , assuming that g ∼ N 0, I ⊗ Σ g p [j=1,...,2] is the parametric vector of fixed (systematic) effects of j th location associated with the vector y by the incidence matrix known β 2 assuming that p ∼ N µ p , I ⊗ Σ p ;t [k=1,...,4] is the parametric vector of random effects of k th year inside j th location influencing both intercept and slope, associated with the vector y by the incidence matrix known β 3 , assuming that t ∼ N(0, 3] is the parametric vector of random effects of l th block inside j th location influencing both intercept and slope, associated with the vector y by the incidence matrix known β 4 , assuming that t ∼ N(0, I ⊗ Σ l ); and is the vector of random residual effects, assuming that ∼ N(0, I ⊗ Σ ). From this complete model(I), a simplification of the model was performed using package::functionlme4::lmer [18] in R language [28]. Were observed its goodness of fit through deviance and the number of parameters that each effect includes in the model. After simplification, a model for fit (II) using the Bayesian approach was chosen, using the same definitions for the effects as the full model described above:

Bayesian Approach
The Bayesian approach was used with the same model (II), applying the Monte Carlo method based on Markov Chains (MCMC), employing the package MCMCglmm::MCMCglmmas [29], in R language. A total of 500,000 iterations were estimated, discarding the first one at 50,000 (burn-in) and performing a 1:5 (thin) sampling. Quality statistics of the Markov Chain convergence were tested by the Geweke criterion [30] using the coda::geweke.diag package [31] in R language.
The formula represented in model (II) can be written more compactly, representing the system of equations with random effects models, expressed as: where: X is the design matrix for fixed effects and contains the predictor information and β = [β 0 , β 2 ] is the vector of parameters; Z is the design matrix for random effects and µ = [µ 1 , µ 3 ] is the vector of parameters; and is the error associated with the model, assuming y ∼ N(Xβ, I ⊗ Σ ).
In a Bayesian approach, both fixed and random effects are random, also called systematic effects, treated in the same way [29], and combine the design matrices (W = [X, Z]) and combine the vectors of parameters (θ = [β , µ ] ): In this formula, the design matrix is exactly equivalent to the original design matrix (frequentist) except we have one additional variable. In the frequentist model, this variable is absorbed into the global intercept because it could not be uniquely estimated from the data. Now, in the Bayesian approach, this additional parameter can be estimated by inclusion from a prior. As is usual in a Bayesian analysis, if there is no information in the data it has to come from the prior.
Priors can be defined for the residuals, the fixed effects, and the random effects. MCMCglmm::MCMCglmm uses an inverse-Wishart distribution prior for the (co)variances matrices and a normal prior for the fixed effects [29]. For model (II), the prior information was based on the posterior distributions of the parameters from a meta-analysis of coffee experiments [32][33][34][35]. The prior informative probability distribution for the systematic parameters of interest was provided by: β i ∼ N(µ 0 , V b ) in which µ 0 is a vector of averages and V b is a diagonal matrix of the a priori variance of β. Also was used an inverse-Wishart distribution for both random G 0 and residuals R 0 as a priori for the covariance matrices: G 0 ∼ W −1 1 Σ g , n and R 0 ∼ W −1 1 (Σ , n) in which Σ g and Σ are scale matrices. From chains, the a posteriori distribution was used to take averages, credibility intervals, breeding values, variance components, and broad-sense heritability, using the expression: h 2 = σ 2 g σ 2 g +σ 2 p/t +σ 2 , where: σ 2 g , σ 2 p/t and σ 2 are the genotypic, year inside the environment, and residual variances, respectively. Funding: The authors are indebted to the pioneer breeders, the coffee farmers, who made the first steps in the process of selection of most of the superior genotypes available nowadays. Therefore, the traditional names of the clones were maintained as used among coffee growers. We further acknowledge the support of the Federal University of Espírito Santo (UFES), the National Council of Scientific and Technological Development (CNPq), and the Foundation for Research and Innovation Support of Espírito Santo (FAPES). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001.