Article

Good Statistical Practices in Agronomy Using Categorical Data Analysis, with Alfalfa Examples Having Poisson and Binomial Underlying Distributions

1 Vis Viva Energy Economics Consulting, 2114 State Ave., Ames, IA 50014, USA
2 Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
3 USDA-ARS, Plant Science Research Unit, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
4 School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
5 Department of Plant Pathology, University of Minnesota, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
* Authors to whom correspondence should be addressed.
Crops 2022, 2(2), 154-171; https://doi.org/10.3390/crops2020012
Received: 21 March 2022 / Revised: 25 April 2022 / Accepted: 3 May 2022 / Published: 13 May 2022
Categorical data, derived from qualitative classifications or from counts, are common in biological research and crop breeding, and analyzing them correctly is essential for drawing valid inferences from experiments. Such data also raise distinctive analytical issues. This paper discusses common problems in the analysis and modeling of categorical variables, demonstrates the risks of misapplied analyses, and suggests approaches to these challenges using two data sets from alfalfa breeding programs. For each data set, we present several analysis methods, e.g., simple t-test, analysis of variance (ANOVA), split-plot analysis, generalized linear model (glm), and generalized linear mixed model (glmm), implemented in R with R Markdown and in the standard statistical software SAS/JMP. The goal is to demonstrate good analysis practices for categorical data by comparing potentially ‘bad’ analyses with better ones, avoiding overreliance on reaching a significance threshold of p = 0.05, and navigating the morass of ever-increasing numbers of potential R functions. The research focuses on three main aspects: choosing the right data distribution; using the correct error terms for hypothesis-test p-values, including the right type of sums of squares (Type I, II, or III); and specifying proper statistical models for categorical data analysis. Our results show the importance of good statistical practice in helping agronomists, breeders, and other researchers apply appropriate statistical approaches and draw more accurate conclusions from their data.
Keywords: categorical data analysis (CDA); Poisson; binomial; generalized linear model (glm); generalized linear mixed model (glmm)
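
As a rough illustration of the abstract's first point (choosing a distribution that matches count data), the sketch below runs a likelihood-ratio test for equal Poisson means in two groups. This is what comparing an intercept-only Poisson log-link glm with one that adds a two-level treatment factor amounts to. It is not the authors' analysis, which uses R and SAS/JMP: the function name `poisson_lrt` and the counts are hypothetical, and Python is used here only to keep the example dependency-free.

```python
import math


def poisson_lrt(counts_a, counts_b):
    """Likelihood-ratio test for equal Poisson means in two groups.

    Hypothetical helper for illustration only. Returns the test statistic
    and its p-value from a chi-square distribution with 1 degree of freedom.
    """

    def loglik_kernel(counts, mean):
        # Poisson log-likelihood without the log(y!) term, which is the
        # same under both models and cancels in the ratio.
        if mean == 0:
            return 0.0  # all counts are zero, so the kernel is zero
        return sum(y * math.log(mean) - mean for y in counts)

    pooled = list(counts_a) + list(counts_b)
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    mean_pooled = sum(pooled) / len(pooled)

    # Null model: one common mean. Alternative: a separate mean per group.
    ll_null = loglik_kernel(pooled, mean_pooled)
    ll_alt = loglik_kernel(counts_a, mean_a) + loglik_kernel(counts_b, mean_b)

    # Guard against tiny negative values caused by floating-point rounding.
    stat = max(2.0 * (ll_alt - ll_null), 0.0)

    # Chi-square (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts per plot for two hypothetical treatments (not from the paper).
control = [0, 1, 0, 2, 1]
treated = [5, 7, 6, 4, 8]
stat, p_value = poisson_lrt(control, treated)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.2g}")
```

The chi-square reference distribution is a large-sample approximation, and the test assumes independent counts with no overdispersion. Designs with blocking or repeated measures would call for the mixed-model (glmm) approaches the abstract mentions.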

MDPI and ACS Style

Mowers, R.P.; Bucciarelli, B.; Cao, Y.; Samac, D.A.; Xu, Z. Good Statistical Practices in Agronomy Using Categorical Data Analysis, with Alfalfa Examples Having Poisson and Binomial Underlying Distributions. Crops 2022, 2, 154-171. https://doi.org/10.3390/crops2020012
