Novel Approach for Statistical Interpretation: A Case Study from Long-Term Crop Production Experiments (Hungary)

Abstract: In recent decades, the agricultural sector has witnessed rapid technological interventions from the field to the production stage, so the importance of these interventions must be rigorously evaluated. The traditional statistical method often declares small differences statistically significant, even when they are not meaningful from other (e.g., economic or environmental) perspectives. The aim of this research was therefore to develop a new statistical method for evaluating agricultural experiments based on several criteria, so that the practical importance of technological interventions can be clearly determined. Data were collected from a long-term (13-year) crop production experiment in Central Europe (Hungary), which involved five fertilization levels plus a non-fertilized control, two irrigation treatments (irrigated and non-irrigated), and 15–20 maize genotypes. The results showed that the classic statistical approach for testing significant differences among treatments should be accompanied by our suggested approach (i.e., the professional test), which reflects whether treatments were professionally effective. The results also showed that a good statistical background alone is not enough for interpreting the analysis of agricultural experiments. This research suggests that erroneous conclusions can be avoided by combining classical and professional statistical tests, so that correct recommendations can be provided to decision makers and farmers according to their financial resources.


Introduction
Globally, the agricultural sector is under immense pressure due to rapid population growth, urbanization, and climate change [1]. Despite the malfunction of the agricultural sector in many parts of the world due to land degradation [2,3], extreme climate events [4,5], and plant diseases [6], this sector has to feed more than 9 billion people by 2050 … due to several drawbacks of null hypothesis significance testing (NHST). Rinella and James [23] reported that p-values are difficult to interpret and are regularly misinterpreted, which leaves considerable uncertainty in the interpretation of results. Nevertheless, this method is still widely used in agricultural research, where researchers regard a correctly interpreted NHST as a decision-making tool [25–28].
In long-term agricultural experiments, especially in crop production, researchers apply different treatments, such as different levels of fertilization, irrigation, and soil treatments. The data collected from the different treatments are ordinarily evaluated by analysis of variance. If the F-test is significant, a post-hoc test is applied to compare the treatment means. This may be the Least Significant Difference (LSD) [29], Duncan [30], Dunnett [31], or Scheffé [32] test, among many others. However, these tests examine a very "soft" hypothesis, the classic null hypothesis: can the difference between treatment means be considered zero? The tests often reject this hypothesis and report a statistically significant effect. With a large number of observations, even small, practically unimportant differences are declared significant. However, a statistically significant effect is not the same as a professionally significant effect. In addition, researchers often erroneously interpret the strength of significance as the magnitude of the effect [21]. For instance, in agricultural fertilization experiments, relying only on statistical significance (p < 0.05) might result in over-fertilization, environmental pollution, and economic loss. Thus, a new approach should be adopted that takes environmental impact, crop yield, and economic benefits into consideration.
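The effect of sample size on the classic null hypothesis can be illustrated with a small sketch. It assumes a two-sample z-test with a known, common standard deviation. The numbers are hypothetical (a 0.1 t ha−1 difference and a standard deviation of 1 t ha−1) and do not come from the experiment. The function name is illustrative only.

```python
import math

def two_sample_p_value(diff, sd, n_per_group):
    """Two-sided p-value of a z-test for a fixed difference between two means.

    Assumes two equally sized groups with a common, known standard deviation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the difference
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

# Hypothetical: the same 0.1 t/ha difference and SD of 1.0 t/ha, growing sample size.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5} per group: p = {two_sample_p_value(0.1, 1.0, n):.3g}")
```

The difference never changes, yet it becomes "significant" once the groups are large enough (here, somewhere between 100 and 1000 observations per group).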
Given the above, the main aim of this research was to develop a new method for evaluating long-term crop production experiments based on the framework of Meehl [33]. We formulated the null hypothesis (H0) in terms of the expected yield surplus, which is set on the basis of professional knowledge. To this end, the level of expenditure and the expected increase in yield are determined for each treatment level.

Site Description and Experiment Design for Data Collection
Data were collected from the Látókép Experimental Site of the University of Debrecen (47°33′ N, 21°26′ E, 11 m) in eastern Hungary (Figure 1) [34]. The experimental site was established in 1983 as the core of agricultural research in the eastern part of Hungary. The average temperature is 10.4 °C and the average annual rainfall is 537 mm (Figure 2). The experiment was set up as a randomized complete block design (RCBD) with four replications and three factors (fertilization, irrigation, and hybrids). The treatments included six fertilization levels and two irrigation levels (Table 1).
To meet the study goal, data from 13 years (1996–2008) of the long-term experiment were used, which can be considered a homogeneous time series. The collected data include the yields of 15–20 maize genotypes.

Steps for Building Professional Statistical Approach (Suggested Approach)
1. Selection of the significance level: based on decades of experimental experience, the best compromise is 10%. In crop production experiments, which deal with biological objects, the internal variation is high and the commonly applied 5% significance level is too strict.
2. Cost and yield: the cost of the fertilizer dose and its application is determined, and the amount of grain corresponding to this cost is calculated. This is the basis of the expected yield increase. A fertilizer dose is considered professionally significant only if the yield increase is statistically higher than this amount. This is the professional alternative hypothesis, and it therefore has to be defined first. In other words, the expected yield surplus that covers the cost of the treatment must be calculated. In the presented example, this is the material and application cost of the additional fertilizer. In other experiments, e.g., an irrigation experiment, the revenue from the surplus yield caused by the treatment should cover the cost of irrigation. If no such yield increase is obtained, the treatment is professionally non-significant. A statistically significant difference may still be found, but it is so small that it is professionally ineffective, because it does not cover the cost of the inputs, i.e., the treatment.
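A minimal sketch of step 2 follows. The break-even yield increase is the treatment cost divided by the grain price. The function name and all prices below are hypothetical, not the KITE cPlc. figures behind Table 3. As an assumption, only the first dose carries the application cost.

```python
def break_even_yield_increase(fertilizer_cost, application_cost, grain_price):
    """Grain yield surplus (t/ha) whose revenue just covers the treatment cost.

    Costs are per hectare and grain_price is per tonne, in the same currency.
    """
    if grain_price <= 0:
        raise ValueError("grain_price must be positive")
    return (fertilizer_cost + application_cost) / grain_price

# Hypothetical prices in arbitrary currency units (not the data behind Table 3).
first_dose = break_even_yield_increase(105.0, 21.0, 150.0)  # also pays for application
extra_dose = break_even_yield_increase(105.0, 0.0, 150.0)   # fertilizer cost only
print(f"first dose: {first_dose:.2f} t/ha, additional dose: {extra_dose:.2f} t/ha")
```

The resulting value is the yield increase that the professional null hypothesis in step 4 compares each dose against.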
3. Analysis of variance: if the F-test is significant, the analysis can proceed. If the F-test is not significant, there is no significant statistical or professional effect. However, a significant F-test does not automatically mean that the effect is also professionally significant.
4. Suggested approach: the "professional null hypothesis" is set against the professional alternative hypothesis and is tested with statistical methods. The professional null hypothesis states that the treatment is professionally ineffective, i.e., the yield increase is less than or equal to the expected yield increase. Professionally significant effects are thus demonstrated indirectly, by rejecting this hypothesis. User-defined contrasts are created that examine the one-sided hypotheses simultaneously. This is a genuine multiple mean comparison method, in which the p-values are corrected with the single-step method. The contrasts test the difference between two neighboring treatment means. In the R statistical environment, this step can be set up with the glht() function of the {multcomp} package. Here, model is the fitted analysis-of-variance model and npk is the fertilizer factor. By default, summary() reports single-step adjusted p-values:
summary(glht(model, linfct = mcp(npk = c("N60 - controll <= 0.84", "N120 - N60 <= 0.84", "N180 - N120 <= 0.84", "N240 - N180 <= 0.84", "N300 - N240 <= 0.84"))))
5. Analyzing findings: if the professional null hypothesis is not rejected, the fertilizer dose has no professionally significant effect. If it is rejected, the fertilizer has also increased the plant's yield significantly in a professional sense.
6. The analysis is carried out for each year, and the professionally significant effect of each dose is determined. On that basis, the probability that a given fertilizer dose is professionally effective can be estimated. Subtracting this probability from one gives the risk of using the given dose.
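Steps 4 and 5 can also be sketched outside R. For each pair of neighboring treatment means, the Python sketch below computes the one-sided statistic t = (difference − δ)/s_d and its upper-tail p-value. Here δ is the professionally expected yield increase and s_d = √(2·MSE/r). Unlike glht(), the sketch reports unadjusted p-values (no single-step correction). It obtains the Student t tail by numerical integration rather than from a statistics library. The function names are illustrative, and the means in the usage lines are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), integrating the density on [0, |t|]
    with the composite Simpson rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 0.5 - area) if t >= 0 else 0.5 + area

def professional_test(means, mse, r, df, delta, alpha=0.10):
    """One-sided test of H0: means[i] - means[i-1] <= delta for neighboring
    treatment levels (unadjusted p-values; no single-step correction)."""
    se = math.sqrt(2.0 * mse / r)  # standard error of a difference of two means
    rows = []
    for i in range(1, len(means)):
        estimate = means[i] - means[i - 1]
        t_stat = (estimate - delta) / se
        p = t_sf(t_stat, df)
        rows.append((i, estimate, t_stat, p, p < alpha))
    return rows

# Hypothetical means (t/ha): control and five doses; MSE, r and df as in Table 2.
means = [6.0, 8.0, 8.9, 9.3, 9.4, 9.4]
for i, est, t_stat, p, sig in professional_test(means, 0.973, 24, 130, 0.84):
    print(f"level {i} - level {i - 1}: diff={est:.2f}, t={t_stat:.2f}, "
          f"p={p:.4f}, professionally significant={sig}")
```

With these invented means, only the first contrast exceeds the 0.84 t ha−1 threshold by enough to reject the professional null hypothesis at the 10% level.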

Statistical Analysis
For testing professional significance, version 3.6.0 of the R statistical environment was applied [35]. General linear hypotheses and multiple comparisons for parametric models were tested with the glht() function of the {multcomp} package [36]. In the R environment, the multiple mean comparison tests used in agricultural research (LSD, Duncan, Scheffé, Dunnett, etc.) can be found in the {agricolae} package [37].

Comparison between Traditional and Professional Methods
For each pair of neighboring doses, the mean of the lower dose is subtracted from the mean of the next higher dose. The test then checks whether this difference can be considered less than or equal to 0.84 t ha−1, i.e., professionally ineffective (Equation (1)):
$H_0: \mu_{i+1} - \mu_i \le 0.84\ \mathrm{t\,ha^{-1}}$ versus $H_1: \mu_{i+1} - \mu_i > 0.84\ \mathrm{t\,ha^{-1}}$ (1)
Table 2 shows the results of the multiple mean comparison test as output by R. The professional null hypothesis is rejected only for the N60 kg ha−1 dose, because only this dose has a professionally significant effect. Applying additional fertilizer is not justified, as the yield increase does not exceed the professionally expected level.
Interpreting Table 2 requires the mean square error (MSE) from the analysis of variance. In addition to the six fertilizer doses, six hybrids were included in the experiment, so a two-factor analysis of variance was applied. The resulting MSE was 0.973 with 130 degrees of freedom.
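The reported 130 degrees of freedom are consistent with an additive model that contains block, fertilizer, and hybrid main effects and no interaction term. This model structure is our assumption and is not stated in the text. With 4 blocks × 6 doses × 6 hybrids = 144 plots, a quick check (with an illustrative helper) is:

```python
def residual_df(n_obs, *levels):
    """Residual degrees of freedom of an additive (main-effects-only) ANOVA:
    n_obs - 1, minus (levels - 1) for every factor in the model."""
    return n_obs - 1 - sum(k - 1 for k in levels)

n_obs = 4 * 6 * 6  # 4 blocks x 6 fertilizer doses x 6 hybrids (assumed layout)
print(residual_df(n_obs, 4, 6, 6))  # factors: block, fertilizer, hybrid
```

Each fertilizer mean is then averaged over 4 × 6 = 24 plots, which is the r used for the standard error below.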
The Estimate column of Table 2 contains the difference between the mean yields of the two treatments. The Std. Error column is the standard error of the difference, determined as
$s_d = \sqrt{2\,\mathrm{MSE}/r}$
This formula is also used to determine the LSD. The number of replications (r) is 24: there are 4 true replications and 6 hybrids, and their product is 24. Thus, the standard error of the difference between two group means is $\sqrt{2 \times 0.973/24} = 0.2848$ t ha−1. This standard error can be used to represent the distribution under the professional null hypothesis (Figure 3), with the acceptance and rejection regions marked. The light blue area of the figure is the acceptance region, and the red area is the rejection region. In this example, the critical value is 1.2 t ha−1 for a one-sided alternative hypothesis at the 10% level. Of course, this value varies with the experiment and the year. The greater the internal variation of the data, the greater the difference between the professionally expected yield increase and the critical value at the 90% probability level.
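The standard error and the critical difference quoted above can be reproduced approximately. This sketch uses the reported MSE (0.973) and r = 24. As a simplification, it replaces the t quantile with 130 degrees of freedom by the standard normal quantile. The exact t quantile (about 1.288) gives almost the same result, and neither version includes the single-step multiplicity adjustment. The function names are illustrative.

```python
import math
from statistics import NormalDist

def se_difference(mse, r):
    """Standard error of the difference between two treatment means: sqrt(2*MSE/r)."""
    return math.sqrt(2.0 * mse / r)

def professional_critical_value(delta, se, alpha=0.10):
    """Smallest observed difference that rejects H0: diff <= delta (one-sided).

    Uses the normal quantile as an approximation to the t quantile, which is
    close when the error degrees of freedom are large (130 here).
    """
    return delta + NormalDist().inv_cdf(1.0 - alpha) * se

se = se_difference(0.973, 24)                # values reported for Table 2
crit = professional_critical_value(0.84, se)
print(f"SE = {se:.4f} t/ha, critical difference = {crit:.2f} t/ha")
# prints: SE = 0.2848 t/ha, critical difference = 1.20 t/ha
```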
The result of the traditional LSD test (alpha = 0.1) is illustrated with homogeneous groups formed from the actual data of the 1985 randomized block maize experiment. In addition to the non-fertilized control, the experiment included five NPK doses, which were twice the amounts of fertilizer applied during 1996–2008.
Three homogeneous groups were obtained. Group "a" contains the yields of the N120–N300 kg ha−1 doses, which do not differ significantly in statistical terms. Accordingly, the highest yield level can already be achieved with N120 kg ha−1. The N60 kg ha−1 dose (group "b") resulted in a significantly lower yield. The lowest yield was recorded in the non-fertilized control (group "c"). Another way to reduce Type I errors is to perform fewer paired comparisons. If the treatment effects are compared with the control, Dunnett's test is suitable. In this case only k − 1 comparisons are made (where k is the number of treatment levels), and for this objective the test has the highest statistical power. However, in a fertilizer dose experiment, comparing every dose with the control (non-fertilized treatment) is incorrect, because it can lead to misleading conclusions. Consider an example where the test results mislead the researcher. Suppose that the first fertilizer dose increases the yield to a very large extent, but the additional doses do not increase it any further. In this case, the subsequent doses also show a significant effect compared to the control. This result is shown in Table 2. The Dunnett test clearly indicates a significant effect for all fertilizer doses, although only the first doses increased the yield significantly.
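The contrast coefficients show the difference between the two comparison schemes. The sketch below builds a treatment-versus-control (Dunnett-type) matrix and a successive-differences matrix. These follow the same patterns as the "Dunnett" and "Sequen" types of multcomp's contrMat(). The sketch applies both matrices to hypothetical means that plateau after the first dose. The helper names are illustrative.

```python
def dunnett_contrasts(k):
    """Rows compare each treatment level 1..k-1 with the control (level 0)."""
    return [[-1 if j == 0 else (1 if j == i else 0) for j in range(k)]
            for i in range(1, k)]

def sequential_contrasts(k):
    """Rows compare each level with the preceding one (neighboring doses)."""
    return [[-1 if j == i - 1 else (1 if j == i else 0) for j in range(k)]
            for i in range(1, k)]

def apply_contrasts(matrix, means):
    """Estimated contrast values: each row's coefficients times the means."""
    return [sum(c * m for c, m in zip(row, means)) for row in matrix]

# Hypothetical means (t/ha): large jump at the first dose, then a plateau.
means = [6.0, 8.0, 8.1, 8.1, 8.2, 8.2]
print([round(x, 2) for x in apply_contrasts(dunnett_contrasts(6), means)])
print([round(x, 2) for x in apply_contrasts(sequential_contrasts(6), means)])
```

Every treatment-versus-control estimate stays around 2 t ha−1. The successive differences show that only the first step adds yield.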
According to the professional hypotheses, only the first fertilizer dose (N60 kg ha−1) increased yield in a professionally significant way. The Pr(>|t|) column shows this: the probability value is less than 0.1. The additional fertilizer doses did not increase maize yield to the professionally expected extent.

Application of Professional Hypotheses on Collected Experimental Data
A total of 13 years of data from the long-term experiment were used. In presenting the method, yields of non-irrigated and irrigated plots are evaluated separately, because water supply in the often-droughty climate of Hungary strongly influences nutrient utilization.
The expected grain yield increases for the fertilizer doses were determined on the basis of the technology of KITE cPlc. (Table 3), using Hungarian prices of 2017. The yield increase expected from the first dose is higher than for the others, because this dose also bears the application cost in addition to the fertilizer price. The additional quantities only need to cover the cost of the fertilizer itself. For the other doses, the values are not exactly identical, because the calculation is based on actual data of KITE cPlc. (the agricultural company collaborating with the University of Debrecen), which take fertilizer reloads and actual rotations into account. These differences are small.

Results of the Non-Irrigated Treatments
The method was first tested on the data of plots with natural precipitation. Two contrasting years from the 13-year study period are presented. Table 4 shows a year in which only the low fertilizer doses were professionally significant. In 1999, by contrast, all doses except N150 kg ha−1 increased maize yield in a professionally significant way.
Figure 4 shows the overall result of the 13 years. The N30 and N60 kg ha−1 doses caused professionally significant yield increases every year. The N90 kg ha−1 dose was effective in only 46% of the years and the N120 kg ha−1 dose in 8%. The N150 kg ha−1 dose was not effective in any year of the studied period.

Results of the Irrigated Treatments
The two extreme cases among the irrigated treatments are also presented. In 1997, only the first two fertilizer doses increased the yield in a professionally significant manner (Table 5). At the other extreme, all doses used in the experiment except N150 increased maize yield at the 10% professional significance level (Table 5).
Figure 4 shows the summarized results for the studied period. The first two doses increased yield to the professionally expected level every year. The N90 kg ha−1 dose was effective in 54% of the years, the N120 kg ha−1 dose in 46%, and the N150 kg ha−1 dose in only 15%.
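Step 6 turns the yearly test outcomes into a probability of professional effectiveness and its complementary risk. The reported percentages are consistent with counts over the 13 years (for example, 6 of 13 ≈ 46%, 1 of 13 ≈ 8%, 2 of 13 ≈ 15%). The outcome vector below is hypothetical, and only its count matters. The helper name is illustrative.

```python
def effectiveness_and_risk(significant_by_year):
    """Share of years in which a dose was professionally significant, and the
    complementary risk (1 - share) of applying that dose."""
    years = len(significant_by_year)
    if years == 0:
        raise ValueError("need at least one year")
    share = sum(1 for s in significant_by_year if s) / years
    return share, 1.0 - share

# Hypothetical outcome vector: significant in 6 of 13 years (order is irrelevant).
flags = [True] * 6 + [False] * 7
share, risk = effectiveness_and_risk(flags)
print(f"effective in {share:.0%} of years, risk {risk:.0%}")
```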

Discussion
Recently, economic growth and the rapid increase of the world population have stressed the importance of the agricultural sector and food security all over the world [38,39]. As the world's population grows to nearly 9 billion people by 2050, global food demand is expected to increase by 70–85 percent [40,41]. To meet the increased demand for food, rapid technological investments were made in the agricultural sector. However, these changes in agricultural practice raise questions about the profitability of the investments and their environmental impact. In this sense, the main goal of this research was to provide a new statistical approach, in contrast with traditional approaches, for evaluating the impact of new agricultural practices (fertilization, genotype) on crop yield. If researchers depend solely on statistical significance (p < 0.05) in fertilization trials, over-fertilization, environmental contamination, and economic loss may ensue. Therefore, a new method should be adopted that takes environmental impact, agricultural output, and economic benefits into account [23,24]. Scientifically, the conventional statistical approaches have their drawbacks [14]. For instance, when experiments are evaluated with multiple mean comparison tests, mostly all possible paired comparisons are tested, i.e., k(k − 1)/2 comparisons, where k is the number of treatment levels. The disadvantage is that, with many treatment levels, the large number of paired comparisons accumulates Type I error, so effects are often claimed where none exist. As the number of comparisons increases, so does the probability of making a Type I error. If there are six fertilizer levels in an experiment and a 5% LSD test is applied, the global alpha value will be 53.67%. This is the probability of at least one Type I error among the comparisons. This error can be reduced by correcting the p-values, for which many methods are known.
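The 53.67% figure follows from the usual approximation that treats the comparisons as independent. With k = 6 levels there are 15 pairs, and the probability of at least one Type I error is 1 − (1 − 0.05)^15. The sketch below also covers the k − 1 = 5 neighboring comparisons used by the professional test. Real pairwise comparisons are correlated, so both values are approximations. The helper name is illustrative.

```python
def familywise_error(n_comparisons, alpha=0.05):
    """P(at least one Type I error) for n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_comparisons

k = 6
all_pairs = k * (k - 1) // 2   # 15 paired comparisons (e.g., LSD on all pairs)
neighbors = k - 1              # 5 successive comparisons
print(f"all pairs: {familywise_error(all_pairs):.4f}")   # 0.5367
print(f"neighbors: {familywise_error(neighbors):.4f}")   # 0.2262
```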
Among the results of long-term fertilization experiments, the most important information for practice is how much additional grain yield an additional fertilizer dose induces compared to the previous dose, and whether it is worthwhile to apply more nutrients. The yield surplus must cover the expenditure, so it should be at least equal to that amount. This has to be tested with a one-sided (asymmetric) test, in a series from small to large doses. This reduces the accumulation of Type I error, since only k − 1 comparisons are required. It could also be tested with independent two-sample t-tests, but the probability of committing a Type I error is substantially increased in this case. In addition, the pooled error variance would then be estimated from only two samples, which is less accurate than an estimate based on all groups. Thus, in our research we developed a professional null hypothesis that overcomes these disadvantages of the classical approaches. It tests whether the supplemented nutrient is ineffective, namely whether it fails to increase the yield to the expected level.
In this research, fewer paired comparisons were performed to reduce Type I errors and to avoid the misleading conclusions that the Dunnett test can produce. Comparing fertilized with non-fertilized plots showed a significant effect of every fertilizer dose on yield (Table 2), although only the first two doses increased the yield significantly. To overcome this issue, the professional hypothesis approach should be applied. It shows how much additional grain yield an additional fertilizer dose induces compared to the previous dose, and whether it is worthwhile to apply more nutrients.
By adopting the professional hypothesis approach, the impact of different agricultural practices on crop yield can be evaluated more precisely. In this research, irrigated and rainfed plots were evaluated separately, since both irrigation and fertilization enhance crop production [42]. For rainfed plots, the professional hypothesis approach revealed that the N30 and N60 kg ha−1 doses caused a professionally significant yield increase. The other doses could be excluded, especially the N150 kg ha−1 dose (Figure 4). Similarly, on irrigated plots, the N30 and N60 kg ha−1 doses increased yield in a professionally significant way. The highest dose (N150 kg ha−1) was professionally effective in only 15% of the years (Figure 4). The results show that irrigation improves the efficiency of fertilization in the often-droughty climate of Hungary, and that higher doses are more likely to be utilized than in non-irrigated treatments. Knowing these results, farmers can decide what risk they take with additional nutrient supply. They can also decide whether it is worthwhile to apply doses that are utilized with a probability below 50%.
It should be emphasized that the professional hypothesis approach does not aim to replace the traditional approach. Rather, it is a complementary approach that tests the hypothesis from a different perspective. The outcome of this research provides farmers and decision makers with extra information about the efficiency of their inputs and about their financial and environmental impacts.

Conclusions
The developed method should not be used instead of the classic analysis of variance, but after it, as a supplement. After a statistically significant F-test, there is no need to test whether the difference between two treatments can be considered zero. Such a test only shows, indirectly, that a difference is greater than zero. The professional significance test provides information that is useful for producers.
This method is suitable for evaluating data from long-term experiments, and the vast amount of information they contain can be used for practical purposes, helping to balance productivity and environmental considerations.
Possibilities for further development of the method include building scenarios with changed professionally expected yields. These could reflect volatile maize prices and production costs, or their expected future development. How will the impact of climate change on yield modify nutrient replenishment? Will more or less artificial fertilizer be applied?

Figure 1. Location of the experiment site, Hungary, city of Debrecen.

Table 2. Results of the simultaneous tests for general linear hypotheses.
*** Significant at the 0.001 probability level (adjusted p-values reported; single-step method).

Table 3. Data of treatments and expected grain yield increase.

Table 4. Results of the simultaneous tests for general linear hypotheses in two different crop years.
*** Significant at the 0.001 probability level; ** significant at the 0.01 probability level (adjusted p-values reported; single-step method).

Table 5. Results of the simultaneous tests for general linear hypotheses.