Proposed Consecutive Uncertainty Analysis Procedure of the Greenhouse Gas Emission Model Output for Products

The study objective was to develop a method for uncertainty analysis of greenhouse gas (GHG) emission model output based on the consecutive use of an analytical and a stochastic approach. The contribution to variance (CTV) analysis followed by the data quality analysis are the main features of the procedure. When a set of data points of a certain input variable has a high CTV, but its data quality indicator (DQI) is good, then there is no need to iterate data collection for this input variable. This is because the DQI of this dataset indicates that there is no room for the reduction of its variance, and the high variance must be its inherent attribute. Through the CTV analysis and data quality analysis, the identified input variables were selected for iteration of data collection. The statistical parameters of the GHG emissions of the model were calculated using the Monte Carlo simulation (MCS). In the case study of a dairy cattle farm, the relative reduction in the CV value was 47.6%. In this study, a procedure was developed for selecting the input variables for iteration of data collection in order to reduce their variance and subsequently reduce the uncertainty in the model output. The dairy cow case study showed that the uncertainty in the model output was decreased by the iteration of data collection, indicating that CTV analysis can be used to identify the input variables contributing considerably to the uncertainty in the model output.


Introduction
It is an international consensus that human production and consumption activities cause climate change [1]. In recent years, life cycle assessment (LCA) studies have been extended to food production and cooking appliances [2,3]. This reflects the growing interest in the LCA field in the environmental impacts occurring throughout human life.
According to previous studies, the contribution of greenhouse gas (GHG) emissions in the dairy sector is estimated to be 3-5% of the global GHG emissions. In Korea, various efforts are being made to reduce GHG emissions in the dairy sector, and there is a growing demand for accuracy [4][5][6][7].
In Europe, an effort is being made to manage and control GHG emissions not only from the industrial products sectors but also from the dairy industry sector through the product environmental footprint (PEF). This includes the development of the quantification method of GHG emissions from the dairy sector [8,9]. This effort can be envisaged as a prelude to the certification of carbon emissions in Europe. Any carbon certification or trading scheme requires that the credibility of the GHG emission results be ensured, for example through quantification of the uncertainty in the GHG emissions from industry sectors or products [10].
Uncertainty analysis is the analysis of the mathematical model output by quantifying the amount of deviation of the calculated model output from its mean. The uncertainty analysis result is often expressed as a confidence interval at a given confidence level. Quite often, the model inputs suffer from observation and measurement errors, which limits the confidence in the model output [11].
In order to gain confidence in the model output, the mathematical model should include the following two evaluation steps: a quantification of the uncertainty in the model output (uncertainty analysis) and an evaluation of how much each input variable contributes to the uncertainty of the model output (sensitivity analysis) [12].
Normally, there are many input variables in the mathematical model. Therefore, an efficient scheme needs to be developed for identifying input variables that considerably contribute to the uncertainty of the model output. Global sensitivity analysis is an effective tool for identifying input variables contributing to the model output uncertainty [12][13][14][15][16].
Uncertainty can be reduced through the process of iteration of data collection [17]. Therefore, those identified input variables will become targets for further scrutiny, including iteration of data collection.
The global sensitivity method used in this study was a modification of the variance-based method [18][19][20][21]. The variance-based method uses probabilistic approaches, which quantify the input and output uncertainty using their probability distributions. It also decomposes the output variance into parts attributable to input variables [12].
The objective of this study was two-fold: (i) to perform a global sensitivity analysis for identifying the input variables that contribute considerably to the uncertainty of the model output and (ii) to quantify the uncertainty reduction of the model output when the data from iteration of data collection of the identified input variables are used instead of the original data. The actual process and activity data collected from a dairy cow farm in Korea [22] were used to evaluate the applicability of the proposed method.

Materials and Methods
The uncertainty of each variable can affect the uncertainty of the result, and the variance of each variable can be used as an indicator to represent the uncertainty of the variable [23]. The process of reducing variance helps to correctly estimate the mean of the overall results [23]. This means that it is important to select significant variables in order to effectively reduce the uncertainty of the results.
In this study, we used the error propagation method in lieu of the probabilistic approach for identifying key input parameters which affect the uncertainty of the carbon footprint result. This was to avoid an excessive computing time in selecting the key input parameters. The contributions of input parameters to the uncertainty of the result were evaluated by the contributions of the input parameters to the variance of the results. This was the concept of the contribution to variance (CTV).
The global sensitivity analysis approach, termed the analytical approach in this study, had two elements: the variance calculation of the model output using the error propagation equation and the identification of the significant input variables using the CTV analysis. The sensitivity analysis results led us to focus on the identified input variables, whose errors were reduced through iteration of data collection, and to simplify the mathematical model by omitting the iteration of data collection for insignificant input variables, as in studies dealing with a single environmental issue, such as carbon footprint.
The reason for coining the term "analytical approach" here was that no stochastic simulations were included in the variance calculation step for the sensitivity analysis. If time and resources are not constraints, then one can use the stochastic approach from the beginning without going through the error propagation equation step and obtain the uncertainty of the model output based on the variance-based approach. However, the iteration of data collection process expenses were a major hurdle in this case.
After the analytical approach, the model underwent stochastic simulation to calculate the uncertainty of the model output. This required estimating the probability density function (PDF) of the input variables, its data ranges, and generating the model output using the Monte Carlo simulation method. From the model output values, the interval estimate of the model output was calculated at a 95% confidence level.
In addition to the sensitivity analysis, we also included a data quality indicator (DQI) concept to reduce the number of input variables for iteration of data collection. Temporal, geographical, and technological characteristics of the data influenced the mean and variance of each variable, and these characteristics were used as the DQI evaluation factors [23]. High data quality meant that the accuracy and precision of the collected data for the variable was less likely to be reduced. We assumed the CTV analysis identified input variables that contributed considerably to the model output. If the data of the input variable had inherently high variance (a representative example of an input variable with inherently high variance was ingredient feeds and roughage feeds that were applied alternatively according to price fluctuation, such as soy bean, alfalfa hay, grass hay, etc.), then, there was no room for reducing the sample variance of that input variable, even if we collected more data points with higher precision. Therefore, the purpose of introducing the DQI concept was to add one more filter before commencing iteration of data collection.
For assessing the environmental impact of the GHG emissions from a dairy cow farm system, a mathematical model was formulated. The model was defined by equations, input variables, and relevant coefficients.
This study addressed only uncertainty in GHG emission from on-farm data. Figure 1 shows the concept of on-farm and off-farm processes. In this study, the upstream processes in off-farm, such as feedstuff cultivation, energy, and utility production processes, were excluded from active data collection, because activity data down to the cultivation of the feedstuff could not be obtained under the research conditions. The GHG emissions for upstream processes were calculated using the pre-established LCI database, and the uncertainty of the LCI database was not considered in this study. The functional unit was set to 1 kg of Fat-Protein Corrected Milk (FPCM).
The greenhouse gas (GHG) emission model, in general, was expressed as the linear function shown in Equation (1):

z = Σ_i a_i·X_i (1)

where: z = GHG model output, g CO2-eq/fu; a_i = GHG emission factor, g CO2-eq/g of the ith substance; X_i = mass (energy) of the ith substance, g (J); fu = functional unit (1 kg of FPCM).
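As a minimal sketch, the linear model in Equation (1) can be evaluated directly once emission factors and activity data are available. The values below are illustrative placeholders, not data from the case study:

```python
# Illustrative evaluation of Equation (1): z = sum_i a_i * X_i.
# Emission factors a_i (g CO2-eq/g) and activity data X_i (g/fu) are
# hypothetical placeholders, not values from the dairy farm study.
emission_factors = {"mixed_feed": 0.65, "electricity": 0.49, "diesel": 3.1}
activity_data = {"mixed_feed": 1200.0, "electricity": 300.0, "diesel": 50.0}

def ghg_output(a, x):
    """Model output z in g CO2-eq per functional unit (1 kg FPCM)."""
    return sum(a[k] * x[k] for k in a)

z = ghg_output(emission_factors, activity_data)  # 1082.0 g CO2-eq/fu
```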
The data of the input variables, X_i, were collected and then plugged into Equation (1) to calculate the model output, z. The GHG emission factor, a_i, comes from the LCI database [24]. The variance, mean, and coefficient of variation (CV) of z were calculated to assess the uncertainty of the model output. It was often desirable to use the contribution to variance (CTV) to judge the degree of dispersion of the data of a random variable and the output of a model [11,25].
The data of the input variables, such as data of the processes and activities (i.e., the amount of feed intake, energy use, number of heads, etc.) were subject to a variety of errors, including completeness, representativeness, and boundaries, such as temporal, geographical, and technological. In other words, the data quality of the input variables is questionable [26]. Without considering the data quality of the input data, the GHG model output would suffer from the errors of the input variables.
However, there were instances where an input variable had an inherently high variance in nature, such as soy bean, alfalfa hay, grass hay, etc. In this case, iteration of data collection did not reduce the variance of the model output. Therefore, such input variables were not subjected to iteration of data collection.
Once particular input variables exhibited high CTV with poor data quality, those input variables underwent iteration of data collection. The new mean, variance, and CV of the model output were then calculated using the data from the iteration of data collection. The entire procedure was repeated until all the conditions specified in Figure 2 were met. This paper adopted the global sensitivity analysis method for identifying input variables that influence the model output. The expression for the variance of the model output, z, could be obtained using the first-order Taylor series approximation and the definition of the error propagation equation [14,[27][28][29]. The resulting expression was termed the error propagation equation and is shown in Equation (2):

σ²_z = Σ_i σ²_{X_i}·(∂z/∂X_i)² + 2·Σ_i Σ_{j>i} (∂z/∂X_i)·(∂z/∂X_j)·Cov(X_i, X_j) (2)

where σ²_z is the variance of the model output and σ²_{X_i} is the variance of the input variable X_i. Equation (2) was a generic equation for the quantification of the variance of the model output as a function of the variances of the input variables and their sensitivity coefficients. Furthermore, GHG emission was chosen as the model output to apply the identification methodology proposed in this study; in this sense, the identification methodology could be applied to any other impact category.
In most LCI databases and LCIA studies, covariance can be assumed to be negligible [30]. Therefore, we used Equation (2) with the covariance term set to zero. Equation (2) shows that the variances of the input variables, weighted by the square of their partial derivatives, determine σ²_z. The variance of the input variables caused the uncertainty of the model output. The error propagation equation indicates that the uncertainty of each input variable propagated through the model and resulted in model output uncertainty.
Equation (2) also shows that the value of σ²_{X_i}·(∂z/∂X_i)² represents the degree of contribution of the input variable X_i to the variance of the model output, σ²_z. The CTV of X_i to σ²_z, as expressed in Equation (3), should be used as the criterion for identifying the significant input variables [14]:

CTV_i = σ²_{X_i}·(∂z/∂X_i)² / σ²_z (3)

A variable X_i identified with high CTV did not automatically become the target for iteration of data collection; one needed to investigate the data quality of the identified input variable.
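For the linear model of Equation (1), the sensitivity coefficient ∂z/∂X_i is simply the emission factor a_i, so the error propagation and CTV calculations reduce to a few array operations. A sketch with hypothetical values:

```python
import numpy as np

# Error propagation (Equation (2), covariance neglected) and CTV
# (Equation (3)) for z = sum_i a_i * X_i, where dz/dX_i = a_i.
# Emission factors and input variances are hypothetical placeholders.
a = np.array([0.65, 0.49, 3.1])          # emission factors (dz/dX_i)
var_x = np.array([900.0, 400.0, 25.0])   # sample variances of the X_i

var_z = np.sum(var_x * a**2)             # sigma^2_z, Equation (2)
ctv = var_x * a**2 / var_z               # CTV_i, Equation (3); sums to 1

significant = ctv > 0.01                 # Step 4 cutoff: CTV above 1%
```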
When a set of data points of a certain input variable had a high σ²_{X_i}·(∂z/∂X_i)², but its data quality indicator (DQI) was good, then there was no need to iterate data collection of this input variable. This was because the DQI of this dataset indicated that there was no room for the reduction of its variance, and the high variance must be its inherent attribute.
It should be pointed out that no direct relationship existed between the DQI and the variance of an input variable. However, in the case of an input variable with a poor DQI, there was a possibility of high variability of the data because of the following data quality areas: time-related coverage, geographical coverage, technology coverage, precision, completeness, representativeness, consistency, and sources of data. If this was the case, the variance of the input variable was reduced through iteration of data collection, and as a result, its DQI could be improved. As such, choosing an input variable with a poor DQI as well as a high CTV for iteration of data collection would be an effective means of reducing the variance of the model output. Figure 2 shows the step-by-step procedure for identifying the input variables that contributed considerably to the uncertainty of the model output, together with the uncertainty quantification procedure for the model output.
The description of the steps in Figure 2 together with rationale for each step are shown below.
Step 1 collect initial data for calculating the GHG emission.
Before any data collection activity began, a target farm had to be chosen based on the random sampling technique. Here, we used the stratum sampling method [20]. The stratum used in this study was a dairy cow farm that fed its cow using the standard feed mixture. Since one of the objectives of the study was to validate the identification methodology for reducing the uncertainty of the model output, only one typical Korean dairy farm was chosen.
There were two principles in the data collection of the input variables adopted in this study. The first principle was that onsite data would be collected as much as possible. The second principle was that the data collection period would span at least one year to reflect the seasonal variations of the dairy cow farm. Data sources in this study included the invoices of the feedstuff, materials, and energy, and the growth record of the number of heads on the farm. Table 1 lists the input and output variables with activity data from the target farm. Table 2 lists GHG emission factors from the LCI databases developed in a research project [24]. Emission factors related to crops included the effect of volatilization and leaching from fertilizers applied to the farm land. According to the error propagation equation in Equation (2), σ²_{X_i} and (∂z/∂X_i)² represent the variance of the input variable X_i and the square of the GHG emission factor of the input variable, respectively, which are shown in Tables 1 and 2.
Step 2 calculate the mean, variance, and CV of the model output.
The mean of the model output is z̄ = Σ_i a_i·X̄_i [27,29], where X̄_i is the average of X_i. The variance of the model output is σ²_z = Σ_i σ²_{X_i}·(∂z/∂X_i)², and the CV of the model output is σ_z/z̄.
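Step 2 then amounts to plugging the sample means and variances into these expressions. A sketch, again with hypothetical numbers rather than the farm data:

```python
import numpy as np

# Step 2 sketch: mean, standard deviation, and CV of the model output
# for the linear model. All numbers are hypothetical placeholders.
a = np.array([0.65, 0.49, 3.1])          # emission factors
x_bar = np.array([1200.0, 300.0, 50.0])  # sample means of the X_i
var_x = np.array([900.0, 400.0, 25.0])   # sample variances of the X_i

z_bar = np.sum(a * x_bar)                # mean of z
sigma_z = np.sqrt(np.sum(var_x * a**2))  # standard deviation of z
cv = sigma_z / z_bar                     # coefficient of variation
```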
Step 3 calculate the CTV of each input variable using Equation (3).
Step 4 select the input variables (X_i) whose CTV is more than 1%.
Step 5 calculate the data quality rating (DQR) for the chosen X i .
For the chosen input variable with a high CTV from Step 4, the data points of the input variable were assessed for their data quality using the pedigree-matrix data quality indicator (DQI) [32]. Herein, the DQR value could be a useful criterion in judging the data quality of the input variables.
Equation (4) shows the DQR calculation used in this study as a function of six DQIs, which included technological, geographical, and time-related representativeness, completeness, precision/uncertainty, and methodological appropriateness and consistency [9]:

DQR = (TeR + GR + TiR + C + P + M)/6 (4)

where: DQR = data quality rating of the data points; TeR = technological representativeness; GR = geographical representativeness; TiR = time-related representativeness; C = completeness; P = precision (data measurement method); M = methodological appropriateness and consistency.
DQR was calculated using Equation (4) and site-specific data. The site-specific data of the input variables were processes and activities of a product.
This study used the previous report on the overall data quality rating in terms of DQR and its associated data quality level (i.e., ≤1.6, 1.6 to 2.0, 2.0 to 3.0, 3 to 4.0, and >4 represent "excellent quality", "very good quality", "good quality", "fair quality", and "poor quality", respectively [32]). Therefore, the DQR value <3 was envisaged as good quality data and the input variable with the DQR value >3 was selected for iteration of data collection. Table 3 shows the criteria of the data quality assessment items used in this study [32,33]. The DQR value of each input variable was obtained using these criteria.
Step 6 iteration of data collection for X i identified from Step 5.
Iteration of data collection for the input variable X i . The site-specific data from the process and activity of a product were collected, bearing in mind that more accurate data needed to be collected. The completeness of the data could be improved with more data with a longer time span, as such, the number of data points collected should be increased if possible.
Once iteration of data collection for the chosen input variables (those with a high CTV and a poor DQI) was completed, Step 1 was repeated for the calculation of the σ²_z values. This procedure was repeated until all the conditions specified in Figure 2 were met. These conditions included that input variables with a CTV of less than 1% should be excluded from iteration of data collection, because their contribution to the model output was negligible. Input variables whose CTV was greater than 1% should be further tested for data quality. A DQR value less than 3 indicated that there may be no room for reducing the variance of the input variable, because its data quality was judged to be reasonably high.
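Taken together, these conditions amount to a simple filter applied each cycle: only variables with CTV above 1% and DQR above 3 are recollected. The records below are hypothetical:

```python
# Sketch of the per-cycle selection logic of Figure 2. CTVs are fractions
# of the output variance; all records are hypothetical.
variables = [
    {"name": "mixed_feed", "ctv": 0.42, "dqr": 3.4},
    {"name": "electricity", "ctv": 0.08, "dqr": 2.1},
    {"name": "bedding", "ctv": 0.004, "dqr": 3.8},
]

targets = [v["name"] for v in variables
           if v["ctv"] > 0.01 and v["dqr"] > 3.0]
# Only "mixed_feed" qualifies: "electricity" already has good data quality
# and "bedding" contributes less than 1% of the variance.
```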
The next step was to quantify the uncertainty of the model output by first estimating the probability density function (PDF) of each input variable as described in Step 7.
During the iteration of data collection, a total of 72 data points were collected for each input variable, spanning monthly data over the six-year period in this study.

Step 7 estimate the PDF of X_i.
Several methods could have been used for estimating the PDF, including the Chi-square test [27,35,36], the Kolmogorov-Smirnov test (K-S test) [27,37,38], and the Anderson-Darling test [36,39], among others. The K-S test is widely used for testing the PDF of a set of data points of a random variable, and as such, it was used in this research. The K-S test was based on the empirical cumulative distribution function (ECDF). The method compared two cumulative distributions: the ECDF and the assumed CDF for the dataset of the random variable. The maximum difference between the two CDFs, D_n, was tested against the critical value of the D_n distribution. D_n was a statistic defined in Equation (5):

D_n = max_x |F_X(x) − S_n(x)| (5)

where: F_X(x) = theoretical CDF based on the assumed PDF; S_n(x) = ECDF based on the experimental dataset. Let x_1, . . . , x_n be an ordered sample with x_1 ≤ . . . ≤ x_n, and define S_n(x) as in Equation (6):

S_n(x) = 0 for x < x_1; S_n(x) = i/n for x_i ≤ x < x_{i+1}; S_n(x) = 1 for x ≥ x_n (6)

The distribution of D_n can be found in the Kolmogorov-Smirnov table [37]. If D_{n,α} was the critical value from the table at an error level of α, then P(D_n ≤ D_{n,α}) = 1 − α. D_n was used to test the hypothesis that the experimental dataset of a random variable X came from a population with a specific cumulative distribution function F_X(x). If D_n ≤ D_{n,α}, then the experimental dataset was a good fit with F_X(x) [38].
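Equations (5) and (6) can be computed directly from the order statistics; the helper below is an illustrative sketch, not the exact implementation used in the study:

```python
import numpy as np

# K-S statistic of Equation (5): the largest gap between the empirical
# CDF S_n(x) of Equation (6) and an assumed theoretical CDF F_X(x).
def ks_statistic(sample, cdf):
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f = cdf(x)
    # The supremum over the step function occurs just before or at each x_i.
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)

# Example: test a small sample against the uniform CDF on [0, 1]; the
# resulting D_n is compared with the critical value D_{n, alpha} from
# the K-S table, and the fit is accepted when D_n <= D_{n, alpha}.
sample = [0.1, 0.35, 0.4, 0.62, 0.9]
dn = ks_statistic(sample, lambda x: np.clip(x, 0.0, 1.0))
```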
Step 8 run the Monte Carlo simulation (MCS) and find z̄, σ²_z, CV, and the 95% confidence interval (CI) of z.
There were two different methods for assessing the uncertainty, which were an analytical approach, such as the error propagation method, and a stochastic approach, such as the MCS method. The error propagation method for the model constructed in Equation (1), Section 2 only required the emission factor and variance of the input variables for the calculation of the variance of the model output. One shortcoming of this method originated from its deterministic nature, as such, the variance estimated for an input variable based on a limited number of data may not have represented the true variance of the input variable. This shortcoming could be overcome by incorporating the PDF of the input variables and generating many data of the input variable. This led to the use of a stochastic approach, such as the MCS. A unique feature of this paper is that it combines both approaches to estimate the uncertainty of the model output.
MCS, a stochastic method for the estimation of the model output uncertainty, provided a method for generating data points for each input variable and calculating the output result using the model and the generated input data points [27]. Repeating the procedure many times (e.g., n = 10,000) resulted in the PDF of the model output. The mean, variance, and confidence interval of the model output were then computed.
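A minimal MCS for the linear GHG model can be sketched as below; the fitted distributions and their parameters are placeholders, not the case-study PDFs from Table 7:

```python
import numpy as np

# MCS sketch: draw each input from its fitted PDF, evaluate the linear
# model, and summarize mean, SD, CV, and the 95% confidence interval.
rng = np.random.default_rng(42)
n = 10_000                                       # iterations per run

a = np.array([0.65, 0.49, 3.1])                  # emission factors
x = np.column_stack([
    rng.normal(1200.0, 30.0, n),                 # normal input
    rng.normal(300.0, 20.0, n),                  # normal input
    rng.uniform(40.0, 60.0, n),                  # uniform input
])
z = x @ a                                        # model output samples

mean_z = z.mean()
sd_z = z.std(ddof=1)
cv = sd_z / mean_z
ci_low, ci_high = np.percentile(z, [2.5, 97.5])  # 95% CI of z
```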
Error propagation was used to identify the target variables for iteration of data collection, while MCS made it possible to calculate results reflecting the statistical correlation between variables.
MCS was performed using the statistical parameter values, and the estimated PDF from Step 7. The results of Step 7 of this study, shown in Table 7, together with the GHG emission factors are listed in Table 3. The number of iterations in each of the MCS runs was 10,000.

Results and Discussion
A case study was performed to assess the applicability of the proposed uncertainty analysis method. A dairy cow farm located in Korea was selected for the case study [24]. The functional unit of the model output was one kg of FPCM. The mean, standard deviation, and cumulative CTV of the GHG emission of the input variables based on the initial data are listed in Table 4. The mean, standard deviation, and coefficient of variation of the model output were 1.18 kg CO2-eq/kg FPCM, 1.27 × 10⁻¹ kg CO2-eq/kg FPCM, and 10.77%, respectively, also listed in Table 4.

Table 4. Mean, standard deviation, and cumulative contribution to variance (CTV) of the GHG emission of the input variables based on the initial data.

Table 4 shows that the CTV values of 10 input variables, ranging from the mixed feed for lactating cows to maize silage, were greater than 1%. Thus, a total of 10 input variables were chosen for the calculation of the DQR value. The DQR values are listed in Table 5 with the values of the six data quality indicators. Assessing the data quality indicators of the input variables followed the procedure outlined below. In the case of the mixed feed for the lactating cows, data for straw, oat, soybean, and maize silage came from the year 2005. The data were 10 years old at the time of this study, implemented in 2014. Old data such as these suffer from poor temporal representativeness. In addition, advancements in feeding technology have made these data less representative from a technological representativeness aspect. Most data came from the invoices of the feedstuff, such that the accuracy of the data was also questionable. In addition, 12 monthly data points in a one-year period were judged inadequate for data completeness.

Electricity consumption data came from the invoices of the power company. This indicates that the electricity consumption data were inadequate from the temporal and technological representativeness as well as the completeness aspects. The same problem existed for the diesel consumption data.
The enteric fermentation data of the growth stage of a cow and the number of heads were collected in a different manner for different growth stages of a cow. The lactating cow data had the same shortcomings as those of the mixed feed. Meanwhile, data for the growing heifers and dry cows were recorded regularly and had a significantly larger number of data points. As such, they were considered to have better data quality than those of the other input variables.
Analysis of the collected data in accordance with the approach given above allows us to assign a DQI value to each category of the data for a given input variable. The DQI value assignment criteria listed in the literature were used [32,34].
Data for a six-year period for the eight chosen input variables came from iterated data collection from invoices and regular records, and the iteration for the calculation of CTV was performed (termed the 1st iteration). The mean, standard deviation, and coefficient of variation of the model output from the recollected data (1st iteration) were 1.09 kg CO2-eq/kg FPCM, 6.06 × 10⁻² kg CO2-eq/kg FPCM, and 5.56%, respectively, as listed in Table 6.

Table 6. Mean, standard deviation, and cumulative CTV of the GHG emission model output based on iteration of data collection (1st iteration).

The mean, standard deviation, and CTV of the recollected data of the input variables are shown in Table 6. Table 6 shows that there were a total of 15 input variables with CTV values greater than 1%. The DQR values were less than 3 for all corresponding input variables, and thus there was no need for further data collection. Estimating the PDFs of the input variables then followed for the run of the MCS. The PDFs of all the input variables tested in the case study using the K-S test are listed in Table 7. The probability distribution of the collected data, although it may not follow the normal distribution, can be estimated from the K-S test. The K-S test allows estimation of even a skewed distribution, such as the lognormal, gamma, and beta, among others [37,40]. However, the result of the K-S test, explained in Step 7 of the Methods section, shows the PDFs of each input variable as normal and uniform distributions. Figure 3 shows the PDF of the model output based on the initial data and the recollected data. Comparing the interval length (upper bound minus lower bound) of the initial dataset with that of the recollected dataset showed that the relative reduction of the interval length was 51.8%, whereas the relative reduction in the CV value was 47.6%.
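The relative reductions quoted above follow from a one-line comparison of the before and after statistics. As a check, using the error-propagation CVs reported earlier (10.77% and 5.56%) gives a figure close to the MCS-based 47.6%:

```python
# Fractional reduction of an uncertainty metric after recollection.
def relative_reduction(before, after):
    return (before - after) / before

# CVs (percent) of the model output before and after the 1st iteration.
cv_reduction = relative_reduction(10.77, 5.56)   # roughly 0.48
```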
These results clearly indicate that the uncertainty of the model output was reduced significantly by the iteration of data collection for the problematic input variables. Table 8 shows the MCS results for the total GHG emission for 1 kg of dairy cow FPCM. Several uncertainty analysis studies of the GHG emissions from dairy products have used the MCS method [40][41][42]. However, there are differences in the uncertainty analysis methodologies between this study and others. The previous studies focused on estimating the uncertainty of the GHG emission itself from dairy cow milk and on estimating the PDF of the activity data from the literature or from assumptions made by experts. This study, however, identified the sources of the uncertainty, namely, the input variables contributing considerably to the uncertainty of the model output. On top of this, corrective measures were applied to reduce the error of the identified input variables by recollecting the data of the significant input variables. The result is the decreased uncertainty in the model output.

The identification of the input variables contributing considerably to the model output uncertainty was based on the calculation of the CTV of the input variables to the model output. To ensure proper selection of the input variables for iteration of data collection, only the input variables exceeding a certain DQR value were selected. The assignment of values to the elements of the pedigree matrix based on qualitative criteria is quite subjective, which is one of the limitations of this study; a remedy would be to collect more detailed data or to run additional iterations.
However, the rationale for including this qualitative approach was to reduce the effort required for collecting detailed data. As delineated in Step 5 in Figure 2, the DQR calculation served as a screening step for identifying input variables for iteration of data collection (detailed data). Clearly, the DQR approach used in this study should be improved, or replaced entirely when other means are available for choosing the input variables for iteration of data collection.
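The DQR screening step described above can be sketched as follows. Each input variable is scored on the five pedigree-matrix indicators (reliability, completeness, and temporal, geographical, and technological correlation; 1 = best, 5 = worst), and the DQR is aggregated here as their arithmetic mean. The aggregation rule, the threshold of 3, and the scores are illustrative assumptions, not the study's actual assignments.

```python
# Hypothetical DQR screening from pedigree-matrix indicator scores.
def dqr(scores):
    """Aggregate the five pedigree-matrix indicator scores into one DQR."""
    return sum(scores) / len(scores)

pedigree = {
    # variable:      [reliability, completeness, temporal, geographical, technological]
    "feed_intake":   [2, 1, 2, 1, 2],
    "energy_use":    [4, 3, 4, 2, 3],
    "manure_mgmt":   [3, 4, 3, 3, 4],
}

# Variables with DQR >= 3 are flagged for iteration of data collection
flagged = [name for name, scores in pedigree.items() if dqr(scores) >= 3]
print(flagged)  # energy_use (DQR 3.2) and manure_mgmt (DQR 3.4) exceed the threshold
```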
In addition, the K-S test was applied to estimate the PDF of the dataset of the selected input variables. A comparison of the uncertainty analysis method used in this study and the methods used by others is shown in Table 9.
According to Table 9, the key input variables for the uncertainty of the GHG emission are the emission factors for manure deposited on pasture, feed intake, EF3, EF_CH4, energy use, and enteric CH4 emission. The French study is difficult to compare directly with the others because it examined the error of the emission factors used to calculate GHG emissions directly from the farm. Nevertheless, it showed that the calculation of emissions from manure treatment contributes to the uncertainty of the GHG emission results: the CTV of the GHG emission from manure was 67% and 84% for conventional and organic farms, respectively. The uncertainty contribution of manure management emissions appears to stem from the climate characteristics of the regions where the dairy farms operate, which determine the emission factors applied in calculating GHG emissions from manure management.
Energy use was identified as a key input variable in both the Korean and Swedish cases. Both Korea and Sweden have four distinct seasons, and the results seem to reflect these seasonal variations. Table 9 shows that the CV value from the French study was lower than that from this study. The French study investigated 47 dairy farms, including conventional and organic farms, with data collected monthly over a three-year period, giving 1692 data points (n = 47 × 12 × 3 = 1692).
In this study, each input variable had 72 data points, collected monthly from one farm over six years (n = 1 × 12 × 6 = 72). Differences in the number of data points may affect the variance of the input variables, which would explain why the CV value was higher in this study than in the French study. Meanwhile, the Swedish study, with more than 10,000 data points, reported a CV value higher than that of the French study. This may indicate that the CV value alone is not a reliable parameter for judging the reliability of uncertainty analysis results. Several uncertainty analysis studies in the LCA field have employed the stochastic approach based on assumed PDFs or expert judgment [40–42], while others, using the analytical approach, ignored the PDFs of the input variables completely [43,44]. A lack of PDF estimation, or an assumed PDF of the input variables, may lead to a poorer estimation of the variance of the input variables, which would adversely affect the reliability of the model's uncertainty results.
Taking the above into consideration, the methodology proposed in this study may be able to generate uncertainty results from a limited number of data points. This is partly because the estimation of the PDFs of the input variables was done in a systematic manner, which may allow the variance of the input variables to be estimated more reasonably, leading to a more reliable uncertainty analysis of the model output.
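The systematic PDF estimation referred to above can be sketched with the K-S test as implemented in SciPy: each candidate distribution is fitted to the data points of an input variable, and the fit with the best K-S agreement is retained. The sample data (72 hypothetical monthly points) and the candidate set are illustrative assumptions.

```python
# Sketch: selecting a best-fit PDF for one input variable via the K-S test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical dataset: 72 monthly data points of one input variable
data = rng.normal(loc=1.09, scale=0.06, size=72)

candidates = {
    "norm": stats.norm,
    "lognorm": stats.lognorm,
    "gamma": stats.gamma,
    "uniform": stats.uniform,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)  # maximum-likelihood parameter fit
    ks_stat, p_value = stats.kstest(data, name, args=params)
    results[name] = (ks_stat, p_value)

# Retain the distribution with the highest K-S p-value (closest agreement)
best = max(results, key=lambda name: results[name][1])
print(best, results[best])
```

Note that fitting the parameters from the same sample biases the K-S p-values upward; for a stricter selection, a test that accounts for estimated parameters (e.g., a parametric bootstrap) could be substituted.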
The contribution analysis for GHG emission is not a suitable method for finding the key issues behind the uncertainty of the results [7]. Therefore, identifying the significant input variables that require iteration of data collection through the CTV analysis and the DQR is a reasonable choice.
The CTV analysis alone cannot determine the selection of input variables for iteration of data collection, because the variance of some input variables cannot be reduced: it is an innate attribute of the data. Data quality can be inherently good (e.g., DQR < 3) even when the CTV is high, and in this case, iteration of data collection cannot improve the data quality. Therefore, the data quality analysis of the input variables identified by the CTV analysis is an essential element in reducing the uncertainty of the model output.
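The combined selection rule described above can be expressed compactly: an input variable is recollected only if its CTV is high (here, above 1%, as in the case study) and its DQR indicates room for improvement (here, at or above 3). The variable names, CTV values, and DQR values below are illustrative, not the study's data.

```python
# Sketch of the combined CTV + DQR selection rule for iteration of data collection.
def select_for_recollection(variables, ctv_threshold=1.0, dqr_threshold=3.0):
    """Return the variables whose CTV exceeds the threshold AND whose DQR
    indicates that iteration of data collection can still reduce variance."""
    return [name for name, (ctv, dqr) in variables.items()
            if ctv > ctv_threshold and dqr >= dqr_threshold]

variables = {          # name: (CTV in %, DQR)
    "milk_yield":  (35.0, 1.8),  # high CTV but good data quality: skip
    "energy_use":  (12.0, 3.4),  # high CTV and poor data quality: recollect
    "water_use":   (0.4,  4.1),  # negligible CTV: skip
}

print(select_for_recollection(variables))  # ['energy_use']
```

The first entry illustrates the point made above: a high CTV with a good DQI means the variance is an inherent attribute of the data, so recollection is skipped.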
However, the proposed uncertainty analysis method has shortcomings and disadvantages. These include the use of the pedigree matrix for assessing data quality and the omission of the errors in the emission factors from the uncertainty results of the model output. Both the variance of the input variables and their emission factors influence the uncertainty of the model output in LCA [35–37]. A matrix-based approach considering both the variance of an input variable and its emission factor would be a viable alternative to the emission factor problem encountered in this study [13,29].
The Korean emission factors and LCI database, however, have many shortcomings from a statistical standpoint; such emission factors were deemed unreliable and therefore not considered in this study. Accordingly, the matrix-based approach for LCA was not used.

Conclusions
An analytical and a stochastic approach were used consecutively in the uncertainty analysis of the GHG emission model output: the error propagation equation for the analytical approach and the MCS method for the stochastic approach.
An analytical approach can be an effective means for selecting input variables for the uncertainty analysis. A stochastic approach can prevent the risk of incorrect estimation of the uncertainty of the model output via PDF estimation and Monte Carlo simulation. This work showed that eliminating unnecessary iteration of data collection via the CTV analysis combined with the DQR calculation can increase the efficiency of the uncertainty analysis.
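The analytical step referred to above can be sketched as follows: for a linear model with independent inputs, the first-order error propagation equation gives the output variance as the sum of the squared sensitivity-weighted input standard deviations, and each input's CTV is its share of that total. The sensitivity coefficients and standard deviations below are hypothetical.

```python
# Sketch of CTV via first-order error propagation for Y = sum(a_i * x_i)
# with independent inputs: Var(Y) = sum((a_i * sigma_i)^2).
import numpy as np

a = np.array([0.001, 0.002, 0.0015])  # hypothetical sensitivity coefficients dY/dx_i
sigma = np.array([25.0, 11.5, 30.0])  # hypothetical input standard deviations

var_terms = (a * sigma) ** 2           # each input's contribution to Var(Y)
ctv = 100.0 * var_terms / var_terms.sum()  # contribution to variance, in %
print(ctv.round(1))
```

Ranking the inputs by CTV is what identifies the variables worth targeting for iteration of data collection, before any MCS run is needed.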
Application of the proposed procedure to a dairy cow milk farm showed that the uncertainty of the model output was reduced by the iteration of data collection of the input variables with a high CTV. This indicated that CTV analysis can be used to identify the input variables contributing considerably to the uncertainty of the model output. Investigating the data quality further reduced the number of input variables for iteration of data collection.
The use of the K-S test improved the estimation of the PDF of the datasets. A stochastic approach enabled more accurate quantification of the GHG emissions together with its uncertainty.
Finally, this study suggested an effective way to reduce the uncertainty of individual carbon footprint results by performing a series of steps that reduce the uncertainty of the activity data (input variables). The DQI of the significant input variables identified through the contribution to variance (CTV) analysis was used as an index to evaluate their variance reduction potential. The uncertainty of the individual carbon footprint results was reduced through the iteration of data collection for the input variables that could actually be improved.
The results of this study are expected to be useful as a way to manage the uncertainty of the results needed to ensure comparability of future environmental footprint results.
However, the proposed uncertainty analysis method has shortcomings and disadvantages. These include not accounting for the errors of the emission factors and upstream data in the uncertainty results, and the use of the qualitative pedigree matrix for assessing data quality. Future studies should address the contribution of the errors in the emission factors to the uncertainty of the model output. In addition, the qualitative data quality analysis should be eliminated in future uncertainty analyses.