Abstract
The difference-from-control (DFC) test is a sensory discrimination method applicable to sensory evaluation in areas such as process optimization and quality assessment for foods. Every sensory discrimination method, including the DFC, needs a Thurstonian model, because the model provides a useful sensory measure, the Thurstonian discriminal distance δ (or d′), which is theoretically independent of the methods or scales used for its estimation. This paper derives the Thurstonian model for the DFC test, and the estimators of its parameters, based on the folded normal distribution. Statistical testing, difference-testing power, and the sample sizes needed for the DFC test are also discussed. R code for estimating and testing the parameters of the DFC model is developed, used, and provided in the paper.
1. Introduction
The difference-from-control (DFC) test is a sensory discrimination method. It can be used to determine the degree of difference (if any) between one or more test samples and a control sample. The method is applicable to sensory evaluation in some areas, including process optimization and quality assessment for foods. The ISO standard (ISO [1]) describes the application of sensory analysis in quality control (QC) using the DFC test. For more about the DFC method, see, e.g., Muñoz et al. [2]; Costell [3]; Meilgaard et al. [4]; Kemp et al. [5]; Lawless and Heymann [6]; and Whelan [7].
In the DFC test, assessors are provided with an identified control sample, followed by one or more test samples. Blind controls are also included within this test. The assessors evaluate the identified control sample and the test sample(s) including blind controls, and scale how different they perceive the test sample(s) including blind controls to be from the identified control sample. There are different types of scales used for difference-from-control ratings. The rating can be done on a line or category scale. The scale may be a 10-point numerical scale, or a 6-point verbal scale, or a line scale with anchors, e.g., 0 and 100. The scales will range from 0 = “no difference” to 9 (or 5, or 100) = “extreme difference”. An important characteristic of the DFC test data is that all the data are positive numbers or zero, which represent the sensory intensity or distance between the test sample(s), including the blind controls, and the identified control sample, regardless of the direction of the difference.
Although the DFC test has been used in various laboratory studies, to the best of the authors’ knowledge, there are few, if any, discussions of a Thurstonian model for the DFC test in the sensory literature. Every sensory discrimination method, including the DFC, needs a Thurstonian model, because the model provides a useful sensory measure, the Thurstonian discriminal distance δ (or d′), which is theoretically independent of the methods or scales used for its estimation (ASTM [8]).
Bradley [9] discussed Thurstonian models for some discrimination methods, including the triangle, the duo–trio, and the DFC, in a memorandum prepared for the General Foods Corporation. The results were published in the statistical literature (Bradley [10]). Two indices, the Thurstonian discriminal distance δ and a scaled DFC measure Δ, were proposed for the DFC method. However, Bradley [10] did not state clearly that the Thurstonian model is based on the folded normal distribution, nor how to estimate and test the indices of the model from DFC test data. Bradley [10] mentioned that the methods of analysis for the DFC test data had not been entirely satisfactory. Although the DFC method has since been discussed and applied further, its Thurstonian model has been neither adequately discussed nor widely used in the sensory literature in the more than 60 years since that paper was published.
It should be noted that the DFC test is also a method to determine the degree of difference between two samples. Hence, the DFC test can be regarded as another variant of the degree of difference (DOD) method. The conventional DOD includes three variants: the ratings of the A-Not A, the ratings of the A-Not A with reminder (A-Not AR), and the ratings of the Same-Different methods, as they are commonly named in, e.g., Aust et al. [11]; Bi [12]; Bi et al. [13]; Ennis and Rousseau [14]; Ennis and Christenson [15]; Ennis [16] (Section 8.4.1); and Christensen, et al. [17]. There are three types of Thurstonian models for the three variants of the conventional DOD method. They are the Thurstonian models for the ratings of the A-Not A, the ratings of the A-Not AR, and the ratings of the Same-Different methods, which were discussed in Bi et al. [13] and Bi [18] (Sections 3.3–3.5). The Thurstonian model for the DOD in Ennis and Rousseau [14], Ennis and Christenson [15], and Christensen, et al. [17] is in fact only for one variant of the DOD, i.e., the ratings of the Same-Different method.
The main objective of this paper is to derive a Thurstonian model for the DFC method based on the folded normal distribution. Estimation (maximum likelihood and nonparametric), statistical tests for the parameters of the model (difference testing and similarity/equivalence testing), and the difference-testing power and sample size needed for the DFC test are also discussed. The corresponding R code is developed, used in the paper, and provided in the Supplementary Materials.
2. Materials and Methods
2.1. Folded Normal Distribution for Perception of Difference Between Two Samples in the DFC Test
Let Y₁ represent the perception of a test sample, where Y₁ follows a normal distribution with mean μ₁ and variance σ₁², i.e., Y₁ ~ N(μ₁, σ₁²). Let Y₂ represent the perception of a control sample, where Y₂ follows a normal distribution with mean μ₂ and variance σ₂², i.e., Y₂ ~ N(μ₂, σ₂²). Then, for independent Y₁ and Y₂, Y = Y₁ − Y₂ follows a normal distribution with parameters μ_d and σ_d², i.e., Y ~ N(μ_d, σ_d²), where μ_d = μ₁ − μ₂ and σ_d² = σ₁² + σ₂².
According to basic statistical theory (see, e.g., Read [19]), if a random variable has a normal distribution, then its absolute value has a folded normal distribution with the same parameters. Letting X = |Y|, X follows a folded normal distribution with the same parameters μ_d and σ_d², i.e., X ~ FN(μ_d, σ_d²). In the DFC test, the assessor’s perception of the difference between the test sample and the identified control sample is the value of the random variable X, which follows a folded normal distribution. If the test sample is a blind control sample, then μ_d = 0.
In the statistical literature, Leone et al. [20] first studied the properties of the folded normal distribution and provided the probability density function (pdf), mean, and variance of a folded normal random variable X, as in Equations (1)–(3). For the folded normal distribution, see also, e.g., Elandt [21], Johnson [22], Read [19], Tsagris et al. [23], and Chatterjee and Chakraborty [24].
where Φ(·) is the cumulative distribution function of the univariate standard normal distribution. The subscript f is used here to distinguish the mean and variance of a folded normal distribution from those of a normal distribution. Equation (1) can also be expressed as Equation (4). This can be seen in, e.g., Elandt [21].
It is noted from Equation (4) that the pdf is unchanged when the mean of Y is replaced by its negative; hence, we can regard the mean of Y as a distance between μ₁ and μ₂, i.e., |μ₁ − μ₂|.
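As a reference sketch, the standard textbook forms of the folded normal density, mean, variance, and hyperbolic-cosine form of the density are written out below for X = |Y| with Y ~ N(μ_d, σ_d²). The symbols μ_d and σ_d are notation assumed for this sketch, and the expressions are standard results rather than a transcription of Equations (1)–(4), so the notation may differ.

$$
f(x) = \frac{1}{\sigma_d\sqrt{2\pi}}\left[ e^{-(x-\mu_d)^2/(2\sigma_d^2)} + e^{-(x+\mu_d)^2/(2\sigma_d^2)} \right], \qquad x \ge 0
$$

$$
\mu_f = \sigma_d\sqrt{\frac{2}{\pi}}\; e^{-\mu_d^2/(2\sigma_d^2)} + \mu_d\left[1 - 2\Phi\!\left(-\frac{\mu_d}{\sigma_d}\right)\right], \qquad
\sigma_f^2 = \mu_d^2 + \sigma_d^2 - \mu_f^2
$$

$$
f(x) = \sqrt{\frac{2}{\pi}}\;\frac{1}{\sigma_d}\; e^{-(x^2+\mu_d^2)/(2\sigma_d^2)} \cosh\!\left(\frac{\mu_d x}{\sigma_d^2}\right), \qquad x \ge 0
$$

The last form follows from the first by factoring out exp(−(x² + μ_d²)/(2σ_d²)); because cosh is an even function, the density depends on μ_d only through |μ_d|.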
It is convenient to re-parameterize the mean and standard deviation of Y, μ_d and σ_d, to δ and σ, where δ = μ_d/σ and σ_d² = 2σ². The probability density function (pdf) of the folded normal distribution in Equation (1) can be expressed as Equation (5).
Equation (5) becomes Equation (6) when δ = 0.
The cumulative distribution function (cdf) of X can be obtained as Equation (7) (see, e.g., Chatterjee & Chakraborty [24]).
where Φ_FN(·) denotes the cdf of the folded normal distribution. The cdf of the folded normal distribution with δ = 0 is given in Equation (8).
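The density and cdf described above can be checked numerically. The following is a minimal R sketch, assuming the standard folded normal forms for X = |Y| with Y ~ N(mu, sd²); the helper names `dfnorm_sketch` and `pfnorm_sketch` are hypothetical and are not part of the code supplied with the paper.

```r
# Folded normal density and cdf for X = |Y|, Y ~ N(mu, sd^2).
# dfnorm_sketch / pfnorm_sketch are hypothetical helper names.
dfnorm_sketch <- function(x, mu = 0, sd = 1) {
  ifelse(x < 0, 0, dnorm(x, mu, sd) + dnorm(x, -mu, sd))
}
pfnorm_sketch <- function(x, mu = 0, sd = 1) {
  ifelse(x < 0, 0, pnorm((x - mu) / sd) + pnorm((x + mu) / sd) - 1)
}

# Check 1: the density integrates to 1 over [0, Inf).
total_mass <- integrate(dfnorm_sketch, 0, Inf, mu = 1.5, sd = sqrt(2))$value

# Check 2: the cdf agrees with the empirical cdf of simulated |Y| values.
set.seed(1)
sim       <- abs(rnorm(1e5, mean = 1.5, sd = sqrt(2)))
emp_cdf   <- mean(sim <= 2)
model_cdf <- pfnorm_sketch(2, mu = 1.5, sd = sqrt(2))
```

Setting `mu = 0` gives the blind-control case, where the cdf reduces to 2Φ(x/sd) − 1.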
The Hit (H) and False-alarm (FA) probabilities for the DFC can be expressed as Equations (9) and (10), based on Equations (7) and (8).
where i = 1, …, k − 1, and k is the number of categories of the k-point numerical scale in the DFC test.
It is noted, interestingly, that the Hit and False-alarm probabilities for the DFC test in Equations (9) and (10) are the same as the Hit and False-alarm probabilities for the Same-Different method for the k-point scale, as discussed in Kaplan et al. [25] and Bi [26], when k = 2, and as discussed in, e.g., Bi et al. [13], when k > 2. Hence, it is demonstrated that the DFC and the ratings of the Same-Different methods share the same Thurstonian model.
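A minimal R sketch of how Hit and False-alarm probabilities trace out ROC points is given here. It assumes that each probability is the chance that the perceived difference X exceeds a criterion, that the percepts have unit variance (so the difference has standard deviation √2), and that the criteria values are purely illustrative; `hit_fa_sketch` is a hypothetical name.

```r
# Hit and False-alarm probabilities at each criterion, assuming unit-variance
# percepts so that the difference Y has standard deviation sqrt(2).
hit_fa_sketch <- function(delta, crit) {
  s   <- sqrt(2)
  cdf <- function(x, mu) pnorm((x - mu) / s) + pnorm((x + mu) / s) - 1
  data.frame(criterion = crit,
             FA  = 1 - cdf(crit, 0),       # blind control vs. identified control
             Hit = 1 - cdf(crit, delta))   # test sample vs. identified control
}

# Hypothetical criteria for a 6-point scale (k - 1 = 5 cut-offs).
roc_points <- hit_fa_sketch(delta = 1.2, crit = c(0.5, 1, 1.5, 2, 2.5))
```

With a single criterion (k = 2), the same function returns one (FA, Hit) pair, which corresponds to the two-category Same-Different case mentioned above.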
2.2. Three Parameters (Indices), δ, Δ, and AUC, Related to the DFC Model
There are three parameters (indices), δ, Δ, and AUC, which are related to the model of the DFC method. Let Y₁ and Y₂ denote the perceptions of the test sample and the control sample. The index δ (or d′) is a Thurstonian discriminal distance, where δ = |E(Y₁) − E(Y₂)|/σ, i.e., the absolute value of the difference between the expectation of Y₁ and the expectation of Y₂, in units of σ. The index Δ is a scaled DFC measure for DFC test data, where Δ = E|Y₁ − Y₂|/σ, i.e., the expectation of the absolute value of the difference between Y₁ and Y₂, in units of σ. AUC is the area under the ROC curve of the DFC determined by Equations (9) and (10).
Equation (2) can be expressed as Equation (11), which is in fact the same as the equation provided in Bradley [10].
where the mean of Y is δσ and the variance of Y is 2σ² in Equation (2).
It should be noted that, although Bradley [10] did not mention the folded normal distribution for the DFC test data, the DFC index in Equation (11) can be derived from Equation (2) based on the folded normal distribution for the DFC test data.
The relationship between δ and AUC is given in Equation (12). Irwin et al. [27] and Bi [18] (p. 48) present the Same-Different area theorem.
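A worked sketch of how both indices can be written in terms of δ follows, assuming unit-variance percepts, so that the difference for a test sample is U ~ N(δ, 2) and the difference for a blind control is V ~ N(0, 2), with U and V independent. The symbols U and V are introduced only for this sketch, and the notation may differ from Equations (11) and (12).

Substituting mean δ and standard deviation √2 into the folded normal mean gives the scaled DFC measure:

$$
\Delta = \frac{2}{\sqrt{\pi}}\, e^{-\delta^2/4} + \delta\left[2\Phi\!\left(\frac{\delta}{\sqrt{2}}\right) - 1\right]
$$

For the area, the AUC is the probability that the perceived difference for the test sample exceeds that for the blind control:

$$
\mathrm{AUC} = P\left(|U| > |V|\right) = P\left((U - V)(U + V) > 0\right)
$$

Because U and V are independent with equal variance, U − V and U + V are independent, each distributed N(δ, 4). The product is positive when both are positive or both are negative, so

$$
\mathrm{AUC} = \Phi\!\left(\frac{\delta}{2}\right)^{2} + \Phi\!\left(-\frac{\delta}{2}\right)^{2}
$$

At δ = 0 these expressions give Δ = 2/√π ≈ 1.13 and AUC = 0.5.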
Note that δ or d′, i.e., the Thurstonian discriminal distance, is theoretically independent of the methods or scales used. Hence, it provides a useful sensory measure, regardless of which sensory discrimination method is used. However, it is difficult to give a single value that represents a meaningful difference across all applications. Swets [28] indicates that if a meaningful difference in terms of an area measure (R-index) should be larger than 0.7, the corresponding distance measure (d′) should be larger than 0.74.
The R code ‘DFCindex(d)’ can be used to calculate the Δ and AUC value(s) for given δ value(s) based on Equations (11) and (12). For example, for δ from 0 to 3 in steps of 0.1, i.e., ‘DFCindex(d = seq(0,3,0.1))’, the corresponding Δ and AUC values are as in Table 1.
Table 1.
Theoretical Δ and AUC values corresponding to δ values.
Note that Δ = 1.13 when δ = 0. This suggests that the distribution of the DFC test data is skewed and that the usual tests of significance are inappropriate, as warned by Bradley [10]: “Sometimes difference-from-control tests have been misinterpreted”.
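A table of this kind can be sketched in a few lines of R. The function below is a hypothetical stand-in, not the authors' ‘DFCindex’; it assumes unit-variance percepts, the folded normal mean for the scaled DFC measure, and the Same-Different area relation AUC = Φ(δ/2)² + Φ(−δ/2)².

```r
# Hypothetical stand-in for an index calculator; not the authors' 'DFCindex'.
dfc_index_sketch <- function(d) {
  # Scaled DFC measure: mean of |Y| with mean d and sd sqrt(2) (assumed form).
  Delta <- 2 / sqrt(pi) * exp(-d^2 / 4) + d * (2 * pnorm(d / sqrt(2)) - 1)
  # Area under the ROC curve from the Same-Different area relation (assumed form).
  AUC <- pnorm(d / 2)^2 + pnorm(-d / 2)^2
  data.frame(d = d, Delta = round(Delta, 4), AUC = round(AUC, 4))
}

idx <- dfc_index_sketch(seq(0, 3, 0.1))
head(idx, 3)
```

The first row (d = 0) returns a scaled DFC measure of 2/√π, which rounds to the 1.13 noted above, and an AUC of 0.5.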
2.3. Maximum Likelihood Estimations (MLEs) of δ, Δ, and AUC from Ratings of the DFC
Equation (13) is the log-likelihood function for the estimation of the parameter δ based on the Hit (H) and False-alarm (FA) probabilities in Equations (9) and (10) for the DFC. There are a total of k parameters in the log-likelihood function in Equation (13): δ and the k − 1 criteria c_i, where i = 1, …, k − 1, and k is the number of categories of the k-point numerical scale in the DFC test. The maximum likelihood estimates are obtained by finding a local maximum of the log-likelihood L, i.e., by applying the R function ‘nlminb’ (R Core Team [29]) to −L. The R function ‘hessian’ in the R package ‘numDeriv’ Version 2016.8-1.1 (Gilbert and Varadhan [30]) can be used to estimate the covariance matrix of the estimators of δ and the k − 1 criteria for the DFC test.
In Equation (13), the data are the frequencies of ratings lower than the i-th criterion for the blind control sample and for the test sample, together with the sample sizes of the blind control sample and the test sample; the model terms are the Hit (H) and False-alarm (FA) probabilities for the DFC in Equations (9) and (10).
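The fitting procedure can be sketched in base R as follows. This is a hedged illustration, not the authors' ‘DFCmle’: it assumes a multinomial log-likelihood over the k rating categories (which may be organized differently from Equation (13)), unit-variance percepts, and a log-increment parameterization of the criteria chosen here to keep them ordered. The frequencies are recovered from the observed proportions reported in Section 3.4 for 100 assessors, and `optimHess` is swapped in for ‘numDeriv::hessian’ to avoid an external dependency.

```r
# Rating frequencies (categories 0..5) recovered from the observed proportions
# reported in Section 3.4 with 100 assessors per column.
control <- c(41, 20, 17, 15, 5, 2)   # blind control vs. identified control
test1   <- c(6, 2, 30, 34, 18, 10)   # test sample 1 vs. identified control

# Category probabilities under the folded normal model (sd of difference = sqrt(2)).
cat_probs <- function(delta, crit) {
  cdf <- function(x) pnorm((x - delta) / sqrt(2)) + pnorm((x + delta) / sqrt(2)) - 1
  diff(c(0, cdf(crit), 1))
}

# Negative multinomial log-likelihood; par = (delta, log criterion increments),
# which keeps the criteria positive and increasing.
negloglik <- function(par) {
  delta <- par[1]
  crit  <- cumsum(exp(par[-1]))
  p0 <- pmax(cat_probs(0, crit), 1e-12)       # guard against log(0)
  p1 <- pmax(cat_probs(delta, crit), 1e-12)
  -(sum(control * log(p0)) + sum(test1 * log(p1)))
}

fit       <- nlminb(start = c(1, rep(log(0.5), 5)), objective = negloglik)
delta_hat <- abs(fit$par[1])                  # the model is symmetric in the sign of delta
crit_hat  <- cumsum(exp(fit$par[-1]))

# Approximate variance of delta_hat from a finite-difference Hessian of -L.
H         <- optimHess(fit$par, negloglik)
var_delta <- solve(H)[1, 1]
```

The resulting `delta_hat` and `var_delta` can be compared with the values reported for test sample 1 in Section 4.1 (d-prime = 2.4168 with variance 0.0203); agreement depends on the likelihood assumed here matching the one in Equation (13).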
As soon as we obtain the maximum likelihood estimate of the parameter δ, the maximum likelihood estimate of Δ can be obtained from Equation (11). The variance of the estimator of Δ can be estimated using the delta method (see, e.g., Bi [18] (p. 51)), as in Equation (14).
where the derivative term, given in Equation (15), denotes the first derivative of Equation (11) with respect to δ.
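The maximum likelihood estimate of AUC is obtained from Equation (12). The variance of the estimator of AUC can be obtained from Equation (16), which applies the delta method using the first derivative of Equation (12) with respect to δ.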
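The delta-method step can be sketched as follows, assuming the unit-variance forms Δ = (2/√π)exp(−δ²/4) + δ[2Φ(δ/√2) − 1] and AUC = Φ(δ/2)² + Φ(−δ/2)². Under these assumed forms the derivatives simplify to 2Φ(δ/√2) − 1 and φ(δ/2)[2Φ(δ/2) − 1]; the function names are hypothetical.

```r
# First derivatives of the two index functions with respect to delta,
# under the unit-variance forms assumed in this sketch.
dDelta_ddelta <- function(d) 2 * pnorm(d / sqrt(2)) - 1
dAUC_ddelta   <- function(d) dnorm(d / 2) * (2 * pnorm(d / 2) - 1)

# Delta method: Var(g(d_hat)) is approximately g'(d)^2 * Var(d_hat).
delta_method_var <- function(d, var_d) {
  c(var_Delta = dDelta_ddelta(d)^2 * var_d,
    var_AUC   = dAUC_ddelta(d)^2 * var_d)
}

# Example with the d-prime and variance reported for test sample 1 in Section 4.1.
vars <- delta_method_var(d = 2.4168, var_d = 0.0203)
```

Because both derivatives are below 1 in absolute value, the variances of the two derived indices are smaller than the variance of the d-prime estimate itself.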
2.4. Nonparametric Estimations of δ from Ratings of the DFC
Bi [18] (Section 3.3.2) discusses nonparametric estimation of δ or d′ in the ratings of the Same-Different method. The method can also be used for the DFC, as in Equation (17) from Equation (12).
where AUC denotes the area under the ROC curve of the ratings of the Same-Different method or the DFC. The nonparametric estimate of the AUC can be obtained as the average of ψ(x_i, y_j) over all n × m pairs of ratings, where ψ(x_i, y_j) = 1 if x_i > y_j; ψ(x_i, y_j) = 0.5 if x_i = y_j; and ψ(x_i, y_j) = 0 if x_i < y_j; x_i and y_j are the ratings of the Hit and False-alarm responses.
The variance of the estimator of δ can be estimated from Equation (18) based on Equation (16).
A simple estimate of the variance of the estimated AUC uses N = min(n, m), where n and m are the sample sizes of the Hit and False-alarm responses. A more accurate estimate is given by Equation (19) (see, e.g., Bi [18] (p. 52)).
where and .
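The nonparametric route can be sketched in base R. The example below assumes the Mann-Whitney form of the AUC (ties counted as one half) and the area relation AUC = Φ(δ/2)² + Φ(−δ/2)², which inverts in closed form; the function names are hypothetical, and the frequencies are recovered from the observed proportions in Section 3.4.

```r
control <- c(41, 20, 17, 15, 5, 2)   # False-alarm ratings 0..5 (blind control)
test1   <- c(6, 2, 30, 34, 18, 10)   # Hit ratings 0..5 (test sample 1)

# Mann-Whitney type AUC from two frequency vectors over the same ordered categories.
auc_from_freq <- function(hit, fa) {
  p_hit <- hit / sum(hit)
  p_fa  <- fa / sum(fa)
  below <- c(0, cumsum(p_fa)[-length(p_fa)])   # P(FA rating < category j)
  sum(p_hit * (below + 0.5 * p_fa))            # P(Hit > FA) + 0.5 * P(tie)
}

# Inverting AUC = pnorm(d/2)^2 + pnorm(-d/2)^2 (valid for AUC >= 0.5).
dprime_from_auc <- function(auc) 2 * qnorm((1 + sqrt(2 * auc - 1)) / 2)

auc_hat <- auc_from_freq(test1, control)
d_hat   <- dprime_from_auc(auc_hat)
```

For these two columns the AUC estimate is about 0.79. The inversion only applies when the estimated AUC is at least 0.5, so an estimate below 0.5 would need separate handling.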
2.5. Statistical Tests for δ (or Δ or AUC)
Bi and Kuesten [31] discussed statistical testing for the Thurstonian discriminal distance δ or d′ based on the estimate d′ and its variance. The statistical tests include difference testing and equivalence/similarity testing for an individual d′ for a test sample and a control sample in the DFC test, and difference testing, equivalence/similarity testing, and multiple comparisons for multiple d′ values for multiple test samples in the DFC test. Statistical testing can also be conducted based on the AUC, i.e., the area under the ROC curve of the DFC, according to Equation (15).
2.6. Difference Testing Power and Sample Size for DFC
Bi [18] (p. 95) discusses the difference testing power and sample size for difference tests using ratings of the Same-Different method based on AUC. The method can be used for the DFC. The difference testing power and sample size are in Equations (20) and (21), respectively.
In Equations (20) and (21), the AUC denotes the area corresponding to a specified δ or d′ value.
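A hedged R sketch of an AUC-based power calculation is given below. It assumes a one-sided normal-approximation test of H0: AUC = 0.5, a simple variance of AUC(1 − AUC)/n under both hypotheses, and the area relation AUC = Φ(δ/2)² + Φ(−δ/2)²; these choices are assumptions of the sketch and may not coincide with Equations (20) and (21). The function name is hypothetical.

```r
# Approximate power of a one-sided test of H0: AUC = 0.5, assuming
# Var(AUC_hat) = AUC * (1 - AUC) / n (a simplifying assumption of this sketch).
dfc_power_sketch <- function(d, n, alpha) {
  auc <- pnorm(d / 2)^2 + pnorm(-d / 2)^2
  sd0 <- sqrt(0.5 * 0.5 / n)            # standard deviation under H0
  sd1 <- sqrt(auc * (1 - auc) / n)      # standard deviation under the alternative
  power <- 1 - pnorm((qnorm(1 - alpha) * sd0 - (auc - 0.5)) / sd1)
  c(AUC = auc, power = power)
}

res <- dfc_power_sketch(d = 1.2, n = 100, alpha = 0.1)
round(res, 2)
```

Under these assumptions, d = 1.2 with 100 assessors and alpha = 0.1 gives an AUC of about 0.60 and a power of about 0.78.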
2.7. Ratings Data of the DFC Test
The ratings data collected from the DFC test should be summarized into a data matrix with k rows and p + 1 columns, where p is the number of test samples. The first column of the data matrix contains the frequencies for the blind control sample versus the identified control sample, while the other p columns contain the frequencies for each of the test samples vs. the identified control sample.
The DFC ratings data used in the paper are listed in Table 2. The table gives the response frequencies of 100 assessors in a DFC test with a blind control sample and 3 test samples. The frequencies on a 6-point scale are presented for the blind control vs. the identified control and for each of the 3 test samples vs. the identified control. The categories of the 6-point scale are 0 = “no difference”, 1 = “very slight difference”, 2 = “slight difference”, 3 = “moderate difference”, 4 = “large difference”, and 5 = “extreme difference”.
Table 2.
Frequencies of ratings with a 6-point scale in the DFC (data ‘dfc6’).
The data file ‘dfc6’ in R can be produced by using the R code ‘DFC6ps()’ as below: ‘dfc6<-DFC6ps()’.
3. Results
3.1. Estimations of Parameters in the Model for the DFC
3.1.1. Maximum Likelihood Estimations (MLEs) of δ, Δ, and AUC
The R code ‘DFCmle(x)’ can be used for MLEs of the parameters (indices) δ, Δ, and AUC and their variances for any given DFC test data. The data file (x) for each test sample is a data matrix with k rows and two columns. The first column contains the frequencies for the blind control sample versus the identified control sample, while the second column contains the frequencies for the test sample versus the identified control sample.
Using the DFC data in Table 2 (‘dfc6’), the maximum likelihood estimates of the parameters (indices) δ, Δ, and AUC are listed in Table 3. Among the three test samples, test sample 2 has the smallest difference from the control in terms of δ, Δ, and AUC, while test sample 1 has the largest difference from the control.
Table 3.
Maximum likelihood estimations (MLEs) of δ, Δ, and AUC for the six-point scale DFC data in Table 2.
3.1.2. Nonparametric Estimation of δ, Δ, and AUC
The R code ‘DFCnoe(x)’ can be used for nonparametric estimations of δ, Δ, and AUC, as well as their variances.
Using the DFC data in Table 2 (‘dfc6’), the nonparametric estimates of the parameters (indices) δ, Δ, and AUC are listed in Table 4.
Table 4.
Nonparametric estimations of δ, Δ, and AUC for the six-point scale DFC data in Table 2.
Although the nonparametric estimates are not exactly the same as the MLEs, they lead to the same conclusions. Among the three test samples, test sample 2 has the smallest difference from the identified control in terms of δ, Δ, and AUC, while test sample 1 has the largest difference from the control.
3.1.3. Comparison of MLE and Nonparametric Estimations of δ
Figure 1 shows the d-prime values, i.e., the estimates of δ for the three test samples vs. the identified control sample, estimated by MLE and by nonparametric estimation. MLE and nonparametric estimation generally produce consistent and similar results. MLE is more widely used and more powerful, with smaller estimation errors, while an advantage of nonparametric estimation is that it makes no distributional assumption. However, it is noted that the nonparametric estimation is, in fact, an estimation of the AUC. The d-prime is obtained indirectly from the AUC based on an area theorem.
Figure 1.
d-prime values estimated by MLEs and nonparametric estimation.
3.2. Statistical Tests for δ, Δ, or AUC
The R codes for the statistical tests are provided in Bi and Kuesten [31]. The statistical tests are conducted in this section using the estimates and their variances obtained in Section 3.1.1 (Table 3). The data file ‘alldp’ in R can be obtained by ‘alldp<-DFCdps()’.
The same test procedures can be applied for testing Δ or AUC using the estimators of Δ or AUC and their variances. The only differences are that for the difference test for Δ, the null hypothesis is Δ = 1.13, whereas for the difference test for AUC, the null hypothesis is AUC = 0.5.
3.2.1. Difference Test Based on Individual Parameters, e.g., d′ and Its Variance
The R code ‘dpdtest(d,v)’ can be used for the difference test with the null hypothesis H0: δ = 0 and the alternative hypothesis H1: δ > 0 based on an individual d′ value and its variance.
For example, the result of the difference test is as below for the data in the second row in the data file ‘alldp’: d′ = 0.88, and the variance of d′, i.e., var(d’) = 0.0390. A significant difference was found between test sample 2 and the control sample in the DFC test, with a p-value < 0.0001.
3.2.2. One-Sided Equivalence/Similarity Test Based on Individual Estimator, e.g., d′, Its Variance, and a Specified Similarity Limit
The R code ‘dpstest(d,v,slim)’ can be used for the one-sided equivalence/similarity test with the null hypothesis H0: δ ≥ δ0 and the alternative hypothesis H1: δ < δ0 based on an individual d′ value, its variance, and a specified similarity limit δ0 (‘slim’). For example, the result of the equivalence/similarity test is as below for the data in the second row in the data file ‘alldp’: d′ = 0.88, var(d′) = 0.0390, and a specified similarity limit δ0 = 1.5. Because the p-value < 0.01, significant equivalence/similarity between test sample 2 and the control sample can be claimed in terms of the equivalence/similarity limit δ0 = 1.5 at a significance level α = 0.01. This means that the perceptual difference between test sample 2 and the control sample is smaller than the specified perceptual difference in terms of the Thurstonian discriminal distance d′ = 1.5.
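The two individual tests in Sections 3.2.1 and 3.2.2 can be sketched as one-sided Z tests on d′. The functions below are hypothetical stand-ins, not the authors' ‘dpdtest’ and ‘dpstest’, and the hypotheses shown in the comments are the forms assumed for this sketch.

```r
# One-sided difference test: H0: d' = 0 versus H1: d' > 0 (assumed form).
dp_diff_test <- function(d, v) {
  z <- d / sqrt(v)
  c(z = z, p = 1 - pnorm(z))
}

# One-sided similarity test: H0: d' >= slim versus H1: d' < slim (assumed form).
dp_sim_test <- function(d, v, slim) {
  z <- (d - slim) / sqrt(v)
  c(z = z, p = pnorm(z))
}

# Values quoted above for test sample 2: d' = 0.88, var(d') = 0.0390.
diff_res <- dp_diff_test(0.88, 0.0390)
sim_res  <- dp_sim_test(0.88, 0.0390, slim = 1.5)
```

With these inputs the difference test gives a p-value below 0.0001 and the similarity test gives a p-value below 0.01, in line with the conclusions stated for test sample 2.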
3.2.3. Difference Test Based on Multiple Parameter Values, e.g., Multiple d′ Values and Their Variances
The R code ‘dstest(d,v)’ can be used to conduct a difference test with the null hypothesis H0: δ₁ = δ₂ = … = δ_p and the alternative hypothesis H1: not all δ_i are equal, i.e., if the test is significant, at least two of the parameters differ, based on multiple d′ values and their variances. For example, for the three d′ values and their variances in ‘alldp’ for the three test samples vs. the control sample, the test results can be obtained by ‘dstest(d=alldp[,1],v=alldp[,2])’. A significant difference was found among the three test samples in the DFC test, with a p-value < 0.01.
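One common way to test the homogeneity of several independent d′ values is a weighted chi-square statistic. The sketch below assumes that form, with weights equal to the inverse variances; it is not the authors' ‘dstest’, and the function name is hypothetical. The inputs are the rounded d′ values and variances quoted in this section for test samples 1, 2, and 3.

```r
# Weighted chi-square test of H0: all d' values are equal (assumed form).
dp_homogeneity_test <- function(d, v) {
  w  <- 1 / v                       # inverse-variance weights
  d0 <- sum(w * d) / sum(w)         # weighted mean d'
  x2 <- sum(w * (d - d0)^2)         # chi-square statistic with length(d) - 1 df
  c(weighted_mean = d0, X2 = x2, df = length(d) - 1,
    p = pchisq(x2, df = length(d) - 1, lower.tail = FALSE))
}

res <- dp_homogeneity_test(d = c(2.42, 0.88, 2.33),
                           v = c(0.0203, 0.0390, 0.0199))
```

The p-value returned for these inputs is below 0.01, consistent with the significant difference reported among the three test samples.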
3.2.4. Multiple Comparisons for Multiple Parameter Values, e.g., Multiple d′ Values and Their Variances
The S-Plus program ‘multicomp(x,vmat,alpha)’ in S-PLUS 6.0 (Insightful [32]) can be used for the multiple comparisons, based on a vector of multiple d-prime (‘dp1’) and a co-variance matrix (‘dv1’) with a selected alpha level, e.g., alpha = 0.2. The input of the program includes x = dp1 (dp1<-alldp[,1]) and vmat=dv1 (dv1<-matrix(0,3,3), diag(dv1)<-alldp[,2]), with alpha = 0.2.
The R programs ‘confint’, ‘glht’, and ‘parm’ in the R package ‘multcomp’ Version 1.4-28 (Hothorn, et al. [33]) can also be used for the multiple comparisons, based on ‘dp1’ and ‘dv1’, with a selected confidence level (1-alpha), e.g., 0.8 (alpha = 0.2).
Significant differences were found between test sample 1 (T1) and test sample 2 (T2), and between test sample 2 (T2) and test sample 3 (T3).
3.2.5. Equivalence/Similarity Test Based on Two Parameter Values, e.g., Two d′ Values, Their Variances, and a Specified Similarity Limit
The R code ‘s2dptest(d,v,d0)’ can be used for the two one-sided tests (TOSTs) with the two sets of one-sided hypotheses H01: δ₁ − δ₂ ≤ −δ0 versus H11: δ₁ − δ₂ > −δ0, and H02: δ₁ − δ₂ ≥ δ0 versus H12: δ₁ − δ₂ < δ0, for two test samples based on two estimators, e.g., the d′ values for test samples T1 and T3, and their variances. The input of the code is the two estimators and their variances, as well as an equivalence/similarity limit δ0 (‘d0’). The outputs of the code are the test statistics Z1 and Z2 and the p-values.
For example, for the data d = c(2.42,2.33), v = c(0.0203,0.0199), and an equivalence/similarity limit d0 = 0.5, the output is pv1 = 0.0016 and pv2 = 0.0204. Significant equivalence/similarity of T1 and T3 can be concluded with an equivalence/similarity limit of d0 = 0.5 at a significance level of 0.05.
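The TOST calculation can be sketched with two Z statistics on the difference of the two d′ values. The function below is a hypothetical stand-in, not the authors' ‘s2dptest’; it assumes the two estimators are independent, so the variance of their difference is the sum of the two variances.

```r
# Two one-sided tests for equivalence of two independent d' values (assumed form).
tost_two_dp <- function(d, v, d0) {
  se <- sqrt(sum(v))                  # standard error of d[1] - d[2]
  z1 <- (d[1] - d[2] + d0) / se       # tests H0: d1 - d2 <= -d0
  z2 <- (d[1] - d[2] - d0) / se       # tests H0: d1 - d2 >=  d0
  c(Z1 = z1, Z2 = z2, pv1 = 1 - pnorm(z1), pv2 = pnorm(z2))
}

res <- tost_two_dp(d = c(2.42, 2.33), v = c(0.0203, 0.0199), d0 = 0.5)
round(res, 4)
```

For these inputs the sketch gives pv1 ≈ 0.0016 and pv2 ≈ 0.0204, matching the output quoted above; equivalence is concluded only when both p-values fall below the chosen significance level.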
3.3. Difference Testing Power and Sample Size for DFC Data
The R code ‘DFCpower(d,samp,alpha)’ can be used to calculate the difference-testing power for the DFC test. The input of the code includes a specified perceptual difference in terms of δ (d), sample size (samp), and a significance level (alpha). The output includes the corresponding AUC and the testing power.
For example, for a specified difference (d = 1.2), sample size (samp = 100), and significance level (alpha = 0.1), the corresponding AUC = 0.6 and the testing power is 0.78.
The R code ‘DFCsamp(d,pow,alpha)’ can be used to estimate the sample size needed for difference testing for the DFC test. The input of the code includes a specified perceptual difference in terms of δ (d), testing power (pow), and a significance level (alpha). The output includes the corresponding AUC and the sample size.
For example, for a specified difference (d = 1.2), testing power (pow = 0.8), and significance level (alpha = 0.1), the corresponding AUC = 0.61 and the sample size is 93.
3.4. Observed Proportions and Predicted Probabilities for Categories of DFC Ratings
The R code ‘DFCmle0(x)’ can be used to produce predicted probabilities of the DFC categories of Hit (H) and False-alarm (FA) based on the observed proportions or frequencies of the DFC ratings and MLE results. For example, for the rating frequencies of the blind control vs. the identified control in the first column in the data file ‘dfc6’, and for the rating frequencies of test sample 1 vs. the identified control in the second column in the data file ‘dfc6’, the observed proportions of the DFC categories are as below.
>dfc6[,c(1,2)]/100
| Rating | Blind Control vs. Identified Control | Test 1 vs. Identified Control |
| --- | --- | --- |
| 5 | 0.02 | 0.10 |
| 4 | 0.05 | 0.18 |
| 3 | 0.15 | 0.34 |
| 2 | 0.17 | 0.30 |
| 1 | 0.20 | 0.02 |
| 0 | 0.41 | 0.06 |
The predicted probabilities of the DFC categories are as below.
>DFCmle0(dfc6[,c(1,2)])
| Rating | C1 vs. C | T vs. C |
| --- | --- | --- |
| 5 | 0.0043 | 0.1263 |
| 4 | 0.0290 | 0.2113 |
| 3 | 0.1582 | 0.3201 |
| 2 | 0.2647 | 0.1817 |
| 1 | 0.1660 | 0.0624 |
| 0 | 0.3777 | 0.0981 |
Figure 2 shows the observed proportions and predicted probabilities of the categories of the DFC ratings for three test samples vs. the identified control.
Figure 2.
Observed proportions and predicted probabilities of the categories of the DFC ratings.
Figure 3 shows the corresponding ROC curves based on the proportions and probabilities shown in Figure 2.
Figure 3.
ROC curves based on the observed proportions and predicted probabilities for three test samples vs. the identified control.
Figure 2 and Figure 3 are visualizations of the comparisons of the observed proportions and predicted probabilities of the categories of the DFC ratings based on the folded normal distribution. Although the observed proportions and the predicted probabilities are not exactly the same, they are consistent and similar in general. This suggests that the model of the folded normal distribution is suitable for the DFC ratings data in general.
4. Discussion
4.1. Thurstonian Model for the DFC and the Ratings of the Same-Different Method
Although the Thurstonian model, one of the advanced and powerful tools of sensory analysis, is not new in the sensory literature for sensory discrimination methods (see, e.g., ASTM [34]), the model for one specific method, the DFC, has been neither adequately discussed nor widely used in the sensory field.
This paper derives the Thurstonian model for the DFC test and demonstrates that the model is based on the folded normal distribution and is the same as that for the ratings of the Same-Different method. Although the two methods share the same Thurstonian model, the DFC differs from the ratings of the Same-Different method in its methodological design.
In the DFC test, assessors are provided with an identified control sample, followed by one or more test samples, including blind controls. The assessor’s task is to scale how different they perceive the test sample(s), including blind controls, to be from the identified control sample. In the DFC test, the only possible same sample pair is C/C1 and the only different sample pair is T/C, where T denotes the test sample, C denotes the control sample, and C1 denotes the blind control sample.
In the Same-Different test, two products of interest (A and B) are selected. It is not necessary for the two products to be a test sample and a control sample. Assessors are presented with one of the four possible sample pairs: A/A, B/B, A/B, or B/A. The assessor’s task is to categorize the given pair of samples as same or different (ASTM [34]). For the ratings of the Same-Different method, the assessor’s task is to give ratings for sureness of difference for any given sample pair.
Note that the Thurstonian model in the R program ‘dod’ in the R package ‘sensR’ Version 1.5-3 (Christensen, et al. [17]) covers only one variant of the DOD, the ratings of the Same-Different method. Using the R program ‘dod’ in the R package ‘sensR’ and the data ‘dfc6’, the estimated d-prime is 2.391 with a standard error of 0.2211 (i.e., variance 0.2211² = 0.0489), which is consistent with the results (d-prime = 2.4168 with variance 0.0203) obtained by using the R code ‘DFCmle(x)’ in Section 3.1 of this paper.
4.2. Scales Used in the DFC Test
There are different types of scales used for DFC ratings. For k-point rating scales with a large k, the frequencies for some categories may be small or zero. It is suggested to coalesce the categories of such a scale into a smaller number of categories, for example, by converting 100-point or 9+1-point scale data into six-point, three-point, or two-point scale data. In theory, the parameter δ is independent of the criteria. Hence, in theory, for a specified control sample and a specified test sample, δ should be unchanged when different types of scales are used.
Note that the R code ‘DFCmle(x)’ can be used for the MLE of the parameters (indices) for k-point scale DFC data where k is larger than or equal to two. The DFC is the same as the Same-Different method when k is equal to two.
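Coalescing categories amounts to summing the rows of the frequency matrix within groups. A minimal base-R sketch follows; the grouping vectors are hypothetical choices for illustration and are not necessarily the groupings used to build ‘dfc3’ and ‘dfc2’, and the two columns of 6-point frequencies are recovered from the observed proportions in Section 3.4.

```r
# Collapse a k-row frequency matrix into fewer categories with a grouping vector.
collapse_scale <- function(x, groups) rowsum(x, group = groups, reorder = TRUE)

# Two columns of 6-point data (ratings 0..5), recovered from Section 3.4.
x6 <- cbind(control = c(41, 20, 17, 15, 5, 2),
            test1   = c(6, 2, 30, 34, 18, 10))

# Hypothetical groupings: {0,1}, {2,3}, {4,5} and {0,1,2}, {3,4,5}.
x3 <- collapse_scale(x6, groups = c(1, 1, 2, 2, 3, 3))
x2 <- collapse_scale(x6, groups = c(1, 1, 1, 2, 2, 2))
```

Column totals are preserved by construction, so the collapsed matrices can be passed to the same estimation code as the original six-point data.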
It is noted that the data ‘dfc3’ with a three-point rating scale in Table 5, and the data ‘dfc2’ with a two-point rating scale in Table 6 were summarized from the data ‘dfc6’ with a 6-point rating scale in Table 2. The ‘dfc3’ and ‘dfc2’ data can be produced by ‘dfc3<-DFC3ps()’ and ‘dfc2<-DFC2ps()’, where the R codes ‘DFC3ps()’ and ‘DFC2ps()’ can be found in the Supplementary Materials.
Table 5.
Frequencies of ratings with a three-point scale in the DFC (data ‘dfc3’).
Table 6.
Frequencies of ratings with a two-point scale in the DFC (data ‘dfc2’).
The MLE results of the parameters (indices) δ, Δ, and AUC for the data files ‘dfc3’ and ‘dfc2’ are listed in Table 7 and Table 8. The estimated values for the three data files are similar and consistent.
Table 7.
Maximum likelihood estimations (MLEs) of δ, Δ, and AUC for the three-point scale DFC data in Table 5.
Table 8.
Maximum likelihood estimations (MLEs) of δ, Δ, and AUC for the two-point scale DFC data in Table 6.
4.3. Qualifier and Limitation of the DFC Test
As one of the sensory discrimination methods, the DFC test is applicable to sensory evaluation in process optimization and quality assessment for foods. The DFC test is applied in specific situations where a reference control is available and the goal is to determine whether a noticeable difference exists. The DFC test may be used for reformulation testing, process changes, ingredient substitutions, quality control, and batch consistency. The Thurstonian model provides some useful indices, including the Thurstonian discriminal distance δ or d′, to measure the perceptual difference or similarity between a test sample and a control sample. An advantage of the index δ or d′ is that it is independent of the methods and scales used. The use of the indices would influence decision-making in industrial applications or QA/QC procedures because these indices are more useful and reasonable than the conventional rating means. The indices are on continuous scales, while the ratings scale in the DFC is not. This is why the Thurstonian model is necessary for every sensory discrimination method, including the DFC.
The DFC test is not appropriate when no control sample is available, when an in-depth understanding of specific differences is required, or when the goal is to determine consumer acceptance or preference, optimize sensory attributes, compare multiple product variations at once (e.g., a ranking test), or conduct exploratory testing with untrained panelists. DFC results can be more variable than those of other discrimination tests (e.g., triangle, duo–trio, and tetrad tests), since panelists rate the degree of difference rather than simply identifying whether a difference exists. The DFC test may require a larger sample size to detect small differences reliably. It does not provide detailed profiling of sensory attributes.
4.4. Relevance of DFC for Sensory Evaluation Practices
The DFC is particularly useful to sensory researchers for product development, quality control, and shelf-life/storage stability studies. As a sensory evaluation technique that measures how much a sample deviates from a reference or “control” sample, it can be applied to ingredient substitution and flavor optimization, reformulation or process changes, routine screening, and pharmaceutical/palatability testing. DFC is used to assess the overall or ‘holistic’ perceptual difference from a control. Its advantages are that data collection, based on a category or line scale, is rapid, simple, and efficient; it is sensitive to small perceptual differences; the data are statistically analyzable; and the results are suitable for tracking gradual change. Its limitations are that the results lack attribute-specific detail (they do not specify what changed), the method relies on a consistent control (any drift in the control undermines the method), and it is less useful with untrained panels (due to the abstract nature of the “difference magnitude”). It is of interest to note that Compusense, Inc. offers a video showing execution of the DFC method on YouTube [35]. References for applications where DFC may be used are summarized below.
4.4.1. DFC Resources
- Product development (ingredient substitution and flavor optimization, reformulation, or process changes)
DFC is among the methods recommended for detecting differences from standards during reformulation efforts [36]. Difference-from-control methods are featured in quality control and reformulation contexts [37] (pp. 121–123). Applied uses of DFC have been reported for plant-based proteins [38] and sodium reduction [39]. Meilgaard et al. [4] cover DFC testing in product matching and ingredient change validations, and Whelan [7] presents DFC case studies.
- Quality control and routine screening
Details and guidelines for the use of difference tests in QC and in–out attribute screening are available [36]. For an example of using a QC panel and a 10-point DFC scale where readings above a threshold indicate rejection, see King et al. [40]. Kilcast [41] also supports the use of DFC with examples for sensory quality control. ASTM [42] presents the DFC test method as an example of a “product-focused” scale for use in quality control (QC). The advantages include the following: DFC is a rapid approach to measuring the overall difference, is amenable to a threshold selection process (i.e., pass/fail based on a difference), and requires less training than a descriptive panel. Disadvantages include the following: DFC is no replacement for a detailed sensory profile, does not provide directional assessment (unipolar scale), and can make it difficult to calibrate a panel to rate differences. Another ASTM reference [43] covers DFC with an example and recommends DFC as a viable method for QC. This manual distinguishes the DFC test from the DOD test based on how variance is handled. The DOD test differs from the DFC test in that the variance due to judging (as measured by reference versus blind-coded reference) is averaged with the variance due to the batch or lot of product. Further, this manual distinguishes an approach named the difference from reference (DFR) test, which uses a reference sample, a set of test samples, and a blind-coded reference sample that serves as the estimate of panel variance. Notably, care should be taken in deciding the approach for difference testing and how the data should be analyzed.
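The threshold-based pass/fail screening described above can be illustrated with a minimal Python sketch. It assumes a 10-point scale (0 to 9), a hypothetical action threshold of 3.0, and that the mean rating of the blind control is subtracted from the test mean as a baseline for judging noise. The function name, the threshold value, and the ratings are illustrative assumptions and are not taken from King et al. [40] or the ASTM documents.

```python
from statistics import mean


def screen_batch(test_ratings, blind_control_ratings, threshold=3.0):
    """Return a pass/fail decision for one batch from DFC ratings.

    Assumption (not from the cited sources): the batch is rejected when the
    test mean minus the blind-control mean exceeds `threshold`.
    """
    if not test_ratings or not blind_control_ratings:
        raise ValueError("both rating lists must be non-empty")
    test_mean = mean(test_ratings)
    blind_mean = mean(blind_control_ratings)
    # Difference attributable to the product, after removing the baseline
    # rating that panelists give to the blind-coded control.
    adjusted = test_mean - blind_mean
    return {
        "test_mean": test_mean,
        "blind_mean": blind_mean,
        "adjusted_difference": adjusted,
        "decision": "reject" if adjusted > threshold else "accept",
    }


# Hypothetical ratings from six panelists on a 0-9 scale.
result = screen_batch([4, 5, 6, 5, 4, 6], [1, 0, 2, 1, 1, 1])
print(result["decision"], result["adjusted_difference"])
```

In practice, the threshold would be set from historical data or consumer-relevant cutoffs, and a formal statistical test would normally accompany such a rule.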
- Shelf-life studies
ASTM E2454-20 [44] outlines best practices for using DFC (often via a degree of difference scale) to detect sensory drift over time. Muñoz et al. [2] and the SSP [45] also provide further shelf-life study information and guidance for detecting sensory changes, endpoint criteria, and go/no-go screening for stability studies. Hough [46] (pp. 44–46) describes an example of applying the difference-from-control test (when the size of the difference is more important than simply knowing that a difference exists) using a cutoff methodology based on the size of differences in acceptability perceived by consumers, coupled with the corresponding size of the difference perceived by a trained panel. Additionally, Sharma et al. [47] provide further research examples for shelf-life evaluation methods and quality monitoring.
- Pharmaceutical/palatability studies
Clapham et al. [48] discuss sensory testing against appropriate controls and cite several useful ISO and ASTM documents, but do not explicitly mention the DFC approach for tracking palatability and acceptability in drug products over time.
4.4.2. DFC Applications
Table 9 presents application areas, example scales tailored to each area based on the literature, and common best practices for the DFC method. Typical scale types, category labels, and usages are listed. In addition, typical experimental designs used with DFC tests are shared by application area. These designs aim to ensure statistical power, control for variability, and maximize relevance to the sensory goals.
Table 9.
Summary of difference-from-control (DFC) usages by application.
4.4.3. DFC Database Schemas
Researchers may be interested in collecting DFC ratings on an ongoing basis. Databasing this information is useful and may require different metadata fields for each application area. For effective use of DFC data and building a powerful multi-study database, the following variables should be standardized: DFC scales, control samples, sensory descriptors (attributes), panelist demographics (where applicable), test protocol(s), and environmental/test conditions.
Table 10 below summarizes database schemas (data structures) to collect metadata and results for DFC tests across each application area, either by individual study or aggregated in multi-study databases for longitudinal analysis or meta-analysis. Hypothetical research questions tailored to each application area are provided. The metadata can be used to achieve the following: (1) model shelf-life trends over time and under different conditions, (2) link sensory changes to formulations/processes, (3) auto-trigger actions if DFC scores exceed selected threshold(s), (4) link variables to sensory acceptability trends, and (5) identify ingredient-performance thresholds from substitution levels.
Table 10.
Difference-from-control (DFC) data structures and research questions.
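As a rough illustration of how such a schema might look in code, the following Python sketch defines one record per DFC rating and a helper that flags samples whose mean rating exceeds a chosen threshold (use (3) above). All field names, the threshold, and the example data are hypothetical and are not taken from Table 10.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class DFCRecord:
    """One DFC rating plus metadata for pooling studies (hypothetical fields)."""
    study_id: str
    application_area: str      # e.g. "shelf_life" or "quality_control"
    control_id: str            # identifies the control lot used as reference
    sample_id: str
    panelist_id: str
    is_blind_control: bool
    scale_max: float           # 9 for a 10-point scale, 100 for a line scale
    rating: float
    session_date: date
    storage_days: Optional[int] = None   # only meaningful for shelf-life studies

    def __post_init__(self):
        # DFC ratings are unipolar: zero or positive, bounded by the scale maximum.
        if not 0 <= self.rating <= self.scale_max:
            raise ValueError(f"rating {self.rating} outside 0..{self.scale_max}")


def samples_over_threshold(records, threshold):
    """Return {sample_id: mean rating} for test samples whose mean exceeds threshold."""
    by_sample = defaultdict(list)
    for rec in records:
        if not rec.is_blind_control:
            by_sample[rec.sample_id].append(rec.rating)
    return {s: mean(r) for s, r in by_sample.items() if mean(r) > threshold}


# Hypothetical shelf-life study with two panelists and two storage times.
day = date(2025, 1, 10)
records = [
    DFCRecord("S1", "shelf_life", "C-lot1", "T-30d", "P01", False, 9, 2, day, 30),
    DFCRecord("S1", "shelf_life", "C-lot1", "T-30d", "P02", False, 9, 3, day, 30),
    DFCRecord("S1", "shelf_life", "C-lot1", "T-90d", "P01", False, 9, 5, day, 90),
    DFCRecord("S1", "shelf_life", "C-lot1", "T-90d", "P02", False, 9, 6, day, 90),
    DFCRecord("S1", "shelf_life", "C-lot1", "blind", "P01", True, 9, 1, day),
]
flagged = samples_over_threshold(records, threshold=3.0)
print(flagged)
```

Storing the scale maximum and the control identifier with every rating is one way to keep the standardized variables listed above available when studies are pooled.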
4.4.4. DFC Training
Training is required for DFC testing. DFC asks panelists to quantify how different a sample is from a control, not in what ways it differs or whether it is better. Training is needed to ensure consistency, scale calibration, and understanding of how to rate “magnitude of difference” on a unipolar scale. It can be difficult to calibrate a panel to rate differences. For these reasons, the DFC is not recommended with consumers, because it is ambiguous how to apply the scale without training. DFC ratings can be collected from consumers, but the results should then be interpreted with caution. A best practice for training a panel to use DFC involves several steps:
- (1) The DFC task is clarified by making sure panelists will compare each test sample against a fixed control, that they are not evaluating liking, direction, or attribute-specific intensity, and that their goal is to assess the overall perceived magnitude of difference. It is emphasized that 0 = no difference and the maximum point = most different imaginable.
- (2) The panel is trained with a product set designed to span the range of expected differences using anchored examples: 0 = no difference, 2–3 = slight difference, 5–6 = moderate difference, and 8–10 = strong difference.
- (3) The panel undergoes practice with feedback using a known control vs. modified samples, participates in debriefing and discussion after each test, and is shown the average group scores to highlight consistency (or variability). They are provided feedback on individual and group results, discuss differences in perception or scoring habits, and train for consistency, not conformity.
- (4) Scale interpretation is reinforced with visual or verbal anchors; printed guides may be considered, especially for newer panelists.
- (5) Ongoing monitoring and calibration is continued; using repeat blind duplicates to measure panelist consistency may be considered; and individual and group scores may be tracked to identify drift or outliers.
- (6) To improve sensitivity to small changes, even if the DFC scale remains ‘holistic’, panelists are pre-trained with individual attributes to boost perceptual awareness.
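The monitoring in step (5) can be sketched as a simple per-panelist report. This minimal Python example assumes two hypothetical indicators: the mean rating a panelist gives to blind controls (ideally near 0) and the mean absolute gap between ratings of blind duplicates of the same sample. The limits of 2.0 scale points, the function name, and the data are illustrative assumptions only.

```python
from statistics import mean


def panelist_consistency(ratings, blind_limit=2.0, duplicate_limit=2.0):
    """Summarize panelist consistency from blind controls and blind duplicates.

    `ratings` maps a panelist ID to a dict with:
      "blind":      ratings given to blind-coded controls
      "duplicates": (first, second) rating pairs for repeated samples
    The limits are hypothetical cutoffs, not published criteria.
    """
    report = {}
    for panelist, data in ratings.items():
        blind_mean = mean(data["blind"])
        gaps = [abs(first - second) for first, second in data["duplicates"]]
        duplicate_gap = mean(gaps)
        report[panelist] = {
            "blind_mean": blind_mean,
            "duplicate_gap": duplicate_gap,
            # Flag panelists who rate controls as different or who do not
            # reproduce their own ratings.
            "flag": blind_mean > blind_limit or duplicate_gap > duplicate_limit,
        }
    return report


# Hypothetical panel data on a 0-9 scale.
panel = {
    "P01": {"blind": [0, 1, 0], "duplicates": [(3, 4), (5, 5)]},
    "P02": {"blind": [3, 4, 2], "duplicates": [(2, 6), (7, 3)]},
}
report = panelist_consistency(panel)
for panelist, row in report.items():
    print(panelist, row)
```

Flagged panelists would be candidates for recalibration sessions rather than automatic exclusion, in keeping with the aim of training for consistency, not conformity.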
4.4.5. When Is DFC Appropriate vs. Alternatives?
The DFC test is popular for measuring sensory differences between test and control samples; however, it may not always be the best choice depending on the research objective(s), panel type, and regulatory need. Alternative discrimination methods (triangle test, tetrad test, duo–trio, and degree of difference (DOD)), descriptive analysis (DA), consumer testing (check-all-that-apply (CATA), rate-all-that-apply (RATA), and Just-About-Right (JAR)), temporal methods, or others may be more suitable. DFC is never appropriate when a control reference product is not available. When detailed profiling is needed, consider DA, the tetrad test, or RATA. For specific temporal changes, use descriptive analysis, just-noticeable-difference (JND), temporal dominance of sensations (TDS), or temporal RATA. For masking performance, try TDS, DA, or preference tests. When consumer difference testing results are needed for liking or emotions, consider using CATA, RATA, JAR, or hedonic scales, not DFC. Keep in mind specific scales may be required to satisfy legal claims.
5. Concluding Remarks
A Thurstonian model for the DFC test provides a useful index, δ or d′, to measure the perceptual difference between test sample(s) and the identified control sample. This paper derives, for the first time, the Thurstonian model for the DFC test based on the folded normal distribution. It is demonstrated that the DFC, as a variant of the degree of difference (DOD) method, shares a common Thurstonian model with another variant of the DOD, i.e., the ratings of the Same-Different method, though the DFC and the ratings of the Same-Different method are quite different sensory discrimination methods. Maximum likelihood estimates and nonparametric estimates of the parameters of the model are provided. Statistical tests, including difference tests and equivalence/similarity tests for individual or multiple d′ values obtained from the DFC test, are also conducted in this paper.
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/pr13072105/s1: R codes and data files used in this paper.
Author Contributions
Conceptualization, J.B. and C.K.; software, J.B.; formal analysis, J.B.; writing—original draft preparation, J.B. and C.K.; writing—review and editing, J.B. and C.K.; visualization, C.K. and J.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.
Conflicts of Interest
Author Carla Kuesten was employed by the Kuesten Sensory Perception Research, LLC. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- ISO 20613; Sensory Analysis—General Guidance for the Application of Sensory Analysis in Quality Control. ISO: Geneva, Switzerland, 2019. Available online: https://www.iso.org/standard/68549.html (accessed on 4 June 2025).
- Muñoz, A.M.; Civille, G.V.; Carr, B.T. Sensory Evaluation in Quality Control; Van Nostrand Reinhold: New York, NY, USA, 1992.
- Costell, E. A comparison of sensory methods in quality control. Food Qual. Prefer. 2002, 13, 341–353.
- Meilgaard, M.; Civille, G.V.; Carr, B.T. Sensory Evaluation Techniques, 4th ed.; CRC Press: Boca Raton, FL, USA, 2007.
- Kemp, S.E.; Hollowood, T.; Hort, J. Sensory Evaluation: A Practical Handbook; Wiley-Blackwell: Oxford, UK, 2009.
- Lawless, H.T.; Heymann, H. Sensory Evaluation of Food: Principles and Practices, 2nd ed.; Springer: New York, NY, USA, 2010.
- Whelan, V.J. Difference from control (DFC) test. In Discrimination Testing in Sensory Science: A Practical Handbook; Rogers, L., Ed.; Elsevier: Amsterdam, The Netherlands; Woodhead Publishing: Duxford, UK, 2017.
- ASTM-E2262-03; Standard Practice for Estimating Thurstonian Discriminal Distances. ASTM International: West Conshohocken, PA, USA, 2021.
- Bradley, R.A. Comparison of Different-from-Control, Triangle, and Duo-Trio Tests in Tasting: Comparable Expected Performance; Memorandum Prepared for the General Foods Corporation; General Foods Corporation: Rye Brook, NY, USA, 12 November 1957.
- Bradley, R.A. Some relationship among sensory difference tests. Biometrics 1963, 19, 385–397.
- Aust, L.B.; Gacula, M.C., Jr.; Beard, S.A.; Washam, R.W., II. Degree of difference test method in sensory evaluation of heterogeneous product types. J. Food Sci. 1985, 50, 511–513.
- Bi, J. Statistical models for the Degree of Difference method. Food Qual. Prefer. 2002, 13, 31–37.
- Bi, J.; Lee, H.S.; O’Mahony, M. Statistical analysis of ROC curves for the ratings of the A-Not A and the Same-Different methods. J. Sens. Stud. 2013, 28, 34–46.
- Ennis, D.M.; Rousseau, B. A Thurstonian model for the degree of difference protocol. Food Qual. Prefer. 2015, 41, 159–162.
- Ennis, J.M.; Christensen, R. A Thurstonian comparison of the Tetrad and Degree of Difference tests. Food Qual. Prefer. 2015, 40, 263–269.
- Ennis, D.M. Thurstonian Models: Categorical Decision Making in the Presence of Noise; The Institute for Perception: Richmond, VA, USA, 2016; ISBN 9780990644606/099064460X.
- Christensen, R.H.B.; Brockhoff, B.P.; Kuznetsova, A.; Birot, S.; Stachlewska, K.A.; Rafacz, D. Package ‘sensR’. 2023. Available online: https://cran.r-project.org/web/packages/sensR/index.html (accessed on 1 July 2025).
- Bi, J. Sensory Discrimination Tests and Measurements: Sensometrics in Sensory Evaluation, 2nd ed.; Wiley/Blackwell Publishing: Oxford, UK, 2015.
- Read, C.B. Folded distributions. In Encyclopedia of Statistical Sciences; Kotz, S., Johnson, M.L., Eds.; John Wiley & Sons: West Sussex, UK, 1983; Volume 3.
- Leone, F.C.; Nelson, L.S.; Nottingham, R.B. The folded normal distribution. Technometrics 1961, 3, 543–550.
- Elandt, R.C. The Folded Normal Distribution: Two Methods of Estimating Parameters from Moments. Technometrics 1961, 3, 551–562.
- Johnson, N.L. The folded normal distribution: Accuracy of estimation by maximum likelihood. Technometrics 1962, 4, 249–256.
- Tsagris, M.; Beneki, C.; Hassani, H. On the Folded Normal Distribution. Mathematics 2014, 2, 12–28.
- Chatterjee, M.; Chakraborty, A.K. A simple algorithm for calculating values for folded normal distribution. J. Stat. Comput. Simul. 2016, 86, 293–305.
- Kaplan, H.L.; Macmillan, N.A.; Creelman, C.D. Tables of d′ for variable-standard discrimination paradigms. Behav. Res. Methods Instrum. 1978, 10, 796–813.
- Bi, J. Variance of d′ from the same–different method. Behav. Res. Methods Instrum. Comput. 2002, 34, 37–45.
- Irwin, R.J.; Hautus, M.J.; Butcher, J.C. An area theorem for the Same–Different experiment. Percept. Psychophys. 1999, 61, 766–769.
- Swets, J.A. Measuring the accuracy of diagnostic systems. Science 1988, 240, 1285–1293.
- R Core Team. R: A Language and Environment for Statistical Computing, version R 4.5.0; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 19 June 2025).
- Gilbert, P.; Varadhan, R. Accurate Numerical Derivatives R Package “numDeriv”. 2019. Available online: https://cran.r-project.org/web/packages/numDeriv/index.html (accessed on 29 June 2025).
- Bi, J.; Kuesten, C. Thurstonian Scaling for Sensory Discrimination Methods. Appl. Sci. 2025, 15, 991.
- Insightful. S-PLUS 6. In Guide to Statistics Vol.1. for Windows; Insightful Corporation: Seattle, WA, USA, 2001.
- Hothorn, T.; Bretz, F.; Westfall, P.; Heiberger, R.M.; Schuetzenmeister, A.; Scheibe, S. R Package “multcomp”: Simultaneous Inference in General Parametric Models. 2023. Available online: https://cran.r-project.org/web/packages/multcomp/index.html (accessed on 29 June 2025).
- ASTM-E2139-05; Standard Test Method for Same-Different Test. ASTM International: West Conshohocken, PA, USA, 2018.
- Making Sense of: Difference from Control (DFC). Video Posted on YouTube. Available online: https://www.bing.com/videos/riverview/relatedvideo?q=E253-09+Standard+Practice+for+Sensory+Evaluation+of+Products+by+a+Difference-from-Control-Method&qpvt=E253-09+Standard+Practice+for+Sensory+Evaluation+of+Products+by+a+Difference-from-Control-Method&view=riverview&mmscn=mtsc&mid=22DB11E992B22B7A17CC22DB11E992B22B7A17CC&&aps=201&FORM=VMSOVR (accessed on 12 June 2025).
- Carpenter, R.P.; Lyon, D.H.; Hasdell, T.A. Guidelines for Sensory Analysis in Food Product Development and Quality Control, 2nd ed.; Springer: New York, NY, USA, 2000.
- Meilgaard, M.C.; Civille, G.V.; Carr, B.T.; Osdoba, E.T. Sensory Evaluation Techniques, 6th ed.; CRC Press: Boca Raton, FL, USA, 2025.
- Bowen, A.; Blake, A. Lessons for the Sensory Characterization of Plant-Based Proteins. SSP Presentation. 2024. Available online: https://www.sensorysociety.org/meetings/archives/2024Conference/Documents/04%20Bowen_SSP.pdf#search=DFC (accessed on 12 June 2025).
- Leong, J.; Kasamatsu, C.; Ong, E.; Hoi, J.T.; Loong, M.N. A study on sensory properties of sodium reduction and replacement in Asian food using difference-from-control test. Food Sci. Nutr. 2015, 4, 469–478.
- King, S.; Gillette, M.; Titman, D.; Adams, J.; Ridgely, M. The Sensory Quality System: A global quality control solution. Food Qual. Prefer. 2002, 13, 385–395.
- Kilcast, D. (Ed.) Sensory Analysis for Food and Beverage Quality Control: A Practical Guide; Woodhead Publishing Limited: Cambridge, UK, 2010.
- E3041-17(2025); Standard Guide for Selecting and Using Scales for Sensory Evaluation. ASTM International: West Conshohocken, PA, USA, 2025.
- MNL14-2ND-EB; The Role of Sensory Analysis in Quality Control. 2nd ed. Ojeh, S., Ed.; ASTM International: West Conshohocken, PA, USA, 2021.
- E2454-20; Standard Guide for Sensory Evaluation Methods to Determine Sensory Shelf Life of Consumer Products. ASTM International: West Conshohocken, PA, USA, 2020.
- Sensory Shelf-Life Test Edited by M. Johnson. SSP. 2021. Available online: https://www.sensorysociety.org/knowledge/sspwiki/Pages/Sensory%20Shelf-Life%20Test.aspx (accessed on 12 June 2025).
- Hough, G. Sensory Shelf Life Estimation of Food Products; CRC Press: Boca Raton, FL, USA, 2010.
- Sharma, C.; Torrico, D.D.; Singh, S. Chapter: Sensory Methods for Shelf Life Assessment of Foods. In Shelf Life and Food Safety, 1st ed.; CRC Press: Boca Raton, FL, USA, 2022; pp. 33–60.
- Clapham, D.; Belissa, E.; Inghelbrecht, S.; Pensé-Lhéritier, A.M.; Ruiz, F.; Sheehan, L.; Shine, M.; Vallet, T.; Walsh, J.; Tuleu, C. A Guide to Best Practice in Sensory Analysis of Pharmaceutical Formulations. Pharmaceutics 2023, 15, 2319.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


