Why Fuzzy Partition in F-Transform?

Vladik Kreinovich 1,†,‡, Olga Kosheleva 1,‡ and Songsak Sriboonchitta 2,‡,*
1 University of Texas at El Paso; vladik@utep.edu, olgak@utep.edu
2 Chiang Mai University, Thailand; songsakecon@gmail.com
* Correspondence: vladik@utep.edu; Tel.: +1-915-747-6951 (V.K.)
† Current address: Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA
‡ These authors contributed equally to this work.

1 Formulation of the Problem

F-transform: a brief reminder. In many practical applications, it turns out to be beneficial to replace the original continuous signal $x(t)$ defined on some time interval with a finite number of "averaged" values
$$x_i = \int A_i(t) \cdot x(t)\,dt, \tag{1}$$
where $A_i(t) \ge 0$ are appropriate functions; see, e.g., [34,35,38,39,40,42]. In many applications, a very specific form of these functions is used: namely, $A_i(t) = A(t - t_i)$ for some function $A(t)$ and for $t_i = t_0 + i \cdot h$, where $t_0$ and $h > 0$ are numbers and $A(t)$ is equal to 0 outside the interval $[-h, h]$. However, more general families of functions $A_i(t)$ are also sometimes efficiently used.
A similar 2-D transformation is very useful in many image processing problems.
The general idea behind F-transform is very reasonable. From the general measurement viewpoint, F-transform makes perfect sense: it corresponds to the results of measuring the signal. Indeed, in practice, a measuring instrument cannot measure the exact value $x(t)$ of the signal at a given moment $t$. No matter how fast the processes within the measuring instrument are, it always has some inertia. As a result, the value $m_i$ measured at each measurement depends not only on the value $x(t)$ of the signal at the given moment of time; it also depends on the values at nearby moments of time; see, e.g., [43].
The signal is usually weak, so the values $x(t)$ are small. Thus, we can expand the dependence of $m_i$ on $x(t)$ in Taylor series and safely ignore terms which are quadratic or of higher order with respect to $x(t)$. Then, we get a model in which the value $m_i$ is a linear function of different values $x(t)$; this is a usual technique in applications; see, e.g., [6]. The general form of a linear dependence is
$$m_i = m_i^{(0)} + \int A_i(t) \cdot x(t)\,dt \tag{2}$$
for some coefficients $A_i(t)$. A measuring instrument is usually calibrated in such a way that in the absence of the signal, when $x(t) = 0$, the measurement result is 0. After such a calibration, we get $m_i^{(0)} = 0$ and thus, the expression (2) gets a simplified form
$$m_i = \int A_i(t) \cdot x(t)\,dt. \tag{3}$$
This is exactly the form used in F-transform. Thus, the F-transform is indeed a very natural procedure: it replaces the original signal $x(t)$ with the simulated results of measuring this signal, and the results of measuring the signal are exactly what we have in real life.
But why fuzzy partition? So far, everything has been good and natural, but there is one aspect of successful applications of F-transform that cannot be explained so easily: namely, in most such applications, the corresponding functions $A_i(t)$ form a fuzzy partition, in the sense that
$$\sum_i A_i(t) = 1 \tag{4}$$
for all moments $t$ from the corresponding time interval.
Mathematical comment. Sometimes, the corresponding requirement takes a slightly different form $\sum_i A_i(t) = c$ for some constant $c$. This case can be naturally reduced to the case (4) if we consider re-scaled functions $\widetilde A_i(t) = c^{-1} \cdot A_i(t)$ and the corresponding re-scaled values
$$\widetilde x_i = \int \widetilde A_i(t) \cdot x(t)\,dt = c^{-1} \cdot x_i.$$
In view of this equivalent re-scaling, the question is why it is natural to require that $\sum_i A_i(t) = \text{const}$.
Application-related comment. It is worth mentioning that fuzzy partitions are successfully used in other applications of fuzzy techniques. For example, fuzzy sets that form a fuzzy partition are used:
• in fuzzy control; see, e.g., an application to control of telerobots in space medicine [8];
• in information accessing systems such as information retrieval systems, filtering systems, recommender systems, and web quality evaluation tools; see, e.g., [9] and references therein, etc.
In fuzzy clustering, an important and frequently used requirement is also that the fuzzy sets corresponding to different clusters form a fuzzy partition. The resulting clustering techniques have been very successful in many applications; see, e.g., a recent application to the analysis of earthquake data [4].
On the other hand, in some other applications, it turns out to be more efficient to use fuzzy sets which do not form a fuzzy partition; an example related to face and pose detection is given in [25].
It is desirable to explain the efficiency of the fuzzy partition requirement. We strongly believe that every time there is an unexplained empirical fact about data processing algorithms, it is desirable to come up with a theoretical explanation. Such an explanation makes the resulting algorithms more reliable, thus decreasing the possibility that these algorithms will fail and, correspondingly, increasing the chances that these efficient algorithms will be used by practitioners, even in potentially high-risk situations. Sometimes, the corresponding analysis finds conditions under which these methods work efficiently, and even helps develop more efficient techniques.
What we do in this paper. In this paper, we show that the fuzzy partition requirement (4) can be naturally explained in the measurement interpretation of F-transform.
To be more precise, we show that what naturally appears is a 1-parametric family of similar requirements of which the fuzzy partition requirement is a particular case, and then we explain that in the fuzzy case, it is indeed reasonable to use the fuzzy partition requirement.
The resulting explanation of the fuzzy partition requirement is the main contribution of this paper.
The structure of this paper. The structure of our paper is as follows. The main idea behind our explanation is presented in Section 2, for a very general (not necessarily fuzzy) uncertainty. In Section 3, we analyze the case of probabilistic uncertainty. In Section 4, this analysis is generalized to general uncertainty. In Section 5, the analysis performed in the previous sections is used to explain which functions $A_i(t)$ we should choose in a general uncertainty situation and, in particular, in the case of fuzzy uncertainty. All these sections contain original research results. The final Section 6 contains conclusions and possible future research directions.
Comment. In the above text, we assumed that the actual signal $x(t)$ is defined for all possible moments of time. In some practical situations, however, it makes sense to consider only discrete-time values $x(t_1), x(t_2), \ldots$ In such discrete-time situations, it makes sense to apply the following discrete F-transform:
$$x_i = c \cdot \sum_k A_i(t_k) \cdot x(t_k)$$
for some constant $c$, where the values $A_i(t_k)$ satisfy the same fuzzy partition requirement: for each $k$, we have
$$\sum_i A_i(t_k) = 1.$$
In the following text, we use the continuous case to explain the fuzzy partition requirement; however, as one can easily check, the same explanation holds for the discrete F-transform as well.
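The discrete F-transform can be illustrated with a minimal Python sketch. It assumes the form $x_i = c \cdot \sum_k A_i(t_k) \cdot x(t_k)$ and uses triangular functions $A_i(t) = A(t - t_i)$ on equally spaced nodes; the triangular shape, the grid sizes, the test signal, and all function names are illustrative choices, not taken from the paper.

```python
def triangular(t, center, h):
    """A(t - t_i): equals 1 at the node and 0 outside [center - h, center + h]."""
    return max(0.0, 1.0 - abs(t - center) / h)

def discrete_f_transform(x_values, t_values, nodes, h, c=1.0):
    """Return the list of components x_i = c * sum_k A_i(t_k) * x(t_k)."""
    return [
        c * sum(triangular(t, node, h) * x for t, x in zip(t_values, x_values))
        for node in nodes
    ]

# Illustrative grid: 101 samples on [0, 1] and 6 equally spaced nodes (h = 0.2).
h = 0.2
nodes = [i * h for i in range(6)]
t_values = [k / 100 for k in range(101)]
x_values = [t * t for t in t_values]  # hypothetical test signal

# Fuzzy partition check: for each t_k, the values A_i(t_k) should add up to 1.
partition_sums = [sum(triangular(t, node, h) for node in nodes) for t in t_values]

# c = 0.01 plays the role of the sampling step in this sketch.
components = discrete_f_transform(x_values, t_values, nodes, h, c=0.01)
```

On this grid, every entry of `partition_sums` is 1 up to rounding, so each sample $x(t_k)$ contributes the same total weight to the components.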

Main Idea
What if we can exactly measure instantaneous values? In the idealized case, when inertia of measuring instruments is so small that it can be safely ignored, we can measure the exact values $x(t_1), x(t_2), \ldots$ of the signal $x(t)$ at different moments of time.
In this case, we get perfect information about the values of the signal at these moments of time $t_1, t_2, \ldots$, but practically no information about the values of the signal $x(t)$ at any other moment of time. In other words:
• we reconstruct the values $x(t_1), x(t_2), \ldots$ with perfect accuracy (0 measurement error), while
• the values $x(t)$ corresponding to all other moments of time $t$ are reconstructed with no accuracy at all (the only bound on measurement error is infinity).
Even if we take into account that measurements are never 100% accurate, and we only measure the values $x(t_i)$ with some accuracy, we will still get the difference between our knowledge of values $x(t)$ corresponding to different moments of time:
• we know the values $x(t_i)$ with finite accuracy, but
• for all other moments of time $t$, we know nothing (i.e., the only bound on measurement error is infinity).
This difference does not fit well with the fact that we want to get a good representation of the whole signal $x(t)$, i.e., a good representation of its values at all moments of time. Thus, we arrive at the following idea.
Main idea. To adequately represent the original signal $x(t)$, it is desirable to select the measurement procedures in such a way that, based on these measurements, we reconstruct each value $x(t)$ with the same accuracy.
Comment. At this moment, we have presented this idea informally. In the following sections, we will show how to formalize this idea, and we will also show that this idea leads to the fuzzy partition requirement.
To be more precise, this idea leads to a general formula that includes the fuzzy partition requirement as a particular case. We also explain why it is specifically the fuzzy partition requirement that should be selected in the fuzzy case.

Case of Probabilistic Uncertainty
Description of the case. Let us start with the most well-studied uncertainty: the probabilistic uncertainty. In this case, we have probabilistic information about the measurement error $\Delta m_i \stackrel{\text{def}}{=} \widetilde m_i - m_i$ of each measurement, where $\widetilde m_i$ denotes the result of measuring the quantity $m_i$.
We will consider the usual way measurement uncertainties are treated in this approach (see, e.g., [43]): namely, we will assume:
• that each measurement error $\Delta m_i$ is normally distributed with 0 mean and known standard deviation $\sigma$, and
• that measurement errors $\Delta m_i$ and $\Delta m_j$ corresponding to different measurements $i \ne j$ are independent.
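How accurately can we estimate $x(t)$ based on each measurement. Based on each measurement, we know each value $m_i = \int A_i(t) \cdot x(t)\,dt$ with accuracy $\sigma$. The integral is, in effect, a large sum, so we have
$$m_i = \sum_s A_i(s) \cdot x(s) \cdot \Delta t. \tag{5}$$
Thus, for each moment $t$, we have
$$m_i = A_i(t) \cdot x(t) \cdot \Delta t + \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta t,$$
and therefore,
$$x(t) = \frac{m_i}{A_i(t) \cdot \Delta t} - \frac{1}{A_i(t) \cdot \Delta t} \cdot \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta t. \tag{6}$$
The measurement result $\widetilde m_i$ is an estimate for the quantity $m_i$ whose error has mean 0 and standard deviation $\sigma$. Thus, if we know all the values $x(s)$ corresponding to $s \ne t$, then, based on the result $\widetilde m_i$ of the $i$-th measurement, we can estimate the remaining value $x(t)$ as
$$\widetilde x_i(t) = \frac{\widetilde m_i}{A_i(t) \cdot \Delta t} - \frac{1}{A_i(t) \cdot \Delta t} \cdot \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta t. \tag{7}$$
By comparing the formulas (6) and (7), we can conclude that the approximation error $\Delta x_i(t) \stackrel{\text{def}}{=} \widetilde x_i(t) - x(t)$ of this estimate is equal to
$$\Delta x_i(t) = \frac{\Delta m_i}{A_i(t) \cdot \Delta t}. \tag{8}$$
Since the measurement error $\Delta m_i$ is normally distributed, with 0 mean and standard deviation $\sigma$, the approximation error $\Delta x_i(t)$ is also normally distributed, with 0 mean and standard deviation
$$\sigma_i(t) = \frac{\sigma}{A_i(t) \cdot \Delta t}. \tag{9}$$
How accurately can we estimate $x(t)$ based on all the measurements. For each moment $t$, based on each measurement $i$, we get an estimate $\widetilde x_i(t) \approx x(t)$ with the accuracy $\sigma_i(t)$ described by the formula (9):
$$x(t) \approx \widetilde x_0(t), \ \ldots, \ x(t) \approx \widetilde x_n(t). \tag{10}$$
For each estimate, since the distribution of the measurement error is normal, the corresponding probability density function has the form
$$\rho_i(\widetilde x_i(t)) = \frac{1}{\sqrt{2\pi} \cdot \sigma_i(t)} \cdot \exp\left(-\frac{(\widetilde x_i(t) - x(t))^2}{2\sigma_i^2(t)}\right). \tag{11}$$
Since the measurement errors $\Delta m_i$ of different measurements are independent, the resulting estimation errors $\Delta x_i(t) = \widetilde x_i(t) - x(t)$ are also independent. Thus, the joint probability density corresponding to all the measurements is equal to the product of all the values (11) corresponding to individual measurements:
$$\rho = \prod_{i=0}^n \frac{1}{\sqrt{2\pi} \cdot \sigma_i(t)} \cdot \exp\left(-\frac{(\widetilde x_i(t) - x(t))^2}{2\sigma_i^2(t)}\right). \tag{12}$$
As a combined estimate $\widehat x(t)$ for $x(t)$, it is reasonable to select the value for which the corresponding probability (12) is the largest possible. This is known as the Maximum Likelihood Method; see, e.g., [44].
To find such a maximum, it is convenient to take the negative logarithm of the expression (12) and use the fact that $-\ln(z)$ is a decreasing function, so the original expression is the largest if and only if its negative logarithm is the smallest. Thus, we arrive at the need to minimize the sum
$$\sum_{i=0}^n \frac{(\widetilde x_i(t) - x(t))^2}{2\sigma_i^2(t)}; \tag{13}$$
this minimization is known as the Least Squares approach. Differentiating the expression (13) with respect to the unknown $x(t)$ and equating the derivative to 0, we conclude that
$$\sum_{i=0}^n \frac{\widehat x(t) - \widetilde x_i(t)}{\sigma_i^2(t)} = 0, \tag{14}$$
and thus, that
$$\widehat x(t) = \frac{\sum_{i=0}^n \sigma_i^{-2}(t) \cdot \widetilde x_i(t)}{\sum_{i=0}^n \sigma_i^{-2}(t)}. \tag{15}$$
The accuracy $\sigma(t)$ of this estimate can be determined if we describe the expression (12) in the form
$$\text{const} \cdot \exp\left(-\frac{(x(t) - \widehat x(t))^2}{2\sigma^2(t)}\right). \tag{16}$$
By comparing the coefficients at $(x(t))^2$ under the exponent in the formulas (12) and (16), we conclude that
$$\frac{1}{2\sigma^2(t)} = \sum_{i=0}^n \frac{1}{2\sigma_i^2(t)}, \tag{17}$$
i.e., equivalently, that
$$\sigma^{-2}(t) = \sum_{i=0}^n \sigma_i^{-2}(t). \tag{18}$$
In particular, if all the estimation errors were equal, i.e., if all the values $\sigma_i(t)$ were equal to a common value $\sigma_{\text{eq}}(t)$, then, from (18), we would conclude that
$$\sigma(t) = \frac{\sigma_{\text{eq}}(t)}{\sqrt{N}}, \tag{19}$$
where $N \stackrel{\text{def}}{=} n + 1$ is the overall number of combined measurements. Substituting expression (9) for $\sigma_i(t)$ into the formula (18), we conclude that
$$\sigma^{-2}(t) = \frac{(\Delta t)^2}{\sigma^2} \cdot \sum_{i=0}^n (A_i(t))^2. \tag{20}$$
Thus, the requirement that we get the same accuracy for all moments of time $t$, i.e., that $\sigma(t) = \text{const}$, means that we need to have
$$\sum_{i=0}^n (A_i(t))^2 = \text{const}. \tag{21}$$
Discussion. The formula (21) is somewhat similar to the fuzzy partition requirement but it is different:
• in the fuzzy partition requirement, we demand that the sum of the functions $A_i(t)$ be constant, but
• here, we have the sum of the squares.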
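The difference between the two requirements can be seen in a small numerical sketch. It assumes the per-measurement accuracy $\sigma_i(t) = \sigma / (A_i(t) \cdot \Delta t)$ and the inverse-variance combination $\sigma^{-2}(t) = \sum_i \sigma_i^{-2}(t)$ discussed above; the triangular functions, the grid, and all names are illustrative assumptions rather than the authors' code.

```python
import math

def triangular(t, center, h):
    """Triangular function: 1 at the node, 0 outside [center - h, center + h]."""
    return max(0.0, 1.0 - abs(t - center) / h)

def combined_sigma(a_values, sigma=1.0, dt=1.0):
    """Combined accuracy from 1/sigma(t)^2 = sum_i (A_i(t) * dt / sigma)^2."""
    inv_var = sum((a * dt / sigma) ** 2 for a in a_values)
    return 1.0 / math.sqrt(inv_var)

def combined_estimate(estimates, sigmas):
    """Least-squares combination: inverse-variance weighted mean of the estimates."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

h = 0.2
nodes = [i * h for i in range(6)]
t_values = [k / 100 for k in range(101)]

# Family 1: triangular functions, whose sum is constant (a fuzzy partition).
sigma_linear = [
    combined_sigma([triangular(t, node, h) for node in nodes]) for t in t_values
]
# Family 2: square roots of the same functions, whose sum of squares is constant.
sigma_sqrt = [
    combined_sigma([math.sqrt(triangular(t, node, h)) for node in nodes])
    for t in t_values
]

print(min(sigma_linear), max(sigma_linear))  # accuracy varies with t
print(min(sigma_sqrt), max(sigma_sqrt))      # accuracy is the same for all t
```

Under these assumptions, only the second family gives a time-independent $\sigma(t)$, which is why the probabilistic case leads to a sum-of-squares condition rather than to the fuzzy partition requirement itself.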
The formula (21) is based on the probabilistic uncertainty, for which the measurement error decreases with repeated measurements as $1/\sqrt{N}$. However, e.g., for interval uncertainty (see, e.g., [10,23,26,43]), when we only know the upper bound on the measurement errors, the measurement error resulting from $N$ repeated measurements decreases as $1/N$; see, e.g., [45].
So maybe by considering different types of uncertainty, we can get the fuzzy partition formula? To answer this question, let us consider a general case of how uncertainties can be combined in different approaches.

How Uncertainties Can Be Combined in Different Approaches
Towards a general formulation of the problem. In the general case, be it probabilistic or interval or any other approach, we can always describe the corresponding uncertainty in the same unit as the measured quantity.
In the interval approach, a natural measure of uncertainty is the largest possible value $\Delta$ of the absolute value $|\Delta x|$ of the approximation error $\Delta x = \widetilde x - x$, where $x$ is the actual value of the corresponding quantity and $\widetilde x$ is the measurement result. This value $\Delta$ is clearly measured in the same units as the quantity $x$ itself.
In the probabilistic approach, we can use the variance of $\Delta x$, which is described in different units than $x$, but we can also take the square root of this variance and consider the standard deviation $\sigma$, which is already described in the same units.
In the general case, let us denote the corresponding measure of accuracy by $\Delta$. The situation when we have no information about the desired quantity corresponds to $\Delta = \infty$. The idealized situation when we know the exact value of this quantity corresponds to $\Delta = 0$.
If $\Delta$ and $\Delta'$ are corresponding measures of accuracy for two different measurements, then what is the accuracy of the resulting combined estimate? Let us denote this combined accuracy by $\Delta * \Delta'$.
In these terms, to describe the combination, we need to describe the corresponding function a * b of two variables.What are the natural properties of this function?
Commutativity. The result of combining two estimates should not depend on which of the two estimates is listed first, so we should have $a * b = b * a$. In other words, the corresponding combination operation must be commutative.
Associativity. If we have three estimates, then:
• we can first combine the first and the second ones, and then combine the result with the third one,
• or we can first combine the second and the third ones, and then combine the result with the first one.
The result should not depend on the order, so we should have $(a * b) * c = a * (b * c)$.
In other words, the corresponding operation should be associative.
Monotonicity. Additional information can only improve the accuracy. Thus, the accuracy of the combined estimate cannot be worse than the accuracy of each of the estimates used in this combination. So, we get $a * b \le a$.
Similarly, if we increase the accuracy of each measurement, the accuracy of the resulting measurement will increase too: if $a \le a'$ and $b \le b'$, then we should have $a * b \le a' * b'$.
Non-degenerate case. If we start with measurements of finite accuracy, we should never get the exact value, i.e., if $a > 0$ and $b > 0$, we should get $a * b > 0$.
Scale-invariance. In real life, we deal with the actual quantities, but in computations, we need to describe these quantities by their numerical values. To get a numerical value, we need to select a measuring unit: e.g., to describe distance in numerical terms, we need to select a unit of distance.
This selection is usually arbitrary. For example, for distance, we could consider meters, we could consider centimeters, and we could consider inches or feet. It is reasonable to require that the combination operation remains the same if we keep the same quantities but change the measuring unit. Let us describe this requirement in precise terms.
If we replace the original measuring unit with a new one which is $\lambda$ times smaller, then all the numerical values are multiplied by $\lambda$. For example, if we replace meters by centimeters, then all the numerical values are multiplied by 100. The corresponding transformation $x \to \lambda \cdot x$ is known as scaling.
Suppose that in the original units, we had accuracies $a$ and $b$ and the combined accuracy was $a * b$. Then, in the new units, since accuracies are described in the same units as the quantity itself, the original accuracies become $\lambda \cdot a$ and $\lambda \cdot b$, and the combined accuracy is thus $(\lambda \cdot a) * (\lambda \cdot b)$. This is the combined accuracy in the new units. It should be the same as when we transform the old-units accuracy $c = a * b$ into the new units, getting $\lambda \cdot (a * b)$:
$$(\lambda \cdot a) * (\lambda \cdot b) = \lambda \cdot (a * b). \tag{22}$$
This invariance under scaling is known as scale-invariance.
Discussion. Now, we are ready to formulate the main result. To formulate it, we list all the above reasonable properties of a combination operation in the form of Definition 1.
Comment. This definition is similar to the definitions presented in [1] for quantum systems and in [7] for neural networks. However, because of the different application domains, our definition is somewhat different: e.g., in our case, we have the non-degeneracy requirement, which is natural for combining uncertainty but not in the above two domains.
Proposition 1. Every combination operation has one of the following two forms: $a * b = \min(a, b)$ or $a * b = (a^{-\beta} + b^{-\beta})^{-1/\beta}$ for some $\beta > 0$.
The proof of this result is, in effect, described in [1] (see also [7]).
Comment. The proof shows that if we do not impose the non-degeneracy condition, the only other alternative is $a * b = 0$. Thus, the non-degeneracy condition can be weakened: instead of requiring that $a * b > 0$ for all pairs of positive numbers $a$ and $b$, it is sufficient to require that $a * b > 0$ for at least one such pair.
Discussion. The form $\min(a, b)$ is the limit case of the second form when $\beta \to \infty$.
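This limit can be checked numerically. The following minimal Python sketch assumes that the second form is the operation $(a^{-\beta} + b^{-\beta})^{-1/\beta}$, which agrees with formula (18) for $\beta = 2$; the function name `combine` and the sample values are illustrative.

```python
def combine(a, b, beta):
    """Assumed second form: a * b = (a**-beta + b**-beta)**(-1/beta), for a, b, beta > 0."""
    return (a ** -beta + b ** -beta) ** (-1.0 / beta)

a, b = 2.0, 3.0
betas = (1, 2, 8, 32, 128)
values = [combine(a, b, beta) for beta in betas]

# As beta grows, the combined accuracy increases towards min(a, b) = 2.
for beta, value in zip(betas, values):
    print(beta, value)
```

For every finite $\beta$, the combined value stays strictly below $\min(a, b)$, so combining always helps at least a little; in the limit, the less accurate measurement stops contributing.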
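In the situation when we have $N$ measurement results with the same accuracy $\Delta_1 = \ldots = \Delta_N = \Delta$, the combined accuracy $\widetilde\Delta$ can be determined from the formula $\widetilde\Delta^{-\beta} = N \cdot \Delta^{-\beta}$, i.e., $\widetilde\Delta = \Delta / N^{1/\beta}$. Thus, the $1/\sqrt{N}$ decrease typical of probabilistic uncertainty corresponds to $\beta = 2$, and the $1/N$ decrease typical of interval uncertainty corresponds to $\beta = 1$.
For a general $\beta$, the same argument that led from (18) to (21) shows that the requirement of equal accuracy at all moments of time $t$ takes the form
$$\sum_{i=0}^n (A_i(t))^\beta = \text{const}. \tag{29}$$
Which value $\beta$ should we use in the case of fuzzy uncertainty. In the fuzzy case (see, e.g., [3,11,24,31,33,36,46]), the usual way of propagating uncertainty, Zadeh's extension principle, is equivalent to applying interval computations for each $\alpha$-cut. Thus, for analyzing fuzzy data, it makes sense to use the value of $\beta$ corresponding to interval uncertainty, which, as noted above, is $\beta = 1$. For $\beta = 1$, the formula (29) becomes the fuzzy partition property. Thus, when analyzing fuzzy data, the use of the fuzzy partition property is indeed justified.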
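The role of $\beta$ can be made concrete with a short Python sketch. It assumes the combination operation $(a^{-\beta} + b^{-\beta})^{-1/\beta}$ and, by analogy with the probabilistic case, per-measurement accuracies of the form $\Delta / (A_i(t) \cdot \Delta t)$; the triangular functions and all names are illustrative assumptions.

```python
from functools import reduce

def combine(a, b, beta):
    """Assumed combination: (a**-beta + b**-beta)**(-1/beta), for a, b, beta > 0."""
    return (a ** -beta + b ** -beta) ** (-1.0 / beta)

def combine_all(accuracies, beta):
    """Combine several accuracies pairwise (the order is irrelevant by associativity)."""
    return reduce(lambda a, b: combine(a, b, beta), accuracies)

def accuracy_profile(a_values, beta, delta=1.0, dt=1.0):
    """Combined accuracy at one moment t, assuming Delta_i(t) = delta / (A_i(t) * dt).
    Measurements with A_i(t) = 0 carry no information about x(t) and are skipped."""
    total = sum((a * dt / delta) ** beta for a in a_values if a > 0)
    return total ** (-1.0 / beta)

def triangular(t, center, h):
    return max(0.0, 1.0 - abs(t - center) / h)

# N equal accuracies: Delta / N for beta = 1, Delta / sqrt(N) for beta = 2.
delta, n = 0.5, 16
interval_like = combine_all([delta] * n, beta=1)
probabilistic = combine_all([delta] * n, beta=2)

# Accuracy over time for a triangular fuzzy partition (sum of A_i(t) equal to 1).
h = 0.2
nodes = [i * h for i in range(6)]
t_values = [k / 100 for k in range(101)]
profile_beta1 = [accuracy_profile([triangular(t, c, h) for c in nodes], 1) for t in t_values]
profile_beta2 = [accuracy_profile([triangular(t, c, h) for c in nodes], 2) for t in t_values]
```

In this sketch, `profile_beta1` is constant over time while `profile_beta2` is not: a fuzzy partition equalizes accuracy exactly in the interval-like case $\beta = 1$.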

Conclusions and Future Work
Conclusions. In many applications of fuzzy techniques, including applications of F-transforms, we use fuzzy sets $A_1(t), \ldots, A_n(t)$ that form a fuzzy partition, in the sense that for each $t$, the corresponding degrees $A_i(t)$ add up to 1 (or to a constant): $\sum_i A_i(t) = 1$. Empirically, in many applications, the fuzzy partition requirement indeed helps, but why it helps has, until now, remained a mystery.
In this paper, we provide a theoretical justification for this requirement. Specifically, we show that the fuzzy partition requirement naturally follows from the desire to have the signal values at different moments of time estimated with the same accuracy.
Possible directions of future research. While our main objective was to explain the ubiquity of the fuzzy partition requirement in fuzzy logic, our analysis started on a more general note, by considering general uncertainty, of which fuzzy is a particular case. In addition to the case of fuzzy uncertainty, we also explicitly analyzed another important particular type of uncertainty: probabilistic uncertainty.
It is desirable to extend this analysis to other types of uncertainty, e.g.:
• to different imprecise probability situations, and
• to situations when different functions $A_i(t)$ correspond to different types of uncertainty.
It is also desirable to analyze the situations (like the situation mentioned in Section 1) when empirically, fuzzy sets that do not form a fuzzy partition work better. Maybe in this case, a more general scheme with $\beta \ne 1$ will help?

Definition 1.
By a combination operation, we mean a function $a * b$ that transforms two non-negative numbers $a$ and $b$ into a new non-negative number and for which the following properties hold:
• for all $a$ and $b$, we have $a * b = b * a$ (commutativity);
• for all $a$, $b$, and $c$, we have $(a * b) * c = a * (b * c)$ (associativity);
• for all $a$ and $b$, we have $a * b \le a$ (first monotonicity requirement);
• for all $a$, $b$, $a'$, and $b'$, if $a \le a'$ and $b \le b'$, then $a * b \le a' * b'$ (second monotonicity requirement);
• if $a > 0$ and $b > 0$, then $a * b > 0$ (non-degeneracy); and
• for all $a$, $b$, and $\lambda > 0$, we have $(\lambda \cdot a) * (\lambda \cdot b) = \lambda \cdot (a * b)$ (scale-invariance).
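The properties listed in Definition 1 can be spot-checked numerically for a candidate operation. The following Python sketch tests them on a small grid of positive sample values; it is a sanity check under illustrative assumptions (the helper names, the sample grid, the tolerance, and the candidate operations are not from the paper), not a proof.

```python
import math
from itertools import product

def combine(a, b, beta=1.0):
    """Candidate operation (a**-beta + b**-beta)**(-1/beta), for positive a and b."""
    return (a ** -beta + b ** -beta) ** (-1.0 / beta)

def check_properties(op, samples, scales=(0.1, 10.0), tol=1e-9):
    """Return {property name: True/False} for op on a finite grid of positive values."""
    def close(u, v):
        return math.isclose(u, v, rel_tol=tol)
    pairs = list(product(samples, repeat=2))
    return {
        "commutativity": all(close(op(a, b), op(b, a)) for a, b in pairs),
        "associativity": all(
            close(op(op(a, b), c), op(a, op(b, c)))
            for a, b, c in product(samples, repeat=3)
        ),
        "monotonicity_1": all(op(a, b) <= a * (1 + tol) for a, b in pairs),
        "monotonicity_2": all(
            op(a, b) <= op(a2, b2) * (1 + tol)
            for (a, b), (a2, b2) in product(pairs, repeat=2)
            if a <= a2 and b <= b2
        ),
        "non_degeneracy": all(op(a, b) > 0 for a, b in pairs),
        "scale_invariance": all(
            close(op(lam * a, lam * b), lam * op(a, b))
            for lam in scales for a, b in pairs
        ),
    }

samples = [0.5, 1.0, 2.0, 4.0]
report_min = check_properties(min, samples)
report_beta2 = check_properties(lambda a, b: combine(a, b, 2.0), samples)
report_mean = check_properties(lambda a, b: (a + b) / 2, samples)    # not a combination operation
report_product = check_properties(lambda a, b: a * b, samples)       # not scale-invariant
```

On this grid, `min` and the $\beta = 2$ operation pass every check, while the arithmetic mean violates the first monotonicity requirement and the product violates scale-invariance.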

Funding
This work was supported by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University, Thailand, and by the US National Science Foundation via grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).