Why Triangular Membership Functions Are So Efficient in F-Transform Applications: A Global Explanation to Supplement the Existing Local One

The main ideas of F-transform came from representing expert rules. It would therefore be reasonable to expect that the more accurately the membership functions describe human reasoning, the more efficient the corresponding F-transform formulas will be. We know that an adequate description of our reasoning corresponds to complicated membership functions; however, somewhat surprisingly, the most efficient applications of F-transform use the simplest possible triangular membership functions. There exist some explanations for this phenomenon which are based on the local behavior of the signal. In this paper, we supplement this local explanation by a global one: namely, we prove that triangular membership functions are the only ones that provide an accurate description of appropriate global characteristics of the signal.

1 Formulation of the Problem

F-transforms: a brief reminder. In many application areas, it has turned out to be very efficient to transform the original signal x(t) into the values proportional to

$$x_i = \int A\left(\frac{t - t_i}{h}\right) \cdot x(t)\, dt,$$

where $t_i = t_0 + i \cdot h$ for appropriate $t_0$ and $h > 0$, and A(t) is a non-negative function:
• which is equal to 0 outside the interval [−1, 1],
• which, starting at t = −1, increases until it reaches the value 1 at t = 0,
• which then decreases to 0, and
• for which $\sum_i A\left(\frac{t - t_i}{h}\right) = 1$ for all t; this last property is known as the fuzzy partition property.
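The definition above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the helper names `triangular` and `f_transform`, the test signal, and the grid are all assumptions made for illustration, and the integral is approximated by a simple Riemann sum.

```python
import numpy as np

def triangular(t):
    """Triangular membership function A(t) = 1 - |t| on [-1, 1], 0 elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def f_transform(x, t, t0, h, n, A=triangular):
    """Approximate x_i = integral of A((t - t_i)/h) * x(t) dt for i = 0..n-1.

    x and t are samples of the signal and of time on a uniform grid;
    the integral is replaced by a Riemann sum (an approximation).
    """
    dt = t[1] - t[0]
    nodes = t0 + h * np.arange(n)  # t_i = t_0 + i * h
    comps = np.array([np.sum(A((t - ti) / h) * x) * dt for ti in nodes])
    return nodes, comps

# Hypothetical signal on [0, 10], with nodes t_i = 0, 1, ..., 10
t = np.linspace(0.0, 10.0, 10001)
x = np.sin(t) + 0.1 * t
nodes, comps = f_transform(x, t, t0=0.0, h=1.0, n=11)

# Fuzzy partition property: the shifted copies of A sum to 1 on [0, 10]
partition = sum(triangular((t - ti) / 1.0) for ti in nodes)

# For a constant signal x(t) = 1, an interior component is close to h = 1
_, ones_comps = f_transform(np.ones_like(t), t, t0=0.0, h=1.0, n=11)
```

Here `partition` is an array that should equal 1 at every grid point, which is the fuzzy partition property for this particular choice of `t0` and `h`.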
This transform comes from the general fuzzy approach (see, e.g., [2,3,5,7,10,15]), namely, from the idea of describing imprecise (fuzzy) expert knowledge of the type "if t is close to $t_i$, then x(t) is close to $x(t_i)$". From this viewpoint, the function A(t) is a membership function that corresponds to the word "close".
Rather complex membership functions are needed to represent our reasoning; see, e.g., [10]. However, surprisingly, in many F-transform applications, very efficient results are obtained when we use a very simple triangular membership function A(t) = 1 − |t|. Why?
One possible "local" explanation, based on uncertainty, was proposed in [4]; however, not everyone was convinced, so this empirical fact still remains somewhat of a mystery.
What we do in this paper. In this paper, we propose an alternative "global" explanation for this efficiency, an explanation based on the need to correctly reconstruct global characteristics of the signal.

In some cases, we are interested in the local behavior of the signal. In such cases, we try to measure values which are as close to x(t) as possible. F-transform values are an example of such a local analysis.
In other cases, we are interested in the global trend. In such cases, instead of concentrating on a short-term time interval, we deliberately measure the signal over a long period of time.
Resulting idea. Let us describe this idea in precise terms.

Which Global Characteristics Should We Represent: Discussion
Need for linearization. Signals are usually weak. Thus, for any quantity q that depends on the signal x(t), be it local or global, we should be able to ignore terms which are quadratic or of higher order in x(t) and thus retain only the linear terms in the corresponding dependence. As a result, we should only consider linear quantities, i.e., quantities of the type $q = \int q(t) \cdot x(t)\, dt$.
Which linear quantities should we select? Of course, when we perform F-transform, we lose some information about the signal. Indeed, on each time interval, we replace infinitely many values x(t), corresponding to infinitely many moments of time t from this interval, with finitely many values of the corresponding F-transform. Thus, we cannot perfectly reconstruct all possible global characteristics q, since from the values of all these characteristics, e.g., of the integrals $\int_{-\infty}^{t} x(s)\, ds$, we would be able to uniquely reconstruct all the values x(t).
Thus, we need to select the most appropriate global characteristics.

How to define what is most appropriate?
In different situations, different global characteristics may be more appropriate. In this paper, instead of trying to list specific notions of appropriateness, we will consider all possible criteria of this type.
Interestingly, it turns out that all reasonable criteria of this type lead, in effect, to the same family of optimal global characteristics, and the only way to reconstruct these characteristics exactly is to use triangular membership functions.
Let us describe all this in precise terms.

Towards Precise Formulation of the Problem
Towards describing what is more appropriate and what is less appropriate. As we have mentioned, all global characteristics have the form $q = \int q(t) \cdot x(t)\, dt$. Thus, selecting a characteristic is equivalent to selecting the corresponding function q(t). This function q(t) may be discontinuous, as in the above example of a characteristic $\int_{-\infty}^{t} x(s)\, ds$.
However, at least it should be measurable (non-measurable functions cannot be defined without using the Axiom of Choice, which means that they are not definable).
Of course, if we can reconstruct the value $\int q(t) \cdot x(t)\, dt$, then, for every real value c, we can also reconstruct the related value $\int (c \cdot q(t)) \cdot x(t)\, dt$, since this related value is simply equal to $c \cdot \int q(t) \cdot x(t)\, dt$. Thus, strictly speaking, a characteristic is represented not by a single function, but by the entire family $\{c \cdot q(t)\}_{c \ne 0}$ of the related functions. So, we arrive at the following definition.

Definition 1. By a characteristic or, alternatively, a family, we mean a family of the type $\{c \cdot q(t)\}_{c \ne 0}$, where q(t) is a given measurable function, and c runs over all possible non-zero real numbers.
Discussion. What do we mean when we say that some characteristics (families) are more appropriate and some are less appropriate? We mean that we have some criterion according to which, for every two families F and G, we can say one of three things:
• we can say that F is more appropriate than G; we will denote this by G ≺ F;
• we can say that G is more appropriate than F; we will denote this by F ≺ G;
• or we can say that the two characteristics are equally appropriate; we will denote this by F ∼ G.
No matter what the criterion is, we have these relations. Thus, we can simply make these relations the definition of a criterion.
Of course, we need to make sure that these relations are consistent: e.g., if F is better than G and G is better than H, then F should be better than H. Thus, we arrive at the following definition.

Definition 2. By a criterion for selecting a characteristic, we mean a pair of relations ≺, ∼ that satisfies the following properties:
• for every two characteristics F and G, we have one and only one of three options: F ≺ G, G ≺ F, or F ∼ G;
• if F ≺ G and G ≺ H, then F ≺ H;
• F ∼ F, and […]

Discussion. The whole purpose of selecting a criterion is to use this criterion for selecting the best (most adequate) characteristic, i.e., a characteristic which is better, according to this criterion, than any other characteristic. So, if there is no such optimal characteristic, the corresponding criterion is useless. But what if there are several characteristics which are all the most appropriate according to the given criterion?
In this case, we can use this non-uniqueness to optimize something else. For example, if several characteristics are equally good in terms of the accuracy with which we can predict the future behavior of the signal, then we can select among them the characteristic which is the easiest to compute. As a result, we get, in effect, a new criterion, according to which F is better than G if:
• either F is better than G according to the original criterion,
• or F is equivalent to G in terms of the original criterion but better according to the additional criterion.
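The refinement step just described amounts to a lexicographic combination of two criteria. The sketch below is only an illustration: in the text a criterion is an abstract pair of relations, whereas here the helpers `refine` and `by_key`, the numeric scores `accuracy` and `cost`, and the three candidates are all hypothetical.

```python
def refine(primary, secondary):
    """Combine two comparisons lexicographically.

    Each argument is a function cmp(F, G) returning +1 if F is better than G,
    -1 if G is better than F, and 0 if the two are equivalent.
    """
    def combined(F, G):
        first = primary(F, G)
        # Use the additional criterion only to break ties of the original one
        return first if first != 0 else secondary(F, G)
    return combined

def by_key(key):
    """Build a comparison from a numeric score (larger score = better)."""
    def cmp(F, G):
        return (key(F) > key(G)) - (key(F) < key(G))
    return cmp

# Hypothetical candidates: F and G are equally accurate, G is cheaper to compute
candidates = {
    "F": {"accuracy": 0.9, "cost": 5},
    "G": {"accuracy": 0.9, "cost": 2},
    "H": {"accuracy": 0.7, "cost": 1},
}
accuracy = by_key(lambda name: candidates[name]["accuracy"])
cheapness = by_key(lambda name: -candidates[name]["cost"])
criterion = refine(accuracy, cheapness)
```

With these made-up scores, `accuracy("F", "G")` reports a tie, while `criterion("G", "F")` prefers G because the tie is broken by computational cost.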
If, for the new criterion, we still have several different optimal characteristics, we can then optimize something else, etc., until we reach a final criterion for which there is exactly one optimal characteristic.

Definition 3.
• We say that a characteristic F is optimal with respect to the criterion ≺, ∼ if for every characteristic G, we have G ≺ F or G ∼ F.
• We say that the criterion is final if there exists exactly one characteristic which is optimal with respect to this criterion.
Need for scale-invariance. A signal x(t) describes how the value of a physical quantity x depends on time. We may have a starting point for the corresponding process, which provides a natural starting point for measuring time, but in general, the numerical value of time depends on what unit we use for measuring time. We can use seconds or minutes or hours: the time interval will be the same, but the numerical values will change.
When we replace the original unit for measuring time with a new unit which is λ times smaller, then all numerical values of time are re-scaled, i.e., multiplied by λ. For example, if we go from seconds to milliseconds, all numerical values are multiplied by 1000. The function q(t) in the new unit becomes q(λ · t).
It is reasonable to require that the relative quality of different characteristics should not change if we simply change the unit used for measuring time, without changing anything of substance.In other words, it is reasonable to require that the criterion be "scale-invariant".Here is a precise definition.
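Definition 4. We say that a criterion ≺, ∼ is scale-invariant if for every two functions q(t) and r(t) and for every λ > 0, the following two conditions hold:
• if $\{c \cdot q(t)\}_{c \ne 0} \prec \{c \cdot r(t)\}_{c \ne 0}$, then $\{c \cdot q(\lambda \cdot t)\}_{c \ne 0} \prec \{c \cdot r(\lambda \cdot t)\}_{c \ne 0}$;
• if $\{c \cdot q(t)\}_{c \ne 0} \sim \{c \cdot r(t)\}_{c \ne 0}$, then $\{c \cdot q(\lambda \cdot t)\}_{c \ne 0} \sim \{c \cdot r(\lambda \cdot t)\}_{c \ne 0}$.

Discussion. We want to find all membership functions that allow us to reconstruct the most adequate global characteristics. To find these functions, we will first describe which characteristics are the most adequate. Then, we will analyze which membership functions allow us to reconstruct the values of these characteristics from the results of the F-transform.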

Which Characteristics Are the Most Adequate: Preliminary Result
Discussion. In the previous section, we argued that the most adequate global characteristic must be optimal with respect to some final scale-invariant criterion. Let us describe all such characteristics.
Proposition 1. For every final scale-invariant criterion, each optimal characteristic has the form $\{c \cdot t^\beta\}_{c \ne 0}$, for some real value β.
Proof. Let us denote by $T_\lambda$ the scaling transformation that transforms a family $F = \{c \cdot q(t)\}_{c \ne 0}$ into the re-scaled family $\{c \cdot q(\lambda \cdot t)\}_{c \ne 0}$. In terms of this notation, scale-invariance means that:
• if F ≺ G, then $T_\lambda(F) \prec T_\lambda(G)$, and
• if F ∼ G, then $T_\lambda(F) \sim T_\lambda(G)$.

Let ≺, ∼ be a final scale-invariant criterion. Since this criterion is final, there exists exactly one optimal characteristic $F_{\rm opt}$. Let us prove that this characteristic is scale-invariant, i.e., that $T_\lambda(F_{\rm opt}) = F_{\rm opt}$ for all λ > 0. (This proof is similar to the one given in [6].) Indeed, since $F_{\rm opt}$ is optimal, it is better than or equivalent to any other characteristic. In particular, for every G, the characteristic $F_{\rm opt}$ is better than or equivalent to $T_{1/\lambda}(G)$:

$$T_{1/\lambda}(G) \prec F_{\rm opt} \ \text{ or } \ T_{1/\lambda}(G) \sim F_{\rm opt}.$$

By applying scale-invariance, we conclude that

$$T_\lambda(T_{1/\lambda}(G)) \prec T_\lambda(F_{\rm opt}) \ \text{ or } \ T_\lambda(T_{1/\lambda}(G)) \sim T_\lambda(F_{\rm opt}).$$

However, one can easily check that $T_\lambda(T_{1/\lambda}(G)) = G$.
Thus, for every characteristic G, we have either $G \prec T_\lambda(F_{\rm opt})$ or $G \sim T_\lambda(F_{\rm opt})$. By definition of an optimal characteristic, this means that the characteristic $T_\lambda(F_{\rm opt})$ is optimal. However, for a final criterion, there is only one optimal characteristic, so we conclude that $T_\lambda(F_{\rm opt}) = F_{\rm opt}$. Thus, the optimal characteristic is indeed scale-invariant.
By definition, each characteristic has the form $\{c \cdot q(t)\}_{c \ne 0}$. Let us denote the function q(t) corresponding to the optimal characteristic by $q_{\rm opt}(t)$. The fact that the optimal family is scale-invariant means, in particular, that for every λ > 0, the function $q_{\rm opt}(\lambda \cdot t)$, which belongs to the re-scaled family $T_\lambda(F_{\rm opt})$, also belongs to the original family, i.e., has the form $c(\lambda) \cdot q_{\rm opt}(t)$ for some value c(λ):

$$q_{\rm opt}(\lambda \cdot t) = c(\lambda) \cdot q_{\rm opt}(t).$$

It is known that the only measurable functions satisfying this functional equation are functions of the type $C \cdot t^\beta$; see, e.g., [1]. The proposition is proven.
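The last step of the proof relies on a cited result. As a hedged sketch of why power functions arise (restricted to t > 0 and to functions q that are not identically zero; this is the standard textbook argument, not a reproduction of the argument in [1]):

```latex
\begin{align*}
% Direct check: a power function satisfies the equation with c(\lambda) = \lambda^\beta
q(t) = C \cdot t^\beta \;&\Longrightarrow\;
  q(\lambda \cdot t) = C \cdot (\lambda \cdot t)^\beta = \lambda^\beta \cdot q(t). \\
% Conversely, re-scaling by \lambda \cdot \mu in one step or in two steps must agree
q(\lambda \cdot \mu \cdot t) &= c(\lambda \cdot \mu) \cdot q(t), \\
q(\lambda \cdot \mu \cdot t) &= c(\lambda) \cdot q(\mu \cdot t)
  = c(\lambda) \cdot c(\mu) \cdot q(t), \\
\text{hence}\quad c(\lambda \cdot \mu) &= c(\lambda) \cdot c(\mu).
\end{align*}
```

Measurable solutions of this multiplicative equation that are not identically zero have the form $c(\lambda) = \lambda^\beta$. Substituting t = 1 into the original equation then gives $q(\lambda) = c(\lambda) \cdot q(1) = C \cdot \lambda^\beta$ with $C = q(1)$.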
Discussion. Let us now find out which membership functions allow us to reconstruct these most adequate characteristics.

Which Membership Functions Enable Us to Reconstruct the Most Adequate Global Characteristics

Definition 5. We say that for a membership function A(t), it is possible to always reconstruct a global characteristic $\int q(t) \cdot x(t)\, dt$ if for every $t_0$ and h, the value of this characteristic can be uniquely determined once we know all the values

$$x_i = \int A\left(\frac{t - t_i}{h}\right) \cdot x(t)\, dt.$$

Case of β = 0. A particular case of the most adequate global characteristic is the case β = 0, when q(t) = const and the corresponding global characteristic is simply the integral $\int x(t)\, dt$. This characteristic can always be reconstructed from the F-transform, since we require that $\sum_i A\left(\frac{t - t_i}{h}\right) = 1$ and thus $\sum_i x_i = \int x(t)\, dt$.

General case. Thus, we should worry only about the case when β ≠ 0. In this case, we have the following result (Proposition 2).

Comment. This result provides the desired global explanation of why triangular membership functions are so efficient in F-transform applications.
Proof. Let us assume that for some β ≠ 0, the membership function A(t) enables us to always uniquely reconstruct the corresponding characteristic $\int t^\beta \cdot x(t)\, dt$. Let us first consider the case when $t_0 = 0$, h = 1, and the signal x(t) is equal to 0 everywhere except for the interval [0, 1]. Then, only two F-transform values are different from 0:
• the value $x_0$ =

Proposition 2. The only membership function A(t) for which it is possible to always reconstruct a most adequate global characteristic with β ≠ 0 is the triangular membership function: it can reconstruct the characteristic $\int t \cdot x(t)\, dt$ corresponding to β = 1.
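The positive half of this statement can be checked numerically. For the triangular function, $\sum_i t_i \cdot A\left(\frac{t - t_i}{h}\right) = t$ (linear interpolation reproduces a linear function), so $\sum_i t_i \cdot x_i = \int t \cdot x(t)\, dt$. The sketch below is an illustration under stated assumptions: the raised-cosine function is a contrast case chosen here (it also satisfies the fuzzy partition property), the signal and grid are hypothetical, and integrals are approximated by Riemann sums. It shows only that this particular formula fails for the raised cosine; it is not a proof that no other reconstruction exists.

```python
import numpy as np

def triangular(t):
    """A(t) = 1 - |t| on [-1, 1], 0 elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def raised_cosine(t):
    """Contrast case (an assumption): a fuzzy partition that is not triangular."""
    return np.where(np.abs(t) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * t)), 0.0)

def components(A, x, t, nodes, h):
    """Riemann-sum approximation of x_i = integral of A((t - t_i)/h) * x(t) dt."""
    dt = t[1] - t[0]
    return np.array([np.sum(A((t - ti) / h) * x) * dt for ti in nodes])

h = 1.0
nodes = -2.0 + h * np.arange(15)       # t_i = t_0 + i * h, from -2 to 12
t = np.linspace(-2.0, 12.0, 14001)
x = np.exp(-((t - 5.3) / 0.2) ** 2)    # hypothetical narrow, off-node signal
dt = t[1] - t[0]

integral = np.sum(x) * dt              # approximates the integral of x(t)
moment = np.sum(t * x) * dt            # approximates the integral of t * x(t)

xi_tri = components(triangular, x, t, nodes, h)
xi_cos = components(raised_cosine, x, t, nodes, h)

# beta = 0: both functions recover the integral via the sum of components
sum_tri, sum_cos = np.sum(xi_tri), np.sum(xi_cos)

# beta = 1: candidate reconstruction sum_i t_i * x_i
moment_tri = np.sum(nodes * xi_tri)
moment_cos = np.sum(nodes * xi_cos)

print("integral:", integral, sum_tri, sum_cos)
print("first moment:", moment, moment_tri, moment_cos)
```

In this setup, `moment_tri` should agree with `moment` up to rounding, while `moment_cos` should differ noticeably. The signal is deliberately centered off a node, since for a signal symmetric about a node the raised-cosine error cancels.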

2 Local vs. Global Characteristics: Main Idea

What we mean by local and global characteristics. No measuring instrument can provide an instantaneous value of a physical quantity. No matter at what time t we perform our measurement, the measurement result depends not only on the value of the signal x(t) at this moment of time, but also on the values x(s) at nearby moments of time.
To most adequately reconstruct the signal, we should be able to adequately reproduce