On the Normalization of Interval Data

Abstract: The impreciseness of numeric input data can be expressed by intervals. On the other hand, the normalization of numeric data is a common process in many applications. How do we reconcile normalization with the impreciseness of numeric data? A straightforward answer is that it is enough to apply a correct interval arithmetic, since the normalized exact value will be enclosed in the resulting “normalized” interval. This paper shows that this approach is not enough, since the resulting “normalized” interval can be even wider than the input intervals. We therefore propose a pair of axioms that must be satisfied by an interval arithmetic in order to be applied in the normalization of intervals. We show how some known interval arithmetics behave with respect to these axioms. The paper ends with a discussion about the current paradigm of interval computations.


Introduction
Normalization of input data is an important method in many applications involving numerical data. When such input data contain impreciseness, they are normally represented by intervals, which can be operated on by various available interval arithmetics (for example, [1][2][3]). We found no reference relating normalization and interval arithmetics. Under the current paradigm, an interval arithmetic is fixed beforehand and the whole calculation (including the normalization) is done with that arithmetic. As we will see, the normalization step alone can increase the uncertainty of the input data. Therefore, the choice of the arithmetic used to perform the normalization is very important since, depending on the operations computed, the output can be even more imprecise than the input.
To solve this problem, this paper proposes two axioms that an interval arithmetic must satisfy in order to be applied at the normalization step. We think that the normalization must be done separately from the rest of the computation, without introducing uncertainty, since it is just a translation of the input data. So, any arithmetic applied in this process must satisfy the axioms proposed here (see Figure 1). Normalization involves division, and division is always regarded as an operation connected with multiplication (as its inverse). This viewpoint comes from our experience with numbers. However, it does not hold up whenever we deal with entities representing uncertainty. For intervals, the usual division does not satisfy all the properties that connect real division to multiplication. In fact, some interval arithmetics were developed to recover this relation and the benefits that come with the algebraic method (see, for example, [4][5][6][7]). The same situation occurs with the usual division of fuzzy numbers. Therefore, the notion of interval division requires careful reflection. In the sequel (Section 2), we show the close relation between the notions of division, partition and normalization. This will be the basis for our axioms.
We organized this paper as follows: Section 2 introduces the Partition Principle (PP), an intuitive principle that is realized by the normalization of numbers and in other contexts. We also introduce Interval Division Structures (IDS), which are interval structures that satisfy (PP). Section 3 investigates some known arithmetics and shows whether they fit our axiomatic system. Section 4 provides some considerations on Constrained Interval Arithmetic (CIA), whose approach is strongly related to optimization methods. Finally, Section 5 argues that, until now, we do not have a universal representation for intervals that is fast to compute and provides a good division able to capture the notion of data normalization. That section ends by proposing a change in the usual paradigm of interval computation.

Partition Principle and Interval Normalization
The literature offers a plethora of interval arithmetics. Some of them intend to recover the algebraic structure of the real numbers. Just as real numbers are seen as representing quantities, measures and vectors, intervals also admit different viewpoints: they are seen as sets, as numbers, as information about real numbers, and as real numbers with imprecision [8]. This last viewpoint led us to see that "introducing" imprecision into real numbers causes some arithmetics to lose field properties.
Until now, interval division has been treated as part of a whole interval arithmetic and has not been studied separately. In other words, authors have followed the tradition of defining the four basic operations and then verifying the properties of division with respect to the whole arithmetic, mainly that division is connected to multiplication.
However, consider the following notion of division. Partition Principle (PP): To divide a "whole" is to provide a "partition" of it in such a way that when the parts are "collected" together we recover the "whole".

Real Numbers
(PP) is perfectly captured by the division of real numbers, since for a finite set of reals $A = \{r_1, \dots, r_n\}$ with $r_1 + \cdots + r_n \neq 0$:
$\sum_{j=1}^{n} \frac{r_j}{r_1 + \cdots + r_n} = 1. \quad (1)$
Here, the "partition" is represented by the set of quotients $\frac{r_i}{r_1 + \cdots + r_n}$, the "whole" is numerically represented by "1", and "to collect" means "to sum".
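Equation (1) can be checked directly. The following sketch (the function name `normalize` is ours, not the paper's) uses exact rationals from Python's `fractions` module so that the collected parts sum to exactly 1, without floating-point rounding:

```python
from fractions import Fraction

def normalize(reals):
    """Hypothetical helper: map each r_i to r_i / (r_1 + ... + r_n)."""
    total = sum(reals)
    if total == 0:
        raise ValueError("normalization needs r_1 + ... + r_n != 0")
    return [r / total for r in reals]

parts = normalize([Fraction(2), Fraction(3), Fraction(5)])
print(parts)       # [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
print(sum(parts))  # 1  ("collecting" the parts recovers the whole)
```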

Sets
A finite partition of a set $X$ is a family of non-empty, pairwise disjoint subsets $\mathcal{A} = \{A_1, \dots, A_n\}$ whose union recovers $X$. The "whole" is the set $X$, and "to collect" means to take the union of the sets in the partition: $A_1 \cup \cdots \cup A_n = X$.
Example 2. Let $X = \{a, b, c, d, e\}$; the family of sets $\mathcal{A} = \{\{a\}, \{b, c\}, \{d, e\}\}$ is a partition of $X$.
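The three partition conditions (non-empty blocks, pairwise disjointness, union equal to the whole) can be checked mechanically; here they are applied to Example 2 with a hypothetical helper `is_partition`:

```python
def is_partition(family, universe):
    """Hypothetical helper: True if `family` is a partition of `universe`."""
    blocks = [set(b) for b in family]
    if any(not b for b in blocks):     # every block must be non-empty
        return False
    union = set()
    for b in blocks:
        if union & b:                  # overlap: blocks are not pairwise disjoint
            return False
        union |= b
    return union == set(universe)      # collecting the parts recovers the whole

X = {"a", "b", "c", "d", "e"}
print(is_partition([{"a"}, {"b", "c"}, {"d", "e"}], X))       # True
print(is_partition([{"a", "b"}, {"b", "c"}, {"d", "e"}], X))  # False
```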
Both examples satisfy (PP). In the case of numbers, the "whole" is always represented by the number one. In the case of sets, the whole is represented by the considered universe set X. What about intervals?
As we have stated, interval division has not been defined in this way; it has been treated as part of an interval arithmetic with some properties. In what follows, we instead provide an axiomatic system for interval division based on (PP) and investigate how some operations in the literature behave with respect to these axioms. Before we proceed, we recall the following definitions.
Definition 1 ([9]). Let $A$ be a set, $e \in A$, and $\oplus : A \times A \to A$ an associative operation. If $x \oplus e = e \oplus x = x$ for every $x \in A$, then the structure $\langle A, \oplus, e \rangle$ is called a monoid.

Definition 2 ([10]).
The set $[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$ is called the closed interval between $a$ and $b$. The set of all such intervals is denoted by $\mathbb{I}(\mathbb{R})$. Given an interval $[a, b] \in \mathbb{I}(\mathbb{R})$, the width of $[a, b]$ is $w([a, b]) = b - a$.

Interval Division Structures
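In this section we introduce interval structures that interpret the partition principle (PP). They are called Interval Division Structures (IDS) and are our proposal to perform the normalization of interval data.
Definition 3. Let $\oplus$ be an interval sum and $\oslash$ an interval division. Given a finite family of intervals $\mathcal{A} = \{A_1, \dots, A_n\}$ such that $0 \notin (A_1 \oplus \cdots \oplus A_n)$, write $S = A_1 \oplus \cdots \oplus A_n$ and $N(\mathcal{A}) = (A_1 \oslash S) \oplus \cdots \oplus (A_n \oslash S)$. We say that $\oplus$ and $\oslash$ form an IDS if:
(D-1) for all $a_j \in A_j$ ($1 \le j \le n$) and every $1 \le i \le n$, $\frac{a_i}{a_1 + \cdots + a_n} \in A_i \oslash S$; (2)
(D-2) $w(N(\mathcal{A})) \le \sum_{j=1}^{n} w(A_j)$.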
The first axiom states that the division must provide an interval that contains the normalized value of any choice of real numbers taken from the intervals $A_i$.
The second axiom generalizes the standard notion of normalization of real numbers: since real numbers can be seen as degenerate intervals, Equation (1) is an instance of (D-2). Moreover, (D-2) establishes a relation between the impreciseness of the input data and that produced by division; namely, the latter should not exceed the former. In other words, the normalization of interval data should not introduce additional impreciseness.
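A minimal sketch of a (D-2) check, assuming $N(\mathcal{A})$ is the interval sum of the quotients $A_i \oslash (A_1 \oplus \cdots \oplus A_n)$ (the reading used in the CIA proof below) and plugging in the standard endpoint sum and division. All names are hypothetical; other arithmetics can be passed in through the `add` and `div` parameters.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (lower endpoint, upper endpoint)
Op = Callable[[Interval, Interval], Interval]

def std_add(x: Interval, y: Interval) -> Interval:
    """Standard interval sum (hypothetical helper)."""
    return (x[0] + y[0], x[1] + y[1])

def std_div(x: Interval, y: Interval) -> Interval:
    """Standard interval division; requires 0 not in y."""
    if y[0] <= 0 <= y[1]:
        raise ZeroDivisionError("0 lies in the divisor")
    q = [a / b for a in x for b in y]
    return (min(q), max(q))

def width(x: Interval) -> float:
    return x[1] - x[0]

def normalization(family: List[Interval], add: Op = std_add, div: Op = std_div) -> Interval:
    """Assumed reading of N(A): (A_1 / S) + ... + (A_n / S) with S = A_1 + ... + A_n."""
    total = family[0]
    for a in family[1:]:
        total = add(total, a)
    result = div(family[0], total)
    for a in family[1:]:
        result = add(result, div(a, total))
    return result

def satisfies_d2(family: List[Interval], add: Op = std_add, div: Op = std_div) -> bool:
    """Axiom (D-2): w(N(A)) <= w(A_1) + ... + w(A_n)."""
    return width(normalization(family, add, div)) <= sum(width(a) for a in family)

# Degenerate intervals (real numbers): Equation (1), N(A) = [1, 1].
print(normalization([(2.0, 2.0), (3.0, 3.0), (5.0, 5.0)]))  # (1.0, 1.0)
# Small-magnitude data: N(A) is wider than the inputs together.
print(normalization([(0.5, 1.0), (0.5, 1.0)]), satisfies_d2([(0.5, 1.0), (0.5, 1.0)]))  # (0.5, 2.0) False
```

With the standard operations, whether (D-2) holds depends on the magnitude of the data: the family $\{[1,2], [5,10]\}$ passes the same check.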
In what follows we show how some interval divisions behave with respect to the axioms of IDS. That is, we will check whether they can be used to normalize data according to our interpretation.

Interval Division Structures and Some Interval Arithmetics
In this section we show how some interval sums and divisions behave with respect to the proposed axiomatic system. An interval $X$ will be denoted by $X = [\underline{x}, \overline{x}]$.
The standard interval operations produce wider error bounds whenever the same interval appears more than once in an interval expression, for example, $X - X$ or $(X \cdot Y) + (X \cdot Z)$. In both cases the interval $X$ appears more than once, and the resulting evaluated interval is wider than the direct image, which is, respectively, $[0, 0]$ and $X \cdot (Y + Z)$. This is called the variable dependence problem (VDP) [12]: each occurrence of a variable is treated as independent of the others, which degrades the precision of the output. For instance, consider $X = [1, 2]$, $Y = [2, 3]$ and $Z = [-1, 4]$. Then $X - X = [-1, 1]$, $X(Y + Z) = [1, 14]$ and $XY + XZ = [0, 14]$; hence $X - X \neq 0$ and $X(Y + Z) \subseteq XY + XZ$ (subdistributivity). One can go further: consider $f(x) = (x + 1)(x - 1)$ and note that $-1 \le f(x) \le 3$ if $x \in [-2, 1]$. However, evaluating the same expression with intervals gives $([-2, 1] + 1)([-2, 1] - 1) = [-1, 2] \cdot [-3, 0] = [-6, 3]$.
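The numbers above can be reproduced with a few lines of standard endpoint arithmetic; the helpers `add`, `sub` and `mul` are hypothetical names used only for this illustration:

```python
def add(x, y):
    """Standard interval sum."""
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    """Standard interval difference: each operand is treated as independent."""
    return (x[0] - y[1], x[1] - y[0])

def mul(x, y):
    """Standard interval product: min and max over the endpoint products."""
    p = [a * b for a in x for b in y]
    return (min(p), max(p))

X, Y, Z = (1, 2), (2, 3), (-1, 4)
print(sub(X, X))                  # (-1, 1), not (0, 0)
print(mul(X, add(Y, Z)))          # (1, 14)
print(add(mul(X, Y), mul(X, Z)))  # (0, 14), wider: subdistributivity

# f(x) = (x + 1)(x - 1) evaluated with intervals on [-2, 1]
D, one = (-2, 1), (1, 1)
print(mul(add(D, one), sub(D, one)))  # (-6, 3), while the true range is [-1, 3]
```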
In order to overcome (VDP), further interval arithmetics were proposed [12]. In what follows, we briefly present the sum and division of some of these arithmetics and show whether or not they satisfy our axioms.
Therefore this arithmetic will not satisfy the axiom (D-2).
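A short derivation (ours, under stated assumptions) shows why the standard division can violate (D-2). Assume all inputs are positive, $A_j = [\underline{a}_j, \overline{a}_j]$ with $0 < \underline{a}_j$, read $N(\mathcal{A})$ as the interval sum of the quotients $A_i / S$, and write $S = A_1 + \cdots + A_n = [\underline{s}, \overline{s}]$:

```latex
A_i / S = \left[\frac{\underline{a}_i}{\overline{s}}, \frac{\overline{a}_i}{\underline{s}}\right]
\quad\Longrightarrow\quad
N(\mathcal{A}) = \sum_{i=1}^{n} A_i / S = \left[\frac{\underline{s}}{\overline{s}}, \frac{\overline{s}}{\underline{s}}\right],
\\[4pt]
w(N(\mathcal{A})) = \frac{\overline{s}^{2} - \underline{s}^{2}}{\underline{s}\,\overline{s}}
 = \left(\frac{1}{\underline{s}} + \frac{1}{\overline{s}}\right) w(S)
 = \left(\frac{1}{\underline{s}} + \frac{1}{\overline{s}}\right) \sum_{j=1}^{n} w(A_j).
```

So, under these assumptions, (D-2) holds exactly when $1/\underline{s} + 1/\overline{s} \le 1$. For $A_1 = A_2 = [0.5, 1]$ we get $S = [1, 2]$, $N(\mathcal{A}) = [0.5, 2]$ and $w(N(\mathcal{A})) = 1.5 > 1 = w(A_1) + w(A_2)$. The factor also shows that the outcome depends on the scale (units) of the data, which real normalization does not.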

Hansen's or Generalized Approach
In 1975, E.R. Hansen [5] proposed a representation for intervals together with an arithmetic. The idea was to keep track of the imprecision of each input interval until the end of the computation and then use it to provide the computation's output in the standard interval form. In what follows, we use the notation proposed by Hansen in [5].
Given $n$ input intervals $X_1, \dots, X_n$ with midpoints $m_j$ and widths $w_j$, $j \in \{1, \dots, n\}$, let $s_j = w_j / 2$ and $U_j = [-s_j, s_j]$. Assuming the standard interval arithmetic, each interval $X_j$ is represented in Generalized Interval Arithmetic (GIA) form as
$X_j = A^j_0 + (A^j_1 \cdot U_1) + \cdots + (A^j_n \cdot U_n)$,
where $A^j_0 = m_j$. Note that the $A^j_k$ (for $k > 0$) are intervals and $(A^j_n \cdot U_n)$ is the standard interval multiplication. If $X_j$ is an input interval, then $A^j_k = [1, 1]$ for $j = k$ and $A^j_k = [0, 0]$ otherwise.
Example 3. Consider the intervals $X_1 = [1, 2]$, $X_2 = [5, 10]$ and $X_3 = [-4, -1]$, so that $U_1 = [-0.5, 0.5]$, $U_2 = [-2.5, 2.5]$ and $U_3 = [-1.5, 1.5]$. Since we are considering three intervals, their GIA representation is:
$X_1 = 1.5 + [1, 1] \cdot U_1 + [0, 0] \cdot U_2 + [0, 0] \cdot U_3$,
$X_2 = 7.5 + [0, 0] \cdot U_1 + [1, 1] \cdot U_2 + [0, 0] \cdot U_3$,
$X_3 = -2.5 + [0, 0] \cdot U_1 + [0, 0] \cdot U_2 + [1, 1] \cdot U_3$.
The generalized interval arithmetic produces new intervals with new $A^j_k$'s but with the same $U_k$'s.

Addition and Difference
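The addition of $n$ generalized intervals according to Equation (7) is given by:
$X_1 + \cdots + X_n = \sum_{j=1}^{n} A^j_0 + \sum_{k=1}^{n} \Big(\sum_{j=1}^{n} A^j_k\Big) \cdot U_k$.
The difference is given by:
$X_i - X_j = (A^i_0 - A^j_0) + \sum_{k=1}^{n} (A^i_k - A^j_k) \cdot U_k$.
Example 4. Assuming the previous intervals, let us calculate $(X_1 + X_3) + (X_2 - X_1)$. We obtain $5 + [0, 0] \cdot U_1 + [1, 1] \cdot U_2 + [1, 1] \cdot U_3$, which, converted back with the standard arithmetic, gives $[1, 9]$, the exact range of $x_2 + x_3$. The standard arithmetic alone gives $[-3, 1] + [3, 9] = [0, 10]$.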
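A sketch of the generalized representation with its sum and difference, reproducing Example 4. The class `GenInterval` and its helpers are hypothetical names; only addition and difference are modeled (the division of Equation (10) is not reproduced), and the conversion back uses standard interval arithmetic.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def i_add(x, y): return (x[0] + y[0], x[1] + y[1])
def i_sub(x, y): return (x[0] - y[1], x[1] - y[0])
def i_mul(x, y):
    p = [a * b for a in x for b in y]
    return (min(p), max(p))

class GenInterval:
    """Hypothetical sketch of A_0 + A_1*U_1 + ... + A_n*U_n (Hansen-style)."""
    def __init__(self, a0: Interval, coeffs: List[Interval], us: List[Interval]):
        self.a0, self.coeffs, self.us = a0, coeffs, us

    @staticmethod
    def from_inputs(inputs: List[Interval]) -> List["GenInterval"]:
        # U_j = [-s_j, s_j] with s_j = w_j / 2, shared by every generalized interval
        us = [(-(hi - lo) / 2, (hi - lo) / 2) for lo, hi in inputs]
        n = len(inputs)
        out = []
        for j, (lo, hi) in enumerate(inputs):
            m = (lo + hi) / 2  # A^j_0 = m_j
            coeffs = [(1.0, 1.0) if k == j else (0.0, 0.0) for k in range(n)]
            out.append(GenInterval((m, m), coeffs, us))
        return out

    def __add__(self, other):
        return GenInterval(i_add(self.a0, other.a0),
                           [i_add(a, b) for a, b in zip(self.coeffs, other.coeffs)], self.us)

    def __sub__(self, other):
        return GenInterval(i_sub(self.a0, other.a0),
                           [i_sub(a, b) for a, b in zip(self.coeffs, other.coeffs)], self.us)

    def to_interval(self) -> Interval:
        """Convert back to the standard form: A_0 + sum of A_k * U_k."""
        result = self.a0
        for a, u in zip(self.coeffs, self.us):
            result = i_add(result, i_mul(a, u))
        return result

X1, X2, X3 = GenInterval.from_inputs([(1, 2), (5, 10), (-4, -1)])
print(((X1 + X3) + (X2 - X1)).to_interval())  # (1.0, 9.0)
std = i_add(i_add((1, 2), (-4, -1)), i_sub((5, 10), (1, 2)))
print(std)  # (0, 10)
```

The $X_1$ contributions cancel in the coefficient of $U_1$, which is why the generalized form recovers the exact range while the standard evaluation does not.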

Division
The division of two generalized intervals is given by: …
Example 5. Now, for simplicity, consider just $X_2$ and $X_3$; then … $= [-3, -0.75]$. So, …
Proposition 1. Let $\mathcal{A} = \{X_1, \dots, X_n\}$ be a family of intervals such that …
Then we can write: …
Proof. According to Equation (10), one can write: … Remember that … Thus … Hence …
… and $X_2 = $ … Using only $X_1$ and $X_2$ in Hansen's representation, according to Equation (7) and by Corollary 1, the Hansen division satisfies (D-1) but does not satisfy (D-2), since $w(N(\mathcal{A})) = 1.0316 > 0.583333 = w(X_1) + w(X_2)$. Therefore, Hansen's sum and division do not provide an IDS.

Affine Arithmetic
Developed by Jorge Stolfi and Luiz Figueiredo [4,14], affine arithmetic (AA) is a method proposed to overcome overestimation. Ideal quantities are represented by affine forms. AA operations are designed so that the approximation errors normally depend quadratically on the width of the input intervals, even when the operands are correlated, as in $X \cdot (1 - X)$. Thus, for tight input intervals, the operations provide tight estimates of the exact range.
In AA, a quantity $x$ is represented by an affine form
$\hat{x} = x_0 + x_1\varepsilon_1 + \cdots + x_n\varepsilon_n$.
The coefficients $x_i$ are finite floating-point numbers and the $\varepsilon_i$ are symbolic real variables whose values are unknown but assumed to lie in the interval $U = [-1, +1]$. The value $x_0$ is called the central value of $\hat{x}$, whereas each coefficient $x_i$ is called a partial deviation. Finally, each $\varepsilon_i$ represents a source of noise. Each affine form $\hat{x} = x_0 + x_1\varepsilon_1 + \cdots + x_n\varepsilon_n$ implies an interval bound for the corresponding ideal quantity $x$, namely $x \in X = [x_0 - s, x_0 + s]$, where $s = \sum_{i=1}^{n} |x_i|$ is the total deviation of $\hat{x}$. Conversely, every interval $X = [a, b]$ representing a real number $x$ can be written as an affine form $\hat{x} = x_0 + x_k\varepsilon_k$ such that $x_0 = \frac{a + b}{2}$ is the midpoint, $x_k = \frac{b - a}{2}$, and $\varepsilon_k$ is a new noise symbol, i.e., one that does not occur in any other existing affine form. This is very important, since some operations, even some primitive arithmetic operations, require this resource. Those operations are classified as non-affine operations and are described in [4] (p. 53).
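A minimal affine-form sketch. It is an assumption-laden illustration, not the AA reference implementation: the class `Affine` is hypothetical, and multiplication uses the simplest conservative bound (one fresh noise symbol whose coefficient is the product of the total deviations), which is valid but not the tightest AA product.

```python
from itertools import count

_fresh = count(1)  # source of new noise-symbol indices

class Affine:
    """Hypothetical sketch of x0 + sum_i x_i * eps_i, with eps_i in [-1, 1]."""
    def __init__(self, x0, terms=None):
        self.x0 = x0
        self.terms = dict(terms or {})  # noise index -> partial deviation

    @classmethod
    def from_interval(cls, a, b):
        # midpoint plus a brand-new noise symbol carrying the radius
        return cls((a + b) / 2, {next(_fresh): (b - a) / 2})

    def radius(self):
        return sum(abs(v) for v in self.terms.values())  # total deviation s

    def to_interval(self):
        s = self.radius()
        return (self.x0 - s, self.x0 + s)

    def _combine(self, other, sign):
        terms = dict(self.terms)
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + sign * v
        return Affine(self.x0 + sign * other.x0, terms)

    def __add__(self, other): return self._combine(other, 1)
    def __sub__(self, other): return self._combine(other, -1)

    def __mul__(self, other):
        # Non-affine operation: keep the affine part of the product and bound
        # the quadratic remainder with a fresh noise symbol (simple, not tight).
        terms = {}
        for k in set(self.terms) | set(other.terms):
            terms[k] = self.x0 * other.terms.get(k, 0.0) + other.x0 * self.terms.get(k, 0.0)
        terms[next(_fresh)] = self.radius() * other.radius()
        return Affine(self.x0 * other.x0, terms)

x = Affine.from_interval(0.0, 1.0)
one = Affine(1.0)
print((x - x).to_interval())          # (0.0, 0.0)
print((x * (one - x)).to_interval())  # (0.0, 0.5); the exact range is [0, 0.25]
```

For comparison, the standard arithmetic gives $[0, 1] \cdot [0, 1] = [0, 1]$ for the same expression, because it does not see that both factors share $\varepsilon_1$.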

Generalized Hukuhara's Division
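The usual interval arithmetic has some particular issues; for instance, we can find intervals $X$ and $Y$ such that $(X + Y) - Y \neq X$. To overcome this problem, Hukuhara [15] proposed a new difference for intervals, called the H-difference: $X \ominus_H Y = Z$ if and only if $X = Y + Z$. An important property of this difference is that $X \ominus_H X = [0, 0]$ and $(X + Y) \ominus_H Y = X$. Although this difference, when it exists, provides a unique value, it is a partial function. For example, the difference $[3, 4] \ominus_H [2, 5]$ is not defined, since the resulting value would have to be $[1, -1]$, which is not an interval. In order to extend this operation to a total one, Stefanini [6] proposed the generalized difference
$X \ominus_g Y = \left[\min\{\underline{x} - \underline{y}, \overline{x} - \overline{y}\}, \max\{\underline{x} - \underline{y}, \overline{x} - \overline{y}\}\right]$.
In the same way, Stefanini proposed a generalized division. For $X = [\underline{x}, \overline{x}]$ and $Y = [\underline{y}, \overline{y}]$ such that $0 \notin [\underline{y}, \overline{y}]$, $X \div_g Y = Z = [\underline{z}, \overline{z}]$ can be evaluated according to the following rules:
- Case 1: If $0 < \underline{x}$ and $\overline{y} < 0$, then:
If $\underline{x}\,\underline{y} \ge \overline{x}\,\overline{y}$, then $\underline{z} = \overline{x}/\underline{y}$ and $\overline{z} = \underline{x}/\overline{y}$.
If $\underline{x}\,\underline{y} \le \overline{x}\,\overline{y}$, then $\underline{z} = \underline{x}/\overline{y}$ and $\overline{z} = \overline{x}/\underline{y}$.
- Case 2: If $0 < \underline{x}$ and $0 < \underline{y}$, then:
If $\underline{x}\,\overline{y} \le \overline{x}\,\underline{y}$, then $\underline{z} = \underline{x}/\underline{y}$ and $\overline{z} = \overline{x}/\overline{y}$.
If $\underline{x}\,\overline{y} \ge \overline{x}\,\underline{y}$, then $\underline{z} = \overline{x}/\overline{y}$ and $\overline{z} = \underline{x}/\underline{y}$.
- Case 3: If $\overline{x} < 0$ and $\overline{y} < 0$, then:
$\underline{z} = \underline{x}/\underline{y}$ and $\overline{z} = \overline{x}/\overline{y}$ whenever $\underline{x}\,\overline{y} \le \overline{x}\,\underline{y}$;
$\underline{z} = \overline{x}/\overline{y}$ and $\overline{z} = \underline{x}/\underline{y}$ whenever $\underline{x}\,\overline{y} \ge \overline{x}\,\underline{y}$.
- Case 4: If $\overline{x} < 0$ and $0 < \underline{y}$, then:
If $\underline{x}\,\underline{y} \le \overline{x}\,\overline{y}$, then $\underline{z} = \underline{x}/\overline{y}$ and $\overline{z} = \overline{x}/\underline{y}$.
If $\underline{x}\,\underline{y} \ge \overline{x}\,\overline{y}$, then $\underline{z} = \overline{x}/\underline{y}$ and $\overline{z} = \underline{x}/\overline{y}$.
- Case 5: If $\underline{x} \le 0$, $\overline{x} \ge 0$ and $\overline{y} < 0$, then the solution does not depend on $\overline{y}$: $\underline{z} = \overline{x}/\underline{y}$ and $\overline{z} = \underline{x}/\underline{y}$.
- Case 6: If $\underline{x} \le 0$, $\overline{x} \ge 0$ and $0 < \underline{y}$, then the solution does not depend on $\underline{y}$: $\underline{z} = \underline{x}/\overline{y}$ and $\overline{z} = \overline{x}/\overline{y}$.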
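If $0$ is an interior point of $[\underline{y}, \overline{y}]$, the generalized division is undefined; for intervals $Y = [0, \overline{y}]$ or $Y = [\underline{y}, 0]$, the division is possible but yields unbounded results $Z$ of the form $Z = (-\infty, \overline{z}]$ or $Z = [\underline{z}, +\infty)$.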
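The generalized difference and the six division cases translate directly into code. In this sketch the function names are hypothetical; it only covers $0 \notin Y$ and does not model the unbounded cases.

```python
def gh_difference(x, y):
    """Generalized (gH) difference of x = (xl, xu) and y = (yl, yu)."""
    d1, d2 = x[0] - y[0], x[1] - y[1]
    return (min(d1, d2), max(d1, d2))

def gh_division(x, y):
    """Generalized division by Cases 1-6; hypothetical helper, requires 0 not in y."""
    xl, xu = x
    yl, yu = y
    if yl <= 0 <= yu:
        raise ZeroDivisionError("0 lies in the divisor")
    if 0 < xl and yu < 0:   # Case 1
        return (xu / yl, xl / yu) if xl * yl >= xu * yu else (xl / yu, xu / yl)
    if 0 < xl and 0 < yl:   # Case 2
        return (xl / yl, xu / yu) if xl * yu <= xu * yl else (xu / yu, xl / yl)
    if xu < 0 and yu < 0:   # Case 3
        return (xl / yl, xu / yu) if xl * yu <= xu * yl else (xu / yu, xl / yl)
    if xu < 0 and 0 < yl:   # Case 4
        return (xl / yu, xu / yl) if xl * yl <= xu * yu else (xu / yl, xl / yu)
    if yu < 0:              # Case 5: xl <= 0 <= xu, result depends only on yl
        return (xu / yl, xl / yl)
    return (xl / yu, xu / yu)  # Case 6: xl <= 0 <= xu, 0 < yl, depends only on yu

print(gh_difference((3, 4), (2, 5)))  # (-1, 1)
print(gh_division((2, 6), (1, 2)))    # (2.0, 3.0), since [1, 2] * [2, 3] = [2, 6]
print(gh_division((1, 2), (1, 2)))    # (1.0, 1.0): X divided by itself gives [1, 1]
print(gh_division((1, 2), (1, 4)))    # (0.5, 1.0), since [1, 2] * [1, 2] = [1, 4]
```

Each result satisfies either $X = Y \cdot Z$ or $Y = X \cdot Z^{-1}$, which is the defining property the case rules encode.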

Constrained Arithmetic
The standard approach and its extensions are based on what Lodwick [12] calls the axiomatic approach: the operations are defined by equations that determine how to obtain intervals by calculating with endpoints. For example, $X + Y = [\underline{x} + \underline{y}, \overline{x} + \overline{y}]$.
"The power of the axiomatic approach to interval arithmetic is that it is simple to apply. Its complexity is at most four times that of real-valued arithmetic. However, the axiomatic approach to interval arithmetics leads to overestimations in general because it takes every instantiation of the same variable independently." [12] The most important property for an interval operation is correctness [16]: for a real number $x$, an interval $X$, a real operation $f$ and an interval operation $F$, if $x \in X$ then $f(x) \in F(X)$. This can be achieved for interval functions that satisfy the extension principle, which comes from set theory [17] and is simply the application of direct images.

Definition 4.
Given any function $f : A \to B$ and the powersets $\mathcal{P}(A)$ and $\mathcal{P}(B)$, there is a function called the direct image of $f$, $\hat{f} : \mathcal{P}(A) \to \mathcal{P}(B)$, such that $\hat{f}(S) = \{f(x) : x \in S\}$. The direct image is also denoted $\mathcal{P}(f)$ (see [18] (p. 195, Ex. 3)). On the real line, the direct image of a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ always maps closed intervals (boxes of closed intervals, for $n > 1$) into closed intervals. In other words, $\hat{f}([a, b]) = \{f(x) : x \in [a, b]\} = [c, d]$. Since the operations of real arithmetic are continuous functions (division away from $0$), the definition of interval arithmetic can be expressed in the form
$[a, b] * [c, d] = \{x * y : x \in [a, b] \text{ and } y \in [c, d]\}$,
where $* \in \{+, -, \times, /\}$. However, this definition takes all combinations of elements of both intervals, which leads to situations in which $X - X \neq 0$, even when $X$ represents the same number. This overestimation problem is addressed, for example, by Hansen's approach. Lodwick [12] proposed another approach, called Constrained Interval Arithmetic (CIA). The idea follows the one behind other interval arithmetics (like Range [13] or Hansen/Generalized arithmetics [5]): the authors provide another form to represent intervals, an arithmetic for that representation, and a way to recover the set interpretation of intervals. In the CIA approach, an interval is represented as a function.
Definition 5. Given an interval $X = [\underline{x}, \overline{x}]$ with width $w_x$, the constrained function $X^I : [0, 1] \to \mathbb{R}$ related to $X$ is
$X^I(\lambda_x) = (1 - \lambda_x)\,\underline{x} + \lambda_x\,\overline{x} = \underline{x} + \lambda_x w_x$.
These functions are also called constrained intervals.
Observe that $\underline{x}$ and $\overline{x}$ are known, whereas $\lambda_x$ varies.

Definition 6.
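Assuming the real operations $* \in \{+, -, \times, /\}$, an arithmetic associated with constrained intervals, called constrained interval arithmetic (CIA), is given by the operation $X^I * Y^I$ on constrained functions, where the resulting interval $Z = X * Y$ is the set $Z = \{\theta(\lambda_x, \lambda_y) : 0 \le \lambda_x, \lambda_y \le 1\}$, for
$\theta(\lambda_x, \lambda_y) = \big((1 - \lambda_x)\,\underline{x} + \lambda_x\,\overline{x}\big) * \big((1 - \lambda_y)\,\underline{y} + \lambda_y\,\overline{y}\big)$.
So, $Z = \left[\min_{0 \le \lambda_x, \lambda_y \le 1} \theta(\lambda_x, \lambda_y), \max_{0 \le \lambda_x, \lambda_y \le 1} \theta(\lambda_x, \lambda_y)\right]$. The case $X^I / Y^I$ is undefined whenever $0 \in Y$.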
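Observe that in CIA the way an expression is evaluated has a strong influence on the result. For example, for $X = [1, 2]$ and $Y = [-1, 1]$, take the interval expression $X + Y - X$. Then $X^I(\lambda_x) = 1 \cdot (1 - \lambda_x) + 2\lambda_x = \lambda_x + 1$ and $Y^I(\lambda_y) = (-1) \cdot (1 - \lambda_y) + 1 \cdot \lambda_y = 2\lambda_y - 1$. Evaluating the whole expression leads to $Y$, since $\theta(\lambda_x, \lambda_y) = (\lambda_x + 1) + (2\lambda_y - 1) - (\lambda_x + 1) = 2\lambda_y - 1$ and
$X + Y - X = \left[\min_{0 \le \lambda_y \le 1} (2\lambda_y - 1), \max_{0 \le \lambda_y \le 1} (2\lambda_y - 1)\right] = [-1, 1]$.
However, if we first evaluate $X + Y$, we obtain the interval $Z = X + Y = [0, 3]$ and the respective constrained function $Z^I(\lambda_z) = 3\lambda_z$. Therefore, $Z - X = \left[\min (3\lambda_z - \lambda_x - 1), \max (3\lambda_z - \lambda_x - 1)\right] = [-2, 2]$, which is wider than $Y$. It is assumed that the whole expression is evaluated before the calculation of min and max. With this convention, the CIA sum and division provide an IDS.
Proof. The proof is straightforward. For a given finite family of intervals $\mathcal{A} = \{X_1, \dots, X_n\}$ with $0 \notin X_1 + \cdots + X_n$, $N(\mathcal{A}) = \left[\min \theta, \max \theta\right]$, with the minimum and maximum taken over $0 \le \lambda_{x_1}, \dots, \lambda_{x_n} \le 1$, where $\theta : [0, 1]^n \to \mathbb{R}$ is defined by
$\theta(\lambda_{x_1}, \dots, \lambda_{x_n}) = \frac{X^I_1(\lambda_{x_1})}{X^I_1(\lambda_{x_1}) + \cdots + X^I_n(\lambda_{x_n})} + \cdots + \frac{X^I_n(\lambda_{x_n})}{X^I_1(\lambda_{x_1}) + \cdots + X^I_n(\lambda_{x_n})} = \frac{X^I_1(\lambda_{x_1}) + \cdots + X^I_n(\lambda_{x_n})}{X^I_1(\lambda_{x_1}) + \cdots + X^I_n(\lambda_{x_n})} = 1$.
Hence $N(\mathcal{A}) = [1, 1]$, so $w(N(\mathcal{A})) = 0 \le \sum_{j=1}^{n} w(X_j)$ and (D-2) holds.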
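A rough numerical sketch of these CIA evaluations. The helper names `constrained` and `cia_range` are ours, and a brute-force grid over $[0, 1]^n$ stands in for the optimization step; this is an assumption for illustration, exact only at grid points, and not a rigorous global optimizer.

```python
from itertools import product

def constrained(lo, hi):
    """Hypothetical helper: X^I(lam) = (1 - lam) * lo + lam * hi, lam in [0, 1]."""
    return lambda lam: (1 - lam) * lo + lam * hi

def cia_range(theta, n_vars, steps=50):
    """Approximate [min theta, max theta] over [0, 1]^n_vars by a grid search."""
    grid = [k / steps for k in range(steps + 1)]
    values = [theta(*lams) for lams in product(grid, repeat=n_vars)]
    return (min(values), max(values))

X = constrained(1, 2)
Y = constrained(-1, 1)

# Whole expression X + Y - X: the lambda_x terms cancel inside theta.
whole = cia_range(lambda lx, ly: X(lx) + Y(ly) - X(lx), 2)

# Staged evaluation: first Z = X + Y, then Z - X with a new variable lambda_z.
Z = constrained(*cia_range(lambda lx, ly: X(lx) + Y(ly), 2))
staged = cia_range(lambda lz, lx: Z(lz) - X(lx), 2)

# Normalization of the family of Example 3: theta sums X_i / (X_1 + X_2 + X_3).
family = [constrained(1, 2), constrained(5, 10), constrained(-4, -1)]

def theta_norm(*lams):
    xs = [f(lam) for f, lam in zip(family, lams)]
    total = sum(xs)
    return sum(x / total for x in xs)

norm = cia_range(theta_norm, 3, steps=10)
print(whole, staged, norm)  # approximately (-1, 1), (-2, 2) and (1, 1)
```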

Some Considerations about CIA
The standard interval arithmetic (SIA) is simple to apply, and its computational cost is negligible on modern computers. Nevertheless, SIA leads to overestimation. Refinement of interval extensions is a method for computing arbitrarily sharp upper and lower bounds for the values of a real function (see [19]), so the overestimation may be reduced. However, this method becomes too slow for large intervals or complex functions, for instance exponential or trigonometric ones. On the other hand, constrained interval arithmetic (CIA) [12] overcomes the overestimation and behaves more like real arithmetic. As shown above, constrained arithmetic, in particular its division, provides an IDS; that is, since (D-2) is satisfied, the width of the normalized interval does not exceed the sum of the widths of the input intervals. However, in our opinion CIA has some issues; for instance, every operation requires optimizing a specific function, as the following example shows.
Example 10. Consider $X = [-1, 2]$ and $Y = [1, 2]$, and let us evaluate $Z = \frac{Y - X \cdot X}{Y}$. Using the standard arithmetic (SIA), in a few steps we reach $Z = [-3, 4]$. With CIA, $Z = \left[\min_{0 \le \lambda_x, \lambda_y \le 1} \theta(\lambda_x, \lambda_y), \max_{0 \le \lambda_x, \lambda_y \le 1} \theta(\lambda_x, \lambda_y)\right]$, and we need to optimize
$\theta(\lambda_x, \lambda_y) = \frac{(\lambda_y + 1) - (3\lambda_x - 1)^2}{\lambda_y + 1}$
on $[0, 1] \times [0, 1]$ (see Figure 2). The result is $Z = [-3, 1]$: the minimum $-3$ is attained at $(\lambda_x, \lambda_y) = (1, 0)$, while the maximum $1$ is attained along the line $\lambda_x = 1/3$, which passes through the interior of the box.
Note that CIA reduces overestimation compared with SIA. Also, in CIA $X - X = 0$, $X / X = 1$ and $X(Y + Z) = XY + XZ$. However, these advantages have a cost, because the optimization process requires many evaluations to find the final result.
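Example 10 can be reproduced numerically. In this sketch the helper names are ours, and a dense grid search stands in for the optimizer; that is an assumption for illustration, since a grid only approximates an interior maximum such as the one at $\lambda_x = 1/3$.

```python
from itertools import product

# Hypothetical standard-arithmetic helpers (not from the paper).
def i_sub(x, y): return (x[0] - y[1], x[1] - y[0])
def i_mul(x, y):
    p = [a * b for a in x for b in y]
    return (min(p), max(p))
def i_div(x, y):
    assert not (y[0] <= 0 <= y[1]), "0 lies in the divisor"
    q = [a / b for a in x for b in y]
    return (min(q), max(q))

X, Y = (-1.0, 2.0), (1.0, 2.0)
sia = i_div(i_sub(Y, i_mul(X, X)), Y)  # SIA, step by step

def theta(lx, ly):
    x = (1 - lx) * X[0] + lx * X[1]  # X^I(lx) = 3 lx - 1
    y = (1 - ly) * Y[0] + ly * Y[1]  # Y^I(ly) = ly + 1
    return (y - x * x) / y

grid = [k / 300 for k in range(301)]
vals = [theta(lx, ly) for lx, ly in product(grid, grid)]
cia = (min(vals), max(vals))
print(sia)  # (-3.0, 4.0)
print(cia)  # approximately (-3.0, 1.0)
```

The grid needs over 90,000 evaluations of $\theta$ for just two variables, which illustrates the cost discussed in the text.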
At a glance, CIA seems to solve the problem of interval overestimation and has good algebraic properties, but it can become complex and slow if we increase the number of variables or add multiplications and divisions. For instance, consider intervals $X$, $Y$ and $Z = 2X - 4Y$. In this case, to find $Z$ it is necessary to optimize a function $\theta$ of two variables on $[0, 1] \times [0, 1]$, where $\theta$ is affine, i.e., its graph is a plane in three-dimensional space. The optimization in this case is very simple: since $\theta$ is continuous on the compact box $[0, 1] \times [0, 1]$ and its graph is a plane in $\mathbb{R}^3$, its minimum and maximum occur at the vertices $(0, 0)$, $(0, 1)$, $(1, 0)$ and $(1, 1)$. In fact, if $Z$ is an expression formed by sums or differences of any finite set of $n$ variables, to evaluate $Z$ with CIA we only need to evaluate $\theta$ at the $2^n$ vertices $(0, 0, \dots, 0), (1, 0, \dots, 0), \dots, (1, 1, \dots, 1)$ of $[0, 1]^n$ and take the minimum and maximum.
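For sums and differences, the vertex rule reduces a CIA evaluation to $2^n$ evaluations of $\theta$. A sketch with hypothetical helper names, taking $X = [1, 2]$ and $Y = [-1, 1]$ for illustration:

```python
from itertools import product

def constrained(lo, hi):
    """Hypothetical helper: X^I(lam) = (1 - lam) * lo + lam * hi."""
    return lambda lam: (1 - lam) * lo + lam * hi

def cia_vertices(theta, n_vars):
    """Min and max of an affine theta over [0, 1]^n, read off the 2^n vertices."""
    values = [theta(*v) for v in product((0.0, 1.0), repeat=n_vars)]
    return (min(values), max(values))

X, Y = constrained(1, 2), constrained(-1, 1)
print(cia_vertices(lambda lx, ly: 2 * X(lx) - 4 * Y(ly), 2))  # (-2.0, 8.0)
print(cia_vertices(lambda lx, ly: X(lx) + Y(ly) - X(lx), 2))  # (-1.0, 1.0)
```

The vertex shortcut is only justified for affine $\theta$; Example 10 shows that with repeated variables inside products or quotients the extremum can lie in the interior of the box.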
However, once multiplications or divisions are involved, the process can become complex: when a variable occurs more than once, as in Example 10, the global minimum or maximum of $\theta$ can occur in the interior of the box (see Figure 2 from Example 10), and an optimization process is needed to find them. If we increase the number of variables, for instance $Z = XYK$, the optimization takes place over the higher-dimensional box $[0, 1] \times [0, 1] \times [0, 1]$. Even the two-variable function of Example 10, $\theta(\lambda_x, \lambda_y) = \frac{(\lambda_y + 1) - (3\lambda_x - 1)^2}{\lambda_y + 1}$ on $[0, 1] \times [0, 1]$, already has a high cost.

Final Remarks
This paper proposed a pair of axioms to characterize a suitable interval division. The axiomatic system is inspired by what we call the partition principle (PP), which appears in different contexts. The most important application of this principle is data normalization. It is the starting point to answer the question: "What would be a normalization of interval data?" We think we have provided an answer by stating axioms (D-1) and (D-2). Axiom (D-2) states the requirement for reflecting the notion of normalization: a suitable division should not introduce impreciseness. However, this axiom can be weakened in order to permit a controlled introduction of impreciseness, namely:
(D-2i) There exists $c > 0$ such that, for every finite family $\mathcal{A}$ as above, $w(N(\mathcal{A})) \le c \cdot \sum_{j=1}^{n} w(A_j)$.
This axiom generalizes (D-2), which is the case $c = 1$. We have not found an interval division that satisfies it; however, we propose it here as a guide for future proposals of division.
Among the arithmetics investigated here, CIA is the only one that provides a division that captures the notion of normalization. As we have observed, it considers the whole expression and optimizes it to recover the resulting interval. Since the space of these functions is a distributive ring, constrained intervals have good algebraic properties, but with the drawback of optimization, since a sequence of multiplications or divisions can become very hard to compute. Therefore, CIA cannot be considered a suitable representation for every interval computation.
So far, there are many proposals to represent intervals and operate with them. The idea is that interval data are translated into these representations, the computation is performed there, and the result is transformed back into the usual interval representation. This is clear in approaches like Hansen's, AA and CIA.
Since only CIA captures the notion of normalization, it is the only arithmetic (at least among those considered here) that should be applied to perform such a procedure during the computation of interval data. However, optimization issues may prevent us from performing the whole computation with CIA. Another point is that the sum and difference of several of these approaches behave like the same operations in SIA, which are fast and simple. Hukuhara operations are further options in this scenario.
What we mean is that we can think of interval computation from another perspective: instead of operating with intervals using a single representation, we propose computing interval expressions using different representations. In other words, sub-expressions of an interval expression are translated into a suitable representation (e.g., Affine, Hansen), their computation is performed, and the resulting interval (converted back) replaces the sub-expression. Normalization is always performed using CIA.
One consequence is that this approach will lead us to review the role of intervals in fuzzy arithmetic and fuzzy algebra, since most of the proposals deal with α-cuts [20].