In general, information is comparable only if its domain is the same. Otherwise, the comparison and interpretation of information become imprecise or even impossible. Therefore, this Section discusses important exemplary domains. Then (in Section 4.4), the preconditions for comparability are defined exactly.

#### 4.1. Domain of Information: “Language Vocabulary”

In the case of language-based information, the domain is “language vocabulary” (i.e., a set of commonly known words and phrases, including the special terms, of a certain language). There should be a common language, but, even in this case, the domain “language vocabulary” is not exactly the same for all speakers. This can cause misunderstandings. For example, as a comment on the weather, Alice may say “It is cold” when, at the same temperature, Bob might say “It is not cold”, because the word “cold”, as an element of the domain “language vocabulary”, has a different definition for Bob (who may wear warmer clothes) than for Alice. A further, deeper problem is caused by combinatorial complexity and redundancy: multiple phrases are possible in the same situation. For example, in this situation, Alice may also say “I’m freezing”.

#### 4.2. Translation of Original Information into Digital Representation using the Domain “Language Vocabulary”

Let us denote, by original information (“ORGINFO”), certain relevant original (language-independent) information that should be transported digitally as digital information (“DIGINFO”). In the case of typical language-based communication, ORGINFO is coded and transported by combinations of elements of the domain “language vocabulary” (i.e., words and phrases). In the case of non-trivial ORGINFO, these combinations of words are long. As such, the coding (or representation) of ORGINFO by a free language is done in a non-reproducible way, and there is large variability in the resulting language-based digital representation, DIGINFO.

To illustrate the principle, we start with the abovementioned simple weather commentary example, assuming that the original situation ORGINFO means “The temperature is 16 °C”, which caused Alice to say “It is cold”. Using “language vocabulary” as the domain, ORGINFO can be represented as DIGINFO in several ways, as Alice could also say “I’m freezing” or Bob could even say “It is not cold”. In every case, Alice and Bob think that they translated ORGINFO correctly into language, but the resultant DIGINFO is so imprecise that it can even look contradictory (Figure 1).

Conversely, when searching for ORGINFO using the domain “language vocabulary”, several terms can be entered. The precise term “16 degrees Celsius” is used too rarely in conventional texts to be representative of ORGINFO. Moreover, similar situations are also interesting (for example, the precise term “15 degrees Celsius”, as a measurement result of temperature, in all languages). As a text search of all possibly interesting precise terms is not practicable, the term “temperature” can be used to represent the imprecise term “It is cold”, as shown in Figure 2. The search results represent very different original temperatures. More useful results are possible by searching for a longer, more specific text which represents additional features (for example, by searching for the combination “cold indoor temperature”). Some search results may already contain helpful information. Therefore, a text search is far better than nothing. Nevertheless, basic problems (e.g., incompleteness, overlapping, redundancy, imprecision) related to forward (Figure 1) and backward (Figure 2) translations of original information (ORGINFO) in the domain “language vocabulary” remain.

Completely different combinations of words or phrases (elements of the domain “language vocabulary”), as shown on the right side of Figure 1, can have the same intended meaning, as shown on the left side of Figure 1. In the case of a text search, the results may be imprecise because the meaning of the same text, as shown on the right side of Figure 2, is imprecise and corresponds to many variants of ORGINFO, as shown on the left side of Figure 2. This imprecision results from the use of the domain “language vocabulary”, which should be manageable and easily understandable.

To describe everything feasible using this domain, there is freedom in combining its elements (words and phrases). However, this leads to overlapping of meaning. The same thing can be described in several ways (i.e., by several different combinations of words). Therefore, a text search of a certain sequence yields only a part of all locations with this meaning. As the number of possible sequences increases exponentially with the count of words in the sequence, the probability of finding a certain meaning with a single word sequence decreases exponentially with the number of words in it. Thus, if more than a few words are necessary to obtain a certain meaning, the probability of finding the most interesting locations with this meaning using a text search becomes very small. Therefore, text searches are practicable only for short sequences of words.
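The exponential argument above can be made concrete with a small sketch. It assumes, purely for illustration, a vocabulary of `vocabulary_size` words and that every sequence of `k` words is equally likely to be chosen for a given meaning. Both are simplifying assumptions that are not part of the text, and the function names are hypothetical.

```python
def sequence_count(vocabulary_size: int, k: int) -> int:
    """Number of distinct word sequences of length k (order matters, repeats allowed)."""
    return vocabulary_size ** k


def hit_probability(vocabulary_size: int, k: int) -> float:
    """Chance that one fixed sequence is the one used, if all were equally likely."""
    return 1 / sequence_count(vocabulary_size, k)


if __name__ == "__main__":
    # Assumed toy vocabulary of 1000 words; real vocabularies are larger.
    for k in range(1, 6):
        print(k, sequence_count(1000, k), hit_probability(1000, k))
```

Under these assumptions, each additional word multiplies the number of candidate sequences by the vocabulary size, which is why a single search phrase covers a rapidly shrinking share of the possible formulations.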

However, in the case of professional communication (e.g., in medicine), communicated information is usually nested and non-trivial. This means that a few words are not sufficient to describe a certain situation. An additional introduction is necessary, which is too long to be searchable using a text search. As searchability and comparability of non-trivial and nested information is important, a solution is necessary.

#### 4.3. Domain of Information: Adapted to the Topic

For a precise comparison and search of ORGINFO, a solution that is less variable and more reproducible than using “language vocabulary” as the domain of DIGINFO (see Section 4.2) is desirable. This is possible through the use of a topic-specific “adapted domain”, which is defined online, such that there is full reproducibility in both directions; that is, it forms a bijection (a one-to-one correspondence) between every variant of ORGINFO and its digital representation in DIGINFO.

As it is impossible to bijectively represent “all information” (i.e., “all features”) of reality digitally, the restriction to relevant features (i.e., sub-areas of information) is necessary. This is possible because ORGINFO is communicated within a certain topic—that is, it should only represent features which are relevant within the chosen topic. Thus, for the adaptation of the domain of ORGINFO to this topic, the following questions are (repeatedly) asked:

- (a) Which (additional) independent feature (parameter) is relevant within the chosen topic? If an appropriate quantification of this feature is available online, reuse it; otherwise, ask:

- (b) Which variants of the feature are possible? Quantify the feature, order its variants, and define a bijection to the numeric values of a parameter with the corresponding order.

For (a), relevant independent features are repeatedly searched. Every feature has variants which are selected (represented) by ORGINFO. If these are naturally ordered (e.g., have a quantitative magnitude), this order is taken; otherwise, a useful order is introduced. If the resulting order is multidimensional, every dimension can be regarded as an independent feature with a one-dimensional order.

After this, every resulting feature has a one-dimensional set of variants, such that every variant of every feature is bijectively represented (i.e., digitally selected) by a single number. Thus, the feature is quantified. If “N” denotes the count of all features, then the selection of the variants of all features is done digitally using N numbers (i.e., by an N-dimensional vector). The conversion of ORGINFO to this digital representation DIGINFO is a bijection from the topic-adapted domain of ORGINFO into an N-dimensional vector space (i.e., the digital domain of DIGINFO). Due to this bijection, the domains of ORGINFO and DIGINFO can be treated as equivalent. This substantially simplifies our considerations in the case of adapted domains.

Within the adapted domain, the relevant features of the original information are represented by numbers. Therefore, the definition of an adapted domain can be regarded as the definition of the number sequence, DIGINFO, which represents certain relevant features within the chosen topic. Adapted domains can be defined online (as described in Section 3). It is important that online definitions are globally available. To avoid redundancy, appropriate online definitions for this topic should first be searched for and reused before a new definition is created. If relevant features are still undefined, a new online definition of them is appropriate.

Figure 3 shows a flowchart of the online definition of an adapted domain.

Consider this process applied to the weather commentary example of Section 4.1, where we assume that no appropriate online definition of the topic “weather” is available. In this case, the generation of a new definition is appropriate. According to Figure 3 and Section 4.3 (a), independent relevant features within the topic “weather” are searched. There are many such features, such as atmospheric temperature, barometric pressure, relative humidity, and so on. In this example, only the feature “atmospheric temperature” is necessary. If an appropriate online definition is available, it is used; otherwise, such a definition is created. For this, the feature is quantified. In this example, the original information (ORGINFO) “atmospheric temperature” already has the internationally given ordered property T °C. Therefore, simply the letter T (which represents multiples of °C) is taken as the digital information (DIGINFO). According to Section 4.3 (a), all interesting variants of this feature are ordered to obtain a one-to-one correspondence (bijection) with the number T.

This process is illustrated in Figure 4. The original information “The temperature is 16 °C” is represented by the single number “16”. Despite this shortness, there is a clear one-to-one correspondence between every possible variant of ORGINFO and its digital representation, DIGINFO. In contrast, Figure 1 and Figure 2 show how ambiguity and imprecision occur, in the case of free language, due to the use of the domain “language vocabulary”.

As shown above for the feature “atmospheric temperature”, definitions of further features such as “barometric pressure”, “relative humidity”, and so on can be appended to the online definition of “weather”. This increases its dimensionality and the maximal length of the number sequence DIGINFO. If the value of a certain number is not available, it can be represented, for example, by a short placeholder in DIGINFO.
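The following minimal sketch shows how such an adapted domain for “weather” might be represented in code. The feature names, units, and the functions `to_diginfo` and `to_orginfo` are hypothetical choices made for this illustration, not part of any existing online definition, and `None` is used as an assumed placeholder for unavailable values.

```python
from typing import Sequence

# Hypothetical online definition of the topic "weather": an ordered list of
# features, each quantified by one number (names and units are assumptions).
WEATHER_FEATURES = (
    "atmospheric_temperature_celsius",
    "barometric_pressure_hpa",
    "relative_humidity_percent",
)


def to_diginfo(orginfo: dict) -> tuple:
    """Map feature values (ORGINFO) to the number sequence DIGINFO.

    Features without a value are represented by the placeholder None.
    """
    unknown = set(orginfo) - set(WEATHER_FEATURES)
    if unknown:
        raise KeyError(f"features not in the definition: {sorted(unknown)}")
    return tuple(orginfo.get(name) for name in WEATHER_FEATURES)


def to_orginfo(diginfo: Sequence) -> dict:
    """Inverse mapping: recover the feature values from DIGINFO."""
    if len(diginfo) != len(WEATHER_FEATURES):
        raise ValueError("DIGINFO length does not match the definition")
    return {
        name: value
        for name, value in zip(WEATHER_FEATURES, diginfo)
        if value is not None
    }
```

With this definition, “The temperature is 16 °C” becomes `(16, None, None)`, and applying `to_orginfo` to that tuple recovers the original feature value, which mirrors the bijection described above. Appending a further feature to `WEATHER_FEATURES` increases the dimensionality of DIGINFO by one.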

#### 4.4. Comparability of Information

Let DV1, DV2, and DV3 represent variants of digital information which are elements of the same domain D (e.g., domain vectors, as defined in Section 3, with the same UL). This is the first precondition for comparability. A further precondition is a non-negative distance function (i.e., a metric) F on D that satisfies

F(DV1, DV2) ≥ 0,

F(DV1, DV2) = 0 if and only if DV1 = DV2,

F(DV1, DV2) + F(DV2, DV3) ≥ F(DV1, DV3), and

F(DV1, DV2) = F(DV2, DV1). (3)

A domain D with such a metric F is called a “metric space” in the literature [10]. A metric space with domain vectors (2) as elements is called a “Domain Space” [2,3,4].

The definability of the metric F provides clear preconditions (3) for the comparability of information. The digital representation of information (DIGINFO) is always represented by a finite count of numbers (N), which can be seen as a vector in an N-dimensional vector space. There are many possibilities to define the metric F on such a vector space; the Manhattan and Euclidean metrics are well-known examples [10]. Therefore, the digital representation, DIGINFO, is always comparable. The decisive question is: is the original information (ORGINFO) comparable?
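As an illustration, the two metrics mentioned above can be sketched for vectors of equal length, together with a spot check of the four metric conditions on a small sample. The helper `satisfies_metric_conditions` and the sample values are assumptions made for this sketch; passing the check on a finite sample is not a proof.

```python
import math
from itertools import product


def _require_same_dimension(a, b):
    # Vectors are comparable only if they belong to the same domain.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")


def manhattan(a, b):
    """Manhattan metric: sum of absolute component differences."""
    _require_same_dimension(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean metric: square root of the sum of squared differences."""
    _require_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_metric_conditions(f, vectors, tol=1e-9):
    """Spot-check the four metric conditions for f on a finite sample."""
    for a, b, c in product(vectors, repeat=3):
        if f(a, b) < 0:                        # non-negativity
            return False
        if (f(a, b) == 0) != (a == b):         # zero only for identical vectors
            return False
        if f(a, b) + f(b, c) < f(a, c) - tol:  # triangle inequality
            return False
        if abs(f(a, b) - f(b, a)) > tol:       # symmetry
            return False
    return True


# Hypothetical sample of domain vectors (temperature in °C, pressure in hPa).
SAMPLE = [(16.0, 1013.0), (15.0, 1009.0), (21.0, 1020.0)]
```

Which metric is appropriate depends on the topic; for instance, components with different units may need to be scaled before their differences are combined.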

For example, there are severe difficulties in the case of the domain “language vocabulary”. According to Figure 1, the phrases “It is cold” and “I’m freezing” (as DIGINFO) can both represent the same original information (ORGINFO); however, these phrases can obviously also represent different original information. In the first case, F(“It is cold”, “I’m freezing”) is zero, but in the second case, F(“It is cold”, “I’m freezing”) is non-zero. Thus, if the domain “language vocabulary” is used, it is impossible to appropriately define F for the reliable comparison of original information (ORGINFO).
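The difficulty can be made tangible with a toy sketch. The table below assigns to each phrase a set of temperatures it might stand for; these values are invented for illustration only and are not taken from the article or from any data. Because each phrase corresponds to several possible originals, the distance between the originals is a set of candidate values, not a single number.

```python
# Hypothetical illustration: temperatures (°C) each phrase might stand for.
# The values are invented for this sketch.
PHRASE_TO_POSSIBLE_TEMPERATURES = {
    "It is cold": {10, 14, 16},
    "I'm freezing": {5, 10, 16},
}


def possible_distances(phrase_a, phrase_b, table=PHRASE_TO_POSSIBLE_TEMPERATURES):
    """All distances |Ta - Tb| between originals compatible with the two phrases."""
    return sorted(
        {abs(ta - tb) for ta in table[phrase_a] for tb in table[phrase_b]}
    )
```

For the assumed table, `possible_distances("It is cold", "I'm freezing")` contains zero as well as several non-zero values, so no single value of F on the phrases reflects the distance between the originals.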

However, if an adapted domain is used, there is a bijection between the original information (ORGINFO) and its digital representation, DIGINFO (according to Section 4.3). This completely changes the situation. The definition of F on DIGINFO is directly applicable to ORGINFO (i.e., in the case of an adapted domain, the original information (ORGINFO) is comparable). For its automatic comparison, F can be used on the digital representation, DIGINFO. This is also important for similarity searches.
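A similarity search over DIGINFO can then be sketched as ranking stored vectors by their distance to a query vector. The record labels, the function `similarity_search`, and the choice of the Manhattan metric are assumptions for this example; any metric that satisfies the conditions of this Section could be substituted.

```python
def manhattan(a, b):
    """Manhattan metric on vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_search(query, records, distance=manhattan, limit=3):
    """Return up to `limit` (label, vector) records, nearest to `query` first."""
    ranked = sorted(records, key=lambda record: distance(query, record[1]))
    return ranked[:limit]


# Hypothetical records: one-dimensional DIGINFO (atmospheric temperature in °C).
RECORDS = [
    ("station A", (16.0,)),
    ("station B", (15.0,)),
    ("station C", (25.0,)),
]
```

Searching with the query `(16.0,)` and `limit=2` returns the records for stations A and B, in that order, since their distances to the query are 0 and 1, whereas station C is at distance 9.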

It is also plausible to consider the comparability of medical information before the application of Artificial Intelligence (AI) algorithms [11]; otherwise, the AI algorithm may “learn” from the wrong (i.e., non-bijectively represented and, therefore, non-natural) domain of information, with unpredictable side effects.