An Introduction to Generalized Representational Information Theory (GRIT)
In this extensive technical appendix we give a simplified introduction to RIT (based largely on material from [9,21]) and its generalization, GRIT. RIT and GRIT are based on categorical invariance theory. For the formal details of categorical invariance theory and its applications to human concept learning research see [14,15,16]. The first part of this technical appendix introduces RIT; the second part introduces an extension of RIT to continuous domains using matrices. We begin by defining some terms. By a well-defined category (aka categorical stimulus) we shall mean a set of dimensionally definable objects in the environment, or a set of memory traces (i.e., exemplars) of such objects, that, by virtue of being defined by the same dimensions, are related in some way. Concepts, on the other hand, we shall define roughly as mental representations of these categorical stimuli. Accordingly, categorical stimuli are the raw material from which concepts are formed. Dimensionally definable stimulus objects are objects that can be characterized in terms of a fixed number of shared attributes or properties (i.e., dimensions), each ranging over a continuum or over discrete values. For example, the properties of brightness, shape, and size, as well as the more subjective attributes of satisfaction and personal worth, are all possible dimensions of the objects of some stimulus set. In addition, we shall assume that all of the dimensions associated with a specific stimulus set range over a specific and fixed number of values that combined specify a gradient (standardized in the [0, 1] interval) for the particular dimensions. For example, the brightness dimension may have five fixed values representing five levels of brightness in a continuum standardized from 0 to 1 (from least bright to most bright).
Six examples of categorical stimuli consisting of objects defined over the discrete binary dimensions of color, shape, and size were shown in Figure 1. Note that each of the six categorical stimuli has a certain structure, which is to say that each displays a specific relationship between its dimensional values. These structures, due to their specific binary dimensional nature, are represented by Boolean algebraic or, simply stated, logical rules (i.e., expressions consisting of disjunctions, conjunctions, and negations of variables that stand for binary dimensions). These algebraic representations of a stimulus set are referred to as concept functions. Concept functions are useful in spelling out the logical structure of a stimulus set. For example, suppose that $x$ stands for blue, $\bar{x}$ stands for red, $y$ stands for round, and $\bar{y}$ stands for square; then the two-variable concept function $\bar{x}y \vee x\bar{y}$ (where "$\vee$" denotes "or", "$\wedge$" denotes "and", and the overbar in "$\bar{x}$" denotes "not-$x$") defines the category which contains two objects: a red and round object and a blue and square object. Clearly, the choice of labels in the expression is arbitrary. Hence, there are many Boolean expressions that define the same category structure [33,34]. In this paper, concept functions will be represented by capital letters of the English alphabet (e.g., F, G, H), while the sets that such functions define in extension will be denoted by set brackets around their corresponding function symbols. For example, if F is a Boolean function in disjunctive normal form (DNF), $\{F\}$ is the category that it defines. A DNF is a Boolean formula consisting of sums of products that are a verbatim object-per-object description of the category of objects (just as in the example given above).
Before defining the representational information measure, we shall first define the notion of a representation (or "representative") of a well-defined category. A representation of a well-defined category S is any subset of S. The power set $P(S)$ is the set of all such representations. Since there are $|S|$ elements in S, there are $2^{|S|}$ possible representations of S ($|S|$ stands for the cardinality, or size, of the set S). Some representations capture the structural (i.e., relational) "essence" or nature of S better than others. In other words, some representations carry more representational information (i.e., more conceptual significance) about S than others. For example, consider a well-defined category with three objects defined over three dimensions (color, shape, and size) consisting of a small black circle, a small black square, and a large white circle. The small black circle better captures the character of the category as a whole than does the large white circle. In addition, it would seem that (1) for any well-defined category S, all the information in S is conveyed by S itself; and that (2) the empty set $\emptyset$ carries no information about S. The aim of our measure is to quantify the amount and quality of conceptual information carried by representations or representatives of the category S about S while obeying these two basic requirements and capturing the conceptual significance of S.
Categorical Invariance Theory
Categorical invariance theory is a theory of human concept learning that has been successful at accurately predicting the degree of concept learning difficulty of categories of objects (see [20,21]). The theory is based on the general idea that the way that humans learn concepts is by detecting invariance patterns in sets of objects in the environment. More specifically, the theory posits that humans detect inherent relational symmetries or invariants in sets of objects that facilitate concept formation. This type of pattern detection involves systematically perturbing or transforming each object in a set with respect to each of its defining dimensions. To illustrate this idea, consider the category containing a square that is black and small, a circle that is black and small, and a circle that is white and large, which is described by the concept function $F = xyz \vee \bar{x}yz \vee \bar{x}\bar{y}\bar{z}$. Let's encode the features of the objects in this category using the digits "1" and "0" so that each object may be representable by a binary string. For example, "111" stands for the first object when x = 1 = square, y = 1 = small, and z = 1 = black. Thus, the entire set can be encoded by {111, 011, 000}. If we transform this set in terms of the shape dimension by assigning the opposite shape value to each of the objects in the set, we get the perturbed set {011, 111, 100}. Now, if we compare the original set to the perturbed set, they have two objects in common with respect to the dimension of shape. Thus, two out of three objects remain the same. This proportion, referred to as the "dimensional kernel", is a measure of the partial invariance of the category with respect to the dimension of shape. The first pane of Figure 2 illustrates this transformation. Doing this for each of the dimensions, we can form an ordered set, or vector, consisting of all the dimensional kernels (one per dimension) of the concept function or category type (see Figure 2 and the example after Equation (7)).
Figure 2.
Logical manifold transformations along the dimensions of shape, color, and size for a set of objects defined over three dimensions. The fourth pane underneath the three top panes contains the pairwise symmetries revealed by the shape transformation.
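The perturbation procedure just described is easy to simulate. The sketch below is a minimal Python illustration of our own (the function names `perturb` and `kernel` are not notation from the text): it encodes each object as a binary string, flips one dimension across the whole set, and counts the overlap with the original category.

```python
def perturb(category, dim):
    """Flip the value of dimension `dim` (0-indexed) in every object string."""
    return {o[:dim] + ("0" if o[dim] == "1" else "1") + o[dim + 1:]
            for o in category}

def kernel(category, dim):
    """Dimensional kernel: proportion of perturbed objects still in the category."""
    return len(perturb(category, dim) & category) / len(category)

category = {"111", "011", "000"}   # the square/circle example from the text
print(kernel(category, 0))         # shape dimension: 2 of 3 objects survive
```

Running this on the example set reproduces the 2/3 kernel for the shape dimension described above.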
Formally, these partial invariants can be represented in terms of a vector of discrete partial derivatives of the concept function that defines the Boolean category. This is shown in Equation (5) below, where $\hat{\Lambda}(F)$ stands for the logical manifold of the concept function F and where a "hat" symbol over the partial differentiation symbol indicates discrete differentiation (for a detailed and rigorous explanation, see [20,21]). Discrete partial derivatives are somewhat analogous to continuous partial derivatives in Calculus. Loosely speaking, in Calculus, the partial derivative of an n-variable function $f(x_1, \dots, x_n)$ is defined as how much the function value changes relative to how much the input value changes, as seen below:

$$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h}$$
On the other hand, the discrete partial derivative, defined by the equation below (where $\bar{x}_i = 1 - x_i$ with $x_i \in \{0, 1\}$), is analogous to the continuous partial derivative except that there is no limit taken, because the values of $x_i$ can be only 0 or 1:

$$\frac{\hat{\partial} F(\mathbf{x})}{\hat{\partial} x_i} = F(x_1, \dots, x_i, \dots, x_D) - F(x_1, \dots, \bar{x}_i, \dots, x_D)$$

The value of the derivative is ±1 if the function assignment changes when $x_i$ changes, and the value of the derivative is 0 if the function assignment does not change when $x_i$ changes. Note that the value of the derivative depends on the entire vector $(x_1, \dots, x_D)$ (abbreviated as $\mathbf{x}$ in this note) and not just on $x_i$. As an example, consider the concept function AND, denoted as $F(x_1, x_2) = x_1 \wedge x_2$ (equivalently, we could also write this function as $x \wedge y$; because this is more readable than the vector notation, we shall continue using it in other examples). Also, consider the particular point $(1, 0)$. At that point, the derivative of the concept function AND with respect to $x$ is 0 because the value of the concept function does not change when the stimulus changes from $(1, 0)$ to $(0, 0)$. If instead we consider the point $(1, 1)$, the derivative of AND with respect to $x$ is 1 because the value of the concept function does change when the stimulus changes from $(1, 1)$ to $(0, 1)$.
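This behavior of the discrete derivative can be checked directly. In the hypothetical snippet below, `discrete_partial` is a helper name of our own (not notation from the text); it returns the difference between the function's value at a point and its value with one coordinate flipped.

```python
def AND(x, y):
    return x & y

def discrete_partial(f, point, i):
    """Discrete partial derivative: f at `point` minus f with coordinate i flipped."""
    flipped = list(point)
    flipped[i] = 1 - flipped[i]
    return f(*point) - f(*flipped)

print(discrete_partial(AND, (1, 0), 0))  # 0: AND is 0 at (1,0) and at (0,0)
print(discrete_partial(AND, (1, 1), 0))  # 1: AND drops from 1 at (1,1) to 0 at (0,1)
```

Evaluating at $(0, 1)$ instead yields −1, which is why the absolute value is taken in Equation (6) below.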
Accordingly, the discrete partial derivatives in Equation (5) below give the number of items that have been changed in the category in respect to a change in each of its dimensions:

$$\hat{\Lambda}(F) = \left( \left\| \frac{\hat{\partial} F}{\hat{\partial} x_1} \right\|, \dots, \left\| \frac{\hat{\partial} F}{\hat{\partial} x_D} \right\| \right) \quad (5)$$

The double lines around the discrete partial derivatives give the proportion of objects that have not changed in the category and are defined in Equation (6) below (where p is the number of objects in the category defined by the concept function F):

$$\left\| \frac{\hat{\partial} F}{\hat{\partial} x_i} \right\| = 1 - \frac{1}{p} \sum_{\mathbf{x} \in \{F\}} \left| \frac{\hat{\partial} F(\mathbf{x})}{\hat{\partial} x_i} \right| \quad (6)$$

In the above definition (Equation (6)), $\mathbf{x}$ stands for an object defined by D dimensional values $(x_1, \dots, x_D)$. The general summation symbol represents the sum of the partial derivatives evaluated at each object $\mathbf{x}$ from the Boolean category $\{F\}$ (this is the category defined by the concept function F). The partial derivative transforms each object $\mathbf{x}$ in respect to its i-th dimension and evaluates to 0 if, after the transformation, the object is still in $\{F\}$ (it evaluates to 1 otherwise). Thus, to compute the proportion of objects that remain in $\{F\}$ after changing the value of their i-th dimension, we divide the sum of the partial derivatives evaluated at each object $\mathbf{x}$ by p (the number of objects in $\{F\}$) and subtract the result from 1. The absolute value symbol is placed around the partial derivative to avoid a value of negative 1 (for a detailed explanation, see [9,20,21]).
Relative degrees of total invariance across category types from different families can then be measured by taking the Euclidean distance of each structural or logical manifold (Equation (7)) from the zero logical manifold, whose components are all zeros (i.e., (0, …, 0)). The zero manifold is the ideal reference point for measuring degrees of global invariance because it represents a true zero point in the subjective homogeneity scale hypothesized by the cognitive theory developed by Vigo in [20,21]. In other words, in the theory, structures with zero degree of invariance are perceived by humans as having no coherent pattern (i.e., no structural homogeneity). Thus, the overall degree of invariance $\Phi(F)$ of the concept function F (and of any category it defines) is given by the equation below:

$$\Phi(F) = \left\| \hat{\Lambda}(F) \right\| = \sqrt{\sum_{i=1}^{D} \left\| \frac{\hat{\partial} F}{\hat{\partial} x_i} \right\|^{2}} \quad (7)$$

Using our example from pane one in Figure 2, we showed that the original category and the perturbed category have two elements in common (out of the three transformed elements) in respect to the shape dimension; thus, its degree of partial invariance is expressed by the ratio 2/3. Conducting a similar analysis in respect to the dimensions of color and size, its logical manifold computes to $\hat{\Lambda}(F) = (2/3,\, 0,\, 0)$ and its degree of categorical invariance is:

$$\Phi(F) = \sqrt{(2/3)^2 + 0^2 + 0^2} = 2/3 \approx 0.67$$
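As a check on these numbers, the short Python sketch below (an illustration of our own, with helper names that are not the text's notation) computes the dimensional kernels and their Euclidean norm for the category {111, 011, 000}.

```python
import math

def perturb(cat, i):
    return {o[:i] + ("0" if o[i] == "1" else "1") + o[i + 1:] for o in cat}

def manifold(cat):
    """Vector of dimensional kernels, one per dimension."""
    D = len(next(iter(cat)))
    return [len(perturb(cat, i) & cat) / len(cat) for i in range(D)]

def invariance(cat):
    """Euclidean distance of the manifold from the zero manifold."""
    return math.sqrt(sum(k * k for k in manifold(cat)))

cat = {"111", "011", "000"}
print(manifold(cat))              # [0.666..., 0.0, 0.0]
print(round(invariance(cat), 2))  # 0.67
```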
Note that the concept function $F = xyz \vee \bar{x}yz \vee \bar{x}\bar{y}\bar{z}$ used in our example at the beginning of Section 3 can be rewritten in an entirely equivalent form as $F(x_1, x_2, x_3) = (x_1 \wedge x_2 \wedge x_3) \vee (\bar{x}_1 \wedge x_2 \wedge x_3) \vee (\bar{x}_1 \wedge \bar{x}_2 \wedge \bar{x}_3)$ in order to be consistent with the vector notation introduced above. Henceforth, we shall use both ways of specifying concept functions, and it will be left to the reader to make the appropriate translation. We do this since the non-vector notation is more intuitive and less confusing to comprehend structurally.
Invariance properties facilitate concept learning and identification. More specifically, the proposed mathematical framework reveals the pairwise symmetries that are inherent to a category structure when transformed by a change to one of its defining dimensions. One such pairwise symmetry is illustrated in the bottom pane of Figure 2. The more of these symmetries there are, the less useful the dimension is in determining category membership. In other words, the dimensions associated with high invariance do not help us discriminate the perturbed objects from the original objects in terms of category membership. Consequently, these particular dimensions do not carry "diagnostic" information about their associated category; however, they signal the presence of redundant information.
Representational Information
With the preliminary apparatus introduced, we are now in a position to introduce a measure of representational information that meets the goals set forth in the introduction to this paper. In general, a set of objects is informative about a category whenever the removal of its elements from the category increases or decreases the structural complexity of the category as a whole. That is, the amount of representational information (RI) conveyed by a representation R of a well-defined category $\{F\}$ is the rate of change of the structural complexity of $\{F\}$. Simply stated, the representational information carried by an object or objects from a well-defined category $\{F\}$ is the percentage increase or decrease (i.e., rate of change or growth rate) in structural complexity that the category experiences whenever the object or objects are removed [36].
More specifically, let $\{F\}$ be a well-defined category defined by the concept function F, and let the well-defined category R be a representation of $\{F\}$ (i.e., $R \subseteq \{F\}$ or $R \in P(\{F\})$). Then, if $R \neq \emptyset$, the amount of representational information $I(R)$ of R in respect to $\{F\}$ is determined by Equation (12) below, where $p$ and $\bar{p}$ stand for the number of elements in $\{F\}$ and in $\{\bar{F}\} = \{F\} - R$ respectively, and where $\psi(F) = p\,e^{-\Phi(F)}$ stands for the structural complexity of the category defined by F:

$$I(R) = \frac{\psi(\bar{F}) - \psi(F)}{\psi(F)} \quad (12)$$

$$I(R) = \frac{\bar{p}\,e^{-\Phi(\bar{F})} - p\,e^{-\Phi(F)}}{p\,e^{-\Phi(F)}} \quad (13)$$
Note that definitions (12) and (13) above yield negative and positive percentages. Negative percentages represent a drop in complexity. Thus, RI has two components: a magnitude and a direction (just as the value of the slope of a line indicates both magnitude and direction). For humans, the direction of RI is critical: for example, a relatively large negative value obtained from (12) and (13) above indicates that high RI is conveyed by the subset of $\{F\}$, but it characterizes the objects in the subset as highly unique or unrepresentative of those in $\{F\}$; while a relatively large positive value indicates that high RI is conveyed by the subset of $\{F\}$, but it characterizes the objects in the subset as highly representative of those in $\{F\}$. In the following examples, it will be shown that, intuitively, the RI values make perfect sense for representations of the same size (i.e., with the same number of objects).
Using Equation (13) above, we can compute the amount of subjective representational information associated with each representation of any category instance defined by any concept function. Take as an example the category defined by the concept function $F = xyz \vee \bar{x}yz \vee \bar{x}\bar{y}\bar{z}$ used above, where x = square, y = small, and z = black (Table 2 displays the category). To be consistent with the vector notation introduced, this concept function can also be written as $F(x_1, x_2, x_3) = (x_1 \wedge x_2 \wedge x_3) \vee (\bar{x}_1 \wedge x_2 \wedge x_3) \vee (\bar{x}_1 \wedge \bar{x}_2 \wedge \bar{x}_3)$, and, as before, we leave it up to the reader to make the necessary translation. As before, the objects of this category may be encoded in terms of zeros and ones, and the category may be encoded by the set {111, 011, 000} to facilitate reference to the actual objects. The amount of subjective representational information conveyed by the singleton (single-element) set containing the object encoded by 111 (and defined by the rule $x \wedge y \wedge z$) in respect to the category encoded by {111, 011, 000} (and defined by the concept function F) is computed as follows. First, we compute the invariance values $\Phi(F) = 2/3$ and $\Phi(\bar{F}) = 0$, where $\{\bar{F}\} = \{111, 011, 000\} - \{111\} = \{011, 000\}$. Next, we compute the values of $\psi(F) = 3e^{-2/3} \approx 1.54$ and $\psi(\bar{F}) = 2e^{0} = 2$ and get:

$$I(\{111\}) = \frac{2 - 3e^{-2/3}}{3e^{-2/3}} \approx 0.30$$
Similarly, if we compute the results for the remaining two singleton (single-element) representations of the set {111, 011, 000}, we get the values shown in Table 2 below. These illustrate that the representation {000} is relatively less informative with respect to its category of origin {111, 011, 000} because the absence of 000 results in a 52% reduction in the structural complexity of $\{F\}$ (i.e., −0.52). Likewise, the other two singleton representations ({111} and {011}, respectively) are more informative because the absence of 111 and of 011 respectively from $\{F\}$ results in a 30% increase in the structural complexity of $\{F\}$. The reader is directed to Figure 3 below, showing a visuo-perceptual instance of the category structure, in order to confirm these results intuitively.
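These values can be reproduced computationally. The sketch below is our own Python illustration: `complexity` assumes the structural complexity form $\psi(F) = p\,e^{-\Phi(F)}$, which reproduces the −0.52 and 0.30 values reported above, and `rep_info` computes the relative change in complexity when a representation is removed.

```python
import math

def perturb(cat, i):
    return {o[:i] + ("0" if o[i] == "1" else "1") + o[i + 1:] for o in cat}

def invariance(cat):
    D = len(next(iter(cat)))
    return math.sqrt(sum((len(perturb(cat, i) & cat) / len(cat)) ** 2
                         for i in range(D)))

def complexity(cat):
    # Assumed form psi = p * exp(-Phi); it matches the worked values in the text.
    return len(cat) * math.exp(-invariance(cat))

def rep_info(R, cat):
    """Percentage change in structural complexity when R is removed."""
    return (complexity(cat - R) - complexity(cat)) / complexity(cat)

S = {"111", "011", "000"}
for obj in sorted(S):
    print(obj, round(rep_info({obj}, S), 2))  # -> 000 -0.52 / 011 0.3 / 111 0.3
```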
Figure 3.
Category instance of the concept function $F = xyz \vee \bar{x}yz \vee \bar{x}\bar{y}\bar{z}$.
Table 3 shows the information conveyed by the single-element representations of the six categories consisting of four objects defined over three dimensions (see Figure 1). Information vectors containing the amount of information conveyed by each single-object representation are given in the information column. Note that each of the single-element representations of category structures 3[4]-1, 3[4]-2, and 3[4]-6 respectively convey the same amount of information.
Table 3.
Amount of information conveyed by all the possible single element representations of six different category types or concept functions.
Category | Objects | Information
---|---|---
3[4]-1 | {000, 001, 100, 101} | [0.20, 0.20, 0.20, 0.20]
3[4]-2 | {000, 010, 101, 111} | [0.05, 0.05, 0.05, 0.05]
3[4]-3 | {101, 010, 011, 001} | [−0.31, −0.31, −0.08, −0.08]
3[4]-4 | {000, 110, 011, 010} | [−0.31, −0.31, −0.31, 0.78]
3[4]-5 | {011, 000, 101, 100} | [−0.41, −0.22, −0.22, 0.52]
3[4]-6 | {001, 010, 100, 111} | [−0.25, −0.25, −0.25, −0.25]
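For instance, the information vector for structure 3[4]-4 can be reproduced with the same machinery (again a Python illustration of our own, assuming the complexity form $\psi = p\,e^{-\Phi}$ that matches the table's values).

```python
import math

def perturb(cat, i):
    return {o[:i] + ("0" if o[i] == "1" else "1") + o[i + 1:] for o in cat}

def invariance(cat):
    D = len(next(iter(cat)))
    return math.sqrt(sum((len(perturb(cat, i) & cat) / len(cat)) ** 2
                         for i in range(D)))

def rep_info(R, cat):
    psi = lambda c: len(c) * math.exp(-invariance(c))
    return (psi(cat - R) - psi(cat)) / psi(cat)

objects = ["000", "110", "011", "010"]          # structure 3[4]-4 from Table 3
print([round(rep_info({o}, set(objects)), 2) for o in objects])
# -> [-0.31, -0.31, -0.31, 0.78]
```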
From Binary to Continuous Domains: The Similarity-Invariance Principle
In the above discussion, RIT has been portrayed as a theory that applies only to sets of objects or categories that are defined over binary dimensions. In order to transition to continuous dimensions with values standardized in the [0, 1] real number interval, all that is needed is a generalization of the structural or logical manifold $\hat{\Lambda}$ (capital lambda) of a binary category so that it also applies to any continuous category. Every other aspect of the theory described above remains the same. To generalize the logical manifold operator we introduce the following equivalence between the pairwise symmetries in sets of objects (which identify pairs of invariants) and the partial similarity between the same two objects with respect to a particular dimension. Figure 4 illustrates this intuitive equivalence with respect to the shape dimension. It simply means that the relational symmetries on which categorical invariance theory is based are equivalent to pairs of objects being identical when disregarding one of their dimensions. We shall call the disregarded dimension the anchored dimension.
Figure 4.
Equivalence of Invariance to partial similarity across two dimensions.
In the discussion below we shall employ the following notation:
(1) Let X be a stimulus set and $|X|$ stand for the cardinality (i.e., the number of elements) of X.
(2) Let the object-stimuli in X be represented by the vectors $\mathbf{o}_1, \dots, \mathbf{o}_{|X|}$ (where $1 \le j \le |X|$).
(3) Let the vector $\mathbf{o}_j = (o_{j1}, \dots, o_{jD})$ be the j-th D-dimensional object-stimulus in X (where D is the number of dimensions of the stimulus set).
(4) Let $o_{ji}$ be the value of the i-th dimension of the j-th object-stimulus in X. We shall assume throughout our discussion that all dimensional values are real numbers greater than or equal to zero.
(5) Let $s(\mathbf{o}_j, \mathbf{o}_k)$ stand for the similarity of object-stimulus $\mathbf{o}_j$ to object-stimulus $\mathbf{o}_k$ as determined by the assumption made in multidimensional scaling theory that stimulus similarity is some monotonically decreasing function of the psychological distance between the stimuli.
We begin by describing formally the processes of dimensional binding and partial similarity assessment. To do so, we will introduce a new kind of distance operator. But first, let's define the generalized Euclidean distance operator (aka Minkowski distance) between two object-stimuli $\mathbf{o}_j$ and $\mathbf{o}_k$ with attention weights $w_1, \dots, w_D$ as:

$$\delta(\mathbf{o}_j, \mathbf{o}_k) = \left( \sum_{i=1}^{D} w_i \left| o_{ji} - o_{ki} \right|^{r} \right)^{1/r}$$

As in the Generalized Context Model (GCM) [37], the inclusion of a parameter $w_i$ represents the selective attention allocated to dimension i, such that $\sum_{i=1}^{D} w_i = 1$. We use this parameter family to represent individual differences in the process of assessing similarities between object-stimuli at this level of analysis. For the sake of simplifying our explanation and examples below, we shall disregard this parameter. Next we introduce a new kind of distance operator, termed the partial psychological distance operator, to model dimensional anchoring and partial similarity assessment:

$$\hat{\delta}_d(\mathbf{o}_j, \mathbf{o}_k) = \left( \sum_{\substack{i=1 \\ i \neq d}}^{D} \left| o_{ji} - o_{ki} \right|^{r} \right)^{1/r} \quad (16)$$
Equation (16) computes the psychological distance between two stimuli ignoring their d-th dimension ($1 \le d \le D$). In other words, it computes the partial psychological distance between the exemplars corresponding to the object-stimuli $\mathbf{o}_j$ and $\mathbf{o}_k$ by excluding dimension d in the computation of the Minkowski generalized metric. For example, if the stimulus set X consists of four object-stimuli, we represent the partial pairwise distances between the four corresponding exemplars with respect to dimension d with the following partial distances matrix:

$$\hat{\Delta}_d(X) = \begin{pmatrix} \hat{\delta}_d(\mathbf{o}_1, \mathbf{o}_1) & \hat{\delta}_d(\mathbf{o}_1, \mathbf{o}_2) & \hat{\delta}_d(\mathbf{o}_1, \mathbf{o}_3) & \hat{\delta}_d(\mathbf{o}_1, \mathbf{o}_4) \\ \hat{\delta}_d(\mathbf{o}_2, \mathbf{o}_1) & \hat{\delta}_d(\mathbf{o}_2, \mathbf{o}_2) & \hat{\delta}_d(\mathbf{o}_2, \mathbf{o}_3) & \hat{\delta}_d(\mathbf{o}_2, \mathbf{o}_4) \\ \hat{\delta}_d(\mathbf{o}_3, \mathbf{o}_1) & \hat{\delta}_d(\mathbf{o}_3, \mathbf{o}_2) & \hat{\delta}_d(\mathbf{o}_3, \mathbf{o}_3) & \hat{\delta}_d(\mathbf{o}_3, \mathbf{o}_4) \\ \hat{\delta}_d(\mathbf{o}_4, \mathbf{o}_1) & \hat{\delta}_d(\mathbf{o}_4, \mathbf{o}_2) & \hat{\delta}_d(\mathbf{o}_4, \mathbf{o}_3) & \hat{\delta}_d(\mathbf{o}_4, \mathbf{o}_4) \end{pmatrix} \quad (17)$$
And more generally, for any stimulus set containing p stimulus objects, as:

$$\hat{\Delta}_d(X) = \left[ \hat{\delta}_d(\mathbf{o}_j, \mathbf{o}_k) \right]_{j,k=1}^{p} \quad (18)$$
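A direct way to see what the partial distance operator does is to implement it. The helper below is an illustration of our own (the name `partial_distance` is not the text's notation); it computes the Minkowski distance while skipping the anchored dimension.

```python
def partial_distance(a, b, d, r=1):
    """Minkowski distance between objects a and b, ignoring dimension index d."""
    return sum(abs(x - y) ** r
               for i, (x, y) in enumerate(zip(a, b)) if i != d) ** (1.0 / r)

o1 = (1, 1, 1, 0)   # object O1 from Table 4 below
o3 = (1, 1, 0, 0)   # object O3
print(partial_distance(o1, o3, 2))  # 0.0: identical once dimension 3 is anchored
```

Anchoring a different dimension (say, dimension 1) leaves the pair at a nonzero distance, which is exactly the asymmetry the local homogeneity operator exploits.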
Similarly, we can define the partial similarity between the two exemplars corresponding to the two object-stimuli, as is done in the GCM [37] and in multidimensional scaling [38,39], as a monotonically decreasing function F of the partial distance between the two exemplars corresponding to the two object-stimuli:

$$\hat{s}_d(\mathbf{o}_j, \mathbf{o}_k) = F\!\left( \hat{\delta}_d(\mathbf{o}_j, \mathbf{o}_k) \right) \quad (19)$$
In Equation (19) above, we have standardized the value $\hat{\delta}_d(\mathbf{o}_j, \mathbf{o}_k)$ in the [0, 1] interval using the following linear transformation:

$$\hat{\delta}^{*}_d(\mathbf{o}_j, \mathbf{o}_k) = \frac{\hat{\delta}_d(\mathbf{o}_j, \mathbf{o}_k) - \min(\hat{\Delta}_d)}{\max(\hat{\Delta}_d) - \min(\hat{\Delta}_d)}$$

where the max and min of a matrix are respectively its largest and smallest element, and $\min(\hat{\Delta}_d) = 0$ for any d and r. This standardization will prove useful when we introduce the discrimination threshold parameter later in this section. As in [40], we define subjective similarity as the negative exponent of the partial distance measure and set r = 1 (i.e., we use the city-block metric in our example), as shown in Equation (21) below:

$$\hat{s}_d(\mathbf{o}_j, \mathbf{o}_k) = e^{-\hat{\delta}^{*}_d(\mathbf{o}_j, \mathbf{o}_k)} \quad (21)$$
In spite of using the above metric, we acknowledge the possibility that a different kind of function may play a similar role in the computation of partial similarities. Next, we can construct the matrix of the pairwise partial psychological similarities between all four exemplars corresponding to the four object-stimuli in X, as seen in Equation (22) below:

$$\hat{S}_d(X) = \begin{pmatrix} - & \hat{s}_d(\mathbf{o}_1, \mathbf{o}_2) & \hat{s}_d(\mathbf{o}_1, \mathbf{o}_3) & \hat{s}_d(\mathbf{o}_1, \mathbf{o}_4) \\ \hat{s}_d(\mathbf{o}_2, \mathbf{o}_1) & - & \hat{s}_d(\mathbf{o}_2, \mathbf{o}_3) & \hat{s}_d(\mathbf{o}_2, \mathbf{o}_4) \\ \hat{s}_d(\mathbf{o}_3, \mathbf{o}_1) & \hat{s}_d(\mathbf{o}_3, \mathbf{o}_2) & - & \hat{s}_d(\mathbf{o}_3, \mathbf{o}_4) \\ \hat{s}_d(\mathbf{o}_4, \mathbf{o}_1) & \hat{s}_d(\mathbf{o}_4, \mathbf{o}_2) & \hat{s}_d(\mathbf{o}_4, \mathbf{o}_3) & - \end{pmatrix} \quad (22)$$
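Putting the partial distance, the standardization step, and the exponential similarity of Equation (21) together, the sketch below (our own Python illustration) builds the standardized partial-similarity matrix for the stimulus set A of Table 4 with dimension 1 anchored; the off-diagonal entries come out as the 0.37 and 0.61 values that appear in Table 5.

```python
import math

A = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]  # Table 4

def partial_distance(a, b, d, r=1):
    return sum(abs(x - y) ** r
               for i, (x, y) in enumerate(zip(a, b)) if i != d) ** (1.0 / r)

def similarity_matrix(X, d):
    """Standardize distances into [0, 1] by the matrix max, then s = exp(-d*)."""
    dist = [[partial_distance(a, b, d) for b in X] for a in X]
    m = max(max(row) for row in dist) or 1.0
    return [[math.exp(-v / m) for v in row] for row in dist]

for row in similarity_matrix(A, 0):        # anchor dimension 1 (index 0)
    print([round(v, 2) for v in row])
# first row -> [1.0, 0.37, 0.61, 0.61]
```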
Again, as a process assumption, we have excluded reflexive or self-similarities in the diagonal of the partial similarity matrix shown in Equation (22) above. However, we include symmetric comparisons, since we assume that they are processed by humans when assessing the overall homogeneity of a stimulus; besides, they add to the homogeneity of the stimulus as characterized by the categorical invariance principle and the categorical invariance measure, and we wish to be consistent with both of these constructs.
Adding the values of the similarity matrix that correspond to differences within a chosen discrimination threshold $\varepsilon_d$ for each dimension d, we get the following expression, which is functionally analogous to the local homogeneity operator given in Equation (6) (for any pair of objects $(\mathbf{o}_j, \mathbf{o}_k)$ where $j \neq k$, $1 \le j, k \le |X|$, and $0 \le \varepsilon_d \le 1$):

$$\hat{h}_d(X) = \frac{1}{|X|} \sum_{\substack{j \neq k \\ \hat{\delta}^{*}_d(\mathbf{o}_j, \mathbf{o}_k) \le \varepsilon_d}} \hat{s}_d(\mathbf{o}_j, \mathbf{o}_k) \quad (23)$$
The equation above defines the perceived degree of local homogeneity $\hat{h}_d(X)$ of a D-dimensional stimulus set X with respect to dimension d. $\hat{h}_d(X)$ is the ratio between the sum of the similarities corresponding to distances that are zero or close to zero (depending on the value of the discrimination resolution threshold) in the matrix $\hat{S}_d(X)$ (for a particular anchored dimension d) and the number of items in the stimulus set X. In other words, $\hat{h}_d(X)$ is the ratio between (1) the sum of the similarities in the matrix $\hat{S}_d(X)$ (for a particular anchored dimension d) that correspond to distances in the $[0, \varepsilon_d]$ discrimination resolution interval, and (2) the number of items in the dataset X. When the partial distances are close to zero, the points are for all intents and purposes treated as perfectly similar or identical.
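Equation (23) can likewise be sketched in a few lines of Python (an illustration under our own naming; with $\varepsilon_d = 0$ only exactly matching pairs contribute, each with similarity 1).

```python
import math

A = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]  # Table 4

def partial_distance(a, b, d, r=1):
    return sum(abs(x - y) ** r
               for i, (x, y) in enumerate(zip(a, b)) if i != d) ** (1.0 / r)

def local_homogeneity(X, d, eps=0.0):
    """Sum of similarities for pairs within the threshold, divided by |X|."""
    dist = [[partial_distance(a, b, d) for b in X] for a in X]
    m = max(max(row) for row in dist) or 1.0      # standardize into [0, 1]
    total = sum(math.exp(-dist[j][k] / m)
                for j in range(len(X)) for k in range(len(X))
                if j != k and dist[j][k] / m <= eps)
    return total / len(X)

print([local_homogeneity(A, d) for d in range(4)])  # -> [0.0, 0.0, 1.0, 1.0]
```

The resulting vector matches the four structural kernels listed in the last column of Table 5.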
For example, take a stimulus set consisting of four binary dimensions and four objects, as seen in Table 4 below and represented by the matrix A. Equation (24) below shows the matrix used to calculate the degree of partial homogeneity (with respect to dimension 1) of A when we let $\varepsilon_d = 0$ and r = 1.
Table 4.
Matrix representing a stimulus set structure with four object-stimuli O1–O4 of four dimensions D1–D4.
| D1 | D2 | D3 | D4 |
---|---|---|---|---
O1 | 1 | 1 | 1 | 0 |
O2 | 1 | 1 | 0 | 1 |
O3 | 1 | 1 | 0 | 0 |
O4 | 1 | 1 | 1 | 1 |
Note that the computed matrix in Equation (24) contains 4 ones that represent four identical pairs of exemplars corresponding to four pairs of object-stimuli. Applying Equation (23) above, we get Equation (25).
Lastly, we define the generalized structural manifold by Equation (26). This construct is analogous to the global homogeneity construct defined under the binary theory, except that it applies to both binary and continuous dimensions and is equipped with a distance discrimination threshold. It measures the perceived degree of global homogeneity of any stimulus set:

$$\hat{\Lambda}(X) = \left( \hat{h}_1(X), \dots, \hat{h}_D(X) \right) \quad (26)$$
We can also specify the particular degree of partial homogeneity of the structural manifold as seen in the Equation below.
We hypothesize that for every dimension d the discrimination resolution threshold $\varepsilon_d$ will be a relatively small number dependent on the discriminatory capacities of the observer. Also, the above equation assumes that, for any d and any r, the partial distances $\hat{\delta}^{*}_d(\mathbf{o}_j, \mathbf{o}_k)$ with $j \neq k$ are the only partial deltas that partake in determining the partial similarity matrices. Finally, since we standardized the partial distance metric in Equation (21), we can also say that $0 \le \varepsilon_d \le 1$. To simplify our discussion, in the remaining computations in this paper we shall let $\varepsilon_d = 0$ for all subjects and any dimension d; however, this value may also be treated as a free parameter that accounts for individual differences in classification performance. The assumption is that humans vary in their capacity to discriminate between stimuli and in their criterion for discriminating (in this paper we shall not investigate this latter option; that is, we shall not try to derive estimates for $\varepsilon_d$). In either case, we assume that the primary goal of the human conceptual system is to optimize classification performance via the detection of invariants.
Table 5 below illustrates the distance and similarity matrices that represent the computation of the structural manifold of a stimulus set. The perceived degree of partial or local homogeneity is shown in the final column.
Table 5.
The distance and similarity matrices associated with the computation of the local homogeneities of the stimulus set A: There are 4 structural kernels and these are listed in the last column under the perceived local homogeneity measure. Combined they form the manifold of the stimulus set A.
Dimension | Standardized Distance Matrix | Standardized Similarity Matrix | Perceived Local Homogeneity
---|---|---|---
1 | | 1110 | 1101 | 1100 | 1111 | | | 1110 | 1101 | 1100 | 1111 | 0/4=0 |
1110 | 0 | 1 | 0.5 | 0.5 | | 1110 | 1 | 0.37 | 0.61 | 0.61 |
1101 | 1 | 0 | 0.5 | 0.5 | | 1101 | 0.37 | 1 | 0.61 | 0.61 |
1100 | 0.5 | 0.5 | 0 | 1 | | 1100 | 0.61 | 0.61 | 1 | 0.37 |
1111 | 0.5 | 0.5 | 1 | 0 | | 1111 | 0.61 | 0.61 | 0.37 | 1 |
2 | | 1110 | 1101 | 1100 | 1111 | | | 1110 | 1101 | 1100 | 1111 | 0/4=0 |
1110 | 0 | 1 | 0.5 | 0.5 | | 1110 | 1 | 0.37 | 0.61 | 0.61 |
1101 | 1 | 0 | 0.5 | 0.5 | | 1101 | 0.37 | 1 | 0.61 | 0.61 |
1100 | 0.5 | 0.5 | 0 | 1 | | 1100 | 0.61 | 0.61 | 1 | 0.37 |
1111 | 0.5 | 0.5 | 1 | 0 | | 1111 | 0.61 | 0.61 | 0.37 | 1 |
3 | | 1110 | 1101 | 1100 | 1111 | | | 1110 | 1101 | 1100 | 1111 | 4/4=1 |
1110 | 0 | 1 | 0 | 1 | | 1110 | 1 | 0.37 | 1 | 0.37 |
1101 | 1 | 0 | 1 | 0 | | 1101 | 0.37 | 1 | 0.37 | 1 |
1100 | 0 | 1 | 0 | 1 | | 1100 | 1 | 0.37 | 1 | 0.37 |
1111 | 1 | 0 | 1 | 0 | | 1111 | 0.37 | 1 | 0.37 | 1 |
4 | | 1110 | 1101 | 1100 | 1111 | | | 1110 | 1101 | 1100 | 1111 | 4/4=1 |
1110 | 0 | 1 | 1 | 0 | | 1110 | 1 | 0.37 | 1 | 0.37 |
1101 | 1 | 0 | 0 | 1 | | 1101 | 0.37 | 1 | 0.37 | 1 |
1100 | 1 | 0 | 0 | 1 | | 1100 | 1 | 0.37 | 1 | 0.37 |
1111 | 0 | 1 | 1 | 0 | | 1111 | 0.37 | 1 | 0.37 | 1 |
Combined as a vector, these four values represent all the structural information of a concept or, in other words, the ideotype of the stimulus set. The overall degree of perceived global homogeneity or invariance of a stimulus set X defined over D dimensions, and for any pair of objects $(\mathbf{o}_j, \mathbf{o}_k)$ (such that $j \neq k$, $1 \le j, k \le |X|$, and $0 \le \varepsilon_d \le 1$), is given by the Euclidean metric as follows:

$$\widehat{\Phi}(X) = \left\| \hat{\Lambda}(X) \right\| = \sqrt{\sum_{d=1}^{D} \hat{h}_d(X)^{2}} \quad (28)$$
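For the stimulus set A of Table 5, the four local homogeneities are (0, 0, 1, 1), so the Euclidean norm gives a global invariance of $\sqrt{2} \approx 1.41$. The sketch below (our own illustration) confirms this.

```python
import math

A = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]

def partial_distance(a, b, d, r=1):
    return sum(abs(x - y) ** r
               for i, (x, y) in enumerate(zip(a, b)) if i != d) ** (1.0 / r)

def local_homogeneity(X, d, eps=0.0):
    dist = [[partial_distance(a, b, d) for b in X] for a in X]
    m = max(max(row) for row in dist) or 1.0
    return sum(math.exp(-dist[j][k] / m)
               for j in range(len(X)) for k in range(len(X))
               if j != k and dist[j][k] / m <= eps) / len(X)

def global_invariance(X, eps=0.0):
    """Euclidean norm of the generalized structural manifold."""
    D = len(X[0])
    return math.sqrt(sum(local_homogeneity(X, d, eps) ** 2 for d in range(D)))

print(round(global_invariance(A), 2))  # -> 1.41
```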
Note the arc above the capital phi variable: it stands for the invariance measure when it is able to handle objects defined over dichotomous and continuous dimensions. Equation (28) is all that is needed to generalize RIT to continuous domains (and, hence, to convert RIT into GRIT). Thus, the final general measure is given by the equation below, where we let $\{F\}$ be a well-defined category and let the well-defined category R be a representation of $\{F\}$ (i.e., $R \subseteq \{F\}$ or $R \in P(\{F\})$). Then, if $R \neq \emptyset$, the amount of representational information $I(R)$ of R in respect to $\{F\}$ is determined by Equation (29) below, where $p$ and $\bar{p}$ stand for the number of elements in $\{F\}$ and in $\{\bar{F}\} = \{F\} - R$ respectively, and $\hat{\psi}(F) = p\,e^{-\widehat{\Phi}(F)}$ is the generalized perceived degree of structural complexity of a well-defined category (with $\varepsilon_d$ normally set to 0):

$$I(R) = \frac{\hat{\psi}(\bar{F}) - \hat{\psi}(F)}{\hat{\psi}(F)} = \frac{\bar{p}\,e^{-\widehat{\Phi}(\bar{F})} - p\,e^{-\widehat{\Phi}(F)}}{p\,e^{-\widehat{\Phi}(F)}} \quad (29)$$
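Finally, the whole GRIT pipeline can be assembled end to end. The sketch below is a hypothetical illustration: the generalized complexity form $\hat{\psi} = p\,e^{-\widehat{\Phi}}$ mirrors the binary case, and all function names are our own. Removing object O3 from the stimulus set A of Table 4 yields a positive representational-information value.

```python
import math

A = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]  # Table 4

def partial_distance(a, b, d, r=1):
    return sum(abs(x - y) ** r
               for i, (x, y) in enumerate(zip(a, b)) if i != d) ** (1.0 / r)

def local_homogeneity(X, d, eps=0.0):
    dist = [[partial_distance(a, b, d) for b in X] for a in X]
    m = max(max(row) for row in dist) or 1.0
    return sum(math.exp(-dist[j][k] / m)
               for j in range(len(X)) for k in range(len(X))
               if j != k and dist[j][k] / m <= eps) / len(X)

def generalized_complexity(X, eps=0.0):
    """psi-hat = p * exp(-Phi-hat), mirroring the binary complexity form."""
    D = len(X[0])
    phi = math.sqrt(sum(local_homogeneity(X, d, eps) ** 2 for d in range(D)))
    return len(X) * math.exp(-phi)

def generalized_rep_info(R, X):
    """Relative change in generalized complexity when R is removed (Eq. (29))."""
    rest = [o for o in X if o not in R]
    return (generalized_complexity(rest) - generalized_complexity(X)) \
        / generalized_complexity(X)

print(round(generalized_rep_info([(1, 1, 0, 0)], A), 2))  # -> 0.2
```

Here the positive sign indicates, as in the binary theory, that the removed object is relatively representative of the structure of the remaining set.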