# Categories with Complements

## Abstract


## 1. Featural Specifications as Categorial Dimensions

The head-complement relation is a 1st-Merge type: the process is presumed to take an item from the lexicon (the head) and asymmetrically associate it to a projection, its complement. Evidently, that presupposes understanding what is meant by “head” and “projection”, or the presumed “endocentricity” of phrases. Bear in mind that a Chomsky grammar is a 4-tuple [Σ, T, N, P], specifying rules over arbitrary vocabularies of “non-terminals” N and “terminals” T, thus allowing a set of productions P starting on the initial axiom Σ. Rules of the sort X → ψ, where X ∈ N and ψ is some arbitrary string of terminals in T* (Note 2), do not guarantee the endocentricity of phrases, since (by that formalism) one could rewrite a VP, say, as a noun followed by a prepositional phrase only (or any other string of terminals, but no verb). If such constructions do not exist in natural language, the formalism is inadequate [1].
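To make the point concrete, here is a minimal sketch (the toy grammar and lexicon are my own illustrations, not from the text): a Chomsky grammar whose productions happily rewrite a “VP” with no verbal head at all, showing that the bare rewriting formalism does not enforce endocentricity.

```python
# A Chomsky grammar as a 4-tuple (Sigma, T, N, P).  Nothing in the
# formalism forces endocentricity: the production below rewrites VP
# as a noun followed by a PP, with no verb anywhere.
import random

N = {"S", "VP", "NP", "PP"}          # non-terminals
T = {"war", "of", "pictures"}        # terminals
Sigma = "S"                          # initial axiom
P = {                                # productions X -> psi, psi in (N ∪ T)*
    "S":  [["VP"]],
    "VP": [["pictures", "PP"]],      # a "VP" with no verbal head!
    "PP": [["of", "NP"]],
    "NP": [["war"]],
}

def derive(symbol):
    """Rewrite leftmost-depth-first until only terminals remain."""
    if symbol in T:
        return [symbol]
    expansion = random.choice(P[symbol])
    return [tok for part in expansion for tok in derive(part)]

print(" ".join(derive(Sigma)))  # "pictures of war" -- a headless 'VP'
```

The formalism accepts this grammar without complaint; ruling such objects out is exactly the work endocentricity must do.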

- As far as the categorial component is concerned, it seems to me plausible to suggest that it is a kind of projection from basic lexical features through a certain system of schemata as roughly indicated in (1) and (2):

- (1)
- [±N, ±V]: [+N, −V] = N[oun]; [+N, +V] = A[djective]; [−N, +V] = V[erb]; [−N, −V] = everything else;
- (2)
- X^{n} → … X^{n−1} …, where X^{i} = [α = $\pm $N, β = $\pm $V]^{i} and X^{1} = X

- Let us assume that there are two basic lexical features N and V ($\pm $N, $\pm $V). Where the language has rules that refer to the categories nouns and adjectives…they will be framed in terms of the feature +N, and where there are rules that apply to the categories verbs and adjectives, they will be framed in terms of the feature +V. [2] (Lecture 3, p. 2)
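The feature system in (1) can be rendered as a toy lookup (a sketch; the encoding and function name are my own): the +N natural class collects nouns and adjectives, and the +V class collects verbs and adjectives, exactly as the quoted passage requires.

```python
# Sketch of (1): the four categories as points in the [±N, ±V] space.
FEATURES = {
    ("+N", "-V"): "Noun",
    ("+N", "+V"): "Adjective",
    ("-N", "+V"): "Verb",
    ("-N", "-V"): "everything else",
}

def natural_class(feature, value):
    """All categories sharing a feature value, e.g. +N -> {Noun, Adjective}."""
    idx = 0 if feature == "N" else 1
    return {cat for feats, cat in FEATURES.items() if feats[idx] == value + feature}

print(natural_class("N", "+"))  # the +N class: Noun and Adjective
print(natural_class("V", "+"))  # the +V class: Verb and Adjective
```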

## 2. Lindenmayer Systems in Matrix Representation

- (3)
- a. The merge of α and β results in {α, β}
- b. {α, β} → α, β

- (4)

- (5)

- (6)

Each term of a polynomial has the form Kz^{x}, where constant K is the term’s coefficient; setting the polynomial equal to zero turns it into an equation, and a root of polynomial P(z) is a number z_{j} such that P(z_{j}) = 0. P(z) is of degree n if it has n roots, its degrees of freedom. Thus, the characteristic polynomial of a matrix can be thought of as its numerical description. The polynomial roots (solutions to the polynomial when construed as equating zero) constitute fundamental elements in the matrix diagonal, its eigenvalues. From those, one can compute the matrix eigenvectors, or characteristic vectors of the linear transformation the matrix represents, each of which changes by a scalar factor when the linear transformation is applied; the factor in point is the eigenvalue. Aside from having the eigenvalues as roots, among the polynomial coefficients are the matrix trace (the sum of the elements in the diagonal) and determinant (here, the product of the elements in the diagonal minus the product of those in the off-diagonal, with further caveats for higher-dimensional matrices). It is interesting to note, for our purposes, that with Medeiros’s method in place, we can compute how an L-system grows as it expands. The degree n of the characteristic polynomial tells us the dimension n × n of its corresponding square matrix and, therefore, the number of rewrite or Merge-type applications that correspond to it in an L-system. In turn, one matrix eigenvalue, often called its spectral radius, can be defined as in (7a), a quantity related to the topological derivational entropy h_{T} of a matrix A, (7b):

- (7)
- Derivational entropy:
- a. The spectral radius ρ(A) is square matrix A’s largest absolute value λ_{max} among its eigenvalues;
- b. A’s topological derivational entropy is h_{T} = log_{2} λ_{max}.

- (8)
- $\left[\begin{array}{ccc}1& 1& 0\\ 1& 0& 1\\ 0& 0& 0\end{array}\right]^{1}= \left[\begin{array}{ccc}1& 1& 0\\ 1& 0& 1\\ 0& 0& 0\end{array}\right]$; $\left[\begin{array}{ccc}1& 1& 0\\ 1& 0& 1\\ 0& 0& 0\end{array}\right]^{2}= \left[\begin{array}{ccc}2& 1& 1\\ 1& 1& 0\\ 0& 0& 0\end{array}\right]$; $\left[\begin{array}{ccc}1& 1& 0\\ 1& 0& 1\\ 0& 0& 0\end{array}\right]^{3}= \left[\begin{array}{ccc}3& 2& 1\\ 2& 1& 1\\ 0& 0& 0\end{array}\right]$; …

For successive powers A^{n} and A^{n+1}, the ratio between the spectral radii ρ(A^{n+1})/ρ(A^{n}) is the initial spectral radius ρ(A^{1}) of the seed matrix (Note 5).

## 3. The Fundamental Assumption and its Fundamental Corollary

- (9)
- Fundamental Assumption: The V dimension is a (mathematical) transformation over an orthogonal N dimension.

- (10)
- Fundamental Corollary: The N dimension has unit value 1; the V dimension, unit value i; [±N, ±V] = [±1, ±i].
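Taking the Corollary at face value, the diagonal matrices diag(a, b) with a, b ∈ {±1, ±i} form a closed Abelian group of 16 elements, the “Chomsky matrices” alluded to below. A quick sketch (assuming NumPy; the check itself is mine, though the group is as described in the text):

```python
# The 16 "Chomsky matrices" diag(a, b), a, b in {±1, ±i}: a closed
# (Abelian) group under matrix multiplication.
import itertools
import numpy as np

units = [1, -1, 1j, -1j]                     # ±1 (N-type), ±i (V-type)
chomsky = [np.diag([a, b]) for a, b in itertools.product(units, repeat=2)]
print(len(chomsky))                          # 16

def member(M, group):
    return any(np.allclose(M, G) for G in group)

# closure: every product of two Chomsky matrices is a Chomsky matrix
assert all(member(X @ Y, chomsky) for X in chomsky for Y in chomsky)
# commutativity: diagonal matrices commute
assert all(np.allclose(X @ Y, Y @ X) for X in chomsky for Y in chomsky)
```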

Now: in what sense are lexical projections, like noun phrases, and grammatical projections, like sentences, “about the same”? And what are “higher order endocentric categories”? One can always introduce more dimensions, for instance calling A/P dimensions ±A and ±P, playing the substantive game already alluded to for color and cones, VOT, and so on. But a direct alternative is to argue that further formal conditions emerge by algebraically operating with the fundamental ones in (10)/(11) only; that is, by taking our formalism seriously.

- There will also be subsidiary features, that are necessary to distinguish auxiliaries from main verbs and adverbials from adjectives for example. So there will be a hierarchy of categories, super-lexical categories, super-super-lexical categories…In other words it ought to be in essence the case that the structure of NP’s, S’s, and AP’s should be about the same… and this will be true of higher order endocentric categories.

- (11)
- a. $\left[\begin{array}{cc}\pm 1& 0\\ 0& \pm i\end{array}\right]$ b. $\left[\begin{array}{cc}\pm i& 0\\ 0& \pm 1\end{array}\right]$

- (12)
- (13)

- (14)
- $\left[\begin{array}{cc}\pm 1,\text{}\pm i& 0\\ 0& \pm 1,\text{}\pm i\end{array}\right]$

- (15)
- a. $\left|\begin{array}{c}\pm 1\\ 0\end{array}\right|$, b. $\left|\begin{array}{c}0\\ \pm 1\end{array}\right|$

- (16)
- a. $\left|\begin{array}{c}\pm i\\ 0\end{array}\right|$, b. $\left|\begin{array}{c}0\\ \pm i\end{array}\right|$

## 4. The Jarret Graph

The matrix σ_{Z} was first systematically studied by Wolfgang Pauli while analyzing electron spin, around 1924 (see https://en.wikipedia.org/wiki/Pauli_matrices (accessed on 28 August 2022)). This matrix is remarkably elegant, as shown through relevant formal characteristics:

- (17)

The same characteristic polynomial obtains for σ_{Z} and -σ_{Z}. This is unlike what we see for the identity (17a) and its negative counterpart (17b), each with its own characteristic polynomial. Consequently, the eigenvalues (roots of the polynomial) are the same for σ_{Z} and -σ_{Z} (again, unlike for the identity and its negative). There is no other pair of matrices in the Chomsky group for which this is true; so suppose we build on this elegant formal fact.

Note also that the Chomsky matrices mixing a real and an imaginary diagonal entry all yield σ_{Z} after squaring them. This is arguably significant too, under certain assumptions about Merge.

External (1st) Merge, EM, is inherently asymmetrical for a head and the projection of its complement. That works directly for the operation after it has started. But it is computationally impossible to merge a head to a projection if we do not yet have any projection formed in a given derivation, that is, in the pristine moment in which just two heads are selected for assembly. A natural way to address this is the idea in Guimarães [26] of allowing the self-merger of heads at the base of a derivation. Then we can characterize EM as an operation resulting in an anti-symmetrical relation (i.e., asymmetrical in all instances but the relation of an element to itself). But once again note that the results in Table 3 obtain for all Chomsky matrices:

| $\left[\begin{array}{cc}1& 0\\ 0& -i\end{array}\right]$^{2} = $\left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]$ | $\left[\begin{array}{cc}-1& 0\\ 0& i\end{array}\right]$^{2} = $\left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]$ | $\left[\begin{array}{cc}1& 0\\ 0& i\end{array}\right]$^{2} = $\left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]$ | $\left[\begin{array}{cc}-1& 0\\ 0& -i\end{array}\right]$^{2} = $\left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]$ |

All of these distinct matrices square to σ_{Z}, obviously collapsing the formal results into identical representations. While algebraically this is entirely fine, for a symbolic system it is the equivalent of Paul Revere’s famous code having been the senseless “One if by land, one if by sea!” (Note 8). The whole point of a simple code is to have different signals representing different events, which is precisely what the self-multiplication in Table 3 denies, since matrix multiplication is not a structure-preserving operation. Is there a solution to this information impasse?
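The collapse is easy to verify directly (a sketch assuming NumPy; the four matrices are those of Table 3):

```python
# Distinct Chomsky matrices collapse to the same sigma_Z under squaring,
# so self-multiplication destroys the distinctions a code would need.
import numpy as np

sigma_Z = np.diag([1, -1])
distinct = [np.diag([1, -1j]), np.diag([-1, 1j]),
            np.diag([1, 1j]),  np.diag([-1, -1j])]

squares = [M @ M for M in distinct]
assert all(np.allclose(S, sigma_Z) for S in squares)   # all four collapse
print(len({tuple(map(tuple, S)) for S in squares}))    # 1 -- one signal left
```

Four different inputs, one output: the “information impasse” in a nutshell.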

- (18)
- Anchoring Axiom: Only nouns self-merge.

The self-merged noun (**boldfaced** in (19)) plays a dual role: as an operator on itself, with an argument of identical formal characteristics.

- (19)

- (20)

- (21)
- “Selection” conditions for operators in the Jarret Graph:
- a.
- Nouns may either self-merge or take PPs to NPs
- b.
- Verbs take NPs to VPs
- c.
- Prepositions take NPs to PPs
- d.
- Adjectives take PPs to APs
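The selection conditions in (21) amount to a small directed graph, which can be sketched as follows (assuming Python; the toy lexicon, function, and depth mechanism are my own illustrations, and the sketch compresses the self-merge recursion of (21a) into a bottoming-out clause):

```python
# The Jarret Graph's "selection" conditions (21) as head -> complement edges;
# a noun at the bottom self-merges, anchoring the recursion per (18).
SELECTS = {"V": "N", "P": "N", "A": "P", "N": "P"}       # head -> complement head
LEX = {"N": "war", "V": "hate", "P": "of", "A": "proud"} # toy lexicon

def project(head, depth=0):
    """Build a bracketed projection by walking the selection graph."""
    if head == "N" and depth <= 0:
        return f"[NP {LEX['N']}]"                        # self-merged noun
    comp = SELECTS[head]
    return f"[{head}P {LEX[head]} {project(comp, depth - 1)}]"

print(project("V"))     # [VP hate [NP war]]            cf. (26b)
print(project("A"))     # [AP proud [PP of [NP war]]]   cf. (26d) schematically
print(project("N", 2))  # [NP war [PP of [NP war]]]     cf. (26a) schematically
```

Raising the depth parameter spins out the unbounded N-P cycle of (27).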

## 5. Non-Algebraic Assumptions in the Jarret Graph

- (22)

- (23)

- (24)

- (25)
- Semiotic assumptions for algebraic syntactic matrices:
- (i)
- Univocal.
- (ii)
- Complete.
- (iii)
- Recursive.
- (iv)
- Fully connected.

- (26)
- Exemplars covered by the Jarret Graph:
- a. [_{NP}pictures [_{PP}of [_{NP}war]]]
- b. [_{VP}hate [_{NP}war]]
- c. [_{PP}of [_{NP}war]]
- d. [_{AP}proud [_{PP}of [_{NP}science]]]

- (27)
- a. [_{VP}hear [_{NP}stories [_{PP}about [_{NP}pictures [_{PP}of [_{NP}students [_{PP}of …]]]]]]]
- b. [_{AP}proud [_{PP}of [_{NP}stories [_{PP}about [_{NP}pictures [_{PP}of [_{NP}students [_{PP}of …]]]]]]]]

- (28)
- a. red (in/of color) b. tall (in/of stature) c. large (in/of size) d. pitiful (in/?of nature)

Consider now ±σ_{Z} and ±I. Matrices of that kind (with real values in the diagonal, and real or conjugate-symmetrical values elsewhere, here zero) are called Hermitian and present particularly elegant properties vis-à-vis non-Hermitian counterparts. In comparable situations in particle physics, these same operators, equipped with positive inner products (the central metric in a vector space), have real eigenvalues. This is key to representing physical quantities, since measurements correspond to real quantities. Interestingly, this separates what “exists” (is mathematically necessary for a system to work) from what is “observable” (definitely identified for measurement). If we suppose that the way for the abstract syntax in that mathematical existence to interface with concrete external systems, or definite internal thought, is through equal certainty, then only Hermitian points in the Jarret Graph yield “interpretations”:
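The Hermitian contrast can be checked in a few lines (a sketch assuming NumPy; `mixed` is an illustrative non-Hermitian Chomsky matrix):

```python
# ±sigma_Z and ±I equal their conjugate transposes (Hermitian) and so have
# real eigenvalues; a Chomsky matrix with an imaginary entry does not.
import numpy as np

def is_hermitian(M):
    return np.allclose(M, M.conj().T)

sigma_Z, I = np.diag([1, -1]), np.eye(2)
mixed = np.diag([1, 1j])                       # illustrative non-Hermitian case

assert is_hermitian(sigma_Z) and is_hermitian(-sigma_Z)
assert is_hermitian(I) and is_hermitian(-I)
assert not is_hermitian(mixed)

print(bool(np.isreal(np.linalg.eigvals(sigma_Z)).all()))  # True: real spectrum
print(bool(np.isreal(np.linalg.eigvals(mixed)).all()))    # False: i appears
```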

- (29)
- Interpretive Axiom

## 6. Beyond the Jarret Graph

Given that σ_{Z} is invoked, it is natural to ask whether the other Pauli matrices (σ_{X} = $\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]$ and σ_{Y} = $\left[\begin{array}{cc}0& -i\\ i& 0\end{array}\right]$) should also play a role in the system. Multiplication by either one of those yields an interesting group of 32 matrices, 16 positive and their negative counterparts, which Orús et al. [25] dubbed the Chomsky-Pauli group G_{CP}. Readers will easily be able to spot here the Abelian group arrayed into the Jarret Graph, as well as the entire Chomsky group (all of them using diagonal matrices only; Z and I, the identity, are two of Pauli’s matrices, and the mnemonic C is a shorthand for the Chomsky objects). In addition, the G_{CP} also contains non-diagonal matrices; inasmuch as these are symmetrical with the Chomsky objects, they are notated in (30) with an S, and they align with Pauli’s X and Y.

- (30)
- Chomsky-Pauli Group G_{CP}

- (31)
- a. $\left[\begin{array}{cc}\pm 1,\text{}\pm i& 0\\ 0& \pm 1,\text{}\pm i\end{array}\right]$ b. $\left[\begin{array}{cc}0& \pm 1,\text{}\pm i\\ \pm 1,\text{}\pm i& 0\end{array}\right]$
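The 32-element claim can be verified by construction (a sketch assuming NumPy; the generation route via σ_X is mine, though the diagonal/anti-diagonal split is exactly (31)):

```python
# Closing the 16 Chomsky matrices under multiplication by sigma_X yields
# the 32-element Chomsky-Pauli group: 16 diagonal + 16 anti-diagonal.
import itertools
import numpy as np

units = [1, -1, 1j, -1j]
diag = [np.diag([a, b]) for a, b in itertools.product(units, repeat=2)]
sigma_X = np.array([[0, 1], [1, 0]])
antidiag = [M @ sigma_X for M in diag]        # column flip: anti-diagonal
G_CP = diag + antidiag
print(len(G_CP))                              # 32

def member(M, group):
    return any(np.allclose(M, G) for G in group)

# the 32 matrices are closed under multiplication...
assert all(member(X @ Y, G_CP) for X in G_CP for Y in G_CP)
# ...and sigma_Y is already among them
assert member(np.array([[0, -1j], [1j, 0]]), G_CP)
```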

The natural move is to use the G_{CP} as operators to cover some “grammatical category” space (for T, Det, C, and so on). Bear in mind that “grammatical structure” never starts a derivation; it is presumed to be added on “lexical structure”, which the Jarret Graph grounds. Ideally, the way to connect the Jarret Graph to a Super Jarret Graph is by applying the extended-Chomsky categories (“grammatical” elements) to the Jarret Graph, in the process yielding a related object with “more structure”, targeting each of the projections in (16). We should not need more than that, the super-graph preserving the basic symmetries of the Jarret Graph, including the determinants/labels we have established for basic “projection”, which, in this sense, could be seen as an “extended projection” into a layer of functional structure. One should also not need further phrasal axioms beyond the already stipulated Anchoring and Interpretive axioms, so that whatever semantics ensues follows from, rather than determines, the algebraic fate of the graph. Figure 2 shows a cartoon version of the SJG from work in progress.

That sketch deploys only part of the G_{CP}, presupposing further nuances if the entire group has grammatical significance. The super-graph should be seen as an empirical claim based on the formalism deployed, presuming the Jarret Graph with all the implications above. The goal, as a result of the various dynamics, should be to obtain a “periodic table” of categorial elements of this elementary sort, ideally without having to invoke new dimensions or higher orders. Figure 3 below is based on the Jarret Graph as already discussed, as well as Table 3 and extensions from the presuppositions behind it: semiotic and phrasal axioms, together with “churning calculations” in the way any “categorial grammar” of this sort may function, applying “type changing” rules, here matrix multiplication. The question marks in the table are meant to suggest that we have not discussed here any specificities for those categories or associated projections. In general, with any “periodic table” of this sort, one either needs to predict why a given gap exists in the paradigm or otherwise argue for a given underlying category “fitting the bill”.

Superposing (summing) matrices within the G_{CP} and normalizing the results (to entries ±1, ±i) has a consequence for what one may think of as the “phrasalization” of the categories under discussion:

- (32)
- $\left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]+\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]=\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right]$

- (33)
- a. $\left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]^{2}= \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]^{3}= \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]^{4}= … = \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]^{n}= \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]$
- b. $\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]^{2}= \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]$; $\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]^{3}= \left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]$; $\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]^{4}= \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]$; $\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]^{5}= \left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]$; …
- c. $\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right]^{2}= \left[\begin{array}{cc}2& 2\\ 2& 2\end{array}\right]$; $\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right]^{3}= \left[\begin{array}{cc}4& 4\\ 4& 4\end{array}\right]$; $\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right]^{4}= \left[\begin{array}{cc}8& 8\\ 8& 8\end{array}\right]$; $\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right]^{5}= \left[\begin{array}{cc}16& 16\\ 16& 16\end{array}\right]$; …

- (34)
- Derivational entropy (repeated from (7)):
- a. The spectral radius ρ(A) is square matrix A’s largest absolute value λ_{max} among its eigenvalues;
- b. A’s topological derivational entropy is h_{T} = log_{2} λ_{max}.

For any matrix A in the G_{CP}, and for any matrix in its sequence of powers, ρ(A) is always 1. In contrast, some of the superposed matrices (arising from summing matrices within the G_{CP}) have ρ(A) larger than 1 (Note 13). Consequently, the topological derivational entropy h_{T} of the matrices in (33a) or (33b) is log_{2} λ_{max} = log_{2} 1 = 0; in contrast, for h_{T} of the matrices in (33c), log_{2} λ_{max} = log_{2} 2 = 1. This topological entropy matters for interpreting systems’ growth as they (try to) expand in L-system fashion, via rewrite or EM merge processes.

## 7. Categories & Interactions Redux

The matrices in the G_{CP} have the formal condition of their power sequences being “cyclic” in this sense, while some of their (normalized) superpositions can be shown to allow for the fractal growth that standard L-systems imply, starting with the simple (33c), which corresponds to a trivial Lindenmayer expansion as in (35):

- (35)

While the elements of the G_{CP} need to be necessarily categorial, in the Markovian cyclicity of their corresponding powers, the mere superposition of those elements is enough, under the right conditions, to allow for the phrasalization of corresponding categories, via the same algebra.

- (36)
- a. $\left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]+\left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]+\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]=\left[\begin{array}{cc}2& 1\\ 1& 0\end{array}\right]$ b. $\left[\begin{array}{cc}1& 1\\ 1& 0\end{array}\right]$
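The superposition in (36) and its normalized form can be checked numerically (a sketch assuming NumPy); note that (36b) is the Fibonacci matrix, whose spectral radius is the golden ratio, the same growth rate as the L-system in (8).

```python
# (36a): I + sigma_Z + sigma_X = [[2,1],[1,0]]; normalized (36b) is the
# Fibonacci matrix, with golden-ratio (fractal, phrase-like) growth.
import numpy as np

superposed = np.eye(2) + np.diag([1, -1]) + np.array([[0, 1], [1, 0]])
assert (superposed == np.array([[2, 1], [1, 0]])).all()   # matches (36a)

fib = np.array([[1, 1], [1, 0]])                          # the normalized (36b)
rho = max(abs(np.linalg.eigvals(fib)))
phi = (1 + 5 ** 0.5) / 2
assert np.isclose(rho, phi)                               # golden-ratio growth
print(round(np.log2(rho), 3))                             # 0.694: nonzero entropy
```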

- (37)

Rule (38a) turns any […]_{+} in (37) that happens to be sister to, and daughter of, a “-” into a constant k that does not rewrite any further; subsequently, (38b) prunes the “-” (whose “purpose” is only to contextualize the atomization) out of existence:

- (38)
- a. […]_{+} → k / [… __ − …]_{−}
- b. − → Ø

- (39)

There are 2^{32} ways of superposing (summing) the categories in the G_{CP}, and if the empirically observed X′-schema in (39) is any indication, we need a further dimension for the system’s constants. That presupposes, together with the phrasalization (via superposition), some mechanism like (38a) for the atomization of phrasal chunks into lexical idioms, perhaps akin to whatever goes on in idiomatization more generally, as discussed in Lasnik & Uriagereka [27].

This is ultimately what the matrices of the G_{CP} operate on: the system’s own formal scaffolding “rotating” in four dimensions (Note 16). Those rotations should express the categorial distinctions Chomsky [2] was after: nouns, verbs, and so on, all the way “up” to determiners, inflectional categories, or whatever else may be syntactically needed. The (ultimately mental) states this allows ought to be the support for the lexical and grammatical categories we have been using in a text like this (and see Note 16). To be sure, that alone will not distinguish, let’s say, write from read, or page from noun, or any of the thousands of instantiations one carries in one’s head (worse: that anyone could carry in one’s head, in principle, for any language past or future). But just as assigning categorial features [+N, -V] does not, in itself, distinguish all possible nouns there could be, so too the implied algebra is meant to combine with whatever system may be in place for the further nuances one may want to add to the relevant vector space (Note 17). Once we establish the vectorial units of said space as priors, whatever substantive dimensions are added to the foundations are as immaterial (to the structure itself) as the notes of a melody one can deploy within a given harmony. Importantly, while there are grammatical processes that depend on the implementation of the features this system presupposes (by changing values and attributes as relevant, to distinguish nouns from verbs, etc.), there is no syntactic process that depends on the exact implementation of the relevant vector spaces. Thus, there are syntactic operations that target nouns or verbs, but none that target the noun page as opposed to any other, or the verb write but not the verb read, etc. (Note 18). What we are attempting to establish now is just the algebraic foundation of the vector space where syntax lives (Note 19).

## 8. Conclusions

The analysis has led us to the G_{CP}, which includes all of the Pauli group and extends it (by mixing real and complex entries, or vertically flipping the elements in the Chomsky group into their symmetrical counterparts). Although this short paper cannot fulfill the promise of distributing the formal entities in that group into a full-fledged “periodic table” of grammatical categories, mapping them within the formal system amounts to a dissertation exercise, with familiar categorial-grammar methods (we know the input, suspect the output, have an excellent sense of the phonetics and an informed idea of the semantics, so it is all a matter of “aligning the Rubik cube”). A harder, and more interesting, question is how to go beyond head-complement relations, in particular into head-specifier relations and the long-range correlations they afford.

Powers k^{n} grow exponentially with the size of n, but not for the subcase k = 1. We are dealing with matrices, but they too have powers, and the effect on their spectral radius (per (7)) does depend on whether it is 1 or more, thereby affecting the system’s topological derivational entropy. In the former instance the power sequence “cycles back” to the origin, after hitting the identity matrix, while in the latter the power series yields fractal growth instead (Note 20). We have capitalized on that distinction to separate “cyclic/Markovian” categories from derivational topologies that may carry phrasal combinations. To go on with the Tour metaphor in Section 2, the “earworm” stages would have to be designed by M.C. Escher, to cycle back after a topological trick involving a Möbius Strip! In contrast, all other stages would shoot out to infinitude, creating a space for exploration of the relevant terrain that can take arbitrarily many twists and turns, within the restricted topology. Algebraically, the difference is small: mere symmetry within the underlying square matrices vs. breaking that symmetry into some result that could be chaotic.

The present approach, deploying the G_{CP} in ways sketched in Orús et al. [25], is entirely compatible with that model, and thus can preserve its virtues, including a way to directly map the relevant representations to actual phrase-markers (not just L-trees). More importantly, both systems present the ability to (formally) state entanglements upon relevant superpositions, or what syntacticians customarily call “chains of occurrences”. While this idea is only offered as a proof of concept in the connectionist instance, and without obvious connections to actual observables across languages, I intend to show in a sequel to this piece how the present formulation has the same mathematical result as a direct consequence, moreover tracking grammatical facts in a more direct fashion (e.g., allowing us to state differences between voice and information questions). It is interesting that one should converge on similar math from very different assumptions (in the present instance, asking how to generalize from the most modest formal objects to matrices combining them, step by step). While data-driven analyses involve the same math that can be deployed over the group we have begun to formally analyze, we “got there” from first principles that we still teach in our introductory linguistics classes, dating back to the dawn of linguistics. While this could certainly be mere coincidence, it may also be an indication that some progress may be at hand as these matters are further explored.

## Funding

## Acknowledgments

## Conflicts of Interest

## Notes

1 | Here I concentrate on head-complement relations only. One of the reviewers reasonably wonders “how this approach works with complex syntactic operations such as movement”; while this is very much part of the program, that matter cannot be seriously addressed within a chapter like this. As the reviewer also notes, “the functional structure above lexical items is related to extended projections…(tentatively presented in Section 6)”. S/he wonders whether there are “restrictions on the kind and quantity of functional items/structure or some parallelisms among the extended projection of lexical items”; the answer is: Yes, as such formal symmetries are crucial to this approach. That said, I need to ask readers to be patient with the exposition, which cannot be rushed—or it will not make algebraic sense. It is my hope that the separate pieces that articulate this ensemble will come out together in the form of chapters in a single monograph. |

2 | The “*” represents the Kleene star operator or closure, named in honor of Stephen Cole Kleene. For a given set like T, T* is defined as the smallest superset of T that contains the empty string and is closed under concatenation. See https://en.wikipedia.org/wiki/Kleene_star (accessed on 28 August 2022). |

3 | One of the reviewers takes me to assume “the verb/noun distinction as… radical”, noting how others see the distinction less dramatically. Careful readers will see that this entire exercise is meaningless if, in point of fact, the distinction is not dramatic to the extreme of (relevant) orthogonalities. This can, of course, mean that the project is wrong; if so, so be it. But formally it would make no sense to “weaken” the claim, so I will stick to the presumptions. Perhaps it should be clear that the notion of “noun” and “verb” that I am after has nothing to do with lexical instantiations (which can be identical); what matters in the present context is the (I think radical) fact that, for example, event semantics is articulated in a way different—as a “sentence spine”—than arguments thereof; the latter being foundational nouns. Of course, one can verbalize nouns or nominalize verbs, but what is being sought in the present context are the underlying dimensions. |

4 | If we consider the power series, the ensuing aggregation relates to the total number of nodes in the L-tree. |

5 | And note also that A ^{0} is the identity matrix, whose ρ(A^{0}) = 1, so ρ(A^{1})/ρ(A^{0}) is of course also ρ(A^{1}). |

6 | Lest this be confusing, “linear operator” has nothing to do with “linearization” in the sense used in syntax (to order terminals in the speech signal). I say this because one of the reviewers takes me to be alluding to “the issue of linearization (and, implicitly, labelling)”. My use of the notion is in the linear algebra sense (so as a linear map within a vector space), to denote a very tight operation that keeps basic relations unchanged. I know that, as the reviewer points out, for some authors “linearization” (in the syntactic sense) relates to “labeling”, which I have nothing to contribute to (indeed, as mentioned at the end of Section 2, the M-matrices code no “linear ordering” in that syntactic sense among terminals, which is left as a separate problem at right angles with phrasal topology). This is not to say that the present system does not care about “labels”: here such abstract (non-terminal) representations are matrix determinants (nothing to do with “determiners” in the syntactic sense); these are algebraically related to matrix traces as in Section 2 (no relation to syntactic “traces” either). The terminological nightmare is what it is, but the truth is the terms in point have a much older tradition in algebraic systems, which syntacticians are not always careful in distinguishing; alas, even the term linear transformation is used in math, and it turns out to be an interesting (and difficult) question whether this relates to what syntacticians call a “transformation”, with a structural description and a structural change—I believe it does, but this must be argued, which I will not go into here. Unfortunately, there is nothing much I can do about any of this, beyond noting the issue and proceeding with the relevant caution. |

7 | Interested readers can check these expressions in the Matlab platform or the popular Wolframalpha.com (accessed on 28 August 2022). |

8 | For those unfamiliar with US independence, Revere prepared a code for the colonists of Charlestown about troop movements, in terms of their route being signaled by the number of lanterns in a church steeple: “one if by land, two if by sea”. See https://en.wikipedia.org/wiki/Paul_Revere%27s_Midnight_Ride (accessed on 28 August 2022). |

9 | A reviewer asks the question, about generalization (21), of whether all N/V/A/P elements follow it. I discuss this further at the end of Section 5, but I must also raise a deeper concern. That generalization is what the Jarret Graph gives us, the graph itself following from algebraic assumptions coupled with reasonable grounding conditions (of the semiotic sort because that is what grounds the system). We may choose to ignore this and continue to search for other answers or may, instead, be moved by whatever portion of the “empirical cake” we get to describe this way, with absolutely no semantic assumptions (beyond the Ur Anchoring Axiom). This is an aesthetic decision that I have little to contribute to. In a similar vein, the reviewer asks whether “a more detailed mention of several categories…would be helpful”. We certainly all agree that there are more “parts of speech”, and even, as Chomsky [2] suggested already in 1974, that we could add further dimensions to the system to capture those—one can always add more dimensions. But the issue is how much mileage one can get out of the smallest amount of formal machinery. At the end of the chapter, I suggest some natural extensions that do not entail expanding the dimensions of the system or its underlying algebra. But it should be clear that no formal system ever describes a “totality of the natural phenomena” it is attempting to describe. Even physics, the obvious model to follow, cannot describe the universe with the same math, which happens to also be (an extension of) linear algebra. It would seem unreasonable (and not very illuminating) to demand more of linguistics than we do of any of the other natural sciences. |

10 | I thank MG Hirsch for the question and discussion of its significance. |

11 | The linguist may note ditransitive verbs are bi-clausal in some languages, differences in verbal (periphrastic or synthetic) instantiations, or many idiosyncrasies. Worrying about differences now, however, is the equivalent of preventing Galileo from tabling the behavior of balloons while studying the falling of objects: the foundations of a scientific theory are built on patterns, not exceptions—to better understand the exceptions and beyond. |

12 | Primitive types differ depending on the semantic theory assumed; a larger issue (particularly if the Interpretive Axiom is assumed) is whether only certain portions of the syntax are, in fact, interpretable. |

13 | Some of the superpositions also yield the zero matrix; e.g., any matrix plus its negative. |

14 | A normalization can mean very different things in different formal contexts. Here we could take it as a way to achieve an interpretation with matrix entries that are not larger in magnitude than 1, which can have a probabilistic interpretation. Note, for concreteness, that: (i) $\left[\begin{array}{cc}2& 1\\ 1& 0\end{array}\right]-\left[\begin{array}{cc}1& 0\\ 0& 0\end{array}\right]=\left[\begin{array}{cc}1& 1\\ 1& 0\end{array}\right]$. So the second element in the difference in (i) can be seen as a “normalizer”. This of course is stipulative, so one must explore under what algebraic circumstances one can naturally go from one of the matrices in the G_{CP} group to an M-matrix associated to a phrasal topology, like the one on the right-hand side of equation (i). I will not study this here, but I can anticipate that this is behind the idea of “chain” formation in the present system, involving the supra-phrasal linkage of specifiers, through Internal Merge. |

15 | None of that is to say that anything presumed here is computationally trivial or even obvious. This starts with (37b), which is easy to state in “projection” terms, but much less so in Merge (or any BPS) terms. I am setting aside discussion of these important nuances simply to make the larger points. |

16 | The Pauli group is algebraically equivalent to the 4D quaternions postulated by William Hamilton in 1843, which are behind the virtual reality graphics in vogue these days. Going from the Pauli group to the G _{CP} can be seen as a rotation on the Pauli group, of the sort Hamilton discovered vis-à-vis the complex plane—so it is possible that the syntactic system (if it does live in the G_{CP}) is really an 8D representation. |

17 | This is meant very literally. As Naomi Feldman observes, the implied vectorial algebra can combine through a tensor product with any vectorial system (e.g., of a visual sort or any other), without this changing the basic underlying “syntactic” structure presumed here. I will not develop the point now, but it is mathematically rather direct, given that tensor products are structure-preserving.
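The structure preservation at issue is the mixed-product property of tensor (Kronecker) products, (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD): composition in the combined space mirrors composition within each factor. A quick numerical check (illustrative only, not the author's formalism):

```python
import numpy as np

# Four arbitrary 2x2 operators standing in for "syntactic" and
# "visual" (or other) vectorial systems:
rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

# Mixed-product property: combining then composing equals
# composing then combining.
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)
```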

18 | This insight was first presented in Bresnan [28], p. 200, when noting how no language has a rule extraposing phrases involving the concept redness. This is generally true for any such process and any such feature. I thank Howard Lasnik for the reference and discussion relating to this topic.

19 | This touches upon an issue Peter Kosta raises regarding language acquisition. The short answer is that the present system is too abstract to bear on that. But a long answer is more interesting: the system does predict that nouns are fundamental in anchoring syntax, which can be tested empirically. Alison Brooks mentioned to me a fascinating counter-example: San children apparently start linguistically acquiring verbs before nouns, which she associates with the (itself surprising) fact that nouns, unlike verbs, exhibit (difficult) click phonemes. Brooks also insightfully notes, however, that the toddlers use pointing in place of nouns! If this is ultimately correct, it is grist for a mill Jackendoff [29] emphasized: that the linguistic system crucially connects to vision. My (modest) contribution to this would be that linear algebra can help with that connection, since one needs it for the visual system, and the present project argues that it is needed in language too.

20 | The issue relates to a comparison between Fourier and Taylor series, both function decompositions represented as linear combinations of countable sets of functions, and thereby specified by a coefficient sequence. The intuition is that “categories” correspond to periodic functions, while interactions “live” in the topological space of fractal L-systems that relate to the infinite sum of powers in a Taylor series. While abstractly similar, these differ in that the computation of a Taylor series is “local”, unlike the computation of a Fourier series.
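The locality contrast can be made concrete: a Taylor coefficient is fixed by derivatives at a single point, while a Fourier coefficient averages the function over its entire period. A sketch under those standard definitions (the exponential and square-wave examples are mine, not the author's):

```python
import numpy as np
from math import factorial

# Taylor: coefficients of exp(x) around x = 0 come from local data,
# the derivatives at that one point (all equal to 1 for exp).
taylor = [1 / factorial(n) for n in range(6)]

# Fourier: c_n = (1/T) * integral over the WHOLE period of f(t) e^{-i n t},
# approximated here by a grid average for a square wave on [0, 2*pi).
N = 4096
t = np.linspace(0, 2 * np.pi, N, endpoint=False)
f = np.sign(np.sin(t))
c1 = (f * np.exp(-1j * t)).mean()

# For the square wave the first harmonic satisfies |c_1| = 2/pi:
assert abs(abs(c1) - 2 / np.pi) < 1e-2
```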

21 | The balance across synaptic conditions in a normal brain presumes homeostatic mechanisms and, in particular, the goal of maintaining excitation/inhibition balance and total activity at the network level, affecting synaptic weights. Although this is speculation, the thought is that these global forms of balance presume orthogonal conditions that one would hope to model in terms of complex vectors as invoked here.

## References

- Lyons, J. Introduction to Theoretical Linguistics; Cambridge University Press: Cambridge, UK, 1968.
- Chomsky, N. The Amherst Lectures; University of Paris VII: Paris, France, 1974.
- Jackendoff, R. X’-Syntax: A Study of Phrase Structure; MIT Press: Cambridge, MA, USA, 1977.
- Speas, M. Phrase Structure in Natural Language; Kluwer: Dordrecht, The Netherlands, 1990.
- Muysken, P. Parametrizing the notion Head. J. Linguist. Res. **1982**, 2, 57–76.
- Kayne, R. The Antisymmetry of Syntax; MIT Press: Cambridge, MA, USA, 1994.
- Chomsky, N. Bare Phrase Structure. In Evolution and Revolution in Linguistic Theory: Essays in Honor of Carlos Otero; Campos, H., Kempchinsky, P., Eds.; Georgetown University Press: Washington, DC, USA, 1994.
- Jakobson, R.; Fant, G.; Halle, M. Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates; MIT Press: Cambridge, MA, USA, 1952.
- Jakobson, R.; Halle, M. Fundamentals of Language; Mouton: The Hague, The Netherlands, 1971.
- Varro, M.T. De Lingua Latina; Kent, R., Translator; Heinemann: London, UK, 1938.
- Palmer, S. Vision Science; MIT Press: Cambridge, MA, USA, 1999.
- Poeppel, D. The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. **2003**, 41, 245–255.
- Eiseley, L. Darwin’s Century; Doubleday Anchor Books: Garden City, NY, USA, 1961.
- Chomsky, N. The Logical Structure of Linguistic Theory. Harvard/MIT mimeograph, submitted in part as Ph.D. Thesis, University of Pennsylvania, 1955; published in part: Plenum: New York, NY, USA, 1975.
- Stowell, T. Origins of Phrase Structure. Ph.D. Thesis, MIT, Cambridge, MA, USA, 1981.
- Chomsky, N. Lectures on Government and Binding; Foris: Dordrecht, The Netherlands, 1981.
- Lasnik, H.; Kupin, J. A restrictive theory of transformational grammar. Theor. Linguist. **1977**, 4, 173–196.
- Lindenmayer, A. Mathematical models for cellular interactions in development II. Simple and branching filaments with two-sided inputs. J. Theor. Biol. **1968**, 18, 300–315.
- Boeckx, C.; Carnie, A.; Medeiros, D. Some Consequences of Natural Law in Syntactic Structure. Master’s Thesis, University of Arizona, Tucson, AZ, USA/Harvard University, Cambridge, MA, USA, 2005.
- Medeiros, D. Economy of Command. Ph.D. Thesis, University of Arizona, Tucson, AZ, USA, 2012.
- King, C. Some Properties of the Fibonacci Numbers. Master’s Thesis, San José State College, San Jose, CA, USA, 1960.
- Uriagereka, J. Biolinguistic Investigations and the Formal Language Hierarchy; Routledge: London, UK, 2018.
- Ott, E. Chaos in Dynamical Systems; Cambridge University Press: New York, NY, USA, 1993.
- Uriagereka, J. Minimalist Goals. In The Cambridge Handbook of Minimalism; Grohmann, K., Leivada, E., Eds.; Cambridge University Press: Cambridge, UK, forthcoming.
- Orús, R.; Martin, R.; Uriagereka, J. Mathematical Foundations of Matrix Syntax. arXiv **2017**, arXiv:1710.00372.
- Guimarães, M. In Defense of Vacuous Projections in Bare Phrase Structure. In University of Maryland Working Papers in Linguistics; University of Maryland: College Park, MD, USA, 2000; Volume 9.
- Lasnik, H.; Uriagereka, J. Structure: Concepts, Consequences, Interactions; MIT Press: Cambridge, MA, USA, 2022.
- Bresnan, J. Theory of Complementation in English Syntax. Ph.D. Thesis, MIT, Cambridge, MA, USA, 1972.
- Jackendoff, R. Foundations of Language; Oxford University Press: Oxford, UK, 2002.
- Uriagereka, J. Spell-Out and the Minimalist Program; Oxford University Press: Oxford, UK, 2012.
- Bukalo, O.; Campanac, E.; Hoffman, D.A.; Fields, R.D. Synaptic plasticity by antidromic firing during hippocampal network oscillations. Proc. Natl. Acad. Sci. USA **2013**, 110, 5175–5180.
- Chistiakova, M.; Bannon, N.; Chen, J.-Y.; Bazhenov, M.; Volgushev, M. Homeostatic role of heterosynaptic plasticity: Models and experiments. Front. Comput. Neurosci. **2015**, 9, 89.
- Fisher, M.P. Quantum cognition: The possibility of processing with nuclear spins in the brain. Ann. Phys. **2015**, 362, 593–602.
- Straub, J.; Nowotarski, M.; Lu, J.; Sheth, T.; Fisher, M.; Helgeson, M.; Jerschow, A.; Han, S. Phosphates form spectroscopically dark state assemblies in common aqueous solutions. Proc. Natl. Acad. Sci. USA **2021**, 20, 1–8.
- Smolensky, P.; Legendre, G. The Harmonic Mind; MIT Press: Cambridge, MA, USA, 2005.

| Operation | Identities |
| --- | --- |
| Inverse, Adjoint | $\left[\begin{array}{cc}1& 0\\ 0& -i\end{array}\right]$←→$\left[\begin{array}{cc}1& 0\\ 0& i\end{array}\right]$; $\left[\begin{array}{cc}-1& 0\\ 0& i\end{array}\right]$←→$\left[\begin{array}{cc}-1& 0\\ 0& -i\end{array}\right]$; $\left[\begin{array}{cc}-i& 0\\ 0& 1\end{array}\right]$←→$\left[\begin{array}{cc}i& 0\\ 0& 1\end{array}\right]$; $\left[\begin{array}{cc}i& 0\\ 0& -1\end{array}\right]$←→$\left[\begin{array}{cc}-i& 0\\ 0& -1\end{array}\right]$ |
| Multiplication within (13) | $\left[\begin{array}{cc}\pm 1& 0\\ 0& +i\end{array}\right]\left[\begin{array}{cc}\pm 1& 0\\ 0& +i\end{array}\right] = \left[\begin{array}{cc}\pm 1& 0\\ 0& +1\end{array}\right]$ |
| Multiplication within (14) | $\left[\begin{array}{cc}\pm i& 0\\ 0& +1\end{array}\right]\left[\begin{array}{cc}\pm i& 0\\ 0& +1\end{array}\right] = \left[\begin{array}{cc}\pm 1& 0\\ 0& +1\end{array}\right]$ |
| Multiplication from (13) to (14) | $\left[\begin{array}{cc}\pm 1& 0\\ 0& +i\end{array}\right]\left[\begin{array}{cc}\pm i& 0\\ 0& +1\end{array}\right] = \left[\begin{array}{cc}\pm i& 0\\ 0& +i\end{array}\right]$ |
| Multiplication from (14) to (13) | $\left[\begin{array}{cc}\pm i& 0\\ 0& +1\end{array}\right]\left[\begin{array}{cc}\pm 1& 0\\ 0& +i\end{array}\right] = \left[\begin{array}{cc}\pm i& 0\\ 0& +i\end{array}\right]$ |
| Squares | $\left[\begin{array}{cc}\pm 1& 0\\ 0& +i\end{array}\right]^{2} = \left[\begin{array}{cc}1& 0\\ 0& -1\end{array}\right]$; $\left[\begin{array}{cc}\pm 1& 0\\ 0& +1\end{array}\right]^{2} = \left[\begin{array}{cc}1& 0\\ 0& 1\end{array}\right]$; $\left[\begin{array}{cc}\pm i& 0\\ 0& +1\end{array}\right]^{2} = \left[\begin{array}{cc}-1& 0\\ 0& 1\end{array}\right]$; $\left[\begin{array}{cc}\pm i& 0\\ 0& +i\end{array}\right]^{2} = \left[\begin{array}{cc}-1& 0\\ 0& -1\end{array}\right]$ |
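The squaring identities above are plain diagonal-matrix arithmetic, diag(a, b)² = diag(a², b²), and can be machine-checked for either choice of sign in ±. A small Python sketch (the helper `square` is mine, purely illustrative):

```python
import numpy as np

def square(a, b):
    """Square the diagonal matrix diag(a, b)."""
    m = np.diag([a, b]).astype(complex)
    return m @ m

# Each identity holds for both signs of the +/- entry:
for s in (1, -1):
    assert np.allclose(square(s * 1, 1j), np.diag([1, -1]))    # [±1,0; 0,i]^2
    assert np.allclose(square(s * 1, 1), np.diag([1, 1]))      # [±1,0; 0,1]^2
    assert np.allclose(square(s * 1j, 1), np.diag([-1, 1]))    # [±i,0; 0,1]^2
    assert np.allclose(square(s * 1j, 1j), np.diag([-1, -1]))  # [±i,0; 0,i]^2
```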


© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Uriagereka, J. Categories with Complements. *Philosophies* **2022**, *7*, 102. https://doi.org/10.3390/philosophies7050102