Solomon Marcus Contributions to Theoretical Computer Science and Applications

Solomon Marcus (1925–2016) was one of the founders of Romanian theoretical computer science. His pioneering contributions to automata and formal language theories, mathematical linguistics and natural computing have been widely recognised internationally. In this paper we briefly present his publications in theoretical computer science and related areas, which consist of almost ninety papers. Finally, we present a selection of ten of Marcus' books in these areas.


Introduction
In 2005, on the occasion of the 80th birthday of Professor Solomon Marcus, the editors of the present volume, both his disciples, together with his friend Professor G. Rozenberg, from Leiden, The Netherlands, edited a special issue of Fundamenta Informaticae (vol. 64), with the title Contagious Creativity. This syntagma accurately describes the activity and the character of Marcus, a Renaissance-like personality, with remarkable contributions to several research areas (mathematical analysis, mathematical linguistics, theoretical computer science, semiotics, applications of all these in various areas, history and philosophy of science, education), with many disciples in Romania and abroad and with wide recognition all around the world. In what follows we only briefly describe his contributions to theoretical computer science and related areas, especially to automata and formal language theories, natural computing (DNA and membrane computing), applications of grammars in various domains, recursive function theory and provability in mathematics, as well as a selection of his many books in these areas. Some papers have been reprinted in S. Marcus, Words and Languages Everywhere, Polimetrica, Milano, 2007, but almost all are collected in the two-volume book G. Pȃun (ed.), Solomon Marcus, Selected Papers-Computer Science, Spandugino Publ. House, Bucharest, 2018 (abbreviated SPCS). Our choices have been guided by SPCS.
Marcus' pioneering book Gramatici şi automate finite (Grammars and Finite Automata), published in 1964 in Romanian, is one of the first monographs in the world on this subject. This book, written in a rigorous mathematical language at a time when the domain was in its infancy, covers automata and language theories, closely linking finite automata and Chomsky regular grammars. The book ends with a chapter on the relations between natural languages and regular grammars, a theme which motivated Marcus' interest and his many publications in mathematical linguistics. Unfortunately, the book, written in Romanian, was not translated into any other language; hence, it remained almost unknown internationally. This is not the case with many of his subsequent books, specifically those in mathematical linguistics, some of which will be listed in this paper. These books have been translated into several languages (French, English, German, Russian, Italian, Czech, Spanish, Greek and others) and then published by Academic Press, Dunod, Nauka and other well-known international publishers. Without exception, they had a very high international audience and impact.
His first paper in formal language theory was published in 1963 and it is illustrative of his permanent interest in building bridges between apparently disjoint research areas; in this case, finite automata, regular grammars and arithmetical progressions. Symmetrically, his last paper, published 50 years later, returns to bio-informatics, a domain which he somehow anticipated (too early) at the beginning of the 1970s.

A Working Classification
It is difficult to classify the theoretical computer science papers of Marcus because of their inter/multi-disciplinarity. In SPCS, the papers have been classified into four large categories: Formal language theory, applications of formal language theory, bio-informatics, and recursive function theory. We will use this classification here too.
In the first class there are papers dealing with finite state grammars and automata, contextual grammars, the history of formal language theory, combinatorics on words and on infinite sequences (periodicity and quasi-periodicity, unavoidable patterns, density of words of a given length), mathematical analysis notions adapted to formal language theory, and so on.
The last topic in this list deserves a closer study, which we only suggest here: To systematically extend notions/ideas from mathematical analysis to formal language theory in general and to combinatorics on words in particular (a symmetric study is worth carrying out for applications of formal languages to other mathematical areas, e.g., number theory, by classifying various classes of numbers in Chomsky's hierarchy, characterising them with grammars, etc.). This was a direction of research programmatically explored by Marcus. The title of his 1999 paper is explicit and significant in this respect: From real analysis to discrete mathematics and back, followed by details: Symmetry, convexity, almost periodicity, and strange attractors. At the beginning of this paper he wrote: "Despite its importance, the relation between continuous and discrete mathematics is a rather neglected topic. (...) Working in real analysis in the fifties and in the sixties and then in discrete mathematics (the mathematical theory of languages), I became interested to look for the discrete analog of some facts belonging to continuous mathematics."
Among the most fruitful ideas of this kind we mention several variants of the Darboux property for languages, the basic one being the following: If we have three families of languages, $\mathcal{L}_1 \subset \mathcal{L}_2 \subset \mathcal{L}_3$, conceivably belonging to a larger hierarchy of families of languages, possibly infinite, and two languages $L_1 \in \mathcal{L}_1$, $L_3 \in \mathcal{L}_3 \setminus \mathcal{L}_2$, can we find a language $L_2 \in \mathcal{L}_2$ such that $L_1 \subset L_2 \subset L_3$? Various definitions of symmetry, attractors, periodicity, convexity, etc., have been extended to strings. In all cases, Marcus used to define a series of subtle variants, of the type left-, right-, almost-, pseudo-, weak-, strong-, etc. Marcus had an unbounded creativity in posing open problems, and these papers never lacked them; quite a few papers solved such problems, some of them with Marcus as a coauthor.
Actually, formulating open problems and suggesting research directions is one of the specific features of "Marcus' style". Many of the questions formulated by Marcus were addressed by his disciples and collaborators, and by researchers in mathematics and computer science from Romania and other countries. Some problems were, partially or totally, solved; many of them are still waiting for solutions.

A Constant Interest for Bio-Informatics
We mentioned before that in the 1970s Marcus published "too early" a paper dealing with applications of mathematical linguistics and formal language theory in biology, specifically in the genomics area. The year was 1974 and the title of the paper is Linguistic structures and generative devices in molecular genetics.
Bio-informatics can be understood in two senses: as an attempt to use computer science in biology, providing notions, tools and techniques to the biologist, and, mainly in the last decades, in the opposite direction, as an attempt to utilise ideas inspired from biology in developing algorithms in computer science, and in hardware too, as is the case in DNA computing, where DNA molecules do computations. In his paper, Marcus considered both directions. In the first direction of research he synthesised previous approaches and results; in the second one he proposed new research vistas for using mathematical (linguistic) tools in addressing questions in the genetic area, to model the DNA and its biochemistry. Speculations about using DNA molecules as a support for computations were published only later (by M. Conrad, R. Feynman, C. H. Bennett), while the first computing model based on an operation specific to DNA recombination was introduced only in 1987 by T. Head (another friend of Marcus). However, it is worth emphasising the attention paid by Marcus, in this first paper and also in many others, to a 1965 proposal formulated by the Polish mathematician Z. Pawlak (famous for introducing rough sets in the early 1990s), to generate proteins starting from amino acids; the method used a specific representation of amino acids and certain picture grammars. (This is the reason Marcus considered Z. Pawlak a precursor of picture grammars, a type of generative mechanism developed later.) Over the years, Marcus was constantly interested in the (mathematical) linguistic approach to cellular biology, and in applications in genomics and life sciences. For instance, after the emergence of DNA computing in 1994, and especially after the initiation of membrane computing in 1998, he contributed to these areas with a series of papers and participated in several international meetings dedicated to these subjects, in Romania and abroad.
As expected, the inter-disciplinary approach, typical of Marcus, is always present in his contributions; here are two illustrative titles of papers in membrane computing, Membranes versus DNA and Bridging P systems and genomics, presented at the first meetings devoted to membrane computing (Curtea de Argeş, Romania, 2001 and 2002). Actually, in 2002, he proposed a slogan which became folklore in this research area: Life = DNA software + membrane hardware. In this area too he proposed several research directions, some of them truly "non-standard" ("too" inter-disciplinary) at first sight. We only cite two examples of ideas not yet explored: To consider membranes with a topology different from the usual one (vesicle-like membranes), where the separation between inside and outside is crisp (for example, to study membranes similar to a Klein bottle), and, respectively, to use multisets, the sets with a multiplicity associated with their elements (the usual data structure in membrane computing), described by Pawlak rough sets.

Marcus Contextual Grammars
In a paper simply called "Contextual grammars" (published in 1969 in Revue Roumaine de Mathématiques Pures et Appliquées) Marcus introduced the grammars which are now called Marcus contextual grammars and which have grown into a branch of formal language theory. In fact, the paper had been presented one year earlier at an international linguistics conference held in Stockholm, Sweden.
The paper has ten pages, but currently there probably exist more than 400 papers on contextual grammars, about two dozen PhD and Master's theses, as well as two monographs, one published by the Publishing House of the Romanian Academy, Bucharest, 1982 (in Romanian), and one by Kluwer Publishing, The Netherlands, in 1997 (Marcus Contextual Grammars), both of them authored by Gh. Pȃun. In the second volume of the massive Handbook of Formal Languages, Springer-Verlag, 1997 (three volumes), edited by G. Rozenberg and A. Salomaa, there are two chapters dedicated to this topic, one by Marcus, "Contextual grammars and natural languages", which discusses motivations and developments in this area, and another more technical one, "Contextual grammars and formal languages", by A. Ehrenfeucht, Gh. Pȃun, and G. Rozenberg.
The idea has its origins in algebraic linguistics: For a natural language $L$ (over an alphabet $V$), with every word $w$ over $V$ one associates a set of contexts $\langle u, v \rangle$ over $V$ which accept $w$ with respect to $L$ (that is, $uwv \in L$). Can we use this process of selecting words by contexts in order to describe a language? The question can also be stated conversely. The answer was initially given in the form of simple contextual grammars, triples of the form $G = (V, A, C)$, where $V$ is an alphabet, $A$ is a finite language over $V$ (its elements are called axioms), and $C$ is a finite set of contexts over $V$. Such a grammar generates a language $L(G)$ which contains (1) all axioms in $A$ and (2) all strings obtained from axioms by adjoining contexts to them. More formally, $L(G)$ contains all strings of the form $u_n \dots u_1 x v_1 \dots v_n$, where $x \in A$ and $\langle u_i, v_i \rangle \in C$ for all $1 \le i \le n$, with $n \ge 0$; for $n = 0$ the string is an axiom from $A$.
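The definition of a simple contextual grammar can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions, not code from Marcus' paper: the function name `simple_contextual_language`, the length bound `max_len` (needed because the generated language is in general infinite) and the example grammar are all hypothetical.

```python
def simple_contextual_language(axioms, contexts, max_len):
    """Enumerate the strings of L(G) of length at most max_len.

    axioms   -- the finite set A of axioms (strings)
    contexts -- the finite set C of contexts, given as pairs (u, v)
    max_len  -- hypothetical cut-off, so that the enumeration terminates
    """
    # Adjoining a context never shortens a string, so over-long strings
    # can be discarded without losing any shorter member of the language.
    language = {x for x in axioms if len(x) <= max_len}
    frontier = set(language)
    while frontier:
        new = set()
        for w in frontier:
            for u, v in contexts:
                s = u + w + v  # adjoin the context <u, v> at the two ends
                if len(s) <= max_len and s not in language:
                    new.add(s)
        language |= new
        frontier = new
    return language


# Illustrative grammar (an assumption): A = {"c"}, C = {<a, b>},
# which generates the strings a^n c b^n for n >= 0.
sample = sorted(simple_contextual_language({"c"}, {("a", "b")}, 5), key=len)
# sample == ['c', 'acb', 'aacbb']
```

Every string produced at any step already belongs to the language, since no auxiliary symbols are involved in the derivation.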
This simple model does not have a powerful generative capacity. Moreover, it does not take into consideration the string-contexts selectivity mentioned above. However, at the end of the paper, Marcus also proposes the contextual grammars with choice, $G = (V, A, C, j)$, where $j : V^* \to 2^C$ is the selection mapping (of contexts by the strings). This time, a string is in $L(G)$ if it is of the form $u_n \dots u_1 x v_1 \dots v_n$ as above with $x \in A$, $\langle u_1, v_1 \rangle \in j(x)$, and $\langle u_i, v_i \rangle \in j(u_{i-1} \dots u_1 x v_1 \dots v_{i-1})$ for all $i = 2, \dots, n$.
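The effect of the selection mapping can be seen in a short sketch. This is a hedged illustration, not the authors' formulation: the names `contextual_language_with_choice`, `alternate` and `max_len` are hypothetical, the selection mapping is modelled as a Python function returning a finite set of contexts, and the set $C$ is left implicit as the union of the values of that function.

```python
def contextual_language_with_choice(axioms, select, max_len):
    """Enumerate the strings of length <= max_len generated with choice.

    axioms  -- the finite set A of axioms (strings)
    select  -- the selection mapping: string -> finite set of contexts (u, v)
    max_len -- hypothetical cut-off, so that the enumeration terminates
    """
    language = {x for x in axioms if len(x) <= max_len}
    frontier = set(language)
    while frontier:
        new = set()
        for w in frontier:
            # Only the contexts selected by the whole current string apply.
            for u, v in select(w):
                s = u + w + v
                if len(s) <= max_len and s not in language:
                    new.add(s)
        language |= new
        frontier = new
    return language


def alternate(w):
    """Illustrative selection mapping (an assumption): wrap strings that
    start with 'a' in b...b, and all other strings in a...a."""
    return {("b", "b")} if w.startswith("a") else {("a", "a")}


sample = sorted(contextual_language_with_choice({"a"}, alternate, 7), key=len)
# sample == ['a', 'bab', 'ababa', 'bababab']
```

With the same two contexts but no selection, strings such as `aaa` would also be generated; the mapping excludes them by making the admissible context depend on the current string.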
A great research program started from there, following the usual questions of formal language theory: Variants (extensions and restrictions), characterisations, generative power, comparisons of the obtained families with one another and with the known families of languages, especially with those in the Chomsky hierarchy, closure and decidability properties, parsing complexity, equivalent automata, etc.
An important detail, which makes Marcus contextual grammars so attractive, is the fact that, unlike the Chomsky grammars, they do not use nonterminal symbols (categorial auxiliary symbols): They are intrinsic grammars, as each derived string belongs to the generated language.
Still, there was an embarrassing restriction in the initial model, the possibility to adjoin contexts only at the ends of the current string. A real breakthrough came at the end of the 1970s, when the Vietnamese Nguyen Xuan My came to Romania to start a PhD with Marcus. In a joint paper Nguyen-Pȃun, the inner contextual grammars have been introduced: The contexts can be added in any place inside the current string, under the control of the selection mapping. (Formally, an inner contextual grammar is a usual contextual grammar with choice, $G = (V, A, C, j)$, with $j : V^* \to 2^C$, with the language $L(G)$ defined as the smallest language $L \subseteq V^*$ such that (i) $A \subseteq L$ and (ii) if $x_1 x_2 x_3 \in L$ and $\langle u, v \rangle \in j(x_2)$, then $x_1 u x_2 v x_3 \in L$.) In this way, the generative capacity has significantly increased, and the flexibility (hence the adequacy) of the model has been accordingly augmented.
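The closure conditions (i) and (ii) for inner contextual grammars translate directly into a bounded fixed-point computation. The sketch below is an illustration under assumptions, not taken from the Nguyen-Pȃun paper: the names `inner_contextual_language`, `wrap_empty` and `max_len` are hypothetical, and the example grammar (one axiom, the empty string, and one context selected only by the empty substring) was chosen because its language, the balanced parenthesis strings, is easy to check by hand.

```python
def inner_contextual_language(axioms, select, max_len):
    """Enumerate the strings of length <= max_len of an inner contextual grammar.

    axioms  -- the finite set A of axioms (strings)
    select  -- the selection mapping: substring x2 -> finite set of contexts (u, v)
    max_len -- hypothetical cut-off, so that the enumeration terminates
    """
    language = {x for x in axioms if len(x) <= max_len}  # condition (i)
    frontier = set(language)
    while frontier:
        new = set()
        for w in frontier:
            n = len(w)
            # Try every factorisation w = x1 x2 x3 (x2 may be empty).
            for i in range(n + 1):
                for k in range(i, n + 1):
                    x1, x2, x3 = w[:i], w[i:k], w[k:]
                    for u, v in select(x2):
                        s = x1 + u + x2 + v + x3  # condition (ii)
                        if len(s) <= max_len and s not in language:
                            new.add(s)
        language |= new
        frontier = new
    return language


def wrap_empty(x2):
    """Illustrative selection mapping (an assumption): the context <(, )>
    may be inserted around the empty substring only, i.e. '()' anywhere."""
    return {("(", ")")} if x2 == "" else set()


dyck = inner_contextual_language({""}, wrap_empty, 6)
# len(dyck) == 9: the balanced strings of length 0, 2, 4 and 6 (1 + 1 + 2 + 5)
```

Adjoining the same context only at the ends of the string, as in the initial model, would yield just the nested strings `()`, `(())`, `((()))`, and so on; strings such as `()()` become derivable only when insertion inside the string is allowed.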
Another important advance in this area was made at the beginning of the 1990s, when G. Rozenberg, A. Salomaa and A. Ehrenfeucht became interested in contextual grammars. Details can be found in Kluwer's monograph mentioned before and in the two chapters in the Handbook of Formal Languages.
Progress was rather rapid. Certain classes of contextual grammars have been proved to be relevant for modelling typical constructions in natural languages (duplication, multiple agreements, crossed agreements), and classes of contextual grammars which are mildly context sensitive in the sense requested by linguists (A. K. Joshi and others) have been introduced. They are parsable in polynomial time and contain strings whose lengths do not make large jumps; sometimes one asks only that the language be semilinear.
In this way, the impressive bibliography we mentioned above has been accumulated, and this bibliography is still growing.

Applications of Formal Language Theory
In this class we have included the papers devoted to applications of grammars and automata. This was a really central and continuous interest of Marcus, also passed on to his students and collaborators. The domains of applicability are very diverse: Natural and programming languages, the semiotics of folklore fairy tales, the modelling of economic processes, diplomatic negotiations, medical diagnosis, the semiotics of theatre, action theory, learning theory, chemistry, genetics.
These applications should be placed in a more general context under the slogan linguistics as a pilot-science, a catchphrase coined by C. Lévi-Strauss: Adopted, extended and transformed by Marcus, it became a real research program for his Romanian school of mathematical linguistics and formal language theory.
The grounding assumption, also explored by M. Nowakowska in her book Languages of Action, Languages of Motivations, Mouton, The Hague, 1973, was that many processes/activities can be described as sequences of elementary actions ("semantic marks"), sequences which are governed by precise restrictions which can be described by syntactic rules. Thus, languages describing actions and grammars describing languages of actions came onto the stage. Combined with the Chomskian hypothesis that the linguistic competence is innate and influences all other competences of the human brain, Lévi-Strauss's slogan became Marcus' formal linguistics as a pilot-science. Indeed, a large variety of processes, from fairy tales to economic processes, proved to be describable, at convenient levels of abstraction, by grammars of the types initially developed in linguistics.

Recursive Function Theory and Provability
The last category of papers we mention deals with recursive functions and provability in mathematics; it contains fewer papers, but some of these papers have a special significance, as they clarify an important paternity in the history of computability. Specifically, these papers proved that the first example of a recursive function which is not primitive recursive was constructed by G. Sudan in 1927, simultaneously with and independently of W. Ackermann, who had previously been credited with this achievement (1928). The problem was examined by Marcus in collaboration with C. Calude and I. Ţevy, following a suggestion coming from G. C. Moisil.
It is important to mention that Marcus was constantly concerned with adequately valuing the history of Romanian mathematics: Pointing out the priorities in this area was already one of the main goals of his well-known book Din gândirea matematicȃ româneascȃ (From the Romanian Mathematical Thinking), Scientific and Encyclopaedic Publishing House, Bucharest, 1975. This group also includes a few papers on provability in mathematics, at different levels of formalisation and with various tools, including proof-assistants.
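For readers unfamiliar with Sudan's example, a small sketch may help. The text above does not reproduce the definition, so the recurrence used here is an assumption: it is the form in which the Sudan function is commonly stated in the literature, namely $F_0(x, y) = x + y$, $F_{n+1}(x, 0) = x$, and $F_{n+1}(x, y+1) = F_n(F_{n+1}(x, y), F_{n+1}(x, y) + y + 1)$. The Python name `sudan` and the use of memoisation are illustrative choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def sudan(n, x, y):
    """Sudan function F_n(x, y), in its commonly cited form (an assumption).

    The function is total and computable, but it is not primitive recursive.
    """
    if n == 0:
        return x + y
    if y == 0:
        return x
    prev = sudan(n, x, y - 1)           # F_n(x, y - 1)
    return sudan(n - 1, prev, prev + y)  # F_{n-1}(prev, prev + y)


values = [sudan(1, 1, 1), sudan(2, 1, 1), sudan(2, 1, 2)]
# values == [3, 8, 10228]
```

Even for very small arguments the values grow quickly, so only tiny inputs should be tried with this direct recursive definition.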