Samuel Tenka

$\newcommand{\NN}{\mathbb{N}}$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\CC}{\mathbb{C}}$

First Definitions in Formal Languages

Whereas Sipser defines regular and decidable languages algorithmically, this short note defines the same concepts algebraically.

Languages as Problems

We can study computation by studying algorithms that classify strings into yes and no. Since strings form a monoid, they model time and hence a streaming input. Many familiar problems reduce to binary classification by means of binary encoding. For instance, we might transform the question

How does $n$ factor into primes, where $n$ is the number described in the input string?

to the question of

What is the $i$th bit of the $j$th prime factor of $n$, where $i, j, n$ are the three numbers described in the input string?

So let's fix a finite alphabet $\Sigma$ and let $\Sigma^\star$ denote the free monoid it generates. We then represent a string-classification problem as a subset of $\Sigma^\star$ or, equivalently, as a map $\mathcal{L}:\Sigma^\star \to 2$.

Monoids as Computers

The regular languages are those $\mathcal{L}$ that factor as $\Sigma^\star \xrightarrow{\epsilon} \mathcal{M} \xrightarrow{\rho} 2$, where $\tau$ is a monoid homomorphism and $\mathcal{M}$ is a finite monoid.

Regular languages are typically defined using finite automata; by Caley-embedding $\mathcal{M}$, we see that the algebraic and typical definitions are equivalent. Our algebraic definition makes clear that the regular languages are closed under the usual plethora of operations (such as reversal of strings, homomorphic pre-image, and union). It also allows a comparison with Turing Decidable languages:

The decidable languages are those $\mathcal{L}$ that factor as $\Sigma^\star \xrightarrow{\alpha} (\Sigma \cup \{0\})^\star \xrightarrow{\epsilon} \mathcal{P} \xleftarrow{\supseteq} 2$, where $\tau$ is a monoid homomorphism and $\mathcal{P}$ is a finitely presented monoid. Here, $\alpha$ bookends its input string with fresh symbols $0$ not in the alphabet $\Sigma$ and we consider $2$ as some subset of $\mathcal{P}$. Below, we sketch a proof that this is equivalent to the usual definition.

Question: What happens when we replace the condition that $\mathcal{P}$ be finitely presented by other conditions? For example, what class of languages do we get if we restrict to finitely presented commutative monoids?

Vague Remark: More generally, we are interested in language classes that capture some notion of simplicity. We could call a language class virtuous if it satisfied some algebraic conditions. We would want the regular and decidable classes to be virtuous. It would be cool to state and prove that each virtuous class is actually the class of languages computable by some appropriate class of monoids.

Proof Sketch

We will show that if a language is decidable by a Turing Machine, then it is decidable in our algebraic sense; the converse and all other remaining claims are routine to check.

The idea is to have each element of $\mathcal{P}$ represent the state of a Turing tape. The tape symbols (marked with whether or not the tape head is on them) generate $\mathcal{P}$, so we only need to find appropriate relations. The transition dynamics of the TM determines several relations of the form $x\dot{y}z \sim x\hat{y}\dot{z}$, where $x, y, z$ are tape symbols, $\dot{y}$ denotes that the head is on $y$'s cell, and $y'$ is the new value written onto that cell before the head moves to $z$'s cell. Two tapes are equivalent (under the equivalence relation generated by these string relations) if and only if they lead to some shared future tape under the Turing dynamics. We thus have a finitely presented monoid.

Actually, we must deal with initialization and termination, too. We use the symbol $0$ concatenated by $\alpha$ to initialize the head location and introduce bookending blank tape cells. We require the TM to clear its tape before halting, hence ensuring that every accepting trajectory has a unique terminal tape (and likewise for rejecting trajectories).

References

Sipser --- Introduction to the Theory of Computation --- 1997