Grammar and Language Hierarchy

Grammar

A grammar consists of four parts:

Vocabulary: A set of terminal symbols, denoted $\mathcal{V}$.
Non-Terminal: A set of symbols disjoint from the vocabulary, denoted $\mathcal{N}$.
Rules: A finite set of rewriting rules, denoted $\mathcal{R}$.
Start Symbol: A single non-terminal symbol, usually just $S$.

A grammar generates a language.

Let's go through the parts one by one.

Vocabulary

A vocabulary is a set of symbols. These are the symbols that the final sentences are made of.

Non-Terminal

The non-terminals are another set of symbols. We usually use capital letters for non-terminals. They are used to construct rules and never appear in a final sentence.

Rules

A rule has two parts: the left part is called the source, and the right part is called the target.

The source must be from the language $(\mathcal{V} + \mathcal{N})^*\mathcal{N}(\mathcal{V} + \mathcal{N})^*$, so that it contains at least one non-terminal, whereas the target must be from the language $(\mathcal{V} + \mathcal{N})^*$.

We usually write a rule as,

\text{source} \rightarrow \text{target}

There must be a finite number of rules,

|\mathcal{R}| < \infty

Start Symbol

The start symbol is a non-terminal symbol, usually just $S$. Every derivation begins from it.

Deduced Language

A grammar is sufficient to deduce a language, which we call the deduced language $\mathcal{L}$, with $\mathcal{L} \subseteq \mathcal{V}^*$.

To define it, we first define the intermediate language $\mathcal{I}$. It is a subset of the language $(\mathcal{V} + \mathcal{N})^*$.

Unconditionally, $S \in \mathcal{I}$: the sentence consisting of only the start symbol is always a valid sentence in the intermediate language.

For any rule in $\mathcal{R}$, if a sentence $u$ is in the intermediate language $\mathcal{I}$ and has a substring that matches the $\text{source}$ of the rule, then replacing one occurrence of that substring with the $\text{target}$ gives a new sentence $v$. We define $v \in \mathcal{I}$.

We write this as,

v = \text{sub}(u, \text{source},\text{target})

This step from $u$ to $v$ is called a derivation, written as,

u \Rightarrow v

If the derivation is repeated $n$ times, we write,

u \Rightarrow^n v

For zero or more times, use $^*$; for one or more times, use $^+$.

u \Rightarrow^* v

u \Rightarrow^+ v

That is to say,

\mathcal{I} = \{u | S \Rightarrow^* u\}

After we get $\mathcal{I}$ ,

\mathcal{L} = \mathcal{I} \cap \mathcal{V}^*
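The single derivation step $u \Rightarrow v$ can be sketched in a few lines of Python. This is only an illustration under some assumptions that are not part of the definition above: every symbol is a single character, a sentence is a Python string, a rule is a `(source, target)` pair, and the name `derive_once` as well as the small grammar used to try it out are hypothetical.

```python
def derive_once(u, rules):
    """Return every sentence v such that u => v in exactly one step."""
    results = set()
    for source, target in rules:
        # Try every position where the source occurs as a substring of u.
        start = u.find(source)
        while start != -1:
            results.add(u[:start] + target + u[start + len(source):])
            start = u.find(source, start + 1)
    return results


# Hypothetical grammar for illustration: V = {a, b}, N = {S},
# with the two rules S -> aSb and S -> ab.
rules = [("S", "aSb"), ("S", "ab")]

step1 = derive_once("S", rules)    # sentences reachable from S in one step
step2 = derive_once("aSb", rules)  # sentences reachable from aSb in one step
done = derive_once("ab", rules)    # no non-terminal left, so nothing applies
```

Here `step1` is `{"aSb", "ab"}`, `step2` is `{"aaSbb", "aabb"}`, and `done` is empty. Repeating this step from $S$ and collecting everything reached gives $\mathcal{I}$; keeping only the strings over $\mathcal{V}$ gives $\mathcal{L}$.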

Example

The concept is a bit abstract. We now demonstrate how to deduce a language from a grammar.

The First Example

We choose,

\mathcal{V} = \{0, 1\} \\ \mathcal{N} = \{S, A, B\} \\ \mathcal{R} = \{ \\ S \rightarrow A B \\ \quad AB \rightarrow 0 A \\ \quad A \rightarrow 1 \\ \quad AB \rightarrow 1 B \\ \quad 1B \rightarrow 0 \\ \}

To find the language this grammar generates, we first build the complete intermediate language step by step,

Initially,

\mathcal{I_0} = \{S\}

The only applicable rule is $S \rightarrow AB$ , thus,

\mathcal{I_1} = \{S, AB\}

For $AB$, there are three applicable rules,

AB \rightarrow 0 A \\ A \rightarrow 1 \\ AB \rightarrow 1 B

The first yields $0A$, and the other two both yield $1B$. Thus,

\mathcal{I_2} = \{S, AB, 0A, 1B\}

For $0A$ , it has only one applicable rule,

A \rightarrow 1

And for $1B$ , there is only one applicable rule,

1B \rightarrow 0

Thus,

\mathcal{I_3} = \{S, AB, 0A, 1B, 0, 01\}

No rule applies to $0$ or $01$, so there are no new sentences and $\mathcal{I} = \mathcal{I_3}$. The language generated by this grammar is,

\mathcal{L} = \mathcal{I} \cap \mathcal{V}^* = \{0, 01\}
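The steps $\mathcal{I_0}, \mathcal{I_1}, \dots$ above can be replayed mechanically. The following is a minimal sketch, assuming single-character symbols and rules stored as `(source, target)` string pairs; the function names `derive_once` and `intermediate_language` are made up for this illustration. It terminates only because $\mathcal{I}$ is finite for this particular grammar.

```python
def derive_once(u, rules):
    """Return every sentence reachable from u in one derivation step."""
    results = set()
    for source, target in rules:
        start = u.find(source)
        while start != -1:
            results.add(u[:start] + target + u[start + len(source):])
            start = u.find(source, start + 1)
    return results


def intermediate_language(start_symbol, rules):
    """Collect all sentences derivable from the start symbol.

    Loops forever if the intermediate language is infinite.
    """
    seen = {start_symbol}
    frontier = [start_symbol]
    while frontier:
        u = frontier.pop()
        for v in derive_once(u, rules):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


vocabulary = set("01")
rules = [("S", "AB"), ("AB", "0A"), ("A", "1"), ("AB", "1B"), ("1B", "0")]

I = intermediate_language("S", rules)
# Keep only the sentences made entirely of vocabulary symbols.
L = {u for u in I if set(u) <= vocabulary}
```

Running it gives `I == {"S", "AB", "0A", "1B", "0", "01"}` and `L == {"0", "01"}`, matching the result derived by hand.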

The Second Example

info

For brevity, when several rules share the same source,

\text{source} \rightarrow \text{target}_1 \\ \text{source} \rightarrow \text{target}_2 \\

We can abbreviate them to,

\text{source} \rightarrow \text{target}_1 | \text{target}_2

The same shorthand works for more than two targets.

We choose,

\mathcal{V} = \{0\} \\ \mathcal{N} = \{S\} \\ \mathcal{R} = \{ S \rightarrow 0 S | 0\}

Because $S \in \mathcal{I}$,

\mathcal{I_0} = \{S\}

Both rules apply to $S$,

S \rightarrow 0 S \\ S \rightarrow 0 \\

Thus,

\mathcal{I_1} = \{S, 0S, 0\}

Both rules also apply to any sentence of the form $0^nS$, because it contains $S$,

S \rightarrow 0 S \\ S \rightarrow 0 \\

So if $0^nS \in \mathcal{I}$, then $0^{n+1}S \in \mathcal{I}$ and $0^{n+1} \in \mathcal{I}$.

Since $0S \in \mathcal{I}$, by induction on $n$,

\mathcal{I} = \{S\} \cup \{0^n, 0^nS | n \geq 1\}

Keeping only the sentences in $\mathcal{V}^*$,

\mathcal{L} = \mathcal{I} \cap \mathcal{V}^* = \{0^n | n \geq 1\}
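Since this intermediate language is infinite, it cannot be enumerated in full, but its sentences up to a given length can be. The sketch below assumes single-character symbols and `(source, target)` string pairs, and the name `sentences_up_to` is invented for this illustration. Discarding sentences that are too long is only safe when no rule makes a sentence shorter, which holds for this grammar but not in general.

```python
def sentences_up_to(max_len, start_symbol, rules, vocabulary):
    """List the sentences of the deduced language with length <= max_len.

    Assumes no rule shortens a sentence, so overlong intermediate
    sentences can never lead back to a short one and may be dropped.
    """
    seen = {start_symbol}
    frontier = [start_symbol]
    while frontier:
        u = frontier.pop()
        for source, target in rules:
            pos = u.find(source)
            while pos != -1:
                v = u[:pos] + target + u[pos + len(source):]
                if len(v) <= max_len and v not in seen:
                    seen.add(v)
                    frontier.append(v)
                pos = u.find(source, pos + 1)
    # Keep only sentences over the vocabulary, shortest first.
    return sorted((u for u in seen if set(u) <= vocabulary), key=len)


rules = [("S", "0S"), ("S", "0")]
short_sentences = sentences_up_to(4, "S", rules, {"0"})
```

With a bound of 4, `short_sentences` is `["0", "00", "000", "0000"]`, which agrees with every sentence of the language being a non-empty run of zeros.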

Chomsky Hierarchy

Chomsky classified grammars by the form of their rules. Going from type-0 up to type-3, the rules are more and more restricted, so the grammars become simpler and simpler, and so do the languages.

In our book, we only talk about infinite languages, that is, $|\mathcal{L}| = +\infty$.

Type-0 Recursively Enumerable Grammars and Languages

If a language can be generated by a recursively enumerable grammar, then it is of type-0, or a recursively enumerable language.

Type-0 does not enforce any constraint on the rules. All grammars are type-0. That is to say, every language that can be generated by a grammar is type-0.

Type-1 Context Sensitive Grammars and Languages

If a language can be generated by a context sensitive grammar, then it is of type-1 or a context sensitive language.

tip

Please note that a language is type-$n$ as soon as some type-$n$ grammar generates it. A language generated by a type-$m$ grammar with $m < n$ may therefore still be a type-$n$ language: we may simply have chosen a needlessly complicated grammar that can be rewritten in a more restricted form. Thus a type-$m$ grammar does not always generate a language whose highest type is $m$.

A context sensitive grammar requires all rules to be of the form,

x A y \rightarrow x z y

Where $A$ is any non-terminal symbol, $x$, $y$ and $z$ are sentences in $(\mathcal{V} + \mathcal{N})^*$, and $|z| > 0$, but $x$ and $y$ can be empty.

info

For convenience, we sometimes allow a special rule,

S \rightarrow \lambda

This is allowed only if no rule has a $\text{target}$ that contains $S$. The resulting language is called a nullable context sensitive language. The only difference from a context sensitive language is that it allows $\lambda \in \mathcal{L}$.

To see why, note that we forbid $S$ from appearing on the right side of any rule. So if we remove $S \rightarrow \lambda$, we get a non-nullable context sensitive grammar in which no sentence contains $S$ except for the single-symbol sentence $S$.

The rule $S \rightarrow \lambda$ can therefore be applied only to that sentence, and adding it yields exactly one new sentence, $\lambda$.

tip

A nullable language allows the null sentence and the rule $S \rightarrow \lambda$, but does not require them.

The same is true for nullable regular languages.

Type-2 Context Free Grammars and Languages

If a language can be generated by a context free grammar, then it is of type-2 or a context free language.

A context free grammar requires,

A \rightarrow x

Where $A$ is a non-terminal symbol, and $x$ is any sentence in $(\mathcal{V} + \mathcal{N})^*$.

info

Because $x$ may be empty, a context free grammar allows null rules of the form $A \rightarrow \lambda$.

Type-3 Regular Grammars and Languages

If a language can be generated by a regular grammar, then it is of type-3 or a regular language.

A regular grammar requires,

A \rightarrow aB

A \rightarrow a

Where $A$ and $B$ are non-terminal symbols, and $a$ is a symbol from the vocabulary $\mathcal{V}$.

info

A nullable regular grammar additionally allows,

S \rightarrow \lambda

provided that no rule has a $\text{target}$ that contains $S$.

Its language is called a nullable regular language.

A nullable regular language differs from a regular language only in that it allows the null sentence.

tip

All finite languages are regular.
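The four rule forms above can be checked mechanically. The sketch below is a rough illustration rather than a standard algorithm: it assumes single-character symbols, rules given as `(source, target)` string pairs with `""` standing for $\lambda$, and the function names are invented here. It reports the highest type whose rule form every rule satisfies, treating $S \rightarrow \lambda$ as permitted for type-1 and type-3 only when $S$ appears in no target.

```python
def is_regular_rule(source, target, nonterminals):
    """A -> a or A -> aB, with a a terminal and A, B non-terminals."""
    if len(source) != 1 or source not in nonterminals:
        return False
    if len(target) == 1:
        return target not in nonterminals
    if len(target) == 2:
        return target[0] not in nonterminals and target[1] in nonterminals
    return False


def is_context_free_rule(source, target, nonterminals):
    """A -> x, with A a single non-terminal and x any sentence."""
    return len(source) == 1 and source in nonterminals


def is_context_sensitive_rule(source, target, nonterminals):
    """x A y -> x z y with |z| > 0, trying each non-terminal as A."""
    for i, symbol in enumerate(source):
        if symbol not in nonterminals:
            continue
        x, y = source[:i], source[i + 1:]
        if (len(target) > len(x) + len(y)
                and target.startswith(x) and target.endswith(y)):
            return True
    return False


def grammar_type(rules, nonterminals, start="S"):
    """Return the highest n such that every rule has the type-n form."""
    start_in_target = any(start in target for _, target in rules)

    def all_match(check, allow_null_start):
        for source, target in rules:
            null_start = (source, target) == (start, "")
            if allow_null_start and null_start and not start_in_target:
                continue  # the special rule S -> lambda
            if not check(source, target, nonterminals):
                return False
        return True

    if all_match(is_regular_rule, True):
        return 3
    if all_match(is_context_free_rule, False):
        return 2
    if all_match(is_context_sensitive_rule, True):
        return 1
    return 0


first_example = [("S", "AB"), ("AB", "0A"), ("A", "1"),
                 ("AB", "1B"), ("1B", "0")]
second_example = [("S", "0S"), ("S", "0")]

type_first = grammar_type(first_example, {"S", "A", "B"})
type_second = grammar_type(second_example, {"S"})
```

Here `type_second` is 3, while `type_first` is 0 because rules such as $1B \rightarrow 0$ fit none of the restricted forms. The language of the first example is nevertheless the finite set $\{0, 01\}$, and therefore regular, which shows that the type of a grammar does not pin down the highest type of its language.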
