Skip to main content

Grammar and Language Hierarchy

Grammar

A grammar contains four parts,

  • Vocabulary: A set of symbols, noted as V\mathcal{V}.
  • Non-Terminal: A set of symbols, noted as N\mathcal{N}.
  • Rules: A set of rules, noted as R\mathcal{R}.
  • Start Symbol: A symbol. Usually just SS.

A grammar can generate a language.

Let's demonstrate one by one.

Vocabulary

A vocabulary is a set of symbols. They are the vocabulary of the final sentence.

Non-Terminal

The non-terminal is another set of symbols. We usually use capitalized letters for non-terminals. They are used to construct rules.

Rules

A rule has two part, the left part is called the source, and the right part is called the target.

The source should be from the language (V+N)N(V+N)(\mathcal{V} + \mathcal{N})^*N(\mathcal{V} + \mathcal{N})^*, to have at least one non-terminal, whereas the target should be from language (V+N)(\mathcal{V} + \mathcal{N})^*.

We usually note a rule as,

sourcetarget\text{source} \rightarrow \text{target}

There must be a finite number of rules,

R<|\mathcal{R}| < \infty

Start Symbol

Start symbol is just a non-terminal symbol. Usually just SS.

Deduced Language

A grammar is sufficient to deduce a language, which we call the deduced language, the L\mathcal{L}, LV\mathcal{L} \sub \mathcal{V}^*.

We define a language called the intermediate language, the I\mathcal{I}. It is a subset of the language (V+N)(\mathcal{V} + \mathcal{N})^*.

Without condition, SIS \in \mathcal{I}. Sentence with only a start symbol is always a valid sentence in the intermediate language.

For any rule in the R\mathcal{R}, if a sentence uu is in the intermediate language set I\mathcal{I}, and it has a substring that matches the source\text{source} of the rule, then by replacing the source\text{source} with target\text{target}, we will have a new sentence vv. We define vIv \in \mathcal{I}.

We note this as,

v=sub(u,source,target)v = \text{sub}(u, \text{source},\text{target})

This process from uu to vv is called derivation, noted as,

uvu \Rightarrow v

If the derivation repeats nn times, we note,

unvu \Rightarrow^n v

For any times, use *, for more than one times, use +^+.

uvu \Rightarrow^* v u+vu \Rightarrow^+ v

That is to say,

I={uSu}\mathcal{I} = \{u | S \Rightarrow^* u\}

After we get I\mathcal{I},

L=IV\mathcal{L} = \mathcal{I} \cap \mathcal{V}^*

Example

The conept is a bit abstract. We now demonstrate how we can deduce a language.

The First Example

We choose,

V={0,1}N={S,A,B}R={SABAB0AA1AB1B1B0}\mathcal{V} = \{0, 1\} \\ \mathcal{N} = \{S, A, B\} \\ \mathcal{R} = \{ \\ S \rightarrow A B \\ \quad AB \rightarrow 0 A \\ \quad A \rightarrow 1 \\ \quad AB \rightarrow 1 B \\ \quad 1B \rightarrow 0 \\ \}

To look for the language this grammar generates, we first get the complete set of intermediate language,

Initially,

I0={S}\mathcal{I_0} = \{S\}

The only applicable rule is SABS \rightarrow AB, thus,

I1={S,AB}\mathcal{I_1} = \{S, AB\}

For ABAB, it has two applicable rules,

AB0ABB1BAB \rightarrow 0 A \\ BB \rightarrow 1 B

Thus,

I2={S,AB,0A,1B}\mathcal{I_2} = \{S, AB, 0A, 1B\}

For 0A0A, it has only one applicable rule,

A1A \rightarrow 1

And for 1B1B, there is only one applicable rule,

1B01B \rightarrow 0

Thus,

I3={S,AB,0A,1B,0,01}\mathcal{I_3} = \{S, AB, 0A, 1B, 0, 01\}

There are no more applicable rules, thus, the language generated by this grammar is,

L=IV={0,01}\mathcal{L} = \mathcal{I} \cap \mathcal{V}^* = \{0, 01\}

The Second Example

info

For simplicity, if there exists,

sourcetarget1sourcetarget2\text{source} \rightarrow \text{target}_1 \\ \text{source} \rightarrow \text{target}_2 \\

We can shorthand it to,

sourcetarget1target2\text{source} \rightarrow \text{target}_1 | \text{target}_2

It's also valid for more than two targets.

We choose,

V={0}N={S}R={S0S0}\mathcal{V} = \{0\} \\ \mathcal{N} = \{S\} \\ \mathcal{R} = \{ S \rightarrow 0 S | 0\}

Because SIS \in \mathcal{I}, then,

I0={S}\mathcal{I_0} = \{S\}

Because the rules applicable are,

S0SS0S \rightarrow 0 S \\ S \rightarrow 0 \\

Thus,

I1={S,0S,0}\mathcal{I_1} = \{S, 0S, 0\}

Because 0nS0^nS is always applicable to the rule,

S0SS0S \rightarrow 0 S \\ S \rightarrow 0 \\

So if 0nSI0^nS \in \mathcal{I}, then 0n+1SI0^{n+1}S \in \mathcal{I} and 0n+1I0^{n+1} \in \mathcal{I}.

And there exists 0S0S, so,

I2={S,0n,0nSn1}\mathcal{I_2} = \{S, 0^n, 0^nS | n \geq 1\} L=0n{S,λ}\mathcal{L} = 0^n \{S, \lambda\}

Chomsky Hierarchy

Chomsky classify the grammars based on the form of rules. From lower types to higher types, there will be more and more restrictions on the rules, and the grammar will be simpler and simpler, so is the language.

In our book, we only talk about infinite languages. That is, L=+|\mathcal{L}| = +\infty .

Type-0 Recursive Enumerable Grammars and Languages

If a language can be generated by a recursive enumerable grammar, then it is of type-0 or a recursive enumerable language.

Type-0 does not enforce any constrain on the rules. All grammars are type-0. That is to say, all languages are type-0.

Type-1 Context Sensitive Grammars and Languages

If a language can be generated by a context sensitive grammar, then it is of type-1 or a context sensitive language.

tip

Please note that if a language can be generated by type-nn grammar, then it is also a type-nn language. This mean that, for a language that generated from type-mm where m>nm > n, there is possibility that we chose a stupid grammar that can be simplified to a higher type. Thus type-mm grammar doesn't always generates type-mm language.

It enforces all rules to be of the form,

xAyxzyx A y \rightarrow x z y

Where z>0|z| > 0, but xx and yy can be empty, AA is any non-terminal symbol.

info

For convenience, we sometimes allow a special rule,

SλS \rightarrow \lambda

If there exists no rules has target\text{target} that contains SS. This is called a nullable context sensitive language. The only different the language has from the context sensitive language is that allows λL\lambda \in \mathcal{L}.

This is obvious because we forbid SS to exists on the right side, and thus if we remove SλS \rightarrow \lambda, we get a non-nullable context sensitive language that has no SS in any sentence except for the single symbol sentence SS.

After adding the rule SλS \rightarrow \lambda. We can only yield one new sentence λ\lambda.

tip

Nullable languages allows null sentence and the rule SλS \rightarrow \lambda, but doesn't enforce it.

It is also true for the nullable regular language.

Type-2 Context Free Grammars and Languages

If a language can be generated by a context free grammar, then it is of type-2 or a context free language.

A context free grammar requires,

AxA \rightarrow x

Where AA is a non-terminal symbol, and xx is any sentence.

info

Context free grammar allows null generation.

Type-3 Regular Grammars and Languages

If a language can be generated by a regular grammar, then it is of type-3 or a regular language.

A regular grammar requires,

AaBA \rightarrow aB

Or

AaA \rightarrow a

Where AA and BB are non-terminal symbols, and aa is a symbol.

info

Nullable regular grammar adds,

SλS \rightarrow \lambda

And no rule yields a sentence with SS.

This is the nullable regular language and nullable regular grammar.

Nullable regular language only differentiates itself from regular language in that it allows a null sentence.

tip

All finite languages are regular.