Grammar and Language Hierarchy
Grammar
A grammar contains four parts,
- Vocabulary: A set of symbols, noted as .
- Non-Terminal: A set of symbols, noted as .
- Rules: A set of rules, noted as .
- Start Symbol: A symbol. Usually just .
A grammar can generate a language.
Let's demonstrate one by one.
Vocabulary
A vocabulary is a set of symbols. They are the vocabulary of the final sentence.
Non-Terminal
The non-terminal is another set of symbols. We usually use capitalized letters for non-terminals. They are used to construct rules.
Rules
A rule has two part, the left part is called the source, and the right part is called the target.
The source should be from the language , to have at least one non-terminal, whereas the target should be from language .
We usually note a rule as,
There must be a finite number of rules,
Start Symbol
Start symbol is just a non-terminal symbol. Usually just .
Deduced Language
A grammar is sufficient to deduce a language, which we call the deduced language, the , .
We define a language called the intermediate language, the . It is a subset of the language .
Without condition, . Sentence with only a start symbol is always a valid sentence in the intermediate language.
For any rule in the , if a sentence is in the intermediate language set , and it has a substring that matches the of the rule, then by replacing the with , we will have a new sentence . We define .
We note this as,
This process from to is called derivation, noted as,
If the derivation repeats times, we note,
For any times, use , for more than one times, use .
That is to say,
After we get ,
Example
The conept is a bit abstract. We now demonstrate how we can deduce a language.
The First Example
We choose,
To look for the language this grammar generates, we first get the complete set of intermediate language,
Initially,
The only applicable rule is , thus,
For , it has two applicable rules,
Thus,
For , it has only one applicable rule,
And for , there is only one applicable rule,
Thus,
There are no more applicable rules, thus, the language generated by this grammar is,
The Second Example
For simplicity, if there exists,
We can shorthand it to,
It's also valid for more than two targets.
We choose,
Because , then,
Because the rules applicable are,
Thus,
Because is always applicable to the rule,
So if , then and .
And there exists , so,
Chomsky Hierarchy
Chomsky classify the grammars based on the form of rules. From lower types to higher types, there will be more and more restrictions on the rules, and the grammar will be simpler and simpler, so is the language.
In our book, we only talk about infinite languages. That is, .
Type-0 Recursive Enumerable Grammars and Languages
If a language can be generated by a recursive enumerable grammar, then it is of type-0 or a recursive enumerable language.
Type-0 does not enforce any constrain on the rules. All grammars are type-0. That is to say, all languages are type-0.
Type-1 Context Sensitive Grammars and Languages
If a language can be generated by a context sensitive grammar, then it is of type-1 or a context sensitive language.
Please note that if a language can be generated by type- grammar, then it is also a type- language. This mean that, for a language that generated from type- where , there is possibility that we chose a stupid grammar that can be simplified to a higher type. Thus type- grammar doesn't always generates type- language.
It enforces all rules to be of the form,
Where , but and can be empty, is any non-terminal symbol.
For convenience, we sometimes allow a special rule,
If there exists no rules has that contains . This is called a nullable context sensitive language. The only different the language has from the context sensitive language is that allows .
This is obvious because we forbid to exists on the right side, and thus if we remove , we get a non-nullable context sensitive language that has no in any sentence except for the single symbol sentence .
After adding the rule . We can only yield one new sentence .
Nullable languages allows null sentence and the rule , but doesn't enforce it.
It is also true for the nullable regular language.
Type-2 Context Free Grammars and Languages
If a language can be generated by a context free grammar, then it is of type-2 or a context free language.
A context free grammar requires,
Where is a non-terminal symbol, and is any sentence.
Context free grammar allows null generation.
Type-3 Regular Grammars and Languages
If a language can be generated by a regular grammar, then it is of type-3 or a regular language.
A regular grammar requires,
Or
Where and are non-terminal symbols, and is a symbol.
Nullable regular grammar adds,
And no rule yields a sentence with .
This is the nullable regular language and nullable regular grammar.
Nullable regular language only differentiates itself from regular language in that it allows a null sentence.
All finite languages are regular.