Context Free Grammar
Context free grammar allows rules in the form of,
Where , .
Leftmost and Rightmost Derivations
Because a derivation operation allows us to replace any substring that is applicable to a rule, which results in uncertainty, we must have a deterministic way to derivate a sentence.
The leftmost derivation force us to replace the leftmost non-terminal symbol in the sentence. The rightmost derivation force us to replace the rightmost non-terminal symbol in the sentence.
Because we only change the order of the derivation, if only leftmost derivation or only rightmost derivation is allowed, the grammar generates the same language as the original grammar without restriction.
For example,
Is a context free grammar that can generate paired parentheses.
Let's consider the sequence,
There are many ways to generate this sequence, for example,
We didn't choose the non-terminal to replace on the leftmost side nor the rightmost side. Thus, it is neither leftmost nor rightmost derivation.
But the following sequence,
Is a leftmost derivation sequence.
Of course, the leftmost derivation sequence isn't always unique, consider,
However, since leftmost or rightmost derivation doesn't change the yielded language, we usually choose one so that it makes the analysis easier.
Parse Tree
Parse tree is another way to represent the derivation of a sentence. We can use parse tree to prove that leftmost and rightmost derivation has the same result as the original derivation.
The root of a parse tree is always the starting symbol.
The leaves of a parse tree is always the terminal symbols.
The internal nodes of a parse tree is always the non-terminal symbols.
If a non-terminal generates sequence , then each letter in is a child of the non-terminal node (in oreder).
For,
We have the following parse tree,
Because the we can traverse through the tree with DFS in any order (left first or right first), so the leftmost and rightmost derivation has the same result, so is for derivation without restriction.
Ambiguity
If a sentence can be generated in more than one way using leftmost or rightmost derivation, then the grammar is ambiguous.
For sequence , we listed two different leftmost derivation,
Thus, the grammar,
Is ambiguous.
Ambiguous grammar can be defined in any of the following ways,
- Existing a sequence that has more than one ways to derive with leftmost or rightmost derivation.
- Existing a sequence that can be generated by more than one parse tree.
Please note that ambiguous grammar doesn't mean ambiguous language. The same language can be generated with an unambiguous grammar,
This is an unambiguous grammar.
After the next chapter, you will see that this is actually an grammar that generates valid parentheses.
However, things aren't always so nice- some languages has it's inherent ambiguity. That all grammar yields it is ambiguous. For example,
Is an inherently ambiguous language.
Chomsky Normal Form
CNF and Nullable CNF
The context free grammar has too little restriction, that it is hard to study. Instead, we use an equal form called Chomsky Normal Form (CNF).
CNF grammar enforce rules to be in either of
For CFG , there always exist an CNF that,
We can also use nullable CNF, that is, we specifically allow,
And enforce no other rule has as the target. Then,
Reduction into Chomsky Normal Form
To convert a CFG to CNF, we follow the given steps. The following step is logically valid and can be proven to have the listed properties on the generated language, but proving them is too trivial and grinding- so this note only shows you how to do it.
The reduction will yield a CNF that generates the same language but may be without the .
Please notice that the following steps must be done in order. Or else some non-CNF rules may exist after the steps.
Removing Empty Production
We find all,
Then we remove all of them.
Removing Unit Production
A unit production is a rule of the form,
That a non-terminal only generates a new non-terminal.
If,
We abridge to,
This should be done recursively, until there is no more unit production.
Removing Dead Non-Terminal
Some non-terminal may never yield a terminal character. For example,
And there are no rule,
Then is a dead non-terminal. We should just remove all rules that yields string that contains .
Removing Unreachable Non-Terminal
After our previous steps, some non-terminals may become out of reach from the start symbol. We simply remove them and any rules that contains them.
Decomposing Non-Terminal
For every rule that generates more than two symbols, for example,
Where can either be terminal or non-terminal, .
We decompose it to,
We repeat this process until the grammar is a CNF grammar.
Example
That was a hell lot of things to memorize. Let's look at an example,
Removing Empty Production
The only empty production is,
We simply remove it, so that,
Removing Unit Production
The unit production here includes,
We can draw a tree based on the generated-by relation,
After abbreviating ,
Then ,
Then ,
Then ,
Removing Dead Non-Terminal
Here, never yields a fully terminal string, so we remove all rules that contains .
Removing Unreachable Non-Terminal
Now we draw a graph based on the generated-by relation of all non-terminals.
We can see there is no path from to , so we remove and all the rules that contains it.
Decomposing Non-Terminal
Now we look at all the rules that go against CNF, that is,
We break it by adding intermediate non-terminals,
Now this is the CNF of the original grammar.
Later, our analysis will be basedon the CNF form of the CFG.