LL(1) languages have good enough properties for us to write a simple parser. However, LL(1) imposes many restrictions and is sometimes not expressive enough. An alternative approach is to use an LR(1) grammar: the input is scanned from left to right, a rightmost derivation is produced, and one symbol of lookahead is used. Using LR(1), for any given sequence we can derive a leftmost reduction path, which constructs the parse tree of a rightmost derivation in reverse.
However, LR(1) is sometimes too complex, so we may also use SLR(1), a simplified version of LR(1), or LALR(1), whose power lies between SLR(1) and LR(1). We also introduce LR(0).
OPG Parser
An OPG parser is a simplified, and in a sense reversed, PDA that has only one state. Every CFG can be accepted by a single-state PDA.
In the last section, when converting a CFG to a PDA, we constructed such a single-state PDA.
That construction shows that a one-state PDA has the same expressive power as a multi-state one, provided acceptance is defined as emptying the stack. For acceptance by final states, we need two states, one for the start and one for the end, to get the same expressive power.
The two-state version simply has an extra rule: when the stack is empty, go to the accept state.
An OPG parser is based on reduction. It has four possible actions:
- Shift: move to the next input symbol and push it onto the stack.
- Reduce: pop a few elements from the stack, reduce them to a single non-terminal, then push that non-terminal onto the stack.
- Accept: if only the start symbol is left on the stack and the input is exhausted, accept.
- Reject: if the input is exhausted, the stack holds symbols other than the start symbol alone, and no further reduction is possible, reject.
This is basically a PDA that walks backward from the end. In another sense, similar to what we learnt back with left-linear grammars, this walking-backward can be converted into a normal PDA by tweaking the rules.
Nonetheless, we will not dwell on how such a parser works as a PDA; it is not the focus here.
Now, only one issue remains: how do we decide which action the parser should take? Accept and reject are easy to judge, but when should a reduction or a shift happen?
Operator Precedence Grammar
An OPG (operator precedence grammar) is a grammar that, besides what a common grammar possesses, also has a relation on terminals: for any two terminals a,b∈V, at most one of a≡b, a<b, or a>b holds.
Our notation DOES NOT IMPLY a PARTIAL ORDER. That is,
a>b DOES NOT IMPLY b<a.
a≡a is also NOT ALWAYS TRUE.
We define the relations as follows (for grammars that don't produce λ):
- a≡b if and only if there exists a rule of the form A→…ab… or A→…aBb….
- a<b if and only if there exists A→…aB… with B⇒+b… or B⇒+Cb….
- a>b if and only if there exists A→…Bb… with B⇒+…a or B⇒+…aC.
This is a bit hard to grasp at first. The order is actually the order of reduction. If we have,
S→Bb
B→aC
C→c
then for any reduction, a is guaranteed to disappear into a non-terminal before b does, so a>b.
For example, consider reducing the string acb,
acb⇐aCb⇐Bb⇐S
a always disappears before b.
If a and b are never adjacent (ignoring the non-terminals) in any rule, we can assign an arbitrary order to them.
If two different relations can be defined on one pair, then the grammar isn't an OPG, and you should consider refactoring it.
For example, consider the following grammar, which yields mathematical expressions with only multiplication and addition,
S→E
E→E+E
E→E∗E
E→(E)
E→0∣1∣2∣3∣4∣5∣6∣7∣8∣9
Because of,
E→E+E
E→E∗E
both +>∗ and +<∗ are valid at the same time, so it's not an OPG.
However, we can rewrite it into,
S→E
E→E+T∣T
T→T∗F∣F
F→(E)∣0∣1∣2∣3∣4∣5∣6∣7∣8∣9
Here, + is always reduced after ∗, so only ∗>+ is valid. Thus, it is an OPG.
Construct Precedence Table
FIRSTOP and LASTOP
We can't practically derive all the precedences by eyeballing every rule; here is a systematic algorithm.
Define the FIRSTOP and LASTOP of a non-terminal A: the former is the set of all terminals that can appear as the first terminal of a sentential form derived from A, and the latter as the last. To formulate it formally,
FIRSTOP(A)={b∣A⇒+b…∨A⇒+Bb…, B∈N, b∈V}
LASTOP(A)={b∣A⇒+…b∨A⇒+…bC, C∈N, b∈V}
Then, for,
A→…ab…
There must be a≡b.
For,
A→…aB…
∀b∈FIRSTOP(B),a<b.
For,
A→…Ba…
∀b∈LASTOP(B),b>a.
Now the problem is how to obtain FIRSTOP and LASTOP.
We rely on the following facts. If,
A→B…
then FIRSTOP(B)⊆FIRSTOP(A).
If,
A→…B
then LASTOP(B)⊆LASTOP(A).
That tells us that we should construct FIRSTOP and LASTOP in topological order.
For the grammar,
S→E
E→E+T∣T
T→T∗F∣F
F→(E)∣0∣1∣2∣3∣4∣5∣6∣7∣8∣9
We shorten it into,
S→E
E→E+T∣T
T→T∗F∣F
F→(E)∣n
And treat n as a terminal for convenience.
We draw a dependency graph based on the first-generates relationship, with an edge from A to B whenever A→B…, and compute FIRSTOP bottom-up in that topological order.
FIRSTOP(F)={n,(}
And we walk up the graph. The only first terminal that T generates directly is ∗, and T→F, so,
FIRSTOP(T)=FIRSTOP(F)+{∗}={n,(,∗}
And,
FIRSTOP(E)=FIRSTOP(T)+{+}={n,(,∗,+}
FIRSTOP(S)=FIRSTOP(E)={n,(,∗,+}
Now, draw another graph based on last-generate relationship, which, here, happens to be exactly identical to the FIRSTOP graph,
And,
LASTOP(F)={n,)}
LASTOP(T)={∗,n,)}
LASTOP(E)={+,∗,n,)}
LASTOP(S)={+,∗,n,)}
And we are done.
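The walk above can be mechanized. Below is a small Python sketch (the representation and function names are our own) that computes FIRSTOP and LASTOP for this grammar by iterating the containment facts to a fixpoint, instead of doing an explicit topological sort:

```python
# Fixpoint computation of FIRSTOP and LASTOP for the grammar
#   S -> E, E -> E+T | T, T -> T*F | F, F -> (E) | n
# Upper-case letters are non-terminals; everything else is a terminal.
# Operator grammars have no adjacent non-terminals, so the first terminal
# of a right-hand side is preceded by at most one non-terminal.

RULES = [
    ('S', 'E'),
    ('E', 'E+T'), ('E', 'T'),
    ('T', 'T*F'), ('T', 'F'),
    ('F', '(E)'), ('F', 'n'),
]
NONTERMS = {lhs for lhs, _ in RULES}

def firstop_lastop(rules):
    first = {a: set() for a in NONTERMS}
    last = {a: set() for a in NONTERMS}
    changed = True
    while changed:                      # iterate until nothing new is added
        changed = False
        for lhs, rhs in rules:
            # FIRSTOP contribution of this rule's right-hand side.
            f = set()
            if rhs[0] in NONTERMS:
                f |= first[rhs[0]]              # A -> B...   inherits FIRSTOP(B)
                if len(rhs) > 1 and rhs[1] not in NONTERMS:
                    f.add(rhs[1])               # A -> Bb...  contributes b
            else:
                f.add(rhs[0])                   # A -> b...   contributes b
            # Symmetric for LASTOP.
            l = set()
            if rhs[-1] in NONTERMS:
                l |= last[rhs[-1]]
                if len(rhs) > 1 and rhs[-2] not in NONTERMS:
                    l.add(rhs[-2])
            else:
                l.add(rhs[-1])
            if not f <= first[lhs] or not l <= last[lhs]:
                changed = True
                first[lhs] |= f
                last[lhs] |= l
    return first, last
```

Running it reproduces the sets derived by hand, e.g. FIRSTOP(E)={+,∗,n,(} and LASTOP(T)={∗,n,)}.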
Determining Precedence
Because our reduction must also handle the ends of the text, we add an extra rule with the end-of-text symbol S,
S′→SSS
and use S′ as the starting symbol, so that the precedence relations of the end-of-text symbol come out naturally.
Now, for a rule,
A→X0X1…Xn−1
where Xi∈N∪V,
- If XiXi+1∈VV, then Xi≡Xi+1.
- If XiXi+1Xi+2∈VNV, then Xi≡Xi+2.
- If XiXi+1∈VN, then ∀b∈FIRSTOP(Xi+1),Xi<b.
- If XiXi+1∈NV, then ∀b∈LASTOP(Xi),b>Xi+1.
For convenience, for a set M, we write,
b>M
as a shorthand for,
∀a∈M,b>a
and this shorthand is also used for < and ≡.
We had,
FIRSTOP(F)={n,(}
FIRSTOP(T)={∗,n,(}
FIRSTOP(E)={+,∗,n,(}
FIRSTOP(S)={+,∗,n,(}
LASTOP(F)={n,)}
LASTOP(T)={∗,n,)}
LASTOP(E)={+,∗,n,)}
LASTOP(S)={+,∗,n,)}
And rules,
S′→SSS
S→E
E→E+T∣T
T→T∗F∣F
F→(E)∣n
Unit productions like S→E, E→T, and T→F don't yield any precedence information.
S′→SSS tells us that S<FIRSTOP(S), LASTOP(S)>S, and S≡S.
E→E+T tells us that LASTOP(E)>+ and +<FIRSTOP(T).
T→T∗F tells us that LASTOP(T)>∗ and ∗<FIRSTOP(F).
F→(E) tells us that (≡), (<FIRSTOP(E), and LASTOP(E)>).
Make them into a table, with the stacked symbol on the left and the incoming symbol on top,

|   | + | ∗ | ( | ) | n | S |
|---|---|---|---|---|---|---|
| + | > | < | < | > | < | > |
| ∗ | > | > | < | > | < | > |
| ( | < | < | < | ≡ | < | - |
| ) | > | > | - | > | - | > |
| n | > | > | - | > | - | > |
| S | < | < | < | - | < | ≡ |

− denotes that no precedence relation exists.
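The four bullet rules above translate directly into code. Here is a Python sketch (our own representation: we write S′ as S1, the end-of-text symbol as '$', and ≡ as '=') that derives every relation from the rules plus the FIRSTOP/LASTOP sets:

```python
# Derive the operator precedence relations from the grammar rules and
# the FIRSTOP/LASTOP sets computed in the text.

NONTERMS = {'S1', 'S', 'E', 'T', 'F'}         # S1 stands for S'
RULES = [
    ('S1', '$S$'), ('S', 'E'),
    ('E', 'E+T'), ('E', 'T'),
    ('T', 'T*F'), ('T', 'F'),
    ('F', '(E)'), ('F', 'n'),
]
FIRSTOP = {'S': {'+', '*', 'n', '('}, 'E': {'+', '*', 'n', '('},
           'T': {'*', 'n', '('}, 'F': {'n', '('}}
LASTOP = {'S': {'+', '*', 'n', ')'}, 'E': {'+', '*', 'n', ')'},
          'T': {'*', 'n', ')'}, 'F': {'n', ')'}}

def relations(rules):
    rel = {}                                  # (a, b) -> '<', '=', or '>'
    for _, rhs in rules:
        for i, x in enumerate(rhs[:-1]):
            y = rhs[i + 1]
            if x not in NONTERMS and y not in NONTERMS:
                rel[x, y] = '='               # ...ab...   =>  a = b
            if x not in NONTERMS and y in NONTERMS:
                for b in FIRSTOP[y]:          # ...aB...   =>  a < FIRSTOP(B)
                    rel[x, b] = '<'
            if x in NONTERMS and y not in NONTERMS:
                for b in LASTOP[x]:           # ...Ba...   =>  LASTOP(B) > a
                    rel[b, y] = '>'
            if (i + 2 < len(rhs) and x not in NONTERMS
                    and y in NONTERMS and rhs[i + 2] not in NONTERMS):
                rel[x, rhs[i + 2]] = '='      # ...aBb...  =>  a = b
    return rel
```

A real implementation would also check that no pair receives two different relations, which is exactly the test for the grammar being an OPG; this sketch just overwrites silently.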
Parser
The OPG parser is a stack machine that keeps the precedence of its stacked terminals monotonically non-decreasing. If a newly arriving symbol would violate the precedence relation, we perform a reduction.
We had,
S′→SSS
S→E
E→E+T∣T
T→T∗F∣F
F→(E)∣n
|   | + | ∗ | ( | ) | n | S |
|---|---|---|---|---|---|---|
| + | > | < | < | > | < | > |
| ∗ | > | > | < | > | < | > |
| ( | < | < | < | ≡ | < | - |
| ) | > | > | - | > | - | > |
| n | > | > | - | > | - | > |
| S | < | < | < | - | < | ≡ |
An OPG parser works in the following way. Consider the next symbol s:
- If the current stack looks like xi(<or≡)xi+1(<or≡)⋯(<or≡)xk(<or≡)s, then we push s onto the stack. Note that the xi are the terminals in the stack; we ignore the non-terminals.
- If the current stack looks like xi<xi+1(<or≡)⋯(<or≡)xk>s, then we perform a reduction: find the leftmost xi such that xixi+1…xk matches the right-hand side of a rule, and reduce it to a single non-terminal. Do this repeatedly until no such xi exists, then push s onto the stack.
Let's see how the machine would parse n+n∗n.
Firstly, it adds the special end symbols, giving Sn+n∗nS. For convenience, we write the input sequence as s0s1…si↓si+1…sn. The part before the ↓ has been taken in by the parser, and the part after is what remains. The next symbol to handle is si+1.
| Input Sequence | Stack Snapshot | Action | Reduction | Reason |
|---|---|---|---|---|
| ↓Sn+n∗nS | | Shift | | Empty Stack |
| …↓n+n∗nS | S | Shift | | S<n |
| …↓+n∗nS | Sn | Reduce | n+n∗n⇐F+n∗n | S<n>+ |
| …↓+n∗nS | SF | Reduce | F+n∗n⇐T+n∗n | - |
| …↓+n∗nS | ST | Reduce | T+n∗n⇐E+n∗n | - |
| …↓+n∗nS | SE | Shift | | S<+ |
| …↓n∗nS | SE+ | Shift | | S<+<n |
| …↓∗nS | SE+n | Reduce | E+n∗n⇐E+F∗n | S<+<n>∗ |
| …↓∗nS | SE+F | Reduce | E+F∗n⇐E+T∗n | - |
| …↓∗nS | SE+T | Shift | | S<+<∗ |
| …↓nS | SE+T∗ | Shift | | S<+<∗<n |
| …↓S | SE+T∗n | Reduce | E+T∗n⇐E+T∗F | S<+<∗<n>S |
| …↓S | SE+T∗F | Reduce | E+T∗F⇐E+T | S<+<∗>S |
| …↓S | SE+T | Reduce | E+T⇐E | S<+>S |
| …↓S | SE | Reduce | E⇐S | - |
| …↓S | SS | Shift | | S≡S |
| λ | SSS | Accept | | |
So we have the reduction sequence.
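The shift/reduce loop is short enough to sketch in code. In this Python sketch (the representation is our own) we write the end-of-text symbol as '$' and ≡ as '=', track only the terminals on the stack, and record the terminals of each handle as it is reduced; PREC is the precedence table built above:

```python
# Operator-precedence parsing sketch for the expression grammar.
# Only terminals are compared; non-terminals play no role in the
# precedence relations, so we do not keep them on the stack at all.

PREC = {
    '+': {'+': '>', '*': '<', '(': '<', ')': '>', 'n': '<', '$': '>'},
    '*': {'+': '>', '*': '>', '(': '<', ')': '>', 'n': '<', '$': '>'},
    '(': {'+': '<', '*': '<', '(': '<', ')': '=', 'n': '<'},
    ')': {'+': '>', '*': '>', ')': '>', '$': '>'},
    'n': {'+': '>', '*': '>', ')': '>', '$': '>'},
    '$': {'+': '<', '*': '<', '(': '<', 'n': '<', '$': '='},
}

def parse(text):
    """Return the terminal part of each handle reduced, in order."""
    stack = ['$']
    tokens = list(text) + ['$']
    handles = []
    i = 0
    while True:
        top, a = stack[-1], tokens[i]
        if top == '$' and a == '$':
            return handles                 # accept
        rel = PREC.get(top, {}).get(a)
        if rel in ('<', '='):
            stack.append(a)                # shift
            i += 1
        elif rel == '>':                   # reduce: pop the whole handle
            handle = []
            while True:
                handle.append(stack.pop())
                if PREC[stack[-1]].get(handle[-1]) == '<':
                    break
            handles.append(''.join(reversed(handle)))
        else:
            raise ValueError(f'no relation between {top!r} and {a!r}')
```

Because non-terminals take no part in the comparisons, unit reductions such as E→T simply never happen here; for n+n∗n the handles come out as n, n, n, then ∗, then +.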
However, in some rows we gave the reason −. This is because, at those points, there exist conflicts. That is, at such states,
- You can either shift or continue reducing. This is called a shift-reduce conflict.
- There are multiple right-hand sides you could reduce. This is called a reduce-reduce conflict.
In the table above, we resolved them by choosing the option under which the input can be parsed. We can do that in simple cases, but a machine can't.
This issue is caused by the nature of the PDA: a PDA is an ε-NFA with a stack, and in our discussion above, to reduce means to walk backward through the PDA. Because an ε-NFA is non-deterministic, multiple transitions are allowed, and multiple transitions mean conflicts, which, in theory, are not a big deal, yet in practice raise the time complexity.
We will later see that by replacing the ε-NFA with a DFA, that is, using a DFA with a stack, we can solve this issue elegantly, albeit with more restrictions imposed on the grammar.
In practice, when we meet such conflicts we would need to backtrack to find the one possible reduction sequence. More commonly, we design the grammar so that there is an easy way to decide the action at each conflict.
Look Ahead LR
The basic idea of LR(k) is to look ahead k characters so that we can decide on the action, just like what we did with LL(1), where, by looking ahead one character, we can decide how to derive the sentence (if the grammar is LL(1)).
The LR parser works like the OPG parser, except that it has its own rules to resolve conflicts, unlike OPG, which requires backtracking.
Let's consider the following grammar,
S→E
E→E+E
E→n
And let's perform a leftmost reduction (a rightmost derivation in reverse) for n+n.
n+n⇐E+n⇐E+E⇐E⇐S
Our ideal parser should mimic this behavior,
| Stack Snapshot | Input | Action |
|---|---|---|
| S | S↓n+nS | Shift |
| Sn | Sn↓+nS | Reduce |
| SE | Sn↓+nS | Shift |
| SE+ | Sn+↓nS | Shift |
| SE+n | Sn+n↓S | Reduce |
| SE+E | Sn+n↓S | Reduce |
| SE | Sn+n↓S | Shift |
| SES | …↓ | Accept |
The parser should decide the action based on the stack and the next k characters.
Live Prefix
Suppose we have a CFG and a rightmost derivation sequence αi⇒rmβi.
We call a string s a handle if the reduction step of the sequence is,
xsy⇐xAy
that is, A→s is the rule applied (since the derivation is rightmost, y contains only terminals).
For any intermediate result βi of the rightmost derivation sequence, a prefix p of βi is called a live prefix if p does not contain any character to the right side of the handle of βi.
For example,
S→A
A→0A1∣01
For the reduction sequence,
0011⇐0A1⇐A⇐S
In the step 0011⇐0A1,
- 01 is the handle because A→01 was used.
- λ, 0, 00, 001 are all live prefixes.
Let's consider the language,
LP={p∣p is a live prefix of some right sentential form u such that S⇒∗u}
We can prove that LP is a regular language.
TODO: ADD THE PROOF
And a regular language can be recognized by a DFA. Thus, we can design the LR parser to be,
- A DFA that recognizes the live prefixes.
- A set of rules that tells which rule should be applied, given the next k characters and the state of the DFA.
LR Parser
The real LR parser is designed in a way that is easier to implement in a program, that is,
- A state stack that records the states of the DFA
- A symbol stack that contains the current string to be reduced
- An action table ACTION(S,a) that tells the parser, if the top of the state stack is S and the input symbol is a, which action to take. Specifically, we write acc for the accept action.
- A goto table GOTO(S,X) that tells the parser, if the top of the state stack is S and the top of the symbol stack is X, which state to move into.
Different parsers use different means to construct the action table and the goto table. Common ways are LR(0), SLR(1), and LR(1).
LR(0) Parser
Item and Its Closure
We first define two concepts, item and item closure.
An item is a rule with a special position marker in its right-hand side; we use ⇓ for the marker. For example, from the rule,
A→aBc
We can get the following items,
A→⇓aBc
A→a⇓Bc
A→aB⇓c
A→aBc⇓
Then we define its closure CLOSURE(I), the set of all items whose reductions are possible from the current state, as the limit of the following process, starting from C0=I:
∀r=A→α⇓X where r∈Cn−1, append(Cn,r)
If X=Bβ′ where B∈N, then ∀B→γ, append(Cn,B→⇓γ)
CLOSURE(I)=C+∞
For example, consider the grammar,
S′→SSS
S→CC
C→cC∣d
Consider the closure of S′→S⇓SS,
C0={S′→S⇓SS}
C1={S′→S⇓SS, S→⇓CC}
C2={S′→S⇓SS, S→⇓CC, C→⇓cC, C→⇓d}
Repeating the process doesn't generate more items, so CLOSURE({S′→S⇓SS})=C2.
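The same closure computation as a worklist loop in Python. The representation is our own: an item is a tuple (lhs, rhs, dot) with the dot as an index into the right-hand side, and we write the end-of-text symbol as '$':

```python
# LR(0) CLOSURE and GOTO for the grammar S' -> $S$, S -> CC, C -> cC | d.

RULES = [
    ("S'", '$S$'),
    ('S', 'CC'),
    ('C', 'cC'), ('C', 'd'),
]
NONTERMS = {lhs for lhs, _ in RULES}

def closure(items):
    """Add B -> .gamma for every item whose dot sits before B."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for l, r in RULES:
                if l == rhs[dot]:
                    item = (l, r, 0)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

def goto(items, symbol):
    """Advance the dot over `symbol`, then take the closure."""
    return closure({(l, r, d + 1) for (l, r, d) in items
                    if d < len(r) and r[d] == symbol})
```

Starting from the kernel item (S′→$⇓S$), the closure reproduces I0 from the text, and goto then yields the other states.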
Generate the Table
Given a CFG, we first add the extra rule S′→SSS and use S′ as the starting symbol.
The idea here is similar to converting NFA to DFA. Basically, we need to find all the possible occurring items.
Firstly, we have I0=CLOSURE({S′→S⇓SS}). This is the first state.
Then, suppose we have a state whose items are,
In={A→α⇓Xβ}
Where X represents a non-terminal or a terminal.
Then, we would like to compute, for example, GOTO(In,Z) or ACTION(In,Z). If Z is a terminal, it is an action; otherwise, a goto.
We first create a set In+1,Z′ by,
∀A→α⇓Xβ∈In∧X=Z, append(In+1,Z′,A→αZ⇓β)
We define the new state In+1,Z=CLOSURE(In+1,Z′), and GOTO(In,Z)=In+1,Z or ACTION(In,Z)=In+1,Z.
If In+1,Z already exists as some earlier state, use the old state. Repeat until no new state appears.
For example, we have CFG,
S′→SSS
S→CC
C→cC∣d
And the initial state,
I0=CLOSURE({S′→S⇓SS})={S′→S⇓SS, S→⇓CC, C→⇓cC, C→⇓d}
So,
I1,S=CLOSURE({S′→SS⇓S})={S′→SS⇓S}
I1,C=CLOSURE({S→C⇓C})={S→C⇓C, C→⇓cC, C→⇓d}
I1,c=CLOSURE({C→c⇓C})={C→c⇓C, C→⇓cC, C→⇓d}
I1,d=CLOSURE({C→d⇓})={C→d⇓}
I2,S,S=CLOSURE({S′→SSS⇓})={S′→SSS⇓}
I2,C,d=CLOSURE({C→d⇓})=I1,d
I2,C,C=CLOSURE({S→CC⇓})={S→CC⇓}
I2,C,c=CLOSURE({C→c⇓C})=I1,c
I2,c,c=CLOSURE({C→c⇓C})=I1,c
I2,c,C=CLOSURE({C→cC⇓})={C→cC⇓}
I2,c,d=CLOSURE({C→d⇓})=I1,d
Now, we have finished all the states.
If GOTO(In,Z)=In+1,Z, then we just go to the new state. This is done by pushing the new state onto the state stack.
If ACTION(In,Z)=In+1,Z, and there exists an item A→α⇓Zβ, then the action is shift: push the symbol and go to the new state.
If In contains a finished item A→α⇓, then the action is reduce using that rule.
Specially, if In contains S′→SS⇓S, then the action on S is accept.
We had,
I0=CLOSURE({S′→S⇓SS})={S′→S⇓SS, S→⇓CC, C→⇓cC, C→⇓d}
So,
I1,S={S′→SS⇓S}
I1,C={S→C⇓C, C→⇓cC, C→⇓d}
I1,c={C→c⇓C, C→⇓cC, C→⇓d}
I1,d={C→d⇓}
I2,S,S={S′→SSS⇓}
I2,C,d=I1,d
I2,C,C={S→CC⇓}
I2,C,c=I1,c
I2,c,c=I1,c
I2,c,C={C→cC⇓}
I2,c,d=I1,d
We label them with numbers. Note that I2,S,S won't be present in the table, because the parser accepts before ever reaching this state.
| Old Name | Number |
|---|---|
| I0 | 0 |
| I1,S | 1 |
| I1,C | 2 |
| I1,c | 3 |
| I1,d | 4 |
| I2,C,C | 5 |
| I2,c,C | 6 |
And we can conclude the ACTION and GOTO tables. We write si for shift and move to state i; a bare i means go to state i; a rule means reduce using that rule. The first three columns (on S, c, d) are ACTION, and the last two (on S, C) are GOTO.
|   | S | c | d | S | C |
|---|---|---|---|---|---|
| 0 | - | s3 | s4 | 1 | 2 |
| 1 | acc | - | - | - | - |
| 2 | - | s3 | s4 | - | 5 |
| 3 | - | s3 | s4 | - | 6 |
| 4 | C→d | C→d | C→d | - | - |
| 5 | S→CC | S→CC | S→CC | - | - |
| 6 | C→cC | C→cC | C→cC | - | - |
If your table looks like this one, where every cell has at most one definite action or goto, then it is an LR(0) grammar. If some cell contains multiple reductions, there is a reduce-reduce conflict. If some cell contains both a reduction and a shift, there is a shift-reduce conflict.
Parsing example
|   | S | c | d | S | C |
|---|---|---|---|---|---|
| 0 | - | s3 | s4 | 1 | 2 |
| 1 | acc | - | - | - | - |
| 2 | - | s3 | s4 | - | 5 |
| 3 | - | s3 | s4 | - | 6 |
| 4 | C→d | C→d | C→d | - | - |
| 5 | S→CC | S→CC | S→CC | - | - |
| 6 | C→cC | C→cC | C→cC | - | - |
Let's use this table to parse cdd.
Suppose we have a pointer that points to the next symbol in the sentence ready to parse.
While there is remaining input, we make the judgement based on the next character. If the pointer has already reached the final S, we use the top of the symbol stack.
- By shifting, we mean to push the next input symbol onto the symbol stack and push the new state onto the state stack. We also move the pointer.
- By reducing, we mean to pop the corresponding part out of the symbol stack and do a reduction. The reduced non-terminal is inserted right after the pointer. In addition, we pop as many states as there are symbols in the right-hand side of the rule. That is, for A→α, we pop ∣α∣ states, pop ∣α∣ symbols, and insert A after the pointer.
- By goto, we do the same as shifting; it's just that we are pushing a non-terminal.
We are always determining the action based on the next symbol and the current state.
| Symbol Stack | State Stack | Input | Action |
|---|---|---|---|
| S | 0 | ↓cddS | s3 |
| Sc | 03 | c↓ddS | s4 |
| Scd | 034 | cd↓dS | C→d |
| Sc | 03 | cd↓CdS | 6 |
| ScC | 036 | cdC↓dS | C→cC |
| S | 0 | cdC↓CdS | 2 |
| SC | 02 | cdCC↓dS | s4 |
| SCd | 024 | cdCCd↓S | C→d |
| SC | 02 | cdCCd↓CS | 5 |
| SCC | 025 | cdCCdC↓S | S→CC |
| S | 0 | cdCCdC↓SS | 1 |
| SS | 01 | cdCCdCS↓S | acc |
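The whole procedure fits in a short driver. In this Python sketch (names and representation are our own; '$' stands for the end-of-text symbol), instead of inserting the reduced non-terminal back into the input and re-reading it, we apply GOTO immediately after each reduction, which has the same effect:

```python
# Table-driven LR parsing for S' -> $S$, S -> CC, C -> cC | d,
# using the ACTION/GOTO tables built in the text.

ACTION = {
    (0, 'c'): ('shift', 3), (0, 'd'): ('shift', 4),
    (1, '$'): ('accept',),
    (2, 'c'): ('shift', 3), (2, 'd'): ('shift', 4),
    (3, 'c'): ('shift', 3), (3, 'd'): ('shift', 4),
}
# LR(0) reduces on every terminal, so the rule is per state.
REDUCE = {4: ('C', 'd'), 5: ('S', 'CC'), 6: ('C', 'cC')}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 5, (3, 'C'): 6}

def parse(text):
    """Return the sequence of reductions performed, or raise on error."""
    tokens = list(text) + ['$']
    states, symbols, i = [0], [], 0
    trace = []
    while True:
        s, a = states[-1], tokens[i]
        if (s, a) in ACTION:
            act = ACTION[(s, a)]
            if act[0] == 'accept':
                return trace
            symbols.append(a)              # shift
            states.append(act[1])
            i += 1
        elif s in REDUCE:
            lhs, rhs = REDUCE[s]
            del symbols[-len(rhs):]        # pop |rhs| symbols and states
            del states[-len(rhs):]
            trace.append(f'{lhs}->{rhs}')
            symbols.append(lhs)            # then GOTO on the non-terminal
            states.append(GOTO[(states[-1], lhs)])
        else:
            raise ValueError(f'stuck in state {s} on {a!r}')
```

Parsing cdd performs the same four reductions as the hand trace: C→d, C→cC, C→d, S→CC.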
SLR(1)
SLR(1) and LR(1) are solutions for the case where conflicts exist in the LR(0) table. We look at SLR(1) (Simple LR(1)) first.
Consider the rules,
S′→SSS
S→yAx
A→a∣ab
We calculate its items.
I0=CLOSURE({S′→S⇓SS})={S′→S⇓SS, S→⇓yAx}
GOTO(I0,S)=CLOSURE({S′→SS⇓S})={S′→SS⇓S}=I1
ACTION(I0,y)=CLOSURE({S→y⇓Ax})={S→y⇓Ax, A→⇓a∣⇓ab}=I2
GOTO(I2,A)=CLOSURE({S→yA⇓x})={S→yA⇓x}=I3
ACTION(I2,a)=CLOSURE({A→a⇓∣a⇓b})={A→a⇓∣a⇓b}=I4
ACTION(I3,x)=CLOSURE({S→yAx⇓})={S→yAx⇓}=I5
ACTION(I4,b)=CLOSURE({A→ab⇓})={A→ab⇓}=I6
We have the table,

|   | S | S | A | a | b | x | y |
|---|---|---|---|---|---|---|---|
| 0 | - | 1 | - | - | - | - | s2 |
| 1 | acc | - | - | - | - | - | - |
| 2 | - | - | 3 | s4 | - | - | - |
| 3 | - | - | - | - | - | s5 | - |
| 4 | A→a | A→a | A→a | A→a | A→a, s6 | A→a | A→a |
| 5 | S→yAx | S→yAx | S→yAx | S→yAx | S→yAx | S→yAx | S→yAx |
| 6 | A→ab | A→ab | A→ab | A→ab | A→ab | A→ab | A→ab |
You can see that at (4,b) there exist two possible actions: whether to shift and go to 6, or to reduce. This is a shift-reduce conflict. To resolve this conflict, we calculate the FOLLOW of A (because the conflict happens on a reduction rule whose left-hand side is A).
Obviously, the only rule that produces A is S→yAx, thus FOLLOW(A)={x}.
Suppose the machine is in one of these conflicted states, say state 4, and let n be the next symbol.
If n∈FOLLOW(A), we choose reduction, because n is a valid successor of the reduced A.
If n∉FOLLOW(A), we choose shifting.
Because we need to peek ahead one character, this is an SLR(1) grammar.
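FOLLOW itself can be computed with the usual fixpoint. A Python sketch for this grammar (the representation is our own; upper-case letters are non-terminals and '$' is the end-of-text symbol; there are no λ-rules, which keeps FIRST simple):

```python
# FIRST/FOLLOW by fixpoint for S -> yAx, A -> a | ab (no lambda rules).

RULES = [('S', 'yAx'), ('A', 'a'), ('A', 'ab')]
NONTERMS = {lhs for lhs, _ in RULES}

def first_of(sym, first):
    return first[sym] if sym in NONTERMS else {sym}

def follow_sets(rules):
    # FIRST: with no lambda rules, only the leading symbol matters.
    first = {a: set() for a in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            f = first_of(rhs[0], first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
    # FOLLOW: whatever can come right after each non-terminal occurrence.
    follow = {a: set() for a in NONTERMS}
    follow['S'].add('$')            # the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(rhs):
                    new = first_of(rhs[i + 1], first)
                else:
                    new = follow[lhs]   # at the end: inherit FOLLOW(lhs)
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow
```

It confirms FOLLOW(A)={x}, which is what the conflict resolution above relies on.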
Suppose we have yabx to parse.
| Symbol Stack | State Stack | Input | Action |
|---|---|---|---|
| S | 0 | ↓yabxS | s2 |
| Sy | 02 | y↓abxS | s4 |
| Sya | 024 | ya↓bxS | b∉FOLLOW(A), s6 |
| Syab | 0246 | yab↓xS | A→ab |
| Sy | 02 | yab↓AxS | 3 |
| SyA | 023 | yabA↓xS | s5 |
| SyAx | 0235 | yabAx↓S | S→yAx |
| S | 0 | yabAx↓SS | 1 |
| SS | 01 | yabAxS↓S | acc |
This is a case where we chose to shift. For yax, we choose to reduce.
| Symbol Stack | State Stack | Input | Action |
|---|---|---|---|
| S | 0 | ↓yaxS | s2 |
| Sy | 02 | y↓axS | s4 |
| Sya | 024 | ya↓xS | x∈FOLLOW(A), A→a |
| Sy | 02 | ya↓AxS | 3 |
| SyA | 023 | yaA↓xS | s5 |
| SyAx | 0235 | yaAx↓S | S→yAx |
| S | 0 | yaAx↓SS | 1 |
| SS | 01 | yaAxS↓S | acc |
As for reduce-reduce conflicts, consider a row that looks like,

|   | x |
|---|---|
| i | A→α, B→α |

Which rule to reduce by depends on the next symbol x:
if x∈FOLLOW(A), reduce to A, or else reduce to B. The same goes for conflicts among more rules.
However, SLR(1) does not guarantee to eliminate all conflicts. For example, when multiple FOLLOW sets overlap, we can't choose which rule to reduce by.
LR(1)
LR(k) does not peek ahead only when a conflict happens; it constantly looks ahead k symbols.
We define an item in LR(k) as,
A→x⇓y,w
Where ∣w∣=k, and there exists a derivation such that S′⇒+αxywβ. That is, w can be a successor string of the handle that A derives.
In the case where k=1, that is to say,
A→x⇓y,f
Where f∈FOLLOW(A).
Then, utilizing LR(1) items, we can build the state sets in the same way as before; we just need to distinguish the lookahead characters. This results in an increase in the number of states.
The way to get the closure is a bit different, starting from C0=I:
∀r=(A→α⇓X,a) where r∈Cn−1, append(Cn,r)
If X=Bβ′ where B∈N, then ∀B→γ, append(Cn,(B→⇓γ,FIRST(β′a)))
CLOSURE(I)=C+∞
And we start with I0=CLOSURE({(S′→S⇓SS,S)}).
Let's use the grammar,
S′→SSS
S→CC
C→cC∣d
This is our previous grammar. Here it is just for demonstration.
We first calculate the FIRST and FOLLOW.
FIRST(C)={c,d}
FIRST(S)=FIRST(C)={c,d}
FOLLOW(S)={S}
FOLLOW(C)=FOLLOW(S)+FIRST(C)={S,c,d}
I0=CLOSURE({(S′→S⇓SS,S)})={(S′→S⇓SS,S),(S→⇓CC,S),(C→⇓cC∣⇓d,{c,d})}
We use (C→⇓cC∣⇓d,{c,d}) as a shorthand. Don't treat it as one item; it is a shorthand for four.
GOTO(I0,S)=CLOSURE({(S′→SS⇓S,S)})={(S′→SS⇓S,S)}=I1
GOTO(I0,C)=CLOSURE({(S→C⇓C,S)})={(S→C⇓C,S),(C→⇓cC∣⇓d,S)}=I2
ACTION(I0,c)=CLOSURE({(C→c⇓C,{c,d})})={(C→c⇓C,{c,d}),(C→⇓cC∣⇓d,{c,d})}=I3
ACTION(I0,d)=CLOSURE({(C→d⇓,{c,d})})={(C→d⇓,{c,d})}=I4
GOTO(I2,C)=CLOSURE({(S→CC⇓,S)})={(S→CC⇓,S)}=I5
ACTION(I2,c)=CLOSURE({(C→c⇓C,S)})={(C→c⇓C,S),(C→⇓cC∣⇓d,S)}=I6≠I3
ACTION(I2,d)=CLOSURE({(C→d⇓,S)})={(C→d⇓,S)}=I7≠I4
GOTO(I3,C)=CLOSURE({(C→cC⇓,{c,d})})={(C→cC⇓,{c,d})}=I8
ACTION(I3,c)=CLOSURE({(C→c⇓C,{c,d})})={(C→c⇓C,{c,d}),(C→⇓cC∣⇓d,{c,d})}=I3
ACTION(I3,d)=CLOSURE({(C→d⇓,{c,d})})={(C→d⇓,{c,d})}=I4
GOTO(I6,C)=CLOSURE({(C→cC⇓,S)})={(C→cC⇓,S)}=I9≠I8
GOTO(I6,c)=CLOSURE({(C→c⇓C,S)})={(C→c⇓C,S),(C→⇓cC∣⇓d,S)}=I6
GOTO(I6,d)=CLOSURE({(C→d⇓,S)})={(C→d⇓,S)}=I7
Now we have the table. Note that each reduction appears only in the columns of its lookahead symbols; the first two columns (on S and C) are GOTO, and the last three (on c, d, and the end-of-text S) are ACTION,

|   | S | C | c | d | S |
|---|---|---|---|---|---|
| 0 | 1 | 2 | s3 | s4 | - |
| 1 | - | - | - | - | acc |
| 2 | - | 5 | s6 | s7 | - |
| 3 | - | 8 | s3 | s4 | - |
| 4 | - | - | C→d | C→d | - |
| 5 | - | - | - | - | S→CC |
| 6 | - | 9 | s6 | s7 | - |
| 7 | - | - | - | - | C→d |
| 8 | - | - | C→cC | C→cC | - |
| 9 | - | - | - | - | C→cC |
This is the table we need.
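The LR(1) closure differs from the LR(0) one only in how lookaheads are propagated. A Python sketch (our own representation: an item is (lhs, rhs, dot, lookahead), and '$' stands for the end-of-text symbol):

```python
# LR(1) CLOSURE for S' -> $S$, S -> CC, C -> cC | d. When the dot sits
# before B in (A -> alpha . B beta, a), we add (B -> . gamma, b) for
# every b in FIRST(beta a).

RULES = [("S'", '$S$'), ('S', 'CC'), ('C', 'cC'), ('C', 'd')]
NONTERMS = {lhs for lhs, _ in RULES}
FIRST = {'S': {'c', 'd'}, 'C': {'c', 'd'}}   # computed in the text

def first_of(seq):
    """FIRST of a string; no lambda rules, so only the head symbol matters."""
    s = seq[0]
    return FIRST[s] if s in NONTERMS else {s}

def closure1(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            b, beta = rhs[dot], rhs[dot + 1:]
            for la2 in first_of(beta + la):   # lookaheads for the new items
                for l, r in RULES:
                    if l == b:
                        item = (l, r, 0, la2)
                        if item not in result:
                            result.add(item)
                            work.append(item)
    return result
```

Applied to the kernel (S′→$⇓S$, $), it yields the six items of I0 (the shorthand in the text packs four of them into one line); the GOTO construction then proceeds exactly as in the LR(0) case, but on these extended items.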