Romanian Language Technology

Mihai Ciocoiu, Ştefan Bruda * GULiveR: Generalized Unification Based LR Parser for Natural Languages

Fig. 1. - A parse table

2. Tomita's algorithm

In this section we will first have a look at how the standard LR parsing algorithm was extended to efficiently handle the non determinism occurring due to the conflicting entries in the LR parsing table and then, in the fourth section, we will show how the algorithm was modified for unification-based grammars.

2.1. The LR parsing algorithm

A LR Parser consists of a control program which is the same for all grammars, and a parse table with two components: ACTION and GOTO. The control program reads tokens from the input string and uses a stack for storing sequences like s0 X1 s1 X2 s2 … Xm sm. Each Xi corresponds to a grammar category and each si is a symbol called state. Depending on the state sm on the top of the stack, and the current input token ai, the control program looks for ACTION[sm, ai] which can be one of:

shift s, where s is a state
reduce
accept
error

A LR parser configuration is a pair (s0 X1 s1 X2 s2 … Xm sm, ai ai+1 … an $) with the first component being the stack contents and the second the remaining input. On each step, the action of the parser is completely determined by the contents of the parse table, as follows:

If ACTION[sm, ai] = shift s, then the parser does a shift to the configuration (s0 X1 s1 X2 s2 … Xm sm ai s, ai+1 … an $).
If ACTION[sm, ai] = reduce, then the parser does a reduction to the configuration (s0 X1 s1 X2 s2 … Xm-r sm-r A' s, ai ai+1 … an $) where s = GOTO[sm-r, A] and A' is a tree having root A and descendants Xm-r+1, Xm-r+2, … Xm.
If ACTION[sm, ai] = accept, the parsing is successfully finished.
If ACTION[sm, ai] = error, the string is rejected.

Based on an input string w and a LR parse table for a grammar G, the parser returns a parse tree for w, if or a message that . The initial configuration is (s0, w $).