The PREDICT-k Function


The standard LL(1) PREDICT function maps a given nonterminal symbol and a given terminal symbol onto a unique production in the grammar. The LL(1) PREDICT function is usually a partial function, so some nonterminal and terminal combinations may not produce a valid mapping. This circumstance is generally indicated by a zero entry in the parse table to indicate that the given terminal symbol does not predict any valid production in the grammar, for the given nonterminal symbol. Ignoring the issue of error detection for the moment, it should be clear that the primary purpose of the LL(1) PREDICT function is to predict which of the alternate productions for a given nonterminal should be used in the current parsing context. For example, consider the following grammar:

                        A  -->  a  b                       Grammar 5.1
                        A  -->  b  c

This grammar contains two A-productions that can be distinguished at the first level of lookahead. On a lookahead of "a", the first production is predicted, whereas on a lookahead of "b" the second production is predicted. Clearly, this grammar is LL(1) because the PREDICT function is able to differentiate these two productions. Put another way, the PREDICT function is able to resolve the conflict that arises from the fact that there are two different productions in the grammar for the same nonterminal symbol. Viewing the PREDICT function as conflict resolution, and extending that view to construct an alternate form of the PREDICTk function is the subject of this paper.

Definition of Conflicts

The standard LL(1) PREDICT function can be generalized to make it applicable to an arbitrary lookahead level for SLL(k) grammars. This is because the strong LL grammars are predictive without regard to the context of the production in which the nonterminal in question occurs. That is, given a lookahead string of length k and a nonterminal to be expanded, the production to be used can always be determined uniquely. The problem is that the construction of the PREDICTk sets requires that those sets be computed from the FIRSTk and the FOLLOWk sets for the grammar. As k gets large, the complexity of this computation grows exponentially. The possibility of having to consider all lookahead strings of length k is the factor that causes the complexity of the grammar analysis of SLL(k) grammars to be exponential in k. For a grammar with a terminal set T, the number of possible lookahead strings will be |T|k. Consider the following grammar for example:

                        A  -->  a b c d e f x              Grammar 5.2
                        A  -->  a b c d e f y

This grammar is obviously SLL(7) because the two productions can be differentiated at the seventh level of lookahead. This is easily determined by inspection of the grammar. However, the standard SLL(k) analysis of this grammar would be quite complex. The size of the terminal set for grammar 5.2 is 8, so the possible number of k-length lookahead strings for k=7 is 87, or more than 2 million. The dimensions of the SLL(k) parse table are:

                        |N|  ·  |T|k
Just filling in the entries of a table this size would be time consuming. That so much work is required to analyze such a simple grammar suggests that there must be a more efficient way to approach this problem.

The basic flaw in the standard SLL(k) parsing method is that a very large number of irrelevant lookahead strings must be considered. In the case of grammar 5.2, only two out of two million strings are actually needed to be able to parse the grammar. Consideration of the useless lookahead strings can be eliminated by analyzing the grammar from the perspective of conflicts. This is the intuitive way that a person would look at grammar 5.2. First he would compare the lookahead terminal symbols at level one. Since they are equal, there is a conflict at level one, and level two must be considered. This process continues up through lookahead level 7, where it is discovered that the two terminal symbols are different. At this point, the analysis terminates with the conclusion that the grammar is SLL(7).

If a grammar is to be analyzed from the perspective of conflicts, the exact nature and properties of a conflict should be considered. A general definition of a parse table conflict follows.

    Definition:  For any nonterminal "A" in a grammar, a parse table conflict 
    is defined as two or more different A-productions being predicted by the 
    same lookahead symbol.

    Conflicts can be classified by the ordered pairs

                ( conflict production list, conflict lookahead symbol )
    

Examples of Conflicts

A conflict in an LL(1) parse table can be resolved by constructing a row in another parse table in much the same way that a row in the LL(1) table is constructed. That is, for each conflicting production, a PREDICT set is constructed. The resulting PREDICT sets are used to build a parse table row in the usual way. For example, consider the following grammar:

                        A  -->  a  b                       Grammar 5.3
                        A  -->  a  c
                        A  -->  a  d

These three productions conflict at lookahead level one on terminal symbol "a". The conflict can be resolved by constructing a row in another parse table that predicts what action to take for different terminal symbols at the second level of lookahead. Since the three different productions are combined into a single row, that row can be fully described without retaining the production list notation. In this case, the row is classified by the pair (A, a), signifying that it resolves conflicts of nonterminal symbol "A" on lookahead terminal symbol "a". This pair describes a row in a parse table at lookahead level two. This type of parse table will be called a conflict resolution table to differentiate it from the traditional LL(1) parse table. A conflict resolution table is similar to an LL(1) parse table, except that it contains conflict resolution rows that resolve parse table conflicts at the previous lookahead level.

    Definition:  For SLL(2) grammars, the LL(2) conflict resolution table 
    resolves conflicts in the standard LL(1) parse table. The LL(2) CR table 
    consists of:

                    conflicts[1]  x  lookahead[2]

    where conflicts[1] refers to the set of conflicts in the LL(1) parse table. 
    Lookahead[2] refers to the terminal set at the second level of lookahead 
    in the input string. An LL(2) conflict resolution row can be
    indexed by a pair of the form

                    ( nonterminal A, conflict terminal c )

    where the conflict terminal "c" is a symbol that predicts at least two 
    different A-productions at lookahead level one.
    

The LL(2) conflict resolution table for grammar 5.3 is shown in figure 5.1. It was built by applying the PREDICT construction to the part of each of the three productions that follows the conflict symbol "a" on the right hand side of those productions. This is in contrast to the traditional method of constructing an LL(2) parse table from the PREDICT2 sets. Figure 5.2 shows part of the traditional LL(2) parse table and the sets from which it is constructed. Only the relevant columns are included. The full parse table would contain 42, or 16 columns, one for each possible lookahead string of length two.





Lookahead[2] a b c d ------------------------ | Conflict[1] (A, a) | 1 2 3 |


Figure 5.1 LL(2) Conflict Resolution Table For Grammar 5.3




FIRST2 ( ab ) = ab FIRST2 ( ac ) = ac FIRST2 ( ad ) = ad PREDICT2 ( A --> a b ) = 1 PREDICT2 ( A --> a c ) = 2 PREDICT2 ( A --> a d ) = 3 Lookahead ab ac ad ---------------------- | Nonterminal A | 1 2 3 |

Figure 5.2 Partial LL(2) Parse Table For Grammar 5.3

The application of the PREDICT construction to a part of the right hand side of a production instead of to the full right hand side seems intuitively correct in the case of a simple grammar like grammar 5.3. However, further justification is in order for more complicated grammars. Consider the following two grammars:

                        A  -->  a  b  c  d                 Grammar 5.4

A --> a B Grammar 5.5 B --> b c d

Clearly these two grammars are equivalent in that they both recognize the same simple language, "abcd". Grammar 5.5 has the effect of separating out the second lookahead symbol of grammar 5.4 into a new production. In this form, it is clear that the 2-length lookahead string "ab" can be defined as follows.

                        FIRST2 ( abcd )  =  FIRST ( a )  FIRST ( B )

This simple example illustrates the more general principle that a lookahead string can be broken down into a concatenation of its component parts. In principle, any SLL(k) grammar can be rewritten in a way similar to the way that grammar 5.4 was rewritten into grammar 5.5. More generally, the right hand side of any production that contains a terminal symbol can always be split into two different parts as follows.

                        A  -->  alpha a beta           Grammar 5.6

A --> alpha a B Grammar 5.7 B --> beta

Here alpha and beta are possibly-empty strings of symbols and "B" is a newly-generated nonterminal symbol. Clearly this type of transformation can never alter the language that is accepted by the original grammar, so the new grammar can be considered to be equivalent to the old grammar. Since it is always possible to make this type of grammar transformation, it is also possible to consider the right hand side of a production of the form of the production in grammar 5.6 as actually consisting of two separate right hand side parts. The first part is that part of the production up to and including the terminal symbol in question, and the second part is whatever follows the terminal symbol. The part of the production that follows the terminal symbol can be thought of as the right hand side of a virtual production, and so the PREDICT construction can be applied to this grammar fragment in exactly the same way that it can be applied to the right hand side of any other production in the grammar. Successive applications of this principle allow any SLL(k) grammar to be analyzed one lookahead level at a time, rather than having to consider the entire k-length lookahead string at once. This procedure can be summarized as follows.

        1.  Find the conflict terminal symbol in the context
            of a production.

        2.  Apply the PREDICT construction to what follows
            the terminal symbol to construct a row in a conflict
            resolution table.

        3.  If more conflicts are found in the new row, continue
            this procedure in the context of the new conflicts
            to construct a new table that operates at the next
            lookahead level.

This technique of analyzing an LL(k) grammar one lookahead level at a time will be referred to as the conflict resolution method of grammar analysis. The following sections show how the conflict resolution method can be applied to LL(k) grammars for specific values of k.

    Definition:  The conflict resolution method of grammar analysis
    of SLL(k) grammars begins with a construction of the standard
    LL(1) parse table. Any conflicts in the LL(1) table are resolved
    by constructing an LL(2) conflict resolution table. Conflicts in
    the LL(2) CR table are resolved by the construction of an LL(3) CR table, 
    and so on, ending with the construction of the LL(k) conflict
    resolution table.
    

LL(0) Grammar Conflicts

The LL(0) grammars are the simplest variant of the LL family of context-free grammars. Since the lookahead length k is zero, the parse algorithm must make all parsing decisions without the benefit of any input lookahead information. This means that there is no way to distinguish between two different productions for the same nonterminal. As a result, any reduced LL(0) grammar must have exactly the same number of productions as nonterminals. That is, the grammar has one production per nonterminal in the grammar. Grammar 5.8 is an example of an LL(0) grammar.

                        A  -->  a B C                      Grammar 5.8
                        B  -->  b
                        C  -->  c D
                        D  -->  d

As is the case with all LL(0) grammars, grammar 5.8 only derives one string. In this case, the single string that can be derived from the grammar is "abcd". The ability to derive only a single string is a severe limitation that makes the LL(0) class of grammars somewhat uninteresting, and not very useful in practice. However, the nature of the LL(1) class of grammars can be better understood in light of their relation to the LL(0) grammars. Consider the following grammar that is a slight modification of grammar 5.8:

                        A  -->  a B C                      Grammar 5.9
                        B  -->  b
                        B  -->  e
                        C  -->  c D
                        D  -->  d

Grammar 5.9 has two alternate productions for nonterminal "B", so it is not an LL(0) grammar. Since grammar 5.9 is only slightly changed from grammar 5.8, it is tempting to say that grammar 5.9 is almost LL(0). This is a reasonable statement since without the third production, grammar 5.9 would indeed be LL(0). Another way of looking at grammar 5.9 is to say that it is an LL(0) grammar that contains a single LL(0) conflict. That conflict is between productions two and three. The LL(0) parsing methodology is unable to decide which of these productions should be used to expand nonterminal "B" when it occurs during a parse. However, the LL(0) conflict is easily resolved at the LL(1) level. This is because the LL(1) algorithm uses a single lookahead token to determine which alternate production to use to expand a nonterminal that has more than one production. In this case, a lookahead of "b" predicts production 2 and a lookahead of "e" predicts production 3.

The traditional approach to analyzing grammar 5.9 would be to recognize it as an LL(1) grammar, and to use the LL(1) parsing methodology to analyze and to parse the grammar. This approach ignores the fact that grammar 5.9 is primarily an LL(0) grammar that contains only one LL(1) construct. Another method of handling this grammar would be to treat the LL(1) construct as an exception that must be handled differently from the rest of the grammar that is LL(0). figure 5.3 shows the LL(1) parse table for grammar 5.9.





Lookahead a b c d e ------------------------------ | A | 1 | B | 2 3 Nonterminals | C | 4 | D | 5 |


Figure 5.3 LL(1) Parse Table For Grammar 5.9

Figure 5.4 is the parse table for grammar 5.9 when it is treated as an LL(0) grammar that contains a single LL(0) conflict. The negative entry specifies both that a conflict has occurred, and that the conflict can be resolved by consulting the row of an LL(1) table that is the absolute value of the LL(0) table entry. Figure 5.3 is the LL(1) parse table that can be used to resolve the LL(0) conflict that occurs in grammar 5.9. Even for this tiny grammar, the total table size is significantly reduced by treating grammar 5.9 as a primarily LL(0) grammar that contains one conflict. The size of the LL(1) parse table is 20 versus the combined sizes of the LL(0) parse table and the LL(1) conflict resolution table, which is only 9. Since the LL(1) parse table has two dimensions, it will grow much more rapidly than the corresponding 1-dimensional LL(0) table as the size of the grammar is increased. Of course, the size of the conflict resolution table will be proportional to the number of LL(0) conflicts in the grammar. For grammars that are primarily LL(0), the overall space savings will be substantial if the conflict resolution method is used instead of the traditional LL(1) methodology.





------------ | A | 1 | B | -1 | LL(0) parse table C | 4 | D | 5 | a b c d e ---------------------------- | Conflict table B | 2 3 |


Figure 5.4 LL(0) Parse Tables For Grammar 5.9

The LL(1) Parse Table as LL(0) Conflict Resolution

The previous section showed that grammar conflicts at the LL(0) level can be resolved by using a partial LL(1) parse table to augment the LL(0) parse table. That was the perspective from the LL(0) point of reference. In this section, the same concept will be viewed from the LL(1) perspective. That is, the LL(1) parse table will be considered to be a conflict resolution table for an LL(0) grammar, even though the grammar may in fact be primarily, or even completely LL(1). Even so, it is still possible to view the LL(1) parse table as a method of resolving LL(0) conflicts. If the LL(1) parse table contains a row for every nonterminal in the grammar, then it is not necessary to construct or to use an LL(0) parse table. Such an LL(1) parse table is used in the traditional LL(1) methodology, so what is being proposed here is simply a different view of that methodology. The reason for introducing a new way of looking at the LL(1) parsing method is to lay a foundation for an extension of that method to the LL(k) grammars for values of k that are greater than 1.

If the LL(1) parse table is to be viewed as a conflict resolution table, there must be some way to incorporate a representation of the conflict terminal symbol into the table. This naturally brings up the question, "What is the conflict terminal symbol for an LL(0) conflict?" Since an LL(0) grammar does not use lookahead, and since all A-productions in an LL(0) grammar conflict with each other regardless of the content of the input string, it would appear that there is no conflict symbol for LL(0) conflicts. In the terminology of context-free grammars, there is a standard representation for the lack of a symbol. It is the null string, epsilon. The zero-length lookahead terminal symbol for each production in an LL(0) grammar is the null string, epsilon. Using the same classification of conflicts that was previously introduced, the form of the LL(0) conflicts would be ( A, epsilon ) for the A-productions in a grammar. Using this new terminology, the parse table for grammar 5.9 can be rewritten in the form:

                        conflicts[0]  x  lookahead[1]




Lookahead[1] a b c d e ------------------------------- | (A, null) | 1 | (B, null) | 2 3 Conflict[0] | (C, null) | 4 | (D, null) | 5 |


Figure 5.5 LL(1) Conflict Table For Grammar 5.9

Figure 5.5 illustrates this view of the parse table for grammar 5.9. Figure 5.5 is simply an alternate representation of figure 5.3. No meaningful information has been added to or removed from the parse tables, so the tables in figure 5.3 and figure 5.5 can be considered to be identical from the perspective of the information that they contain. In practice, either table could be used by the LL(1) parsing algorithm, since that algorithm only uses the index values of the rows, not their literal representation.




        A  -->  a B C
        A  -->  a B epsilon C
        A  -->  a B epsilon C epsilon
        A  -->  a epsilon B epsilon C epsilon
        A  -->  epsilon a epsilon B epsilon C epsilon


Figure 5.6 Adding epsilon To a String

figure 5.6 shows several different representations of the same string of symbols. The figure is meant to illustrate the fact that epsilon can be introduced at any point in any string without changing the meaning of the original string. One of the equivalent representations of the string in figure 5.6 has epsilon as the leftmost symbol on the right hand side of an A-production in some grammar. Grammar 5.10 is grammar 5.9 with every production modified in that way.

                        A  -->  epsilon a B C                        Grammar 5.10
                        B  -->  epsilon b
                        B  -->  epsilon e
                        C  -->  epsilon c D
                        D  -->  epsilon d

Grammar 5.10 represents a change to the form of grammar 5.9, but not a change to its meaning. Clearly grammar 5.10 recognizes exactly the same language recognized by grammar 5.9, since the null string epsilon cannot contribute anything to any derivation in any language. This is one reason that epsilon does not usually appear in non-null productions. In fact, most grammar representation methods even leave epsilon out of the null productions in the grammar. An empty right hand side of a production is usually used to signify that the production derives the null string. However, just as a null production can include epsilon on its right hand side, so can any non-null production be thought of as beginning with the null string as shown in grammar 5.10. It may be somewhat redundant, but it cannot be considered to be incorrect.

The reason for beginning each production in an LL(1) grammar with the null string symbol epsilon is to introduce a slight modification to the LL(1) grammar analysis method. Of course, it is not necessary to actually include epsilon in the productions; it is enough to recognize that it can be thought of as being there in any production in any grammar. Further, the leftmost occurrence of epsilon on the right hand side of a production must be at the left edge of the production. This should be obvious from the last representation of the string in figure 5.6. If the occurrence of epsilon as the first symbol of the right hand side of a production is considered to be a terminal symbol that separates the right-hand side of the production into two parts, then the PREDICT construction can be applied to the part of the production that follows epsilon, as was illustrated earlier.

The standard PREDICT construction for LL(1) grammars can be modified to specify that the PREDICT sets are to be constructed from the part of each right hand side of a production that follows the leftmost occurrence of epsilon. Since this occurrence of epsilon is the LL(0) conflict terminal symbol, this change transforms the PREDICT construction into an LL(0) conflict resolution construction. The parse tables constructed using this conflict resolution method will of course be identical to the parse tables that would be constructed using the traditional LL(1) methodology. This is because the part of a production that follows the leftmost occurrence of the null string is the entire right hand side of the production, and that is what is also used in the standard PREDICT set construction method. So for LL(1) grammars, the conflict resolution method of parse table construction reduces to exactly the traditional method of parse table construction. Since the LL(1) parsing methodology is well known to be correct, the conflict resolution methodology must also be correct, for the case of LL(1) grammars.

The Conflict Resolution Tables

The structure of the conflict resolution tables is quite similar to the structure of the standard LL(1) parse table. The LL(1) conflict resolution table is identical to the LL(1) parse table in format. They both consist of nonterminals by terminals. The only difference is that the LL(1) conflict resolution table contains a new type of entry referred to as a conflict entry. The conflict entries must be distinguished from the other types of entries in the table because the conflict entries indicate that a different course of action must be taken by the parsing algorithm. A conflict entry in the LL(1) conflict resolution table refers to a row in the LL(2) conflict resolution table. More generally, a conflict entry in an LL(n) conflict resolution table refers to a row in an LL(n+1) conflict resolution table, for values of n that are greater than zero.

The structure of the LL(2) conflict resolution table departs more radically from that of the standard LL(1) parse table than does the LL(1) conflict resolution table. The columns in the LL(2) conflict resolution table consist of all of the terminal symbols in the grammar, as is the case with the standard LL(1) parse table. However, here the terminals refer to the second level of lookahead instead of the first level of lookahead. The rows consist of conflict[1] instead of nonterminals. A conflict[1] consists of a nonterminal and a terminal symbol that represent a grammar conflict in the LL(1) conflict resolution table. In general, an LL(n) conflict resolution table for values of n that are greater than one consists of

                    conflict[n-1] x lookahead[n]

The purpose of the LL(n) conflict resolution table is to resolve grammar conflicts at the n-1 level of input lookahead. An SLL(k) grammar requires k conflict resolution tables in the conflict resolution method.

Definition of Conflict Resolution Tables

Actually the definition of the rows in a conflict resolution table should be stated somewhat more generally. The conflict resolution rows represent the conflict of a nonterminal symbol on a lookahead terminal string. For an LL(n) CR table, the length of the lookahead string is n. When the lookup is performed in the LL(n) CR table using the lookahead string that is part of the conflict[n-1], the nth-position lookahead terminal from the column of the LL(n) CR table is concatenated onto the n-1 length lookahead string to give the n-length lookahead string that is required to parse an SLL(n) grammar construct. This lookahead string corresponds exactly to the n-length lookahead string that would represent the column of a traditional SLL(n) parse table. The only difference is in the way that the lookahead string is formed. In the conflict resolution method, the lookahead string is concatenated one terminal symbol at a time to produce the final n-length lookahead string. In the traditional SLL(k) parse table, the k-length lookahead string is a precomputed k-length permutation of terminals from the set of all terminal symbols in the grammar.

    Definition:  A conflict entry in an LL(n) conflict
    resolution table refers to a conflict resolution row in an
    LL(n+1) conflict resolution table. The conflict entry is
    formed from the previous n-1 conflict tables as follows.

        n-1 conflict entry = (...((A, a1),a2),...an-1)
        n   column = an
        n   table entry = ((...((A, a1),a2),...an-1), an)

    The conflict entry in an LL(n) conflict resolution table
    reduces to a pair of the form:

        ( nonterminal, n-length lookahead string )

    where, in this case, the nonterminal is "A" and the
    n-length lookahead string is:

        a1 a2... an

    The pair represents a conflict in the grammar of the nonterminal "A" 
    on the n-length lookahead string shown. If the grammar is SLL(n), the 
    conflict resolution table entry is the production number that resolves 
    the conflict, rather than another conflict entry.
    

    Definition:  A lookahead terminal string is a string of terminal 
    symbols from a grammar. An n-length lookahead terminal string for values 
    of n that are greater than zero is formed when a lookup is done
    in an LL(n) conflict resolution table as follows.

        old row = (...((A, a1),a2),...an-1)
        old column = an
        new table entry = ((...((A, a1),a2),...an-1), an)
                                  if the entry is a conflict entry,
                                  a production number otherwise

    Removing the nonterminal symbol and the parentheses gives the
    new lookahead terminal string that follows.
                    a1 a2... an-1 an
    

For the case of the LL(1) conflict resolution table, the 1-length lookahead string is the lookahead terminal symbol that represents the columns of the standard LL(1) parse table. In conflict resolution terminology, the LL(1) lookahead string is the concatenation of the null string from the LL(0) parse table with the first lookahead terminal symbol from the LL(1) conflict resolution table. This reduces to the familiar lookahead symbol of the standard LL(1) parse table.

Conflict resolution table lookup is similar to standard LL(1) parse table lookup with the following two exceptions:

        1.  There is a third type of table entry in addition to
            the traditional production number entries and error entries.
            The new type of table entry is the conflict entry.

        2.  More than a single table lookup may be required to
            get the production number that is used to expand the
            current nonterminal symbol. The number of tables that
            must be consulted can vary from 1 to k for an SLL(k)
            grammar.

The conflict resolution method requires that table lookup proceed table by table until the production number is found. Once the production number or an error entry is found, the parsing algorithm proceeds exactly as in the standard LL(1) methodology. It is common practice in standard LL(1) parsing to number the productions in the grammar from 1 up to the total number of productions. This numbering is used as the index into a table of productions that is used by the parsing algorithm. The number 0 is reserved as the null parse table entry. A zero parse table entry indicates that the current nonterminal/lookahead combination does not predict any production in the grammar. This of course means that the input string cannot be a valid sentence in the grammar.

An implementation of the conflict resolution method requires an additional numbering that is easily distinguishable from the traditional parse table entries. The negative integers serve this purpose quite well. Whenever a parse table lookup is done, a test must be made to see if the entry is valid. That is, is the entry a positive integer rather than zero. Adding another test to check for negative entries in the conflict resolution table seems straight-forward enough. If a negative entry is encountered, the absolute value of that entry can be conveniently used to indicate the row index of the next conflict resolution table that is to be consulted in the search for a valid production number. This approach is similar to the exception handling that is invoked when a table entry of zero is encountered in the standard LL(1) parsing method. The algorithm that is used for conflict resolution table lookup is summarized in figure 5.7.



1. n = 1 2. entry = current nonterminal number 3. table = LLtable [n] 4. entry = table [entry] [lookahead[n]] 5. if entry < 0 n = n + 1 entry = |entry| go to 3 6. standard LL(1) procedure here 7. go to 1
Figure 5.7 Algorithm For Conflict Resolution Table Lookup.

The structure of most SLL(k) grammars dictates that the number of table lookups is variable for different parsing contexts when the conflict resolution method is used. Most SLL(k) grammars contain constructs that can be parsed using less than a k-length lookahead. If the lookahead string length that is required to parse a particular grammar construct is n, then exactly n conflict resolution tables need to be consulted to obtain the production that is used to expand the nonterminal symbol currently being parsed. The nature of the conflict resolution method itself also dictates that the number of table lookups is variable. This is because the conflict resolution method works by resolving conflicts that occur at each level of lookahead as they are encountered. The procedure naturally stops at whatever lookahead level finally resolves the original conflict. For most SLL(k) grammars, this number varies from 1 to k depending on the nonterminal being parsed. It would be possible to always use all k tables for each lookup, but that would be unnecessarily inefficient as well as being simply unnecessary.

As an optimization, all of the conflict resolution tables can be concatenated together to form a single table. This simplifies the table lookup algorithm by eliminating the need to change tables. Though the conflict resolution tables are logically separate, their structure is identical. The rows represent conflicts at different lookahead levels in the different conflict resolution tables. This difference can be overcome by simply renumbering the conflicts in each table so that the first row in one table is numbered one greater than the last row of the preceding conflict resolution table. The columns in a conflict resolution table represent the terminal symbols in the grammar at the different lookahead levels. However, the particular lookahead level of the terminal symbol in question is not necessary to the mechanics of the table lookup procedure. Only the terminal symbol itself is required in order to know which column to use.

The LL(1) Conflict Resolution Table

The LL(1) conflict resolution table is used as the basis for the LL(k) conflict resolution method. It is the first table that is constructed, bypassing the LL(0) parse table. This is because it is not necessary to construct the LL(0) parse table in order to construct the LL(1) conflict resolution table. As was shown earlier, the LL(1) conflict resolution table is identical in form to the LL(1) parse table, if the conflict resolution table includes a row for every nonterminal in the grammar. From the perspective of conflict resolution, this may introduce a slight redundancy in the table. That is, the LL(1) conflict resolution table may include rows to resolve conflicts that do not exist at the LL(0) level. This will be the case for any nonterminal that has only a single production in the grammar. However, there are several reasons for defining the LL(1) conflict resolution table differently from the LL(n) CR tables for values of n that are greater than one. These reasons are summarized as follows.

        Ability to use the standard LL(1) construction method.
        Consistency in the parsing algorithm.
        Improved error detection during parsing.

There are several benefits to using the standard LL(1) table construction method to build the LL(1) conflict resolution table. The standard LL(1) methodology is simple, efficient, and already widely used to parse LL(1) grammars. Extending this familiar algorithm to LL(k) grammars for values of k that are greater than one seems both natural and logical. Using the standard LL(1) parse table eliminates the need to compute and to maintain an LL(0) table. This tactic is consistent, since the LL(0) parse table cannot be considered to be a conflict resolution table. Thus the tables are all conflict resolution tables that are defined similarly. Also, the computation of the FIRST and the FOLLOW sets constitutes the bulk of the work that is done to build the LL(1) parse table. These sets will be shown to be needed in the LL(n) CR table construction for values of n that are greater than one. So, in fact, very little extra work needs to be done to produce the full LL(1) parse table versus the work that would be required to produce an LL(1) conflict resolution table that did not contain any redundancies from the perspective of LL(0) conflict resolution.

If the standard LL(1) parse table is used, then so can the standard LL(1) parsing algorithm be used in the conflict resolution method. This means that all nonterminals in the grammar can be treated uniformly by the parsing algorithm because there is a row in the parse table for every nonterminal in the grammar. For table-driven parsing, this is an important optimization. It would be awkward for the parsing algorithm to be required to consult two different parse tables just because of the fact that some nonterminals in the grammar have only a single production, and therefore do not require an entry in the LL(1) parse table. The ability to use the entire standard LL(1) parsing methodology as the basis for the conflict resolution methodology is beneficial as described above. The full LL(1) parsing methodology consists primarily of two phases as follows.

        1.  A grammar analysis phase to build the parse table.
        2.  The parsing of input strings using the parse table.

Error detection is somewhat better in the LL(1) parsing method than in the LL(0) method because of the fact that the lookahead is consulted in all cases. This can result in earlier error detection since an illegal input string can be detected before the selection of the next production in the parse. In the LL(0) parsing method, an input error cannot be detected until a terminal symbol is encountered in the parse. However, the LL(1) parsing algorithm uses the combination of both the nonterminal symbol being expanded and the lookahead symbol to determine which alternate production should be chosen. If none of the productions are predicted by the current lookahead symbol, then an error is immediately reported. Early error detection is generally considered to be a valuable feature of any parsing methodology.

While the LL(1) conflict resolution table conceptually resolves LL(0) conflicts, its definition varies slightly from that of the other tables due to its special position in the hierarchy of the conflict resolution tables. Since it is the base table of the conflict resolution methodology, it is defined exactly the same way that the standard LL(1) parse table is defined. This allows the standard LL(1) parsing algorithm to be used to parse LL(1) grammars. The benefits of this approach have been enumerated. There are only two modifications to the general conflict resolution method that need to be made to be able to use the standard LL(1) parse table as the LL(1) conflict resolution table. One is to recognize that the conflict nonterminal representation (A, epsilon ) can be reduced to simply "A" without loss of information. The other is to allow null-conflict rows in the LL(1) conflict resolution table. That is, rows for which the nonterminal in question has only a single production in the grammar. This allows the LL(1) conflict resolution table to contain a row for every nonterminal in the grammar, as required by the standard LL(1) parsing methodology.

The LL(n) Conflict Resolution Table

The rows of the LL(n) conflict resolution tables for values of n that are greater than one are all defined in terms of the previous conflict resolution table, the LL(n-1) CR table. This is because the purpose of the LL(n) conflict resolution table is to resolve any parsing conflicts that occur in the LL(n-1) CR table. One result is that the sizes of the LL(n) conflict resolution tables can vary widely for different values of n, depending on the particular grammar that is being analyzed. All of the LL(n) conflict resolution tables have the same number of columns, one for each terminal symbol in the grammar. The difference in the columns from table to table is that the terminals represent the lookahead level that corresponds to the level of the conflict resolution table. So the columns in the LL(n) CR table represent the lookahead[n] terminals. This method of representing lookahead strings will be shown to be much more space efficient than the traditional SLL(k) method of using a single parse table that contains a column for every k-length permutation of the terminal set.

Both the first and the last conflict resolution tables are slightly different from the other LL(n) CR tables. As pointed out earlier, the LL(1) conflict resolution table may contain null-conflict rows that are not allowed in any of the other tables. The LL(k) conflict resolution table is slightly different from the other tables in that it cannot contain any conflict entries, if the grammar is in fact SLL(k). This should be clear since a conflict entry in the LL(k) CR table signifies a grammar conflict at the k-lookahead level, implying that the grammar is not SLL(k). This feature of the conflict resolution method means that a given SLL(k) grammar's k-value is automatically determined by the grammar analysis. This is in contrast to the traditional method of performing an SLL(k) analysis on a grammar by first guessing a value of k, and then attempting to build the traditional SLL(k) parse table using the FIRSTk and the FOLLOWk sets to construct the PREDICTk sets. Of course, an upper bound for k is required by the conflict resolution method as with any other method, since without an upper bound on k the problem of determining whether or not a grammar is SLL(k) for some k is undecideable.

Conflict Resolution Table Size Analysis

The size of the standard SLL(k) parse table is fixed by k and by the size of the grammar. There is no variability in the size of the table depending on the form of the grammar. For a grammar G = { N, T, P, S } where

                        N = nonterminals
                        T = terminals
                        P = productions
                        S = start symbol
the size of the SLL(k) parse table is always:
                        |N|  ·  |T|k

The exponential number of columns is required because the naive extension of the LL(1) parsing method to SLL(k) for values of k that are greater than one must account for all possible k-length lookahead strings. The more efficient table organization of the conflict resolution method results in a total parse table size that is smaller than the size of the standard SLL(k) parse table. In the worst case, both methods result in exponential size tables, so the difference is not significant. However, for many common grammars, the size of the conflict resolution tables is closer to the size of the LL(1) parse table than to the size of the SLL(k) parse table.

Worst Case Size Analysis

The size of the LL(1) conflict resolution table is the same as that of the standard LL(1) parse table. It is

                        |N|  ·  |T|

The size of the LL(n) conflict resolution tables for values of n that are greater than one is variable, depending on the number of conflicts present in the grammar. In order to do a worst case size analysis, it is necessary to determine the form of the grammar that yields the maximum number of conflict rows for the size of the grammar. This maximizes the total table size for a given grammar size. Recall that a conflict of the form ( conflict production list, conflict lookahead symbol ) at the n-1 lookahead level is resolved as a conflict row in the LL(n) conflict resolution table. Clearly there is a relationship between the number of productions in a grammar and the number of conflict rows in a given conflict resolution table. Since the conflict production list consists of two or more different productions, the minimum number of productions per conflict resolution row must be two. There may more than two productions per row, but an absolute minimum of two productions are required to define a conflict resolution row.

In terms of conflict resolution table size, the worst-case grammar is one that generates the maximum number of conflicts for a given size grammar and for a given k value. The family of left linear grammars shown in figure 5.7a is such a grammar.



A1 --> A2 a A1 --> A2 b A2 --> A3 a A2 --> A3 b A3 --> A4 a A3 --> A4 b · · · Ak --> a Ak --> b
Figure 5.7a Worst-case SLL(k) Grammars

In this family of grammars, the k subscript in the last two productions is also the k-value of the SLL(k) grammar. The number of nonterminals in the grammar is also equal to k. The number of productions in the grammar is 2k. The grammar contains only two terminal symbols, a and b. The language generated by this regular grammar is all k-length strings of a's and b's. The number of these strings is |T|k = 2k

The size of the conflict resolution tables for this grammar is exponential in k. This is because the number of conflicts at each successive lookahead level grows exponentially up to a maximum of 2k-1 conflicts at lookahead level k-1. Since the grammar is SLL(k), all of these conflicts are resolved in the kth conflict resolution table. That table has 2k entries. This means that the total size for the conflict resolution tables is on the order of |T|k just as with the standard SLL(k) parse table. However, it will be shown that the actual size of the conflict resolution tables is somewhat less than that of the SLL(k) parse table for the worst-case grammar. For typical grammars, the size of the conflict resolution tables will be shown to be significantly smaller than that of the standard SLL(k) parse table.

The analysis of the size of the conflict resolution tables for the grammar given in figure 5.7a is more complicated than it might at first appear to be. This is due to the fact that the grammar contains constructs that are SLL(n), where n is less than k. Each conflict resolution table consists of a variable mix of conflict and non-conflict entries up to lookahead level k-1. Table 5.7b gives some statistics for the SLL(6) grammar from the family of grammars shown in figure 5.7a.

Table 5.7b Statistics For The Worst-case SLL(6) Grammar
Table Number Conflict Entries Non-conflict Entries Total Entries
1 10 2 12
2 16 4 20
3 24 8 32
4 32 16 48
5 32 32 64
6 0 64 64

The only obvious pattern from this table is that the number of non-conflict entries is 2n, where n is the table number. Table 5.7c factors the statistics from Table 5.7b in a way that shows a regular pattern that can be analyzed.

Table 5.7c Factored Statistics For The Worst-case SLL(6) Grammar
Table Number Conflict Entries Non-conflict Entries Total Entries
1 5 · 21 21 12
2 4 · 22 22 20
3 3 · 23 23 32
4 2 · 24 24 48
5 1 · 25 25 64
6 0 · 26 26 64

As can be seen in Table 5.7c, the size of the ith conflict resolution table is given by

                2i + ( k - n ) 2i
Since the size of the terminal set for this grammar is two, this can be rewritten as
                |T|i + ( k - i ) |T|i
Rearranging terms converts this to
                ( k - i + 1 ) |T|i
The total size of all of the conflict resolution tables for an SLL(k) grammar from the family of grammars in figure 5.7a is the sum of these terms for i equals 1 to k as follows.
                Total Size  =  sumi=1 to k ( k - i + 1 ) |T|i

The sum of the sizes of the conflict resolution tables for a grammar must always be less than the corresponding size of the standard SLL(k) parse table. Recall that the SLL(k) parse table has size

                        |N|  ·  |T|k
For the grammars given in figure 5.7a,
                        |N|  =  k
so the size of the SLL(k) parse table can be expressed as
                        k  ·  |T|k
This corresponds to k tables of size |T|k. The conflict resolution method also results in k tables. However, only the last two tables have size |T|k. The other tables are smaller, so the overall size of the conflict resolution tables must be smaller than the size of the standard SLL(k) parse table, for the worst case grammar with k greater than two.

Typical Case Size Analysis

The analysis of the typical size of the conflict resolution tables is quite different from the analysis of the table size in the worst case. The primary difference is in the assumptions that are made about the number of conflicts that are present in a typical SLL(k) grammar. In the worst case analysis, the number of conflicts was taken to be the maximum number possible for a given grammar size. For SLL(k) grammars that are used in practical applications such as common programming languages, the number of conflicts that are present is significantly less than the maximum number that is possible. In fact, most programming languages can be expressed as an LL(1) grammar by applying common techniques such as left-factoring to a few of the productions in the grammar. These grammar transformations convert the SLL(2) constructs of a grammar into LL(1) constructs. Clearly a typical SLL(k) grammar can be assumed to be primarily LL(1), with a small percentage of SLL(k) constructs for values of k that are greater than one.

For the purposes of this analysis, the number of conflicts in a typical SLL(k) grammar will be assumed to be about ten percent of the number of productions in the grammar. That is, ten percent of the productions conflict at every level of lookahead from 1 to k. This percentage is probably a little high, but it should be good enough for an approximate analysis. As with the worst case analysis, the size of the LL(1) conflict resolution table is the same as the size of the standard LL(1) parse table as follows.

                |N|  ·  |T|
The size of each of the LL(n) CR tables for values of n that are greater than one is the number of conflict rows per table times the size of the terminal set as follows.
                (number of conflict rows)  ·  |T|
If it is assumed that 10% of the productions conflict, and that each of those conflicts results in a separate conflict row, then the size of each LL(n) CR table is:
                floor ( (0.10 / 2) |P| )  ·  |T|
The number is divided by two because it takes two productions to produce a single conflict. The assumption that each conflict results in a separate conflict row is again tending to err on the high side. Combining these results yields a total typical table size of:
                |N| · |T| + (k - 1) floor ( (0.10 / 2) |P| )  ·  |T|
Rearranging the terms yields:
                |T|  ·  ( |N|  +  (k - 1)  floor ( 0.05 |P| ) )

Table 5.1 compares the typical case table size of the conflict resolution tables with the size of the standard SLL(k) parse table for several values of k. The size of the grammar used in the table is approximately the size of a small programming language like Pascal or C. The grammar has the following dimensions:

                        |P| = 280
                        |N| = 130
                        |T| =  80

Table 5.1 Comparison of SLL(k) Table Sizes
k value Typical Case CR table Standard SLL(k) table
1 10400 1.040e+004
2 11520 8.320e+005
3 12640 6.656e+007
4 13760 5.325e+009
5 14880 4.260e+011
6 16000 3.408e+013
7 17120 2.726e+015
8 18240 2.181e+017
9 19360 1.745e+019
10 20480 1.396e+021
11 21600 1.117e+023
12 22720 8.934e+024
13 23840 7.147e+026
14 24960 5.717e+028
15 26080 4.574e+030



PREDICT-k Function Definition

The approach of the conflict resolution method of grammar analysis is to split the overall SLL(k) analysis problem into k smaller subproblems that are each relatively easy to solve. The first subproblem is the construction of the LL(1) conflict resolution table using the standard LL(1) parse table construction methods. The only difference from the traditional method is that if conflicts are detected in the LL(1) CR table, they are maintained on a list for later use. In contrast, the traditional LL(1) table construction halts and reports failure if a grammar conflict is encountered.

Each of the remaining k-1 subproblems involves building a conflict resolution table that is similar to the standard LL(1) parse table. The conflict resolution tables are similar to the standard LL(1) parse table in that they represent a PREDICT function that determines the proper parsing action to take given a single token of lookahead. The difference is in the lookahead level of the single token. In the conflict resolution method, the lookahead level of the token matches the number of the conflict resolution table. So the lookahead level of the LL(n) CR table is n. A column in the LL(n) CR table represents the nth token in a lookahead string of length n.

Each single-level PREDICT function in the conflict resolution method bears a resemblance to the standard PREDICT function of the SLL(1) method in that the prediction of the parsing action is based on a single token of lookahead. However, the overall conflict resolution PREDICT function is a combination of single-level PREDICT functions that results in making a PREDICTk parsing decision, as does the standard SLL(k) PREDICTk function. It should be apparent that the conflict resolution method PREDICT function is a hybrid between the PREDICT and the PREDICTk functions. This being the case, some new terminology is indicated in order to avoid confusion when discussing the various types of PREDICT functions. The single-level PREDICT functions that are used in the conflict resolution method will be distinguished from the other types of PREDICT functions by referring to them as PREDICT[n] functions.

    Definition:  A PREDICT[n] function is the function that 
    corresponds to doing lookups in an LL(n) conflict resolution table. 
    The PREDICT[n] function maps a conflict[n-1] and a lookahead[n]
    terminal symbol onto an integer. The integer may have different 
    meanings as follows.

            positive = a production number
            negative = a conflict entry
            zero = an error entry
    

The PREDICT-k function that is defined below is similar to the standard PREDICTk function in that the end result of both is to map a nonterminal symbol and a lookahead string onto a production in a grammar. The production number zero represents the null-production in both cases. The PREDICT-k function differs from the PREDICTk function in the way that its final mapping is generated, and in the fact that the length of a given lookahead string may vary from 1 to k. The PREDICTk function always maps a nonterminal symbol and a k-length lookahead string onto an integer that represents a production number. The PREDICT-k function uses a variable number of PREDICT[n] functions to produce a result that corresponds to the mapping of a nonterminal symbol and a lookahead string of length from 1 to k onto a production number.

    Definition:  The PREDICT-k function for an SLL(k) grammar 
    maps a nonterminal and a lookahead string of length at most k
    onto a production in the grammar. The mapping is indirect
    through the use of at most k PREDICT[n] functions. The PREDICT-k 
    function is referred to as a PREDICT-n function in the case 
    where the length of the lookahead string required is n, and n is 
    less than k. In this case, the PREDICT[n] function produces the 
    final mapping to a production number.
    

The PREDICT-k function is unusual in that it is defined slightly differently depending on the value of the domain element in question. This is because the PREDICT-k function uses only as much lookahead as is required to parse a given grammar construct. For most SLL(k) grammars, this amount of lookahead varies from 1 to k tokens. This means that the application of the PREDICT-k function consists of the sequential application of several PREDICT[n] functions. The variable nature of the PREDICT-k function makes it resemble an algorithm more than a function. figure 5.8 describes the PREDICT-k function in the terms of an algorithm.



1. n = 1 2. PREDICT-k = PREDICT[n] 3. if PREDICT-k < 0 n = n + 1 go to 2
Figure 5.8 Algorithm For PREDICT-k Computation

PREDICT-k and the Lookahead String

The key to understanding how the PREDICT-k function works is to see how the lookahead string is constructed one token at a time by successive applications of PREDICT[n] functions for values of n from 1 up to the length of the required lookahead string. Take the following simple grammar as an example:

                        A  -->  a b c d                    Grammar 5.11
                        A  -->  a b c x

This grammar is SLL(4) because the two productions can be differentiated at the fourth level of lookahead. The standard PREDICTk function would be as follows.

                PREDICTk (A, abcd) = 1
                PREDICTk (A, abcx) = 2

This simply states that production number 1 is predicted on a lookahead of abcd for nonterminal A, and that production 2 is predicted on a lookahead of abcx for nonterminal A. The same result is achieved by the PREDICT-k function in four stages. The prediction of production 1 would proceed as in figure 5.9 using the PREDICT-k function.



PREDICT-k = PREDICT[4] ( PREDICT[3] ( PREDICT[2] ( PREDICT[1] ( A, a ), b ), c ), d )
Figure 5.9 Successive Applications of PREDICT[n]

Figure 5.9 illustrates how the PREDICT-k function is a combination of the successive applications of PREDICT[n] functions for values from 1 to 4, in this case. The PREDICT[1] function operates on the nonterminal in question and the first token of the lookahead terminal string. The PREDICT[2] function has the result of the PREDICT[1] function and the second token of the lookahead string as its arguments. Similarly, PREDICT[3] takes PREDICT[2] and the third token of the lookahead string to produce a result that, along with the fourth lookahead token, is operated on by PREDICT[4]. In this case, where the grammar construct is SLL(4), the result of the PREDICT-k function is equal to the result of the PREDICT[4] function. In general, a PREDICT-n function result is equal to the result of a PREDICT[n] function.

Another point to note from figure 5.9 is that the PREDICT[n] function can be defined in terms of the PREDICT[n-1] function and the nth lookahead symbol. Since the PREDICT-n function is in a practical sense equal to the PREDICT[n] function, the PREDICT-n function can also be thought of as being defined this way. This does not contradict the definition of PREDICT-k that was given previously, it is simply a different view of the PREDICT-k function when a particular SLL(n) grammar construct is being considered within a larger SLL(k) grammar. Of course, the value of n must always be less than or equal to the value of k. This view of the PREDICT-n function is represented as follows.

                PREDICT-n = PREDICT[n] ( PREDICT[n-1], lookahead[n] )

The format of the argument string for the PREDICT-k function in figure 5.9 is

                ( ( ( ( A, a ), b ), c ), d )
after removing the intermediate PREDICT[n] function applications. This reduces to
                ( A, abcd )
which is the same argument string that was given earlier in the representation of the PREDICTk function that produces production number one for grammar 5.11. This shows that the PREDICT-k function computes the same value that is computed by the PREDICTk function, but in a different way.

PREDICT-k Correctness

The PREDICT-k function computes a subset of the values computed by the PREDICTk function for SLL(k) grammars. The contention here is that this subset is sufficient to enable the recognition of any string in the language of a given SLL(k) grammar. This is true even when the subset is proper because the lookahead strings that are not represented by the new method are not needed by the new parsing algorithm. Examples of the types of lookahead strings that are not needed by the new parsing method are:

        1.  All lookahead strings of length k where the relevant
            grammar construct is SLL(n), and n is less than k.

        2.  All lookahead strings that cannot be part of some
            string that can be derived in the grammar.

These types of lookahead strings represent headings of the columns of the standard SLL(k) parse table. The mappings of the nonterminals in the grammar to the entries in these columns of the standard table have no corresponding mapping using the PREDICT-k function. These mappings are required by the standard SLL(k) parsing method because of its uniform, and naive approach to the problem. However, it should be apparent that in a fundamental sense, they are not required to recognize strings in the language of an SLL(k) grammar. In case one above, if the shorter lookahead string can be used by the parsing algorithm, then the need for all other k-length strings that have the short string as a prefix is obviated in that particular parsing context. Errors represented by the strings of case two are more efficiently detected at some prefix level. For instance, if terminal t can begin no derivation of an A-production in a grammar, then all k-length strings that start with t can be eliminated in that context.

The correctness of the PREDICT-k function requires the following:

        1.  The ability to represent all possible lookahead strings
            that are required for a given grammar. In the worst case,
            this number of strings is |T|k.

        2.  The correctness of each of the k PREDICT[n] functions
            for a given SLL(k) grammar. This is equivalent to the
            requirement that each of the k conflict resolution tables
            must be correct.

        3.  The correctness of the technique of combining the k
            PREDICT[n] functions to make parsing decisions.
            This is equivalent to the requirement that the new parse
            algorithm must be correct.

The first requirement above will be shown by example. The second requirement implies the need for a more general form of the PREDICT function, and so must be deferred for now. The third requirement will be proved on the assumption of the ability to construct correct conflict resolution tables.

The Lookahead String Representation

One difference between the PREDICT-k function and the PREDICTk function is that the PREDICT-k function does not necessarily require a mapping for all possible k-length lookahead strings. Only those lookahead strings that are required to resolve grammar conflicts are used in the PREDICT-k function. However, it is possible to represent all of the k-length lookahead strings in the conflict resolution method. If this were not the case, then some SLL(k) grammars could not be analyzed using the conflict resolution method.

In the conflict resolution method of grammar analysis, the lookahead string is represented as a sequence of single tokens. Each lookahead[n] is represented as a column in a LL(n) conflict resolution table for values of n from 1 to k. The full k-length lookahead string can be constructed by concatenating each column value in a conflict resolution table lookup as the tables are accessed. In the conflict resolution method, the complete lookahead string is not needed or used, but the string could be constructed if it were required for some reason. Since each LL(n) conflict resolution table contains a column for every terminal symbol in the grammar in question, it seems intuitively clear that all possible k-length strings could be constructed from a combination of the k tables. In each case, the result of PREDICT[n] refers to a row in the LL(n+1) conflict resolution table. This means that at each stage in the lookup process there are |T| possible continuations of the lookahead string at the next level of lookahead, so all |T|k lookahead strings can be represented by k tables.

Example Using All Possible Lookahead Strings

An example of a grammar that requires the use of all |T|k lookahead strings in the conflict resolution method illustrates that all of the possible lookahead strings can be represented using the conflict resolution tables. Consider the following grammar:

                        S  -->  A | B | C | D |            Grammar 5.12
                                E | F | G | H
                        A  -->  000
                        B  -->  001
                        C  -->  010
                        D  -->  011
                        E  -->  100
                        F  -->  101
                        G  -->  110
                        H  -->  111

The terminal set for this grammar is:

                        T = { 0, 1 }

Grammar 5.12 is contrived so that it is an SLL(3) grammar that requires all 8 possible lookahead strings. The lookahead strings correspond to the binary numbers from 0 to 7 for convenience of representation in the parse tables. figure 5.10 shows the SLL(k) parse table for this grammar.





Lookahead String Number 0 1 2 3 4 5 6 7 --------------------------------------------- | S | 1 2 3 4 5 6 7 8 | A | 9 | B | 10 | C | 11 Nonterminals | D | 12 | E | 13 | F | 14 | G | 15 | H | 16 |


Figure 5.10 SLL(3) Parse Table For Grammar 5.12

The conflict resolution tables for grammar 5.12 are shown in figure 5.11. Recall that a negative entry in the LL(n) CR table refers to the row in the LL(n+1) CR table that corresponds to the absolute value of that negative entry. The eight entries in the LL(3) CR table correspond to each of the eight possible 3-length lookahead strings, as well as to the first eight productions in the grammar. Note that in a grammar in which all of the lookahead strings are used, the LL(n) conflict resolution tables for values of n that are greater than one are completely filled. This is in contrast to the standard LL(k) parse table, which is very sparse.



Lookahead[1] 0 1 ----------------- | S | -1 -2 | A | 9 | B | 10 | C | 11 Nonterminals | LL(1) CR Table D | 12 | E | 13 | F | 14 | G | 15 | H | 16 Lookahead[2] 0 1 ----------------- | (S, 0) | -1 -2 Conflict[1] | LL(2) CR Table (S, 1) | -3 -4 Lookahead[3] 0 1 ----------------- | ((S, 0), 0) | 1 2 | ((S, 0), 1) | 3 4 Conflict[2] | LL(3) CR Table ((S, 1), 0) | 5 6 | ((S, 1), 1) | 7 8
Figure 5.11 Conflict Resolution Tables For Grammar 5.12

Correctness of the Parsing Algorithm

The correctness of the mechanics of the PREDICT-k function are considered here. That is, given the modified LL(1) parsing algorithm shown in figure 5.7 along with k correct conflict resolution tables for some SLL(k) grammar G, the PREDICT-k function predicts the correct parsing action to take at any point in the parse of some input string. The correctness of the individual conflict resolution tables requires that they be constructed using a generalized form of the PREDICT function that is not available at this point. What is shown here is that k correct conflict resolution tables can be used together to correctly parse strings in the language of an SLL(k) grammar.

Theorem 5.1. The algorithm given in figure 5.7 can be used to correctly parse strings in the language of an SLL(k) grammar. It is assumed that k correct conflict resolution tables for the grammar are available.

PROOF. In the case of an LL(1) grammar, the LL(1) conflict resolution table is identical to the LL(1) parse table. Since no conflict entries are present in the table, the part of the parsing algorithm that accesses higher level conflict resolution tables is never invoked. This means that the parsing algorithm reduces to the standard LL(1) parsing algorithm. The standard LL(1) parsing method is well known to be correct, so the conflict resolution parsing algorithm must also be correct for the case where k = 1.

The parse algorithm may encounter two types of grammar symbols. These are nonterminals and terminals. If a terminal symbol is encountered, it is matched against the input just as in the standard LL(1) method. So for terminal symbols, the new method again reduces to the old method, and must be correct for that reason.

For a nonterminal symbol, there are three possible types of entries in the LL(1) conflict resolution table:

                        1.  production number
                        2.  error entry
                        3.  conflict entry

Entry types 1 and 2 are the same as in the standard LL(1) table, and so give the same result as in that method. Entry type 3 indicates a row in the LL(2) conflict resolution table. This row number is correct by the assumption that all of the conflict resolution tables are correct. The parse algorithm uses the current nonterminal symbol and the current lookahead symbol at level one to perform lookups in the LL(1) conflict resolution table. This corresponds to the standard parsing algorithm, so the correct table entry must also be found by the new algorithm.

The algorithm uses the row number given by the previous table along with the lookahead symbol at the current lookahead level to perform a lookup in any of the tables from level 2 through level k. The resulting entry must be correct by the assumption that the conflict resolution tables are correct, and by the definition of the tables. The row number is correct because the previous table is assumed to be correct. The new table entry is correct because the conflict resolution tables are defined as mapping the row number obtained from the previous table and the lookahead token at the lookahead level that corresponds to the k-value of the current table to one of the 3 possible types of table entries given above.

The algorithm encounters one of the three possible entry types given above when doing a lookup in any of conflict resolution tables 2 through k-1. In case one, a production number is found, so the parsing algorithm uses that number to continue exactly as in the standard LL(1) parse. The production number must be correct, as explained above. In case 2, an error is detected and again the procedure to be followed is exactly as in the standard LL(1) method. This table entry is also correct by the reasoning given above. In case 3, the row number and the next input token are used to perform a lookup in the next conflict resolution table. The correctness of this case is also guaranteed by the reasoning given above. So for the tables 2 through k-1, either the lookup proceeds to the next table or the procedure reverts to the standard LL(1) parsing method.

If the table lookups continue up to the kth conflict resolution table, there are only two possible types of table entry. They are:

                        1.  production number
                        2.  error entry

As was the case in the previous tables, the correctness of the lookup procedure in the kth conflict resolution table is guaranteed. However, since no conflict entries are allowed in a correct kth conflict resolution table, the lookup procedure must always terminate with a correct production number or an error condition. This is guaranteed by the assumption that the table is correct, and that the grammar is SLL(k). These are the same two possibilities that are present in the standard LL(1) parse table, so the result of this table lookup must be the same as that of the standard method. That is, either the correct production number is found and the parse continues normally, or an error is detected and the parse halts or takes whatever other error recovery measures that the standard LL(1) method would perform.

Summary

This paper introduced the conflict resolution method of grammar analysis. The method is applicable to the strong LL(k) class of grammars. The conflict resolution method is quite different from most methods of grammar analysis in that it approaches the problem from the back end. That is, rather than analyze the grammar completely in a single stage, the conflict resolution method starts with the construction of the LL(1) parse table, and then attempts to resolve any conflicts that occur in that table by building an LL(2) conflict resolution table. If more conflicts are encountered in the LL(2) CR table, an LL(3) CR table is constructed, and so on up to a maximum of k tables for an SLL(k) grammar. The motivation for this approach is to reduce the size of the parser for SLL(k) grammars.

The PREDICT-k function was introduced as a hybrid between the PREDICT and the PREDICTk functions. As with the PREDICTk function, the PREDICT-k function has the ability to differentiate k-length lookahead strings in order to make parsing decisions. The difference is that the PREDICT-k function interprets the lookahead string one token at a time in a manner similar to the way that the PREDICT function works. For SLL(1) grammars, the PREDICT-k function reduces to the PREDICT function. The correctness of the PREDICT-k function was also investigated.



Home


Copyright © 2001-2009 SLK Systems