BranchTaken

Hemlock language insights

IELR(1) as Implemented by hocc

The hocc parser generator, which is part of the Hemlock programming language project, implements several LR(1)-family parser generation algorithms, namely LALR(1) [1], canonical LR(1) [2], PGM(1) [3,4], and IELR(1) [5]. These algorithms are amply documented and (re-)implemented, with the notable exception of IELR(1), which is documented only in the original paper and implemented only by the original authors in bison. This posed extreme implementation challenges in the context of hocc. The IELR(1) paper is closely tied to the particulars of the bison implementation, and perhaps for that reason the terminology and structure are closely based on the idiosyncrasies of DeRemer’s presentation of LALR(1). This terminology diverges substantially from that of Pager’s presentation of PGM(1), whence hocc took original inspiration. This report recasts the IELR(1) algorithm as distilled during hocc implementation, giving a pragmatic high-level perspective more conducive to straightforward (if less efficient) implementation than that provided by the original paper.

Introduction

Knuth [2] originated the “Left-to-right scan, Rightmost derivation in reverse” (LR) family of parser generation algorithms. The theory applies generally to languages recognizable by LR(k) parsers, where k denotes the number of tokens of lookahead. In practical use, k is almost always 1, in part because additional lookahead complicates implementation, but more importantly because most practical grammars can be easily rewritten to avoid the need for multi-token lookahead.

In 1965, canonical LR(1) in all its elegance posed serious implementation challenges due to ~10X state redundancy in the generated state machines. Lookahead LR(1) (LALR(1)) came along in 1969 as a practical compromise that collapses isocore sets (described later), even though doing so introduces parser inadequacies relative to the grammar specification. The Practical General Method (PGM(1)) was presented in its full form in 1977, and it dramatically improves on LALR(1) by avoiding parser inadequacies, with the important caveat that the algorithm can provide those guarantees only in the absence of disambiguation via precedence/associativity rules. PGM(1) never saw wide adoption, perhaps because LALR(1) was already widely implemented; nonetheless PGM(1) is strictly superior. IELR(1) stemmed from a research need for non-redundant parsers with no LR(1)-relative inadequacies. Although edge cases can in practice cause IELR(1) to generate a few redundant states, its parsers are much smaller than their canonical LR(1) counterparts, and IELR(1) definitively delivers on inadequacy elimination, thus assuring that canonical LR(1) and IELR(1) parsers recognize the same language for any given grammar.

The remainder of this report overviews the canonical LR(1) parser generation algorithm with a focus on concepts upon which IELR(1) builds, briefly describes LALR(1), then presents IELR(1) as implemented by hocc. The perspective is primarily LR(1)-relative, which differs substantially from the LALR(1)-relative exposition of the original IELR(1) paper.

Canonical LR(1)

Terminology

The following common example “arithmetic” grammar in hocc format suffices for defining various relevant terms as they are used in the hocc source code.

hocc
    left mul
    left add < mul
    token STAR "*" prec mul
    token SLASH "/" prec mul
    token PLUS "+" prec add
    token MINUS "-" prec add
    token INT
    token EOI
    nonterm MulOp ::=
      | "*"
      | "/"
    nonterm AddOp ::=
      | "+"
      | "-"
    nonterm Expr ::=
      | Expr MulOp Expr prec mul
      | Expr AddOp Expr prec add
      | INT
    start Answer ::= Expr EOI

A grammar comprises production rules, abbreviated as prod(s). Answer ::= Expr EOI and Expr ::= INT are examples of prods. A prod has a left-hand side (LHS), which is always a non-terminal symbol, abbreviated as nonterm. Answer and Expr are examples of nonterms. A prod also has a right-hand side (RHS), which is a sequence of nonterms and tokens. STAR, its alias "*", and EOI are examples of tokens.

An LR(0) item is a prod with associated position, where the current parsing position is indicated by a dot. For example, Answer ::= · Expr EOI, Answer ::= Expr · EOI, and Answer ::= Expr EOI · are distinct LR(0) items based on the same prod.

An LR(1) item is an LR(0) item with an associated follow set, i.e. a set of tokens which may immediately follow the prod. For example, MulOp ::= · "*", {INT} indicates that a multiplication operator may be followed only by an integer. For a less obvious example, Expr ::= · INT, {"*", "/", "+", "-", EOI} indicates that an integer may be followed by a math operator or end-of-input (EOI). Note that the dot position is not particularly relevant to the follow set.

An LR(0) item set, also known as a core, is simply a set of LR(0) items. Two cores are isocores if they are isomorphic, i.e. they comprise identical LR(0) item sets.

An LR(1) item set, also known as a kernel, is simply a set of LR(1) items, also known as kernel items. Two kernels are isokernels if they are isomorphic, i.e. they comprise identical LR(1) item sets. A kernel can be mapped to a core by extracting the LR(0) items from all LR(1) items. It is possible (and common in canonical LR(1) parsers) for non-isokernels to map to isocores.
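
For concreteness, the following OCaml sketch shows one plausible shape for these concepts, using simplified stand-in types rather than hocc’s actual (richer) modules:

(* Simplified stand-in types for prods, items, cores, and kernels. *)
type prod = { lhs: int; rhs: int array }   (* symbol indexes *)
type lr0_item = { prod: prod; dot: int }   (* dot position in [0 .. Array.length rhs] *)

module SymbolSet = Set.Make (Int)
type lr1_item = { lr0_item: lr0_item; follow: SymbolSet.t }

type core = lr0_item list      (* LR(0) item set *)
type kernel = lr1_item list    (* LR(1) item set *)

(* Mapping a kernel to its core discards follow sets; distinct kernels
   (non-isokernels) can therefore map to identical cores (isocores). *)
let core_of_kernel (k: kernel) : core =
  List.sort_uniq compare (List.map (fun it -> it.lr0_item) k)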

Parser generators have long supported grammar disambiguation via precedence and associativity. For example, in the example grammar mul has higher precedence than add, and both are left-associative. hocc differs from most parser generators in that precedences comprise an explicit, optionally disjoint directed acyclic graph, rather than a mostly implicit single linear precedence order. Furthermore, although hocc supports neutral associativity (%precedence in bison), it intentionally omits support for non-associativity (%nonassoc in YACC-family parser generators). These deviations from the status quo avoid masking grammar flaws and increase specification precision.

Work queue processing

The full details of the canonical LR(1) parser generation algorithm are beyond the scope of this report, but it is important to describe 1) the iterative work queue process by which a grammar specification is converted to a state machine, and 2) how isocores play into the state generation process.

The work queue manages incremental state set creation. Merging compatible states can in turn affect later compatibility test results for IELR(1) (and PGM(1)). No effort is made to puzzle together isocores via an optimal merging order, but since merging order can dramatically impact the total number of work queue insertions, care is taken to insert at the front versus the back of the work queue so as to process states in an approximately breadth-first rather than depth-first order.

Once the work queue is seeded with start states, states are consumed from the head of the work queue and processed until none remain, i.e. a fixpoint has been reached. The work queue contains each distinct state exactly once, over the entire duration of work queue processing. The end goal is to traverse the entirety of the resulting state machine once. What makes a state distinct is critical to understanding the work queue behavior. If a kernel compatibility test determines that two non-isokernels are compatible, the merged result is distinct from at least one of the inputs, even though the state number stays the same. A state number may ephemerally correspond to a series of distinct states, and although some of those ephemeral states may never be processed by the work queue, the last one certainly will be. Given this understanding, the work queue insertion regimen is straightforward: newly created states are appended to the back of the work queue, whereas states made distinct by merging are inserted at the front (unless already enqueued), so that they are reprocessed before their successors proliferate.

State generation/merging

State generation begins with the grammar’s start symbol(s). A pseudo-start symbol and kernel item are wrapped around each start symbol, always with a kernel of the form Start' ::= · Start "⊥", {"ε"}, e.g. Answer' ::= · Answer "⊥", {"ε"} in the example grammar. Multiple computations lead to a fully defined state: closing on the added set, which is the set of items for productions reachable from the kernel without advancing the input, and then computing the goto set for each symbol to the right of a dot. These goto sets define kernels of other states, in some cases being merged into existing compatible states, and in other cases defining distinct states not yet encountered during state machine construction.
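
As a rough illustration, the following sketch computes an LR(1) closure over the stand-in item types sketched earlier, combining kernel and added items for brevity (hocc tracks them separately). The helpers is_nonterm, prods_of, and first_of_remainder are hypothetical: first_of_remainder must yield the token set that can begin the suffix following the postdot nonterm, falling back to the item’s follow set when that suffix is nullable.

(* Symbol immediately to the right of the dot, if any. *)
let postdot (it: lr1_item) : int option =
  let { prod; dot } = it.lr0_item in
  if dot < Array.length prod.rhs then Some prod.rhs.(dot) else None

(* Close over added items: for each item with a nonterm immediately right
   of the dot, add that nonterm's prods as fresh dot-0 items. *)
let closure ~is_nonterm ~prods_of ~first_of_remainder (kernel: lr1_item list) =
  let rec fix items =
    let added =
      List.concat_map (fun it ->
        match postdot it with
        | Some sym when is_nonterm sym ->
          List.map (fun prod ->
            { lr0_item = { prod; dot = 0 }; follow = first_of_remainder it })
            (prods_of sym)
        | _ -> []
      ) items
    in
    let items' = List.sort_uniq compare (items @ added) in
    if List.length items' = List.length items then items' else fix items'
  in
  fix (List.sort_uniq compare kernel)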

For the example grammar, the pseudo-start symbol results in the following state, which is inserted into both the work queue and the state set:

    State 0
        Kernel
            [Answer' ::= · Answer "⊥", {"ε"}]
        Added
            [Expr ::= · Expr MulOp Expr, {"*", "/", "+", "-", EOI}] prec mul
            [Expr ::= · Expr AddOp Expr, {"*", "/", "+", "-", EOI}] prec add
            [Expr ::= · INT, {"*", "/", "+", "-", EOI}]
            [Answer ::= · Expr EOI, {"⊥"}]
        Actions
            INT : ShiftPrefix 1
        Gotos
            Expr : 2
            Answer : 3

Note that every symbol immediately to the right of a dot is represented in the actions or gotos, for tokens and nonterms, respectively. Consider Expr, which is to the right of multiple dots. Each item in the goto set for Expr is created by advancing the dot one position relative to the item in this state. The following kernel results:

    [Expr ::= Expr · MulOp Expr, {"*", "/", "+", "-", EOI}] prec mul
    [Expr ::= Expr · AddOp Expr, {"*", "/", "+", "-", EOI}] prec add
    [Answer ::= Expr · EOI, {"⊥"}]
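
The dot-advancement computation itself is tiny; the following sketch reuses the stand-in types and the postdot helper sketched earlier:

(* Goto set for [sym]: keep items with [sym] immediately right of the dot,
   advancing the dot one position; follow sets carry over unchanged. *)
let goto (sym: int) (items: lr1_item list) : kernel =
  List.filter_map (fun it ->
    match postdot it with
    | Some s when s = sym ->
      Some { it with lr0_item = { it.lr0_item with dot = it.lr0_item.dot + 1 } }
    | _ -> None
  ) items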

A work queue state is processed by 1) popping it from the front of the work queue, and 2) computing its actions and gotos, which may result in further work queue and state set insertions.
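
Sketched with hypothetical Workq and process stand-ins, the driver loop might look as follows; the front/back insertion split approximates the breadth-first regimen described above:

(* Drive state machine construction to a fixpoint. [process] computes a
   state's actions/gotos and returns updated states plus states to enqueue. *)
let rec fixpoint ~process workq states =
  match Workq.pop workq with
  | None -> states   (* work queue empty: fixpoint reached *)
  | Some (state, workq') ->
    let states', remerged, fresh = process state states in
    (* States made distinct by merging go to the front; newly created
       states go to the back, approximating breadth-first order. *)
    let workq'' = List.fold_left (Fun.flip Workq.push_front) workq' remerged in
    let workq''' = List.fold_left (Fun.flip Workq.push_back) workq'' fresh in
    fixpoint ~process workq''' states'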

Canonical LR(1) isocore merge compatibility is maximally strict, in that kernels must be isokernels in order to be merged. Merging two isokernels results in an isokernel; thus merging is a no-op in practice. Such strict compatibility commonly results in nearly identical states which could in practice be merged without changing the grammar recognized by the state machine. IELR(1) eliminates (nearly) all such redundancy.

LALR(1)

The LALR(1) algorithm bears mentioning only because it is used in the IELR(1) algorithm as the foundation on which inadequacy attribution metadata are computed. As described in more detail later, IELR(1) conceptually patches up all inadequacies of an LALR(1) parser to make it recognize the same grammar as the corresponding canonical LR(1) parser. That means starting with an LALR(1) parser, analyzing inadequacies, then generating the IELR(1) parser.

As described earlier, canonical LR(1) isocore compatibility requires that corresponding kernel items have identical follow sets, i.e. isocores are compatible only if they are isokernels. LALR(1) goes to the other extreme, and treats all isocores as compatible, regardless of follow sets. Given a functioning canonical LR(1) parser generator, LALR(1) parser generation is trivial to add; isocore compatibility tests always return true.
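
The relationship among the algorithms can be summarized as a single compatibility dispatch; the predicate names below are hypothetical stand-ins for hocc’s actual tests:

type algorithm = Lalr1 | Pgm1 | Ielr1 | Lr1

(* Isocore compatibility test; precondition: k0 and k1 map to the same core. *)
let compat ~weakly_compatible ~annotations_compatible ~isokernels algorithm k0 k1 =
  match algorithm with
  | Lalr1 -> true                          (* all isocores are compatible *)
  | Pgm1 -> weakly_compatible k0 k1        (* Pager's weak compatibility *)
  | Ielr1 -> annotations_compatible k0 k1  (* lane tracing metadata must agree *)
  | Lr1 -> isokernels k0 k1                (* follow sets must be identical *)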

IELR(1)

The IELR(1) algorithm is substantially more complicated than canonical LR(1) or LALR(1). The IELR(1) paper [5] describes six stages as implemented by bison, three of which directly correspond to stages in hocc:

  1. Generate the LALR(1) state machine, with conflict resolution disabled so that later analysis can discover and analyze ambiguities.
  2. Trace lanes backward from conflict-containing states and annotate state transitions with conflict contribution metadata that enable inadequacy-eliminating isocore compatibility testing.
  3. Compute the IELR(1) state machine fixpoint, using metadata from (2) to attach metadata to IELR(1) states, using those metadata to inform isocore compatibility, and propagating derivative metadata as states are created.

The interaction between metadata pre-computed in (2) and dynamically propagated in (3) is surprisingly subtle, but implementation requires only a modest amount of code given valid data abstractions. The remainder of this section describes phases (2) and (3) in detail.

Terminology

A characteristic finite state machine generated by any of the LR(1)-family algorithms is a pushdown automaton. Such state machines maintain a stack while traversing a digraph. The hocc code uses the following terminology related to state machine digraphs:

  - transit: a directed edge from a source state to a destination state, whether induced by a shift action or a goto
  - ipred: immediate predecessor, i.e. the source state of an in transit
  - isucc: immediate successor, i.e. the destination state of an out transit
  - lane: a path through the digraph, as elaborated below

States that contain conflicting reduce actions for the same input symbol (reduce-reduce conflicts) are the starting point for tracing backward through every lane, some or all of which may contribute to conflicts. From the perspective of the conflict state, an individual lane is a linear (i.e. non-forking but potentially cyclic) predecessor path back to a start state; the lane may extend forward past the conflict state either via shift or via goto, but these extensions are irrelevant to the conflict unless they participate in a cycle back to the conflict state. Cycles pose complications with regard to lane tracing, as do acyclic diamond-pattern fork/join topologies, whether sequential or nested. Such topologies can induce an infinitude of lanes, which is why analyses based on lane tracing must be able to reach closure while tracing each lane segment only once.

Lane tracing matters to inadequacy elimination because each shift and reduce contribution (contrib) is attributed to a transit via an attribution (attrib). Removal of an inadequacy entails splitting portions of two or more merged lanes such that attribs are fully partitioned with respect to each conflict-containing state. Furthermore, although LR(1)-relative inadequacies always result in reduce-reduce conflicts, shift actions must also be tracked in conflict manifestations in order to determine which state(s) to split. Each attrib is specific to a conflict state, symbol, and transit. Thus an attrib comprises a (conflict state, symbol, conflict manifestation, isucc LR(1) itemset, contrib) tuple, attached to a transit.
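
One plausible encoding of these tuples, mirroring the annotation dump shown later in this section (the field and module names are guesses, not hocc’s actual interfaces):

type action = ShiftPrefix of int | Reduce of int   (* simplified action kinds *)
module ActionSet = Set.Make (struct type t = action let compare = compare end)

type transit = { src: int; dst: int }   (* source/destination state indexes *)

type attrib = {
  conflict_state_index: int;        (* state in which the conflict manifests *)
  symbol_index: int;                (* symbol on which the conflict manifests *)
  conflict: ActionSet.t;            (* full conflict manifestation *)
  isucc_lr1itemset: lr1_item list;  (* implicated items in the transit's isucc *)
  contrib: ActionSet.t;             (* this lane's shift/reduce contribution(s) *)
}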

Lane tracing

The goal of lane tracing is to annotate transits with conflict attribution metadata, such that attributions can be propagated forward through all relevant lanes during state machine fixpoint computation. These annotations are kernel attribs keyed by the transit source’s kernel items (one follow set symbol per key).

For example, consider this transcription of Pager’s G2 grammar [3]:

hocc
    token At
    token Bt
    token Ct
    token Dt
    token Et
    token Tt
    token Ut
    token EOI

    start Sn ::= Xn EOI

    nonterm Xn ::=
      | At Yn Dt
      | At Zn Ct
      | At Tn
      | Bt Yn Et
      | Bt Zn Dt
      | Bt Tn

    nonterm Yn ::=
      | Tt Wn
      | Ut Xn

    nonterm Zn ::= Tt Ut

    nonterm Tn ::= Ut Xn At

    nonterm Wn ::= Ut Vn

    nonterm Vn ::= epsilon

As analyzed by hocc in LALR(1)/IELR(1) mode, this results in the following state subgraph (inconsequential states omitted for brevity):

        LALR(1)                           IELR(1)
          ___                               ___
         |   |                             |   |
    /----| 0 |----\                   /----| 0 |----\
   /     |___|     \                 /     |___|     \
  |                 |               |                 |
 _v_      ___      _v_             _v_      ___      _v_
|   |--->|   |<---|   |           |   |--->|   |<---|   |
| 1 |    | 6 |    | 2 |           | 1 |    | 6 |    | 2 |
|___|<---|___|--->|___|           |___|<---|___|--->|___|
  |                 |               |                 |
  |       ___       |              _v_               _v_
   \     |   |     /              |   |             |   |
    \--->| 5 |<---/               | 5₀|             | 5₁|
         |___|                    |___|             |___|
           |                        |                 |
          _v_                      _v_               _v_
         |   |                    |   |             |   |
         |15 |                    |15₀|             |15₁|
         |___|                    |___|             |___|

There are four acyclic lanes in the LALR(1) state graph which can be traced backward from state 15: 0→1→5→15, 0→1→6→2→5→15, 0→2→5→15, and 0→2→6→1→5→15. Additionally, there is an infinitude of cyclic lanes, e.g. 0→1→{6→1}⁺→5→15 and 0→1→{6→2→6→1}⁺→5→15. This grammar lacks chained and/or nested fork/join topologies, which would combinatorially induce lanes even in the absence of cycles.

All depicted transits except for 0→1 and 0→2 initially receive annotations, but most of the annotations are useless because they cannot lead to state splitting, and can therefore optionally be filtered out prior to state closure.

annotations=[
    ({src=1; dst=5}, [
        [Xn ::= At · Yn Dt, {Dt}] = [
            {conflict_state_index=15;
              symbol_index=5 (Dt);
              conflict={Reduce [Zn ::= Tt Ut]; Reduce [Vn ::= epsilon]};
              isucc_lr1itemset=[
                [Yn ::= Tt · Wn, {Dt}]
              ];
              contrib={Reduce [Vn ::= epsilon]}}
          ]
      ])
    ({src=2; dst=5}, [
        [Xn ::= Bt · Zn Dt, {Dt}] = [
            {conflict_state_index=15;
              symbol_index=5 (Dt);
              conflict={Reduce [Zn ::= Tt Ut]; Reduce [Vn ::= epsilon]};
              isucc_lr1itemset=[
                [Zn ::= Tt · Ut, {Dt}]
              ];
              contrib={Reduce [Zn ::= Tt Ut]}}
          ]
      ])
    ({src=5; dst=15}, [
        [Yn ::= Tt · Wn, {Dt}] = [
            {conflict_state_index=15;
              symbol_index=5 (Dt);
              conflict={Reduce [Zn ::= Tt Ut]; Reduce [Vn ::= epsilon]};
              isucc_lr1itemset=[
                [Wn ::= Ut · Vn, {Dt}]
              ];
              contrib={Reduce [Vn ::= epsilon]}}
          ]
        [Zn ::= Tt · Ut, {Dt}] = [
            {conflict_state_index=15;
              symbol_index=5 (Dt);
              conflict={Reduce [Zn ::= Tt Ut]; Reduce [Vn ::= epsilon]};
              isucc_lr1itemset=[
                [Zn ::= Tt Ut ·, {Dt}]
              ];
              contrib={Reduce [Zn ::= Tt Ut]}}
          ]
      ])
  ]

The G2 grammar has no shift action involved in the conflict, which makes useless annotation filtering simpler. Consider state 6 in the set of all annotations. There are two annotations on 1→6, and two more on 2→6, but all four attribs contain the same contrib. No matter how those attribs are partitioned by state splitting, the contrib will be either unchanged or completely absent, i.e. state splitting cannot remove a conflict for the lanes passing through state 6 by splitting its ipreds.

A conflict involving a shift action is more complicated to handle, because all lanes are implicitly implicated in the shift action. Even if a lane makes no reduce contributions, it is still implicated in the shift action, so the “completely absent” splitting outcome can potentially eliminate inadequacies. This situation requires non-local graph analysis because lanes that make no reduce action contribution can converge anywhere in the transitive predecessor graph.
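
For the simpler reduce-reduce-only case, a usefulness test can be as direct as checking whether any two attribs keyed under a transit carry differing contribs; the following is a sketch over the types above (conflicts involving a shift contribution require the non-local analysis just described and are not handled here):

(* An annotation (one transit's kernel-item-keyed attribs) is useful only if
   splitting could partition differing contribs; identical contribs across
   all attribs mean splitting cannot change the conflict. *)
let useful_annotation (attribs_by_item: (lr1_item * attrib list) list) : bool =
  let contribs =
    List.concat_map (fun (_item, attribs) ->
      List.map (fun a -> a.contrib) attribs
    ) attribs_by_item
  in
  match contribs with
  | [] -> false
  | c0 :: rest -> List.exists (fun c -> not (ActionSet.equal c c0)) rest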

The useful G2 grammar annotations indicate that states 5 and 15 must be split in order to remove inadequacies. Following are lightly edited excerpts from LALR(1) and IELR(1) hocc reports showing the state splits.

             LALR(1)

 ____________State_5_____________
| Kernel                         |
|     [Yn ::= Tt · Wn, {Dt, Et}] |
|     [Zn ::= Tt · Ut, {Ct, Dt}] |
| Added                          |
|     [Wn ::= · Ut Vn, {Dt, Et}] |
| Actions                        |
|     Ut : ShiftPrefix 15        |
| Gotos                          |
|     Wn : 16                    |
|________________________________|
                |
                |
                v
 ____________State_15____________
| Kernel                         |
|     [Zn ::= Tt Ut ·, {Ct, Dt}] |
|     [Wn ::= Ut · Vn, {Dt, Et}] |
| Added                          |
|     [Vn ::= ·, {Dt, Et}]       |
| Actions                        |
|     Ct : Reduce Zn ::= Tt Ut   |
|     Dt : CONFLICT              |
|          Reduce Zn ::= Tt Ut   |
|          Reduce Vn ::= epsilon |
|     Et : Reduce Vn ::= epsilon |
| Gotos                          |
|     Vn : 22                    |
|________________________________|


                                    IELR(1)

 _____________State_5₀_______________     ____________State_5₁_______________
| Kernel                             |   | Kernel                           |
|     [Yn ::= Tt · Wn, {Dt}]         |   |     [Yn ::= Tt · Wn, {Et}]       |
|     [Zn ::= Tt · Ut, {Ct}]         |   |     [Zn ::= Tt · Ut, {Dt}]       |
| Added                              |   | Added                            |
|     [Wn ::= · Ut Vn, {Dt}]         |   |     [Wn ::= · Ut Vn, {Et}]       |
| Actions                            |   | Actions                          |
|     Ut : ShiftPrefix 15₀           |   |     Ut : ShiftPrefix 15₁         |
| Gotos                              |   | Gotos                            |
|     Wn : 16₀                       |   |     Wn : 16₀                     |
| Conflict contributions             |   | Conflict contributions           |
|     [Yn ::= Tt · Wn, {Dt}]         |   |     [Zn ::= Tt · Ut, {Dt}]       |
|         15 : Reduce Vn ::= epsilon |   |         15 : Reduce Zn ::= Tt Ut |
|____________________________________|   |__________________________________|
                 |                                       |
                 |                                       |
                 v                                       v
 ____________State_15₀_______________     ____________State_15₁_____________
| Kernel                             |   | Kernel                           |
|     [Zn ::= Tt Ut ·, {Ct}]         |   |     [Zn ::= Tt Ut ·, {Dt}]       |
|     [Wn ::= Ut · Vn, {Dt}]         |   |     [Wn ::= Ut · Vn, {Et}]       |
| Added                              |   | Added                            |
|     [Vn ::= ·, {Dt}]               |   |     [Vn ::= ·, {Et}]             |
| Actions                            |   | Actions                          |
|     Ct : Reduce Zn ::= Tt Ut       |   |     Dt : Reduce Zn ::= Tt Ut     |
|     Dt : Reduce Vn ::= epsilon     |   |     Et : Reduce Vn ::= epsilon   |
| Gotos                              |   | Gotos                            |
|     Vn : 22₀                       |   |     Vn : 22₀                     |
| Conflict contributions             |   | Conflict contributions           |
|     [Wn ::= Ut · Vn, {Dt}]         |   |     [Zn ::= Tt Ut ·, {Dt}]       |
|         15 : Reduce Vn ::= epsilon |   |         15 : Reduce Zn ::= Tt Ut |
|____________________________________|   |__________________________________|

As mentioned earlier, lanes may contain cycles (conceptually an infinite set of lanes with [0..∞] cycle transits), and lanes may fork/join (combinatorial explosion of lanes), which means that lanes cannot be iteratively annotated. The tractable approach is to simultaneously trace all lanes passing through each relevant transit by recursing backward through transits until no new annotation is added. Think of each recursive call as traversing a transit from a state to its ipred, such that each application of the recursive function is on behalf of a state in the context of the transit just recursed on, i.e. a lane context (lanectx).

Each lanectx comprises a map of zero or more lane traces, where each map key is a (symbol, conflict manifestation, action) tuple, and map values are in turn M:N maps that associate transit source/destination LR(1) items. Note that a trace key/value pair may represent multiple lanes, because multiple conflict state kernel items can induce the same added ε production. Furthermore, note that a lanectx generates annotations only if it contains traces, and since traces are transitively based on those of successors, it is possible for tracing to terminate before reaching a start state, as for the G2 grammar above.
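
A plausible shape for these structures, again with guessed names (hocc’s actual lane context representation differs in detail):

type trace_key = {
  symbol_index: int;       (* conflicted symbol *)
  conflict: ActionSet.t;   (* conflict manifestation *)
  action: action;          (* the reduce action being traced *)
}

module TraceKeyMap = Map.Make (struct
  type t = trace_key
  let compare a b =
    match Stdlib.compare a.symbol_index b.symbol_index with
    | 0 ->
      (match ActionSet.compare a.conflict b.conflict with
       | 0 -> Stdlib.compare a.action b.action
       | c -> c)
    | c -> c
end)

type lanectx = {
  transit: transit;   (* the transit just recursed on *)
  (* M:N associations between the transit's source and destination items;
     a single pairing may represent multiple lanes. *)
  traces: (lr1_item * lr1_item) list TraceKeyMap.t;
}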

Lane tracing starts at each conflict-containing state, with traces for all reduce actions implicated in conflicts. At each recursion depth, a lanectx is computed based on the caller’s lanectx. The basic idea at each recursion is to move the dot in each traced LR(1) item one position to the left; in the case where the dot is already at position 0 (i.e. the item is in the added set), attempt to trace into generating kernel items.

The computational bottleneck for IELR(1) lies in repeatedly computing the leftmost transitive closure of a state’s kernel items given an LHS symbol and a lookahead symbol. This computation recurses backward through a state’s added items to determine which kernel items are implicated in lanes for the conflict being traced. The naïve approach to this computation suffices, but for complicated grammars memoization is a superlinear optimization that can improve overall performance by over 10X.
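
A sketch of the memoization, keyed on (state, LHS symbol, lookahead symbol); for brevity this uses a mutable table, whereas hocc’s Basis-based implementation is purely functional. compute stands in for the unmemoized closure:

let leftmost_closure ~compute =
  (* Memo table keyed on (state index, LHS symbol index, lookahead index). *)
  let memo : (int * int * int, lr1_item list) Hashtbl.t = Hashtbl.create 4096 in
  fun state_index lhs_index la_index ->
    let key = (state_index, lhs_index, la_index) in
    match Hashtbl.find_opt memo key with
    | Some kernel_items -> kernel_items
    | None ->
      let kernel_items = compute state_index lhs_index la_index in
      Hashtbl.add memo key kernel_items;
      kernel_items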

The precise details of trace initialization are intricate enough that the hocc implementation serves as a clearer explanation than would further verbiage. That said, it is worth calling out that lane tracing operates on kernel items rather than kernels as a whole, which can be especially confusing when traces intertwine as mentioned above vis-à-vis ε productions.

State machine fixpoint computation

Each step of the state machine fixpoint computation for IELR(1) is structurally very similar to the approach taken for LALR(1), PGM(1), and canonical LR(1). All of the algorithms rely on isocore compatibility testing and merging, but IELR(1) is more complicated than the other algorithms in two ways. First, compatibility testing must reference lane tracing metadata that are attached to the isocores, and second, merging isocores requires merging the attached lane tracing metadata. Thus IELR(1) does not operate on bare goto sets and states, but rather on goto nubs and state nubs, both of which carry kernel attribs.

What metadata actually flow through the state machine during fixpoint computation, and how far do they propagate? This is perhaps the most confusing part of IELR(1) implementation. Attrib flow through the state graph is disjoint — each attrib flows through a single transit, i.e. from an ipred to a state nub. In other words, the attribs that accumulate in a state nub matter to isocore compatibility testing, but they do not flow to isuccs. Attribs are introduced afresh for each transit by initializing each goto nub with the transit’s kernel attribs attached. Given a goto nub that is incompatible with existing state nubs (if any), the goto nub is converted to a state nub. If, on the other hand, the goto nub is compatible with an existing state nub, the goto nub is merged into the state nub, including its kernel attribs.
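
Sketched with hypothetical GotoNub/StateNub/States/Workq stand-ins, one step of goto nub handling might read:

let insert_gotonub ~annotations ~transit goto_set states workq =
  (* Kernel attribs are introduced afresh for this transit; accumulated
     attribs do not flow onward from the ipred. *)
  let kernel_attribs = Annotations.kernel_attribs annotations transit in
  let gotonub = GotoNub.init goto_set kernel_attribs in
  match States.find_compatible gotonub states with
  | None ->
    (* No compatible state nub exists: promote the goto nub to a state nub. *)
    let statenub = StateNub.of_gotonub gotonub in
    States.insert statenub states, Workq.push_back statenub workq
  | Some statenub ->
    (* Compatible: merge the goto nub into the state nub, kernel attribs
       included; a changed state nub must be (re)processed. *)
    let statenub', changed = StateNub.merge gotonub statenub in
    let states' = States.replace statenub' states in
    let workq' = if changed then Workq.push_front statenub' workq else workq in
    states', workq'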

Remerging

Redundant states can arise during IELR(1) state machine closure because hocc makes no effort to exclude to-be-orphaned states from consideration during isocore compatibility testing. Instead it implements a limited form of state remerging that iteratively merges states with isomorphic isucc graphs. Specifically, two isocoric states are remergeable if for all paired out transits one of the following holds:

  - The paired transits converge on the same destination state.
  - The paired transits cycle back to the isocoric states under consideration, such that their destinations become identical once the states are remerged.

Iterative application of state remerging in practice works backward through the state graph, because remerging isocoric states’ successors may enable subsequent remerging.

Although remerging was initially motivated by IELR(1) in hocc, it also minorly benefits PGM(1), and majorly benefits canonical LR(1). Given the same grammar, canonical LR(1) tends to generate roughly ten times more states than does LALR(1)/PGM(1)/IELR(1). Initial results indicate that remerging reduces that from a factor of ~10 to a factor of ~4. For example, consider hocc results for the Gpic grammar originally analyzed in the IELR(1) paper [5].

Algorithm    # of states   Ratio
LALR(1)              423    1.00
PGM(1)               423    1.00
PGM(1)*              426    1.01
IELR(1)              428    1.01
IELR(1)*             437    1.03
LR(1)               1506    3.56
LR(1)*              4834   11.43

* — no remerging

Interestingly, bison generates 428 states for Gpic even though it lacks a remerging implementation. bison presumably avoids creating remergeable states by generating states in a different order, but to my knowledge there is no tractable approach that universally eliminates the utility of remerging.

Performance

The hocc implementation of IELR(1) is dramatically slower than that of bison. The following wall clock times for the Gpic grammar are representative. Both hocc and bison are configured to emit neither reports nor generated code, and the reported numbers are the best of three runs on an AMD EPYC 7742 CPU running Ubuntu Linux 24.04, using OCaml 5.2.0 with flambda enabled for hocc versus the vendor-supplied bison 3.8.2.

Algorithm    hocc (s)   bison (s)
LALR(1)         0.929       0.017
PGM(1)          1.487         n/a
IELR(1)        13.623       0.029
LR(1)          10.011       1.527

bison is a C application that relies on flat and linearly allocated mutable global data structures, whereas hocc is an OCaml application that relies on high-level purely functional data structures. hocc uses maps in many places where a custom low-level data structure would perform much better, albeit at the cost of code clarity and maintainability. More critically, hocc is implemented on top of the Basis library, which is an OCaml-native standard library intended to correspond closely to the standard library designed for the Hemlock programming language. OCaml unfortunately provides 63-bit integers as its default high-performance integer type, whereas Hemlock’s design calls for 64-bit integers. As a consequence, Basis ubiquitously uses OCaml’s boxed 64-bit integers, which imposes an epic load on OCaml’s automatic memory management subsystem. These Basis-related implementation quirks could easily account for ~10X of the hocc-bison performance gap, to say nothing of the high-level data structure overhead. Nonetheless, current performance meets practical requirements for Hemlock bootstrapping, and further optimization is left as an exercise for the aspirational Hemlock-native hocc implementation.

A basic IELR(1) implementation can get by without two of the refinements described in this report, namely useless annotation filtering and leftmost transitive closure memoization. That said, anecdotal evidence based on processing the Lyken grammar (an abandoned research language) suggests that these refinements can matter for antagonistic inputs. The Lyken grammar was developed using an implementation of the PGM(1) algorithm, and it relied heavily on per-conflict precedence relationships to converge on a specification which ended up with no LR(1)-relative inadequacies. But the IELR(1) annotations required to determine this are copious, and the lanes are heavily intertwined. Absent either refinement, IELR(1) processing requires nearly 30 GiB of RAM and approximately 16 hours of wall time. Useless annotation filtering reduces this to 13 GiB and 14 hours. Leftmost transitive closure memoization has no significant impact on memory usage, and further reduces wall time to approximately 1 hour. Performance impacts for less tortuous grammars range from neutral to modest speedup, e.g. ~1.25X for Gpic.

Conclusion

This report is intended to help others bypass the morass that IELR(1) implementation turned out to be in the context of hocc. My initial intention regarding IELR(1) was to demonstrate that it has no practical utility relative to PGM(1), but careful rereading of the IELR(1) paper convinced me otherwise. Full understanding was elusive, and the hocc implementation is in large part a re-invention, aided by an imperfectly understood paper and by an existence proof in the form of bison.

Although hocc is primarily a Hemlock-targeting parser generator, it is also generally useful for grammar experimentation/validation due to its clean syntax when omitting embedded reduction code. More importantly in the context of this report, hocc serves as a straightforward reference implementation of IELR(1), even to implementers who are unfamiliar with OCaml. Choice of data structures is key to implementation, and OCaml record syntax is self-evident to experienced programmers, e.g. Prod.t as defined in the prod.mli interface file:

type t = {
  index: Index.t;
  (** Unique production index. *)

  lhs_index: SymbolIndex.t;
  (** LHS symbol index. *)

  rhs_indexes: SymbolIndex.t array;
  (** RHS symbol indexes in left-to-right order. *)

  prec: Prec.t option;
  (** Precedence, if any. This is denormalized with respect to the hocc
      specification, such that it is [Some p] regardless of whether precedence
      is specified for just this prod versus all of the nonterm (LHS symbol)
      prods. *)

  stmt: Parse.prod option;
  (** Declaration AST. *)

  reduction: Reduction.t;
  (** Reduction code. *)
}

Was the effort required to implement IELR(1) worthwhile? For the Hemlock project, almost certainly not — myriad false starts imposed an extreme opportunity cost. But IELR(1) is a powerful tool with practical application, and I hope to see it broadly implemented over the coming years. In the meanwhile Hemlock’s grammar specification development will leverage IELR(1), first as a safety tool during prototyping, and later to assure that no LR(1)-relative inadequacies survive in the grammar’s stable form even when generated by the LALR(1) algorithm. PGM(1) is capable of this role only if precedence/associativity are completely avoided, and such an austere grammar development environment is unacceptable to me. I look forward to routinely pulling IELR(1) out of my toolbox and crafting grammars with it.

Citations

  1. Frank DeRemer, “Practical Translators for LR(k) languages”, Ph.D. dissertation, Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, 1969.

  2. Donald Knuth, “On the Translation of Languages from Left to Right”, Information and Control 8(6):607–639, July 1965.

  3. David Pager, “A Practical General Method for Constructing LR(k) Parsers”, Acta Informatica 7:249–268, 1977.

  4. François Pottier and Yann Régis-Gianas, “Menhir LR(1) Parser Generator”, http://gallium.inria.fr/~fpottier/menhir/

  5. Joel E. Denny and Brian A. Malloy, “The IELR(1) algorithm for generating minimal LR(1) parser tables for non-LR(1) grammars with conflict resolution”, Science of Computer Programming 75(11):943–979, 2010.