BranchTaken

Hemlock language insights

Hardly Original Compiler Compiler

Cooling towers

hocc is an LR(1) parser generator. Its name carries on a long tradition, to wit:

  • yacc stands for “Yet Another Compiler Compiler”. Clearly the name derives from “yack”, as in, “Chuck’s dinner didn’t sit well and he yacked it.”
  • hocc stands for “Hardly Original Compiler Compiler”. The name derives from “hock”, as in, “Hank hocked a loogie.”

Both programs interpret high-level human-written parser descriptions and produce output unfit for human consumption.

So begins the documentation I have been writing for the hocc parser generator, which will provide the foundation for Hemlock’s parser. It’s important to be clear about the imagery associated with names, lest people be confused and imagine that, say, hocc alludes to the hawk. Indeed, yacc to this day is confusingly associated with the yak, which inevitably leads to uncomfortable yak shaving conversations.

No, hocc is not a cute and cuddly parser generator, rather it’s something of an environmentally sustainable nuclear option [1]. For reasons described in an earlier blog post, the Hemlock compiler will utilize LR(1) parsing. This is a work in progress, so for now I just want to give an overview of how hocc will syntactically integrate with Hemlock. Consider the following example that specifies a simple expression evaluator for expressions like 2 + 3 * 4:

open import Basis

include hocc
    left add
    left mul > add

    token PLUS prec add
    token MINUS prec add
    token STAR prec mul
    token SLASH prec mul
    token INT of Int.t
    token EOI # End of input.

    nonterm AddOp of Token.t ::=
      | _:PLUS -> PLUS
      | _:MINUS -> MINUS

    nonterm MulOp of Token.t ::=
      | _:STAR -> STAR
      | _:SLASH -> SLASH

    nonterm Expr of Int.t ::=
      | x:INT -> x
      | e0:Expr op:AddOp e1:Expr prec add ->
        match op with
          | AddOp PLUS -> Int.(e0 + e1)
          | AddOp MINUS -> Int.(e0 - e1)
          | _ -> not_reached ()
      | e0:Expr op:MulOp e1:Expr prec mul ->
        match op with
          | MulOp STAR -> Int.(e0 * e1)
          | MulOp SLASH -> Int.(e0 / e1)
          | _ -> not_reached ()

    start Answer of Int.t ::=
      | e:Expr _:EOI -> e

The example includes some intentional design quirks in order to succinctly demonstrate features. For example, it uses precedence-based conflict resolution to assure that 2 + 3 * 4 is parsed as 2 + (3 * 4) rather than (2 + 3) * 4.

  Correct       Incorrect
    +               *
   / \             / \
  2   *           +   4
     / \         / \
    3   4       2   3

This is all basic stuff for parser generators. For now the thing to note is how much like Hemlock code the parser specification looks. This turns out to be surprisingly easy to implement using a thin wrapper around Hemlock’s lexical scanner. hocc adds a handful of new keywords:

The hocc parser of course needs to parse statements which begin with these keywords, but the embedded Hemlock code blocks can essentially be treated as black boxes, thanks to the indentation-defined code structure. The result is a BNF-inspired domain-specific language that is so tightly integrated with Hemlock syntax that the seams barely show.

Footnotes