Hardly Original Compiler Compiler
hoccis an LR(1) parser generator. Its name carries on a long tradition, to wit:
yaccstands for “Yet Another Compiler Compiler”. Clearly the name derives from “yack”, as in, “Chuck’s dinner didn’t sit well and he yacked it.”
hoccstands for “Hardly Original Compiler Compiler”. The name derives from “hock”, as in, “Hank hocked a loogie.”
Both programs interpret high-level human-written parser descriptions and produce output unfit for human consumption.
So begins the documentation I have been writing for the
hocc parser generator, which will provide
the foundation for Hemlock’s parser. It’s important to be clear about the imagery associated with
names, lest people be confused and imagine that, say,
hocc alludes to the
yacc to this day is confusingly associated
with the yak, which inevitably leads to uncomfortable yak
hocc is not a cute and cuddly parser generator, rather it’s something of an environmentally
sustainable nuclear option . For reasons described in an earlier blog
post, the Hemlock compiler will utilize
LR(1) parsing. This is a work in progress, so
for now I just want to give an overview of how
hocc will syntactically integrate with Hemlock.
Consider the following example that specifies a simple expression evaluator for expressions like
2 + 3 * 4:
open import Basis include hocc left add left mul > add token PLUS prec add token MINUS prec add token STAR prec mul token SLASH prec mul token INT of Int.t token EOI # End of input. nonterm AddOp of Token.t ::= | _:PLUS -> PLUS | _:MINUS -> MINUS nonterm MulOp of Token.t ::= | _:STAR -> STAR | _:SLASH -> SLASH nonterm Expr of Int.t ::= | x:INT -> x | e0:Expr op:AddOp e1:Expr prec add -> match op with | AddOp PLUS -> Int.(e0 + e1) | AddOp MINUS -> Int.(e0 - e1) | _ -> not_reached () | e0:Expr op:MulOp e1:Expr prec mul -> match op with | MulOp STAR -> Int.(e0 * e1) | MulOp SLASH -> Int.(e0 / e1) | _ -> not_reached () start Answer of Int.t ::= | e:Expr _:EOI -> e
The example includes some intentional design quirks in order to succinctly demonstrate features. For
example, it uses precedence-based conflict resolution to assure that
2 + 3 * 4 is parsed as
2 + (3 * 4) rather than
(2 + 3) * 4.
Correct Incorrect + * / \ / \ 2 * + 4 / \ / \ 3 4 2 3
This is all basic stuff for parser generators. For now the thing to note is how much like Hemlock
code the parser specification looks. This turns out to be surprisingly easy to implement using a
thin wrapper around Hemlock’s lexical scanner.
hocc adds a handful of new keywords:
- Empty production:
hocc parser of course needs to parse statements which begin with these keywords, but the
embedded Hemlock code blocks can essentially be treated as black boxes, thanks to the
indentation-defined code structure. The result is a
language that is so tightly integrated with
Hemlock syntax that the seams barely show.