BranchTaken

Hemlock language insights

Stitches in Time Save RAM

Skeins

The hocc parser generator implementation can now compute the canonical collection of LR(1) items (𝐶𝐶 for short), as briefly introduced in a previous post regarding follow sets. All seems well with 𝐶𝐶 generation, but some of the larger stress tests tripped on an inefficiency in Hemlock’s naïve string formatting implementation. The initial solution struck me as elegant, if overly clever, but things took a surprising turn while drafting this blog post. Read on for details.

Hemlock’s string formatting design is introduced in an earlier post. The bootstrap compiler is being implemented in OCaml, and we have a string formatting implementation in OCaml, minus the interpolated string syntax that Hemlock will provide. Formatters are first-class modules that can either write their contents incrementally to files or generate single strings representing all the formatted contents. The original String.Fmt implementation was exceedingly simple.

module Fmt = struct
  let empty : (module Fmt.Formatter) =
    (module struct
      type t = string list
      let state = []
      let fmt s t =
        s :: t
      let sync t =
        Fmt.To_string (join_rev t)
    end)
end

There’s some scary syntax at the fringes thanks to how first-class modules were retrofitted into OCaml, but all that matters for our purposes is that fmt pushes strings onto a list, and sync generates the final string output by joining the strings in the list. That’s great… as long as there aren’t millions of short strings in the list. For small strings the list and per string overheads are overwhelming relative to the raw string contents.

A variety of solutions come to mind, for example:

I initially chose a complicated variation of the exponential increase invariant that reduced per fmt join overhead variability by delaying some joins, but benchmarking prompted simplifications. In that simplified implementation, each element in the list is limited to contain no more than 2^depth bytes of data, with the caveat that some elements may be missing. For example, there may be an element limited to 16 bytes followed by an element limited to 128 bytes; the 32- and 64-byte logical elements are empty, and therefore absent in the list.

The exponentially increasing element size approach is compelling because it is scale-free. In other words, it behaves consistently regardless of the size distribution for fmt inputs. Unfortunately it also increases algorithmic complexity from O(n) to O(n lg n), where n is the total number of formatted bytes. Even so, I was willing to trade off a bit of speed (0.9X as fast) for the 4X memory usage reduction it afforded.

But wait, how about the batching approaches? Full disclosure: I didn’t even try them until drafting this post. But batching turns out to be much faster (1.5X faster), presumably due to reduced garbage collection overheads, still with a 4X memory usage reduction. And the implementation is simple enough to show here.

module Fmt = struct
  let empty : (module Fmt.Formatter) =
    (module struct
      let batch_count = 128L
      type t = {
        batch_rem: uns;
        elms: string list;
      }
      let state = {batch_rem=batch_count; elms=[]}
      let fmt s {batch_rem; elms} =
        match batch_rem with
        | 0L -> begin
            (* Join a batch of inputs to limit list and per string overheads. *)
            let batch, elms' = List.rev_split batch_count elms in
            {batch_rem=pred batch_count; elms=s :: (join batch) :: elms'}
          end
        | _ -> {batch_rem=pred batch_rem; elms=s :: elms}
      let sync {elms; _} =
        Fmt.To_string (join_rev elms)
    end)
end

It’s a good thing I had my ducks in a row today.

Ducks

Rubber ducks lording over Hemlock development