Stitches in Time Save RAM
The hocc
parser generator
implementation can now compute the canonical collection of LR(1) items (š¶š¶ for short), as briefly
introduced in a previous post regarding follow
sets. All seems well with š¶š¶ generation, but some of
the larger stress tests tripped on an inefficiency in Hemlockās naĆÆve string formatting
implementation. The initial solution struck me as elegant, if overly clever, but things took a
surprising turn while drafting this blog post. Read on for details.
Hemlockās string formatting design is introduced in an earlier
post. The bootstrap compiler is being implemented
in OCaml, and we have a string formatting implementation in OCaml, minus the
interpolated string syntax that Hemlock will
provide. Formatters are first-class modules
that can either write their contents incrementally to files or generate single strings representing
all the formatted contents. The original String.Fmt
implementation was exceedingly simple.
module Fmt = struct
let empty : (module Fmt.Formatter) =
(module struct
type t = string list
let state = []
let fmt s t =
s :: t
let sync t =
Fmt.To_string (join_rev t)
end)
end
Thereās some scary syntax at the fringes thanks to how first-class modules were retrofitted into
OCaml, but all that matters for our purposes is that fmt
pushes strings onto a list, and sync
generates the final string output by joining the strings in the list. Thatās greatā¦ as long as
there arenāt millions of short strings in the list. For small strings the list and per string
overheads are overwhelming relative to the raw string contents.
A variety of solutions come to mind, for example:
- Every
m
bytes of input tofmt
, join a batch comprising the most recently added elements which add up to at leastm
bytes of string data. - Every
m
calls tofmt
, join a batch of the firstm
elements. - Maintain an invariant on the list that the element sizes exponentially increase.
I initially chose a complicated variation of the exponential increase invariant that reduced per
fmt
join overhead variability by delaying some joins, but benchmarking prompted simplifications.
In that simplified implementation, each element in the list is limited to contain no more than
2^depth
bytes of data, with the caveat that some elements may be missing. For example, there may
be an element limited to 16 bytes followed by an element limited to 128 bytes; the 32- and 64-byte
logical elements are empty, and therefore absent in the list.
The exponentially increasing element size approach is compelling because it is scale-free. In other
words, it behaves consistently regardless of the size distribution for fmt
inputs. Unfortunately
it also increases algorithmic complexity from O(n)
to O(n lg n)
, where n
is the total number
of formatted bytes. Even so, I was willing to trade off a bit of speed (0.9X as fast) for the 4X
memory usage reduction it afforded.
But wait, how about the batching approaches? Full disclosure: I didnāt even try them until drafting this post. But batching turns out to be much faster (1.5X faster), presumably due to reduced garbage collection overheads, still with a 4X memory usage reduction. And the implementation is simple enough to show here.
module Fmt = struct
let empty : (module Fmt.Formatter) =
(module struct
let batch_count = 128L
type t = {
batch_rem: uns;
elms: string list;
}
let state = {batch_rem=batch_count; elms=[]}
let fmt s {batch_rem; elms} =
match batch_rem with
| 0L -> begin
(* Join a batch of inputs to limit list and per string overheads. *)
let batch, elms' = List.rev_split batch_count elms in
{batch_rem=pred batch_count; elms=s :: (join batch) :: elms'}
end
| _ -> {batch_rem=pred batch_rem; elms=s :: elms}
let sync {elms; _} =
Fmt.To_string (join_rev elms)
end)
end
Itās a good thing I had my ducks in a row today.
Rubber ducks lording over Hemlock development