Formatting Creature Comforts
In a recent blog post I described our design for string formatting, which is also the basis for formatted file output. We have since implemented the underlying infrastructure in our OCaml-based bootstrap Basis library and transitioned all of our file output to use it. There were a couple of surprises along the way, as well as one planned syntactic enhancement, part of the syntax I’m currently adapting the lexical scanner to support.
Surprises
pretty
Our original design called for using # to indicate “alternate” formatting, e.g. prefixing hexadecimal integers with 0x and interspersing _ digit separators, but unconditionally formatting the type suffix. However, as I started using the formatting API I ran into several places where I needed to omit the type suffix in the output, but tying the type suffix output to # caused a different set of problems. This prompted the addition of the p “pretty” specifier.
hemlock> "%#xu32(^42u32^)"
- : string = "0x2a"
hemlock> "%#xpu32(^42u32^)"
- : string = "0x2au32"
hemlock> "%#08xpu32(^42u32^)"
- : string = "0x0000_002au32"
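To make the interplay of the two flags concrete, here is a minimal OCaml sketch that reproduces the three outputs above. It is a model, not the Basis implementation: the names fmt_hex_u32 and group4 are hypothetical, and the sketch assumes that the width counts digits only, that separators fall every four digits, and that the input lies in the u32 range.

```ocaml
(* Hypothetical helper: insert '_' between groups of four digits,
   counting from the right. *)
let group4 digits =
  let n = String.length digits in
  let buf = Buffer.create (n + n / 4) in
  String.iteri
    (fun i c ->
      if i > 0 && (n - i) mod 4 = 0 then Buffer.add_char buf '_';
      Buffer.add_char buf c)
    digits;
  Buffer.contents buf

(* Hypothetical model of "%#0<width>xpu32": [alt] stands in for '#',
   [pretty] for 'p', and [width] for a zero-padded digit count.
   Assumes 0 <= n < 2^32. *)
let fmt_hex_u32 ?(alt = false) ?(pretty = false) ?(width = 0) n =
  let raw = Printf.sprintf "%x" n in
  let pad = max 0 (width - String.length raw) in
  let digits = String.make pad '0' ^ raw in
  (* Alternate formatting adds the base prefix and digit separators. *)
  let body = if alt then "0x" ^ group4 digits else digits in
  (* Pretty formatting alone decides whether the type suffix appears. *)
  if pretty then body ^ "u32" else body
```

The point of the split is visible in the last two lines: the prefix and separators depend only on alt, and the suffix depends only on pretty, so either can be requested without the other.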
precision mode
The other surprise had to do with compact representation of floating point values. Hemlock supports bit-precise output in binary, octal, and hexadecimal, but unlike printf-esque formatting APIs, which bolted on hexadecimal output long after decimal output (with all the weird inconsistencies you might imagine that entails), Hemlock uniformly supports multiple bases. We made this choice so that bit-precise output doesn’t require heroics; it merely requires using a power-of-two base.
The problem we ran into is that we needed control over whether to omit trailing zeros for any base. Our solution was to make the default precision mode “limited” (omit trailing zeros), whereas “fixed” precision mode unconditionally formats all digits.
hemlock> "%.3r(^4.2^)"
- : string = "4.2"
hemlock> "%.=3r(^4.2^)"
- : string = "4.200"
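The difference between the two modes can be sketched in OCaml as follows. This is only an illustration for decimal output: the type precision_mode and the function fmt_real are hypothetical names, and keeping one digit after the point when every fractional digit is zero is an arbitrary choice of this sketch, not something the Hemlock design specifies here.

```ocaml
(* Hypothetical model of the two precision modes. *)
type precision_mode = Limited | Fixed

let fmt_real ?(mode = Limited) ~precision x =
  (* Start from the fixed rendering, which always has [precision]
     fractional digits. *)
  let s = Printf.sprintf "%.*f" precision x in
  match mode with
  | Fixed -> s
  | Limited ->
    if not (String.contains s '.') then s
    else begin
      (* Drop trailing zeros from the fractional part. *)
      let len = ref (String.length s) in
      while !len > 0 && s.[!len - 1] = '0' do
        decr len
      done;
      (* Assumption of this sketch: keep one digit after the point. *)
      if s.[!len - 1] = '.' then incr len;
      String.sub s 0 !len
    end
```

With a precision of 3, fmt_real 4.2 yields "4.2" in the default limited mode and "4.200" with ~mode:Fixed, mirroring the two examples above.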
A creature comfort
As Cameron reviewed the formatting diffs he noticed an unfortunate boilerplate pattern.
type t =
    x: uns
    s: string
let pp t formatter =
    formatter |> Fmt.fmt "{x=%pu(^t.x^); s=%ps(^t.s^)}"
    #                      ^^            ^^
The redundancy is not so terrible, but the fact that the compiler can’t help us catch mismatches
during refactors is a real concern. So we added an optional “separator” in format specifiers, which
changes format specifier output from <evaluated expression> to <raw expression><separator><evaluated expression>.
let pp {x; s} formatter =
    formatter |> Fmt.fmt "{%pu=(^x^); %ps=(^s^)}"
    #                         ^          ^
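The effect of the separator can be modeled in a few lines of OCaml. OCaml cannot see the source text of an expression without a preprocessor, so this sketch passes the raw text explicitly. The names render and pp_t are hypothetical, and string_of_int and %S are only stand-ins for Hemlock's pretty formatting of uns and string values.

```ocaml
(* Hypothetical model: a specifier receives the raw source text of its
   expression along with the already formatted value. Without a
   separator only the value is emitted; with one, the output is
   <raw expression><separator><evaluated expression>. *)
let render ?sep ~raw formatted =
  match sep with
  | None -> formatted
  | Some s -> raw ^ s ^ formatted

(* Stand-in for the record printer above: each field name is written
   once, as the raw expression, rather than repeated in the literal. *)
let pp_t x s =
  Printf.sprintf "{%s; %s}"
    (render ~sep:"=" ~raw:"x" (string_of_int x))
    (render ~sep:"=" ~raw:"s" (Printf.sprintf "%S" s))
```

Because the label is derived from the expression itself, renaming a field changes both the label and the value in one place, which is exactly the mismatch the compiler could not catch before.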
What I’m really excited about though is that this will dramatically streamline writing throwaway debugging output code.
let add x y =
    File.Fmt.stdout |> Fmt.fmt "%u = (^x + y^)\n" |> ignore # add 3 4 -> "x + y = 7\n"
    x + y
This feature alone would have already saved me dozens of hours over the course of my programming career, if only I had been programming in Hemlock!