BranchTaken

Hemlock language insights

Formatting Creature Comforts

String Formatting in Hemlock

In a recent blog post I described our design for string formatting, which is also the basis for formatted file output. We have since implemented the underlying infrastructure in our OCaml-based bootstrap Basis library and transitioned all of our file output to use it. There were a couple of surprises along the way, as well as one planned syntactic enhancement, which I’m currently adapting the lexical scanner to support.

Surprises

pretty

Our original design called for using # to indicate “alternate” formatting, e.g. prefixing hexadecimal integers with 0x and interspersing _ digit separators, while unconditionally formatting the type suffix. However, as I started using the formatting API I ran into several places where I needed to omit the type suffix from the output, and tying type suffix output to # caused a different set of problems. This prompted the addition of the p “pretty” specifier, which controls the type suffix separately from #.

hemlock> "%#xu32(^42u32^)"
- : string = "0x2a"

hemlock> "%#xpu32(^42u32^)"
- : string = "0x2au32"

hemlock> "%#08xpu32(^42u32^)"
- : string = "0x0000_002au32"
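The interplay of the three examples above can be mimicked in a few lines of Python. This is only a sketch of the observable behavior, not Hemlock's implementation: the function name format_hex and its parameters are hypothetical, and the four-digit grouping and the rule that the width counts digits only are assumptions read off the sample outputs.

```python
def format_hex(value: int, alt: bool = False, pretty: bool = False,
               width: int = 0, suffix: str = "u32") -> str:
    """Mimic the sample outputs: `alt` plays the role of #, `pretty` of p."""
    # Zero-pad the bare digits; the width is assumed to exclude prefix and suffix.
    digits = format(value, "x").rjust(width, "0")
    if alt:
        # Assumed grouping: underscores every four hex digits, counted from the right.
        groups = []
        while digits:
            groups.append(digits[-4:])
            digits = digits[:-4]
        digits = "0x" + "_".join(reversed(groups))
    # The type suffix is emitted only when "pretty" output is requested.
    return digits + (suffix if pretty else "")


print(format_hex(42, alt=True))                        # 0x2a
print(format_hex(42, alt=True, pretty=True))           # 0x2au32
print(format_hex(42, alt=True, pretty=True, width=8))  # 0x0000_002au32
```

Keeping the suffix decision on its own flag is what lets the first call drop the suffix while still using the alternate form.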

precision mode

The other surprise had to do with compact representation of floating point values. Hemlock supports bit-precise output in binary, octal, and hexadecimal, but unlike printf-esque formatting APIs that bolted on hexadecimal output long after decimal output (with all the weird inconsistencies you might imagine that entails), Hemlock uniformly supports multiple bases. We made this choice so that bit-precise output doesn’t require heroics; it merely requires using a power-of-two base.
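To see why a power-of-two base makes bit-precise output easy, here is a small Python sketch. It uses Python's built-in float.hex and float.fromhex as a stand-in for Hemlock's formatting (which this does not reproduce); the helper names are hypothetical.

```python
def round_trips_hex(value: float) -> bool:
    """Hexadecimal text captures every mantissa bit, so parsing it back is exact."""
    return float.fromhex(value.hex()) == value


def round_trips_decimal(value: float, precision: int) -> bool:
    """Decimal text with a limited number of digits may lose information."""
    return float(f"{value:.{precision}f}") == value


value = 0.1 + 0.2  # not exactly representable as a short decimal
print(round_trips_hex(value))         # True
print(round_trips_decimal(value, 3))  # False
```

Each hexadecimal digit maps to exactly four bits of the mantissa, so no rounding is involved in either direction.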

The problem we ran into is that we needed control over whether to omit trailing zeros for any base. Our solution was to make the default precision mode “limited” (omit trailing zeros), whereas “fixed” precision mode unconditionally formats all digits.

hemlock> "%.3r(^4.2^)"
- : string = "4.2"

hemlock> "%.=3r(^4.2^)"
- : string = "4.200"
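The difference between the two modes can be sketched in Python for the decimal case. The function format_real and its parameters are hypothetical names for illustration, and the post does not say how a whole number is rendered in limited mode, so this sketch simply leaves whatever remains after trimming.

```python
def format_real(value: float, precision: int, fixed: bool = False) -> str:
    """`fixed` mirrors the = in "%.=3r"; the default mirrors "limited" mode."""
    text = f"{value:.{precision}f}"
    if fixed or "." not in text:
        # Fixed mode formats all digits unconditionally.
        return text
    # Limited mode: precision is an upper bound, so trailing zeros are dropped.
    # Assumption: nothing else is adjusted, e.g. 4.0 comes out as "4.".
    return text.rstrip("0")


print(format_real(4.2, 3))              # 4.2
print(format_real(4.2, 3, fixed=True))  # 4.200
```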

A creature comfort

As Cameron reviewed the formatting diffs he noticed an unfortunate boilerplate pattern.

type t =
    x: uns
    s: string

let pp t formatter =
    formatter |> Fmt.fmt "{x=%pu(^t.x^); s=%ps(^t.s^)}"
    #                      ^^            ^^

The redundancy (each field name appears once as literal text and once in the interpolated expression) is not so terrible, but the fact that the compiler can’t help us catch mismatches between the two during refactors is a real concern. So we added an optional “separator” in format specifiers, which changes format specifier output from <evaluated expression> to <raw expression><separator><evaluated expression>.

let pp {x; s} formatter =
    formatter |> Fmt.fmt "{%pu=(^x^); %ps=(^s^)}"
    #                         ^          ^
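For comparison, Python's f-strings (3.8 and later) have a similar facility: a trailing = inside the braces emits the raw expression text, the =, and then the value. The sketch below is a Python analogue, not Hemlock code; the Record class and the function names are hypothetical, and Python shows strings via repr here, which is why the string field comes out quoted.

```python
from dataclasses import dataclass


@dataclass
class Record:
    x: int
    s: str


def pp(record: Record) -> str:
    # Bind the fields to locals, much like destructuring {x; s} in the argument.
    x, s = record.x, record.s
    # Each name is written once; the label is derived from the expression text.
    return f"{{{x=}; {s=}}}"


def trace_sum(x: int, y: int) -> str:
    # Whitespace around = is preserved, so the label reads "x + y = ".
    return f"{x + y = }"


print(pp(Record(1, "hi")))  # {x=1; s='hi'}
print(trace_sum(3, 4))      # x + y = 7
```

Because the label is generated from the expression itself, renaming a field can no longer leave a stale name behind in the output.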

What I’m really excited about, though, is that this will dramatically streamline writing throwaway debugging output code.

let add x y =
    File.Fmt.stdout |> Fmt.fmt "%u = (^x + y^)\n" |> ignore # add 3 4 -> "x + y = 7\n"
    x + y

This feature alone would have already saved me dozens of hours over the course of my programming career, if only I had been programming in Hemlock!