BranchTaken

Hemlock language insights

"String Formatting in %s(^lang^)"

String Formatting in Hemlock

Most programs need to generate some form of textual output, whether for direct human consumption or for structured logging. The C programming language provides printf, which has served as inspiration for many other languages including OCaml and older versions of Python. However, printf is a brittle API because its format specifiers are (potentially dynamic) templates which are interpreted at run time. Consider this C code as an egregious example of how printf can fail.

printf("String Formatting in %s\n", "C"); // Okay.

const char *fmt = "String Formatting in %s is %s\n";
printf(fmt, "C", "dynamic"); // Compiler doesn't validate fmt.
printf(fmt, "C" /* brittle */); // Uh oh: missing argument, undefined behavior.

Modern C compilers validate format specifiers for specially designated functions (GCC and Clang, for example, check calls to printf and to functions marked with the format attribute) to protect against specifier/parameter mismatches, but this C code trivially thwarts the validation by passing the format through a variable. Given sufficient care, this is a manageable problem, but it is a problem nonetheless.
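Python's older %-operator, one of the printf-inspired APIs mentioned earlier, shows the same run-time interpretation in a memory-safe setting. The sketch below is only an illustration; the helper name try_format is made up for this example.

```python
def try_format(fmt, args):
    """Apply printf-style %-formatting, reporting a mismatch instead of raising."""
    try:
        return fmt % args
    except TypeError as exc:
        # The template is only interpreted here, at run time.
        return "error: " + str(exc)


fmt = "String Formatting in %s is %s"
print(try_format(fmt, ("Python", "dynamic")))  # Both specifiers are satisfied.
print(try_format(fmt, ("Python",)))            # The mismatch surfaces only at run time.
```

Nothing checks the template against its arguments before the program runs; the second call fails only once that line executes.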

OCaml has an interesting solution that seems quite elegant in conjunction with static type inference: statically compute the specifier/parameter types and require that they match. How does OCaml actually pull this off though? The answer is surprisingly complicated, as hinted at by the compiler error generated for the final source line in the following.

let _ = printf "String Formatting in %s is %s\n" "OCaml" "static"

let fmt = "String Formatting in %s is %s\n"
let _ = printf fmt "OCaml" "subtle" (* Doesn't compile. *)

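The error shows that the format specifier is not being typed as a string at all! The compiler instead expects a special format type whose first parameter is a function type ('a -> 'b -> 'c) describing the remaining printf arguments.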

10 | let _ = printf fmt "OCaml" "subtle"
                    ^^^
Error: This expression has type string but an expression was expected of type
         ('a -> 'b -> 'c, out_channel, unit) format =
           ('a -> 'b -> 'c, out_channel, unit, unit, unit, unit)
           CamlinternalFormatBasics.format6

This sort of transformation obviously cannot happen unless the compiler knows that the string will be used as a specifier; by separating the specifier from its use, we have broken the façade.

Enter enhanced string interpolation

In Hemlock we wanted to make it impossible to break the formatting abstraction, and the most obvious solution was to make the specifier and parameters syntactically inseparable. We looked to Python’s f-strings for a modern take on string interpolation, and after several iterations settled on a design which fits well with Hemlock’s type system.
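For readers unfamiliar with f-strings, here is a small Python illustration of what "syntactically inseparable" means in practice. The variable names are arbitrary and chosen only for this example.

```python
lang = "Python"
count = 3

# Each expression lives inside the string literal, and any format
# specifier (after ':') sits directly next to the expression it formats.
title = f"String Formatting in {lang}"
detail = f"{count:#x} {True} examples of {'nested' + ' expressions'}"

print(title)
print(detail)
```

Because the template and its parameters form a single literal, there is no separate format value that could be passed around and paired with the wrong arguments.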

We started off with a clean syntax that turned out to be quite difficult to parse.

let lang = "Hemlock"

"String Formatting in %(lang)"

"String Formatting %(" with %("nesting") is hard to parse")"

Nested code requires nested parsers in order to recognize the terminating ) in %(...), which would mean parser feedback during scanning. This was unacceptable because, for example, text editors would not be able to reliably perform token-based syntax highlighting.

Our next design resolved the parsing challenge by introducing distinct delimiters for parameters.

"String Formatting in %(^"Hemlock"^)"

"%(^3^) %(^true^) examples of %(^"challenging"^) type inference"
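To illustrate why distinct delimiters help, here is a minimal Python sketch of a scanner that tokenizes such strings with only two states and no parser feedback. It is not Hemlock's actual scanner: the function name scan and the token kinds are made up for this example, it assumes ^) never occurs in ordinary code, and it ignores escapes, comments, and format specifiers.

```python
def scan(src):
    """Split src into (kind, text) tokens using two states: code and string text."""
    tokens = []
    in_string = False  # False: scanning code. True: scanning string text.
    start = 0          # Index where the token being accumulated begins.
    i = 0
    while i < len(src):
        if in_string:
            if src.startswith("%(^", i):
                # String text pauses here and embedded code follows.
                i += 3
            elif src[i] == '"':
                # The string ends here.
                i += 1
            else:
                i += 1
                continue
            tokens.append(("str", src[start:i]))
            start = i
            in_string = False
        elif src[i] == '"' or src.startswith("^)", i):
            # A string begins, or string text resumes after embedded code.
            if start < i:
                tokens.append(("code", src[start:i]))
            start = i
            i += 1 if src[i] == '"' else 2
            in_string = True
        else:
            i += 1
    if start < len(src):
        tokens.append(("str" if in_string else "code", src[start:]))
    return tokens


for kind, text in scan('"%(^3^) %(^true^) examples of %(^"challenging"^) type inference"'):
    print(kind, repr(text))
```

With the earlier %(...) syntax the closing ) is also an ordinary code token, so a scanner like this could not tell the two apart without tracking the structure of the nested code.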

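Unfortunately, this design required type inference feedback during desugaring, because the desugared code must call a type-specific formatting function for each parameter and the parameter’s type is not apparent from the syntax. The following syntax removes that complication by naming the type in each specifier.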

"%#xu(^3^) %b(^true^) examples of %s(^"no need for"^) type inference"

# Equivalent desugared form.
Fmt.to_string 
  String.Fmt.empty
  |> Uns.fmt ~alt:true ~base:Fmt.Hex (3)
  |> Fmt.fmt " "
  |> Bool.fmt (true)
  |> Fmt.fmt " examples of "
  |> String.fmt ("no need for")
  |> Fmt.fmt " type inference"
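To make the shape of that pipeline concrete, here is a rough Python analogue in which each step takes the output accumulated so far and returns it extended. All function names (text, uns_fmt, bool_fmt, str_fmt, to_string) are invented for this sketch, and the rendered values (0x3, true) are assumptions about what the Hemlock formatters produce.

```python
def text(s):
    """Step that appends literal text (stands in for Fmt.fmt)."""
    return lambda parts: parts + [s]


def uns_fmt(value, alt=False, base=10):
    """Step that appends an unsigned integer (stands in for Uns.fmt)."""
    digits = format(value, "x") if base == 16 else format(value, "d")
    prefix = "0x" if (alt and base == 16) else ""
    return lambda parts: parts + [prefix + digits]


def bool_fmt(value):
    """Step that appends a boolean (stands in for Bool.fmt)."""
    return lambda parts: parts + ["true" if value else "false"]


def str_fmt(value):
    """Step that appends a string as-is (stands in for String.fmt)."""
    return lambda parts: parts + [value]


def to_string(*steps):
    """Thread an empty accumulator through every step, then join the pieces."""
    parts = []
    for step in steps:
        parts = step(parts)
    return "".join(parts)


result = to_string(
    uns_fmt(3, alt=True, base=16),
    text(" "),
    bool_fmt(True),
    text(" examples of "),
    str_fmt("no need for"),
    text(" type inference"),
)
print(result)
```

The point of the sketch is that every step is chosen from the specifier alone (u, b, s), so the translation is purely syntactic.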

General formatting

OCaml’s format strings as used with its Printf and Format modules look a lot like those supported by C’s printf. However, they also allow the application to integrate custom per-parameter formatting (via the %a conversion), which we find immensely useful for printing syntactically valid representations of lists, arrays, etc. Hemlock’s interpolated strings provide a similar ability.

let al = [|
    ["a"; "b"]
    ["c"; "d"]
  |]

# Formatting matches that of the source code above.
"let al = %#f(^Array.fmt (List.fmt String.pp) al^)"

The interesting thing to note here is that we’re using %...f(^...^) to call a partially applied fmt function with the signature val fmt >e: Fmt.Formatter e -> Fmt.Formatter e. We can nest any conforming code, including custom formatters for application data. And because interpolated string parameters support arbitrary nesting, the formatting can be arbitrarily sophisticated. Our uses thus far have been rather pedestrian, but we can easily imagine using this language facility in the future to generate HTML, JSON, S-expressions, etc.
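The composition of partially applied formatters can be sketched in Python as functions that return "formatter in, formatter out" steps. This is only an analogy: the names string_pp, seq_fmt, list_fmt, and array_fmt are hypothetical, the output is a single-line rendering modeled on the literal syntax shown above rather than on Hemlock's library output, and string escaping is omitted.

```python
def string_pp(s):
    """Step that appends s as a quoted literal (stands in for String.pp)."""
    return lambda out: out + ['"' + s + '"']


def seq_fmt(open_delim, close_delim, elem_fmt):
    """Build a formatter for sequences whose elements are formatted by elem_fmt."""
    def fmt(items):
        def step(out):
            out = out + [open_delim]
            for i, item in enumerate(items):
                if i > 0:
                    out = out + ["; "]
                out = elem_fmt(item)(out)  # Delegate to the nested formatter.
            return out + [close_delim]
        return step
    return fmt


def list_fmt(elem_fmt):
    return seq_fmt("[", "]", elem_fmt)


def array_fmt(elem_fmt):
    return seq_fmt("[|", "|]", elem_fmt)


al = [["a", "b"], ["c", "d"]]
step = array_fmt(list_fmt(string_pp))(al)  # Mirrors Array.fmt (List.fmt String.pp) al.
rendered = "let al = " + "".join(step([]))
print(rendered)
```

Swapping string_pp for any other function of the same shape, such as a formatter for an application record, leaves list_fmt and array_fmt unchanged.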