"String Formatting in %s(^lang^)"
Most programs need to generate some form of textual output, whether for direct human consumption or for structured logging. The C programming language provides printf, which has served as inspiration for many other languages including OCaml and older versions of Python. However, printf is a brittle API because its format specifiers are (potentially dynamic) templates which are interpreted at run time. Consider this C code as an egregious example of how printf can fail.
printf("String Formatting in %s\n", "C"); // Okay.
const char *fmt = "String Formatting in %s is %s\n";
printf(fmt, "C", "dynamic"); // Compiler doesn't validate fmt.
printf(fmt, "C"/* brittle */); // Uh oh.
Anyone who uses a modern C compiler knows that the compiler does format specifier validation for specially designated function calls to protect against specifier/parameter mismatches, but this C code trivially thwarts the validation. Given sufficient care, this is a manageable problem, but it is a problem nonetheless.
OCaml has an interesting solution that seems quite elegant in conjunction with static type inference: statically compute the specifier/parameter types and require that they match. How does OCaml actually pull this off though? The answer is surprisingly complicated, as hinted at by the compiler error generated for the final source line in the following.
let _ = printf "String Formatting in %s is %s\n" "OCaml" "static"
let fmt = "String Formatting in %s is %s\n"
let _ = printf fmt "OCaml" "subtle" (* Doesn't compile. *)
The error shows that the format specifier is actually being treated as a function rather than as a string!
10 | let _ = printf fmt "OCaml" "subtle"
                    ^^^
Error: This expression has type string but an expression was expected of type
         ('a -> 'b -> 'c, out_channel, unit) format =
           ('a -> 'b -> 'c, out_channel, unit, unit, unit, unit)
           CamlinternalFormatBasics.format6
This sort of transformation obviously cannot happen unless the compiler knows that the string will be used as a specifier; by separating the specifier from its use, we have broken the façade.
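For comparison, the separated specifier type-checks again once the compiler is told up front that the literal is a format. The following is a minimal sketch using only the standard library; the type annotation on fmt and the binding s are additions for illustration and are not part of the example above.

```ocaml
(* Annotating the binding tells the compiler to type the literal as a format
   rather than as a plain string. *)
let fmt : (_, _, _) format = "String Formatting in %s is %s\n"

(* fmt now carries its parameter types (two strings), so this call
   type-checks; omitting an argument would yield a partial application
   instead of undefined behavior. *)
let s = Printf.sprintf fmt "OCaml" "subtle"

let () = print_string s
```

The annotation restores the context that the compiler otherwise gets from seeing the literal directly in the printf call, which is exactly the context that separating the specifier from its use takes away.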
Enter enhanced string interpolation
In Hemlock we wanted to make it impossible to break the formatting abstraction, and the most obvious solution was to make the specifier and parameters syntactically inseparable. We looked to Python’s f-strings for a modern take on string interpolation, and after several iterations settled on a design which fits well with Hemlock’s type system.
We started off with a clean syntax that turned out to be quite difficult to parse.
let lang = "Hemlock"
"String Formatting in %(lang)"
"String Formatting %(" with %("nesting") is hard to parse")"
Nested code requires nested parsers in order to recognize the terminating ) in %(...), which would mean parser feedback during scanning. This was unacceptable because, for example, text editors would not be able to reliably perform token-based syntax highlighting.
Our next design resolved the parsing challenge by introducing distinct delimiters for parameters.
"String Formatting in %(^"Hemlock"^)"
"%(^3^) %(^true^) examples of %(^"challenging"^) type inference"
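Unfortunately, this design required type inference feedback during desugaring, because the desugared code has to select a formatter that matches each parameter’s type. The following syntax removes that complication by naming the type in the specifier itself.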
"%#xu(^3^) %b(^true^) examples of %s(^"no need for"^) type inference"
# Equivalent desugared form.
Fmt.to_string
String.Fmt.empty
|> Uns.fmt ~alt:true ~base:Fmt.Hex (3)
|> Fmt.fmt " "
|> Bool.fmt (true)
|> Fmt.fmt " examples of "
|> String.fmt ("no need for")
|> Fmt.fmt " type inference"
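To illustrate why naming the conversion in the specifier removes the dependency on type inference, here is a rough OCaml analogue of the desugared form. The helper names (fmt_lit, fmt_uns_hex, fmt_bool, fmt_string) and the Buffer-based representation are assumptions made for this sketch and do not reflect Hemlock’s actual Fmt implementation. The point is only that each formatter can be chosen from the specifier text alone.

```ocaml
(* Hypothetical stand-ins for the formatters named in the desugared form.
   Each one appends to the buffer and returns it so calls can be chained. *)
let fmt_lit s buf = Buffer.add_string buf s; buf
let fmt_uns_hex n buf = Buffer.add_string buf (Printf.sprintf "%#x" n); buf
let fmt_bool b buf = Buffer.add_string buf (string_of_bool b); buf
let fmt_string s buf = Buffer.add_string buf s; buf

(* The chain mirrors the interpolated string piece by piece: the formatter
   for each parameter is fixed by the specifier (xu, b, s), not inferred. *)
let s =
  Buffer.create 64
  |> fmt_uns_hex 3
  |> fmt_lit " "
  |> fmt_bool true
  |> fmt_lit " examples of "
  |> fmt_string "no need for"
  |> fmt_lit " type inference"
  |> Buffer.contents

(* In this OCaml sketch, s is
   "0x3 true examples of no need for type inference". *)
let () = print_endline s
```

If a parameter has the wrong type for its specifier, the mismatch surfaces as an ordinary type error at that formatter call.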
General formatting
OCaml’s format strings as used with its Printf and Format modules look a lot like those supported by C’s printf. However, they also allow the application to integrate custom per-parameter formatting, which we find immensely useful for printing syntactically valid representations of lists, arrays, etc. Hemlock’s interpolated strings provide a similar ability.
let al = [|
["a"; "b"]
["c"; "d"]
|]
# Formatting matches that of the source code above.
"let al = %#f(^Array.fmt (List.fmt String.pp) al^)"
The interesting thing to note here is that we’re using %...f(^...^) to call a partially applied fmt function with the signature val fmt >e: Fmt.Formatter e -> Fmt.Formatter e. We can nest any conforming code, including custom formatters for application data. And because interpolated string parameters support arbitrary nesting, the formatting can be arbitrarily sophisticated. Our uses thus far have been rather pedestrian, but we can easily imagine using this language facility in the future to generate HTML, JSON, S-expressions, etc.
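For readers who want to compare with the OCaml facility mentioned at the start of this section, custom per-parameter formatting there goes through the %a conversion, which takes a printer function followed by the value to print. The sketch below uses the standard Format module; pp_string and pp_list are hypothetical helpers written for this example, not library functions.

```ocaml
(* Print a string as an OCaml string literal, e.g. "a". *)
let pp_string ppf s = Format.fprintf ppf "%S" s

(* Print a list in OCaml source syntax, using pp_elt for each element. *)
let pp_list pp_elt ppf l =
  Format.fprintf ppf "[%a]"
    (Format.pp_print_list
       ~pp_sep:(fun ppf () -> Format.pp_print_string ppf "; ")
       pp_elt)
    l

(* Printers nest by partial application, much like the partially applied
   fmt functions in the Hemlock example. *)
let s =
  Format.asprintf "let ll = %a"
    (pp_list (pp_list pp_string))
    [["a"; "b"]; ["c"; "d"]]

(* s is: let ll = [["a"; "b"]; ["c"; "d"]] *)
let () = print_endline s
```

As with the Hemlock version, the element printer is an ordinary function argument, so formatters for application data compose in the same way.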