BranchTaken

Hemlock language insights

Open Ranges

Open range

Hemlock’s bootstrap compiler is being written in OCaml, and at some point we will be able to compile Hemlock code well enough to write the compiler directly in Hemlock. In order to make the bootstrap compiler as similar to the native compiler as reasonably possible, we have implemented a Basis library in OCaml that is quite similar to the standard library we expect Hemlock to provide. Until recently the Basis library used OCaml’s 63-bit int type as the implementation for uns and sint (the s in sint helps avoid confusion with OCaml’s int), as well as for u8 through i32. This started causing complications in the scanner, because in our OCaml code uns and u64 were incompatible, yet the compiler needs to treat them as the same.

To reduce impedance mismatch between the bootstrap compiler and Hemlock, we refactored the bootstrap Basis to completely avoid OCaml’s 63-bit int, instead using the always-boxed int64 type. Thankfully int64 is nearly as ergonomic as int. int64 constants have an L suffix, e.g. 0L and 42L, and they can be used nearly everywhere int can be, for example in pattern matching. But int64 does not work with the for .. {to,downto} .. do .. done full-open range iteration construct, because it is monomorphic and supports only int indexing.

for loops are ubiquitous in imperative languages, but they don’t get much use in mostly-functional languages like OCaml and Hemlock. That said, our unit test suite was riddled with for loops, and we had to use a helper function like iter in the following snippet to complete the refactor from int to int64.

(* int-based for loop. *)
for i = 0 to 5 do
  ...
done

let iter base past f =
  let rec fn i past f = begin
    match Uns.(i < past) with
    | false -> ()
    | true -> begin
        f i;
        fn (Uns.succ i) past f
      end
  end in
  fn base past f

(* int64-based iteration. *)
iter 0L 6L (fun i ->
  ...
)

We learned two things from this:

Recall that ranges can have open (inclusive) and closed (exclusive) bounds. This gives rise to four combinations of bounds, but only two are needed in practice:

OCaml’s for loops implement full-open ranges, but for Hemlock we want half-open ranges to be best-supported. To that end we chose two infix operators inspired by Rust ranges:

The overwhelmingly most common use case for ranges is implemented as RangeH.Uns, for which Range is an alias. Furthermore the range type and the .. operator are provided by Rudiments, so an application can operate on ranges as follows:

open import Basis

Range.iter (0 .. 5) ~f:fun i ->
    # i in [0..5)...

Range.iter_right (0 .. 5) ~f:fun i ->
    # i in (5..0]...

Ranges are in fact collections, so other functions like fold, length and mem are also available.

With the .. operator in the default lexical environment we soon realized that ranges improve the API for several common collection initialization functions, for example Array.init:

# Without ranges, the uns parameter is equivalent to (0 .. past).
# val init 'a ^t >e: uns -> f:(uns >e-> a) >e-> ^&t a
let arr = Array.init 10 ~f:(fun i -> i + 42)

# With ranges.
# val init 'a ^t >e: range -> f:(uns >e-> a) >e-> ^&t a
let arr = Array.init (42..52) ~f:(fun i -> i)

Ranges are especially nice for functions like Array.pare:

# Without ranges.
# val pare 'a ^t: base:uns -> past:uns -> _&t a -> ^&t a
let pare ~base ~past t =
    assert (base <= past)
    init (past - base) ~f:(fun i -> get (base + i) t)

# With ranges.
# val pare 'a ^t: range -> _&t -> ^&t a
let pare range t =
    init range ~f:(fun i -> get i t)

Rust’s polymorphic dispatch facilities allow it to go much further with range variants than Hemlock can cleanly support with only monomorphic dispatch. In Hemlock it would not work to write something like Array.init (0 ..= 4) ... nor Array.init ( .. 5) .... On the bright side, Hemlock’s ranges are robust with regard to wrap-around as well as length reporting.

hemlock> RangeH.U8.(to_list (254u8 .. 2u8))
_ : list u8 = [254u8; 255u8; 0u8; 1u8]

hemlock> RangeH.U8.(length (0u8 .. 255u8))
_ : u8 = 255u8

hemlock> RangeF.U8.(length (0u8 ..= 254u8))
_ : RangeIntf.l u8 = RangeIntf.Length 255u8

hemlock> RangeF.U8.(length (0u8 ..= 255u8))
_ : RangeIntf.l u8 = RangeIntf.Overflow

The length reporting approach might not seem all that useful in the context of u8, but it’s pretty important to report the length of a RangeF.U256.t as type RangeIntf.l u256 rather than as type uns!