Open Ranges
Hemlock’s bootstrap compiler is being written in OCaml, and at some point we
will be able to compile Hemlock code well enough to write the compiler directly in Hemlock. In order
to make the bootstrap compiler as similar to the native compiler as reasonably possible, we have
implemented a Basis
library in OCaml that is quite similar to the standard library we expect
Hemlock to provide. Until recently the Basis
library used OCaml’s 63-bit int
type as the
implementation for uns
and sint
(the s
in sint
helps avoid confusion with OCaml’s int
), as
well as for u8
through i32
. This started causing complications in the scanner, because in our
OCaml code uns
and u64
were incompatible, yet the compiler needs to treat them as the same.
To reduce impedance mismatch between the bootstrap compiler and Hemlock, we refactored the bootstrap
Basis
to completely avoid OCaml’s 63-bit int
, instead using the always-boxed int64
type.
Thankfully int64
is nearly as ergonomic as int
. int64
constants have an L
suffix, e.g. 0L
and 42L
, and they can be used nearly everywhere int
can be, for example in pattern matching. But
int64
does not work with the for .. {to,downto} .. do .. done
full-open range iteration
construct, because it is monomorphic and supports only int
indexing.
for
loops are ubiquitous in imperative languages, but they don’t get much use in mostly-functional
languages like OCaml and Hemlock. That said, our unit test suite was riddled with for
loops, and
we had to use a helper function like iter
in the following snippet to complete the refactor from
int
to int64
.
(* int-based for loop. *)
for i = 0 to 5 do
...
done
let iter base past f =
let rec fn i past f = begin
match Uns.(i < past) with
| false -> ()
| true -> begin
f i;
fn (Uns.succ i) past f
end
end in
fn base past f
(* int64-based iteration. *)
iter 0L 6L (fun i ->
...
)
We learned two things from this:
for .. {to,downto} .. do .. done
isn’t a very versatile construct in the absence of polymorphism, and we don’t want it in Hemlock.- We need something like
iter
, but with support for more than one type, as well as support for half-open ranges. In fact, half-open ranges are usually (but not always) what we actually want to work with.
Recall that ranges can have open (inclusive) and closed (exclusive) bounds. This gives rise to four combinations of bounds, but only two are needed in practice:
- [base .. past): Closed-open ranges are half-open. So are open-closed ranges — (before .. last] — but we don’t directly use those.
- [base ..= last]: Open-open ranges are full-open.
OCaml’s for
loops implement full-open ranges, but for Hemlock we want half-open ranges to be
best-supported. To that end we chose two infix operators inspired by Rust
ranges:
base .. past
: Half-open range, [base
..past
), supported by theRangeH.*
submodules.base ..= last
: Full-open range, [base
..last
], supported by theRangeF.*
submodules.
The overwhelmingly most common use case for ranges is implemented as RangeH.Uns
, for which Range
is an alias. Furthermore the range
type and the ..
operator are provided by Rudiments
, so an
application can operate on ranges as follows:
open import Basis
Range.iter (0 .. 5) ~f:fun i ->
# i in [0..5)...
Range.iter_right (0 .. 5) ~f:fun i ->
# i in (5..0]...
Ranges are in fact collections, so other functions like fold
, length
and mem
are also
available.
With the ..
operator in the default lexical environment we soon realized that ranges improve the
API for several common collection initialization functions, for example Array.init
:
# Without ranges, the uns parameter is equivalent to (0 .. past).
# val init 'a ^t >e: uns -> f:(uns >e-> a) >e-> ^&t a
let arr = Array.init 10 ~f:(fun i -> i + 42)
# With ranges.
# val init 'a ^t >e: range -> f:(uns >e-> a) >e-> ^&t a
let arr = Array.init (42..52) ~f:(fun i -> i)
Ranges are especially nice for functions like Array.pare
:
# Without ranges.
# val pare 'a ^t: base:uns -> past:uns -> _&t a -> ^&t a
let pare ~base ~past t =
assert (base <= past)
init (past - base) ~f:(fun i -> get (base + i) t)
# With ranges.
# val pare 'a ^t: range -> _&t -> ^&t a
let pare range t =
init range ~f:(fun i -> get i t)
Rust’s polymorphic dispatch facilities allow it to go much further with range variants than Hemlock
can cleanly support with only monomorphic dispatch. In Hemlock it would not work to write something
like Array.init (0 ..= 4) ...
nor Array.init ( .. 5) ...
. On the bright side, Hemlock’s ranges
are robust with regard to wrap-around as well as length reporting.
hemlock> RangeH.U8.(to_list (254u8 .. 2u8))
_ : list u8 = [254u8; 255u8; 0u8; 1u8]
hemlock> RangeH.U8.(length (0u8 .. 255u8))
_ : u8 = 255u8
hemlock> RangeF.U8.(length (0u8 ..= 254u8))
_ : RangeIntf.l u8 = RangeIntf.Length 255u8
hemlock> RangeF.U8.(length (0u8 ..= 255u8))
_ : RangeIntf.l u8 = RangeIntf.Overflow
The length reporting approach might not seem all that useful in the context of u8
, but it’s pretty
important to report the length of a RangeF.U256.t
as type RangeIntf.l u256
rather than as type
uns
!