BranchTaken

Hemlock language insights

The Lone Ranges

Mosquito Lake

Hemlock is a mostly functional programming language in which tail recursion is ubiquitous, so it should come as no surprise that Hemlock has no while loop construct. Perhaps a bit surprising though is that it doesn’t even have a for loop construct like OCaml’s for ... {to,downto} ... do ... done. We have instead been using ranges, the subject of a previous blog post. Unfortunately there was a rough edge in the original design, but there is a fix! Read on for details.

The problem

It can be useful to work with half-open or full-open ranges, depending on context, and it is not possible to represent all relevant ranges with only one kind of range. For example, [0..0) is an empty half-open range, which is impossible to represent as a full-open range — [0..0] contains the value 0. Conversely, [0..max_value] contains all values, which cannot be represented as a half-open range. It’s also important to note that the length of the maximal full-open range is one more than max_value, i.e. it overflows the relevant integer type, uns in this example.

We initially decided to solve the half-open vs full-open range problem by making them distinct types. That is, for every type in Hemlock’s integer numerical tower, there was a corresponding half-open range type and a full-open range type. For example, for the Uns.t type (aka uns), there was RangeH.Uns.t and RangeF.Uns.t. But this posed a problem in that functions like Array.init could only support half-open ranges. That’s okay most of the time, but sometimes a range is more naturally expressed as full-open.

# The easy way to create an array of uns values.
a = RangeH.Uns.to_array (0..5)
# [|0; 1; 2; 3; 4|]

# Equivalent to the above.
a' = Array.init (0..5) ~f:id

# A natural use of full-open ranges.
b = RangeF.Uns.to_array (1..=5)
# [|1; 2; 3; 4; 5|]

# Non-obvious simulation of full-open ranges.
b' = Array.init (1..(succ 5)) ~f:id
b'' = Array.init (1..6) ~f:id

Another problem with the .. and ..= operators in the original design is that they don’t fit well with pattern matching. At the time we had some plans for supporting pure function application in patterns rather than supporting full-open ranges (e.g. 'a'..='z'), but that pattern matching design turned out to be be ill-defined. And in contrast to most other code, full-open ranges are the better default for patterns. The need for ..= rather than .. in pattern match syntax would have been a clear POLA violation, which left Hemlock with no good default choice for ranges.

The solution

Types

There is no fundamental problem with merging half-open and full-open range types into one type, as long as the merged type encodes in each value whether the range is half-open or full-open. Where we previously had past vs last limits in separate types we now conceptually have a single variant type [1]. Furthermore, rather than past/last functions, there is a limit function that accommodates both range variants.

type t 'a: t a =
  | RangeH of {base: a; past: a}
  | RangeF of {base: a; last: a}

type limit 'a: limit a =
  | Excl of a
  | Incl of a

limit 'a: t a -> limit a

type length 'a: length a =
  | Overflow
  | Length of a

length 'a: t a -> length a

# Halt on overflow. Overflow is impossible for half-open ranges, and many use cases would halt
# anyway due to memory exhaustion long before overflow became an issue.
length_hlt 'a: t a -> a

Obviously it’s a bit more complicated to use limit than it was to use past, but the majority of range-related code does not deal with these functions, rather it creates a range and uses container-standard iteration functions like fold and iter.

Syntax

The ../..= syntax is problematic due to the lack of obvious default range variant. A simple, if slightly ugly, solution would be to use ..</..=. Indeed, those are the functions that Hemlock now uses as the basis for range construction, but with a bit of syntactic sugar things look pretty good! Hemlock will desugar [base..past) to (..<) base past and [base..last] to (..=) base last.

# The easy way to create an array of uns values.
a = Range.Uns.to_array [0..5)
# [|0; 1; 2; 3; 4|]

# Equivalent to the above.
a' = Array.init [0..5) ~f:id

# A natural use of full-open ranges.
b = Range.Uns.to_array [1..5]
# [|1; 2; 3; 4; 5|]

# Equivalent to the above.
b' = Array.init [1..5] ~f:id

And only the sugared syntax is supported in pattern matching.

is_cident_cp = function
  | '_'
  | ['A'..'Z']
  | ['a'..'z']
  | ['0'..'9'] -> true
  | _ -> false

Footnotes

  1. The range types are actually monomorphized via functor generation in the current implementation, but the concepts hold in the simplified polymorphic presentation.