Explicit Signage

Integers seem like a simple concept. Start with natural numbers (ℕ⁺), incorporate 0 (ℕ⁰), then add a sign to create integers (ℤ). In theory, integers range from -∞ to ∞, but computers currently lack infinite memory, so we have to compromise in implementation. And compromise we do! Some high-level scripting languages try pretty hard to make integers behave like the mathematical definition, but at great computational cost. Most systems programming languages instead utilize high-performance CPU features to provide some combination of limited-range 2’s complement signed integers and unsigned integers which provide modular arithmetic semantics (i.e. fixed-width wrap-around behavior).

It’s bad enough that signed integers in languages like C and C++ have undefined underflow/overflow semantics. Adding insult to injury, it’s common for languages to implicitly convert between types of differing bitwidths! This can produce some very surprising results because the implicit conversion doesn’t consistently happen soon enough to maximize the fidelity of intermediate values.

For an interesting example of how implicit conversion can go wrong, see the analysis of Stagefright vulnerabilities in this blog post. The blog post presents an alternative solution to integer conversion, and is worth reading in its entirety.

Hemlock takes a rigid approach to integer types that resolves some historical baggage and removes footguns.

A broad range of fixed-width unsigned/signed integer types are built in, currently powers of two from 8 through 512, e.g. u16 and i256. In addition nearly-arbitrary-precision (max ~2³² bits) nat (ℕ⁰) and zint (ℤ) types are built in.
The default numerical type is 64-bit unsigned integers, known as type uns or u64. Constants like 0 and 42 are of type uns, as are 0u and 0u64. This appears to be a big departure from the status quo, but in my experience as a systems software engineer I’ve found unsigned integers to be overwhelmingly more applicable than signed integers. I’d rather have to carefully craft lower bounds checks to avoid wrap-around than have APIs which imply that negative values make sense.
All other numerical constants must have explicit type suffixes or other distinguishing characteristics. For example:
- -1i and 13i64 are 64-bit signed values.
- 7u8 is an 8-bit unsigned value.
- -15z is an arbitrary-precision zint.
- 77n is an arbitrary precision nat.
- 0., 3.2, 4r, and 7r64 are 64-bit reals, and 17.3r32 is a 32-bit real (based on IEEE 754).
This might seem overly verbose to C/C++ programmers, but in fact it enables concision in the context of type inference (i.e. no associated explicit type declarations are necessary).
Modular arithmetic semantics apply to both unsigned and signed fixed-width integer types. In this modern era we can assume that signed integers are implemented via 2s complement. So long, 1960s-era design decision!
Conversions fall into two categories, e.g.:
- Bitwise: Uns.bits_to_i64 bit-casts, Uns.trunc_of_i128 truncates, Int.extend_to_u256 sign-extends (unsigned types zero-extend).
- Value-preserving: Uns.like_of_int_opt succeeds if the input is non-negative, Uns.narrow_of_u128_opt succeeds if the input is in the range supported by uns, Int.widen_to_u128_opt succeeds if the input is non-negative.

I have literally spent decades now fighting the good fight against implicit integer conversion while writing low-level C/C++ code. In Hemlock, all conversions are explicit, and I look forward to a rich set of numerical types that produce exactly the results one would expect when reading the code.

BranchTaken

Hemlock language insights

Explicit Signage