Integers seem like a simple concept. Start with natural numbers (ℕ⁺), incorporate 0 (ℕ⁰), then add a sign to create integers (ℤ). In theory, integers range from -∞ to ∞, but computers currently lack infinite memory, so we have to compromise in implementation. And compromise we do! Some high-level scripting languages try pretty hard to make integers behave like the mathematical definition, but at great computational cost. Most systems programming languages instead utilize high-performance CPU features to provide some combination of limited-range 2’s complement signed integers and unsigned integers which provide modular arithmetic semantics (i.e. fixed-width wrap-around behavior).
It’s bad enough that signed integers in languages like C and C++ have undefined underflow/overflow semantics. Adding insult to injury, it’s common for languages to implicitly convert between types of differing bitwidths! This can produce some very surprising results because the implicit conversion doesn’t consistently happen soon enough to maximize the fidelity of intermediate values.
For an interesting example of how implicit conversion can go wrong, see the analysis of Stagefright vulnerabilities in this blog post. The blog post presents an alternative solution to integer conversion, and is worth reading in its entirety.
Hemlock takes a rigid approach to integer types that resolves some historical baggage and removes footguns.
- A broad range of fixed-width unsigned/signed integer types are built in, currently powers of two
from 8 through 512, e.g.
i256. In addition nearly-arbitrary-precision (max ~2³² bits)
zint(ℤ) types are built in.
- The default numerical type is 64-bit unsigned integers, known as type
u64. Constants like
42are of type
uns, as are
0u64. This appears to be a big departure from the status quo, but in my experience as a systems software engineer I’ve found unsigned integers to be overwhelmingly more applicable than signed integers. I’d rather have to carefully craft lower bounds checks to avoid wrap-around than have APIs which imply that negative values make sense.
- All other numerical constants must have explicit type suffixes or other distinguishing
characteristics. For example:
13i64are 64-bit signed values.
7u8is an 8-bit unsigned value.
-15zis an arbitrary-precision
77nis an arbitrary precision
7r64are 64-bit reals, and
17.3r32is a 32-bit real (based on IEEE 754).
This might seem overly verbose to C/C++ programmers, but in fact it enables concision in the context of type inference (i.e. no associated explicit type declarations are necessary).
- Modular arithmetic semantics apply to both unsigned and signed fixed-width integer types. In this modern era we can assume that signed integers are implemented via 2s complement. So long, 1960s-era design decision!
- Conversions fall into two categories, e.g.:
Int.extend_to_u256sign-extends (unsigned types zero-extend).
Uns.like_of_int_optsucceeds if the input is non-negative,
Uns.narrow_of_u128_optsucceeds if the input is in the range supported by
Int.widen_to_u128_optsucceeds if the input is non-negative.
I have literally spent decades now fighting the good fight against implicit integer conversion while writing low-level C/C++ code. In Hemlock, all conversions are explicit, and I look forward to a rich set of numerical types that produce exactly the results one would expect when reading the code.