Work Stealing - Type System
Hemlock’s type system is a critical factor in making some of the language’s best features possible — deterministic, constant-overhead garbage collection, aggressive compile-time optimization, and (today’s topic) work stealing. Essentially, stealable work has the same rules as parallelizable work.
- Mutability - The work can not modify state while other, parallel work might observe or modify the
same state.
int arr[42]; // None of these can be reordered or run in parallel. modify_arr0(arr); // void modify_arr0(int arr[]); modify_arr1(arr); // void modify_arr1(int arr[]); observe_arr(arr); // void observe_arr(const int arr[]);
- Effects - Work that has side effects are order-dependent and cannot be parallelized since we
can’t guarantee that the order of the underlying effects will be preserved.
const char *contents = "File contents to write and then read.\n"; char buf[1024]; // This deals with filesystem operations. It's obviously dependent on order of // execution. fprintf(f, "%s", contents); fseek(f, 0, SEEK_SET); fread(&buf, sizeof(char), sizeof(buf), f); fprintf(stdout, "%s", buf);
Fortunately, that’s really it for the rules. Unfortunately, I’m not aware of a single mainstream programming language that can guarantee either of these without extremely deep, brittle compiler introspection of the transitive implementation behind every function call. In cases where the implementation is unavailable at compile time (shared libraries, static libraries without link-time optimization) or the implementation is just unknowable (application of composed functions — functors), it is impossible to determine that such functions can be reordered, parallelized, or stolen. Certainty is an absolute requirement when a compiler attempts to do any of this, and mainstream compilers just can’t do it.
Hemlock easily does better in this regard. Mutability and side effect information is written into all function signatures. We know what inputs are mutable, in which function applications they are mutated, and in which function applications side effects will happen.
Mutability
Starting with mutability. Take the following C
function signature.
int foo(char * bar);
The input bar
is mutable, but does foo
mutate it? We need to check the implementation to find
out.
bool baz = false;
size_t foo(char * bar) {
size_t ret = (size_t) strlen(bar);
if (baz) {
baz = false;
bar[0] = '\0';
}
return ret;
}
Yes, it does mutate bar
. Even worse, it has the extra surprise mutation of the global variable
baz
. Ew. Unfortunately, if C
implementation can’t be fully introspected, the compiler has to
be conservative and assume that something like this is happening.
Here’s a close approximation of what the same function looks like in Hemlock.
# Note that mutable global variables don't exist in Hemlock, but per-actor
# module-level mutable variables do.
# val &baz: bool
let &baz := false
# val foo: !&bytes >mut-> uns
let foo bar =
let ret = Bytes.length bar
match baz with
| false -> ()
| true ->
baz := false
Bytes.set_inplace 0 '\0' bar
ret
We didn’t need to see the function body to know what is going on. The function signature tells us
that the input will be mutated (!&bytes
) when the function executes and that additional
module-level state will be mutated (>mut->
).
We can even track that the input is not mutated during function application, but during a later time as in the application of an encapsulating closure.
val bar: &bytes -> (unit >!&bytes-> uns)
Calling bar
by itself does not mutate anything. Calling the returned closure does. Knowing
during which function applications mutation occurs provides us valuable information about what can
be reordered or parallelized.
let a = Bytes.of_string "I'm a little teapot"
# Reordering or parallelizing via work stealing the
# following 3 calls is valid (though these simple
# functions would likely execute before they can be
# stolen).
let b = bar a
let c = bar a
let d = bar a
# Reordering or parallelizing via work stealing the
# following 3 calls is not valid.
let b = b ()
let c = c ()
let d = d ()
Knowing what will be mutated makes it possible for us to identify work stealing opportunities just by knowing function signatures. It’s extremely easy to implement in the Hemlock compiler and we’re allowed to be very aggressive about it.
Side effects
Some functions don’t mutate any data inside the Hemlock runtime, but have other side effects that
must be executed in the correct order or exclusively. Two such side effects are hlt
(input may
cause the actor to halt) and os
(the function interacts with the operating system, and we must
assume mutation external to the process).
Using the Hemlock translation of the example in rule #2 at the beginning of this post, we can see
how the {hlt|os}
set of effects keeps us from reordering or parallelizing function applications
that must be run in sequence.
# val write_hlt: _&slice _&bytes -> File.t >{hlt|os}-> unit
# val seek_hd_hlt: uns -> File.t >{hlt|os}-> unit
# val read_into_hlt: !&slice !&bytes -> File.t >{hlt|os}-> unit
let contents = Slice.of_container
(Bytes.of_string "File contents to write and then read.\n")
write_hlt contents f
seek_hd_hlt 0 f
let buf = Slice.of_container (Bytes.init 1024 ~f:(_ -> '\0'))
read_into_hlt buf f
write_hlt buf File.stdout
Only the read_into_hlt
function mutates any data, but every statement must happen in sequence and
without being reordered for this code snippet to run as intended.
What exactly can be stolen?
I spent a lot of time showing things that can’t be stolen in this blog post. Here is the punch-line.
Any function application with no mutation (e.g. !&t
) or side effects (e.g. >os->
) can be
computed in parallel with any other pure function application that comes before or after it. There
is, quite honestly, a lot of work that can be stolen. As demonstrated by the reduce
example in the
Work Stealing - Intro blog post, there are some pretty
fundamental, ubiquitous coding patterns that have great work stealing opportunities. Hemlock is
going to find them and take them.