BranchTaken

Hemlock language insights

Work Stealing - Determinism

This is the final post in the “Work Stealing” series. In the Work Stealing - Type System post, we went over the language semantics that give our compiler all the information it needs to safely perform work stealing. In this post, we’ll cover a few interesting aspects of determinism with work stealing from a systems perspective.

Little Constraints, Big Freedoms

Automatic parallelization is an optimization enabled by a small, simple set of language constraints. Mainstream languages can’t pull off work stealing because they lack the constraints and semantics that would let their compilers know the optimization is valid. An optimization may be valid for most programs, but unless it is valid for all programs, it is broken and unusable. I can point to a few common language features that make work stealing and other smaller optimizations impossible; unrestricted shared mutable state is the classic example, since a compiler cannot reorder or parallelize calls that might write memory visible elsewhere.

Hemlock, however, has all of the necessary restrictions to make work stealing possible.
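To make that concrete, here is a small sketch of the kind of pure divide-and-conquer computation a work-stealing runtime is free to parallelize. It is written in OCaml purely for illustration; this is not actual Hemlock code.

```ocaml
(* Sum the integers in [lo, hi). The two recursive calls share no mutable
   state and perform no I/O, so a runtime that can prove purity may
   evaluate them on different cores; the only observable difference is
   that the answer arrives sooner. *)
let rec range_sum lo hi =
  if hi - lo <= 1 then lo
  else
    let mid = lo + (hi - lo) / 2 in
    (* Independent subproblems: ideal candidates for work stealing. *)
    let left = range_sum lo mid in
    let right = range_sum mid hi in
    left + right

let () = Printf.printf "%d\n" (range_sum 0 1_000_000)
```

In a language with unrestricted mutation, `left` and `right` could not be computed concurrently without first proving that neither call touches shared state, which is exactly the analysis mainstream compilers cannot perform in general.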

Effects on System Resources

We made the assertion that it’s safe to automatically parallelize pure computation, and that’s true: usually the only observable effect is that the pure computation finishes faster. There are, however, effects on other system resources to consider.

The intent with Hemlock is to efficiently use as much CPU as possible, and aggressively parallelizing pure code does exactly that. While only pure code is parallelized, the results of pure code may be written to disk or the network, so the increased speed provided by work stealing can cause a proportional increase in I/O utilization. What is great about both CPU and I/O is that hitting 100% utilization simply means we are working as fast as we can and can’t go any faster. The negative consequence is resource starvation. That’s a real problem, but easily the lesser evil compared to poor resource utilization (and the countless programmer hours wasted on manually expressing every detail of parallelism to better utilize resources).
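A sketch of how that coupling arises, again in OCaml for illustration; the file name and `encode_block` are invented for this example:

```ocaml
(* A pure stage feeding an I/O sink. If the pure stage is transparently
   parallelized, encoded blocks are produced faster and the writes below
   consume proportionally more disk bandwidth, even though the writing
   code itself is unchanged. *)
let encode_block data =                 (* pure: eligible for stealing *)
  String.uppercase_ascii data

let () =
  let oc = open_out "out.dat" in        (* hypothetical output file *)
  List.iter
    (fun block -> output_string oc (encode_block block))
    [ "alpha"; "beta"; "gamma" ];
  close_out oc
```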

Unfortunately, memory utilization may also increase proportionally, and hitting 100% memory utilization means allocation failures if we try to use any more. The negative consequence there is process failure. There are technically valid mechanisms to mitigate the out-of-memory failure case. For instance, it is valid to reap delegates that have not yet finished their stolen work simply to free up system resources; it would then be up to those delegates’ associated victims to perform the work, with all of the delegates’ partial results lost. That would be an unfortunate sacrifice to make, and ultimately counterproductive.
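Here is a minimal sketch of that reap-and-redo idea, with a hypothetical polling interface invented purely for illustration (Hemlock’s runtime does not expose anything like this, and OCaml stands in for Hemlock syntax):

```ocaml
(* What a delegate reports when its victim asks for the result. *)
type 'a outcome =
  | Finished of 'a   (* the delegate completed the stolen work *)
  | Reaped           (* the delegate was killed to reclaim memory *)

(* poll asks the delegate for its result; redo is the victim's own copy
   of the computation. If the delegate was reaped, its partial progress
   is lost and the victim must redo the work from scratch. *)
let resolve ~(poll : unit -> 'a outcome) ~(redo : unit -> 'a) : 'a =
  match poll () with
  | Finished result -> result
  | Reaped -> redo ()

let () =
  let value =
    resolve
      ~poll:(fun () -> Reaped)                        (* simulate reaping *)
      ~redo:(fun () -> List.fold_left ( + ) 0 [ 1; 2; 3 ])
  in
  Printf.printf "%d\n" value
```

The redo path shows why this is counterproductive: reaping discards in-progress work under the very memory pressure that work was consuming, and then pays for the computation a second time.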

While considering mitigation strategies, we realized that we should avoid performing such heroics within the Hemlock runtime. For a short time, our perspective on automatic parallelization and program correctness was flawed. The natural paradigm shift Hemlock provides is that automatic parallelization is the baseline correct behavior, not magic layered on top. In Hemlock, if a program runs out of memory, that is a problem with the program that should be fixed, not an instability to blame on work stealing.

Hemlock will have sane mechanisms to avoid out-of-memory failures caused by heavily parallelized pure computation. Supervisor startup strategies will enable programmers to set more targeted memory and concurrency limits on delegates. With that, it should be straightforward to tame work stealing so that it does not run rampant and take down the system.
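The API for those strategies is not pinned down above, but as a purely speculative sketch of the shape such limits might take (every name here is invented for illustration; OCaml again stands in for Hemlock):

```ocaml
(* Hypothetical knobs a supervisor startup strategy might expose. None
   of these names come from Hemlock; they exist only to illustrate the
   kind of targeted limits described above. *)
type delegate_limits = {
  max_delegates : int;    (* cap on concurrently stolen tasks *)
  max_heap_bytes : int;   (* memory budget across all delegates *)
}

let default_limits = {
  max_delegates = 8;
  max_heap_bytes = 256 * 1024 * 1024;  (* 256 MiB *)
}
```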