Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model.
In 2015 we asked ourselves why there was no available, deployable way to run real Linux software deterministically. We thus set out to create user-space deterministic execution sandboxes. This project follows from our prior work on deterministic parallel programming, which we conducted at the language- and library-level as well as by modifying the operating system and computer architecture.
Our first prototype, called DetFlow was described in a paper at OOPSLA'17 [1]. DetFlow uses a mix of language-support and runtime sandboxing to achieve an end-to-end determinism guarantee.
- DetFlow is available on GitHub.
- The OOPSLA paper's artifact is available from the ACM DL, under "Source Materials","Auxiliary Archive". It is a 5GB download.
In DetFlow, process-parallel programs are allowed if they are controlled by a parallel coordinator process written with special language support (a Haskell monad). Subprocess are determinized using a runtime sandbox, but are sequentialized to keep overheads low. DetFlow has well below 5% overhead in all our experiments. However, this prototype uses LD_PRELOAD libc interception which is not sufficiently general or robust for a production quality implementation.
Our second prototype of a runtime determinism sandbox is called, DetTrace. DetTrace uses ptrace and is much more general than DetFlow, but is also higher-overhead. In case studies, we have run the build and test code for 12,130 Debian packages inside DetTrace, and have run large applications like blender
and pdflatex
inside the deterministic sandbox as well.
A preprint for DetTrace will be posted here soon.
Our end-to-end userspace determinism approach is being commercialized by Cloudseal Inc.
More discussion of determinism and reproducibility problems can be found on the blog.
Cloudseal is building low-overhead record-and-replay-as-a-service (for bug and crash repro). The core is a deterministic execution capability that minimizes the amount of recording needed, and eliminates unnecessary nondeterminism -- nondeterminism which creates headaches like flaky tests.
(In this third generation, production implementation of the approach, binary instrumentation is used to avoid ptrace and achieve both the low overhead of DetFlow and the generality of DetTrace.)
Believe it or not, in spite of decades of work on binary instrumentation and mechanisms for hooking functions or syscalls, we've had to develop new techniques for low-overhead interposition on Linux system calls and nondeterministic instructions. Part of the reason for this is needing an all-userspace approach.
-
One chapter of our work on binary instrumentation was to develope rapidly-toggleable, low-overhead user-space probes, presented in our PLDI 2016 and 2017 publications [2,3].
-
We are developing the Reverie system, an open-source binary instrumentation framework available on GitHub.
The basic idea of our approach is to avoid both the high startup overhead of full binary translation approaches (Pin, DynamoRIO) and avoid the high cost of trap instructions in breakpoint-based tracing frameorks (DTrace, SystemTap, etc). This is done by patching guest process code in-place and injecting new code in the guest, but a fully general and performant solution requires solving a number of challenges (see the PLDI papers linked below).
The deterministic workflows project follows from several years of research on deterministic parallel programming, mainly using Haskell as a vehicle (though other projects target other languages, like Deterministic Parallel Java). An example system our work in this area is the LVish system for programming with monotonic concurrent data structures [4,5].
In this work, we looked at enforcing determinism without requiring source-level changes to programs, instead allowing ourselves to freely change the operating system, compiler, and even computer architecture. Representive papers include Eurosys'15, and ASPLOS'10,11 [6,7,8]. Research of this kind serves to show what is possible, while the emphasis of the deterministic workflows project is to emphasize deployable user-space solutions that achieve end-to-end determinism on existing computers and operating systems.
Baojun Wang | Kelly Shiptoski | Joe Devietti |
Omar Navarro Leija | Ryan Newton | Ryan Scott |
-
(OOPSLA'17) "Monadic Composition for Deterministic, Parallel Batch Processing", R Scott, O Navarro Leija, J Devietti, and R Newton, ACM SIGPLAN conference on Object-oriented Programming, Systems, Languages and Applications.
-
(PLDI'16) "Living on the edge: Rapid-toggling probes with cross modification on x86", B Chamith, B Svensson, L Dalessandro, R Newton. ACM SIGPLAN conference on Programming Language Design and Implementation.
-
(PLDI'17) "Instruction Punning: Lightweight Instrumentation for x86-64", B Chamith, B Svensson, L Dalessandro, R Newton. ACM SIGPLAN conference on Programming Language Design and Implementation.
-
(POPL'14) "Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars and Handlers", L Kuper, A Turon, N Krishnaswami, R Newton. ACM SIGPLAN Principals of Programming Languages.
-
(PLDI'14) "Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish", L Kuper, A Todd, S Tobin-Hochstadt, R Newton. ACM SIGPLAN Programming Languages Design and Implementation.
-
(Eurosys'15) "High-Performance Determinism with Total Store Order Consistency", T Merrifield, J Devietti and J Eriksson. European Conference on Computer Systems.
-
(ASPLOS'11) "RCDC: A Relaxed-Consistency Deterministic Computer", J Devietti, J Nelson, T Bergan, L Ceze and D Grossman. International Conference on Architectural Support for Programming Languages & Operating Systems.
-
(ASPLOS'10) "CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution". T Bergan, O Anderson, J Devietti, L Ceze and D Grossman. International Conference on Architectural Support for Programming Languages & Operating Systems.