Abstract
Designing and debugging distributed systems is notoriously difficult. For single-node systems, interactive debuggers let developers step through a program's execution and inspect its state. For distributed systems, however, the execution control and state inspection facilities of traditional debuggers fall short. The execution of a distributed system is defined by the order in which events (messages and timeouts) are delivered, and traditional debuggers do not allow developers to control this order. Additionally, significant system state resides in messages in transit rather than locally in program memory, and traditional debuggers cannot display this state to developers. Existing step-through debuggers are therefore of limited utility to distributed systems developers.
The thesis of this dissertation is that a step-through debugger for distributed systems can bring the advantages of traditional single-node step-through debugging to distributed systems, helping developers to diagnose bugs and understand system behavior. We present Oddity, a graphical, interactive debugger for distributed systems. It brings the power of traditional step-through debugging, namely fine-grained control and observation of a program as it executes, to distributed systems. It also enables exploratory testing, in which an engineer examines and perturbs the behavior of a system in order to better understand it, perhaps without a specific bug in mind. A programmer can directly control the interleaving of messages and failures. Oddity can be used on both executable system models and system implementations. Oddity supports time travel, allowing a developer to explore multiple branching executions of a system within a single debugging session. Oddity also includes a model checker for skipping tedious event sequences and for finding states that match particular predicates.
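To make these ideas concrete, the following is a minimal sketch of the three capabilities named above: choosing which in-transit event to deliver next, time travel across branching executions, and a search for states matching a predicate. It is not Oddity's implementation or API. Every name here (`Snapshot`, `deliver`, `Debugger`, `find_state`, `register_handler`) is hypothetical, and the sketch assumes a system modeled as a deterministic handler function over integer node states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical, simplified event: (destination node, payload).
Event = Tuple[str, str]
# Assumed handler shape: (node, local state, payload) -> (new state, events sent).
Handler = Callable[[str, int, str], Tuple[int, List[Event]]]


@dataclass(frozen=True)
class Snapshot:
    """Immutable global state: per-node local state plus events in transit."""
    nodes: Tuple[Tuple[str, int], ...]
    pending: Tuple[Event, ...]


def deliver(snap: Snapshot, index: int, handler: Handler) -> Snapshot:
    """Deliver the pending event at `index`; return the successor snapshot."""
    dest, payload = snap.pending[index]
    nodes = dict(snap.nodes)
    nodes[dest], sent = handler(dest, nodes[dest], payload)
    pending = snap.pending[:index] + snap.pending[index + 1:] + tuple(sent)
    return Snapshot(tuple(sorted(nodes.items())), pending)


class Debugger:
    """Keeps every visited snapshot so the user can jump back and branch."""

    def __init__(self, initial: Snapshot, handler: Handler):
        self.handler = handler
        self.history: List[Snapshot] = [initial]
        self.current = 0  # index into history

    def step(self, index: int) -> Snapshot:
        """Deliver one user-chosen pending event from the current snapshot."""
        nxt = deliver(self.history[self.current], index, self.handler)
        self.history.append(nxt)
        self.current = len(self.history) - 1
        return nxt

    def travel(self, snapshot_id: int) -> Snapshot:
        """Time travel: make an earlier snapshot current again."""
        self.current = snapshot_id
        return self.history[snapshot_id]


def find_state(start: Snapshot, handler: Handler,
               predicate: Callable[[Snapshot], bool],
               max_depth: int = 10) -> Optional[List[int]]:
    """Breadth-first search over delivery orders; returns the choices made."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        snap, path = queue.popleft()
        if predicate(snap):
            return path
        if len(path) >= max_depth:
            continue
        for i in range(len(snap.pending)):
            nxt = deliver(snap, i, handler)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [i]))
    return None


def register_handler(node: str, state: int, payload: str):
    # Hypothetical toy node: "set:N" overwrites the local integer, sends nothing.
    return int(payload.split(":")[1]), []


# Two writes are in transit; the final value depends on delivery order.
init = Snapshot((("r", 0),), (("r", "set:1"), ("r", "set:2")))
```

In this toy model, delivering the two pending writes in one order leaves the register at 2, while calling `travel(0)` and delivering them in the other order leaves it at 1. Because snapshots are immutable, both branches stay available in `history`, and `find_state` can search the same state space automatically.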





