Stop me if you’ve heard this one before.
You’re working on your phone or tablet, editing an online document, and you get on a plane. At first everything is fine; your edits are cached locally, and most of the application is local to your phone anyway. But then something happens; the page refreshes, or your device auto-connects to the dodgy in-flight wifi, and your previously-functioning application suddenly begins to … not.
Distributed programs need to accept disconnection.
This common experience demonstrates an equally common flaw in modern client-facing distributed applications. The trouble is that a wide-area, heavily replicated distributed system is fundamentally hard to write, so many applications opt for a weaker model, like an “interactive web page” where data can be “cached at the client” for quick responsiveness. But this model simply isn’t reality. No matter what our frameworks tell us, distributed programs can’t use the same assumptions and metaphors as local concurrency or traditional websites. Modern distributed frameworks must accept that caches are replicas, that replicas are fundamental to good performance, and that replicas must tolerate periods of disconnection gracefully. Crucially, a distributed program should never block implicitly; it should operate on local replicas whenever possible.
This makes consistency hard.
The moment you accept that your cached data is actually a first-class replica of shared global state, you must ask how stale you can tolerate that data becoming, and what guarantees you need on the visibility of the updates you make to it. Existing replicated systems that handle disconnection well tend to handle consistency badly, giving rise to bizarre errors on reconnection, including the ever-present threat of losing the last few hours (or even days) of work. Correct applications must provide strong consistency by default, while ensuring any deviation from sequential consistency is locally justified and isolated from sensitive parts of the application.
These features need language support.
There are a few (very few) applications today which can offer transparent replication without sacrificing consistency. But these applications are invariably hard to build, and prone to shipping subtle consistency bugs in production. The properties that we’re proposing—non-blocking replication which tolerates disconnection while ensuring strong consistency by default—are global properties of a distributed application. They can’t be enforced by a library alone, and they’re difficult enough that relying on “programmer discipline” has, repeatedly, been shown to be insufficient. These features can only be enforced at the language level—which is where we come in.
So we’re building a new language.
We’ve recently begun work on Gallifrey, a new programming language designed to address exactly these issues. Gallifrey enables per-task programming, where each thread can share any object with the world without sacrificing consistency. Gallifrey is based on Java with a minimum of changes, making it easy to pick up and intuitive to use.
Gallifrey’s core feature is enabled via author-provided Restrictions on objects’ interfaces. When you want to share an object in Gallifrey, you must define a Restriction of its interface, limiting the available operations to only those which are safe to call concurrently and during disconnection. This Restriction, in effect, transforms your normal local object into a CRDT (conflict-free replicated data type), a class of objects with rich and well-studied semantics for guaranteeing convergence even in the face of weakly-consistent updates.
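To make the idea of a Restriction concrete, here is a rough sketch in plain Java. It is not Gallifrey syntax, and every name in it (TagSet, AddOnlyTagSet, LocalTagSet, mergeFrom) is made up for illustration. It assumes the simplest possible case: a set whose restricted interface drops remove, leaving only operations that commute, so two replicas that exchange state after a disconnection end up identical (a grow-only set, one of the simplest CRDTs).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; plain Java emulation, not Gallifrey syntax.

/** The full local interface, including an operation that is unsafe to share. */
interface TagSet {
    void add(String tag);
    boolean contains(String tag);
    void remove(String tag); // does not commute with a concurrent add
    int size();
}

/** The "restriction": only the operations that commute are exposed. */
interface AddOnlyTagSet {
    void add(String tag);
    boolean contains(String tag);
}

final class LocalTagSet implements TagSet, AddOnlyTagSet {
    private final Set<String> tags = new HashSet<>();

    public void add(String tag) { tags.add(tag); }
    public boolean contains(String tag) { return tags.contains(tag); }
    public void remove(String tag) { tags.remove(tag); }
    public int size() { return tags.size(); }

    /** Merge is set union: commutative, associative, and idempotent. */
    void mergeFrom(LocalTagSet other) { tags.addAll(other.tags); }

    Set<String> snapshot() {
        return Collections.unmodifiableSet(new HashSet<>(tags));
    }
}

final class RestrictionSketch {
    public static void main(String[] args) {
        LocalTagSet phone = new LocalTagSet();
        LocalTagSet laptop = new LocalTagSet();

        // Once shared, each replica is reached only through the restricted view.
        AddOnlyTagSet sharedPhone = phone;
        AddOnlyTagSet sharedLaptop = laptop;

        // Both devices update while disconnected from each other.
        sharedPhone.add("draft");
        sharedLaptop.add("reviewed");

        // On reconnection, state is exchanged in both directions.
        phone.mergeFrom(laptop);
        laptop.mergeFrom(phone);

        System.out.println(phone.snapshot().equals(laptop.snapshot())); // prints true
    }
}
```

In this sketch nothing stops a caller from holding on to the unrestricted LocalTagSet reference and calling remove behind the shared view's back. Plain Java cannot rule that out, which is the gap a language-level guarantee would need to close.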
To keep the world of Restricted objects from consuming entire local programs, Gallifrey also employs a hybrid affine ownership type system which statically guarantees that an object (and everything it points to), when shared under a Restriction, can only be accessed via that Restriction. We’ve managed to take the best of affine typing (from languages like Rust) and apply it to Gallifrey, while still allowing a majority of Gallifrey code to enjoy the [almost] unrestricted aliasing and pervasive mutability common to Java programs.
Finally, we introduce branches, a new take on transactions inspired by the concept of the same name in version control. Opening a branch allows a program to create a new scope which is totally isolated from the rest of the world, in effect voluntarily disconnecting a portion of the program from the wider Gallifrey network. While within a branch, programmers can perform provisional operations which don’t guarantee sequential consistency, without worrying that these operations will threaten the consistency of the wider application.
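As a loose illustration of the isolation a branch provides, here is a sketch in plain Java rather than Gallifrey. All of the names (SharedDoc, DocBranch, openBranch, merge, abandon) are hypothetical, and the merge and abandon steps are assumptions borrowed from the version-control analogy, not a description of how Gallifrey actually closes a branch. The branch works against a snapshot taken when it was opened, so its provisional edits are invisible to the shared state and concurrent shared edits are invisible to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; plain Java emulation, not Gallifrey syntax.

/** Shared state: the committed lines of a document. */
final class SharedDoc {
    private final List<String> lines = new ArrayList<>();

    void append(String line) { lines.add(line); }
    List<String> lines() { return new ArrayList<>(lines); }
    DocBranch openBranch() { return new DocBranch(this); }
}

/** An isolated scope: sees a snapshot plus its own provisional edits. */
final class DocBranch {
    private final SharedDoc parent;
    private final List<String> snapshot;
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    DocBranch(SharedDoc parent) {
        this.parent = parent;
        this.snapshot = parent.lines(); // copy taken when the branch opens
    }

    void append(String line) { requireOpen(); pending.add(line); }

    List<String> view() {
        List<String> v = new ArrayList<>(snapshot);
        v.addAll(pending);
        return v;
    }

    /** Assumed policy: replay provisional edits onto the shared state. */
    void merge() {
        requireOpen();
        for (String line : pending) parent.append(line);
        open = false;
    }

    /** Assumed policy: drop provisional edits without touching shared state. */
    void abandon() { requireOpen(); pending.clear(); open = false; }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("branch is closed");
    }
}

final class BranchSketch {
    public static void main(String[] args) {
        SharedDoc doc = new SharedDoc();
        doc.append("intro");

        DocBranch branch = doc.openBranch();
        branch.append("provisional edit");
        doc.append("concurrent edit");

        System.out.println(doc.lines());   // [intro, concurrent edit]
        System.out.println(branch.view()); // [intro, provisional edit]

        branch.merge();
        System.out.println(doc.lines());   // [intro, concurrent edit, provisional edit]
    }
}
```

Replaying appends is only safe here because appends to this toy document never conflict; a real design would have to say what happens when provisional and shared operations disagree.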
More details can be found in the Gallifrey SNAPL paper.