Publications | Mae Milano

Building Bridges: Safe Interactions with Foreign Languages through Omniglot

Leon Schuermann, Jack Toubes, Tyler Potyondy, Pat Pannuto, Mae Milano, Amit Levy

19th USENIX Symposium on Operating Systems Design and Implementation, 2025

PDF Project

Flo: A Semantic Foundation for Progressive Stream Processing

Streaming systems are present throughout modern applications, processing continuous data in real-time. Existing streaming languages have a variety of semantic models and guarantees that are often incompatible. Yet all these languages are considered “streaming”—what do they have in common? In this paper, we identify two general yet precise semantic properties: streaming progress and eager execution. Together, they ensure that streaming outputs are deterministic and kept fresh with respect to streaming inputs. We formally define these properties in the context of Flo, a parameterized streaming language that abstracts over dataflow operators and the underlying structure of streams. It leverages a lightweight type system to distinguish bounded streams, which allow operators to block on termination, from unbounded ones. Furthermore, Flo provides constructs for dataflow composition and nested graphs with cycles. To demonstrate the generality of our properties, we show how key ideas from representative streaming and incremental computation systems—Flink, LVars, and DBSP—have semantics that can be modeled in Flo and guarantees that map to our properties.

Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, Mae Milano

Proceedings of the ACM on Programming Languages (PACMPL) Volume 7, No. PLDI, 2025

PDF Project DOI

Verifying a C Implementation of Derecho’s Coordination Mechanism Using VST and Coq

Derecho is a C++ framework for distributed programming leveraging high performance communication primitives such as RDMA. At its core is the shared state table (SST), a replicated data structure that supports efficient protocols for consensus and group membership. Our aim is to formalize the reasoning principles articulated by the designers, which focus on knowledge and monotonicity, as basis for highly assured high performance distributed applications. To this end we develop a high level model that exposes the SST principles in an application-friendly way. We use the model to specify and verify a re-implementation in C of the SST API. We validate the specifications by verifying simple applications that embody key parts of the Derecho protocols. The development is carried out using VST and Coq. This lays groundwork for verification of the full Derecho protocols and applications built on them.

Ramana Nagasamudram, Lennart Beringer, Ken Birman, Mae Milano, David A Naumann

NASA Formal Methods 16th International Symposium, 2024

PDF Project DOI

Monotonicity and Opportunistically-Batched Actions in Derecho

Our work centers on a programming style in which a system separates data movement from control-data exchange, streaming the former over hardware-implemented reliable channels, while using a new form of distributed shared memory to manage the latter. Protocol decisions and control actions are expressed as monotonic predicates over the control data guarding protocol actions. Provable invariants about the protocol are expressed as effectively-common knowledge, which can be derived from the monotonic predicates in effect during a particular membership epoch. The methodology enables a natural style of code that is easy to reason about, and it runs efficiently on modern hardware. We used this approach to create Derecho, an optimal Paxos-based data replication library that sets performance records, and we believe it is broadly applicable to the construction of reliable distributed systems on high-bandwidth networks.

Ken Birman, Sagar Jha, Mae Milano, Lorenzo Rosa, Weijia Song, Edward Tremel

International Symposium on Stabilizing, Safety, and Security of Distributed Systems, 2023

PDF Project Project DOI

Initial Steps Toward a Compiler for Distributed Programs

In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.

Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, Mingwei Samuel

Proceedings of the 5th workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, 2023

PDF Project Project DOI

Better Defunctionalization through Lambda Set Specialization

Higher-order functions pose a challenge for both static program analyses and optimizing compilers. To simplify the analysis and compilation of languages with higher-order functions, a rich body of prior work has proposed a variety of defunctionalization techniques, which can eliminate higher-order functions from a program by transforming the program to a semantically-equivalent first-order representation. Several modern languages take this a step further, specializing higher-order functions with respect to the functions on which they operate, and in turn allowing compilers to generate more efficient code. However, existing specializing defunctionalization techniques restrict how function values may be used, forcing implementations to fall back on costly dynamic alternatives. We propose lambda set specialization (LSS), the first specializing defunctionalization technique which imposes no restrictions on how function values may be used. We formulate LSS in terms of a polymorphic type system which tracks the flow of function values through the program, and use this type system to recast specialization of higher-order functions with respect to their arguments as a form of type monomorphization. We show that our type system admits a simple and tractable type inference algorithm, and give a formalization and fully-mechanized proof in the Isabelle/HOL proof assistant showing soundness and completeness of the type inference algorithm with respect to the type system. To show the benefits of LSS, we evaluate its impact on the run time performance of code generated by the MLton compiler for Standard ML, the OCaml compiler, and the new Morphic functional programming language. We find that pre-processing with LSS achieves run time speedups of up to 6.85x under MLton, 3.45x for OCaml, and 78.93x for Morphic.

William Brandon, Benjamin Driscoll, Frank Dai, Wilson Berkow, Mae Milano

Proceedings of the ACM on Programming Languages (PACMPL) Volume 7, No. PLDI, 2023

PDF Project DOI

Keep CALM and CRDT On

Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.

Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein

Proceedings of the VLDB endowment, Vol. 16 No. 4, 2023

PDF Project Project Project DOI

Katara: Synthesizing CRDTs with Verified Lifting

Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between noncommutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.

Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein

Proceedings of the ACM in Programming Languages, Vol. 6, No. OOPSLA2, Article 173, 2022

PDF Project Project Project DOI

A flexible type system for fearless concurrency

This paper proposes a new type system for concurrent programs, allowing threads to exchange complex object graphs without risking destructive data races. While this goal is shared by a rich history of past work, existing solutions either rely on strictly enforced heap invariants that prohibit natural programming patterns or demand pervasive annotations even for simple programming tasks. As a result, past systems cannot express intuitively simple code without unnatural rewrites or substantial annotation burdens. Our work avoids these pitfalls through a novel type system that provides sound reasoning about separation in the heap while remaining flexible enough to support a wide range of desirable heap manipulations. This new sweet spot is attained by enforcing a heap domination invariant similarly to prior work, but tempering it by allowing complex exceptions that add little annotation burden. Our results include: (1) code examples showing that common data structure manipulations which are difficult or impossible to express in prior work are natural and direct in our system, (2) a formal proof of correctness demonstrating that well-typed programs cannot encounter destructive data races at run time, and (3) an efficient type checker implemented in Gallina and OCaml.

Mae Milano, Joshua Turcotti, Andrew Myers

PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2022

PDF Project Project DOI

New Directions in Cloud Programming

Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semantics, Availablity, Consistency and Targets of optimization. We propose to migrate developers gradually to PACT programming by lifting familiar code into our more declarative level of abstraction. We then propose a multi-stage compiler that emits humanreadable code at each stage that can be hand-tuned by developers seeking more control. Our agenda raises numerous research challenges across multiple areas including language design, query optimization, transactions, distributed consistency, compilers and program synthesis.

Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein, Mae Milano

The 11th Conference on Innovative Data Systems Research (CIDR ‘21), 2021

PDF Project Project

A Tour of Gallifrey, a Language for Geodistributed Programming

Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.

Mae Milano, Rolph Recto, Tom Magrino, Andrew Myers

3rd Summit on Advances in Programming Languages (SNAPL 2019), 2019

PDF Project Project DOI

Derecho: Fast State Machine Replication for Cloud Services

Cloud computing services often replicate data and may require ways to coordinate distributed actions. Here we present Derecho, a library for such tasks. The API provides interfaces for structuring applications into patterns of subgroups and shards, supports state machine replication within them, and includes mechanisms that assist in restart after failures. Running over 100Gbps RDMA, Derecho can send millions of events per second in each subgroup or shard and throughput peaks at 16GB/s, substantially outperforming prior solutions. Configured to run purely on TCP, Derecho is still substantially faster than comparable widely used, highly-tuned, standard tools. The key insight is that on modern hardware (including non-RDMA networks), data-intensive protocols should be built from non-blocking data-flow components.

Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Mae Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, Robbert Van Renesse

ACM Transactions on Computer Systems (TOCS), 2019

PDF Project DOI

A Language for Mixing Consistency in Geodistributed Transactions: Technical Report

Mae P. Milano, Andrew C. Myers

2018

PDF Project

MixT: A Language for Mixing Consistency in Geodistributed Transactions

Programming concurrent, distributed systems is hard—especially when these systems mutate shared, persistent state replicated at geographic scale. To enable high availability and scalability, a new class of weakly consistent data stores has become popular. However, some data needs strong consistency. To manipulate both weakly and strongly consistent data in a single transaction, we introduce a new abstraction: mixed-consistency transactions, embodied in a new embedded language, MixT. Programmers explicitly associate consistency models with remote storage sites; each atomic, isolated transaction can access a mixture of data with different consistency models. Compile-time information-flow checking, applied to consistency models, ensures that these models are mixed safely and enables the compiler to automatically partition transactions. New run-time mechanisms ensure that consistency models can also be mixed safely, even when the data used by a transaction resides on separate, mutually unaware stores. Performance measurements show that despite their stronger guarantees, mixed-consistency transactions retain much of the speed of weak consistency, significantly outperforming traditional serializable transactions.

Mae Milano, Andrew C. Myers

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

PDF Project DOI

Building Smart Memories and Cloud Services with Derecho

The coming generation of Internet-of-Things (IoT) applications will process massive amounts of incoming data while supporting data mining and online learning. In cases with demanding real-time requirements, such systems behave as smart memories: high-bandwidth services that capture sensor input, proceses it using machine-learning tools, replicate and store “interesting” data (discarding uninteresting content), update knowledge models, and trigger urgently-needed responses. Derecho is a high-throughput librry for building smart memories and similar services. At its core Derecho implements atomic multicast and state machine replication. Derecho’s replicated template defines a replicated type; the corresponding objects are associated with subgroups, which can be sharded into keyvalue structures. The persistent and volatile storage templates implement version vectors with optional NVM persistence. These support time-indexed access, offering lock-free snapshot isolation that blends temporal precision and causal consistency. Derecho automates application management, supporting multigroup structures and providing consistent knowledge of the current membership mapping. A query can access data from many shards or subgroups, and consistency is guaranteed without any form of distributed locking. Whereas many systems run consensus on the critical path, Derecho requires consensus only when updating membership. By leveraging an RDMA data plane and NVM storage, and adopting a novel receiver-side batching technique, Derecho can saturate a 12.5GB RDMA network, sending millions of events per second in each subgroup or shard. In a single subgroup with 2-16 members, throughput peaks at 16 GB/s for large (100MB or more) objects. When using version-vector storage, Derecho is limited by the speed of the SSD or RamDisk, showing no loss of performance as group sizes grow. While key-value subgroups would typically use 2 or 3-member shards, unsharded subgroups could be large. In tests with a 128-member group, Derecho’s multicast and Paxos protocols were just 2-3x slower than for a small group, depending on the traffic pattern. With network contention, slow members, or overlapping groups that generate concurrent traffic, Derecho’s protocols remain stable and adapt to the available bandwidth.

Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Mae Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, Robbert Van Renesse

2017

PDF Project

Mixing Consistency in Geodistributed Transactions: Technical Report (old)

Mae P. Milano, Andrew C. Myers

2016

PDF Project

A Coalgebraic Decision Procedure for NetKAT

Nate Foster, Dexter Kozen, Mae Milano, Alexandra Silva, Laure Thompson

Proceedings of the 42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015

PDF DOI

Python: The Full Monty

Joe Gibbs Politz, Alejandro Martinez, Mae Milano, Sumner Warren, Daniel Patterson, Junsong Li, Anand Chitipothu, Shriram Krishnamurthi

Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

PDF DOI