Papers I’ve written or worked on (ordered by date):
Making Systems More Robust with Flexible RPC Lookup
Modern distributed systems use names everywhere. Lockservices such as Chubby and ZooKeeper provide an effective mechanism for mapping from application names to server instances, but proper usage of them requires a large amount of error-prone boiler-plate code. Application programmers often try to write wrappers to abstract away this logic, but it turns out there is a more general and easier way of handling the issue. We show that by extending the existing name resolution capabilities of RPC libraries, we can remove the need for such annoying boiler-plate code while at the same time making our services more robust.
Transactional storage for geo-replicated systems. SOSP, 2011.
We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.*
Design and implementation of contextual information portals. 2011.
Oolong: Asynchronous Distributed Applications Made Easy APSYS 2012.
Oolong is a distributed programming framework designed for sparse asynchronous applications such as distributed web crawling, shortest paths, and connected components. Oolong stores program state in distributed in-memory key-value tables on which user-defined triggers may be set. Triggers can be activated whenever a key-value pair is modified. The event-driven nature of triggers is particularly appropriate for asynchronous computation where workers can independently process part of the state towards convergence without any need for global synchronization. Using Oolong, we have implemented solutions for several large-scale asynchronous computation problems, achieving good performance and robust fault tolerance.
Piccolo: Building fast, distributed programs with partitioned tables. OSDI, 2010.
Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Piccolo enables efficient application implementations. In particular, applications can specify locality policies to exploit the locality of shared state access and Piccolo’s run-time automatically resolves write-write conflicts using userdefined accumulation functions. Using Piccolo, we have implemented applications for several problem domains, including the PageRank algorithm, k-means clustering and a distributed crawler. Experiments using 100 Amazon EC2 instances and a 12 machine cluster show Piccolo to be faster than existing data flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.
Document classification for focused topics. Proceedings of AAAI Spring Symposium. 2010.