Monday, October 29, 2012

IEEE Computer issue on the CAP Theorem

Due to Hurricane Sandy, Yale gave me a day off from teaching today and I have finally been able to get to a few things on my "to-do" list. One of them is to write a blog post about the IEEE Computer CAP Retrospective edition and make my paper that appeared inside of it publicly available.

Earlier this year, the IEEE Computer magazine came out with an issue largely devoted to a 12-year retrospective of the CAP theorem and contains several articles from distributed systems researchers that contribute various opinions and thoughts about CAP. The first article is from Eric Brewer, who coined the CAP theorem 12 years ago (though he points out in his article that it was actually 14 years ago). A PDF of Brewer’s article is available for free from: The second article is from Seth Gilbert and Nancy Lynch (the same Gilbert and Lynch that proved the CAP theorem 10 years ago). 

The third article is from me, and contains my criticisms of CAP that long-time readers of my blog will be familiar with. In particular, I point out that many people assume that modern NoSQL systems relax consistency guarantees in order to gain availability due to the constraints of the CAP theorem, when the reality is that these systems give up on consistency even in the absence of network partitions, which is not required according to the CAP theorem. The  reason why they give up on consistency is because of a desire to improve system latency, an increasingly important requirement in the modern impatient world. I then describe the latency-consistency tradeoff in more detail, and end the article with the PACELC reformulation of CAP that debuted on my blog over two years ago. With the permission of the IEEE, I am making a free version of this article available today. This article is the first time that the PACELC formulation and my thoughts on CAP appear in a scholarly article, which gives people a venue to refer to (bibtex code available here) when citing this work (you can stop citing a blog post!)

The fourth article is from Raghu Ramakrishnan, entitled “CAP and Cloud Data Management” and describes the PNUTS system that I have mentioned in the past as a good example of a system for which the consistency-latency tradeoff has had a more direct impact on the system design than the consistency-availability tradeoff of CAP. The fifth article is from Ken Birman, Daniel Freedman, Qi Huang, and Patrick Dowell of Cornell University on overcoming CAP with soft-state replication. Unfortunately, I cannot find a free link to Raghu’s article, but if you have an IEEE account, you can access it at at: The Birman et. al. article can be found for free at:

If you have enjoyed my thoughts on CAP on this blog, I highly recommend you read each of these five articles. The Brewer article in particular acknowledges my past criticism of CAP not actually being about picking two of three out of C (consistency), A (availability), and P (partition tolerance) due to the fact that it does not make sense to reason about a system that is ‘CA’. (If there is no partition, any system can be both consistent and available --- the only question is what happens when there is a partition --- does consistency or availability get sacrificed?) Brewer uses this observation to lead into a nice generalization of consistency-availability tradeoff. In particular, when a partition occurs, the system does three things: (1) detect that the partition occurred, (2) enter a partition mode that may or may not limit some operations, and (3) initiate some sort of reconciliation algorithm when the partition is fixed. Depending on how these three things are implemented, it is  possible to obtain much of the spectrum between CP systems and AP systems. The article also contains a nice reference to the CRDT work by Shapiro et. al. at INRIA. Overall, I strongly support Brewer’s approach to navigating this tradeoff. It also fits nicely with Mehul Shah’s talk at HPTS in the way that the spectrum between consistency and availability is explicitly considered at system design time, rather than trying to bolt consistency on top of an AP (eventually consistent) system after the fact (a wildly suboptimal endeavor).

While most of Brewer’s article focused on the consistency-availability tradeoff, Brewer also briefly acknowledges that “in its classic interpretation, the CAP theorem ignores latency”, and that some systems reduce consistency for latency (he even refers to the PNUTS example I used in my original blog post). I remain convinced that PACELC is the best way to reason about both of these tradeoffs in a single formulation: if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?


  1. For some reason I didn't get PACELC when reading your original blog post. I read the paper today and now I get it. Thanks for removing a lot of the confusion from future CAP discussions by explaining the source of the confusion and introducing PACELC.

    Can you provide more details on why "R+W >= N" isn't sufficient to provide C as defined by Gilbert and Lynch?

  2. Mark, thanks for the kind words. Your question about R+W > N implying consistency is a common misconception, so I'm glad you brought it up.

    In their proof of CAP, Seth Gilbert and Nancy Lynch used the definition of atomic/linearizable consistency: “There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”

    When you have R+W > N, all that is guaranteed is that reads return at least the latest complete write. However, this is weaker than the definition of consistency given above.

    For example, suppose you have 3 replicas A, B and C, and both R and W are equal to 2 (so 2+2 > 3). Let's say that a write is sent to A and B and A has completed the write while B is delayed before completing the write. Now a read comes and is sent to A and C. A has the more recent value, so this new value is returned (even though the write that resulted in this value has not yet "committed"). Now, another read comes and goes to B and C. Since neither one has the new value, the old value is read. If the two reads would execute on the same single server this would be impossible since the second read started after the first one completed.

    Of course, it is possible to guarantee higher levels of consistency in quorum systems, but just R+W >= N by itself is not sufficient to guarantee consistency.

  3. Hi, the link for the Birdman article (5th?) doesn't seem to be valid:
    What is the title of that article, by chance?