Earlier this year, IEEE Computer magazine came out with an issue largely devoted to a 12-year retrospective of the CAP theorem, containing several articles from distributed systems researchers contributing various opinions and thoughts about CAP. The first article is from Eric Brewer, who introduced CAP 12 years ago (though he points out in his article that it was actually 14 years ago). A PDF of Brewer’s article is available for free at http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed. The second article is from Seth Gilbert and Nancy Lynch (the same Gilbert and Lynch who proved the CAP theorem 10 years ago).
The third article is from me, and contains my criticisms of
CAP that long-time readers of my blog will be familiar with. In particular, I
point out that many people assume that modern NoSQL systems relax consistency
guarantees in order to gain availability due to the constraints of the CAP
theorem, when the reality is that these systems give up on consistency even in
the absence of network partitions, which is not required according to the CAP
theorem. They give up consistency out of a desire to improve system latency, an increasingly
important requirement in the modern impatient world. I then describe the
latency-consistency tradeoff in more detail, and end the article with the
PACELC reformulation of CAP that debuted on my blog over two years ago.
With the permission of the IEEE, I am making a free version of this article
available today.
This article marks the first time that the PACELC formulation and my thoughts on
CAP have appeared in a scholarly article, which gives people a venue to refer to (bibtex
code available here) when citing this work (you can stop citing a blog post!).
The fourth article, from Raghu Ramakrishnan, is entitled “CAP
and Cloud Data Management” and describes the PNUTS system, which I have mentioned in the past as a good example of a system for which
the consistency-latency tradeoff has had a more direct impact on the system
design than the consistency-availability tradeoff of CAP. The fifth article is
from Ken Birman, Daniel Freedman, Qi Huang, and Patrick Dowell of Cornell
University on overcoming CAP with soft-state replication. Unfortunately, I cannot find a free
link to Raghu’s article, but if you have an IEEE account, you can access it at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6122007&tag=1. The Birman et al. article can be found for free at: http://www.cs.cornell.edu/Projects/mrc/CAP.pdf.
If you have enjoyed my thoughts on CAP on this blog, I
highly recommend you read each of these five articles. The Brewer article in particular acknowledges my past criticism
that CAP is not really about picking two out of three of C (consistency), A (availability),
and P (partition tolerance), since it does not make sense to reason about
a system that is ‘CA’. (If there is no partition, any system can be both
consistent and available; the only question is what happens when there is a
partition: does consistency or availability get sacrificed?) Brewer uses
this observation to lead into a nice generalization of the consistency-availability
tradeoff. In particular, when a partition occurs, the system should do three things:
(1) detect that the partition occurred, (2) enter a partition mode that may or
may not limit some operations, and (3) initiate some sort of reconciliation algorithm
when the partition is fixed. Depending on how these three things are
implemented, it is possible to obtain
much of the spectrum between CP systems and AP systems. The article also
contains a nice reference to the CRDT work by Shapiro et al. at INRIA. Overall, I strongly support Brewer’s approach to
navigating this tradeoff. It also fits nicely with Mehul Shah’s talk at HPTS, in that the spectrum between consistency and availability is explicitly
considered at system design time, rather than trying to bolt consistency on top
of an AP (eventually consistent) system after the fact (a wildly suboptimal
endeavor).
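As a rough illustration of Brewer's three-step recipe, here is a minimal sketch of a partition-aware store; all class and method names are my own invention, not from any real system, and a real implementation would use failure detectors for step (1) and replica merging (e.g., CRDTs or version vectors) for step (3):

```python
# Sketch of Brewer's three steps: (1) detect the partition, (2) enter a
# partition mode that limits some operations, (3) reconcile on recovery.
# All names here are illustrative assumptions, not from any real system.

class PartitionAwareStore:
    def __init__(self):
        self.data = {}
        self.partitioned = False
        self.deferred = []  # operations logged while in partition mode

    def detect_partition(self):
        # (1) In a real system this signal would come from missed
        # heartbeats or a failure detector; here we just flip a flag.
        self.partitioned = True

    def put(self, key, value):
        if self.partitioned:
            # (2) Partition mode: accept the write but log it for later
            # reconciliation instead of applying it as authoritative.
            self.deferred.append((key, value))
        else:
            self.data[key] = value

    def heal_partition(self):
        # (3) Reconciliation: replay the deferred operations. A real
        # system would merge divergent replica states rather than
        # naively replaying in arrival order.
        self.partitioned = False
        for key, value in self.deferred:
            self.data[key] = value
        self.deferred.clear()

store = PartitionAwareStore()
store.put("x", 1)
store.detect_partition()
store.put("x", 2)       # logged, not yet applied
store.heal_partition()
print(store.data["x"])  # 2
```

How permissive step (2) is, and how sophisticated step (3) is, determines where on the CP-to-AP spectrum the resulting system lands.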
While most of Brewer’s article focused on the
consistency-availability tradeoff, Brewer also briefly acknowledges that “in
its classic interpretation, the CAP theorem ignores latency”, and that some
systems reduce consistency for latency (he even refers to the PNUTS example I
used in my original blog post). I remain convinced that PACELC is the best way
to reason about both of these tradeoffs in a single formulation: if there is a
partition (P), how does the system trade off availability and consistency
(A and C); else (E), when the system is running as normal in the absence of
partitions, how does the system trade off latency (L) and consistency
(C)?
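To make the formulation concrete, here is a trivial sketch that just encodes the two PACELC questions as a label; the example classification in the comment is an illustrative assumption, not an authoritative claim about any particular system:

```python
# Encode the PACELC question: under a partition (P), does the system
# choose availability (A) or consistency (C)? Else (E), does it choose
# latency (L) or consistency (C)?

def pacelc(partition_choice: str, else_choice: str) -> str:
    assert partition_choice in ("A", "C")
    assert else_choice in ("L", "C")
    return f"P{partition_choice}/E{else_choice}"

# Illustrative example: a Dynamo-style store that stays available under
# partitions and favors low latency otherwise would be labeled PA/EL.
print(pacelc("A", "L"))  # PA/EL
```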
For some reason I didn't get PACELC when reading your original blog post. I read the paper today and now I get it. Thanks for removing a lot of the confusion from future CAP discussions by explaining the source of the confusion and introducing PACELC.
Can you provide more details on why "R+W >= N" isn't sufficient to provide C as defined by Gilbert and Lynch?
Mark, thanks for the kind words. Your question touches on a common misconception, that R+W > N implies consistency, so I'm glad you brought it up.
In their proof of CAP, Seth Gilbert and Nancy Lynch used the definition of atomic/linearizable consistency: “There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”
When you have R+W > N, all that is guaranteed is that reads return at least the latest complete write. However, this is weaker than the definition of consistency given above.
For example, suppose you have 3 replicas, A, B, and C, and both R and W are equal to 2 (so 2+2 > 3). Let's say that a write is sent to A and B, and A has completed the write while B is delayed before completing the write. Now a read comes and is sent to A and C. A has the more recent value, so this new value is returned (even though the write that resulted in this value has not yet "committed"). Now another read comes and goes to B and C. Since neither one has the new value, the old value is read. If the two reads had executed on the same single server, this would be impossible, since the second read started after the first one completed.
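This scenario is easy to simulate; the following sketch (my own illustration, not code from any real system) reproduces it with three replicas and R = W = 2, so R+W > N, and still produces a read that goes "back in time":

```python
# Three replicas, each storing a (version, value) pair. R = W = 2, N = 3,
# so R + W > N, yet two successive quorum reads can violate linearizability
# while a write has landed on only one replica.
replicas = {"A": (1, "old"), "B": (1, "old"), "C": (1, "old")}

def apply_write(targets, version, value):
    # Apply the write only on the replicas that have processed it so far
    # (in this scenario, the copy sent to B is delayed in transit).
    for r in targets:
        replicas[r] = (version, value)

def quorum_read(targets):
    # Return the value with the highest version among the contacted replicas.
    return max(replicas[r] for r in targets)[1]

# Write of "new" (version 2) is sent to A and B; only A has applied it.
apply_write(["A"], 2, "new")

first = quorum_read(["A", "C"])   # sees A's not-yet-committed "new" value
second = quorum_read(["B", "C"])  # neither replica has the write yet
print(first, second)  # new old
```

The second read returns the old value even though it began after the first read (which returned the new value) completed, violating the total order that atomic consistency requires.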
Of course, it is possible to guarantee higher levels of consistency in quorum systems, but R+W > N by itself is not sufficient to guarantee atomic/linearizable consistency.
Hi, the link for the Birman article (5th?) doesn't seem to be valid: http://dfreedman.cs.cornell.edu/IEEE_Computer_45_50.pdf
What is the title of that article, by chance?
Link fixed.