tag:blogger.com,1999:blog-8899645800948009496.post5926754179729515588..comments2024-03-18T22:39:19.239-07:00Comments on DBMS Musings: Problems with CAP, and Yahoo’s little known NoSQL systemDaniel Abadihttp://www.blogger.com/profile/16753133043157018521noreply@blogger.comBlogger37125tag:blogger.com,1999:blog-8899645800948009496.post-60804884213702352982014-09-06T19:05:33.687-07:002014-09-06T19:05:33.687-07:00This article and comments are very interesting. Yo...This article and comments are very interesting. You are discussing synching Distributed DBs using the PACELC protocol theory. This is all understandable when the DB nodes are all interconnected by very high speed transmission lines directly connected and you have maybe 50 to 100 nodes. But is this scalable to 100,000 nodes? Some how I am not sure.<br /><br />What I am curious about is a completely Decentralized Network Database that uses a common Internet Wi-Fi connection speed of 15 to 20 MB/s. Does the theory work with say 500 million nodes?<br /><br />Just wondering...<br /><br />LloydLloydhttps://www.blogger.com/profile/16916110515875732512noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-38043252634788258642014-06-06T08:40:00.758-07:002014-06-06T08:40:00.758-07:00True Network Partitions are a joke problem, but th...True Network Partitions are a joke problem, but the P in the CAP theorem really stands for "Bad networking", i.e. "arbitrary message loss or failure" or unreliable latency, etc.<br /><br />True partitions are a joke because:<br /><br />If clients are partitioned from your servers, then CAP will not help you; you have no service for those clients. If clients are partitioned from some servers they should use the ones they can access (but access protocol needs to enforce that).<br /><br />The first soluble problem case is when servers are partitioned in two (Quorum/SubQuorum) classes, but clients can see all servers. If sub-quorum servers refuse service, and clients try multiple servers before giving up, then CAP (but not L) is maintained.<br /><br />And so it goes. But my point is that unreliable messaging is the actual meaning of P in Brewer's theorem, and by my standards two server clusters communicating over WAN between continents are in a Permanent state of Partition.<br /><br />Which means that in a way I agree with Mr Abadi, though I conceptualize it more like this:<br /><br />There is a more fundamental theorem(^*) that two databases cannot be reliably synchronized across an unreliable messaging network (aka IP). In practice this means that any DB-network can be rendered non-C through (malicious, generally) P. I.e. the entire CAP theorem is a sick joke.<br /><br />However, all is not lost, because we ignore the "proof" case and recall Stephen J Gould's definition of Scientific Fact. It is in fact possible to factually synchronize DBs to any level of certainty by trading in latency (except in the face of malicious P).<br /><br />Not to go all real-world on you, but if you are selling the Mona-Lisa online, you have no choice but to have a single point of failure for completing the final transaction. There are strategies to move that single point, but doing so automatically without human intervention will have at least one failure mode. (with human intervention doesn't guarantee success, but does make it not-analyzable).<br /><br />(*) [ IIRC: given DBs A & B, both known to be in state S0: A transitions to S1; A&B communicate with unreliable messages (either unknown arrival or unknown latency) in order for B to transition to S1. It is impossible for B to "know" that A knows that B has transitioned to S1 (i.e. knowledge of synchrony is actually forbidden).<br /><br />Imagine a (two-phase commit) protocol such that A&B serve S only so long as they are synchronized; when unsynchronized they queue requests and await synchronization. A receives update, begins queueing requests, communicates update to B; when they are synchronized again, they process their queues. Such a protocol can never be "honestly" implemented. ]David Stewart Zinkhttps://www.blogger.com/profile/01029851101914367025noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-31511982299118972022013-04-17T04:19:24.450-07:002013-04-17T04:19:24.450-07:00PNUTS doesn't even have a homepage, with downl...PNUTS doesn't even have a homepage, with downloads, docs, a forum or a mailing list, or anything like it. Small wonder, then, that it's almost unknown.<br />A_flj_https://www.blogger.com/profile/13123052099912825525noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-14699500354988166522012-12-23T03:30:45.793-08:002012-12-23T03:30:45.793-08:00I really liked this post. It made me reflect more ...I really liked this post. It made me reflect more about the CAP theorem which I always saw as something with a restricted practical usefulness. That's because the tradeoff points in any solution, or more specifically their consequences, depend on the required usage scenarios and on the concrete system design. Still it was great food for thought, and it is true that this is about a pattern, which has a predefined set of tradeoffs that need to be considered. I posted myself on this subject:<br />http://architectedsystems.blogspot.de/2012/12/cap-and-other-tradeoffs.html<br />Anonymoushttps://www.blogger.com/profile/13140567555148625255noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-89364405820955786412012-01-26T04:48:48.531-08:002012-01-26T04:48:48.531-08:00As I understand it, you're saying PNUTS is PA/...As I understand it, you're saying PNUTS is PA/EL for reading, but PC/EC for writing?<br /><br />Thus, going back to CP or AP, it's AP for reading and CP for writing. (with the implication that dropping C can give L - as is done for reading). <br /><br />Am I right in thinking that PNUTS can give low latency reads when partitioned? That's not obvious from saying it's PA/EL, which suggests low latency is only achieved when not partitioned, as it's only in the Else clause.Christopherhttps://www.blogger.com/profile/17608709029549888584noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-20563830008840143612011-10-26T10:14:13.744-07:002011-10-26T10:14:13.744-07:00“consistent and tolerant of network partitions, bu...“consistent and tolerant of network partitions, but not available” is not strange at all. When there is strict consistency, your data is not available unless your data is replicated in a distributed system. This is the case. There is nothing strange about it and several entities are using this model, take banks and financial industry. Amazon used to favor this as well. Perhaps, you have a misunderstanding of this model.Unknownhttps://www.blogger.com/profile/12324676586926787774noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-71458701570443823422011-09-11T04:14:14.226-07:002011-09-11T04:14:14.226-07:00I must've misunderstood the point, but isn'...I must've misunderstood the point, but isn't latency the reason why we need distributed systems in the first place?Abdulla Farazhttps://www.blogger.com/profile/11749751465694238954noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-8518349309760348992011-04-26T18:40:01.748-07:002011-04-26T18:40:01.748-07:00Great post. However, latency and availability are ...Great post. However, latency and availability are also correlated. One could argue that loss of availability is merely high latency. Thus, PACELC can be reduced to PACEAC. This new formulation calls out what I see as the central contribution in your argument, which is that systems do not have to mindlessly couple the behavior they exhibit when the network is partitioned with the behavior they exhibit when it is not partitioned. The trade off made by PNUTS in decoupling the two is interesting, but what I would most like to see is PAEC - a system that provides consistency when the network is not partitioned, and only relaxes consistency when and as needed to deal with network partition. In other words, I would like a system that makes an effort to achieve consistency at the expense of some latency, but then falls back when it detects that the network is down and deals with any inconsistencies introduced through conflict resolution later on. With this behavior, the system takes advantage of the opportunity to avoid inconsistency whenever possible, and only pays the price of inconsistency when forced to do so to provide availability, i.e., to provide reasonable latency.Unknownhttps://www.blogger.com/profile/15656196494721168483noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-17479527905021124402011-04-16T22:07:23.407-07:002011-04-16T22:07:23.407-07:00I'll never stop to learn from you Dr.Abadi. Th...I'll never stop to learn from you Dr.Abadi. Thanks,mahahttps://www.blogger.com/profile/02147975079574069135noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-68263011093274970402011-01-28T11:45:19.203-08:002011-01-28T11:45:19.203-08:00Yet another paper summary/review where I had to re...Yet another paper summary/review where I had to resort to the PACELC model.<br />http://muratbuffalo.blogspot.com/2011/01/lessons-from-giant-scale-services.htmlMurathttps://www.blogger.com/profile/07842046940394980130noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-59103236270945544682011-01-12T21:17:14.716-08:002011-01-12T21:17:14.716-08:00It's good to see the discussion pick up again....It's good to see the discussion pick up again. Markus, I agree that consistency can mean different things for different applications, and this can cause confusion. Dan, I added a comment to your post.Daniel Abadihttps://www.blogger.com/profile/16753133043157018521noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-16853581825754518942011-01-12T13:56:51.739-08:002011-01-12T13:56:51.739-08:00On integrity vs. correctness: at a former employer...On integrity vs. correctness: at a former employer of mine, an internal study estimated that about 50% of all non-financial entries in the company's various data sources (conventional and relational DBMSs, spreadsheets, etc.) were incorrect: misspelled names, obsolete addresses, etc.John Cowanhttps://www.blogger.com/profile/11452247999156925669noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-36663529103761542922011-01-12T12:55:54.968-08:002011-01-12T12:55:54.968-08:00This was so interesting that I wrote a blog entry ...This was so interesting that I wrote a blog entry about it. See http://danweinreb.org/blog/improving-the-pacelc-taxonomy<br /><br />Markus, using integrity constraints is not the right way to think of it. Just because the DBMS's integrity constraints are met does not mean that the data is correct.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-79957263978845326302011-01-11T00:43:28.938-08:002011-01-11T00:43:28.938-08:00Hi Daniel,
I just found out about your blog post....Hi Daniel,<br /><br />I just found out about your blog post. I think you are right. I guess that part of the confusion around the CAP theorem comes from an unclear understanding of what C, A, and P actually mean.<br /><br />Imho, it does not make much sense to talk about consistency without a specific application or system architecture in mind. For example, in the case of a relational database, consistency usually equals integrity constraints & atomic transactions; in the case of Dynamo/Cassandra it is specified by the N,R,W configuration of the quorum protocol (or hard-wired consistency level); et cetera.<br /><br />In a similar sense, when defining availability, there should be a time constraint (e.g. if the ping does not return after 2 secs, the app is down). Everything in between [0, 2] is the "availability spectrum", aka latency. Apparently, it is again application specific what the upper bound should be and when we talk about latency as opposed to availability.<br /><br />I agree that from a client perspective there is no difference between unavailability due to server failure and unavailability due to network partition. However, different repair mechanisms will be used in either case so it might make sense to differentiate when looking from a system perspective (?)<br /><br />Thoughts?<br /><br />Thank you,<br /><br />MarkusMarkushttps://www.blogger.com/profile/03248680368129494096noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-11407512299247703212010-12-19T01:25:55.376-08:002010-12-19T01:25:55.376-08:00More generally, "A" in CAP stands for op...More generally, "A" in CAP stands for operational availability, which can subsume how well (according to latency, throughput, etc. metrics) a system is available for operations.Masood Mortazavihttps://www.blogger.com/profile/08360285774352781059noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-67076218036907831572010-12-19T01:15:40.439-08:002010-12-19T01:15:40.439-08:00CAP seems, to me, to mean that you need to trade-o...CAP seems, to me, to mean that you need to trade-off the three dimensions, not that you need to only choose two of the three. The latter view is too simplistic. In reality, each pair of the three can be traded off against each other. (In the triangle model, approaching any of the vertices makes you more distant from the other two.)Masood Mortazavihttps://www.blogger.com/profile/08360285774352781059noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-54717390851677937832010-10-19T08:53:14.170-07:002010-10-19T08:53:14.170-07:00I really liked the PACELC model much better than t...I really liked the PACELC model much better than the CAP explanations I have seen so far. I used it in my recent summary of a Paxos variant for WANs. http://muratbuffalo.blogspot.com/2010/10/mencius-building-efficient-replicated.html<br /><br />Thanks Dan. Great post.Murathttps://www.blogger.com/profile/07842046940394980130noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-28116075980295440402010-09-29T14:13:13.931-07:002010-09-29T14:13:13.931-07:00Good post. I think CAP, like ACID and BASE, is a ...Good post. I think CAP, like ACID and BASE, is a bit contrived. It seems to me that there's a bit of stretch to make the acronyms pronounceable and "catchy". I think it confuses people (self included) because of that. IMHO, CAP, ACID, BASE are all still good for providing context and shouldn't be taken quite so literally. However, I am starting to like PACELC now as well. :-)Unknown DBAhttps://www.blogger.com/profile/05396525237916588223noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-82300673976570562202010-05-04T01:45:50.789-07:002010-05-04T01:45:50.789-07:00One approach in a split-brain situation is (assumi...One approach in a split-brain situation is (assuming at least three datacentres) to use a quorum approach where the service can continue as long as one of the service partitions holds a majority (defined as more than half) of the service. Increasing numbers of data-centres and maintaining an odd number increases the availability to the point where availability would only be compromised in an extremely improbable circumstance.<br /><br />Only where the largest group of service-partitions cannot form a quorum would the service become unavailable. With large (five or more) globally distributed data-centres then if there is no group of three that can communicate then this would suggest some sort of extreme failure that perhaps it's best that the service does not continue.<br /><br />So, it would seem to me that the trade-off in this scenario is one of performance - i.e. in all likely scenarios there will be no loss of consistency nor availability, but simply one of the aggregate performance of the service.Simon Griffithshttps://www.blogger.com/profile/05813723076139741714noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-81607757655674071782010-05-03T11:51:02.374-07:002010-05-03T11:51:02.374-07:00---
The difference between CA and CP hasn't af...---<br />The difference between CA and CP hasn't affected the design of any system that I know of.<br />---<br /><br />I agree with pretty much everything else you said, but I don't agree with this one. If you assume that the network link never partitions, there are simplified system designs you can choose. For example: most EMC storage arrays I know of are 2 node systems at the frontend with a shared SAS network at the backend to get to the actual data on disk, with an extra link between them for private communication. But the 2 nodes don't really worry so much about partition (presumably because it's the same chassis etc). To me, this is a distributed system that chose CA, and not worry about P at all, and I can see how the array design is simplified because of that.<br /><br />But your point is valid, and you can teach a CP system to be CA system with a bit of extra work and no semantic difference w.r.t the client of the system.vgbhttps://www.blogger.com/profile/16532680122488153560noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-9155970626578184792010-04-28T22:39:39.579-07:002010-04-28T22:39:39.579-07:00Hmmm... PACELC has an E for "else" but n...Hmmm... PACELC has an E for "else" but no I for "if" which makes me have to re-interpret the abbreviation every time i look at it. Perhaps this would be more immediately interpretable if it was <br /><br /> P?(A|C):(L|C)<br /><br />;-)Mike McCartneyhttps://www.blogger.com/profile/07741401713418002168noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-9934113578315416602010-04-28T19:04:08.636-07:002010-04-28T19:04:08.636-07:00Well written, and good post. For the lesser intell...Well written, and good post. For the lesser intellects like me, this smoothly presents why developing the current buzz nosql dbs easyAgoglifehttps://www.blogger.com/profile/13468811705676599388noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-80363577260046512642010-04-27T17:55:10.543-07:002010-04-27T17:55:10.543-07:00Excellent. I had the same feeling when I blogged ...Excellent. I had the same feeling when I blogged about classifying systems by CAP. I too found myself struggling to understand the true difference between a CA and a CP system. It always felt like the difference boiled down to the system itself - how it handled the partition so I introduced a fudge factor I called recovery - how does the system recover from the missing CAP. Your solution is better. <br /><br />However I think there may be a missing dimension - causal vs non causal C. It sounds like PNUTS provides ordered C but there are ways to provide causaul updates while relaxing L.Taylorhttps://www.blogger.com/profile/07193759050963768511noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-9731434696847017502010-04-27T05:34:21.626-07:002010-04-27T05:34:21.626-07:00The differences between CA and CP are significant ...The differences between CA and CP are significant to me. <br /><br />Protocols like XA provide CA and protocols like Paxos provide CP. CP & CA systems are running in production today but most of the R&D is on CP.<br /><br />See page 4 from Brewer's keynote for examples of each - http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdfMark Callaghanhttps://www.blogger.com/profile/09590445221922043181noreply@blogger.comtag:blogger.com,1999:blog-8899645800948009496.post-90258757347160448312010-04-26T21:20:22.457-07:002010-04-26T21:20:22.457-07:00Alaric, I'm pleased to see that PACELC can be ...Alaric, I'm pleased to see that PACELC can be applied to GenieDB :)<br /><br />Toby, thanks for the data.<br /><br />3a4ot, I think you overestimate the speed of light.<br /><br />Mark,<br /><br />Whether you use traditional definitions or the definitions from the Gilbert and Lynch paper (which I linked to when alluding to the definition of consistency in the PNUTS and CAP section), the difference between CA and CP are are either minor or nonexistent when it comes to building a distributed system. The difference between CA and CP hasn't affected the design of any system that I know of. All the action and decision making is regarding CA/CP vs AP.<br /><br />I strongly agree that 2PC is not necessary for consistency in general (which I think I mentioned in the original post). I also agree that some systems would rather pay a single round trip latency rather than give up consistency in the absence of failures or partitions. PA/EC systems (and even PC/EC systems) certainly have practical applications in the real world.Daniel Abadihttps://www.blogger.com/profile/16753133043157018521noreply@blogger.com