CAP Theorem

Nowadays, no blurbs about distributed data store systems are complete without mentioning the CAP theorem, which dictates a distributed data store system can have at most two, but never all three, of Consistency, Availability, and Partition tolerance. Usually, partition tolerance is indispensable, so they often make it look like it inevitably boils down to the choice between consistency (in which case the system is labeled “CP”) and availability (“AP”).

Martin Kleppmann, however, argues in “Please stop calling databases CP or AP” that things are not all that simple. He meticulously breaks it down as follows:

  • The actual definitions of consistency, availability and partition tolerance à la CAP theorem are quite narrow and vastly different from what you’d naively expect. The “CP” and “AP” labels thrown around in marketing literature are often not technically correct. Conversely, a system that can only be labeled just “P” in the strict sense may still have practical values.
    • Consistency in CAP has nothing to do with consistency in ACID. The latter seems to be usually defined as a property of a database system where starting in a valid state, it always end in a valid state after a transaction. Consistency in CAP, on the other hand, means linearizability, which is essentially the following:

      If operation B started after operation A successfully completed, then operation B must see the the system in the same state as it was on completion of operation A, or a newer state.

    • Availability in CAP is defined as “every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability. Availability in the practical sense does not quite correspond to CAP-availability.
    • Partition Tolerance (terribly mis-named) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our datacenters have this property, so you don’t really have any choice in this matter.

Consistency has nuances, which are spelled out in “Consistency Models” by Kyle Kingsbury of Jepsen.

コメントを残す

以下に詳細を記入するか、アイコンをクリックしてログインしてください。

WordPress.com ロゴ

WordPress.com アカウントを使ってコメントしています。 ログアウト /  変更 )

Twitter 画像

Twitter アカウントを使ってコメントしています。 ログアウト /  変更 )

Facebook の写真

Facebook アカウントを使ってコメントしています。 ログアウト /  変更 )

%s と連携中

このサイトはスパムを低減するために Akismet を使っています。コメントデータの処理方法の詳細はこちらをご覧ください