Tuesday, May 10, 2016

Look Ma No Locks - Act III - "STRONG" Eventual Consistency with CRTDs and other Geeky Stuff

In Act-I we saw that Consistency and Serializability were the goals of our Hero Developer was striving for, the plot started with the basic idea of Two Phase Locking and in Act II we encountered the concept of "Tunable  Consistency" in the form of isolation levels and then stumbled upon the CAP theorem , both very important developments. In this Act III, the story continues with new and exciting developments in research community - namely CRDTs and Co-ordination Free Consistency...

What is this "Eventual Consistency" business?

If our Micro-Services were dealing with Read-Only-Data then we could scale them infinitely by adding one clone after another- all nodes will have same data and every result would always be consistent. But alas - the world changes and so must our data - on each individual node responsible for that piece of data.  The problem is that the change has to "Start" somewhere and then must spread to all other nodes in the cluster and eventually somehow we must detect that change propagation is "Finished". Network is unreliable  and THAT exactly is the curse of being a distributed system, my friends. There is that brief amount of time when the change has been "Started" but not yet "Finished" completely. How do you deal with the next change request already in the queue ?
Now, you can be pessimistic and wait for the first change to Finish Completely before next one Starts. OR - you can be optimistic and assume the Best Case and proceed to adapt the next change before first one Finishes and only deal with any issues when they arise. The intuition here is that the series of changes may be re-ordered or delayed but "Eventually" every one sees the all changes and "deals with it" in case when problems arise. (It is as if my otherwise consistently normal friends behave a little strange when in Las Vegas, but they are eventually back to normal when we are back in Seattle...)
In short here is the reality 
  • By the time you detect the consistency problem, the client requesting the change may be long gone already and you are left alone with inconsistencies  to resolve.
  • Eventual consistency is a meaningless term until you define what it means to "reconcile" the differences/inconsistency. Eg. What happens after you detect that  two customers have added the same last remaining item to the cart or have booked the same last seat on the plane? What is your MERGE strategy ?
  • In eventual consistency you decide to pay very little price upfront for the consistency but your are signing up to pay heavy price later to  reconcile and correct the in-consistency when it arises. The math works as long as the cost of "Fixing" is amortized over mostly problem free executions.
  • It is an absolute requirement  that ultimately all nodes resolve the inconsistency the same way. I repeat - It is an absolute requirement  that ultimately all nodes resolve the inconsistency the same way. It is turtles all the way down - How do all nodes come to know what was the resolution decision and how do you propagate that decision consistently? Fun times ... 
I'm already thinking of writing a next best seller novel based on the effects of eventual consistency and meaning of destiny - Is it cute Ms. Isabella who gets the last seat or it is really her destiny to miss the flight and meet Mr. Right on the next flight ??  ...

Commutativity : 2 + 3 == 5 == 3 + 2

  • What if there was no additional costs to pay when inconsistency was detected in eventually consistent system?
  • What if no coordination was required to resolve inconsistency in a distributed system? 
  • What if every node already knew how to resolve the inconsistency in a  prescribed deterministic ways? 
  • What if there were no bad effects of message re-ordering to begin with ?
Suppose our Hero Developer wins three lotteries in a row in three days . Now, it doesn't matter in what order he gets paid from each lottery. As long as he gets the money in reasonable time - it's all good.  The commutative nature of natural numbers "adds up" to a solid logic based reasoning that eventually our Hero Developer will be rich by the amount that equals to sum of all three lottery prizes !
In 2007 , Marc Shapiro and others at INRIA, formalized  this notion in what is called as "Conflict Free Replicated Data Types" 
This system works in following conditions ...
  • If the system is monotonically increasing , then it never has to "roll-back"
  • And if state transitions are partially ordered and MERGE operations are associativecommutative and idempotent
  • THEN the replicas that have received and applied the same set of updates must immediately have equivalent state. There is no conflict arbitration process, because conflicts do not exist in strongly-consistent systems.
In short, if only operation we allow is INCR(account, amount)  then is monotonically increasing and addition being commutative, associative as long as the operations have "at least once" and "at most once" guarantees (i.e. idempotent) , the system is STRONGLY consistent and co-ordination free !!!

Mr. Luca Pacioli and Double Entry Accounting

The Father of profession of Accounting, Luca Pacioli was a Italian mathematician who invented what is now known as double entry accounting system. This system revolutionized the world of commerce. This system is fundamental to modern civilization as is Newton's Mechanics or theory of Social Contracts. In his system there is a column for Debit and another column for Credit. The total balance is result of subtraction of one column from the other 

How is that relevant ?  

Both debit and credit are monotonically increasing and follows the rules of addition in short can be implemented as CRDTs. And natural number can be represented as a sum of a positive and negative number

What other data structures can be implemented as CRDTs

  • Sets
  • Maps
  • Graphs 

Shipping Pie vs Shipping Pie Recipe 

The theory of CRDT makes a great deal of distinction between shipping the end results and shipping the instructions to achieve the results. While shipping End Result is cumbersome. It is theoretically easy to deal with and easy to prove properties of the system. While shipping recipe is easy it is not as accessible to theoretical treatment. The genius of team who developed CRDTs at INRIA is to show the equivalence of both the approaches.

The story continues next week ....

No comments:

Post a Comment