Topic: "Strong" Consistency
  Consistency = meaning of operations in the face of concurrency and failure
  Choice trades off between performance and programmer-friendliness
    Huge factor in many designs

Many systems have storage/memory w/ concurrent readers and writers (and can fail)
  Multiprocessors, databases, AFS, lab key-value service
  You often want to improve in ways that risk changing behavior:
    add caching
    split over multiple servers
    replicate for fault tolerance
  How do we know if an optimization is correct?
  We need a way to think about correct (expected) behavior
  Most of these ideas come from multiprocessors and databases of 20-30 years ago
The linearizability paper focuses on abstract data types; we use the same concept but focus on
  key/value storage with simple reads/writes
Naive replicated key-value store
  [diagram]
  C0, M0, C1, M1, LAN
  each machine has a local copy of all key-value content
  read: from local copy
  write: send update msg to each other host (but don't wait)
  fast: never waits for communication
  Does this memory work well?
Example 1:
  initial values are all zeros
  C0:
    PUT("x",1);
    PUT("y",1);
  C1:
    while ((y = GET("y")) != 1);
    x = GET("x");
    print x, y

  Intuitive intent:
    C1 should print x=1, y=1
Problem A:
  [time diagram]
  M0's PUTs of x and y may be reordered by the network,
  leaving x unset but y=1
  how to fix? would the lab's RPC fix it?
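
The naive store and Problem A can be sketched in Python (a hypothetical toy, not the lab code; the apply_next index stands in for the network's choice of delivery order):

```python
class Node:
    """One replica: a full local copy plus an inbox of update
    messages the network has not yet delivered."""
    def __init__(self):
        self.store = {}
        self.inbox = []

    def put(self, key, val, peers):
        self.store[key] = val            # update local copy
        for p in peers:                  # send update msg to each other host,
            p.inbox.append((key, val))   # ...but don't wait

    def get(self, key):
        return self.store.get(key, 0)    # read local copy: never waits

    def apply_next(self, i):
        # the network decides which queued update arrives next; i models that
        key, val = self.inbox.pop(i)
        self.store[key] = val

m0, m1 = Node(), Node()
m0.put("x", 1, [m1])
m0.put("y", 1, [m1])
m1.apply_next(1)                 # network delivers the y-update first
print(m1.get("y"), m1.get("x"))  # 1 0 -- C1 exits its loop, then reads x=0
```

With delivery order reversed, M1 answers y=1 but x=0, exactly Problem A.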
Naive distributed memory is fast but has unexpected behavior
  maybe it isn't "correct"
  maybe we should never have expected Example 1 to work
How can we write correct distributed programs w/ shared storage?
  Storage (or memory) system promises to behave according to certain rules.
  We write programs assuming those rules.
  The rules are a "consistency model":
  a contract between the storage system and the programmer
What makes a good consistency model?
  There are no "right" or "wrong" models
  A model may make it harder or easier to program
    i.e. lead to more or less intuitive results
  A model may be harder or easier to implement efficiently
  Also application dependent
    e.g. Web pages vs. memory vs. database
What's a strong model like?
  It's easy for users to reason about correctness assuming
    1) sequential behavior and 2) only one copy of the data
  Intuitively, a user should expect anything that can be explained by
  an **equivalent** *sequential behavior*
Example 1:
  C0:   WR(x=1) WR_ok(x)   WR(y=1)        WR_ok(y)
  C1:                        RD(y=?)  RD_ok(y=1)  RD(x=?) RD_ok(x=1)

  An equivalent sequential history
  C0:   WR(x=1) WR_ok(x)   WR(y=1)    WR_ok(y)
  C1:                                          RD(y=?)  RD_ok(y=1)  RD(x=?) RD_ok(x=1)

  C0:   WR(x=1)      WR_ok(x)       WR(y=1)        WR_ok(y)
  C1:          RD(y=?)  RD_ok(y=1)      RD(x=?) RD_ok(x=0)

  An "equivalent" sequential history
  C0:                    WR(x=1) WR_ok(x) WR(y=1)  WR_ok(y)
  C1:   RD(x=?) RD_ok(x=0)                                  RD(y=?) RD_ok(y=1)
How to define equivalence?
  Equivalence ==> certain orders in the original history must be maintained by the constructed, hypothetical sequential history
Many possibilities; an equivalent sequential history can preserve:
   1 global issuing order
   2 global completion order
   3 per-process issuing/completion order (sequential consistency)
   4 global "completion-to-issuing" order (linearizability)

   Strength: 1,2 > 4 > 3

1,2 are impractical to realize in a distributed setting!
  Example: difficulty of 1
  M0: PUT(x)              PUT_ok(x)
  M1:  PUT(y)    PUT_ok(y)
  PUT(x) must be ordered before PUT(y), but how would machine M1 even be aware of
  another machine M0's PUT request, and know to pause till M0 is finished?
  (the paper's "blocking/non-blocking" refers to this kind of impracticality)
3 is practical; example implementation: each (non-replicated) server
(responsible for an object) processes requests in FIFO order.

4 is also practical,
  same example implementation.
The subtle difference between 3 & 4:
  C0:   P(x=1)   P_ok(x)       P(y=1)        P_ok(y)
  C1:                         G(x=?)  G_ok(x=0)      G(y=?) G_ok(y=0)
Legal under 3, but not 4.
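
This 3-vs-4 distinction can be checked by brute force over the four operations above (a hedged sketch; the invoke/respond times are just the column positions in the diagram):

```python
from itertools import permutations

# One operation: (proc, kind, key, val, invoke_time, respond_time)
H = [
    ("C0", "put", "x", 1,  0,  9),
    ("C0", "put", "y", 1, 20, 34),
    ("C1", "get", "x", 0, 24, 32),
    ("C1", "get", "y", 0, 40, 47),
]

def legal(seq):
    """Does this total order explain every read? (initial values are 0)"""
    mem = {}
    for _, kind, key, val, _, _ in seq:
        if kind == "put":
            mem[key] = val
        elif mem.get(key, 0) != val:
            return False
    return True

def preserves(seq, realtime):
    # both models: per-process (program) order must be kept
    for p in ("C0", "C1"):
        if [op for op in seq if op[0] == p] != [op for op in H if op[0] == p]:
            return False
    if realtime:
        # linearizability only: if a responded before b was invoked,
        # a must also come first in the sequential history
        pos = {op: i for i, op in enumerate(seq)}
        for a in H:
            for b in H:
                if a[5] < b[4] and pos[a] > pos[b]:
                    return False
    return True

seq_consistent = any(legal(s) and preserves(s, False) for s in permutations(H))
linearizable   = any(legal(s) and preserves(s, True)  for s in permutations(H))
print(seq_consistent, linearizable)   # True False
```

Ordering C1's two GETs before both PUTs satisfies 3, but no order satisfies 4: P_ok(y) precedes G(y=?), so the GET of y would have to return 1.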
Why choose the stronger 4 over 3?
  * If an application does not have any "external communication" (communication happens only through reads/writes of shared objects), 3 is sufficient.
  * Otherwise, one might see "unexpected behavior".
    In the above history, after C0 has gotten P_ok(x), user C0 calls C1 over the phone (external communication) and tells them to go check the value of x; C1 performs the GET as shown in the history and sees x=0. This is "unexpected behavior" for the application...
  * Hence linearizability is sometimes also referred to as "external consistency"
Properties of linearizability:
  * local (if each object is linearizable, then the overall system is linearizable)
  --> distribution/scalability is easily realized by partitioning the responsibility for objects
How to implement 4 with data replication?
  two servers M0, M1, replicating a single object x
  Processing PUTs:
  * can a client send updates to either of the servers?
  * must a client wait till all servers have processed the update?

  Processing GETs:
  * can clients send reads to either of the servers?
  * can a server always return its current value?
Simple design #D1:
  Clients send all reads/writes to a designated machine, say M0;
  for writes:
    1. M0 forwards writes to M1 and waits for acknowledgement
    2. M0 executes writes locally (in order)
    3. M0 responds to the client
  for reads:
    1. M0 reads its local copy and returns the value to the client

  Two notes on step 1 of a write:
    - M0 must associate each forwarded write with a proper seqno so M1 processes writes in the same order as M0
    - M0 must wait for the forwarded write to be safely stored at M1 (hence waiting for M1's acknowledgement). Otherwise, a write might be lost across a failure and violate linearizability.

  * D1 is the simple primary/backup replication scheme
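
D1's write path (and the two notes) can be sketched as follows; this is a hypothetical single-process toy, with the RPC replaced by a method call:

```python
class Backup:
    def __init__(self):
        self.store = {}
        self.expected_seq = 0

    def forward(self, seq, key, val):
        # note 1: apply forwarded writes in the primary's seqno order
        assert seq == self.expected_seq, "out-of-order forwarded write"
        self.store[key] = val
        self.expected_seq += 1
        return "ack"                   # note 2: the primary waits for this

class Primary:
    def __init__(self, backup):
        self.store = {}
        self.backup = backup
        self.seq = 0

    def put(self, key, val):
        self.backup.forward(self.seq, key, val)  # 1. forward, wait for ack
        self.seq += 1
        self.store[key] = val                    # 2. execute locally
        return "ok"                              # 3. respond to the client

    def get(self, key):
        return self.store.get(key, 0)  # all reads served by the primary
```

By the time `put` returns "ok", the backup already holds the value, so a later failover cannot lose an acknowledged write.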
Simple design #D2:
  1. process all writes at M0; M0 replicates writes to M1
  2. client waits for the update to M1 to complete.
  3. client can issue reads to either M0 or M1.

  #D2 is not linearizable!
  P(x=1)                            P_ok(x=1)
        G(x=?)G_ok(x=1)
                        G(x=?)G_ok(x=0)

  First GET contacts M0, second GET contacts M1

  If each client sticks to sending all its requests to one machine (but different clients can send to different machines), this implementation is sequentially consistent (but still not linearizable)
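
That history corresponds to freezing D2 in the middle of the PUT, after M0 has applied x=1 but before M1 has (a hypothetical snapshot):

```python
# D2 mid-PUT: M0 has applied the write, the update to M1 is still in flight.
m0_store = {"x": 1}
m1_store = {}

get1 = m0_store.get("x", 0)    # first GET goes to M0 -> sees 1
get2 = m1_store.get("x", 0)    # second GET goes to M1 -> sees 0
print(get1, get2)              # 1 0: the value appears to go back in time
```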
How to allow reads to happen at a different node? two ways:
  1. writes occur in two phases:
     first phase: M0 sends "prepare write"
     second phase: M0 sends "commit write"...
     block reads after seeing "prepare write" but before seeing "commit write"
  2. chain replication (google "OSDI 2004 chain replication")
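
A sketch of option 1, where a replica refuses (here: returns None rather than actually blocking) reads of a key with a prepared-but-uncommitted write (hypothetical code):

```python
class Replica:
    def __init__(self):
        self.store = {}
        self.prepared = {}     # key -> value from a "prepare write"

    def prepare_write(self, key, val):
        self.prepared[key] = val                  # phase 1: hold, don't apply

    def commit_write(self, key):
        self.store[key] = self.prepared.pop(key)  # phase 2: apply

    def get(self, key):
        if key in self.prepared:
            return None        # a real server would block until the commit
        return self.store.get(key, 0)

r = Replica()
r.prepare_write("x", 1)
blocked = r.get("x")           # read arrives between prepare and commit
r.commit_write("x")
after = r.get("x")
```

Because no replica serves a read while a write is between its two phases, two reads at different replicas can never observe a value and then its absence.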
What about multi-processor CPUs?
  * they do not provide linearizability
  * they do not provide sequential consistency either
  * more nuanced / subtle
  * Example 1 does not work on a multi-processor; use locks when concurrently accessing shared memory!
  * why not?
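
The lock discipline that makes Example 1 safe can be sketched in Python (note: Python's GIL already hides hardware reordering, so this shows only the discipline, not the underlying memory-model behavior):

```python
import threading

x = y = 0
lock = threading.Lock()

def writer():
    global x, y
    with lock:         # both writes become one atomic step
        x = 1
        y = 1

def reader(out):
    while True:
        with lock:     # read x and y under the same lock
            if y == 1:
                out.append((x, y))
                return

res = []
t = threading.Thread(target=reader, args=(res,))
t.start()
writer()
t.join()
print(res)             # [(1, 1)]: under the lock, y==1 implies x==1
```

Without the lock, a real multiprocessor may make the writes visible to the reader out of order, reproducing Problem A in hardware.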
