Mar 13, 2024

[DB study] Serializable Snapshot Isolation (SSI) and write skew; how Spanner works.

Reference:
Serializable Snapshot Isolation in PostgreSQL paper 
A Read-Only Transaction Anomaly Under Snapshot Isolation paper
A Critique of ANSI SQL Isolation Levels paper

Two main properties that characterize Snapshot Isolation:

  1. A transaction in Snapshot Isolation has two significant timestamps: the one at which it performs its reads, and the one at which it performs its writes. The read timestamp defines the “consistent snapshot” of the database the transaction sees. If someone else commits writes at a point after a transaction T’s read timestamp, T will not see those changes (this is generally enforced using MVCC. We’ll refer to a transaction named “x” as Tx and its read and write timestamps as Rx and Wx, respectively.
  2. Two transactions are concurrent if the intervals during which they are executing overlap (R1 < W2 and W1 > R2). In Snapshot Isolation, the database enforces that two committed transactions which are concurrent have disjoint write sets (meaning they don’t write to any of the same memory locations). Any transaction whose commit would cause this restriction to be violated is forced to abort and be retried.

Consider two transactions, P and Q. P copies the value in a register x to y, and Q copies the value in a register y to x. There are only two serial executions of these two, P, Q or Q, P. In either, the end result is that x = y. However, Snapshot Isolation allows for another outcome:
  • Transaction P reads x
  • Transaction Q reads y
  • Transaction P writes the value it read to y
  • Transaction Q writes the value it read to x
This is valid in Snapshot Isolation: each transaction maintained a consistent view of the database and its write set didn’t overlap with any concurrent transaction’s write set. Despite this, x and y have been swapped, an outcome not possible in either serial execution.


Conceptually, there are three types of conflicts: 
  • wr-conflicts (Dirty Reads)
  • ww-conflicts (Lost Updates)
  • rw-conflicts.
PostgreSQL SSI data structure:
  • SIREAD locks
    • An SIREAD lock, internally called a predicate lock, is a pair of an object and (virtual) txids that store information about who has accessed which object.
  • rw-conflicts
    • A rw-conflict is a triplet of an SIREAD lock and two txids that reads and writes the SIREAD lock. The CheckForSerializableConflictIn function is invoked whenever either an INSERT, UPDATE, or DELETE command is executed in SERIALIZABLE mode, and it creates rw-conflicts when detecting conflicts by checking SIREAD locks.


SQL syntax:

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.