Ataraxia through Epoché: [db]

safetime vs. max_applied_timestamp:

Safetime calculated:

now + e

Concurrency::MinReadTimestamp:

ST::Internal:

MinReadTimestamp and _span.GroupLookup:

CDT

PlacementProto::placement_id

_RangeMapElements

Protogag vs. Family ID

Tablet Key: (SplitId, the ssformat-encoded key, FamilyId)

Since ssformat has column info, why is the family ID needed?

ERE

Constraints

In db, a Constraint Failure occurs when a data modification operation (like an insert, update, or delete) attempts to make a change that violates the rules defined in your database schema. These constraints are in place to maintain data integrity.

When a constraint is violated, db will typically cause the transaction to fail, returning an error status indicating the nature of the failure.

While there isn't a single enum listing all possible "ConstraintFailure types," they can be categorized by the type of schema rule being violated. Here are common types of constraints and the errors they might produce:

Primary Key Constraints:

Violation: Attempting to insert a row with a Primary Key that already exists.

Typical Error Code: ALREADY_EXISTS

Unique Index Constraints:

Violation: Attempting to insert or update a row where the value(s) in the unique index columns duplicate those of an existing row.

Typical Error Code: ALREADY_EXISTS

NOT NULL Constraints:

Violation: Attempting to insert a row without providing a value for a column marked as NOT NULL, or attempting to update such a column to NULL.

Typical Error Code: INVALID_ARGUMENT or BAD_USAGE

Foreign Key Constraints:

Violation:

Inserting a row in the child table with a foreign key value that doesn't exist in the referenced parent table.

Deleting or updating a row in the parent table that is still referenced by rows in the child table (without appropriate ON DELETE / ON UPDATE actions defined).

Typical Error Code: NOT_FOUND (when the referenced row is missing) or FAILED_PRECONDITION (when an existing reference blocks deletion).

Check Constraints:

Violation: Attempting to insert or update a column value in a way that violates a boolean expression defined as a CHECK constraint on the table.

Typical Error Code: OUT_OF_RANGE or FAILED_PRECONDITION

Row Existence:

Violation: Attempting to update a row that does not exist.

Typical Error Code: NOT_FOUND

Db uses a combination of canonical error codes (like those listed above, see go/cspanner/error-codes) and specific error messages to detail the cause of the constraint failure.
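As a quick lookup aid, the categories above can be written down as a table from constraint kind to the typical codes listed in these notes. This is only a sketch: the `Constraint` enum and `candidate_constraints` helper are hypothetical names made up for illustration, not part of any db API, and the codes are the "typical" ones from the list above, not an authoritative mapping.

```python
from enum import Enum
from typing import List


class Constraint(Enum):
    """Constraint categories from the notes (names are illustrative only)."""
    PRIMARY_KEY = "primary key"
    UNIQUE_INDEX = "unique index"
    NOT_NULL = "not null"
    FOREIGN_KEY = "foreign key"
    CHECK = "check"
    ROW_EXISTENCE = "row existence"


# Typical codes as listed in the notes above; not an exhaustive or official table.
TYPICAL_CODES = {
    Constraint.PRIMARY_KEY: ("ALREADY_EXISTS",),
    Constraint.UNIQUE_INDEX: ("ALREADY_EXISTS",),
    Constraint.NOT_NULL: ("INVALID_ARGUMENT", "BAD_USAGE"),
    Constraint.FOREIGN_KEY: ("NOT_FOUND", "FAILED_PRECONDITION"),
    Constraint.CHECK: ("OUT_OF_RANGE", "FAILED_PRECONDITION"),
    Constraint.ROW_EXISTENCE: ("NOT_FOUND",),
}


def candidate_constraints(code: str) -> List[Constraint]:
    """Return the constraint kinds that typically produce the given error code."""
    return [kind for kind, codes in TYPICAL_CODES.items() if code in codes]
```

The reverse lookup shows why the error message matters as well as the code: a bare NOT_FOUND could be either a missing foreign key parent or an update to a nonexistent row.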

PAXOS

Spanner's Paxos implementation doesn't have a dedicated RPC specifically named "Heartbeat".
The mechanism to ensure the leader maintains its status and replicas remain aware of the active leader is integral to the leader lease system.

The RPC that functions most like a heartbeat, used for explicit lease renewal in the absence of write traffic, is _paxos.UpdateLease.

Purpose: The elected leader replica sends _paxos.UpdateLease RPCs to the other replicas within the Paxos group. This is done to explicitly renew its leadership lease, particularly when there have been no recent _paxos.Propose messages. Successful _paxos.Propose calls also implicitly renew the leader's lease. The _paxos.UpdateLease calls prevent other replicas from thinking the leader is down and attempting to start a new election.

Proto Buffer: The request message for this RPC is spn.PaxosUpdateLeaseRequest. This message is defined in the file cp_paxos.proto.
The PaxosUpdateLeaseRequest includes a PaxosMsgHeader and an enum Mode, which can be:

NORMAL: Used for routine lease renewal.
RELEASE: Used by a leader to relinquish its lease.
HANDOFF: Used during a graceful leader change.

Where is it sent?: These RPCs are sent internally between the Spanner servers (span_servers) that host the tablets belonging to a specific Paxos group. The current leader sends the requests to the other replicas in the group. These communications use Spanner's internal coprocessor RPC framework.

In essence, _paxos.UpdateLease ensures lease maintenance and leader stability, serving a heartbeat-like function for the Paxos leader lease.
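A toy model of the leader side of this, to make the implicit vs. explicit renewal distinction concrete. Everything here is an assumption for illustration: the `LeaderLease` class, the `renew_margin` threshold, and the treatment of RELEASE and HANDOFF as simply ending the lease are not taken from the real implementation. Only the three mode names and the 10 second example duration come from these notes.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Mirrors the Mode values described for PaxosUpdateLeaseRequest."""
    NORMAL = "NORMAL"
    RELEASE = "RELEASE"
    HANDOFF = "HANDOFF"


@dataclass
class LeaderLease:
    lease_duration: float = 10.0  # seconds; example value from the notes
    renew_margin: float = 3.0     # hypothetical: renew when this close to expiry
    expires_at: float = 0.0

    def on_propose_success(self, now: float) -> None:
        # A successful Propose implicitly renews the lease.
        self.expires_at = now + self.lease_duration

    def needs_update_lease(self, now: float) -> bool:
        # With no recent writes, the leader must renew explicitly before expiry.
        return self.expires_at - now <= self.renew_margin

    def on_update_lease_success(self, now: float, mode: Mode = Mode.NORMAL) -> None:
        if mode is Mode.NORMAL:
            self.expires_at = now + self.lease_duration
        else:
            # Toy simplification: RELEASE and HANDOFF both give up the lease.
            self.expires_at = now
```

With steady write traffic `needs_update_lease` never becomes true, which matches the description of UpdateLease as the idle-period fallback.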

Raft "Term" vs. Spanner Paxos "ViewId":

In Raft, time is divided into terms, each starting with an election. Terms are numbered monotonically and are crucial for distinguishing between stale leaders or messages from previous leadership periods.

In Spanner's Paxos, the equivalent concept is the ViewId. Each leader election or attempt is associated with a ViewId. A replica will generally only accept messages or grant leadership to a proposal with a ViewId greater than or equal to the highest ViewId it has previously observed. This mechanism, like Raft terms, ensures that outdated leaders are superseded. One source states this explicitly: "Last observed PAXOS view ID (Same as Raft's term)".
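The acceptance rule can be sketched in a few lines. The `Replica` class and `accept` method are hypothetical names; the only thing taken from the notes is the comparison itself (accept when the incoming ViewId is greater than or equal to the highest one observed).

```python
class Replica:
    """Tracks the highest ViewId seen and rejects anything older."""

    def __init__(self) -> None:
        self.highest_view_id = 0

    def accept(self, view_id: int) -> bool:
        # Messages from an older view come from a superseded leader.
        if view_id < self.highest_view_id:
            return False
        self.highest_view_id = view_id
        return True
```

Once a replica has seen view 5, a late message stamped with view 4 is dropped, which is the same role a stale term plays in Raft.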

Raft "Log Index" vs. Spanner Paxos "Sequence Number":

Raft maintains a replicated log where each entry is identified by a sequential log index. This index imposes a total order on the log entries.

Spanner's Paxos also orders proposals using a Sequence Number.
When a leader bundles mutations into proposals, each proposal is assigned a sequence number. These proposals are applied to the state machine (the tablet's data) in the order of their sequence numbers, ensuring all replicas apply changes consistently. The _PaxosLog table stores these proposals, ordered by their sequence number.

Both consensus algorithms rely on these mechanisms (under different names) to ensure a consistent ordering of operations and to handle leadership changes safely.
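A minimal sketch of applying proposals in sequence-number order. The `Tablet` class is hypothetical, and buffering out-of-order arrivals until the gap is filled is an assumption made for the example, not a statement about how the _PaxosLog table is actually consumed.

```python
class Tablet:
    """Applies proposals strictly in sequence-number order."""

    def __init__(self) -> None:
        self.log = {}       # sequence number -> mutation (stand-in for the log)
        self.next_seq = 1   # next sequence number to apply
        self.applied = []   # mutations applied to the state machine, in order

    def receive(self, seq: int, mutation: str) -> None:
        self.log[seq] = mutation
        # Apply every contiguous proposal starting at next_seq; stop at a gap.
        while self.next_seq in self.log:
            self.applied.append(self.log[self.next_seq])
            self.next_seq += 1
```

Two replicas that receive the same proposals in different arrival orders end up with identical `applied` lists, which is the consistency property the sequence number provides.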

Spanner's Paxos implementation has mechanisms to handle leader election timeouts and potential ties:

Leader Election Timeout:

Lease-Based Leadership:
A leader maintains its status as long as it holds active leases from a quorum of replicas in the group. These leases have a duration (e.g., 10 seconds, as mentioned in one of the slide decks).

Lease Renewal: Leaders continuously renew their leases. This happens implicitly with successful write proposals (Propose RPCs) or explicitly through UpdateLease RPCs if there are no writes as the lease expiration approaches.

Detecting Timeouts: Non-leader replicas periodically check for the presence of an active leader using _paxos.Query RPCs. If a leader fails to renew its leases (e.g., due to a crash or network partition), its lease expires. Other replicas will detect the absence of an active leader.

Triggering New Election:
A replica that doesn't detect an active leader will transition into the Booting state and may then enter the Takeover state to try to become the new leader by sending NewLeader RPCs to gain a quorum of votes.
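The timeout path above, reduced to a toy state function. Booting and Takeover are the state names from the notes; the FOLLOWER state name, the `next_state` function, and the idea that a replica compares the current time against a known lease expiry are simplifications assumed for illustration. The real states and transitions are certainly richer than this.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "Follower"   # hypothetical name for the steady non-leader state
    BOOTING = "Booting"
    TAKEOVER = "Takeover"


def next_state(state: State, now: float, leader_lease_expires_at: float) -> State:
    """One step of a simplified non-leader state machine."""
    if now < leader_lease_expires_at:
        # An active leader exists, so there is nothing to take over.
        return State.FOLLOWER
    if state is State.FOLLOWER:
        # Lease expired: first move to Booting.
        return State.BOOTING
    # Still no leader: proceed to (or stay in) Takeover and send NewLeader RPCs.
    return State.TAKEOVER
```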

Handling Equal Votes / Ties ("Duels"):

The Problem: Situations can arise where multiple replicas attempt to become leader simultaneously, potentially leading to a split vote where no single replica can achieve a quorum. This is referred to as a "Duel."

Resolution Mechanisms:
Random Backoff: Replicas attempting to become leader will wait for a random delay before retrying if their attempt fails or contention is detected. The TakeoverBackoffState and LeaseAlarmBootingSleepState exist to desynchronize these attempts.

Increasing View Numbers: Each election attempt uses a ViewId. A replica will only vote for a potential leader if the ViewId in the NewLeader request is greater than or equal to the highest ViewId it has ever acked. When an election attempt fails due to a split vote or seeing a higher ViewId, a replica will retry with an even higher ViewId.

Unique Replica ID: To break ties among identical view numbers, unique replica IDs are used, ensuring that even in a perfectly synchronized scenario, one replica will have a deciding factor.

In summary, leader timeouts are managed through the lease mechanism, triggering new elections when leases expire. Ties or election duels are resolved through a combination of random backoffs, monotonically increasing view numbers, and unique replica identifiers to ensure that a single leader can be elected.
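The three duel-resolution ingredients can be sketched together. This is a generic illustration, not the actual implementation: comparing `(view_id, replica_id)` pairs lexicographically is a common way to realize "replica ID breaks ties", and the jittered exponential backoff, its parameters, and all function names here are assumptions.

```python
import random
from typing import Iterable, Tuple

Ballot = Tuple[int, int]  # (view_id, replica_id); tuples compare lexicographically


def next_ballot(highest_seen_view: int, replica_id: int) -> Ballot:
    """Retry with a ViewId higher than any view seen so far."""
    return (highest_seen_view + 1, replica_id)


def backoff_seconds(attempt: int, rng: random.Random,
                    base: float = 0.05, cap: float = 2.0) -> float:
    """Random delay that grows with the attempt count, to desynchronize retries."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def winner(ballots: Iterable[Ballot]) -> Ballot:
    """Highest view wins; among equal views, the higher replica ID wins."""
    return max(ballots)
```

For example, if replicas 1 and 3 both campaign in view 4, `winner([(4, 1), (4, 3)])` picks replica 3, and replica 1 would retry later with `next_ballot(4, 1)` after a random backoff.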

Feb 6, 2025