May 18, 2018

[accu2018][design] Read/Write thinking can be harmful for system design

Read and write considered harmful - Hubert Matthews [ACCU 2018]

This could be the best presentation at this year's ACCU conference.

See also:
Mike Acton's Data-Oriented Design talk from CppCon 2014

Notes below:

Basics of Read/Write

  • Business processes, rules and schemas
  • Performance, scaling, concurrency
  • Six questions about data
  • Asynchrony and queues
  • Structure changing
  • State management

Reads

  • Can be cached, at multiple levels (e.g. CPU caches)
  • Caching is (mostly) transparent
  • Idempotent (can be retried without side-effects)
  • Can be partitioned easily (routing, e.g. load balancer, API gateway, etc.)
  • Access control rules only
  • Synchronous and blocking
  • Scalable bandwidth - fan-out reduces contention


Writes

  • Caching is horrible for writes
    • write-back/write-through, eviction policy, coherence, etc.
  • Scaling writes is horrible - fan-in creates contention
  • Sharding works well only for primary key writes
  • Access control rules plus update rules
  • Can be delayed or asynchronous (write-back cache, e.g. a store buffer)
  • Idempotency is a design issue/choice

Dependencies

  • Even simple code has dependencies caused by read and write
  • Asymmetric - the called object doesn't know who calls it
  • Makes testing difficult
    • Substitution
    • Mocks, etc.
  • Introduces notion of push and pull

REST APIs and rules

  • REST = getters/setters on steroids
    • Industrial-scale anti-pattern
    • Separates code and data
    • Opposite of encapsulation
    • Very non-OO
    • Duplicated logic/rules in every client
    • e.g. stock_level >= 0 (see the sketch after this list)
    • What happens if the rule is broken?
Tradeoffs: early/late validation failure, schema migration, versioning, code/rule duplication
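
A minimal C++ sketch of the duplicated-rules point (the StockRecord/Inventory names are mine, not the speaker's): with raw getters/setters every client has to re-check stock_level >= 0, whereas an encapsulated operation enforces the rule in one place.

    #include <stdexcept>

    // REST/getter-setter style: no rule is enforced here, so every client
    // must duplicate the "stock_level >= 0" check (or forget to).
    class StockRecord {
    public:
        int  stock_level() const      { return level_; }
        void set_stock_level(int lvl) { level_ = lvl; }
    private:
        int level_ = 0;
    };

    // Encapsulated alternative: the rule lives with the data, in one place.
    class Inventory {
    public:
        void add(int qty) { level_ += qty; }
        void remove(int qty) {
            if (qty > level_)
                throw std::invalid_argument("stock_level >= 0 would be violated");
            level_ -= qty;
        }
        int level() const { return level_; }
    private:
        int level_ = 0;
    };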

REST = CRUD

  • The statechart has one state and four transitions
  • OK for metadata
  • Real processes have multiple states
    • Have different entities per state (see the sketch below)
      • (Possibly sub-entities)
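
A hedged sketch of "different entities per state" in C++ (the order types are an invented example, not from the talk): each state gets its own type, so transitions are explicit, named operations rather than generic CRUD updates on one record.

    #include <string>
    #include <variant>

    // One type per state instead of one record with a 'status' column.
    struct DraftOrder   { std::string customer; };
    struct PaidOrder    { std::string customer; std::string payment_ref; };
    struct ShippedOrder { std::string customer; std::string payment_ref;
                          std::string tracking_id; };

    using Order = std::variant<DraftOrder, PaidOrder, ShippedOrder>;

    // Transitions are explicit functions, not a generic 'update'.
    PaidOrder pay(const DraftOrder& d, std::string payment_ref) {
        return PaidOrder{d.customer, std::move(payment_ref)};
    }
    ShippedOrder ship(const PaidOrder& p, std::string tracking_id) {
        return ShippedOrder{p.customer, p.payment_ref, std::move(tracking_id)};
    }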

Typical system scaling path (important!)

This is the WRONG way to think about it:
  • Application is too slow
  • Get more front-end boxes (scales R + W)
    • Application is still too slow
    • Get a bigger DB box (scales R + W)
      • Run out of read bandwidth
      • Replicate or cache data (scales R)
        • Run out of write bandwidth
        • Shard/partition data on primary key (scales R + W)
          • Create separate services per entity/component
          • Cross-service joins done in the client (scales entities)


Scaling problems

  • Partitioning or sharding works to an extent
    • If access is strongly biased around primary key
    • e.g. access to flight dates (spread fairly evenly) vs. access to each day's meals (sequential even with hashed keys)
  • Nasty to scale cross-partition operations
    • Particularly for write (usually not idempotent)
    • Partial failure on write, cross-box transactions, concurrency, latency, etc.
    • Service boundaries aligned with operations boundaries and failure boundaries
  • Bulk access to multiple records can cause N+1 access problems
    (get the primary keys, then N single-row accesses:
    beware of REST and ORMs); batch access solves this (see the sketch below)
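
A C++ sketch of the N+1 shape versus batch access (the fetch functions and the in-memory "table" are stand-ins I invented; in a real system each fetch_one would be a network or DB round trip).

    #include <map>
    #include <string>
    #include <vector>

    struct Row { int id; std::string data; };

    // Stand-in for a remote table.
    static const std::map<int, std::string> table = {{1, "a"}, {2, "b"}, {3, "c"}};

    Row fetch_one(int id) { return {id, table.at(id)}; }

    std::vector<Row> fetch_many(const std::vector<int>& ids) {
        std::vector<Row> rows;
        for (int id : ids) rows.push_back({id, table.at(id)});  // one server-side pass
        return rows;
    }

    // N+1 problem: get the keys, then N single-row accesses.
    std::vector<Row> load_slow(const std::vector<int>& ids) {
        std::vector<Row> rows;
        for (int id : ids) rows.push_back(fetch_one(id));        // N round trips
        return rows;
    }

    // Batch access: one bulk request for all the keys.
    std::vector<Row> load_fast(const std::vector<int>& ids) {
        return fetch_many(ids);                                  // 1 round trip
    }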

Avoid sharing mutable data

  • Shared mutable data is the root of all evil in computing
  • Read-only data can be shared safely without locks (see the sketch below)
  • Const is our friend!
  • Pure message-passing approach avoids this
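
A small C++ illustration of the read-only sharing point (my example, not a slide): data that is const after construction can be read from many threads with no locks and no data races.

    #include <cstdio>
    #include <memory>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        // Built once, then immutable: shared_ptr<const T> makes the read-only
        // intent explicit and lets every thread share the same data lock-free.
        auto regions = std::make_shared<const std::vector<std::string>>(
            std::vector<std::string>{"eu-west", "us-east", "ap-south"});

        std::vector<std::thread> readers;
        for (int i = 0; i < 4; ++i)
            readers.emplace_back([regions] {          // each thread shares the same const data
                for (const auto& r : *regions)
                    std::printf("%s\n", r.c_str());
            });
        for (auto& t : readers) t.join();
    }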


Shared writes don't scale

  • In-memory programming
  • Zero scalability!

Why don't shared writes scale?

MESI / Cores
  • Caches have to communicate to ensure coherent view
  • MESI protocol passes messages between caches
  • Shared writes limited by MESI communication
  • JUST DON'T DO SHARED WRITES! (see the sketch below)
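
A rough C++ demonstration of the contention (the structure and numbers are mine; timings will vary by machine): every thread hammering one atomic counter forces the owning cache line to bounce between cores via MESI, while per-thread counters combined at the end avoid the shared write entirely.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int  kThreads = 4;
        constexpr long kIters   = 1'000'000;

        // Shared write: one cache line ping-pongs between all cores.
        std::atomic<long> shared{0};
        std::vector<std::thread> ts;
        for (int i = 0; i < kThreads; ++i)
            ts.emplace_back([&] { for (long n = 0; n < kIters; ++n) shared.fetch_add(1); });
        for (auto& t : ts) t.join();
        ts.clear();

        // No shared write: each thread owns its (padded) counter; combine once.
        struct alignas(64) Padded { long value = 0; };   // padding avoids false sharing
        std::vector<Padded> local(kThreads);
        for (int i = 0; i < kThreads; ++i)
            ts.emplace_back([&local, i] { for (long n = 0; n < kIters; ++n) ++local[i].value; });
        for (auto& t : ts) t.join();

        long combined = 0;
        for (const auto& p : local) combined += p.value;
        std::printf("shared=%ld combined=%ld\n", shared.load(), combined);
    }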



6 questions about data access

Primary key: hashing, partitioning, op==

  • Most common form of access
    • database primary key
    • std::map/unordered_map
  • Can use hashing [O(1)], binary search [O(logN)] or linear search for small N [O(N)]
  • Requires only operator ==
  • Partition on the PK into multiple parts that can operate in parallel or avoid contention
    e.g. product catalogue, customer records, sticky web sessions, NoSQL, memcache, REST (see the sketch below)
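
A tiny C++ sketch of primary-key access (the catalogue data is invented): an unordered_map needs only a hash and operator== on the key, and partitioning is just a deterministic function of that same key.

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_map>

    struct Product { std::string name; double price; };

    int main() {
        // PK lookup: O(1) expected, requires only hashing and operator== on the key.
        std::unordered_map<int, Product> catalogue = {
            {42, {"keyboard", 29.99}},
            {43, {"mouse",     9.99}},
        };

        if (auto it = catalogue.find(42); it != catalogue.end())
            std::printf("%s costs %.2f\n", it->second.name.c_str(), it->second.price);

        // Partitioning on the PK: route the key to one of N shards.
        std::size_t shard = std::hash<int>{}(42) % 4;
        std::printf("key 42 lives on shard %zu\n", shard);
    }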

Non-Primary key: search, secondary indexes, full-text search

  • Finding items by value, not by key
  • Need for secondary indexes (e.g. database indexes; see the sketch below)
  • Search on parts of a record
  • Metadata search (date/time, etc.)
  • Full-text search
  • May require substantially more work to build compared to PK-based access
  • Usually slower than PK access for lookup
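
A hedged C++ sketch of a hand-rolled secondary index (the Customer/by_city names are mine): finding records by value needs an extra structure that every write must also update, which is part of why this costs more than PK access.

    #include <map>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Customer { std::string name; std::string city; };

    std::unordered_map<int, Customer> customers;   // primary store: key -> record
    std::multimap<std::string, int>   by_city;     // secondary index: value -> keys

    void insert(int id, Customer c) {
        by_city.emplace(c.city, id);               // extra write keeps the index consistent
        customers.emplace(id, std::move(c));
    }

    std::vector<int> find_by_city(const std::string& city) {
        std::vector<int> ids;
        auto [first, last] = by_city.equal_range(city);   // lookup by value, not by key
        for (auto it = first; it != last; ++it)
            ids.push_back(it->second);
        return ids;
    }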

Range scans: ordering, iteration, bulk vs. single values, op<

  • Requires ordering, i.e. operator< (ordering has a cost)
  • Requires iterators/cursors/traversal state
  • Seek then scan - the first find is slow, then iteration is fast (see the sketch below)
  • Dense linear access and prefetch
  • Watch out for read/write amplification
  • Bulk, not single-record, access - may require bulk aggregate operations rather than N single-record operations for speed (the DB N+1 problem again)
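
A short C++ illustration of seek-then-scan (the dates and prices are made up): an ordered map needs operator< on the key, pays O(log N) for the first find, then iterates cheaply over the range.

    #include <cstdio>
    #include <map>
    #include <string>

    int main() {
        // Ordered by key (operator<), so range scans are possible.
        std::map<std::string, double> price_by_date = {
            {"2018-05-01", 10.00}, {"2018-05-02", 11.50},
            {"2018-05-03",  9.75}, {"2018-06-01", 12.00},
        };

        // Seek: one O(log N) lookup each to find the start and end of the range...
        auto first = price_by_date.lower_bound("2018-05-01");
        auto last  = price_by_date.lower_bound("2018-06-01");

        // ...then scan: cheap in-order iteration over the matching entries.
        for (auto it = first; it != last; ++it)
            std::printf("%s -> %.2f\n", it->first.c_str(), it->second);
    }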

R/W ratio: caching, cost of lookups vs. cost of updates

  • Not all data in a system has similar R/W ratios
    • e.g. metadata is often read-heavy
  • High reads
    • caches are effective
    • cache write-through/write-back
    • cache eviction policy
    • cache coherency
    • index structures useful
  • High writes
    • caches don't help much (except for metadata)
    • locking overhead
    • index structures require updating

Working set: how big is the commonly accessed set of data (RAM)?

  • How much of the commonly used data will fit in main memory or the L1/L2/L3 caches?
  • Will the index structures fit even if the main data doesn't?
    • Index data tends to be 'HOT', main data may be 'COLD'
  • Depends on data access patterns - 80/20 rule

Consistency: exact results vs. fast approximations, eventual consistency, replication, batched updates

  • Do all copies of the data need to be exactly up to date right now?
    • ACID, 2-phase commit, centralised, locking, slow
  • How often are copies updated?
  • Batch updates (e.g. overnight)
  • Data vs. metadata (transactions vs. reference data)
  • ACID vs. BASE (eventual consistency)
    • BASE allows for decoupled asynchronous systems

Read or write, pull or push?

  • How to read a diagram?
  • Is X reading Y or writing to it?
  • Or both at different times?
  • Is X pushing or Y pulling?
  • Where is the THREAD of control?
  • Is this a full batch update or a partial incremental change?
  • Is this an asynchronous push (fire and forget) or a synchronous blocking call?
  • When? Who? What?
  • Horizontal dataflow thinking!

Full vs. incremental change

  • Full changes allow the state to be reset on a regular basis
    • Prevents build-up of errors or divergence from base data
    • Slow, long latency, partial failure problems
    • One big transaction
  • Incremental changes are fast but don't guarantee that state stays synchronised
    • lost messages because of unavailability
    • transactionality only on each update

Lambda architecture


Reader/writer vs. data flow

  • The readers-and-writers view hides the inherent data flow in systems
  • Split R and W into micro-services
    i.e. Command Query Responsibility Segregation (CQRS)
    • separates rules, performance, scaling, access control, etc.
    • Often W and R are very different
    • W usually reads from somewhere

Sync vs. async systems

  • ACID is hard to scale, partition and get right; it can promote failures and makes for a more fragile system, since everything has to be up all the time (brittle)
    • 10 sync systems with 99% uptime => ~90% uptime overall (0.99^10 ≈ 0.90)
    • 10 async systems => 99% uptime for the end system
  • Can maybe delay writes or cover them up
    (lambda architecture, i.e. layer by layer)
  • Reads are sync, but recent data may be sufficient, particularly for metadata (caching helps a lot!)

Data flow and sync/async

  • Data flows can be point-to-point or broadcast
  • Can be synchronous or asynchronous
    • message queues provide a simple async intermediary
    • flat file batch transfer is popular for a reason
  • Queues can also be event stores that can be re-read!! (RAFT)
    • Makes queue reads idempotent

Content management example

  • Keep two APIs separate
    • security, clarity of purpose
    • separation of concerns
    • horizontal not vertical thinking!!


Larger example

  • Data flow makes us think about where data comes from and goes to
    • Who is actually going to read the data I'm writing?
  • Micro-services may have two APIs for sending and receiving
    • 'Vertical' thinking may lead to trying to fit both into one API


When? Who? What?
Horizontal dataflow thinking!

Command Query Responsibility Segregation (CQRS)

  • Need to change the data structure from the form that best suits the write API to the structure that best suits the read API
  • Can be synchronous or asynchronous transformation
  • Think about READ and WRITE separately (see the sketch below)
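
A toy C++ sketch of that structure change (the OrderPlaced/TotalsView names are invented for illustration): the write model is an append-friendly stream of events, the read model is a pre-aggregated view, and a projection function maps one to the other, either synchronously on each write or asynchronously from a queue.

    #include <map>
    #include <string>
    #include <vector>

    // Write side: the shape that best suits the write API (append an event).
    struct OrderPlaced { std::string customer; double amount; };

    // Read side: the shape that best suits the read API (totals per customer).
    using TotalsView = std::map<std::string, double>;

    // The transformation between the two models.
    TotalsView project(const std::vector<OrderPlaced>& events) {
        TotalsView totals;
        for (const auto& e : events)
            totals[e.customer] += e.amount;
        return totals;
    }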

CQRS examples

  • Structure change can be on read or on write
    • Twitter does it on read for high-value users, not write
  • Log-structured merge systems do this internally and asynchronously (e.g. HBase, RocksDB)
  • Other examples:
    • time-series db
    • event sourcing
    • struct-of-arrays vs. array-of-structs (e.g. non-OO, data-oriented; see the sketch below)
    • Columnar analytics DBs (e.g. BigQuery)
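
A small C++ sketch of the array-of-structs vs. struct-of-arrays point (echoing Mike Acton's data-oriented design talk; the Particle example is the usual textbook one, not from these slides): summing one field of an SoA layout touches only that field's contiguous array.

    #include <vector>

    // Array-of-structs: natural OO layout, but scanning one field drags the
    // whole record through the cache.
    struct ParticleAoS { float x, y, z, mass; };

    float total_mass_aos(const std::vector<ParticleAoS>& ps) {
        float total = 0;
        for (const auto& p : ps) total += p.mass;   // loads x, y, z too
        return total;
    }

    // Struct-of-arrays: each field is contiguous, so the scan is dense,
    // cache- and prefetch-friendly linear access.
    struct ParticlesSoA { std::vector<float> x, y, z, mass; };

    float total_mass_soa(const ParticlesSoA& ps) {
        float total = 0;
        for (float m : ps.mass) total += m;         // touches only the mass array
        return total;
    }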

State management

  • Read and write focus doesn't help manage state across a complex system
  • State management needs to address:
    • transactions vs. eventual consistency
    • failure management
    • availability and MTTR (Mean Time To Repair)
      • MTBF (Mean Time Between Failures)
    • immutability
  • Align boundaries with failure and aggregate boundaries
    • REST API: /resourceA/1/resourceB/2
    • Fragmentation and transactionality problems

Failure management

  • Distributed systems can suffer from partial failures on writes
    • Writes in distributed systems are inherently concurrent
    • Recovering and re-syncing state is 'fun'
    • Idempotent writes allow for replay and deduping
    • Make deduping easy: serial numbers, timestamps, etc. (see the sketch below)
    • Repeatable queues are useful (e.g Kafka, flat files)
  • Check-pointing of known good state
    • Point-in-time recovery
    • Full vs. incremental update problem again
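
A hedged C++ sketch of "make deduping easy" (DedupingWriter is my name, not the talk's): each write carries a serial number assigned once by the sender, so a replayed message after a partial failure can be recognised and dropped, making the write effectively idempotent.

    #include <string>
    #include <unordered_set>
    #include <vector>

    struct WriteRequest {
        long        serial;    // assigned once by the producer, resent unchanged on retry
        std::string payload;
    };

    class DedupingWriter {
    public:
        // Returns true if the write was applied, false if it was a replayed duplicate.
        bool apply(const WriteRequest& req) {
            if (!seen_.insert(req.serial).second)
                return false;                   // already applied: safe to drop
            store_.push_back(req.payload);      // the real (non-idempotent) write
            return true;
        }
    private:
        std::unordered_set<long> seen_;         // in practice this set would be persisted
        std::vector<std::string> store_;
    };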

Bell-LaPadula and Biba models

  • Bell-LaPadula: confidentiality
  • Biba: integrity
You CANNOT have BOTH.

Immutability (makes things simpler)

  • Functional programming languages have immutable data
    • Makes sharing and reasoning about data easier
  • Russian Doll caching, MVCC
    • Change the key not the value
  • Pets vs. cattle - infrastructure
  • SSA - compilers and CPU reservation stations
  • Lambda architecture - immutable master data

Availability

  • Availability = MTBF / (MTBF + MTTR)
    (where MTBF = mean time between failures
    MTTR = mean time to repair)
  • Maximise MTBF via good software practices, hardware failover and reliable, well-known technology choices
  • Minimise MTTR by making systems easier to understand, debug and restart (see the worked example below)
    • minimise state management (redo logs, fsck, etc.)
    • use immutability where possible
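
A quick worked example of the availability formula with illustrative numbers (mine, not the talk's), showing why cutting MTTR pays off directly:

    A = MTBF / (MTBF + MTTR)
    MTBF = 1000 h, MTTR = 1 h    ->  1000 / 1001   ≈ 99.9%
    MTBF = 1000 h, MTTR = 0.1 h  ->  1000 / 1000.1 ≈ 99.99%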

Read and write are too low level

  • They don't help us to design or analyse systems
    • they are the assembler-level of data (CRUD)
  • They don't relate to the larger picture
  • It is too easy to deal with them in isolation
    • REST APIs are an all-too-common example
  • Looking at data flow, push vs. pull, sync/async, business processes, operational profiles, state management, etc. are much more fruitful approaches
