Read and write considered harmful - Hubert Matthews [ACCU 2018]
Possibly the best presentation at this year's ACCU conference.
Also reference:
Mike Acton's "Data-Oriented Design" CppCon 2014 presentation
Notes below:
Basics of Read/Write
- Business processes, rules and schemas
- Performance, scaling, concurrency
- Six questions about data
- Asynchrony and queues
- Structure changing
- State management
Reads
- Can be cached, at multiple levels (e.g. in the CPU)
- Caching is transparent (mostly)
- Idempotent (can be retried without side effects)
- Can be partitioned easily (routing, e.g. load balancers, API gateways)
- Access control rules only
- Synchronous and blocking
- Scalable bandwidth - fanout reduces contention
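Because reads are idempotent, a transparent cache can sit in front of a slow store without changing behaviour. A minimal read-through cache sketch (the names CachedReader and the backing function are illustrative, not from the talk):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Read-through cache: a miss falls through to the slower backing store
// and the result is remembered; reads have no side effects, so this is safe.
class CachedReader {
public:
    explicit CachedReader(std::function<std::string(int)> backing)
        : backing_(std::move(backing)) {}

    // Safe to retry: a read never modifies the source data.
    const std::string& read(int key) {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, backing_(key)).first;  // miss: fetch once
        return it->second;
    }

private:
    std::function<std::string(int)> backing_;
    std::unordered_map<int, std::string> cache_;
};
```

A second read of the same key is served entirely from the cache, which is why read bandwidth fans out so cheaply.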
Writes
- Caching is horrible for writes
- writeback/through, eviction policy, coherence, etc.
- Scaling writes is horrible - fan-in creates contention
- Sharding works well only for primary key writes
- Access control rules plus update rules
- Can be delayed or asynchronous (write-back cache, e.g. a CPU store buffer)
- Idempotency is a design issue/choice
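The "delayed or asynchronous" point can be sketched as a write-behind buffer: writes land in a fast local buffer and are flushed to the slow store later, with repeated writes to the same key coalescing. A minimal sketch (WriteBehindBuffer is an illustrative name, not an API from the talk):

```cpp
#include <functional>
#include <map>

// Write-behind (write-back) buffer: absorb writes locally, flush later.
class WriteBehindBuffer {
public:
    explicit WriteBehindBuffer(std::function<void(int, int)> store)
        : store_(std::move(store)) {}

    // Fast: touches only the local buffer, not the slow store.
    void write(int key, int value) { pending_[key] = value; }

    // Later writes to the same key coalesce: only the last value reaches
    // the store, reducing write traffic.
    void flush() {
        for (const auto& [k, v] : pending_) store_(k, v);
        pending_.clear();
    }

private:
    std::function<void(int, int)> store_;
    std::map<int, int> pending_;
};
```

This is exactly the trade the notes describe: the writer is decoupled from store latency, at the cost of a window where the store is stale.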
Dependencies
- Even simple code has dependencies caused by read and write
- Asymmetric - the called object doesn't know who calls it
- Makes testing difficult
- Substitution
- Mocks, etc
- Introduces notion of push and pull
REST APIs and rules
- REST = getters/setters on steroids
- Industrial-scale anti-pattern
- Separates code and data
- Opposite of encapsulation
- Very non-OO
- Duplicated logic/rules in every client
- e.g. stock_level >= 0
- If any one client breaks the rule, the data is corrupted for every client
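The stock_level >= 0 rule from the notes, kept with the data instead of duplicated in every REST client. A minimal sketch (the StockItem class is illustrative):

```cpp
#include <stdexcept>

// Encapsulation: the invariant stock_level >= 0 lives with the data,
// so no client can break it - the opposite of getter/setter REST APIs
// where every client must re-implement the rule.
class StockItem {
public:
    explicit StockItem(int level) : level_(level) {}

    int level() const { return level_; }

    void remove(int qty) {
        if (qty < 0 || qty > level_)
            throw std::domain_error("stock_level must stay >= 0");
        level_ -= qty;
    }

private:
    int level_;
};
```

With a raw setter (or a PUT on the resource), each client would have to check the rule itself, and one buggy client corrupts the data for all.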
Typical system scaling path (important!)
This path is the WRONG way to think about it.
- Application is too slow
- Get more front-end boxes (scales R + W)
- Application is still too slow
- Get a bigger DB box (scales R + W)
- Run out of read bandwidth
- Replicate or cache data (scales R)
- Run out of write bandwidth
- Shard/partition data on primary key (scales R + W)
- Create separate services per entity/component
- Cross-service joins done in the client (scales entities)
Scaling problems
- Partitioning or sharding works to an extent
- Only if access is strongly biased around the primary key
- e.g. access spread across flight dates (roughly flat) vs. accessing every day's meals (sequential even with hashed keys)
- Nasty to scale cross-partition operations
- Particularly for write (usually not idempotent)
- Partial failure on write, cross-box transactions, concurrency, latency, etc.
- Service boundaries aligned with operations boundaries and failure boundaries
- Bulk access to multiple records can cause N+1 access problems
(get primary keys, then N single-row accesses;
beware of REST and ORMs) - can be solved with batch access
Avoid sharing mutable data
- Shared mutable data is the root of all evil in computing
- Read-only data can be shared safely without locks
- Const is our friend!
- Pure message-passing approach avoids this
Shared writes don't scale
Why don't shared writes scale?
- Caches have to communicate to ensure a coherent view across cores
- The MESI protocol passes messages between caches
- Shared writes are limited by this MESI communication
- JUST DON'T DO SHARED WRITES!
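One way to avoid shared writes in code: give each thread its own counter (on its own cache line) and combine the results once at the end, so no hot line ping-pongs between cores via MESI. A sketch under that assumption, with illustrative names:

```cpp
#include <thread>
#include <vector>

// Sharded counting: each thread writes only to its private slot.
// alignas(64) keeps each slot on its own cache line, avoiding false
// sharing, so MESI never has to bounce a line between cores.
long count_sharded(int num_threads, long per_thread) {
    struct alignas(64) Slot { long n = 0; };
    std::vector<Slot> slots(num_threads);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back([&slots, t, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                ++slots[t].n;  // private write: no cross-core contention
        });
    for (auto& w : workers) w.join();

    long total = 0;
    for (const auto& s : slots) total += s.n;  // single read-side combine
    return total;
}
```

The shared-write alternative (all threads incrementing one atomic counter) gives the same answer but serialises on cache-coherence traffic.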
6 questions about data access
Primary key: hashing, partitioning, op==
- Most common form of access
- database primary key
- std::map/unordered_map
- Can use hashing [O(1)], binary search [O(logN)] or linear search for small N [O(N)]
- Requires only operator ==
- Partition on PK into multiple parts that can operate in parallel or to avoid contention
- e.g. product catalogues, customer records, sticky web sessions, NoSQL, memcache, REST
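The "partition on PK" idea can be sketched directly: hash the primary key to pick a shard, so each shard can live on its own box and operate in parallel with no cross-shard contention. Shard count and class name are illustrative:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Primary-key partitioning: the hash of the key deterministically picks
// a shard, so any node can route a request without coordination.
class ShardedStore {
public:
    explicit ShardedStore(std::size_t shards) : shards_(shards) {}

    std::size_t shard_for(int key) const {
        return std::hash<int>{}(key) % shards_.size();
    }

    void put(int key, std::string v) {
        shards_[shard_for(key)][key] = std::move(v);
    }

    const std::string& get(int key) {
        return shards_[shard_for(key)].at(key);
    }

private:
    std::vector<std::unordered_map<int, std::string>> shards_;
};
```

Note this only needs operator== (and a hash) on the key, which is why PK access partitions so easily - and why cross-partition operations are the hard part.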
Non-Primary key: search, secondary indexes, full-text search
- Finding items by value, not by key
- Need for secondary indexes (e.g. database indexes)
- Search on parts of a record
- Metadata search (date/time, etc.)
- Full-text search
- May require substantially more work to build compared to PK-based access
- Usually slower than PK access for lookup
Range scans: ordering, iteration, bulk vs. single values, op<
- Requires an ordering, i.e. operator< (ordering has a cost)
- Requires iterators/cursors/traversal state
- Seek then scan - first find is slow, then fast
- Dense linear access and prefetch
- Watch out for read/write amplification
- Bulk, not single record, access - may require bulk aggregate operations rather than N times single record operations for speed (DB N+1 problem)
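The "seek then scan" pattern maps directly onto std::map: lower_bound is the slow seek, iteration from there is the fast dense scan. A small sketch:

```cpp
#include <map>
#include <vector>

// Range scan over an ordered container: operator< gives the ordering,
// lower_bound does the slow "seek", iteration is the fast "scan".
// Returns values for keys in the half-open range [lo, hi).
std::vector<int> values_in_range(const std::map<int, int>& m, int lo, int hi) {
    std::vector<int> out;
    for (auto it = m.lower_bound(lo); it != m.end() && it->first < hi; ++it)
        out.push_back(it->second);  // dense, ordered, prefetch-friendly
    return out;
}
```

This is also the bulk-access shape that avoids the N+1 problem: one seek plus a scan, instead of N separate single-key lookups.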
R/W ratio: caching, cost of lookups vs. cost of updates
- Not all data in a system has similar R/W ratios
- e.g. metadata is often read-heavy
- High reads
- caches are effective
- cache writethrough/back
- cache eviction policy
- cache coherency
- index structures useful
- High writes
- caches don't help much (except for metadata)
- locking overhead
- index structure require updating
Working set: how big is the commonly accessed set of data (RAM)?
- How much of the common data will fit in main memory, or the L1/L2/L3 caches?
- Will the index structures fit even if the main data doesn't?
- Index data tends to be 'hot'; main data may be 'cold'
- Depends on data access patterns - 80/20 rule
Consistency: exact results vs. fast approximations, eventual consistency, replication, batched updates
- Do all copies of the data need to be exactly up to date right now?
- ACID, 2-phase commit, centralised, locking, slow
- How often are copies updated?
- Batch updates (e.g. overnight)
- Data vs. metadata (transactions vs. reference data)
- ACID vs. BASE (eventual consistency)
- BASE allows for decoupled asynchronous systems
Read or write, pull or push?
- How to read a diagram?
- Is X reading Y or writing to it?
- Or both at different times?
- Is X pushing or Y pulling?
- Where is the THREAD of control?
- Is this a full batch update or a partial incremental change?
- Is this an asynchronous push (fire and forget) or a synchronous blocking call?
- When? Who? What?
- Horizontal dataflow thinking!
Full vs. incremental change
- Full changes allow the state to be reset on a regular basis
- Prevents build-up of errors or divergence from base data
- Slow, long latency, partial failure problems
- One big transaction
- Incremental changes are fast but don't guarantee to keep state changes synchronised
- lost messages because of unavailability
- transactionality only on each update
Lambda architecture
Reader/writer vs. data flow
- Readers and writers view hides inherent data flow in systems
- Split R and W into micro-services
i.e. Command Query Responsibility Segregation (CQRS) - separates rules, performance, scaling, access control, etc.
- Often W and R are very different
- W usually reads from somewhere
Sync vs. async systems
- ACID is hard to scale, partition and get right; it can promote failures and makes for a more fragile system, as everything has to be up all the time (brittle)
- 10 sync systems with 99% uptime each => ~90% uptime overall
- 10 async systems => ~99% uptime for the end system
- Can sometimes delay writes or cover them up
(lambda architecture, i.e. layer by layer)
- Reads are sync, but slightly stale data may be sufficient, particularly for metadata (caching helps a lot!)
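The uptime arithmetic above is worth making explicit: ten synchronously coupled systems must all be up at once, so their availabilities multiply, 0.99^10 ≈ 0.904. A one-line worked version:

```cpp
#include <cmath>

// Synchronously chained systems must all be up simultaneously,
// so availabilities multiply: overall = per_system ^ n.
double chained_uptime(double per_system, int n) {
    return std::pow(per_system, n);
}
```

Async coupling breaks this chain: a queue between systems means a downstream outage delays work instead of failing the whole request path.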
Data flow and sync/async
- Data flows can be point-to-point or broadcast
- Can be synchronous or asynchronous
- message queues provide a simple asynchronous intermediary
- flat-file batch transfer is popular for a reason
- Queues can also be event stores with reread!!(RAFT)
- Makes queue reads idempotent
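A replayable queue is just an append-only log where each consumer keeps its own offset: reading never consumes, so the same offset always yields the same event and reads become idempotent. A minimal sketch in the spirit of Kafka-style event stores (EventLog is an illustrative name):

```cpp
#include <string>
#include <vector>

// Append-only event log: consumers track their own offsets, so a read
// can be repeated (replayed) from any point with identical results.
class EventLog {
public:
    void append(std::string e) { log_.push_back(std::move(e)); }

    // Reading never consumes: idempotent by construction.
    const std::string& read(std::size_t offset) const { return log_.at(offset); }

    std::size_t size() const { return log_.size(); }

private:
    std::vector<std::string> log_;
};
```

Contrast with a destructive pop(): once an event is popped, a crashed consumer can never re-read it, which is exactly the recovery problem replayable logs avoid.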
Content management example
- Keep two APIs separate
- security, clarity of purpose
- separation of concerns
- horizontal not vertical thinking!!
Larger example
- Data flow makes us think about where data comes from and goes to
- Who is actually going to read the data I'm writing?
- Micro-services may have two APIs for sending and receiving
- 'Vertical' thinking may lead to trying to fit both into one API
When? Who? What?
Horizontal dataflow thinking!
Command Query Responsibility Segregation (CQRS)
- Need to change the data structure from the form that best suits the write API to the structure that best suits the read API
- Can be synchronous or asynchronous transformation
- Think about READ and WRITE separately
CQRS examples
- Structure change can be on read or on write
- Twitter does it on read for high-value users, not write
- Log-structured merge systems do this internally and asynchronously (e.g. HBase, RocksDB)
- Other examples:
- time-series db
- event sourcing
- struct-of-arrays vs. array-of-structs (e.g. non-OO, data-oriented design)
- Columnar analytics DBs (e.g. BigQuery)
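The struct-of-arrays vs. array-of-structs example above is CQRS at the memory level: the layout is restructured to suit the read pattern. When a pass touches only one field, SoA keeps that field dense in cache, while AoS drags the whole record through. A minimal sketch:

```cpp
#include <vector>

// Array-of-structs: each record's fields are adjacent in memory.
struct ParticleAoS { float x, y, z, mass; };

// Struct-of-arrays: each field is its own dense array - the columnar
// layout that analytics databases use internally.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Summing mass in SoA streams one contiguous array; the AoS equivalent
// would also pull x, y, z through the cache even though it never reads them.
float total_mass(const ParticlesSoA& p) {
    float sum = 0;
    for (float m : p.mass) sum += m;
    return sum;
}
```

This is the same trade as a columnar DB: slower, scattered writes (one append per column) in exchange for fast single-field reads.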
State management
- Read and write focus doesn't help manage state across a complex system
- State management needs to address:
- transactions vs. eventual consistency
- failure management
- availability and MTTR (Mean Time To Repair)
- MTBF(Mean Time Between Failure)
- immutability
- Align boundaries with failure and aggregate boundaries
- REST API: /resourceA/1/resourceB/2
- Fragmentation and transactionality problems
Failure management
- Distributed systems can suffer from partial failures on writes
- Writes in distributed systems are inherently concurrent
- Recovery and re-sync state is 'fun'
- Idempotent writes allow for replay and deduping
- Make deduping easy: serial number, timestamp, etc.
- Repeatable queues are useful (e.g. Kafka, flat files)
- Check-pointing of known good state
- Point-in-time recovery
- Full vs. incremental update problem again
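"Idempotent writes allow for replay and deduping" can be sketched concretely: tag every write with a serial number and drop replays of serials already applied, so re-running a failed batch is safe. Illustrative names throughout:

```cpp
#include <map>
#include <set>

// Deduping store: each write carries a serial number; replays of an
// already-applied serial are dropped, making the write path idempotent.
class DedupingStore {
public:
    // Returns true if the write was applied, false if it was a duplicate.
    bool apply(long serial, int key, int value) {
        if (!seen_.insert(serial).second)
            return false;  // already applied: safe to drop the replay
        data_[key] = value;
        return true;
    }

    int get(int key) const { return data_.at(key); }

private:
    std::set<long> seen_;       // serials already applied
    std::map<int, int> data_;
};
```

After a partial failure, the recovery procedure can simply replay the whole log from the last checkpoint: duplicates are filtered out, so replay is harmless.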
Bell-LaPadula and Biba models
- Bell-LaPadula: confidentiality (no read up, no write down)
- Biba: integrity (no write up, no read down)
- You CANNOT have BOTH - the two models' rules are duals and conflict
Immutability (makes things simpler)
- Functional programming languages have immutable data
- Makes sharing and reasoning about data easier
- Russian Doll caching, MVCC
- Change the key, not the value
- Pets vs. cattle - infrastructure
- SSA - compilers and CPU reservation stations
- Lambda architecture - immutable master data
Availability
- Availability = MTBF / (MTBF + MTTR)
(where MTBF = mean time between failures, MTTR = mean time to repair)
- Maximise MTBF: good software practices, hardware failover, reliable well-known technology choices
- Minimise MTTR by making systems easier to understand, debug and restart
- minimise state management (redo logs, fsck, etc.)
- use immutability where possible
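The availability formula above as a worked example: with an assumed MTBF of 1000 hours and MTTR of 1 hour, availability is 1000/1001 ≈ 99.9%, and at these ratios halving MTTR helps about as much as doubling MTBF:

```cpp
// Availability = MTBF / (MTBF + MTTR), both in the same time unit.
double availability(double mtbf, double mttr) {
    return mtbf / (mtbf + mttr);
}
```

This is why the notes stress minimising MTTR: repair time is often far cheaper to improve than failure rate.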
Read and write are too low level
- They don't help us to design or analyse systems
- they are the assembler-level of data (CRUD)
- They don't relate to the larger picture
- It is too easy to deal with them in isolation
- REST APIs are an all-too common example
- Looking at data flow, push vs. pull, sync/async, business processes, operational profiles, state management, etc. are much more fruitful approaches