May 18, 2018

[accu2018][design] Read/Write thinking can be harmful for system design

Read and write considered harmful - Hubert Matthews [ACCU 2018]

This could be the best presentation at this year's ACCU conference.

See also:
Mike Acton's Data-Oriented Design talk from CppCon 2014

Notes below:

Basics of Read/Write

  • Business processes, rules and schemas
  • Performance, scaling, concurrency
  • Six questions about data
  • Asynchrony and queues
  • Structure changing
  • State management

Reads

  • Can be cached, at multiple levels (e.g. CPU caches)
  • Caching is (mostly) transparent
  • Idempotent (can be retried without side-effects)
  • Can be partitioned easily (routing, e.g. load balancer, API gateway, etc.)
  • Access control rules only
  • Synchronous and blocking
  • Scalable bandwidth - fan-out reduces contention


Writes

  • Caching is horrible for writes
    • write-back/write-through, eviction policy, coherence, etc.
  • Scaling writes is horrible - fan-in creates contention
  • Sharding works well only for primary key writes
  • Access control rules plus update rules
  • Can be delayed or asynchronous (write-back cache, e.g. a store buffer)
  • Idempotency is a design issue/choice

Dependencies

  • Even simple code has dependencies caused by read and write
  • Asymmetric - the called object doesn't know who calls it
  • Makes testing difficult
    • Substitution
    • Mocks, etc.
  • Introduces notion of push and pull

REST APIs and rules

  • REST = getters/setters on steroids
    • Industrial-scale anti-pattern
    • Separates code and data
    • Opposite of encapsulation
    • Very non-OO
    • Duplicated logic/rules in every client
    • e.g. stock_level >= 0 (see the sketch after this list)
    • What happens if the rule is broken?
Tradeoffs: early/late validation failure, schema migration, versioning, code/rule duplication
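
A minimal C++ sketch of the duplicated-rules point (the StockRecord/Inventory names are mine, not the speaker's): with raw getters/setters every client has to re-check stock_level >= 0, whereas an encapsulated operation enforces the rule in one place.

    #include <stdexcept>

    // REST/getter-setter style: no rule is enforced here, so every client
    // must duplicate the "stock_level >= 0" check (or forget to).
    class StockRecord {
    public:
        int  stock_level() const      { return level_; }
        void set_stock_level(int lvl) { level_ = lvl; }
    private:
        int level_ = 0;
    };

    // Encapsulated alternative: the rule lives with the data, in one place.
    class Inventory {
    public:
        void add(int qty) { level_ += qty; }
        void remove(int qty) {
            if (qty > level_)
                throw std::invalid_argument("stock_level >= 0 would be violated");
            level_ -= qty;
        }
        int level() const { return level_; }
    private:
        int level_ = 0;
    };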

REST = CRUD

  • The statechart has one state and four transitions
  • OK for metadata
  • Real processes have multiple states
    • Have different entities per state (see the sketch below)
      • (Possibly sub-entities)
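
A hedged sketch of "different entities per state" in C++ (the order types are an invented example, not from the talk): each state gets its own type, so transitions are explicit, named operations rather than generic CRUD updates on one record.

    #include <string>
    #include <variant>

    // One type per state instead of one record with a 'status' column.
    struct DraftOrder   { std::string customer; };
    struct PaidOrder    { std::string customer; std::string payment_ref; };
    struct ShippedOrder { std::string customer; std::string payment_ref;
                          std::string tracking_id; };

    using Order = std::variant<DraftOrder, PaidOrder, ShippedOrder>;

    // Transitions are explicit functions, not a generic 'update'.
    PaidOrder pay(const DraftOrder& d, std::string payment_ref) {
        return PaidOrder{d.customer, std::move(payment_ref)};
    }
    ShippedOrder ship(const PaidOrder& p, std::string tracking_id) {
        return ShippedOrder{p.customer, p.payment_ref, std::move(tracking_id)};
    }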

Typical system scaling path (important!)

This is the WRONG way to think about it:
  • Application is too slow
  • Get more front-end boxes (scales R + W)
    • Application is still too slow
    • Get a bigger DB box (scales R + W)
      • Run out of read bandwidth
      • Replicate or cache data (scales R)
        • Run out of write bandwidth
        • Shard/partition data on primary key (scales R + W)
          • Create separate services per entity/component
          • Cross-service joins done in the client (scales entities)


Scaling problems

  • Partitioning or sharding works to an extent
    • If access is strongly biased around primary key
    • e.g. access to flight dates (spread fairly evenly) vs. access to each day's meals (sequential even with hashed keys)
  • Nasty to scale cross-partition operations
    • Particularly for write (usually not idempotent)
    • Partial failure on write, cross-box transactions, concurrency, latency, etc.
    • Service boundaries aligned with operations boundaries and failure boundaries
  • Bulk access to multiple records can cause N+1 access problems
    (get the primary keys, then N single-row accesses:
    beware of REST and ORMs); batch access solves this (see the sketch below)
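
A C++ sketch of the N+1 shape versus batch access (the fetch functions and the in-memory "table" are stand-ins I invented; in a real system each fetch_one would be a network or DB round trip).

    #include <map>
    #include <string>
    #include <vector>

    struct Row { int id; std::string data; };

    // Stand-in for a remote table.
    static const std::map<int, std::string> table = {{1, "a"}, {2, "b"}, {3, "c"}};

    Row fetch_one(int id) { return {id, table.at(id)}; }

    std::vector<Row> fetch_many(const std::vector<int>& ids) {
        std::vector<Row> rows;
        for (int id : ids) rows.push_back({id, table.at(id)});  // one server-side pass
        return rows;
    }

    // N+1 problem: get the keys, then N single-row accesses.
    std::vector<Row> load_slow(const std::vector<int>& ids) {
        std::vector<Row> rows;
        for (int id : ids) rows.push_back(fetch_one(id));        // N round trips
        return rows;
    }

    // Batch access: one bulk request for all the keys.
    std::vector<Row> load_fast(const std::vector<int>& ids) {
        return fetch_many(ids);                                  // 1 round trip
    }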

Avoid sharing mutable data

  • Shared mutable data is the root of all evil in computing
  • Read-only data can be shared safely without locks (see the sketch below)
  • Const is our friend!
  • Pure message-passing approach avoids this
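
A small C++ illustration of the read-only sharing point (my example, not a slide): data that is const after construction can be read from many threads with no locks and no data races.

    #include <cstdio>
    #include <memory>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        // Built once, then immutable: shared_ptr<const T> makes the read-only
        // intent explicit and lets every thread share the same data lock-free.
        auto regions = std::make_shared<const std::vector<std::string>>(
            std::vector<std::string>{"eu-west", "us-east", "ap-south"});

        std::vector<std::thread> readers;
        for (int i = 0; i < 4; ++i)
            readers.emplace_back([regions] {          // each thread shares the same const data
                for (const auto& r : *regions)
                    std::printf("%s\n", r.c_str());
            });
        for (auto& t : readers) t.join();
    }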


Shared writes don't scale

  • In-memory programming
  • Zero scalability!

Why don't shared writes scale?

MESI / Cores
  • Caches have to communicate to ensure coherent view
  • MESI protocol passes messages between caches
  • Shared writes limited by MESI communication
  • JUST DON'T DO SHARED WRITES! (see the sketch below)
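
A rough C++ demonstration of the contention (the structure and numbers are mine; timings will vary by machine): every thread hammering one atomic counter forces the owning cache line to bounce between cores via MESI, while per-thread counters combined at the end avoid the shared write entirely.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int  kThreads = 4;
        constexpr long kIters   = 1'000'000;

        // Shared write: one cache line ping-pongs between all cores.
        std::atomic<long> shared{0};
        std::vector<std::thread> ts;
        for (int i = 0; i < kThreads; ++i)
            ts.emplace_back([&] { for (long n = 0; n < kIters; ++n) shared.fetch_add(1); });
        for (auto& t : ts) t.join();
        ts.clear();

        // No shared write: each thread owns its (padded) counter; combine once.
        struct alignas(64) Padded { long value = 0; };   // padding avoids false sharing
        std::vector<Padded> local(kThreads);
        for (int i = 0; i < kThreads; ++i)
            ts.emplace_back([&local, i] { for (long n = 0; n < kIters; ++n) ++local[i].value; });
        for (auto& t : ts) t.join();

        long combined = 0;
        for (const auto& p : local) combined += p.value;
        std::printf("shared=%ld combined=%ld\n", shared.load(), combined);
    }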



6 questions about data access

Primary key: hashing, partitioning, op==

  • Most common form of access
    • database primary key
    • std::map/unordered_map
  • Can use hashing [O(1)], binary search [O(logN)] or linear search for small N [O(N)]
  • Requires only operator ==
  • Partition on the PK into multiple parts that can operate in parallel or avoid contention
    e.g. product catalogue, customer records, sticky web sessions, NoSQL, memcache, REST (see the sketch below)
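
A tiny C++ sketch of primary-key access (the catalogue data is invented): an unordered_map needs only a hash and operator== on the key, and partitioning is just a deterministic function of that same key.

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_map>

    struct Product { std::string name; double price; };

    int main() {
        // PK lookup: O(1) expected, requires only hashing and operator== on the key.
        std::unordered_map<int, Product> catalogue = {
            {42, {"keyboard", 29.99}},
            {43, {"mouse",     9.99}},
        };

        if (auto it = catalogue.find(42); it != catalogue.end())
            std::printf("%s costs %.2f\n", it->second.name.c_str(), it->second.price);

        // Partitioning on the PK: route the key to one of N shards.
        std::size_t shard = std::hash<int>{}(42) % 4;
        std::printf("key 42 lives on shard %zu\n", shard);
    }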

Non-Primary key: search, secondary indexes, full-text search

  • Finding items by value, not by key
  • Need for secondary indexes (e.g. database indexes; see the sketch below)
  • Search on parts of a record
  • Metadata search (date/time, etc.)
  • Full-text search
  • May require substantially more work to build compared to PK-based access
  • Usually slower than PK access for lookup
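
A hedged C++ sketch of a hand-rolled secondary index (the Customer/by_city names are mine): finding records by value needs an extra structure that every write must also update, which is part of why this costs more than PK access.

    #include <map>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Customer { std::string name; std::string city; };

    std::unordered_map<int, Customer> customers;   // primary store: key -> record
    std::multimap<std::string, int>   by_city;     // secondary index: value -> keys

    void insert(int id, Customer c) {
        by_city.emplace(c.city, id);               // extra write keeps the index consistent
        customers.emplace(id, std::move(c));
    }

    std::vector<int> find_by_city(const std::string& city) {
        std::vector<int> ids;
        auto [first, last] = by_city.equal_range(city);   // lookup by value, not by key
        for (auto it = first; it != last; ++it)
            ids.push_back(it->second);
        return ids;
    }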

Range scans: ordering, iteration, bulk vs. single values, op<

  • Requires ordering, i.e. operator< (ordering has a cost)
  • Requires iterators/cursors/traversal state
  • Seek then scan - the first find is slow, then iteration is fast (see the sketch below)
  • Dense linear access and prefetch
  • Watch out for read/write amplification
  • Bulk, not single-record, access - may require bulk aggregate operations rather than N single-record operations for speed (the DB N+1 problem again)
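
A short C++ illustration of seek-then-scan (the dates and prices are made up): an ordered map needs operator< on the key, pays O(log N) for the first find, then iterates cheaply over the range.

    #include <cstdio>
    #include <map>
    #include <string>

    int main() {
        // Ordered by key (operator<), so range scans are possible.
        std::map<std::string, double> price_by_date = {
            {"2018-05-01", 10.00}, {"2018-05-02", 11.50},
            {"2018-05-03",  9.75}, {"2018-06-01", 12.00},
        };

        // Seek: one O(log N) lookup each to find the start and end of the range...
        auto first = price_by_date.lower_bound("2018-05-01");
        auto last  = price_by_date.lower_bound("2018-06-01");

        // ...then scan: cheap in-order iteration over the matching entries.
        for (auto it = first; it != last; ++it)
            std::printf("%s -> %.2f\n", it->first.c_str(), it->second);
    }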

R/W ratio: caching, cost of lookups vs. cost of updates

  • Not all data in a system has similar R/W ratios
    • e.g. metadata is often read-heavy
  • High reads
    • caches are effective
    • cache write-through/write-back
    • cache eviction policy
    • cache coherency
    • index structures useful
  • High writes
    • caches don't help much (except for metadata)
    • locking overhead
    • index structures require updating

Working set: how big is the commonly accessed set of data (RAM)?

  • How much of the commonly used data will fit in main memory or the L1/L2/L3 caches?
  • Will the index structures fit even if the main data doesn't?
    • Index data tends to be 'HOT', main data may be 'COLD'
  • Depends on data access patterns - 80/20 rule

Consistency: exact results vs. fast approximations, eventual consistency, replication, batched updates

  • Do all copies of the data need to be exactly up to date right now?
    • ACID, 2-phase commit, centralised, locking, slow
  • How often are copies updated?
  • Batch updates (e.g. overnight)
  • Data vs. metadata (transactions vs. reference data)
  • ACID vs. BASE (eventual consistency)
    • BASE allows for decoupled asynchronous systems

Read or write, pull or push?

  • How to read a diagram?
  • Is X reading Y or writing to it?
  • Or both at different times?
  • Is X pushing or Y pulling?
  • Where is the THREAD of control?
  • Is this a full batch update or a partial incremental change?
  • Is this an asynchronous push (fire and forget) or a synchronous blocking call?
  • When? Who? What?
  • Horizontal dataflow thinking!

Full vs. incremental change

  • Full changes allow the state to be reset on a regular basis
    • Prevents build-up of errors or divergence from base data
    • Slow, long latency, partial failure problems
    • One big transaction
  • Incremental changes are fast but don't guarantee that state stays synchronised
    • lost messages because of unavailability
    • transactionality only on each update

Lambda architecture


Reader/writer vs. data flow

  • The readers-and-writers view hides the inherent data flow in systems
  • Split R and W into micro-services
    i.e. Command Query Responsibility Segregation (CQRS)
    • separates rules, performance, scaling, access control, etc.
    • Often W and R are very different
    • W usually reads from somewhere

Sync vs. async systems

  • ACID is hard to scale, partition and get right; it can promote failures and makes for a more fragile system, since everything has to be up all the time (brittle)
    • 10 sync systems with 99% uptime => ~90% uptime overall (0.99^10 ≈ 0.90)
    • 10 async systems => 99% uptime for the end system
  • Can maybe delay writes or cover them up
    (lambda architecture, i.e. layer by layer)
  • Reads are sync, but recent data may be sufficient, particularly for metadata (caching helps a lot!)

Data flow and sync/async

  • Data flows can be point-to-point or broadcast
  • Can be synchronous or asynchronous
    • message queues provide a simple async intermediary
    • flat file batch transfer is popular for a reason
  • Queues can also be event stores that can be re-read!! (RAFT)
    • Makes queue reads idempotent

Content management example

  • Keep two APIs separate
    • security, clarity of purpose
    • separation of concerns
    • horizontal not vertical thinking!!


Larger example

  • Data flow makes us think about where data comes from and goes to
    • Who is actually going to read the data I'm writing?
  • Micro-services may have two APIs for sending and receiving
    • 'Vertical' thinking may lead to trying to fit both into one API


When? Who? What?
Horizontal dataflow thinking!

Command Query Responsibility Segregation (CQRS)

  • Need to change the data structure from the form that best suits the write API to the structure that best suits the read API
  • Can be synchronous or asynchronous transformation
  • Think about READ and WRITE separately (see the sketch below)
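
A toy C++ sketch of that structure change (the OrderPlaced/TotalsView names are invented for illustration): the write model is an append-friendly stream of events, the read model is a pre-aggregated view, and a projection function maps one to the other, either synchronously on each write or asynchronously from a queue.

    #include <map>
    #include <string>
    #include <vector>

    // Write side: the shape that best suits the write API (append an event).
    struct OrderPlaced { std::string customer; double amount; };

    // Read side: the shape that best suits the read API (totals per customer).
    using TotalsView = std::map<std::string, double>;

    // The transformation between the two models.
    TotalsView project(const std::vector<OrderPlaced>& events) {
        TotalsView totals;
        for (const auto& e : events)
            totals[e.customer] += e.amount;
        return totals;
    }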

CQRS examples

  • Structure change can be on read or on write
    • Twitter does it on read for high-value users, not write
  • Log-structured merge systems do this internally and asynchronously (e.g. HBase, RocksDB)
  • Other examples:
    • time-series db
    • event sourcing
    • struct-of-arrays vs. array-of-structs (e.g. non-OO, data-oriented; see the sketch below)
    • Columnar analytics DBs (e.g. BigQuery)
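
A small C++ sketch of the array-of-structs vs. struct-of-arrays point (echoing Mike Acton's data-oriented design talk; the Particle example is the usual textbook one, not from these slides): summing one field of an SoA layout touches only that field's contiguous array.

    #include <vector>

    // Array-of-structs: natural OO layout, but scanning one field drags the
    // whole record through the cache.
    struct ParticleAoS { float x, y, z, mass; };

    float total_mass_aos(const std::vector<ParticleAoS>& ps) {
        float total = 0;
        for (const auto& p : ps) total += p.mass;   // loads x, y, z too
        return total;
    }

    // Struct-of-arrays: each field is contiguous, so the scan is dense,
    // cache- and prefetch-friendly linear access.
    struct ParticlesSoA { std::vector<float> x, y, z, mass; };

    float total_mass_soa(const ParticlesSoA& ps) {
        float total = 0;
        for (float m : ps.mass) total += m;         // touches only the mass array
        return total;
    }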

State management

  • Read and write focus doesn't help manage state across a complex system
  • State management needs to address:
    • transactions vs. eventual consistency
    • failure management
    • availability and MTTR (Mean Time To Repair)
      • MTBF (Mean Time Between Failures)
    • immutability
  • Align boundaries with failure and aggregate boundaries
    • REST API: /resourceA/1/resourceB/2
    • Fragmentation and transactionality problems

Failure management

  • Distributed systems can suffer from partial failures on writes
    • Writes in distributed systems are inherently concurrent
    • Recovering and re-syncing state is 'fun'
    • Idempotent writes allow for replay and deduping
    • Make deduping easy: serial numbers, timestamps, etc. (see the sketch below)
    • Repeatable queues are useful (e.g Kafka, flat files)
  • Check-pointing of known good state
    • Point-in-time recovery
    • Full vs. incremental update problem again
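
A hedged C++ sketch of "make deduping easy" (DedupingWriter is my name, not the talk's): each write carries a serial number assigned once by the sender, so a replayed message after a partial failure can be recognised and dropped, making the write effectively idempotent.

    #include <string>
    #include <unordered_set>
    #include <vector>

    struct WriteRequest {
        long        serial;    // assigned once by the producer, resent unchanged on retry
        std::string payload;
    };

    class DedupingWriter {
    public:
        // Returns true if the write was applied, false if it was a replayed duplicate.
        bool apply(const WriteRequest& req) {
            if (!seen_.insert(req.serial).second)
                return false;                   // already applied: safe to drop
            store_.push_back(req.payload);      // the real (non-idempotent) write
            return true;
        }
    private:
        std::unordered_set<long> seen_;         // in practice this set would be persisted
        std::vector<std::string> store_;
    };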

Bell-LaPadula and Biba models

  • Bell-LaPadula: confidentiality
  • Biba: integrity
You CANNOT have BOTH.

Immutability (makes things simpler)

  • Functional programming languages have immutable data
    • Makes sharing and reasoning about data easier
  • Russian Doll caching, MVCC
    • Change the key not the value
  • Pets vs. cattle - infrastructure
  • SSA - compilers and CPU reservation stations
  • Lambda architecture - immutable master data

Availability

  • Availability = MTBF / (MTBF + MTTR)
    (where MTBF = mean time between failures
    MTTR = mean time to repair)
  • Maximise MTBF via good software practices, hardware failover and reliable, well-known technology choices
  • Minimise MTTR by making systems easier to understand, debug and restart (see the worked example below)
    • minimise state management (redo logs, fsck, etc.)
    • use immutability where possible
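
A quick worked example of the availability formula with illustrative numbers (mine, not the talk's), showing why cutting MTTR pays off directly:

    A = MTBF / (MTBF + MTTR)
    MTBF = 1000 h, MTTR = 1 h    ->  1000 / 1001   ≈ 99.9%
    MTBF = 1000 h, MTTR = 0.1 h  ->  1000 / 1000.1 ≈ 99.99%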

Read and write are too low level

  • They don't help us to design or analyse systems
    • they are the assembler-level of data (CRUD)
  • They don't relate to the larger picture
  • It is too easy to deal with them in isolation
    • REST APIs are an all-too-common example
  • Looking at data flow, push vs. pull, sync/async, business processes, operational profiles, state management, etc. are much more fruitful approaches
