Nov 2, 2018

[CppCon 2018] OOP Is Dead, Long Live Data-oriented Design - Stoyan Nikolov


Data-Oriented Design

OOP marries data with operations

  • Heterogeneous data is brought together by a 'logical' black box object.
  • The object is used in vastly different contexts
  • Hides 'state' all over the place
  • Impact on
    • Performance
    • Scalability
    • Modifiability
    • Testability
  • Why? Cache misses.

Data-oriented design

  • Like Golang, data first
  • Separates data from logic
  • Structs and functions live independent lives
  • Data is regarded as information that has to be transformed
  • The logic embraces the data
  • Does not try to hide the logic
  • Leads to functions that work on arrays
  • Reorganizes data according to its usage

If we aren't going to use a piece of information, why pack it together?
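
As a rough illustration of reorganizing data according to its usage, here is a minimal sketch of a hot/cold split. All type and function names (FatAnimation, AnimationTiming, AnimationMetadata, AnimationTable, CountFinished) are hypothetical and do not come from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example types; none of these names come from the talk.

// OOP-style layout: every field travels together, so a loop that only needs
// the timing fields still drags `name` and `author` through the cache.
struct FatAnimation {
    std::string name;
    std::string author;
    double startTime;
    double duration;
};

// Data-oriented layout: the fields the per-frame loop reads are packed
// together; rarely used fields live in a parallel array at the same index.
struct AnimationTiming {
    double startTime;
    double duration;
};

struct AnimationMetadata {
    std::string name;
    std::string author;
};

struct AnimationTable {
    std::vector<AnimationTiming> timing;      // hot: read every frame
    std::vector<AnimationMetadata> metadata;  // cold: read on demand
};

// Counts the animations that have finished by time `now`.
// Only the hot array is touched.
std::size_t CountFinished(const AnimationTable& table, double now) {
    assert(table.timing.size() == table.metadata.size());
    std::size_t finished = 0;
    for (const AnimationTiming& t : table.timing) {
        finished += (now >= t.startTime + t.duration) ? 1 : 0;
    }
    return finished;
}
```

The loop walks a dense array of two doubles per element, so each cache line it loads contains only data it actually uses.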

Examples from Chromium code base :-)

--
class CORE_EXPORT Animation final: public ~
--


So, for OOP in Chromium:
  • Uses more than 6 non-trivial classes
  • Objects contain smart pointers to other objects
  • Interpolation uses abstract classes to handle different property types
  • CSS Animations directly 'reach out' to other systems (coupling)
    • Calling events
    • Setting values in DOM elements
    • What is the lifetime of the elements being synchronized?



DOD:
  • Data operations
    • Tick -> 99.9%
    • Add
    • Remove
    • Pause
    • ...
  • Tick Input
    • Definition
    • Time
  • Tick Output
    • Changed properties
    • New property values
    • Who owns the new values
  • Design for 'many animations',
    i.e. many objects
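
One possible reading of the Tick input/output split above, sketched under assumptions: the input is a table of definitions plus the current time, and the output is a table of new property values that the caller owns. OpacityAnimation, OpacityChange and this Tick signature are hypothetical, not the talk's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; the talk's real definitions differ.
using AnimationID = std::uint32_t;

// Tick input: the animation definition (here a linear ramp) plus its timing.
struct OpacityAnimation {
    AnimationID id;
    double startTime;
    double duration;  // must be > 0
    float from;
    float to;
};

// Tick output: one row per new property value. The caller owns this table and
// decides how to apply the values, so Tick never touches other systems.
struct OpacityChange {
    AnimationID id;
    float value;
};

// Transforms the whole input table for time `now` into an output table.
void Tick(const std::vector<OpacityAnimation>& animations, double now,
          std::vector<OpacityChange>& out) {
    out.clear();
    out.reserve(animations.size());
    for (const OpacityAnimation& a : animations) {
        assert(a.duration > 0.0);
        double t = (now - a.startTime) / a.duration;
        // Clamp progress to [0, 1].
        t = t < 0.0 ? 0.0 : (t > 1.0 ? 1.0 : t);
        float value = a.from + static_cast<float>(t) * (a.to - a.from);
        out.push_back({a.id, value});
    }
}
```

Because the output is a plain table, a test can feed in a small input table and compare the rows that come back, without mocking any DOM or event system.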


Define a type:
struct AnimationController {
    // Pseudocode: the controller holds an array of animation states.
    AnimationState* as_[];
};

// Golang style.
// No shared_ptr: every instance of this type
// has its own value.
// No shared state, so it is easier to keep thread safe.
struct AnimationState{
    AnimationID Id;
    time StartTime;
    time PauseTime;
    ...
};

// Avoid type erasure; use templates
template<typename T>
struct AnimationStateProperty : public AnimationState {
    AnimatedDefinitionFrames<T> Keyframes;
};


// We can't use vector<baseType> (storing by value would slice off the derived part),
// but since we know every property type,
// we create a vector for each type.
CSSVector<AnimationStateProperty<ZIndex>> m_ZIndexActiveAnimState;

// Iterate over each per-type CSSVector in turn.

With the above design, keep in mind:
std::vector
is usually the best container for avoiding cache misses!
(contiguous memory, sequential container)
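
The per-type vector idea can be sketched roughly as follows. This is a simplified assumption-laden version: std::vector stands in for CSSVector, the property state holds only from/to/current values instead of keyframes, and ZIndex, Opacity, Lerp, ActiveAnimations and TickAll are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical property value types; stand-ins for the real CSS types.
struct ZIndex  { int value; };
struct Opacity { float value; };

// Linear interpolation, overloaded per property type. Overload resolution
// happens at compile time, so no virtual call is needed.
inline ZIndex Lerp(ZIndex a, ZIndex b, float t) {
    return {a.value + static_cast<int>(t * static_cast<float>(b.value - a.value))};
}
inline Opacity Lerp(Opacity a, Opacity b, float t) {
    return {a.value + t * (b.value - a.value)};
}

template <typename T>
struct AnimationStateProperty {
    T from;
    T to;
    T current;
};

// One contiguous vector per property type instead of one vector of
// base-class pointers.
struct ActiveAnimations {
    std::vector<AnimationStateProperty<ZIndex>> zIndex;
    std::vector<AnimationStateProperty<Opacity>> opacity;
};

// The same loop body is instantiated once per property type.
template <typename T>
void TickAll(std::vector<AnimationStateProperty<T>>& states, float t) {
    for (AnimationStateProperty<T>& s : states) {
        s.current = Lerp(s.from, s.to, t);
    }
}

// `t` is the normalized progress in [0, 1] shared by all animations here.
void Tick(ActiveAnimations& anims, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    TickAll(anims.zIndex, t);
    TickAll(anims.opacity, t);
}
```

Each call to TickAll runs one tight loop over elements of a single concrete type, which is the access pattern a contiguous container rewards.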



Avoid branches:
  • Keep lists per-boolean 'flag'
  • Separate Active and Inactive animations,
    i.e. based on the states we have, put each object into the list for its state.
  • Avoid 'if' tests in the hot loop,
    e.g. 'if (isActive)'
  • If there are too many states, try to cut down the number of states, or use the 'list' style only for the state that changes most.
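
A minimal sketch of the list-per-flag idea, assuming a hypothetical AnimationState with only an id and an elapsed time. Membership in the `active` list replaces the isActive flag.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal state; field names are illustrative.
struct AnimationState {
    int id;
    double elapsed;
};

struct AnimationLists {
    std::vector<AnimationState> active;    // ticked every frame
    std::vector<AnimationState> inactive;  // paused, never visited by Tick
};

// No `if (isActive)` inside the loop: being in `active` is the flag.
void Tick(AnimationLists& lists, double dt) {
    for (AnimationState& s : lists.active) {
        s.elapsed += dt;
    }
}

// Pausing moves the element between lists (swap with the last, then pop),
// so the work happens on the rare state change, not on every tick.
// Note that swap-and-pop does not preserve the order of `active`.
void Pause(AnimationLists& lists, std::size_t activeIndex) {
    assert(activeIndex < lists.active.size());
    std::swap(lists.active[activeIndex], lists.active.back());
    lists.inactive.push_back(lists.active.back());
    lists.active.pop_back();
}
```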



Add API to the caller:
  • We don't have OOP-style objects, thus
    no member functions,
    e.g. Animation.Play()
  • Use free functions taking an ID,
    e.g.
    void PlayAnimation(AnimationID aid);
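
A sketch of how an ID-taking free function might be backed by dense storage. This is an assumption, not the talk's implementation: the system is passed explicitly (the signature in the notes takes only the ID), PlayAnimation returns a bool here, and AnimationSystem, AddAnimation and RemoveAnimation are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: the system is passed explicitly here, whereas the
// signature in the notes takes only the ID.
using AnimationID = std::uint32_t;

struct AnimationState {
    AnimationID id;
    bool playing;
};

struct AnimationSystem {
    std::vector<AnimationState> states;                    // dense storage
    std::unordered_map<AnimationID, std::size_t> indexOf;  // ID -> slot
    AnimationID nextId = 1;
};

// Callers hold only the returned ID, never a pointer into `states`.
AnimationID AddAnimation(AnimationSystem& sys) {
    AnimationID id = sys.nextId++;
    sys.indexOf[id] = sys.states.size();
    sys.states.push_back({id, false});
    return id;
}

// Returns false if the ID is unknown (for example, already removed).
bool PlayAnimation(AnimationSystem& sys, AnimationID aid) {
    auto it = sys.indexOf.find(aid);
    if (it == sys.indexOf.end()) return false;
    sys.states[it->second].playing = true;
    return true;
}

// Removal may relocate the last element; only the index map is patched, so
// outstanding IDs stay valid.
void RemoveAnimation(AnimationSystem& sys, AnimationID aid) {
    auto it = sys.indexOf.find(aid);
    if (it == sys.indexOf.end()) return;
    std::size_t slot = it->second;
    assert(slot < sys.states.size());
    sys.states[slot] = sys.states.back();
    sys.indexOf[sys.states[slot].id] = slot;
    sys.states.pop_back();
    sys.indexOf.erase(aid);
}
```

Since callers never see addresses, the system is free to move elements around inside `states` as long as it keeps the index map in sync.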


Key points:
  • Keep data flat (Golang style)
    • Maximise cache usage
    • No RTTI
    • Amortized dynamic allocations
    • Some read-only duplication improves performance and readability
  • Existence-based predication
    • Reduce branching
    • Apply the same operation on a whole table
  • Id-Based handles
    • No pointers
    • Allow rearranging internal memory
  • Table-based output
    • No external dependencies
    • Easy to reason about the flow


Scalability:
  • OOP multi-threading
    • Complicated
  • DoD multi-threading
    • Group state into list
    • Each task/job/thread keeps a private table of modified data
    • Join merges the tables (thread.join)
    • Classic fork-join
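
The fork-join scheme above could look roughly like this with std::thread. It is a sketch under assumptions: the "modified data" is reduced to the indices of animations that finished during the tick, and AnimationState and TickParallel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: "modified data" is just the index of each animation
// that has finished as of this tick.
struct AnimationState {
    double elapsed;
    double duration;
};

std::vector<std::size_t> TickParallel(std::vector<AnimationState>& states,
                                      double dt, std::size_t workerCount) {
    assert(workerCount > 0);
    // One private output table per worker: no locks, no shared writes.
    std::vector<std::vector<std::size_t>> finishedPerWorker(workerCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (states.size() + workerCount - 1) / workerCount;

    // Fork: each worker owns a disjoint slice of the state array.
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&states, &finishedPerWorker, dt, chunk, w] {
            const std::size_t begin = w * chunk;
            const std::size_t end = std::min(begin + chunk, states.size());
            for (std::size_t i = begin; i < end; ++i) {
                states[i].elapsed += dt;
                if (states[i].elapsed >= states[i].duration) {
                    finishedPerWorker[w].push_back(i);
                }
            }
        });
    }

    // Join: wait for every worker, then merge the private tables in order.
    std::vector<std::size_t> finished;
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers[w].join();
        finished.insert(finished.end(), finishedPerWorker[w].begin(),
                        finishedPerWorker[w].end());
    }
    return finished;
}
```

A real system would more likely reuse a thread pool or job system than spawn threads per tick; the point here is only the private-table-then-merge shape.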


Testability:
  • OOP case
    • Hard to mock (lots of types)
    • Hidden state
    • Asserting correct state is difficult: multiple output points (VERY BAD DESIGN)
  • DOD case
    • Contract style design
    • Easier to mock (fewer types)
    • Asserting correct state is easy

    
Modifiability:
  • OOP
    • Hard to modify base types
    • But easy to make 'quick' changes, because we can add if branches
  • DOD
    • FP style. Building blocks
    • A bit harder to make quick changes, but with FP we have monoids (composable building blocks).

    
Downsides of DOD:
  • Correct data separation can be hard
    • Know the problem well
  • Existence-based predication is not always feasible (or easy)
  • 'Quick' modifications can be tough


What to keep from OOP:
  • Simple structs with simple methods are fine
  • Keep polymorphism & interfaces under control
  • Use templates
  • Use 'impl'

