Ataraxia through Epoché: [cppcon 2018] OOP Is Dead, Long Live Data-oriented Design

Data-Oriented Design

OOP marries data with operations

Heterogeneous data is brought together by a 'logical' black box object.
The object is used in vastly different contexts
Hides 'state' all over the place
Impact on:

Performance
Scalability
Modifiability
Testability

Why? For performance, mainly cache misses.

Data-oriented design

Like Golang, data first
Separates data from logic
Structs and functions live independent lives
Data is regarded as information that has to be transformed
The logic embraces the data
Does not try to hide the logic
Leads to functions that work on arrays
Reorganizes data according to its usage

If we aren't going to use a piece of information, why pack it together with the rest?
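The following is a minimal C++ sketch of reorganizing data by usage (often called hot/cold splitting). It assumes a hypothetical animation record whose progress fields are read every tick while its name and debug text are not. `AnimHot`, `AnimCold`, `Animations` and `AdvanceAll` are illustrative names, not code from the talk or from Chromium.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "hot" data: the fields read and written on every tick.
struct AnimHot {
    float elapsed;   // seconds since the animation started
    float duration;  // total length in seconds
};

// Hypothetical "cold" data: rarely touched, so kept out of the hot loop.
struct AnimCold {
    std::string name;
    std::string debugInfo;
};

// Parallel arrays: index i in both vectors refers to the same animation.
struct Animations {
    std::vector<AnimHot> hot;
    std::vector<AnimCold> cold;
};

// The tick loop streams through contiguous AnimHot records only,
// so the strings in AnimCold are never pulled into the cache here.
// Returns how many animations have reached their end.
std::size_t AdvanceAll(Animations& anims, float dt) {
    std::size_t finished = 0;
    for (AnimHot& a : anims.hot) {
        a.elapsed += dt;
        finished += static_cast<std::size_t>(a.elapsed >= a.duration);
    }
    return finished;
}
```

The cost of this layout is bookkeeping: the hot and cold arrays must be kept in step, which is the kind of trade DOD accepts in exchange for denser cache lines.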

Avoids 'hidden state'
No virtual calls
Promotes deep domain knowledge
Reference:
http://vsdmars.blogspot.com/2017/11/cppcon-2014-data-oriented-design-mike.html

Examples from the Chromium code base :-)

--
class CORE_EXPORT Animation final: public ~
--

So, for OOP in Chromium:

Uses more than 6 non-trivial classes
Objects contain smart pointers to other objects
Interpolation uses abstract classes to handle different property types
CSS Animations directly 'reach out' to other systems (coupling):
Calling events
Setting values in DOM elements
What's the lifetime of elements being synchronized?

DOD:

Data operations

Tick (by far the dominant operation, ~99.9%)
Add
Remove
Pause
...

Tick Input

Definition
Time

Tick Output

Changed properties
New property values
Who owns the new values
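The tick inputs and outputs above can be expressed as a pure function from a definition table and a time to a table of property changes. This is a sketch under assumptions: one linear tween per animation, and hypothetical `AnimationDefinition`, `PropertyChange` and `Tick` types standing in for whatever a real engine uses. "Who owns the new value" is modeled simply as the ID of the animation that produced it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using AnimationID = std::uint32_t;  // hypothetical handle type
using PropertyID  = std::uint32_t;  // hypothetical index of a CSS property

// Tick input, part 1: the definition (here a linear tween of one property).
struct AnimationDefinition {
    AnimationID id;
    PropertyID  property;
    float from, to;
    float startTime, duration;  // duration assumed > 0
};

// Tick output row: which property changed, its new value, and which
// animation produced (owns) it.
struct PropertyChange {
    AnimationID owner;
    PropertyID  property;
    float       value;
};

// Tick input, part 2: the current time.
// The function touches no DOM and fires no events; it only fills a table
// that the caller applies afterwards.
std::vector<PropertyChange> Tick(const std::vector<AnimationDefinition>& defs,
                                 float now) {
    std::vector<PropertyChange> out;
    out.reserve(defs.size());
    for (const AnimationDefinition& d : defs) {
        const float t = std::clamp((now - d.startTime) / d.duration, 0.0f, 1.0f);
        out.push_back({d.id, d.property, d.from + (d.to - d.from) * t});
    }
    return out;
}
```

Because the output is just a table, the caller decides when and how to apply it, which keeps the animation system free of the coupling listed above.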

Design for 'many animations',
i.e. many objects

Define a type:

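#include <vector>

// Golang style.
// No shared_ptr: every instance of this type
// owns its own values, so there is no hidden
// shared state to synchronize between threads.
struct AnimationState {
    AnimationID Id;
    time StartTime;
    time PauseTime;
    // ...
};

struct AnimationController {
    // States stored by value in contiguous memory, not behind pointers.
    std::vector<AnimationState> States;
};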

// Avoid type erasure: use a template instead
template<typename T>
struct AnimationStateProperty : public AnimationState {
    AnimatedDefinitionFrames<T> Keyframes;
};


// We can't keep all of them in one vector<AnimationState>
// (the derived parts would be sliced off), but since we know
// every property type up front, create one vector per type:
CSSVector<AnimationStateProperty<ZIndex>> m_ZIndexActiveAnimState;

// Then iterate over each per-type CSSVector in turn.
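Below is a compilable sketch of the per-type-vector idea, assuming C++17. Since `CSSVector` and `AnimatedDefinitionFrames` are not defined in these notes, it uses `std::vector` and a hypothetical two-value `Keyframes<T>` instead; `Opacity`, `Lerp`, `TickAll`, `TickController` and the rounding rule for `ZIndex` are likewise illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical property value types (the talk's example uses ZIndex).
struct ZIndex  { int value; };
struct Opacity { float value; };

// Simplified stand-in for keyframes: start and end value only.
template <typename T>
struct Keyframes { T from; T to; };

struct AnimationState {
    int   id;
    float startTime;
    float duration;  // assumed > 0
};

template <typename T>
struct AnimationStateProperty : AnimationState {
    Keyframes<T> keyframes;
    T current;  // value written by the last tick
};

// Interpolation is picked by overload resolution at compile time:
// no virtual call, no type erasure.
inline ZIndex Lerp(ZIndex a, ZIndex b, float t) {
    return {static_cast<int>(std::lround(a.value + (b.value - a.value) * t))};
}
inline Opacity Lerp(Opacity a, Opacity b, float t) {
    return {a.value + (b.value - a.value) * t};
}

// One loop per property type; each loop walks a contiguous vector.
template <typename T>
void TickAll(std::vector<AnimationStateProperty<T>>& anims, float now) {
    for (auto& a : anims) {
        const float t = std::clamp((now - a.startTime) / a.duration, 0.0f, 1.0f);
        a.current = Lerp(a.keyframes.from, a.keyframes.to, t);
    }
}

// One vector per known property type instead of a vector of base pointers.
struct AnimationController {
    std::vector<AnimationStateProperty<ZIndex>>  zIndexActive;
    std::vector<AnimationStateProperty<Opacity>> opacityActive;
};

void TickController(AnimationController& c, float now) {
    TickAll(c.zIndexActive, now);
    TickAll(c.opacityActive, now);
}
```

Adding a property type means adding one vector and one `TickAll` call; the compiler generates a separate loop per type with no virtual dispatch.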

With the above design, keep in mind that std::vector
is usually the best container for avoiding cache misses
(contiguous memory, sequential access).

Avoid branches:

Keep one list per boolean 'flag'
Separate active and inactive animations
i.e. based on the states we have, put each object into a list of objects in that same state,
so the loop needs no 'if' test.
Avoid 'if (isActive)'
If there are too many states, try to reduce the number of states, or keep only the most frequently changing state as separate lists.
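A sketch of keeping one list per state instead of an `isActive` flag, using hypothetical `Anim`, `AnimLists`, `TickActive` and `Pause` names. Pausing moves the record to another list; the tick loop then runs over `active` without testing any flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Anim {  // hypothetical minimal animation record
    int   id;
    float elapsed;
};

// Existence-based predication: membership in a list *is* the state.
// There is no isActive flag, so the hot loop has no 'if (isActive)'.
struct AnimLists {
    std::vector<Anim> active;
    std::vector<Anim> paused;
};

// Advances every active animation unconditionally.
void TickActive(AnimLists& lists, float dt) {
    for (Anim& a : lists.active) a.elapsed += dt;
}

// A state change moves the record between lists. The removal is O(1)
// swap-and-pop (order not preserved); finding the id is a linear scan here
// only for brevity. Returns false if the id is not currently active.
bool Pause(AnimLists& lists, int id) {
    for (std::size_t i = 0; i < lists.active.size(); ++i) {
        if (lists.active[i].id == id) {
            lists.paused.push_back(lists.active[i]);
            lists.active[i] = lists.active.back();
            lists.active.pop_back();
            return true;
        }
    }
    return false;
}
```

The branch moves out of the per-frame loop and into the much rarer state-change operation.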

API for the caller:

We don't have OOP-style objects, thus
no member functions!
i.e. no Animation.Play()
Use free functions taking an ID!
i.e.
void PlayAnimation(AnimationID aid);
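A sketch of such an ID-based API, assuming a hypothetical `AnimationSystem` that holds all the data. The signature in the notes takes only the ID, which implies a global or otherwise implicit system; here the system is passed explicitly so the example stays testable, and `PlayAnimation` returns `bool` (an assumption) to report unknown IDs. `CreateAnimation` is also hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using AnimationID = std::uint32_t;

// All animation data lives in the system; callers only ever hold an AnimationID.
struct AnimationSystem {
    AnimationID nextId = 1;
    std::vector<AnimationID> idle;     // created but not playing
    std::vector<AnimationID> playing;  // currently playing
};

AnimationID CreateAnimation(AnimationSystem& sys) {
    AnimationID id = sys.nextId++;
    sys.idle.push_back(id);
    return id;
}

// Free function instead of Animation::Play(): the caller passes a handle,
// not an object. Returns false for an unknown or already-playing ID.
bool PlayAnimation(AnimationSystem& sys, AnimationID aid) {
    auto it = std::find(sys.idle.begin(), sys.idle.end(), aid);
    if (it == sys.idle.end()) return false;
    *it = sys.idle.back();  // swap-and-pop out of the idle list
    sys.idle.pop_back();
    sys.playing.push_back(aid);
    return true;
}
```

Since the caller never holds a pointer, the system is free to store and move its data however it likes.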

Key points:

Keep data flat (Golang style)

Maximise cache usage
No RTTI
Amortized dynamic allocations
Some read-only duplication improves performance and readability

Existence-based predication

Reduce branching
Apply the same operation on a whole table

ID-based handles

No pointers
Allow rearranging internal memory
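To illustrate how ID handles allow rearranging memory, here is a sketch of a table stored as dense columns plus an ID-to-row map. `AnimationTable`, `Add`, `Remove` and `Elapsed` are hypothetical names; removal uses swap-and-pop, which moves a row in memory but keeps every outstanding ID valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using AnimationID = std::uint32_t;

// Columns share one row index; the map translates a handle into a row.
struct AnimationTable {
    std::vector<AnimationID> ids;                      // ids[i] owns row i
    std::vector<float>       elapsed;                  // one column per field
    std::unordered_map<AnimationID, std::size_t> row;  // handle -> current row
};

// Assumes id is not already present.
void Add(AnimationTable& t, AnimationID id, float elapsed) {
    t.row[id] = t.ids.size();
    t.ids.push_back(id);
    t.elapsed.push_back(elapsed);
}

// Swap-and-pop keeps the columns dense. The last row moves into the hole,
// and only its map entry needs updating; callers keep using the same IDs.
bool Remove(AnimationTable& t, AnimationID id) {
    auto it = t.row.find(id);
    if (it == t.row.end()) return false;
    const std::size_t i = it->second;
    const std::size_t last = t.ids.size() - 1;
    t.ids[i] = t.ids[last];
    t.elapsed[i] = t.elapsed[last];
    t.row[t.ids[i]] = i;  // the moved row's handle now points at slot i
    t.ids.pop_back();
    t.elapsed.pop_back();
    t.row.erase(id);
    return true;
}

// Lookup through the handle; empty if the ID has been removed.
std::optional<float> Elapsed(const AnimationTable& t, AnimationID id) {
    auto it = t.row.find(id);
    if (it == t.row.end()) return std::nullopt;
    return t.elapsed[it->second];
}
```

A raw pointer to row 2 would dangle after the removal below; the ID `30` still resolves correctly.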

Table-based output

No external dependencies
Easy to reason about the flow

Scalability:

OOP multi-threading

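Complicated: hidden state and objects reaching into other systems make it hard to know what needs synchronization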

DoD multi-threading

Group state into lists
Each task/job/thread keeps a private table of modified data
The join step merges the tables (e.g. after std::thread::join)
Classic fork-join
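A minimal fork-join sketch of this scheme using `std::thread`: each worker advances a contiguous slice of a flat input table and writes to its own private output table, and the tables are concatenated after `join()`. `Change` and `ParallelTick` are hypothetical names, and a real engine would more likely use a job system or thread pool than raw threads. On most POSIX toolchains this needs `-pthread`.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical output row: which animation changed and its new value.
struct Change {
    int   id;
    float value;
};

// Fork-join over a flat, read-only input table. Each worker owns a private
// output table, so no locks are needed. Assumes workers > 0.
std::vector<Change> ParallelTick(const std::vector<float>& elapsed, float dt,
                                 std::size_t workers) {
    std::vector<std::vector<Change>> local(workers);  // one table per worker
    std::vector<std::thread> threads;
    const std::size_t n = elapsed.size();
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            // Contiguous slice [begin, end) handled by this worker.
            const std::size_t begin = n * w / workers;
            const std::size_t end   = n * (w + 1) / workers;
            for (std::size_t i = begin; i < end; ++i)
                local[w].push_back({static_cast<int>(i), elapsed[i] + dt});
        });
    }
    for (std::thread& t : threads) t.join();  // join point
    std::vector<Change> merged;               // merge in worker order
    for (const auto& table : local)
        merged.insert(merged.end(), table.begin(), table.end());
    return merged;
}
```

Workers never write to shared data; the only synchronization is the join itself.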

Testability:

OOP case

Hard to mock (lots of types)
Hidden states
Asserting correct state is difficult: multiple output points (VERY BAD DESIGN)

DOD case

Contract style design
Easier to mock (fewer types)
Asserting correct state is easy

Modifiability:

OOP case

Hard to modify base types
But easy to make 'quick' changes, because we can just add if branches

DOD case

FP style: building blocks
A bit harder to make quick changes, but, as in FP, the building blocks compose (monoid-style).

Downsides of DOD:

Correct data separation can be hard

Requires knowing the problem well

Existence-based predication is not always feasible (or easy)
'Quick' modifications can be tough

What to keep from OOP:

Simple structs with simple methods are fine
Keep polymorphism & interfaces under control
Use templates
Use 'impl'

Extra reference:

Open addressing: http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Map/open-addr.html
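The linked page covers open addressing. As a rough illustration of how an ID-to-row lookup could use it, here is a linear-probing map sketch (linear probing is one open-addressing scheme). `IdMap` is a hypothetical name, and the table is deliberately minimal: fixed capacity, no erase, no resize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal open-addressing (linear probing) map from an ID to a row index.
// Fixed capacity, no erase, no resize; a production table needs both.
class IdMap {
public:
    explicit IdMap(std::size_t capacity) : slots_(capacity) {}

    // Inserts or updates. Returns false if the key is absent and the table is full.
    bool Insert(std::uint32_t key, std::uint32_t value) {
        for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
            Slot& s = slots_[(Hash(key) + probe) % slots_.size()];
            if (!s.used || s.key == key) {  // empty slot, or update in place
                s = {true, key, value};
                return true;
            }
        }
        return false;
    }

    std::optional<std::uint32_t> Find(std::uint32_t key) const {
        for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
            const Slot& s = slots_[(Hash(key) + probe) % slots_.size()];
            if (!s.used) return std::nullopt;  // empty slot ends the probe chain
            if (s.key == key) return s.value;
        }
        return std::nullopt;
    }

private:
    struct Slot {
        bool used = false;
        std::uint32_t key = 0;
        std::uint32_t value = 0;
    };
    // Cheap multiplicative hash; wraps modulo 2^32.
    static std::size_t Hash(std::uint32_t key) { return key * 2654435761u; }
    std::vector<Slot> slots_;  // keys and values stored inline, one flat array
};
```

Keeping keys and values inline in one flat array means probes generally touch adjacent memory, which fits the cache-first mindset of these notes.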

Nov 2, 2018

[cppcon 2018] OOP Is Dead, Long Live Data-oriented Design - Stoyan Nikolov
