Data-Oriented Design
OOP marries data with operations
- Heterogeneous data is brought together by a 'logical' black box object.
- The object is used in vastly different contexts
- Hides 'state' all over the place
- Impact on
- Performance
- Scalability
- Modifiability
- Testability
- Why? Cache misses.
Data-oriented design
- Like Golang, data first
- Separates data from logic
- Structs and functions live independent lives
- Data is regarded as information that has to be transformed
- The logic embraces the data
- Does not try to hide the logic
- Leads to functions that work on arrays
- Reorganizes data according to its usage
If we aren't going to use a piece of information, why pack it together?
- Avoids 'hidden state'
- No virtual calls
- Promotes deep domain knowledge
- Reference:
http://vsdmars.blogspot.com/2017/11/cppcon-2014-data-oriented-design-mike.html
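Reorganizing data according to its usage can be sketched as follows. This is a minimal illustration, not code from the talk: the particle record, the `Particles` table and the `integrate` function are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// OOP-style layout: every object drags rarely used fields through the cache.
struct ParticleObject {
    float x, y;        // hot: read on every update
    float vx, vy;      // hot: read on every update
    std::string name;  // cold: only needed for debugging
};

// Data-oriented layout: hot fields live in their own contiguous arrays,
// cold data is kept apart and is never loaded by the update loop.
struct Particles {
    std::vector<float> x, y;
    std::vector<float> vx, vy;
    std::vector<std::string> name;
};

// A free function that transforms whole arrays; there is no hidden state.
void integrate(Particles& p, float dt) {
    assert(p.x.size() == p.vx.size() && p.y.size() == p.vy.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```

The update loop only touches the four float arrays, so the names never occupy cache lines while positions are being integrated.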
Examples from Chromium code base :-)
--
class CORE_EXPORT Animation final: public ~
--
So, for OOP in Chromium:
- Uses more than 6 non-trivial classes
- Objects contain smart pointers to other objects
- Interpolation uses abstract classes to handle different property types
- CSS Animations directly 'reach out' to other systems - coupling
- Calling events
- Setting values in DOM element
- What's the lifetime of elements being synchronized?
DOD:
- Data operations
- Tick -> 99.9%
- Add
- Remove
- Pause
- ...
- Tick Input
- Definition
- Time
- Tick Output
- Changed properties
- New property values
- Who owns the new values
- Design for 'many animations',
i.e. many objects
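The tick input/output split above can be sketched as a pure function over tables. This is only a sketch under assumptions: the linear-ramp `AnimationDefinition`, the `ChangedProperty` row and the `Tick` signature are hypothetical, not the types used in the talk or in Chromium.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using AnimationID = std::uint32_t;  // hypothetical handle type

// Tick input: the definition of each animation (here a simple linear ramp).
struct AnimationDefinition {
    AnimationID id;
    double startTime;  // seconds
    double duration;   // seconds, must be > 0
    float from, to;    // animated value range
};

// Tick output: one row per changed property. The caller owns the table.
struct ChangedProperty {
    AnimationID id;
    float value;
};

// Pure transformation: (definitions, time) -> table of new values.
std::vector<ChangedProperty> Tick(const std::vector<AnimationDefinition>& defs,
                                  double now) {
    std::vector<ChangedProperty> out;
    out.reserve(defs.size());  // one allocation for the whole batch
    for (const AnimationDefinition& d : defs) {
        assert(d.duration > 0.0);
        // Normalized progress, clamped to [0, 1].
        double t = std::min(1.0, std::max(0.0, (now - d.startTime) / d.duration));
        out.push_back({d.id, d.from + static_cast<float>(t) * (d.to - d.from)});
    }
    return out;
}
```

Because the function returns a table instead of writing into DOM elements, the question of who owns the new values has one answer: whoever called `Tick`.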
Define a type:
struct AnimationController {
    AnimationState* as_[];
};

// Golang style.
// No shared_ptr, every instance of this type
// has its own value.
// Thread safe.
struct AnimationState {
    AnimationID Id;
    time StartTime;
    time PauseTime;
    ...
};

// Avoid type erasure, use a template
template<typename T>
struct AnimationStateProperty : public AnimationState {
    AnimatedDefiniationFrames<T> Keyframes;
};

// We can't use vector<baseType>
// But since we know every property type,
// create a vector for each type
CSSVector<AnimationStateProperty<ZIndex>> m_ZIndexActiveAnimState;
// Iterate over each CSSVector type
With the above design, keep in mind that std::vector is usually the best container for avoiding cache misses (contiguous memory, sequential container).
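A compilable variation of the per-type vectors might look like the sketch below. It assumes two stand-in property types (`ZIndex`, `Opacity`), a simplified `from`/`to` pair instead of keyframes, and a hypothetical `Lerp` overload per type; `std::vector` is used in place of `CSSVector`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using AnimationID = std::uint32_t;  // hypothetical handle type

struct AnimationState {
    AnimationID id;
    double startTime;
    double duration;  // must be > 0
};

// The property type is a template parameter, so no virtual calls are needed.
template <typename T>
struct AnimationStateProperty : AnimationState {
    T from;
    T to;
};

// Stand-ins for CSS property types (assumptions for this sketch).
struct ZIndex { int value; };
struct Opacity { float value; };

ZIndex Lerp(ZIndex a, ZIndex b, double t) {
    return {a.value + static_cast<int>((b.value - a.value) * t)};
}
Opacity Lerp(Opacity a, Opacity b, double t) {
    return {a.value + static_cast<float>((b.value - a.value) * t)};
}

// One contiguous vector per property type instead of vector<Base*>.
struct AnimationController {
    std::vector<AnimationStateProperty<ZIndex>> zIndexActive;
    std::vector<AnimationStateProperty<Opacity>> opacityActive;
};

// The same loop is instantiated once per type and walks memory linearly.
template <typename T>
void TickAll(const std::vector<AnimationStateProperty<T>>& anims, double now,
             std::vector<T>& out) {
    for (const AnimationStateProperty<T>& a : anims) {
        assert(a.duration > 0.0);
        out.push_back(Lerp(a.from, a.to, (now - a.startTime) / a.duration));
    }
}
```

The caller invokes `TickAll` once per vector; the set of property types is known at compile time, so nothing has to be dispatched at run time.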
Avoid branches:
- Keep lists per-boolean 'flag'
- Separate Active and Inactive animations
i.e. based on the states we have, put each object into a list of objects in the same state, and avoid testing the state with an 'if' branch.
- Avoid 'if (isActive)'
- If there are too many states, try to reduce the number of states, or use the 'list' style for the state that changes most often.
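Keeping one list per state could look like this minimal sketch. The `Anim` row, the `AnimationTables` struct and the `PauseAt` helper are hypothetical names; moving a row uses swap-and-pop, so the order inside a list is not preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Anim {
    int id;
    float elapsed;
};

struct AnimationTables {
    std::vector<Anim> active;  // ticked every frame
    std::vector<Anim> paused;  // never touched by Tick
};

// No 'if (isActive)': every row in 'active' is active by construction.
void Tick(AnimationTables& t, float dt) {
    for (Anim& a : t.active) a.elapsed += dt;
}

// Pausing moves the row between tables instead of flipping a flag.
void PauseAt(AnimationTables& t, std::size_t index) {
    assert(index < t.active.size());
    t.paused.push_back(t.active[index]);
    t.active[index] = t.active.back();  // fill the hole with the last row
    t.active.pop_back();
}
```

The state test is paid once, when an animation changes state, rather than once per animation per tick.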
Add API to the caller:
- We don't have OOP-style objects, thus
no member functions!
e.g. instead of Animation.Play(), use a free function taking an ID:
void PlayAnimation(AnimationID aid);
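A sketch of such an ID-based API follows. It assumes the system is passed explicitly and that playing records a start time, so `PlayAnimation` takes two extra parameters compared with the signature above; `AnimationSystem`, `AddAnimation` and `RemoveAnimation` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using AnimationID = std::uint32_t;  // hypothetical handle type

struct AnimationSystem {
    std::vector<AnimationID> ids;    // row -> id
    std::vector<double> startTime;   // row -> start time
    std::unordered_map<AnimationID, std::size_t> rowOf;  // id -> row
    AnimationID nextId = 1;
};

AnimationID AddAnimation(AnimationSystem& s) {
    AnimationID id = s.nextId++;
    s.rowOf[id] = s.ids.size();
    s.ids.push_back(id);
    s.startTime.push_back(0.0);
    return id;
}

// Callers hold only the ID, never a pointer into the tables.
void PlayAnimation(AnimationSystem& s, AnimationID aid, double now) {
    auto it = s.rowOf.find(aid);
    assert(it != s.rowOf.end());
    s.startTime[it->second] = now;
}

// Rows may be rearranged freely; IDs held by callers stay valid.
void RemoveAnimation(AnimationSystem& s, AnimationID aid) {
    auto it = s.rowOf.find(aid);
    assert(it != s.rowOf.end());
    std::size_t row = it->second;
    std::size_t last = s.ids.size() - 1;
    s.ids[row] = s.ids[last];            // move the last row into the hole
    s.startTime[row] = s.startTime[last];
    s.rowOf[s.ids[row]] = row;           // re-point the moved row's ID
    s.ids.pop_back();
    s.startTime.pop_back();
    s.rowOf.erase(aid);
}
```

Since the handle is resolved through `rowOf` on every call, the system can compact or reorder its internal memory without invalidating anything the caller holds.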
Key points:
- Keep data flat (Golang style)
- Maximise cache usage
- No RTTI
- Amortized dynamic allocations
- Some read-only duplication improves performance and readability
- Existence-based predication
- Reduce branching
- Apply the same operation on a whole table
- Id-Based handles
- No pointers
- Allow rearranging internal memory
- Table-based output
- No external dependencies
- Easy to reason about the flow
Scalability:
- OOP multi-threading
- Complicated
- DoD multi-threading
- Group state into list
- Each task/job/thread keeps a private table of modified data
- Join merges the tables (thread.join)
- Classic fork-join
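The fork-join scheme above can be sketched with `std::thread`. This is an illustration under assumptions: the `Change` row, `TickSlice` and `TickParallel` are hypothetical names, and the work is split into equal contiguous slices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Change {
    std::size_t index;
    float value;
};

// Each worker reads its own slice and writes only to its private table.
void TickSlice(const std::vector<float>& elapsed, std::size_t begin,
               std::size_t end, float dt, std::vector<Change>& out) {
    for (std::size_t i = begin; i < end; ++i) {
        out.push_back({i, elapsed[i] + dt});
    }
}

std::vector<Change> TickParallel(const std::vector<float>& elapsed, float dt,
                                 std::size_t workers) {
    assert(workers > 0);
    // One private output table per worker; sized up front so the
    // references handed to the threads stay valid.
    std::vector<std::vector<Change>> tables(workers);
    std::vector<std::thread> threads;
    std::size_t chunk = (elapsed.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {  // fork
        std::size_t begin = std::min(w * chunk, elapsed.size());
        std::size_t end = std::min(begin + chunk, elapsed.size());
        threads.emplace_back([&elapsed, &tables, w, begin, end, dt] {
            TickSlice(elapsed, begin, end, dt, tables[w]);
        });
    }
    std::vector<Change> merged;
    for (std::size_t w = 0; w < workers; ++w) {  // join, then merge
        threads[w].join();
        merged.insert(merged.end(), tables[w].begin(), tables[w].end());
    }
    return merged;
}
```

No locks are needed: the input is read-only during the tick and each thread owns its output table until the join.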
Testability:
- OOP case
- Hard to mock (lots of types)
- Hidden states
- Asserting correct state is difficult: multiple output points (VERY BAD DESIGN)
- DOD case
- Contract style design
- Easier to mock (fewer types)
- Asserting correct state is easy
Modifiability:
- OOP
- Hard to modify base types
- But, easy to do 'quick' changes, because we have if branches
- DOD
- FP style. Building blocks
- A bit harder to make quick changes, but with FP, we have monoids.
Downsides of DOD:
- Correct data separation can be hard
- Know the problem well
- Existence-based predication is not always feasible (or easy)
- 'Quick' modifications can be tough
What to keep from OOP:
- Simple structs with simple methods are fine
- Keep polymorphism & interface under control
- Use template
- Use 'impl'