Showing posts with label golang_code_design. Show all posts
Showing posts with label golang_code_design. Show all posts

Mar 13, 2021

[Go] Design

Reference:
https://dave.cheney.net/2016/08/20/solid-go-design


Bad code

Old fashion articles

  • Rigid. Is the code rigid? Does it have a straight jacket of overbearing types and parameters, that making modification difficult?
  • Fragile. Is the code fragile? Does the slightest change ripple through the code base causing untold havoc?
  • Immobile. Is the code hard to refactor? Is it one keystroke away from an import loop?
  • Complex. Is there code for the sake of having code, are things over-engineered?
  • Verbose. Is it just exhausting to use the code? When you look at it, can you even tell what this code is trying to do?


SOLID

  • Single Responsibility Principle
    • A class should have one, and only one, reason to change.
    • Coupling & Cohesion
    • Package names, don't use name like 'common', 'util' which are too broad.
    • Unix philosophy; small, sharp tools which combine to solve larger tasks.
  • Open / Closed Principle
    Software entities should be open for extension, but closed for modification.
  • Liskov Substitution Principle
    • Two types are substitutable if they exhibit behaviour 
    • such that the caller is unable to tell the difference.
      Require no more, promise no less.
  • Interface Segregation Principle
    • Clients should not be forced to depend on methods they do not use.
    • A great rule of thumb for Go is accept interfaces, return structs.
  • Dependency Inversion Principle
    • High-level modules should not depend on low-level modules. Both should depend on abstractions.
    • Abstractions should not depend on details. Details should depend on abstractions.
    • In Go, import graph must be acyclic.
    • All things being equal the import graph of a well designed Go program should be a wide, and relatively flat, rather than tall and narrow.
    • If you have a package whose functions cannot operate without enlisting the aid of another package,that is perhaps a sign that code is not well factored along package boundaries.


Focus        

  • Go programmers need to start talking less about frameworks, and start talking more about design. 
  • We need to stop focusing on performance at all cost, and focus instead on reuse at all costs.
  • What I want to hear is people talking about how to design Go programs in a way that is well engineered, decoupled, reusable, and above all responsive to change.

Jul 4, 2020

[Pacific++ 2018][re-read] "Designing for Efficient Cache Usage" - Scott McMillan

Reference:
Latency Numbers Every Programmer Should Know:
https://colin-scott.github.io/personal_website/research/interactive_latency.html



Cache Lines



 

Hardware Prefetch

  • Predictable access patterns are faster
  • We want sequential locality

 

Access Locality

  • Cache locality
    • Spatial
    • Temporal
  • Beware the algorithm/data structure to use honors cache locality.
  • Look beyond just big-O notation as constant-time costs can differ significantly.
  • Large benefit in hitting faster cache levels.
  • In C++, allocators matters. In Go(lang), the standard implement dealt with this problem.
  • In Go(lang), use flat map instead of std map.

 
 

Multipe CPU Core Considerations



 
 

Write Combined Memory

  • Accumulate writes to flush as 64Bytes  operations
  • Partial buffer flush causes: (avoid this)
    • Not writing all bytes convered by a buffer
    • Writing too many streams at once
    • Atomic read-modify-write operations
  • Write Combined memory read causes:
    • C++ bit fields
    • Optimization
    • Virtually always an accident (read the source code of implement)
    • Solution: Expose write-only interface
  • Non-temporal writes on x86: 
    Optimizing Cache Usage With Nontemporal Accesses
    • Use compiler intrinsics:
      • SSE2
        • _mm_stream_si32: store 4 bytes
        • _mm_Stream_si128: store 16 bytes
      • AVX
        • _mm256_stream_si256: store 32 bytes
      • AVX-512
        • _mm512_stream_si512: store 64 bytes



Address Translation

  • Platform-specific
  • Not directly pageable
  • Difficult/slow to allocate
  • Linux: Use hugepage

Jun 20, 2020

[Go][design] high through-put low contention in memory cache

Project https://github.com/dgraph-io/ristretto as example


Fast Access:

batching:

  • use ring-buffer for each lock and update the global cache.

generate a fast hash:


High Concurrency and Contention Resistance:

Get (tinyLFU):

  • use sync.Pool to implement striped, lossy ring-buffers that have great performance with little loss of data to update frequency. 
  • Use with channel for async update.

Set (tinyLFU + Count–min sketch): 

  • use a channel to capture the Sets, dropping them on the floor if the channel is full to avoid contention.    

Memory Bounding:

  • An infinitely large cache is practically impossible.
  • A cache must be bounded in size. Many cache libraries would consider cache size to be the number of elements, this is naive and only works in a workload where values are of identical size. Most workloads, however, have variable-sized values. One value could cost a few bytes, another a few kilobytes and yet another, a few megabytes.Treating them as having the same memory cost isn't realistic.

Idea:

  • Attach a cost to every key-value.
  • Users can specify what that cost is when calling Set. 
  • When the cache is operating at capacity, a heavy item could displace many lightweight items.

Admission Policy:

  • What should we let into the cache when the capacity is hit?
    The goal, obviously, is to let in new items if they are more “valuable” than the current items. (there is overhead)
  • Using LFU, not LRU:
    Paper: TinyLFU: A Highly Efficient Cache Admission Policy
    • The main idea is to only let in a new item if its estimate is higher than that of the item being evicted
    • Use count-min sketch.
    • After N key increments, the counters get halved.
    • Use TTL. A key that has not been seen for a while would have its counter get reset to zero.
  • To find a key with low ɛ, use randomness sampling, which provided by iterating through Go's map key.
  • If the incoming key's ɛ is higher than ɛ from randomness sampling, reject the incoming key, otherwise, incoming key is stored.
  • Note that the cost of the incoming key does not play a factor in choosing the eviction keys.

DoorKeeper:

  • When capacity hits, use a bloom filter to first check if the key has been seen before. 
  • Only if the key is already present in the bloom filter, is it inserted into the TinyLFU.

Metrics:

  • When using atomic counters for metrics, be ware of false sharing between caches among CPU cores.
  • While cache line size, on my thinkpad is 64-bytes, use atomic.AddUint64 as atomic counter.



Reference:

Design Of A Modern Cache (Benjamin Manes)
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html

BP-Wrapper: A System Framework Making Any Replacement Algorithms (Almost) Lock Contention Free
https://dgraph.io/blog/refs/bp_wrapper.pdf

Adaptive Software Cache Management
https://dgraph.io/blog/refs/Adaptive%20Software%20Cache%20Management.pdf

TinyLFU: A Highly Efficient Cache Admission Policy
https://dgraph.io/blog/refs/TinyLFU%20-%20A%20Highly%20Efficient%20Cache%20Admission%20Policy.pdf

The State of Caching in Go
https://dgraph.io/blog/post/caching-in-go/

Count–min sketch
https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch

Sep 28, 2019

[Go] Naming

Reference:
https://talks.golang.org/2014/names.slide



Name it in:
  • MixedCase.
  • Consistent (easy to guess),
  • Short (easy to type),
  • Accurate (easy to understand).



The greater the distance between a name's declaration and its uses,
the longer the name should be.
i.e using single char naming for 'for' loop is just fine.



Prefer i to index.
Prefer r to reader.
Prefer b to buffer.



Function parameters are like local variables,
but they also serve as documentation.
e.g Where the types are descriptive, they should be short:
func AfterFunc(d Duration, f func()) *Timer
func Escape(w io.Writer, s []byte)


Where the types are more ambiguous, the names may provide documentation:
func Unix(sec, nsec int64) Time
func HasPrefix(s, prefix []byte) bool



Return values on exported functions should only be named for
documentation purposes.
These are good examples of named return values:
func Copy(dst Writer, src Reader) (written int64, err error)
func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error)



Receivers are a special kind of argument.
By convention, they are one or two characters that reflect the receiver type,
because they typically appear on almost every line.
Receiver names should be consistent across a type's methods.
(Don't use r in one method and rdr in another.)



Exported names are qualified by their package names.
Remember this when naming exported
variables, constants, functions, and types.
That's why we have bytes.Buffer and strings.Reader,
not bytes.ByteBuffer and strings.StringReader.




Interfaces that specify just one method are usually just that function
name with 'er' appended to it.
Sometimes the result isn't correct English, but we do it anyway.
When an interface includes multiple methods,
choose a name that accurately describes its purpose
(examples: net.Conn, http.ResponseWriter, io.ReadWriter).




Error types should be of the form FooError:
type ExitError struct {
    ...
}

Error values should be of the form ErrFoo:
var ErrFormat = errors.New("image: unknown format")




Choose package names that lend meaning to the names they export.
Steer clear of util, common, and the like.



The last component of a package path should be the same as the package name.
Avoid stutter in repository and package paths.
Avoid upper case letters (not all file systems are case sensitive).
For libraries, it often works to put the package code in the repo root:
"github.com/golang/oauth2" // package oauth2

[Go] project directory layout and etc.

Reference:
https://github.com/golang-standards/project-layout
https://blog.golang.org/v2-go-modules


Publishing v2 and beyond:

e.g v2 directory
github.com/googleapis/gax-go/v2



Go for Industrial Programming:
https://peter.bourgon.org/go-for-industrial-programming/
Optimizing for mean time to recovery (MTTR) instead of mean time between failure (MTBF).



Mitchell Hashimoto - Advanced Testing with Go:
https://www.youtube.com/watch?v=8hQG7QlcLBk



How you name your variable:
https://talks.golang.org/2014/names.slide
https://golang.org/doc/effective_go.html#names
https://blog.golang.org/package-names
https://rakyll.org/style-packages/ (No plurals on package naming)



Code review comments:
https://github.com/golang/go/wiki/CodeReviewComments

Feb 17, 2019

[structure concurrency][Go][design] Graceful Shutdown

Recently I'm working on Actor design utilizing with Golang's channel.

Link: https://github.com/vsdmars/actor

While bumped into Martin Sústrik's blog post:
Graceful Shutdown: http://250bpm.com/blog:146
Along with the discussion: https://trio.discourse.group/t/graceful-shutdown/93
Structured concurrency resources: https://trio.discourse.group/t/structured-concurrency-resources/21

It's a nice summary, touched the well known concept of pthread cancellation point as well as Golang's context.Done()/Cancel() pattern.

Notes:
In-band cancel:
through application level channel

out-of-band cancel:
through language run-time

Sending a graceful shutdown request cannot possibly be a feature of the language. It must be done manually by the application.


POSIX C thread:
Reference:
https://stackoverflow.com/a/27374983
http://man7.org/linux/man-pages/man7/pthreads.7.html (Cancellation points)

The rules of hard cancellation are strict:
Once a coroutine has been hard-canceled the very next blocking call will immediately return ECANCELED.
In other words, every single blocking call is a cancellation point.

As an additional note, if looking at the POSIX requirement for cancel points, virtually all blocking interfaces are required to be cancel points.
Otherwise, on any completely blocked thread (in such call), there would be no safe way to terminate that thread.

Graceful shutdown can terminate the coroutine only when it is idling and waiting for a new request.

Reasoning:
If we allowed graceful shutdown to terminate the coroutine while it is trying to send the reply, it would mean that a request could go unanswered.
And that doesn't deserve to be called "graceful shutdown".
(e.g. Golang, context.Done() pattern)


Definition summarize:
Hard cancellation:
- Is triggered via an invisible communication channel created by the language runtime.
- It manifests itself inside the target coroutine as an error code (ECANCELED in libdill) or an exception (Cancelled in Trio).
- The error (or the exception) can be returned from any blocking call.
- In response to it, the coroutine is not expected to do any application-specific work. It should just exit.

Graceful shutdown:
- Is triggered via an application-specific channel.
- Manifests itself inside the target coroutine as a plain old message.
- The message may only be received at specific, application-defined points in the coroutine.
- In response to it, the coroutine can do arbitrary amount of application-specific work.

Hard cancellation is fully managed by the language.
Graceful shutdown is fully managed by the application.

Golang library reference:
go-resiliency/deadline: https://github.com/eapache/go-resiliency/tree/master/deadline

Nov 2, 2018

[cppcon 2018] OOP Is Dead, Long Live Data-oriented Design - Stoyan Nikolov


Data-Oiented Design

OOP marries data with operations

  • Heterogeneous data is brought together by a 'logical' black box object.
  • The object is used in vastly different contexts
  • Hides 'state' all over the place
  • Impact on
    • Performance
    • Scalability
    • Modifiability
    • Testability
  • Why? Cache miss~

Data-oriented design

  • Like Golang, data first
  • Separates data from logic
  • Structs and functions live independent lives
  • Data is regarded as information that has to be transformed
  • The logic embraces the data
  • Does not try to hide the logic
  • Leads to functions that work on arrays
  • Reorganizes data according to it's usage

If we aren't going to use a piece of information, why packs it together?

Examples from Chromium code base :-)

--
class CORE_EXPORT Animation final: public ~
--


So, for OOP in Chromium:
  • Uses more than 6 non-trivial classes
  • Objects contain smart pointers to other objects
  • Interpolation uses abstract classes to handle different property types
  • CSS Animations directly 'reach out' to other systems - coupling
  • Calling events
  • Setting values in DOM element
  • What's the lifetime of elements being synchronized?



DOD:
  • Data operations
    • Tick -> 99.9%
    • Add
    • Remove
    • Pause
    • ...
  • Tick Input
    • Definition
    • Time
  • Tick Output
    • Changed properties
    • New property values
    • Who owns the new values
  • Design for 'many animations',
    i.e many objects


Define a type:
struct AnimationController{
    AnimationState* as_ [];
};

// Golang style.
// No shared_ptr, every instance of this type
// has it's own value. 
// Thread safe.
struct AnimationState{
    AnimationID Id;
    time StartTime;
    time PauseTime;
    ...
};

// Avoid type erasure, use template
template<typename T>
struct AnimationStateProperty : public AnimationState {
    AnimatedDefiniationFrames<T> Keyframes;
};


// We can't use vector<baseType>
// But since we know every property types,
// create vector for each type
CSSVector<AnimationStateProperty<ZIndex>> m_ZIndexActiveAnimState;

// Iterates them for every CSSVector types

With above design, keep in mind,
std::vector
is the best container to avoid cache misses!
(continuous memory, sequential container)



Avoid branches:
  • Keep lists per-boolean 'flag'
  • Separate Active and Inactive animations
    i.e Base on the states we have, put object into a list of the same state.
  • avoid using 'if branch' test.
  • Avoid 'if (isActive)'
  • If there are too many states, try to cut down the size of states, or put the state that changes most into 'list' style.



Add API to the caller:
  • We don't have OOP style object, thus
    no member functions!
    i.e Animation.Play()
  • Use free function taking ID!
    i.e
    void PlayAnimation(AnimationID aid);


Key points:
  • Keep data flat (Golang style)
    • Maximise cache usage
    • No RTTI
    • Amortized dynamic allocations
    • Some read-only duplication improves performance and readability
  • Existence-based predication
    • Reduce branching
    • Apply the same operation on a whole table
  • Id-Based handles
    • No pointers
    • Allow rearranging internal memory
  • Table-based output
    • No external dependencies
    • Easy to reason about the flow


Scalability:
  • OOP multi-threading
    • Complicated
  • DoD multi-threading
    • Group state into list
    • Each task/job/thread keeps a private table of modified data
    • Join merges the tables (thread.join)
    • Classic fork-join


Testability:
  • OOP case
    • Hard to mock(lots of types)
    • Hidden states
    • Asserting correct state is difficult - multiple output points(VERY BAD DESIGN)
  • DOD case
    • Contract style design
    • Easier to mock(less types)
    • Asserting correct state is easy

    
Modifiability:
  • OOP
    • Hard to modify base types
    • But, easy to do 'quick' changes, because we have if branches
  • DOD
    • FP style. Building blocks
    • A bit harder to to quick changes, but with FP, we have monoid.

    
Downsides of DOD:
  • Correct data separation can be hard
    • Know the problem well
  • Existence-based predication is not always feasible(or easy)
  • 'Quick' modifications can be tough


What to keep from OOP:
  • Simple struct with simple methods are fine
  • Keep polymorphism & interface under control
  • Use template
  • Use 'impl'


Extra reference:

Aug 12, 2018

[Go] On the uses and misuses of panics in Go - Eli Bendersky

https://eli.thegreenplace.net/2018/on-the-uses-and-misuses-of-panics-in-go/

Start my reading of Eli Bendersky's blog since my days in VMWare, while I was once a rookie in C++.

tl;dl;
Short take away:
Sometimes, it's better to be pragmatic than a zealot.

I found that in golang world people are a bit stubborn and insist to stick on to some idiom, which by sometimes doesn't make much sense (i.e try to mandate passing context to every callee function etc.)

C++ gives us freedom, thus we code by reasoning; however, in golang, emm... emm... emm...







Nov 19, 2017

[CPPCON 2014] Data-oriented Design - Mike Acton

video:
https://www.youtube.com/watch?v=rX0ItVEVjHc

  • No Exception
  • No Template
  • No IOSTREAM
  • No Multiple Inheritance
  • Seldom Operator overloading
  • No RTTI (turn off) i.e no virtual function
  • No STL
  • Custom allocators
  • Custom debugging tools (into the code)


The purpose of all programs, and all parts of those programs, is to transform data from one form to another.

If you don't understand the data you don't understand the problem.

Conversely, understand the problem by understanding the data.

Different problems require different solutions.

If you don't understand the cost of solving the problem, you don't understand the problem.

If you don't understand the hardware, you can't reason about the cost of solving the problem.

Everything is a data problem. Including usability, maintenance, debug-ability etc.
It's not code problem.

Avoid adding problem into the problem while solving the problem.

Latency and throughput are only the same in sequential systems.

Rule of thumb: Where there is one, there are many. Try looking on the time axis.

Code is NEVER more important then data.

Software is NOT platform. Hardware is.

There's no ideal abstract solution.

Help the compiler reasoning.

Don't put evaluation inside a loop. Put it outside.

Good programming is HARD.
Bad programming is EASY.

Design pattersn are spoonfeed material for brainless programmers incapable of independent thought, who will be resolved to producing code as mediocre as the design patterns they use to create it.