Paper:
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure: https://ai.google/research/pubs/pub36356
Reference:
The OpenTracing Semantic Specification:
https://opentracing.io/specification/
Towards Turnkey Distributed Tracing
https://www.jaegertracing.io/
Dapper uses annotations to tag records with a global identifier.
- span
Tree node.
An edge indicates the relationship between the current span and its parent span.
Contains timestamps: start time, end time.
Contains span name: human readable.
Contains span id.
Contains parent span id.
Root span: has no parent span.
Spans share the same trace ID iff they are associated with the same trace.
Beware of clock skew, since a span carries timestamps from both client and server, which run on different hosts.
- tree
Spans form the tree.
- annotation
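To make the span/tree/annotation vocabulary above concrete, here is a minimal sketch in Python. It is not Dapper's actual data model: the `Span` class, `new_id`, and `build_tree` are hypothetical names chosen for illustration, and random 64-bit IDs are an assumption.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def new_id() -> int:
    """Return a random 64-bit identifier (hypothetical helper)."""
    return random.getrandbits(64)


@dataclass
class Span:
    trace_id: int                 # shared by every span of one trace
    span_id: int                  # identifies this span
    parent_id: Optional[int]      # None marks the root span
    name: str                     # human-readable span name
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    annotations: List[Tuple[float, str]] = field(default_factory=list)

    def child(self, name: str) -> "Span":
        """Start a child span that inherits this span's trace ID."""
        return Span(self.trace_id, new_id(), self.span_id, name)

    def annotate(self, message: str) -> None:
        """Attach a timestamped annotation to this span."""
        self.annotations.append((time.time(), message))

    def finish(self) -> None:
        """Record the end time of this span."""
        self.end = time.time()


def build_tree(spans: List[Span]) -> Dict[Optional[int], List[Span]]:
    """Group spans by parent ID; the key None holds the root span."""
    children: Dict[Optional[int], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    return children
```

A root span is created with `parent_id=None`, for example `Span(new_id(), new_id(), None, "frontend.request")`, and each `child()` call adds one edge to the tree. The timestamps here come from a single host clock, so the skew caveat above does not show up in this sketch.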
Implement:
- When used with threads, attach the trace context to thread-local storage (slow in dynamic libraries).
- When used with async calls, pass the trace context in the async function call's arguments.
- Embed the trace context into each RPC/IPC call.
- Use custom annotations to trace data. Although tracing is not logging, there is an upper bound on calls to the trace API.
- Annotations do not dictate the behavior of the tracing.
- Supports a key/value (global) data structure to assist tracing calls.
- Use sampling to restrict the number of calls to the tracing core.
- Provide out-of-band trace collection, because there are times when the callee is returning its result but the callee's callees have not returned their data yet!
The tracer can be used as a security check as well: it verifies that execution is actually hitting the code that is supposed to be hit.
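The first three implementation points (thread-local storage, passing the context to async work, embedding it in RPC calls) can be sketched roughly as follows. This is an illustration under assumptions, not Dapper's code: the header names `x-trace-id` and `x-span-id`, the `(trace_id, span_id)` tuple, and every function name are hypothetical.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

TraceContext = Tuple[int, int]  # (trace_id, span_id); assumed layout

_local = threading.local()      # per-thread slot for the active context


def set_context(ctx: Optional[TraceContext]) -> None:
    """Store the trace context in thread-local storage."""
    _local.ctx = ctx


def get_context() -> Optional[TraceContext]:
    """Return the current thread's trace context, or None if unset."""
    return getattr(_local, "ctx", None)


def inject(headers: Dict[str, str]) -> Dict[str, str]:
    """Embed the current context into outgoing RPC headers."""
    ctx = get_context()
    if ctx is not None:
        headers["x-trace-id"] = format(ctx[0], "x")  # hypothetical name
        headers["x-span-id"] = format(ctx[1], "x")   # hypothetical name
    return headers


def extract(headers: Dict[str, str]) -> Optional[TraceContext]:
    """Recover the context from incoming RPC headers, if present."""
    if "x-trace-id" not in headers or "x-span-id" not in headers:
        return None
    return int(headers["x-trace-id"], 16), int(headers["x-span-id"], 16)


def run_with_context(callback: Callable[[], None],
                     ctx: TraceContext) -> threading.Thread:
    """Run callback on a new thread, passing the context as an argument."""
    def wrapper() -> None:
        set_context(ctx)  # re-attach the context on the worker thread
        callback()
    t = threading.Thread(target=wrapper)
    t.start()
    return t
```

The design choice worth noting is that the context never travels implicitly across a thread or process boundary: it is either handed over as an argument (`run_with_context`) or serialized into the call (`inject`/`extract`).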
- sampling statistic skew
- aggressive sampling
Does not hinder high-throughput services.
- adaptive sampling
- trace generation overhead
- trace collection overhead
- effect on production workloads
- addressing long tail latency
- beware of coalescing effects (read the paper)
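As a rough illustration of the sampling items above, the sketch below shows one way to make a per-trace sampling decision and one way to adapt the rate to load. It assumes the decision is derived from the trace ID (so every span of a trace gets the same answer) and that adaptive sampling aims at a target number of sampled traces per second. `keep_trace` and `AdaptiveSampler` are hypothetical names, and the paper should be consulted for the real scheme.

```python
def keep_trace(trace_id: int, rate: float) -> bool:
    """Decide from the trace ID alone, so all spans of a trace agree."""
    z = (trace_id & 0xFFFFFFFF) / 2**32  # map the low 32 bits to [0, 1)
    return z < rate


class AdaptiveSampler:
    """Adjust the sampling rate toward a target of sampled traces/sec."""

    def __init__(self, target_per_sec: float, rate: float = 1.0) -> None:
        self.target_per_sec = target_per_sec
        self.rate = rate

    def update(self, observed_requests_per_sec: float) -> float:
        """Recompute the rate from the most recently observed traffic."""
        if observed_requests_per_sec <= 0:
            self.rate = 1.0  # idle service: sample everything
        else:
            self.rate = min(1.0, self.target_per_sec / observed_requests_per_sec)
        return self.rate
```

Under this sketch a high-throughput service drops to a low rate (aggressive sampling), while a low-traffic service climbs back toward sampling every request.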