Sidecar pattern
Ambassadors
- Application-Layer Replicated Services
- Introducing a Caching Layer
- A cache exists between your stateless application and the end-user request.
- Deploy using the sidecar pattern
Using https://varnish-cache.org/ for HTTP caching
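A minimal sketch of what such a caching sidecar does, written in Python rather than Varnish (the port numbers, in-memory dict, and lack of any expiry policy are illustrative assumptions only; a real deployment would use Varnish as above):

```python
# Toy caching sidecar: proxies GET requests to the app container on
# localhost:8080 and serves repeated requests from an in-memory cache.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

APP_URL = "http://localhost:8080"  # assumption: the app container listens here
cache = {}  # path -> response body (never expires; illustration only)

class CachingSidecar(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in cache:
            with urlopen(APP_URL + self.path) as resp:  # cache miss
                cache[self.path] = resp.read()
        body = cache[self.path]
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), CachingSidecar).serve_forever()
```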
Sharded Services
- Sharded Caching
Many sharding functions use consistent hashing functions.
Consistent hashing functions are special hash functions that are guaranteed, when resized to N shards, to remap only (number of keys) / N keys on average.
For example, if we use a consistent hashing function for our sharded cache, moving from 10 to 11 shards will only result in remapping fewer than 10% of the keys (K / 11, where K is the total number of keys).
This is dramatically better than the naive hash-modulo approach, which remaps nearly every key when the shard count changes and thus effectively empties the entire cache.
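The sketch below shows the idea: each shard owns many points on a hash ring and a key belongs to the first shard clockwise from its own hash, so adding a shard only remaps the keys adjacent to the new shard's points. The class and constants are illustrative, not a production library:

```python
# A minimal consistent-hash ring sketch (illustrative, not production code).
import bisect
import hashlib

def stable_hash(value):
    # md5 used only as a stable, well-distributed hash; not for security
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, shards, points_per_shard=100):
        # Map each shard to many points on the ring to smooth the distribution.
        self.ring = sorted(
            (stable_hash("%s-%d" % (shard, i)), shard)
            for shard in shards
            for i in range(points_per_shard)
        )
        self.keys = [h for h, _ in self.ring]

    def shard_for(self, key):
        # The key belongs to the first shard point clockwise from its hash.
        idx = bisect.bisect(self.keys, stable_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["shard-%d" % i for i in range(10)])
print(ring.shard_for("/users/123/photos/456"))
```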
The performance of your cache is defined in terms of its hit rate.
The hit rate is the percentage of requests for which your cache already contains the data.
Ultimately, the hit rate determines the overall capacity and performance of your distributed system: if, for example, the cache serves half of all requests and the cache is lost, the load on the backing service doubles.
Sharding Functions
Shard = ShardingFunction(Req)
or, in many programming languages, something like:
Shard = hash(Req) % 10
Hashing functions are functions that transform an arbitrary object into an integer hash.
The hash function has two important characteristics for our sharding:
- Determinism
- The output should always be the same for a unique input.
- Uniformity
- The distribution of outputs across the output space should be equal.
Selecting a Key
A better sharding function would be shard(request.path), hashing only the part of the request that identifies the resource rather than the whole request.
When we use request.path as the shard key, requests for the same path map to the same shard, and thus the response to one request can be served out of the cache to service the others.
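A sketch of such a sharding function, assuming a fixed fleet of 10 cache shards; note that Python's built-in hash() is randomized per process, so a stable hash like md5 is used to keep the mapping deterministic across cache instances:

```python
# Deterministic sharding function keyed on request.path.
import hashlib

NUM_SHARDS = 10  # assumption: a fixed fleet of 10 cache shards

def shard_for_request(path):
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return digest % NUM_SHARDS

# Two requests for the same path always land on the same shard,
# so the second can be served from the first one's cached response.
assert shard_for_request("/a/b/c") == shard_for_request("/a/b/c")
```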
Hot Sharding Systems
Ideally the load on a sharded cache will be perfectly even, but in many cases this isn't true and “hot shards” appear because organic load patterns drive more traffic to one particular shard.
As an example of this, consider a sharded cache for a user's photos; when a particular photo goes viral and suddenly receives a disproportionate amount of traffic, the cache shard containing that photo will become “hot.”
When this happens, with a replicated, sharded cache, you can scale the cache shard to respond to the increased load.
Indeed, if you set up auto scaling for each cache shard, you can dynamically grow and shrink each replicated shard as the organic traffic to your service shifts around.
An illustration of this process is shown in Figure 6-3. Initially the sharded service receives equal traffic to all three shards. Then the traffic shifts so that Shard A is receiving four times as much traffic as Shard B and Shard C.
The hot sharding system moves Shard B to the same machine as Shard C, and replicates Shard A to a second machine.
Traffic is now, once again, equally shared between replicas.
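A toy sketch of the rebalancing decision in Figure 6-3: given per-shard traffic and a fixed budget of replicas, hand out replicas proportionally to load, so hot shards get more replicas and cool shards can share a machine (the numbers mirror the figure; a real hot sharding system would drive autoscaling instead):

```python
# Allocate a fixed budget of replicas to shards in proportion to their load.
def allocate_replicas(traffic, total_replicas):
    # Every shard needs at least one replica; hand out the rest by load.
    alloc = {shard: 1 for shard in traffic}
    spare = total_replicas - len(traffic)
    for _ in range(spare):
        # Give the next replica to the shard with the highest load per replica.
        hottest = max(traffic, key=lambda s: traffic[s] / alloc[s])
        alloc[hottest] += 1
    return alloc

# Shard A receives 4x the traffic of Shards B and C, as in Figure 6-3.
print(allocate_replicas({"A": 400, "B": 100, "C": 100}, total_replicas=4))
# -> {'A': 2, 'B': 1, 'C': 1}
```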
Scatter/Gather
Scatter/gather uses replication for scalability in terms of time: by parallelizing a single request across many nodes, it reduces the time needed to serve that request.
Like replicated and sharded systems, the scatter/gather pattern is a tree pattern with a root that distributes requests and leaves that process those requests.
However, in contrast to replicated and sharded systems, with scatter/gather requests are simultaneously farmed out to all of the replicas in the system.
Each replica does a small amount of processing and then returns a fraction of the result to the root.
The root server then combines the various partial results together to form a single complete response to the request and then sends this response back out to the client.
example:
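A minimal scatter/gather root sketched with a thread pool; the leaf URLs, the query parameter, and the merge step (concatenating partial result lists) are all illustrative assumptions:

```python
# Scatter a query to every leaf in parallel, then gather the partial results.
from concurrent.futures import ThreadPoolExecutor
import json
from urllib.request import urlopen

LEAVES = ["http://leaf-%d.local/search" % i for i in range(3)]  # hypothetical

def query_leaf(url, term):
    with urlopen("%s?q=%s" % (url, term)) as resp:
        return json.load(resp)["results"]  # each leaf returns partial results

def scatter_gather(term):
    # Scatter: farm the request out to all leaves simultaneously.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda u: query_leaf(u, term), LEAVES))
    # Gather: merge the partial result lists into one response for the client.
    return [item for part in partials for item in part]
```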
Scatter/gather systems lead us to some conclusions:
- Increased parallelism doesn't always speed things up, because per-request overhead on each node puts a floor on how far the work can usefully be subdivided.
- Increased parallelism doesn't always speed things up, because of the straggler problem: the root can't assemble the response until every leaf has replied, so the request is only as fast as the slowest node.
- The performance of the 99th percentile is more important than in other systems, because each user request fans out into many leaf requests, so a slow response that is rare at any one leaf still affects a large fraction of user requests.
The same straggler problem applies to availability.
If you issue a request to 100 leaf nodes and the probability of any one leaf failing is 1 in 100, the probability that all 100 succeed is 0.99^100, roughly 37%; in other words, nearly two-thirds of user requests will fail.
The answer to both the straggler and availability problems is to replicate each leaf shard.
Built this way, each leaf request from the root is actually load balanced across all healthy replicas of the shard.
This means that if there are any failures, they won't result in a user-visible outage for your system.
Likewise, you can safely perform an upgrade under load, since each replicated shard can be upgraded one replica at a time.
Indeed, you can perform the upgrade across multiple shards simultaneously, depending on how quickly you want to perform the upgrade.
Functions and Event-Driven Processing
Function-as-a-service (FaaS)
When FaaS Makes Sense:
- Functions are stateless and thus any system you build on top of functions is inherently more modular and decoupled than a similar system built into a single binary.
- Each function is entirely independent.
- The only communication is across the network,
- and each function instance cannot have local memory, requiring all state to be stored in a storage service.
- Additionally, the request-based and serverless nature of functions means that certain problems are quite difficult to detect.
- FaaS is inherently an event-based application model: functions are executed in response to discrete events that trigger them.
The Decorator Pattern: Request or Response Transformation
Kubeless is deployed on top of the Kubernetes container orchestration service. Assuming that you have provisioned a Kubernetes cluster, you can install Kubeless from its releases page. Once you have the kubeless binary installed, you can install it into your cluster with the following command:
kubeless install
To see the deployed functions, run:
kubectl get functions
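As a sketch of the decorator pattern named above, here is a kubeless-style Python function that transforms each request by filling in default values before the real service sees it. The default fields, and the assumption that the request body arrives as a JSON string in event["data"], are illustrative only:

```python
import json

# Decorator pattern: transform the request by adding default values,
# then hand the enriched request onward. Field names are hypothetical.
DEFAULTS = {"page-size": 25, "language": "en"}

def handler(event, context):
    request = json.loads(event["data"])
    for field, value in DEFAULTS.items():
        request.setdefault(field, value)
    return json.dumps(request)
```

A function like this would be deployed with the kubeless CLI and placed in front of the real service, so the service itself never has to handle missing fields.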
Handling Events
Ownership Election
Work Queue Systems
In the containerized work queue, there are two interfaces:
- the source container interface (like event sourcing), which provides a stream (that is, an unbounded, immutable log) of work items that need processing,
- and the worker container interface, which knows how to actually process a work item.
URLs (the source container interface):
- GET http://localhost/api/v1/items (returns the list of pending work items)
- GET http://localhost/api/v1/items/<item-name> (returns the data for a particular work item)
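A sketch of a worker loop built against this source interface; it assumes the list endpoint returns a JSON array of item names, and process_item stands in for the worker container's actual logic:

```python
# Poll the source container for work items and process each one.
import json
import time
from urllib.request import urlopen

BASE = "http://localhost/api/v1/items"

def process_item(name, data):
    print("processing", name, len(data), "bytes")  # placeholder work

while True:
    with urlopen(BASE) as resp:
        items = json.load(resp)  # assumption: a JSON list of item names
    for name in items:
        with urlopen("%s/%s" % (BASE, name)) as resp:
            process_item(name, resp.read())
    time.sleep(5)  # poll for new work
```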