Sep 20, 2016

[paper][distributed system] Sparrow: distributed, low latency scheduling

Traits:
  • Short task duration
  • Large degree of parallelism
  • Low latency


Requirements:
  • No centralized state
  • Operate autonomously


Legacy design:
  • Power of two choices load balancing technique
  • Choose 2 servers at random
  • Place the task on the one with the shorter queue
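A minimal sketch of the power of two choices technique described above, assuming queue length is the only load signal. The names `power_of_two_choices` and `place_tasks` are hypothetical and are not taken from the paper.

```python
import random

def power_of_two_choices(queue_lengths, rng=random):
    """Pick two distinct servers at random; return the index of the shorter queue."""
    a, b = rng.sample(range(len(queue_lengths)), 2)
    return a if queue_lengths[a] <= queue_lengths[b] else b

def place_tasks(num_servers, num_tasks, seed=0):
    """Place tasks one at a time and return the resulting queue lengths."""
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    queues = [0] * num_servers
    for _ in range(num_tasks):
        queues[power_of_two_choices(queues, rng)] += 1
    return queues
```

Calling `place_tasks(10, 100)` spreads 100 tasks over 10 servers, probing only two servers per task.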


Batch sampling:
Places the m tasks of a _job_ on the least loaded of d*m randomly selected worker machines (for d > 1).
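Batch sampling can be sketched as follows, assuming "least loaded" means the shortest queue and that the scheduler can read queue lengths directly. The function `batch_sample` and its parameters are hypothetical names used only for illustration.

```python
import heapq
import random

def batch_sample(queue_lengths, m, d=2, rng=random):
    """Probe d*m random workers and return the m least loaded of them."""
    # Cap the probe count when the cluster has fewer than d*m workers.
    num_probes = min(d * m, len(queue_lengths))
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    # Keep the m probed workers with the shortest queues.
    return heapq.nsmallest(m, probed, key=lambda w: queue_lengths[w])
```

Unlike per-task sampling, the probes for all m tasks are pooled, so one lightly loaded worker found by any probe can be used by the job.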

Late binding:
Assign a task to a worker only when the worker is _ready_ to run it.

Goal:
  • Low latency
  • Tolerate scheduler failure
  • Each server runs a worker daemon that receives commands


Time out:
  • Wait time
  • Service time
  • Job response time
  • Delay time


Problems:
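  • A server may have a longer task queue while each of its tasks takes
    less time, so queue length is a poor predictor of wait time.
  • Race condition: several schedulers may concurrently pick the same
    lightly loaded worker.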


Design:
  • Place a placeholder (reservation) on each of d*m workers.
  • When a worker reaches the placeholder in its queue, it calls back to
    the scheduler to ask for a task.
  • The scheduler dispatches the m tasks to the first m workers that call
    back.
  • The remaining (d-1)*m workers receive a no-op when they call back to
    the scheduler.
  • The scheduler can also proactively cancel the placeholders on those
    (d-1)*m workers.
  • Schedulers do not communicate with one another.
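The design steps above can be sketched as a small single-job simulation. This is an illustration under stated assumptions, not Sparrow's implementation: `Scheduler`, `get_task`, `run_job`, and `worker_ready_times` (the time at which each worker reaches the placeholder) are hypothetical names, and proactive cancellation is left out.

```python
import random

class Scheduler:
    """Holds the unlaunched tasks of one job and answers worker callbacks."""

    def __init__(self, tasks):
        self.pending = list(tasks)

    def get_task(self, worker_id):
        # Called by a worker when a placeholder reaches the front of its queue.
        # The first m callers get a task; later callers get None (a no-op).
        return self.pending.pop(0) if self.pending else None

def run_job(tasks, worker_ready_times, d=2, seed=0):
    """Return a {worker: task} map for one job scheduled with late binding."""
    rng = random.Random(seed)
    scheduler = Scheduler(tasks)
    # Place placeholders on d*m randomly chosen workers (capped at cluster size).
    num_probes = min(d * len(tasks), len(worker_ready_times))
    reserved = rng.sample(range(len(worker_ready_times)), num_probes)
    assignments = {}
    # Workers call back in the order in which they become ready.
    for worker in sorted(reserved, key=lambda w: worker_ready_times[w]):
        task = scheduler.get_task(worker)
        if task is not None:
            assignments[worker] = task
    return assignments
```

With ready times `[5, 1, 9, 2]` and two tasks, the two workers that become ready first (workers 1 and 3) receive the tasks, regardless of how long their queues looked when the placeholders were placed.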
