multithreading - Workload balancing between Akka actors
I have 2 Akka actors used for crawling links, i.e. find the links in page X, then find the links in the pages linked from X, etc.
I want them to progress more or less at the same pace, but more often than not one of them becomes starved and the other one consumes the resources.
I've tried the following approaches (simplified). Single page crawling is done by the following actor:
    class Crawler extends Actor {
      def receive = {
        case Crawl(url, kind) =>
          // download url
          // extract links
          sender ! Parsed(url, links, kind)
      }
    }
Approach 1:
    class Coordinator extends Actor {
      val linksA = ...
      val linksB = ...
      def receive = {
        case Parsed(url, links, kind) =>
          val store = if (kind == KindA) linksA else linksB
          val newLinks = links -- store   // links not seen before for this kind
          store ++= links
          // a fresh Crawler actor is created for every new link
          newLinks.foreach { link =>
            val crawler = context.actorOf(Props[Crawler])
            crawler ! Crawl(link, kind)
          }
      }
    }
Approach 2:
    class Coordinator extends Actor {
      val linksA = ...
      val linksB = ...
      // one round-robin pool of 10 crawlers per kind
      val rrProps = Props[Crawler].withRouter(RoundRobinRouter(nrOfInstances = 10))
      val crawlerA = context.actorOf(rrProps)
      val crawlerB = context.actorOf(rrProps)
      def receive = {
        case Parsed(url, links, kind) =>
          val store = if (kind == KindA) linksA else linksB
          val newLinks = links -- store   // links not seen before for this kind
          store ++= links
          newLinks.foreach { link =>
            if (kind == KindA) crawlerA ! Crawl(link, kind)
            else crawlerB ! Crawl(link, kind)
          }
      }
    }
The second approach made things somewhat better, but it didn't fix the problem as a whole.
Is there a good way to make crawlers of both kinds progress more or less at the same pace? Should I send messages between them, unblocking each other in turn?
I'm working on a similar program where the workers have a non-uniform resource cost (in my case the task is performing database queries and dumping the results into a database, but just as crawling different websites will have different costs, so too will different queries have different costs). Two ways of dealing with this that I've employed:
- Replace RoundRobinRouter with SmallestMailboxRouter.
- Don't have the Coordinator send out all of its messages at once. Instead, send them out in batches: in your case you have ten workers, so sending out forty messages should keep them busy initially. Whenever a worker completes a task it sends a message to the Coordinator, at which point the Coordinator sends out another message, which will probably go to the worker that just completed its task. (You can also do this in batches, i.e. after receiving n "task complete" messages the Coordinator sends out another n messages, but don't make n too high or else workers with extremely short tasks may sit idle.)
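The batching idea in the second bullet comes down to a small piece of bookkeeping: cap the number of messages in flight and release one more each time a "task complete" message arrives. Below is a minimal sketch of that bookkeeping in plain Scala, kept independent of Akka so it can be run on its own. `BatchDispatcher`, `maxInFlight`, `enqueue` and `completed` are hypothetical names made up for this illustration; they are not part of Akka.

```scala
import scala.collection.mutable

// Hypothetical helper (not an Akka class): decides which tasks may be sent now
// so that at most `maxInFlight` tasks are outstanding at any time.
final class BatchDispatcher[T](maxInFlight: Int) {
  private val pending = mutable.Queue.empty[T]
  private var inFlight = 0

  /** Queue new tasks and return the ones that may be sent immediately. */
  def enqueue(tasks: Iterable[T]): List[T] = {
    pending ++= tasks
    release()
  }

  /** Record one "task complete" message and return the tasks to send next. */
  def completed(): List[T] = {
    if (inFlight > 0) inFlight -= 1
    release()
  }

  def outstanding: Int = inFlight
  def queued: Int = pending.size

  // Move tasks from the pending queue to "in flight" until the cap is reached.
  private def release(): List[T] = {
    val out = List.newBuilder[T]
    while (inFlight < maxInFlight && pending.nonEmpty) {
      out += pending.dequeue()
      inFlight += 1
    }
    out.result()
  }
}
```

Inside the Coordinator this could be used by calling `enqueue(newLinks)` when a Parsed message arrives and sending a Crawl message for each returned link, with the Parsed reply itself treated as the "task complete" signal that triggers `completed()`. Keeping one such dispatcher per kind is one possible way to stop either kind from flooding the workers, though that is an assumption about the question's setup rather than something the answer spells out.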
A third option is to cheat and share a ConcurrentLinkedQueue between all the actors: after filling the queue, the Coordinator sends a "start" message to the workers, and the workers then poll the queue until it's empty.
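The shared-queue option can be sketched with the JDK alone. In the sketch below plain threads stand in for the worker actors, and starting a thread plays the role of the "start" message; `SharedQueueSketch`, `drain` and `handle` are hypothetical names used only for illustration. It assumes, as described above, that the queue is filled before the workers start.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch}
import java.util.concurrent.atomic.AtomicInteger

object SharedQueueSketch {
  /** Drains `queue` with `workers` threads; returns how many items each worker handled. */
  def drain(queue: ConcurrentLinkedQueue[String], workers: Int)(handle: String => Unit): Vector[Int] = {
    val counts = Vector.fill(workers)(new AtomicInteger(0))
    val done = new CountDownLatch(workers)
    counts.foreach { counter =>
      val t = new Thread(new Runnable {
        def run(): Unit = {
          try {
            var item = queue.poll()      // poll() returns null once the queue is empty
            while (item != null) {
              handle(item)
              counter.incrementAndGet()
              item = queue.poll()
            }
          } finally done.countDown()
        }
      })
      t.start()                          // stands in for the "start" message
    }
    done.await()                         // wait until every worker has seen an empty queue
    counts.map(_.get)
  }
}
```

Because every worker pulls its next item only when it has finished the previous one, slow and fast tasks balance themselves out without any routing logic. The trade-off is that the workers share mutable state outside the actor model, which is why the answer calls it cheating.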