multithreading - Workload balancing between akka actors -
i have 2 akka actors used crawling links, i.e. find links in page x, find links in pages linked x, etc...
i want them progress more or less @ same pace, more not 1 of them becomes starved , 1 consumes resources.
i've tried following approaches (simplified). single page crawling done following actor:
class crawler extends actor { def receive = { case crawl(url, kind) => // download url // extract links sender ! parsed(url, links, kind) } } approach 1:
class coordinator extends actor { val linksa = ... val linksb = ... def receive = { case parsed(url, links, kind) => val store = if (kind == kinda) linksa else linksb val newlinks = links -- store store ++= links newlinks.foreach { link => val crawler = context.actorof(props[crawler]) crawler ! crawl(link, kind) } } } approach 2:
class coordinator extends actor { val linksa = ... val linksb = ... val rrprops = props[crawler].withrouter(roundrobinrouter(nrofinstances = 10) val crawlera = context.actorof(rrprops) val crawlerb = context.actorof(rrprops) def receive = { case parsed(url, links, kind) => val store = if (kind == kinda) linksa else linksb val newlinks = links -- store store ++= links newlinks.foreach { link => if (kind == kinda) crawlera ! crawl(link, kind) else crawlerb ! crawl(link, kind) } } } second approach made things better, didn't fix whole.
is there way make crawlers of both kinds progress more or less @ same pace? should send messages between them unblocking each other in turn?
i'm working on similar program workers have non-uniform resource cost (in case task performing database queries , dumping results in database, crawling different websites have different costs different queries have different costs). 2 ways of dealing i've employed:
- replace
roundrobinroutersmallestmailboxrouter - don't have
coordinatorsend out of messages @ once - instead send them out in batches, in case have ten workers sending out forty messages should keep them busy initially. whenever worker completes task sends messagecoordinator, @ pointcoordinatorsends out message go worker completed task. (you can in batches, i.e. after receivingn"task complete" messagescoordinatorsends outnmessages, don't makenhigh or else workers extremely short tasks may idle.)
a third option cheat , share concurrentlinkedqueue between actors: after filling queue coordinator sends "start" message workers, , workers poll queue til it's empty.
Comments
Post a Comment