multithreading - Workload balancing between akka actors -


i have 2 akka actors used crawling links, i.e. find links in page x, find links in pages linked x, etc...

i want them progress more or less @ same pace, more not 1 of them becomes starved , 1 consumes resources.

i've tried following approaches (simplified). single page crawling done following actor:

class crawler extends actor {   def receive = {     case crawl(url, kind) =>       // download url       // extract links       sender ! parsed(url, links, kind)   } } 

approach 1:

class coordinator extends actor {   val linksa = ...   val linksb = ...   def receive = {     case parsed(url, links, kind) =>       val store = if (kind == kinda) linksa else linksb       val newlinks = links -- store       store ++= links       newlinks.foreach { link =>         val crawler = context.actorof(props[crawler])         crawler ! crawl(link, kind)       }   } } 

approach 2:

class coordinator extends actor {   val linksa = ...   val linksb = ...   val rrprops = props[crawler].withrouter(roundrobinrouter(nrofinstances = 10)   val crawlera = context.actorof(rrprops)   val crawlerb = context.actorof(rrprops)   def receive = {     case parsed(url, links, kind) =>       val store = if (kind == kinda) linksa else linksb       val newlinks = links -- store       store ++= links       newlinks.foreach { link =>         if (kind == kinda) crawlera ! crawl(link, kind)         else crawlerb ! crawl(link, kind)       }   } } 

second approach made things better, didn't fix whole.

is there way make crawlers of both kinds progress more or less @ same pace? should send messages between them unblocking each other in turn?

i'm working on similar program workers have non-uniform resource cost (in case task performing database queries , dumping results in database, crawling different websites have different costs different queries have different costs). 2 ways of dealing i've employed:

  1. replace roundrobinrouter smallestmailboxrouter
  2. don't have coordinator send out of messages @ once - instead send them out in batches, in case have ten workers sending out forty messages should keep them busy initially. whenever worker completes task sends message coordinator, @ point coordinator sends out message go worker completed task. (you can in batches, i.e. after receiving n "task complete" messages coordinator sends out n messages, don't make n high or else workers extremely short tasks may idle.)

a third option cheat , share concurrentlinkedqueue between actors: after filling queue coordinator sends "start" message workers, , workers poll queue til it's empty.


Comments

Popular posts from this blog

python - TypeError: start must be a integer -

c# - DevExpress RepositoryItemComboBox BackColor property ignored -

django - Creating multiple model instances in DRF3 -