[Swift-devel] Re: Falkon worker config params?

Wed Sep 5 17:22:34 CDT 2007

Thanks, Ioan.  A few follow-ups, to clarify:

Ioan Raicu wrote:
> Hi,
> 
> Michael Wilde wrote:
>> Ioan, can you send/resend/point-me-to definitions of the critical 
>> parameters to control the startup of falkon workers, and review the 
>> attached file for anything I'm doing stupid here?
>>
>> Can you review/improve/fill in my comments with more (correct) details?
>>
>> Thanks,
>>
>> Mike
>>
>>
>>
>> #Provisioner config file
>> #KEY=VALUE
>> #if multiple lines have the same key, the previous value will be
>> #overwritten with the new value
>> #all paths are relative
>>
>> #resources numbers
>> MinNumExecutors=0      # min # of exec threads to keep extant
>> MaxNumExecutors=250    # max # of exec threads to allow extant
>> ExecutorsPerHost=2     # # of exec threads to run on each host
>>
>> #resources times
>> MinResourceAllocationTime_min=60   # ??? re-assess allocations
>> MaxResourceAllocationTime_min=60   # every this-many seconds? ???
>>                                    # if so, why upper and lower settings?
> This is the time that GRAM4 sets the max wall clock time to.  There is a 
> lower and upper bound, but in reality I am not doing anything with both 
> bounds, only one. In the future, the provisioner could make smarter 
> allocation requests in terms of time based on the workload 
> characteristics (not just number of jobs), and hence the existence of a 
> lower and upper bound.

Since you didn't say which one is in use, I'll assume it's best to set 
both to the same value, as is done above. I'm assuming the _min suffix 
means these times are in minutes.
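A minimal Python sketch of how a KEY=VALUE file like the one above could be read, with later keys overwriting earlier ones as the file header says. The function names are hypothetical (this is not Falkon's own parser); treating `#` as the start of an inline comment is an assumption, and so is rejecting mismatched Min/Max allocation times, since it is not stated which bound GRAM4 actually uses.

```python
def parse_config(text: str) -> dict:
    """Parse KEY=VALUE lines; a repeated key overwrites the earlier value."""
    config = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment anywhere on the line.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()  # last occurrence wins
    return config


def walltime_minutes(config: dict) -> int:
    """Return the GRAM4 max wall clock time in minutes (the _min suffix)."""
    low = int(config["MinResourceAllocationTime_min"])
    high = int(config["MaxResourceAllocationTime_min"])
    if low != high:
        # Only one bound is used, and which one is not stated here.
        raise ValueError(f"ambiguous allocation time: min={low}, max={high}")
    return high


sample = """
MinResourceAllocationTime_min=30
MinResourceAllocationTime_min=60   # overrides the line above
MaxResourceAllocationTime_min=60
"""
print(walltime_minutes(parse_config(sample)))  # 60
```

Setting both keys to the same value, as Mike does, sidesteps the question of which bound is honored.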

>>
>> #resources types
>> HostType=any
>> #HostType=ia32_compute
>> #HostType=ia64_compute
>>
>> #allocation strategies            # please explain these

This is a little muddy.  Is this how it works: these allocation 
strategies are used when the service looks at the queue (every few 
seconds?), sees that there is work to do, and allocates that many 
workers via new jobs, using the AllocationStrategy?

I.e., it wakes up, sees that it has 50 jobs to run, sees, e.g., a max 
worker count of 40, says "I need 40 workers at this point in time", and 
then starts that many using the designated strategy?
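Mike's reading of one polling cycle, as a Python sketch. This is his unconfirmed interpretation, not documented Falkon behavior; the function names are hypothetical, and counting only queued tasks (ignoring ones already running) is a simplification.

```python
import math


def workers_to_request(queued_tasks: int, registered_workers: int,
                       min_executors: int, max_executors: int) -> int:
    """Extra workers one cycle would ask for, capped by MaxNumExecutors."""
    target = max(min_executors, min(queued_tasks, max_executors))
    return max(0, target - registered_workers)


def hosts_needed(workers: int, executors_per_host: int) -> int:
    """Nodes to request when each host runs ExecutorsPerHost workers."""
    return math.ceil(workers / executors_per_host)


# 50 queued tasks, a max of 40 workers, none registered yet.
extra = workers_to_request(50, 0, min_executors=0, max_executors=40)
print(extra, hosts_needed(extra, 2))  # 40 20
```

The node count, not the worker count, is what the allocation strategy would then split into GRAM jobs.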

> If you want 20 workers with 2 per node, then it would go like this:
>> #AllocationStrategy=one_at_a_time
> GRAM job #1 of 1 node with 2 workers
> GRAM job #2 of 1 node with 2 workers
> GRAM job #3 of 1 node with 2 workers
> GRAM job #4 of 1 node with 2 workers
> GRAM job #5 of 1 node with 2 workers
> GRAM job #6 of 1 node with 2 workers
> GRAM job #7 of 1 node with 2 workers
> GRAM job #8 of 1 node with 2 workers
> GRAM job #9 of 1 node with 2 workers
> GRAM job #10 of 1 node with 2 workers

I.e., it always runs a GRAM job to get 1 worker node.

> 
>> #AllocationStrategy=additive
> GRAM job #1 of 1 node with 2 workers
> GRAM job #2 of 2 nodes with 2 workers
> GRAM job #3 of 3 nodes with 2 workers
> GRAM job #4 of 4 nodes with 2 workers

I.e., it grows the number of nodes per job by 1 with each job it submits?
E.g., to get 12 nodes it would do 1,2,3,4,2 ???
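Ioan's 20-worker example is 10 nodes at 2 workers per node, and the additive jobs 1+2+3+4 cover exactly that. A Python sketch of the schedule, assuming (as the final 3-node job in the exponential example suggests) that the last job is trimmed to whatever is still needed; under that assumption 12 nodes does come out as 1,2,3,4,2. The function name is hypothetical.

```python
def additive_schedule(total_nodes: int) -> list:
    """Nodes per GRAM job: 1, 2, 3, ..., last job trimmed to the remainder."""
    jobs = []
    size, remaining = 1, total_nodes
    while remaining > 0:
        n = min(size, remaining)
        jobs.append(n)
        remaining -= n
        size += 1  # additive: each job is one node larger than the last
    return jobs


print(additive_schedule(10))  # [1, 2, 3, 4]
print(additive_schedule(12))  # [1, 2, 3, 4, 2]
```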

> 
>> #AllocationStrategy=exponential
> GRAM job #1 of 1 node with 2 workers
> GRAM job #2 of 2 nodes with 2 workers
> GRAM job #3 of 4 nodes with 2 workers
> GRAM job #4 of 3 nodes with 2 workers

Ditto but exponential?
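The same idea with doubling instead of adding one, as a Python sketch. With the same trim-the-last-job assumption it reproduces Ioan's 1, 2, 4, 3 for 10 nodes; the function name is hypothetical.

```python
def exponential_schedule(total_nodes: int) -> list:
    """Nodes per GRAM job: 1, 2, 4, 8, ..., last job trimmed to the remainder."""
    jobs = []
    size, remaining = 1, total_nodes
    while remaining > 0:
        n = min(size, remaining)
        jobs.append(n)
        remaining -= n
        size *= 2  # exponential: each job doubles the previous request
    return jobs


print(exponential_schedule(10))  # [1, 2, 4, 3]
print(exponential_schedule(12))  # [1, 2, 4, 5]
```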

> 
> #AllocationStrategy=all_at_a_time
> GRAM job #1 of 10 nodes with 2 workers

I.e., every time it needs nodes, it asks for all the nodes it needs with 
one job?

So you would use these to tune the worker requests to what you know 
about a given site's scheduling policy?  I.e., if the site favors jobs that 
ask for lots of nodes at once, use "all_at_a_time", but if that would 
exceed a limit, use one of the other strategies?
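The two remaining strategies, plus Mike's proposed rule of thumb, as a Python sketch. `site_max_nodes_per_job` stands in for a site scheduling limit and, like `pick_strategy` itself, is hypothetical; nothing in the provisioner config sets it.

```python
def one_at_a_time(total_nodes: int) -> list:
    """One single-node GRAM job per node needed."""
    return [1] * total_nodes


def all_at_a_time(total_nodes: int) -> list:
    """One GRAM job asking for every node at once."""
    return [total_nodes] if total_nodes > 0 else []


def pick_strategy(total_nodes: int, site_max_nodes_per_job: int):
    """Prefer one big job unless it would exceed the site's per-job limit."""
    if total_nodes <= site_max_nodes_per_job:
        return "all_at_a_time", all_at_a_time(total_nodes)
    return "one_at_a_time", one_at_a_time(total_nodes)


def describe(jobs: list, workers_per_node: int) -> list:
    return [f"GRAM job #{i} of {n} node(s) with {workers_per_node} workers per node"
            for i, n in enumerate(jobs, start=1)]


name, jobs = pick_strategy(10, site_max_nodes_per_job=64)
print(name)                     # all_at_a_time
print(describe(jobs, 2)[0])     # GRAM job #1 of 10 node(s) with 2 workers per node
print(pick_strategy(10, 4)[0])  # one_at_a_time
```

A real policy might fall back to additive or exponential instead of one_at_a_time; the choice here is only for illustration.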

> 
>> AllocationStrategy=additive
>> MinNumHostsPerAllocation=10       # get at least this many nodes per
>>                                   # alloc job?
>>                                   # (doesn't match what I see)
> This is not implemented yet.  The current MinNumHostsPerAllocation is 
> set to 1.  This feature shouldn't be hard to implement; I just 
> haven't had time to do it.
>> MaxNumHostsPerAllocation=100
> This is also not implemented yet.

I don't understand the explanation of this.
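Since Ioan says neither bound is implemented yet (the minimum is effectively fixed at 1), setting MinNumHostsPerAllocation=10 has no visible effect. A hypothetical Python sketch of how the two bounds could shape the additive schedule once implemented; applying them per job and never requesting more nodes than are still needed are assumptions.

```python
def bounded_additive(total_nodes: int, min_hosts: int = 1, max_hosts=None) -> list:
    """Additive schedule with each job clamped to [min_hosts, max_hosts]."""
    jobs = []
    size, remaining = 1, total_nodes
    while remaining > 0:
        n = max(size, min_hosts)       # MinNumHostsPerAllocation
        if max_hosts is not None:
            n = min(n, max_hosts)      # MaxNumHostsPerAllocation
        n = min(n, remaining)          # assumption: never over-request
        jobs.append(n)
        remaining -= n
        size += 1
    return jobs


print(bounded_additive(10))                # [1, 2, 3, 4]  (min of 1, max unset)
print(bounded_additive(30, min_hosts=10))  # [10, 10, 10]
print(bounded_additive(30, max_hosts=5))   # [1, 2, 3, 4, 5, 5, 5, 5]
```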

>>
>> #de-allocation strategies, 0 means never de-allocate due to idle time
>> DeAllocationIdleTime_sec=300000
>> # ^^^^ in msec 300,000 = 300 secs = 5 min  # Seems to work well.
>>                                    # But I see a few stragglers that
>>                                    # linger much longer (did last week)
> Did you see them in the Falkon logs?  

No.

> Probably not.  Did you see them in
> showq/qstat (PBS monitoring tools)?  Probably yes. 

Yes.

- Mike

> If the first answer
> is no, and the second is yes, then it has to do with the fact that there 
> is no coordination among the workers when they de-allocate.  This is OK 
> as long as each worker is allocated in a separate job (i.e. 
> #AllocationStrategy=one_at_a_time).  However, all the other strategies 
> do allocate multiple workers per GRAM job, and hence the problem that 
> you are seeing arises.  Let me give you an example.
> The timeline is as follows:
> Time 0: Task 1 submitted, 20 min long
> Time 0: Task 2 submitted, 20 min long
> Time 0: Task 3 submitted, 20 min long
> Time 0: GRAM job allocates 2 workers for 60 min, with a 5 min idle time
> Time 0: Worker 1 receives task 1
> Time 0: Worker 2 receives task 2
> Time 20: Worker 1 completes task 1
> Time 20: Worker 2 completes task 2
> Time 20: Worker 1 receives task 3
> Time 25: Worker 2 de-allocates itself due to 5 min idle time reached
> Time 40: Worker 1 completes task 3
> Time 45: Worker 1 de-allocates itself due to 5 min idle time reached
> Time 45: GRAM completes and resources are returned to the LRM pool
> 
> Note that GRAM only completed when all workers exited.  Although Worker 
> 2 de-allocated itself from Falkon at time 25, it only got released back 
> into the LRM resource pool at time 45, when Worker 1 also exited.  
> The only solution to this problem is to either 1) have a centralized 
> control over the workers, which would know what workers were allocated 
> together, and hence must be de-allocated together, or 2) have some 
> coordination among the workers so they only de-allocate when they are 
> all ready to de-allocate. 
> One artifact of this is that for large runs that vary in the number of 
> resources needed, the resources can become quite fragmented, and hence 
> the number of workers registered with Falkon can be less than the 
> resources actually reserved from GRAM/PBS.
> A short term solution is to either use the 
> #AllocationStrategy=one_at_a_time, or set the idle time to 0, which 
> would mean that the workers will only de-register when the lease is up, 
> which will be the same for all the workers, and hence this problem would 
> not appear.
> 
> Ioan
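A Python sketch that replays Ioan's timeline: one GRAM job holding two workers, a 60-minute lease, and three 20-minute tasks queued at time 0. It assumes the earliest-free worker takes the next task, that each worker exits `idle_min` minutes after its last task (or at the lease end when `idle_min` is 0), that all tasks fit within the lease, and that the GRAM job holds its nodes until the last worker exits. The function name is hypothetical.

```python
def simulate(task_minutes, workers, lease_min, idle_min):
    """Return (per-worker exit times, time the GRAM job releases its nodes)."""
    free_at = [0] * workers
    for duration in task_minutes:          # all tasks queued at t=0
        w = free_at.index(min(free_at))    # earliest-free worker takes it
        free_at[w] += duration
    if idle_min == 0:                      # 0 = never de-allocate when idle
        exits = [lease_min] * workers
    else:
        exits = [min(t + idle_min, lease_min) for t in free_at]
    return exits, max(exits)


exits, released = simulate([20, 20, 20], workers=2, lease_min=60, idle_min=5)
print(exits, released)                   # [45, 25] 45
print(sum(released - t for t in exits))  # 20
print(simulate([20, 20, 20], 2, 60, 0))  # ([60, 60], 60)
```

The 20 worker-minutes are the slot Worker 2 keeps reserved in PBS after leaving Falkon. With the idle time set to 0, both workers leave together at the lease end, which is the short-term fix described above.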
>>
>> #Falkon information
>> FalkonServiceURI=http://tg-viz-login1.uc.teragrid.org:50011/wsrf/services/GenericPortal/core/WS/GPFactoryService
>> #FalkonServiceURI=http://viper.uchicago.edu:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService
>> EPR_FileName=WorkerEPR.txt
>> FalkonStatePollTime_sec=15
>>
>> #GRAM4 details
>> GRAM4_Location=tg-grid1.uc.teragrid.org
>> GRAM4_FactoryType=PBS
>> #GRAM4_FactoryType=FORK
>> #GRAM4_FactoryType=LSF
>> #GRAM4_FactoryType=CONDOR
>>
>> #project accounting information
>> Project=TG-STA040017N
>> #Project=default
>>
>> #Executor script
>> ExecutorScript=run.worker.sh
>>
>> #Security Descriptor File
>> SecurityFile=etc/client-security-config.xml
>>
>> #logging
>> DRP_Log=logs/drp-status.txt
>>
>> #enable debug statements
>> #DEBUG=true
>> DEBUG=false
>> DIPERF=false
>> #DIPERF=true
>>
>>
>>
>>
>>
>> -------- Original Message --------
>> Subject: PBS JOB 1512406.tg-master.uc.teragrid.org
>> Date: Wed,  5 Sep 2007 14:46:17 -0500 (CDT)
>> From: adm at tg-master.uc.teragrid.org (root)
>> To: wilde at tg-grid1.uc.teragrid.org
>>
>> PBS Job Id: 1512406.tg-master.uc.teragrid.org
>> Job Name:   STDIN
>> An error has occurred processing your job, see below.
>> Post job file processing error; job 1512406.tg-master.uc.teragrid.org on
>> host tg-v082/0+tg-v076/0+tg-v053/0+tg-v040/0+tg-v034/0
>> Unknown resource type  REJHOST=tg-v082.uc.teragrid.org MSG=invalid home directory
>> '/home/wilde' specified, errno=2 (No such file or directory)