[Swift-devel] Re: Falkon worker config params?

Ioan Raicu iraicu at cs.uchicago.edu
Wed Sep 5 16:59:13 CDT 2007


Hi,

Michael Wilde wrote:
> Ioan, can you send/resend/point-me-to definitions of the critical 
> parameters to control the startup of falkon workers, and review the 
> attached file for anything I'm doing stupid here?
>
> Can you review/improve/fill in my comments with more (correct) details?
>
> Thanks,
>
> Mike
>
>
>
> #Provisioner config file
> #KEY=VALUE
> #if multiple lines have the same key, the previous value will be overwritten with the new value
> #all paths are relative
>
> #resources numbers
> MinNumExecutors=0      # min # of exec threads to keep extant
> MaxNumExecutors=250    # max # of exec threads to allow extant
> ExecutorsPerHost=2     # # of exec threads to run on each host
>
> #resources times
> MinResourceAllocationTime_min=60   # ??? re-assess allocations
> MaxResourceAllocationTime_min=60   # every this-many seconds? ???
>                                    # if so, why upper and lower settings?
This is the time that the GRAM4 max wall clock time is set to.  There are a 
lower and an upper bound, but in reality only one of them is used right 
now.  In the future, the provisioner could make smarter allocation 
requests in terms of time, based on the workload characteristics (not 
just the number of jobs), which is why both a lower and an upper bound 
exist.
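
For example, a future version could pick the wall time it asks GRAM4 for 
anywhere between the two bounds, based on how much queued work each 
allocation is expected to run; something along these lines (purely a 
sketch of the idea in Python, none of this is implemented today):

def pick_wall_time_min(min_alloc_min, max_alloc_min, est_task_min,
                       queued_tasks, workers_per_alloc):
    """Sketch only (not implemented): choose the GRAM4 max wall clock
    time, in minutes, between the configured lower and upper bounds,
    based on how long this allocation is expected to stay busy."""
    if workers_per_alloc <= 0:
        return max_alloc_min
    estimated_busy_min = est_task_min * queued_tasks / float(workers_per_alloc)
    return int(min(max(estimated_busy_min, min_alloc_min), max_alloc_min))

# e.g. with hypothetical bounds of 60 and 240 minutes, 30 queued 20-min
# tasks, and 10 workers per allocation:
#   pick_wall_time_min(60, 240, 20, 30, 10) -> 60   (20*30/10 = 60 min of work)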
>
> #resources types
> HostType=any
> #HostType=ia32_compute
> #HostType=ia64_compute
>
> #allocation strategies            # please explain these
If you want 20 workers with 2 per node (so 10 nodes total), the strategies 
would allocate like this (see also the sketch after these examples):
> #AllocationStrategy=one_at_a_time
GRAM job #1 of 1 node with 2 workers
GRAM job #2 of 1 node with 2 workers
GRAM job #3 of 1 node with 2 workers
GRAM job #4 of 1 node with 2 workers
GRAM job #5 of 1 node with 2 workers
GRAM job #6 of 1 node with 2 workers
GRAM job #7 of 1 node with 2 workers
GRAM job #8 of 1 node with 2 workers
GRAM job #9 of 1 node with 2 workers
GRAM job #10 of 1 node with 2 workers

> #AllocationStrategy=additive
GRAM job #1 of 1 node with 2 workers
GRAM job #2 of 2 nodes with 2 workers each
GRAM job #3 of 3 nodes with 2 workers each
GRAM job #4 of 4 nodes with 2 workers each

> #AllocationStrategy=exponential
GRAM job #1 of 1 node with 2 workers
GRAM job #2 of 2 nodes with 2 workers each
GRAM job #3 of 4 nodes with 2 workers each
GRAM job #4 of 3 nodes with 2 workers each

#AllocationStrategy=all_at_a_time
GRAM job #1 of 10 nodes with 2 workers each
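
In pseudo-code, the node counts per GRAM job for each strategy could be 
computed like this (this is only an illustrative Python sketch, not the 
actual provisioner code):

def allocation_plan(strategy, nodes_needed):
    """Illustrative sketch only: how many nodes each GRAM job would
    request to reach nodes_needed nodes in total."""
    plan = []
    remaining = nodes_needed
    step = 1
    while remaining > 0:
        if strategy == "one_at_a_time":
            n = 1
        elif strategy == "additive":
            n = step           # 1, 2, 3, 4, ...
            step += 1
        elif strategy == "exponential":
            n = step           # 1, 2, 4, 8, ...
            step *= 2
        elif strategy == "all_at_a_time":
            n = remaining
        else:
            raise ValueError("unknown strategy: " + strategy)
        n = min(n, remaining)  # never ask for more than is still needed
        plan.append(n)
        remaining -= n
    return plan

# 10 nodes total (20 workers with ExecutorsPerHost=2):
#   allocation_plan("one_at_a_time", 10)  -> [1]*10
#   allocation_plan("additive", 10)       -> [1, 2, 3, 4]
#   allocation_plan("exponential", 10)    -> [1, 2, 4, 3]
#   allocation_plan("all_at_a_time", 10)  -> [10]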

> AllocationStrategy=additive
> MinNumHostsPerAllocation=10       # get at least this many nodes per
>                                   # alloc job?
>                                   # (doesn't match what I see)
This is not implemented yet.  The current MinNumHostsPerAllocation is 
set to 1.  This feature shouldn't be hard to implement; I just haven't 
had time to do it.
> MaxNumHostsPerAllocation=100
This is also not implemented yet.
>
> #de-allocation strategies, 0 means never de-allocate due to idle time
> DeAllocationIdleTime_sec=300000
> # ^^^^ in msec 300,000 = 300 secs = 5 min  # Seems to work well.
>                                    # But I see a few stragglers that
>                                    # linger much longer (did last week)
Did you see them in the Falkon logs?  Probably not.  Did you see them in 
showq/qstat (PBS monitoring tools)?  Probably yes.  If the first answer 
is no, and the second is yes, then it has to do with the fact that there 
is no coordination among the workers when they de-allocate.  This is OK 
as long as each worker is allocated in a separate job (i.e. 
#AllocationStrategy=one_at_a_time).  However, all the other strategies 
do allocate multiple workers per GRAM job, and hence the problem that 
you are seeing arises.  Let me give you an example. 

The timeline is as follows:
Time 0: Task 1 submitted, 20 min long
Time 0: Task 2 submitted, 20 min long
Time 0: Task 3 submitted, 20 min long
Time 0: GRAM job allocates 2 workers for 60 min, with a 5 min idle time
Time 0: Worker 1 receives task 1
Time 0: Worker 2 receives task 2
Time 20: Worker 1 completes task 1
Time 20: Worker 2 completes task 2
Time 20: Worker 1 receives task 3
Time 25: Worker 2 de-allocates itself due to 5 min idle time reached
Time 40: Worker 1 completes task 3
Time 45: Worker 1 de-allocates itself due to 5 min idle time reached
Time 45: GRAM completes and resources are returned to the LRM pool

Note that the GRAM job only completed when all workers exited.  Although 
Worker 2 de-allocated itself from Falkon at time 25, its node only got 
released back into the LRM resource pool at time 45, when Worker 1 also 
exited.  The only way to solve this is to either 1) have centralized 
control over the workers, which would know which workers were allocated 
together, and hence must be de-allocated together, or 2) have some 
coordination among the workers so that they only de-allocate when they 
are all ready to de-allocate.
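
For example, option 2) could look roughly like this (just a sketch of the 
idea in Python; nothing like this exists in the current workers):

import threading

class AllocationGroup:
    """Sketch only: workers started by the same GRAM job share one group,
    and only exit once every worker in the group has been idle long enough."""
    def __init__(self, num_workers):
        self.lock = threading.Lock()
        self.idle = set()
        self.num_workers = num_workers

    def report_busy(self, worker_id):
        """A worker calls this when it receives a task."""
        with self.lock:
            self.idle.discard(worker_id)

    def report_idle(self, worker_id):
        """A worker calls this once it has been idle past the idle timeout.
        Returns True only when the whole group can de-allocate together."""
        with self.lock:
            self.idle.add(worker_id)
            return len(self.idle) == self.num_workers

# Each worker would only exit (letting the GRAM job finish and return all
# of its nodes to the LRM) once report_idle() returns True, i.e. once every
# worker from the same GRAM job is ready to de-allocate.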

One artifact of this is that for large runs that vary in the number of 
resources needed, the resources can become quite fragmented, and hence 
the number of workers registered with Falkon can be lower than the 
number of resources actually reserved from GRAM/PBS.

A short-term solution is to either use AllocationStrategy=one_at_a_time, 
or set the idle time to 0, which means the workers only de-register when 
the lease is up; since the lease is the same for all the workers in a 
GRAM job, this problem would not appear.
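
Concretely, in the provisioner config that means either:

AllocationStrategy=one_at_a_time

or:

DeAllocationIdleTime_sec=0   # 0 = never de-allocate due to idle time;
                             # all workers exit together when the lease is up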

Ioan
>
> #Falkon information
> FalkonServiceURI=http://tg-viz-login1.uc.teragrid.org:50011/wsrf/services/GenericPortal/core/WS/GPFactoryService
> #FalkonServiceURI=http://viper.uchicago.edu:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService
> EPR_FileName=WorkerEPR.txt
> FalkonStatePollTime_sec=15
>
> #GRAM4 details
> GRAM4_Location=tg-grid1.uc.teragrid.org
> GRAM4_FactoryType=PBS
> #GRAM4_FactoryType=FORK
> #GRAM4_FactoryType=LSF
> #GRAM4_FactoryType=CONDOR
>
> #project accounting information
> Project=TG-STA040017N
> #Project=default
>
> #Executor script
> ExecutorScript=run.worker.sh
>
> #Security Descriptor File
> SecurityFile=etc/client-security-config.xml
>
> #logging
> DRP_Log=logs/drp-status.txt
>
> #enable debug statements
> #DEBUG=true
> DEBUG=false
> DIPERF=false
> #DIPERF=true
>
>
>
>
>
> -------- Original Message --------
> Subject: PBS JOB 1512406.tg-master.uc.teragrid.org
> Date: Wed,  5 Sep 2007 14:46:17 -0500 (CDT)
> From: adm at tg-master.uc.teragrid.org (root)
> To: wilde at tg-grid1.uc.teragrid.org
>
> PBS Job Id: 1512406.tg-master.uc.teragrid.org
> Job Name:   STDIN
> An error has occurred processing your job, see below.
> Post job file processing error; job 1512406.tg-master.uc.teragrid.org on
> host tg-v082/0+tg-v076/0+tg-v053/0+tg-v040/0+tg-v034/0Unknown resource
> type  REJHOST=tg-v082.uc.teragrid.org MSG=invalid home directory
> '/home/wilde' specified, errno=2 (No such file or directory)
>
>
>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================



