[Swift-user] Using > 1 CPU per compute node under GRAM
Michael Wilde
wilde at mcs.anl.gov
Mon Jul 21 18:10:45 CDT 2008
I'm asking this on behalf of Mike Kubal while I wait for more info on his
settings:
Mike is running under Swift on TeraGrid/Abe, which has 8-core nodes. His
jobs are all running one job per node, wasting 7 cores on each node.
I am waiting to hear if he is running on WS-GRAM or pre-WS-GRAM.
In the meantime, does anyone know if there's a way to specify
compute-node sharing between separate single-CPU jobs under both GRAMs?
And does this depend on the local job manager code or settings (i.e.,
might it work on some sites but not others)?
On the Globus doc page:
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html#r-wsgram-extensions-constructs-nodes
I see:
<!-- *OR* an explicit number of processes per node... -->
<processesPerHost>...</processesPerHost>
</resourceAllocationGroup>
</extensions>
but I can't tell if this applies to single-core jobs or only to multi-core
jobs.
This will ideally be handled as desired by Falkon or Coaster, but in the
meantime I was hoping there was a simple setting to give MikeK better
CPU yield on Abe.
- Mike Wilde
---
A sample of one of his jobs looks like this under qstat -ef:
Job Id: 395980.abem5.ncsa.uiuc.edu
    Job_Name = STDIN
    Job_Owner = mkubal at abe1196
    job_state = Q
    queue = normal
    server = abem5.ncsa.uiuc.edu
    Account_Name = onm
    Checkpoint = u
    ctime = Mon Jul 21 17:43:47 2008
    Error_Path = abe1196:/dev/null
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = n
    mtime = Mon Jul 21 17:43:47 2008
    Output_Path = abe1196:/dev/null
    Priority = 0
    qtime = Mon Jul 21 17:43:47 2008
    Rerunable = True
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.walltime = 00:10:00
    Shell_Path_List = /bin/sh
    etime = Mon Jul 21 17:43:47 2008
    submit_args = -A onm /tmp/.pbs_mkubal_21430/STDIN
And his jobs show up like this under qstat -n (i.e., they are all on core /0):
395653.abem5.ncsa.ui mkubal normal STDIN 1767 1 1 -- 00:10 R --
abe0872/0
While multi-core jobs use
+abe0582/2+abe0582/1+abe0582/0+abe0579/7+abe0579/6+abe0579/5+abe0579/4
+abe0579/3+abe0579/2+abe0579/1+abe0579/0
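
To make the difference between the two host lists easier to see, here is a minimal Python sketch that counts how many core slots each node contributes to a job. It assumes the host list is a "+"-separated string of node/core entries, as in the qstat -n output above. The function name cores_per_node is hypothetical and is not part of Swift, GRAM, or PBS.

```python
from collections import Counter


def cores_per_node(exec_host):
    """Count core slots per node in a '+'-separated node/core host list.

    Assumes entries look like 'abe0579/7' (node name, slash, core index),
    as in the qstat -n output quoted in this message.
    """
    counts = Counter()
    # Remove line wraps, then split on '+'; a leading '+' yields an
    # empty first entry, which is skipped below.
    for slot in exec_host.replace("\n", "").split("+"):
        slot = slot.strip()
        if not slot:
            continue
        node, _, _core = slot.partition("/")
        counts[node] += 1
    return dict(counts)


single = "abe0872/0"
multi = (
    "+abe0582/2+abe0582/1+abe0582/0+abe0579/7+abe0579/6+abe0579/5+abe0579/4"
    "+abe0579/3+abe0579/2+abe0579/1+abe0579/0"
)

print(cores_per_node(single))  # {'abe0872': 1}
print(cores_per_node(multi))   # {'abe0582': 3, 'abe0579': 8}
```

For the single-core job this reports one slot on abe0872, while the multi-core job fills all 8 cores of abe0579 plus 3 on abe0582.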