[Swift-user] Using > 1 CPU per compute node under GRAM

Mike Kubal mikekubal at yahoo.com
Mon Jul 21 22:42:22 CDT 2008


I'm using pre-WS-GRAM. 

MikeK


--- On Mon, 7/21/08, Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
From: Ioan Raicu <iraicu at cs.uchicago.edu>
Subject: Re: [Swift-user] Using > 1 CPU per compute node under GRAM
To: "Michael Wilde" <wilde at mcs.anl.gov>
Cc: "Swift User Discussion List" <swift-user at ci.uchicago.edu>, "Stu Martin" <smartin at mcs.anl.gov>, "Martin Feller" <feller at mcs.anl.gov>, "JP Navarro" <navarro at mcs.anl.gov>, "Mike Kubal" <mikekubal at yahoo.com>
Date: Monday, July 21, 2008, 6:57 PM

In the past (i.e., MolDyn), I don't think we ever found an easy solution 
to this when running straight through GRAM (if the LRM didn't support 
this policy). But, as JP said, it is site-specific, so some sites, such 
as Teraport, will allow jobs to be allocated just 1 CPU per node, in 
which case GRAM should work just fine.

Ioan

Michael Wilde wrote:
> I'm asking this on behalf of Mike Kubal while I wait for more info on 
> his settings:
>
> Mike is running under Swift on TeraGrid/Abe, which has 8-core nodes. 
> His jobs are all running one job per node, wasting 7 cores.
>
> I am waiting to hear if he is running on WS-GRAM or pre-WS-GRAM.
>
> In the meantime, does anyone know if there's a way to specify 
> compute-node-sharing between separate single-CPU jobs via both GRAMs?
>
> And is this dependent on the local job manager code or settings 
> (i.e., it might work on some sites but not others)?
>
> On the Globus doc page:
>
> http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html#r-wsgram-extensions-constructs-nodes
>
> I see:
>         <!-- *OR* an explicit number of processes per node... -->
>         <processesPerHost>...</processesPerHost>
>         </resourceAllocationGroup>
>         </extensions>
> but I can't tell whether this applies to single-core jobs or only to 
> multi-core jobs.
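>
> A rough, untested sketch of how that construct might be filled in for 
> an 8-core Abe node is below. The values are hypothetical, and apart 
> from resourceAllocationGroup and processesPerHost (which appear in the 
> excerpt above) the element names are just my reading of the linked 
> page, so they should be verified against it:
>
>         <extensions>
>           <resourceAllocationGroup>
>             <!-- number of nodes to allocate (assumed element name) -->
>             <hostCount>1</hostCount>
>             <!-- total processes in this group (assumed element name) -->
>             <processCount>8</processCount>
>             <!-- pack 8 processes onto each 8-core node -->
>             <processesPerHost>8</processesPerHost>
>           </resourceAllocationGroup>
>         </extensions>
>
> Note that this seems to describe one job that runs 8 processes; it's 
> not clear it would make the LRM share a node between separate 
> single-CPU jobs.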
>
> This will ideally be handled by Falkon or Coaster, but in the 
> meantime I was hoping there was a simple setting to give MikeK 
> better CPU yield on Abe.
>
> - Mike Wilde
>
> ---
>
> A sample of one of his jobs looks like this under qstat -ef:
>
> Job Id: 395980.abem5.ncsa.uiuc.edu
>     Job_Name = STDIN
>     Job_Owner = mkubal at abe1196
>     job_state = Q
>     queue = normal
>     server = abem5.ncsa.uiuc.edu
>     Account_Name = onm
>     Checkpoint = u
>     ctime = Mon Jul 21 17:43:47 2008
>     Error_Path = abe1196:/dev/null
>     Hold_Types = n
>     Join_Path = n
>     Keep_Files = n
>     Mail_Points = n
>     mtime = Mon Jul 21 17:43:47 2008
>     Output_Path = abe1196:/dev/null
>     Priority = 0
>     qtime = Mon Jul 21 17:43:47 2008
>     Rerunable = True
>     Resource_List.ncpus = 1
>     Resource_List.nodect = 1
>     Resource_List.nodes = 1
>     Resource_List.walltime = 00:10:00
>     Shell_Path_List = /bin/sh
>     etime = Mon Jul 21 17:43:47 2008
>     submit_args = -A onm /tmp/.pbs_mkubal_21430/STDIN
>
> And his jobs show up like this under qstat -n (i.e., they are all on 
> core /0):
>
> 395653.abem5.ncsa.ui mkubal   normal   STDIN        1767     1   1    
> --  00:10 R   --
>    abe0872/0
>
> While multi-core jobs use allocations like:
>
> +abe0582/2+abe0582/1+abe0582/0+abe0579/7+abe0579/6+abe0579/5+abe0579/4
>    +abe0579/3+abe0579/2+abe0579/1+abe0579/0
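>
> For comparison, a single job that requested a whole 8-core node would 
> presumably show a resource list more like the following (illustrative 
> values only; whether the scheduler will instead pack separate 
> single-CPU jobs onto one node is a local policy question, as noted 
> above):
>
>     Resource_List.nodes = 1:ppn=8
>     Resource_List.walltime = 00:10:00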
>

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================


      