[Swift-devel] sites.xml for ranger sge coasters

Ketan Maheshwari ketancmaheshwari at gmail.com
Fri Mar 9 17:20:31 CST 2012


Glen,

Could you tell me if you were able to get through the ranger queue today or
yesterday? I have been stuck in the queue for a day and a half now.

Thanks,
Ketan

On Fri, Mar 9, 2012 at 3:59 PM, Glen Hocky <hockyg at uchicago.edu> wrote:

> David, Mike
> I'm now in a position to verify, again, whether this is working
> correctly.
>
> I wanted my new swift LAMMPS scripts to run 1 task per node using 16
> cores. This sites file seems to do that correctly, i.e., it appears that
> with David's change only one coaster is started per node (and one job is
> run per coaster). In principle I should be able to test packing different
> numbers of jobs in other ways.
>
> -Glen
>
>       <pool handle="ranger">
>           <execution jobmanager="local:sge" provider="coaster" url="none"/>
>           <profile namespace="globus" key="maxWallTime">00:29:00</profile>
>           <profile namespace="globus" key="maxTime">3600</profile>
>           <profile key="jobsPerNode" namespace="globus">1</profile>
>           <profile key="coresPerNode" namespace="globus">16</profile>
>           <profile key="slots" namespace="globus">200</profile>
>           <profile key="nodeGranularity" namespace="globus">1</profile>
>           <profile key="pe" namespace="globus">16way</profile>
>           <profile key="maxNodes" namespace="globus">1</profile>
>           <profile key="queue" namespace="globus">development</profile>
>           <profile key="jobThrottle" namespace="karajan">1.99</profile>
>           <profile key="initialScore" namespace="karajan">10000</profile>
>           <profile namespace="globus" key="project">TG-CHE110004</profile>
>           <scratch>/scratch/01021/hockyg/glass-lammps-runs</scratch>
>           <workdirectory>/share/home/01021/hockyg/reichman/glassy_dynamics/code/swift_lammps/run/test/swiftwork</workdirectory>
>           <filesystem provider="local" url="none" />
>         </pool>
>
>
> On Thu, Mar 8, 2012 at 10:21 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>
>> David, thanks for addressing this problem.  Does it affect any of the
>> other local providers: pbs, condor, (sge), cobalt?
>>
>> (I need to do some cobalt runs on Eureka for a user today, so I hope that
>> provider is OK).
>>
>> You should describe the issue and fix on swift-devel.
>>
>> We should start a convention where we can document known issues for
>> releases, so that users don't have to discover these bugs on their own.  Can
>> you make an action item to propose and start such a place (probably
>> crosslinked to both Downloads and Documentation). Not urgent for today, but
>> next week would be good.
>>
>> Thanks,
>>
>> - Mike
>>
>>
>>
>> ----- Original Message -----
>> > From: "David Kelly" <davidk at ci.uchicago.edu>
>> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
>> > Cc: "Michael Wilde" <wilde at mcs.anl.gov>
>> > Sent: Thursday, March 8, 2012 8:45:00 AM
>> > Subject: Re: sites.xml for ranger sge coasters
>> > 0.93 is frozen, but I committed the same change to 0.93.1 this
>> > morning.
>> >
>> > ----- Original Message -----
>> > > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
>> > > To: "David Kelly" <davidk at ci.uchicago.edu>
>> > > Cc: "Michael Wilde" <wilde at mcs.anl.gov>
>> > > Sent: Wednesday, March 7, 2012 6:39:09 PM
>> > > Subject: Re: sites.xml for ranger sge coasters
>> > > Is it committed in 0.93 too?
>> > >
>> > >
>> > > On Wed, Mar 7, 2012 at 6:10 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
>> > >
>> > >
>> > > I submitted a fix to trunk for the SGE provider. The submit script was
>> > > wrong - it started one worker per core, rather than one worker per host.
>> > > (Oddly, it's been like that for years without anybody noticing.) I ran a
>> > > few sleep/hostname tests and it seems to be working. Can you please give
>> > > it a try?
>> > >
>> > > Below is the sites.xml I used for my test:
>> > >
>> > > <config>
>> > > <pool handle="ranger">
>> > > <execution jobmanager="local:sge" provider="coaster" url="none"/>
>> > >
>> > > <filesystem provider="local" url="none" />
>> > > <profile namespace="globus" key="maxWallTime">5</profile>
>> > > <profile namespace="globus" key="maxTime">600</profile>
>> > > <profile key="jobsPerNode" namespace="globus">16</profile>
>> > > <profile key="slots" namespace="globus">1</profile>
>> > > <profile key="nodeGranularity" namespace="globus">3</profile>
>> > > <profile key="pe" namespace="globus">16way</profile>
>> > > <profile key="maxNodes" namespace="globus">3</profile>
>> > > <profile key="queue" namespace="globus">development</profile>
>> > > <profile key="jobThrottle" namespace="karajan">0.4799</profile>
>> > > <profile key="initialScore" namespace="karajan">10000</profile>
>> > > <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> > > <workdirectory>/share/home/01503/davidkel/swiftwork</workdirectory>
>> > > </pool>
>> > > </config>
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > Ketan
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>
>
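
For anyone following along, here is a rough sketch of the worker arithmetic
behind David's fix, using the values from his test sites.xml quoted above. It
is only an illustration: globus_profiles and workers_started are hypothetical
helpers (not part of Swift), and reading "16way" as 16 cores per host is an
assumption on my part.

```python
import xml.etree.ElementTree as ET

# Subset of the profile entries from David's test sites.xml in this thread.
SITES_XML = """
<config>
  <pool handle="ranger">
    <execution jobmanager="local:sge" provider="coaster" url="none"/>
    <profile key="jobsPerNode" namespace="globus">16</profile>
    <profile key="pe" namespace="globus">16way</profile>
    <profile key="maxNodes" namespace="globus">3</profile>
  </pool>
</config>
"""


def globus_profiles(xml_text, handle):
    """Return the globus-namespace profile entries of one pool as a dict."""
    root = ET.fromstring(xml_text)
    for pool in root.iter("pool"):
        if pool.get("handle") == handle:
            return {
                p.get("key"): (p.text or "").strip()
                for p in pool.findall("profile")
                if p.get("namespace") == "globus"
            }
    raise KeyError(handle)


def workers_started(nodes, cores_per_node, per_core):
    """Workers launched: one per core (old script) or one per host (fixed)."""
    return nodes * cores_per_node if per_core else nodes


profiles = globus_profiles(SITES_XML, "ranger")
nodes = int(profiles["maxNodes"])
# Assumption: the "16way" parallel environment means 16 cores per host.
cores = int(profiles["pe"].replace("way", ""))

before = workers_started(nodes, cores, per_core=True)
after = workers_started(nodes, cores, per_core=False)
print(f"workers before fix: {before}, after fix: {after}")
```

With these values the old submit script would have started 48 workers on 3
hosts, and the fixed one starts 3.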


-- 
Ketan