[Swift-devel] Swift 0.92(.1) on Fusion

David Kelly dk0966 at cs.ship.edu
Wed Apr 13 11:21:49 CDT 2011


The problem is related to the walltime of 00:00:00. The shared queue
that I am trying to use requires a walltime of less than one hour.
Running 'checkjob' reveals that there is a policy violation related to
this.

As a test, I edited the PBS submit file. I set the walltime to a
reasonable value and had it echo something. When I manually submitted
it with qsub, it ran right away.
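
For anyone who wants to repeat that manual test, here is a minimal sketch of the edit step. It only rewrites the `#PBS -l walltime=` line in the text of a generated submit file. The function name `set_walltime` and the sample text are made up for illustration; this is not part of Swift or the coaster provider.

```python
import re

# Matches a walltime resource line such as "#PBS -l walltime=00:00:00".
WALLTIME_RE = re.compile(r"^(#PBS[ \t]+-l[ \t]+walltime=)\S+$", re.MULTILINE)


def set_walltime(submit_text, walltime):
    """Return submit_text with its '#PBS -l walltime=' value replaced.

    Hypothetical helper for illustration only.
    """
    if not re.fullmatch(r"\d{2,}:\d{2}:\d{2}", walltime):
        raise ValueError("walltime must look like HH:MM:SS")
    new_text, count = WALLTIME_RE.subn(
        lambda m: m.group(1) + walltime, submit_text
    )
    if count == 0:
        raise ValueError("no '#PBS -l walltime=' line found")
    return new_text


# Sample text modeled on the generated submit file quoted later in this thread.
example = """#PBS -S /bin/bash
#PBS -l nodes=1
#PBS -l walltime=00:00:00
#PBS -q shared
"""
patched = set_walltime(example, "00:10:00")
```

The patched text can then be written back to a file and submitted by hand with qsub, as described above.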

How can I get a walltime value to be created in the PBS files? I have
maxtime specified in sites.xml and I have a maxwalltime for all my
applications. I tried this with the 0.92 branch as well as with trunk,
with no luck. Switching to the batch queue might be one workaround.
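
I do not know how maxtime is actually turned into the walltime string. Purely as a guess, the sketch below shows one way a small value could come out as 00:00:00. It assumes that maxtime is given in seconds and that the value is truncated to whole minutes somewhere before the submit file is written. Neither assumption is confirmed in this thread, and `format_walltime` is a made-up name, not a Swift function.

```python
def format_walltime(seconds, truncate_to_minutes=False):
    """Format a duration in seconds as HH:MM:SS.

    Hypothetical helper: truncate_to_minutes models the ASSUMED
    rounding-down to whole minutes; it is not taken from Swift's source.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if truncate_to_minutes:
        # Drop any partial minute, e.g. 10 s -> 0 s, 1000 s -> 960 s.
        seconds = (seconds // 60) * 60
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


without_truncation = format_walltime(10)
with_truncation = format_walltime(10, truncate_to_minutes=True)
```

Under those assumptions a maxtime of 10 would be written as 00:00:00, while the cookbook's 1000 would still give a non-zero walltime.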

Regards,
David

On Wed, Apr 13, 2011 at 10:11 AM, Justin M Wozniak <wozniak at mcs.anl.gov> wrote:
>
> Thanks for digging into this. Can you try this again from trunk?  A Parvis
> developer and I were able to run successfully there.  (The queues are very
> long.)
>
> One thing I did when working on Fusion was cut out the generated submit
> files and qsub them myself to verify that the #PBS settings did actually
> work for me; you may want to try that too.
>
>        Justin
>
> On Wed, 13 Apr 2011, David Kelly wrote:
>
>> Hello,
>>
>> Recently when I try to run Swift on Fusion, my job never seems to
>> execute. I have emailed Fusion support about this (ticket #70175) but
>> thought it may also be useful to send to the list. I am trying to run
>> the catsn.swift script for testing. I can see it in qstat. The
>> sites.xml is based on the config listed in the Fusion cookbook, with a
>> few small changes. I added an internalHostname entry and set it to the
>> IP address attached to the Infiniband device. I also lowered the
>> maxtime from 1000 to 10. The Fusion cookbook says "Set MAXTIME as in
>> qsub walltime. This is on a per-allocation basis and should be at
>> least 20% larger than your longest task". I am not sure how maxtime
>> relates to walltime exactly, but the walltime value in the PBS file
>> gets set to 00:00:00. I am not sure if this matters or not.
>>
>> I have also attached a compressed log file and the actual swift script
>> I'm trying to run.
>>
>> Thanks,
>> David
>>
>> $ swift -version
>> Swift svn swift-r4076 cog-r3049
>>
>> qstat:
>> 541724.fmgt2.l davidk   shared   Block-0412    --    1    1    --  00:00 Q   --
>>
>> sites.xml:
>> <config>
>> <pool handle="fusion">
>>  <execution jobmanager="local:pbs" provider="coaster" url="none"/>
>>  <filesystem provider="local" url="none" />
>>  <profile namespace="globus" key="internalHostname">192.168.71.81</profile>
>>  <profile namespace="globus" key="maxtime">10</profile>
>>  <profile namespace="globus" key="workersPerNode">1</profile>
>>  <profile namespace="globus" key="slots">1</profile>
>>  <profile namespace="globus" key="nodeGranularity">1</profile>
>>  <profile namespace="globus" key="maxNodes">2</profile>
>>  <profile namespace="globus" key="queue">shared</profile>
>>  <profile namespace="karajan" key="jobThrottle">5.99</profile>
>>  <profile namespace="karajan" key="initialScore">10000</profile>
>>  <workdirectory>/home/davidk/swiftwork</workdirectory>
>> </pool>
>> </config>
>>
>> PBS submission file:
>> #PBS -S /bin/bash
>> #PBS -N Block-0412-211041-000000
>> #PBS -m n
>> #PBS -l nodes=1
>> #PBS -l walltime=00:00:00
>> #PBS -q shared
>> #PBS -o /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stdout
>> #PBS -e /homes/davidk/.globus/scripts/PBS1298937999826083605.submit.stderr
>> WORKER_LOGGING_LEVEL=NONE
>> #PBS -v WORKER_LOGGING_LEVEL
>> cd / && /usr/bin/perl /homes/davidk/.globus/coasters/cscript1716491648595514240.pl http://192.168.71.81:46584 0412-211041-000000 NOLOGGING
>> /bin/echo $? >/homes/davidk/.globus/scripts/PBS1298937999826083605.submit.exitcode
>
> --
> Justin M Wozniak
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tc.data
Type: application/octet-stream
Size: 562 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20110413/dee63f04/attachment.obj>

