[Swift-user] pbs failure on pads
Zhao Zhang
zhaozhang at uchicago.edu
Tue Mar 2 14:39:59 CST 2010
Hi Mike,
First I tried setting "replication.enabled=false", but the failure still
pops up.
Then I tried setting "maxwalltime" in tc.data; that doesn't solve the
problem either.
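
As a rough check of the advice quoted below (maxtime should be some multiple of the app's maxwalltime), here is a minimal Python sketch using the two values from the pbs.xml in the original message. It assumes maxtime is given in seconds and maxwalltime as HH:MM:SS; the helper name hms_to_seconds is made up for illustration and is not part of Swift.

```python
# Sketch only: compares the coaster block time with the per-app walltime.
# Assumption: maxtime is in seconds, maxwalltime is HH:MM:SS.

def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

maxwalltime = "00:40:00"  # value from the pbs.xml in the quoted message
maxtime = 3600            # value from the pbs.xml in the quoted message

app_seconds = hms_to_seconds(maxwalltime)
ratio = maxtime / app_seconds  # how many app walltimes fit in one block

print(app_seconds, ratio)
```

With these values the block is 1.5 times the app walltime, so it is a multiple only in a loose sense. Whether that is related to the "Block task ended prematurely" error is not established here.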
zhao
Michael Wilde wrote:
> Zhao, I was just debugging a similar problem. Mine turned out to be caused by leaving on replication in swift.properties. Try setting replication.enabled=false in swift.properties, and let us know if that solves it.
>
> You should also set the maxwalltime value in tc.data to the expected runtime of the app, and maxtime in the sites.xml entry to some multiple of that for the coaster block.
>
> - Mike
>
> ----- "Zhao Zhang" <zhaozhang at uchicago.edu> wrote:
>
>
>> Hi,
>>
>> I am seeing the following failure right now on pads using coasters. It
>> fails occasionally and unexpectedly.
>> I am not sure what the following info means; could someone explain it?
>>
>> Thanks
>>
>> [zzhang at login2 final]$ cat pbs.xml
>> <config>
>>
>> <pool handle="pbs">
>> <execution provider="coaster" url="none" jobManager="local:pbs"/>
>> <profile namespace="globus" key="queue">extended</profile>
>>
>> <profile namespace="globus" key="maxtime">3600</profile>
>> <profile namespace="globus" key="maxwalltime">00:40:00</profile>
>>
>> <profile namespace="globus" key="workersPerNode">8</profile>
>> <profile namespace="globus" key="maxnodes">8</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> <profile namespace="karajan" key="jobThrottle">.63</profile>
>>
>> <gridftp url="local://localhost" />
>> <workdirectory >/home/zzhang/swiftwork</workdirectory>
>> </pool>
>> </config>
>>
>>
>> [zzhang at login2 final]$ swift -tc.file tc -sites.file pbs.xml movie.swift
>> Swift svn swift-r3255 (swift modified locally) cog-r2723
>>
>> RunID: 20100302-1151-1tu5u5ac
>> Progress:
>> Progress:
>> Progress:
>> Progress: uninitialized:1
>> Progress: Initializing:16325 Selecting site:58
>> Progress: Selecting site:16382 Initializing site shared directory:1
>> Progress: Selecting site:16319 Stage in:63 Submitting:1
>> Progress: Selecting site:16319 Stage in:46 Submitting:2 Submitted:16
>> Progress: Selecting site:16319 Stage in:13 Submitting:1 Submitted:50
>> Progress: Selecting site:16319 Submitted:63 Active:1
>> Progress: Selecting site:16319 Submitted:60 Active:4
>> Progress: Selecting site:16319 Submitted:45 Active:18 Checking status:1 Finished successfully:3
>> Progress: Selecting site:16319 Submitting:1 Submitted:39 Active:21 Checking status:1 Stage out:2 Finished successfully:13
>> Progress: Selecting site:16319 Stage in:3 Submitted:33 Active:24 Stage out:3 Finished successfully:17
>> Progress: Selecting site:16317 Stage in:3 Submitted:35 Active:18 Checking status:1 Stage out:7 Finished successfully:29
>> Progress: Selecting site:16318 Stage in:11 Submitted:29 Active:24 Finished successfully:42
>> Progress: Selecting site:16318 Stage in:4 Submitted:34 Active:22 Checking status:2 Stage out:2 Finished successfully:47
>> Progress: Selecting site:16316 Stage in:6 Submitted:28 Active:23 Checking status:1 Stage out:6 Finished successfully:55
>> Worker task failed: 0302-521133-000002 Block task ended prematurely
>> ----------------------------------------
>> Begin PBS Prologue Tue Mar 2 11:52:38 CST 2010
>> Job ID: 6870.svc.pads.ci.uchicago.edu
>> Username: zzhang
>> Group: ci-users
>> Nodes:
>> c05.pads.ci.uchicago.edu,c15.pads.ci.uchicago.edu,c42.pads.ci.uchicago.edu,c43.pads.ci.uchicago.edu
>> End PBS Prologue Tue Mar 2 11:52:38 CST 2010
>> ----------------------------------------
>> ----------------------------------------
>> Begin PBS Epilogue Tue Mar 2 11:52:41 CST 2010
>> Job ID: 6870.svc.pads.ci.uchicago.edu
>> Username: zzhang
>> Group: ci-users
>> Job Name: null
>> Session: 7051
>> Limits: nodes=4,walltime=00:59:00
>> Resources: cput=00:00:00,mem=700kb,vmem=8400kb,walltime=00:00:02
>> Nodes:
>> c05.pads.ci.uchicago.edu,c15.pads.ci.uchicago.edu,c42.pads.ci.uchicago.edu,c43.pads.ci.uchicago.edu
>> End PBS Epilogue Tue Mar 2 11:52:41 CST 2010
>>
>> Progress: Selecting site:16316 Stage in:6 Submitted:27 Active:23 Stage out:7 Finished successfully:64 Failed but can retry:1
>> Failed to transfer wrapper log from movie-20100302-1151-1tu5u5ac/info/3 on pbs
>> Execution failed:
>> Exception in transform:
>> Arguments: [training_set/mv_0002679.txt]
>> Host: pbs
>> Directory: movie-20100302-1151-1tu5u5ac/jobs/3/transform-35lvrioj
>> stderr.txt:
>>
>> stdout.txt:
>>
>> ----
>>
>> Caused by:
>> Task failed: 0302-521133-000002 Block task ended prematurely