[Swift-user] Re: [SWIFT problem]Failed to transfer wrapper log on PBS

Michael Wilde wilde at mcs.anl.gov
Tue Apr 5 10:14:11 CDT 2011


Hi Weiyang,

I'm cc'ing this to swift-user, where you should send all questions, so that other Swift developers and users can offer help as well, and all users can learn from the answers.

It's hard for me to debug this without seeing your .swift script and your log file.

The error message means: Swift tried to run a batch job to execute an app() call, and the attempt did not even return the per-job log file (the "wrapper log") from the execution site (i.e., the site you defined in your pool entry).

First, please comment out the <scratch> tag. That's an optimization and may be confusing things (or may even be the cause of the error, but that is less likely).

The most likely cause of the problem here is that your application "touch" is listed in your tc.data under the wrong pathname: touch is /bin/touch, not /usr/bin/touch.
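
One way to confirm the pathname is to check which candidate locations actually hold an executable copy of the program. The sketch below is only illustrative: check_app_path is a hypothetical helper (not part of Swift), the list of directories is an assumption, and it should be run on the host where PBS executes the jobs.

```shell
# check_app_path is a hypothetical helper, not part of Swift.
# It prints every candidate path that holds an executable copy of the
# named program. The directory list is an assumption; adjust as needed.
check_app_path() {
    app="$1"
    for dir in /bin /usr/bin /usr/local/bin; do
        if [ -x "$dir/$app" ]; then
            echo "$dir/$app"
        fi
    done
}

# Check the application from this thread.
check_app_path touch
```

Whatever path this prints is the one the touch line in tc.data should list in its third column.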

The best way to get a sense of how far your script progressed is to do a full "find" under your swiftwork directory from the directory for the failing run id (test-20110405-0940-rqa4nyka), and see what is there.
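
A minimal sketch of that "find" follows. list_run_dir is a hypothetical helper name; it assumes the run directory sits directly under the work directory, as the path in the error message suggests.

```shell
# list_run_dir is a hypothetical helper, not part of Swift.
# Usage: list_run_dir <workdirectory> <run id>
# Lists everything under one run's directory, or reports that the
# directory does not exist.
list_run_dir() {
    run_dir="$1/$2"
    if [ -d "$run_dir" ]; then
        # Sorted so that shared/, info/ and any job directories group together.
        find "$run_dir" -print | sort
    else
        echo "no run directory: $run_dir"
        return 1
    fi
}
```

For the run in this thread the call would be: list_run_dir /home/frankwang/swiftwork test-20110405-0940-rqa4nyka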

The shared/ directory should contain whatever input files were processed. If you did not use the <scratch> tag, then there should also be a "job directory" for each app Swift tried to run. This is described in the user guide.

Since these were not present, I concluded that Swift could not execute your application.

The Swift team is working to improve both these messages and the documentation for debugging such situations to make this much easier to spot.

Also, I don't know how far your workflow ran the previous time, but this looks like a large run. You should test new workflows (or even any changes you make) on very small runs first, so that there are fewer files and parallel jobs to sort through when debugging new scripts.

- Mike


----- Original Message -----
> Hello,
> 
> 
> My swift codes just encountered new problems: After submitted jobs
> using foreach it's saying
> 
> 
> Failed to transfer wrapper log from test-20110405-0940-rqa4nyka/info/u
> on pbs
> 
> 
> My guess is there're sth wrong with pbs (the execution provider)
> 
> 
> My sites.xml and tc.data is exactly kept the same since last time you
> advised
> 
> 
> sites.xml:
> 
> 
> 
> <config>
> <pool handle="pbs">
> <execution provider="pbs" url="localhost" jobManager="local:pbs"/>
> <profile namespace="globus" key="maxwalltime">1:00:00</profile>
> <profile namespace="globus" key="workersPerNode">8</profile>
> <profile namespace="globus" key="ppn">8</profile>
> <!-- <profile namespace="globus"
> key="internalHostname">172.5.86.6</profile>-->
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
> <profile namespace="karajan" key="jobThrottle">1.99</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <profile namespace="globus" key="project">CI-SES000031</profile>
> <!--gridftp url="local://localhost" />-->
> <filesystem provider="local"/>
> <scratch>/home/frankwang/tmp</scratch>
> <workdirectory>/home/frankwang/swiftwork</workdirectory>
> </pool>
> 
> 
> </config>
> 
> pbs echo /bin/echo INSTALLED INTEL32::LINUX null
> pbs sh /bin/bash INSTALLED INTEL32::LINUX null
> pbs touch /usr/bin/touch INSTALLED INTEL32::LINUX
> GLOBUS::maxwalltime="0:1"
> 
> 
> The same problem occured when I was using various version of swift.
> 
> 
> Can you take a time to figure it out?
> 
> 
> Weiyang

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



