[Swift-user] Re: [SWIFT problem]Failed to transfer wrapper log on PBS

Ketan Maheshwari ketancmaheshwari at gmail.com
Tue Apr 5 13:02:48 CDT 2011


Hi Weiyang,

Are you, by any chance, running from a non-Lustre filesystem?

I ran into the same error this morning, and after several trials I isolated the cause: it occurs when running Swift on Beagle from a non-Lustre filesystem.

The problem disappears when running from the /lustre filesystem.

This suggests that file transfer commands fail under PBS when running from a non-Lustre filesystem. I can't tell whether this started recently or was always the case, because today is the first time I remember running from a non-Lustre filesystem.
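If you are not sure which filesystem your run directory lives on, the short Python sketch below reports the filesystem type of the mount containing a given path. It assumes a Linux login node where /proc/mounts is readable; the function name fs_type is hypothetical and not part of Swift. A Lustre mount shows up with type `lustre`.

```python
import os

def fs_type(path, mounts_text=None):
    """Return the filesystem type of the mount that contains `path`.

    Hypothetical helper. Parses /proc/mounts (Linux-specific); pass
    `mounts_text` to supply the file contents directly, e.g. for testing.
    Escaped characters in mount points (such as \\040 for a space) are
    not decoded.
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fstype = fields[1], fields[2]
        # The mount contains `path` if it is the mount point itself
        # or lies underneath it as a directory prefix.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest matching mount point wins; later entries win ties,
            # since a later mount over the same point hides the earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type
```

For example, `fs_type(os.getcwd())` gives the type for the directory you launch Swift from. Matching on the longest mount-point prefix handles nested mounts such as /lustre/beagle sitting under /.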


Ketan

On Apr 5, 2011, at 10:14 AM, Michael Wilde wrote:

> Hi Weiyang,
> 
> I'm cc'ing this to swift-user, where you should send all questions, so that other Swift developers and users can offer help as well, and all users can learn from the answers.
> 
> It's hard for me to debug this without seeing your .swift script and your log file.
> 
> The error message means: Swift tried to run a batch job to execute an app() call, and the attempt did not even return the per-job log file (the "wrapper log") from the execution site (i.e., the site you defined in your pool entry).
> 
> First, please comment out the <scratch> tag. That's an optimization and may be confusing things (or may even be the cause of the error, but that is less likely).
> 
> The most likely cause of the problem here is that your application "touch" is listed in your tc.data under the wrong pathname: touch is /bin/touch, not /usr/bin/touch.
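A quick way to catch wrong executable paths in tc.data is to check that each listed path actually exists. This Python sketch assumes the whitespace-separated layout shown in this thread (site, transformation name, executable path, then further columns) and skips blank and `#` comment lines; `check_tc` is a hypothetical helper, not a Swift tool. Run it on a node that sees the same filesystem layout as your compute nodes, since tc.data paths refer to the execution site.

```python
import os

def check_tc(tc_text, exists=os.path.exists):
    """Return (site, name, path) for every tc.data entry whose path is missing.

    Hypothetical helper. Assumes the layout used in this thread:
    site, transformation name, executable path, then further columns.
    """
    missing = []
    for line in tc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete entry (e.g. a wrapped continuation)
        site, name, path = fields[0], fields[1], fields[2]
        if not exists(path):
            missing.append((site, name, path))
    return missing
```

On a system where touch lives only in /bin, the tc.data from this thread would have its `pbs touch /usr/bin/touch` entry flagged. The `exists` parameter is only there so the check can be tested without reading the real filesystem.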
> 
> The best way to get a sense of how far your script progressed is to run a full "find" on the directory for the failing run id (test-20110405-0940-rqa4nyka) under your swiftwork directory, and see what is there.
> 
> The shared/ directory should contain whatever input files were processed. If you did not use the <scratch> tag, there should also be a "job directory" for each app Swift tried to run. This is described in the user guide.
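If the raw `find` listing is long, a rough Python stand-in is to count files under each top-level subdirectory of the run directory, so an empty or missing shared/ stands out immediately. This is a sketch under assumptions: `summarize_run_dir` is a hypothetical name, and apart from shared/ and info/ (both mentioned in this thread) it does not assume any particular Swift directory layout.

```python
import os

def summarize_run_dir(run_dir, walk=os.walk):
    """Count files under each top-level entry of a Swift run directory.

    Hypothetical helper. Files directly in run_dir are counted under ".".
    Empty directories still appear, with a count of 0, because os.walk
    visits every directory. `walk` can be replaced for testing.
    """
    counts = {}
    for root, _dirs, files in walk(run_dir):
        rel = os.path.relpath(root, run_dir)
        top = rel.split(os.sep)[0]  # "." for run_dir itself
        counts[top] = counts.get(top, 0) + len(files)
    return counts
```

For example, `summarize_run_dir(os.path.expanduser("~/swiftwork/test-20110405-0940-rqa4nyka"))` returns a dict mapping each top-level entry to a file count; a missing "shared" key, or a count of 0 for it, matches the situation described here.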
> 
> Since these were not present, I concluded that Swift could not execute your application.
> 
> The Swift team is working to improve both these messages and the documentation for debugging such situations to make this much easier to spot.
> 
> Also, I don't know how far your workflow ran the previous time, but this looks like a large run. You should test new workflows (or even any changes you make) on very small runs first, so that there are fewer files and parallel jobs to sort through when debugging new scripts.
> 
> - Mike
> 
> 
> ----- Original Message -----
>> Hello,
>> 
>> 
>> My Swift code just ran into a new problem: after submitting jobs
>> using foreach, it reports
>> 
>> 
>> Failed to transfer wrapper log from test-20110405-0940-rqa4nyka/info/u on pbs
>> 
>> 
>> My guess is that something is wrong with PBS (the execution provider).
>> 
>> 
>> My sites.xml and tc.data have been kept exactly the same since you
>> last advised me.
>> 
>> 
>> sites.xml:
>> 
>> 
>> 
>> <config>
>> <pool handle="pbs">
>> <execution provider="pbs" url="localhost" jobManager="local:pbs"/>
>> <profile namespace="globus" key="maxwalltime">1:00:00</profile>
>> <profile namespace="globus" key="workersPerNode">8</profile>
>> <profile namespace="globus" key="ppn">8</profile>
>> <!-- <profile namespace="globus" key="internalHostname">172.5.86.6</profile>-->
>> <profile namespace="globus" key="nodeGranularity">1</profile>
>> <profile namespace="globus" key="maxNodes">1</profile>
>> <profile namespace="karajan" key="jobThrottle">1.99</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> <profile namespace="globus" key="project">CI-SES000031</profile>
>> <!--gridftp url="local://localhost" />-->
>> <filesystem provider="local"/>
>> <scratch>/home/frankwang/tmp</scratch>
>> <workdirectory>/home/frankwang/swiftwork</workdirectory>
>> </pool>
>> 
>> 
>> </config>
>> 
>> tc.data:
>> 
>> pbs echo /bin/echo INSTALLED INTEL32::LINUX null
>> pbs sh /bin/bash INSTALLED INTEL32::LINUX null
>> pbs touch /usr/bin/touch INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="0:1"
>> 
>> 
>> The same problem occurred with various versions of Swift.
>> 
>> 
>> Could you take some time to look into it?
>> 
>> 
>> Weiyang
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
