[Swift-devel] [Bug 321] New: Improve "cant find wrapper log" error message and document in new Debugging chapter

bugzilla-daemon at mcs.anl.gov bugzilla-daemon at mcs.anl.gov
Tue Apr 5 10:25:02 CDT 2011


https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321

           Summary: Improve "cant find wrapper log" error message and
                    document in new Debugging chapter
           Product: Swift
           Version: 0.93
          Platform: All
        OS/Version: All
            Status: ASSIGNED
          Severity: major
          Priority: P1
         Component: error messages
        AssignedTo: skenny at uchicago.edu
        ReportedBy: wilde at mcs.anl.gov


This bug is filed to deal specifically with the very frequent error message
that perplexed a new Swift user in the message I copy below.  I think we
should do several specific things in response. I mark this high priority
because errors very similar to this one are probably the most common cause of
Swift problems, confusion, and frustration for new users.

1. Reword the message "failed to transfer wrapper log" to say what it means

2. Document where to look for the issues that cause this error.

3. More clearly document the <scratch> tag, how and when to use it, and how it
affects debugging. Stress that it is a performance enhancement that inhibits
debugging and should not be used until a workflow is stable.  (Note that in
this case the user put scratch on the home filesystem, which defeats the
purpose. I'm not sure if this caused a failure; I think not, but it would be
good to verify.)  This should not be in our basic templates, although this user
didn't use gensites.

4. Work through the common cases of app-not-found, app-not-executable,
app-encountered-an-error (signal or non-zero exit), and
app-didnt-return-expected-files.  Maybe I missed a few cases, but these are the
main ones. Each one should be *instantly* recognizable, and ideally we should
report all of these in plain, clear English that enables the user to fix them
right away.
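
As a rough illustration of the four cases in item 4, here is a minimal Python
sketch that checks them in order and returns a one-line diagnosis. It is not
Swift's wrapper code; the function name diagnose_app and the message wording
are hypothetical, and it runs the app locally rather than through a batch
provider.

```python
import os
import subprocess

def diagnose_app(executable, args=(), expected_outputs=(), cwd=None):
    """Run one app invocation and return a plain-English diagnosis.

    Hypothetical helper for illustration only; not part of Swift.
    """
    # Case 1: the path listed for the app does not exist.
    if not os.path.isfile(executable):
        return f"app not found: {executable} does not exist"
    # Case 2: the file exists but cannot be executed.
    if not os.access(executable, os.X_OK):
        return f"app not executable: {executable} lacks execute permission"
    result = subprocess.run([executable, *args], cwd=cwd, capture_output=True)
    # Case 3: the app ran but failed (a negative code means a signal on POSIX).
    if result.returncode < 0:
        return f"app killed by signal {-result.returncode}"
    if result.returncode != 0:
        return f"app exited with non-zero status {result.returncode}"
    # Case 4: the app succeeded but did not leave the files we expected.
    base = cwd or "."
    missing = [p for p in expected_outputs
               if not os.path.exists(os.path.join(base, p))]
    if missing:
        return "app did not produce expected files: " + ", ".join(missing)
    return "ok"
```

For example, diagnose_app("/usr/bin/touch", ["f.txt"], ["f.txt"]) would report
"app not found" on a system where touch lives in /bin.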

5. Review this email thread for anything I missed in either debugging or
documentation. 

The results of fixing this bug should be:

- better error messages for the above cases
- the start of a user guide section on Debugging (use asciidoc; a new
standalone doc that becomes a user guide chapter is fine for now).

- Mike


Hi Weiyang,

I'm cc'ing this to swift-user, where you should send all questions, so that
other Swift developers and users can offer help as well, and all users can
learn from the answers.

It's hard for me to debug this without seeing your .swift script and your log
file.

The error message means: Swift tried to run a batch job to execute an app()
call, and the attempt did not even return the per-job log file (the "wrapper
log") from the execution site (i.e., the site you defined in your pool entry).

First, please comment out the <scratch> tag. That's an optimization and may be
confusing things (or may even be the cause of the error, but that is less
likely).

The most likely cause of the problem here is that your application "touch" is
listed in your tc under the wrong pathname: touch is /bin/touch, not
/usr/bin/touch.
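
A quick way to catch this kind of mistake is to check every pathname in the tc
file before running. The Python sketch below is only an illustration: it
assumes the whitespace-separated layout shown in your tc entries (site, app
name, path, then other fields), assumes lines starting with "#" are comments,
and checks paths on the machine where it runs, which is only meaningful when
that machine sees the same filesystem as the execution site. The name
check_tc_entries is hypothetical.

```python
import os

def check_tc_entries(tc_text):
    """Return (site, app, path, problem) tuples for suspicious tc entries.

    Hypothetical helper; assumes fields are: site, app name, path, ...
    """
    problems = []
    for line in tc_text.splitlines():
        line = line.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # Not enough fields to hold a path; ignore.
        site, app, path = fields[:3]
        if not os.path.isfile(path):
            problems.append((site, app, path, "no such file"))
        elif not os.access(path, os.X_OK):
            problems.append((site, app, path, "not executable"))
    return problems
```

Calling check_tc_entries(open("tc.data").read()) returns an empty list when
every listed path exists and is executable.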

The best way to get a sense of how far your script progressed is to do a full
"find" under your swiftwork directory from the directory for the failing run id
(test-20110405-0940-rqa4nyka), and see what is there.

The shared/ directory should contain whatever input files were processed.  If
you did not use the <scratch> tag, then there should also be a "job directory"
for each app swift tried to run.  This is described in the user guide.
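
If you prefer a summary to raw "find" output, a small Python sketch like the
one below counts the files under each top-level entry of the run directory. It
makes no assumption about which subdirectories Swift creates; the function
name summarize_run_dir is hypothetical, and the path in the usage note is
simply your workdirectory joined with the failing run id.

```python
import os

def summarize_run_dir(run_dir):
    """Map each top-level entry of run_dir to the number of files under it."""
    summary = {}
    for entry in sorted(os.listdir(run_dir)):
        path = os.path.join(run_dir, entry)
        if os.path.isdir(path):
            # Count every regular file at any depth below this directory.
            summary[entry] = sum(len(files) for _, _, files in os.walk(path))
        else:
            summary[entry] = 1  # A plain file at the top level.
    return summary
```

For example, summarize_run_dir("/home/frankwang/swiftwork/test-20110405-0940-rqa4nyka")
would show at a glance whether shared/ holds any input files and whether any
job directories were created.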

Since these were not present, I concluded that Swift could not execute your
application.

The Swift team is working to improve both these messages and the documentation
for debugging such situations to make this much easier to spot.

Also, I don't know how far your workflow ran the previous time, but this looks
like a large run. You should test new workflows (or even any changes you make)
on very small runs first, so that there are fewer files and parallel jobs to
sort through when debugging new scripts.

- Mike


----- Original Message -----
> Hello,
> 
> 
> My swift code just encountered a new problem: after submitting jobs
> using foreach, it says
> 
> 
> Failed to transfer wrapper log from test-20110405-0940-rqa4nyka/info/u
> on pbs
> 
> 
> My guess is that something is wrong with pbs (the execution provider)
> 
> 
> My sites.xml and tc.data have been kept exactly the same since you last
> advised
> 
> 
> sites.xml:
> 
> 
> 
> <config>
> <pool handle="pbs">
> <execution provider="pbs" url="localhost" jobManager="local:pbs"/>
> <profile namespace="globus" key="maxwalltime">1:00:00</profile>
> <profile namespace="globus" key="workersPerNode">8</profile>
> <profile namespace="globus" key="ppn">8</profile>
> <!-- <profile namespace="globus"
> key="internalHostname">172.5.86.6</profile>-->
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
> <profile namespace="karajan" key="jobThrottle">1.99</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <profile namespace="globus" key="project">CI-SES000031</profile>
> <!--gridftp url="local://localhost" />-->
> <filesystem provider="local"/>
> <scratch>/home/frankwang/tmp</scratch>
> <workdirectory>/home/frankwang/swiftwork</workdirectory>
> </pool>
> 
> 
> </config>
> 
> pbs echo /bin/echo INSTALLED INTEL32::LINUX null
> pbs sh /bin/bash INSTALLED INTEL32::LINUX null
> pbs touch /usr/bin/touch INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="0:1"
> 
> 
> The same problem occurred when I used various versions of swift.
> 
> 
> Can you take some time to figure it out?
> 
> 
> Weiyang

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
