[Swift-devel] [Bug 210] job exceeding wallclock limit -- error is not reported by swift

bugzilla-daemon at mcs.anl.gov bugzilla-daemon at mcs.anl.gov
Tue Jul 14 06:29:00 CDT 2009


https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=210





--- Comment #1 from Ben Clifford <benc at hawaga.org.uk>  2009-07-14 06:29:00 ---
This bug is rather ambiguously described.

In non-bugzilla discussion it has been reported as:

> well, for some reason, when a job hits wallclock and is killed by the JM, swift just keeps saying "active"

I do not observe this behaviour when running Swift (swift-r3006 cog-r2430)
against NCSA with the SwiftScript and configuration below. In that case, I
see the job fail three times in a row, and then the example SwiftScript
fails, as it should.

Please clarify this bug.

s.swift:

$ cat s.swift
type messagefile;

app (messagefile t) greeting() { 
   sleep "999s" stdout=@filename(t);
}

messagefile outfile <"hello.txt">;

outfile = greeting();



tc.data:

$ cat dist/swift-svn/etc/tc.data
#This is the transformation catalog.
#
#It comes pre-configured with a number of simple transformations with
#paths that are likely to work on a linux box. However, on some systems,
#the paths to these executables will be different (for example, sometimes
#some of these programs are found in /usr/bin rather than in /bin)
#
#NOTE WELL: fields in this file must be separated by tabs, not spaces; and
#there must be no trailing whitespace at the end of each line.
#
# sitename  transformation  path   INSTALLED  platform  profiles
hg     echo         /bin/echo    INSTALLED    INTEL32::LINUX    null
hg     cat         /bin/cat    INSTALLED    INTEL32::LINUX    null
hg     ls         /bin/ls        INSTALLED    INTEL32::LINUX    null
hg     grep         /bin/grep    INSTALLED    INTEL32::LINUX    null
hg     sort         /bin/sort    INSTALLED    INTEL32::LINUX    null
hg     sleep         /bin/sleep    INSTALLED    INTEL32::LINUX    null


site definition:

<pool handle="hg" >
    <gridftp  url="gsiftp://grid-hg.ncsa.teragrid.org" />
    <jobmanager universe="vanilla" url="grid-hg.ncsa.teragrid.org/jobmanager-pbs" major="2" />
    <workdirectory >/home/ac/benc</workdirectory>
    <profile namespace="globus" key="queue">debug</profile>
    <profile namespace="globus" key="maxwalltime">1</profile>
</pool>
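
To make the intent of this test explicit: the job sleeps for 999 seconds while the site allows a walltime of 1, so the job should be killed. Below is a minimal Python sketch (not part of Swift) that reads the globus profile out of a pool fragment like the one above and does that comparison. The helper names `globus_profile` and `exceeds_walltime` are hypothetical, and the sketch assumes the maxwalltime value is interpreted in minutes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pool definition above (profiles only).
POOL_XML = """
<pool handle="hg">
    <profile namespace="globus" key="queue">debug</profile>
    <profile namespace="globus" key="maxwalltime">1</profile>
</pool>
"""


def globus_profile(pool_xml, key):
    """Return the text of the globus-namespace profile with the given key, or None."""
    pool = ET.fromstring(pool_xml)
    for profile in pool.findall("profile"):
        if profile.get("namespace") == "globus" and profile.get("key") == key:
            return profile.text
    return None


def exceeds_walltime(run_seconds, maxwalltime_minutes):
    """True if a job running run_seconds would outlive the walltime limit.

    Assumption: the limit is expressed in minutes.
    """
    return run_seconds > maxwalltime_minutes * 60


if __name__ == "__main__":
    limit = int(globus_profile(POOL_XML, "maxwalltime"))
    print(exceeds_walltime(999, limit))
```

Under that assumption, the 999 second sleep is well past the limit, which is what this test case relies on.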


the output:

Swift svn swift-r3006 cog-r2430

RunID: 20090714-0616-dgktv8b3
Progress:
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Execution failed:
    Exception in sleep:
Arguments: [999s]
Host: hg
Directory: s-20090714-0616-dgktv8b3/jobs/8/sleep-8h82cndj
stderr.txt: 
stdout.txt: 
----

Caused by:
    No status file was found. Check the shared filesystem on hg



