[Swift-devel] [Bug 210] job exceeding wallclock limit -- error is not reported by swift
bugzilla-daemon at mcs.anl.gov
Tue Jul 14 06:29:00 CDT 2009
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=210
--- Comment #1 from Ben Clifford <benc at hawaga.org.uk> 2009-07-14 06:29:00 ---
This bug is rather ambiguously described.
In non-bugzilla discussion it has been reported as:
> well, for some reason, when a job hits wallclock and is killed by the JM, swift just keeps saying "active"
This is not the behaviour that I observe with Swift (swift-r3006 cog-r2430)
against NCSA, using the SwiftScript and configuration below. In that case,
I see the job fail three times in a row, and then the example SwiftScript
fails, as it should.
Please clarify this bug.
s.swift:
$ cat s.swift
type messagefile;
app (messagefile t) greeting() {
    sleep "999s" stdout=@filename(t);
}
messagefile outfile <"hello.txt">;
outfile = greeting();
tc.data:
$ cat tc.data
cat: tc.data: No such file or directory
benc at communicado:~/tmp-walltime/cog/modules/swift !1055
$ cat dist/swift-svn/etc/tc.data
#This is the transformation catalog.
#
#It comes pre-configured with a number of simple transformations with
#paths that are likely to work on a linux box. However, on some systems,
#the paths to these executables will be different (for example, sometimes
#some of these programs are found in /usr/bin rather than in /bin)
#
#NOTE WELL: fields in this file must be separated by tabs, not spaces; and
#there must be no trailing whitespace at the end of each line.
#
# sitename transformation path INSTALLED platform profiles
hg echo /bin/echo INSTALLED INTEL32::LINUX null
hg cat /bin/cat INSTALLED INTEL32::LINUX null
hg ls /bin/ls INSTALLED INTEL32::LINUX null
hg grep /bin/grep INSTALLED INTEL32::LINUX null
hg sort /bin/sort INSTALLED INTEL32::LINUX null
hg sleep /bin/sleep INSTALLED INTEL32::LINUX null
site definition:
<pool handle="hg" >
  <gridftp url="gsiftp://grid-hg.ncsa.teragrid.org" />
  <jobmanager universe="vanilla" url="grid-hg.ncsa.teragrid.org/jobmanager-pbs" major="2" />
  <workdirectory >/home/ac/benc</workdirectory>
  <profile namespace="globus" key="queue">debug</profile>
  <profile namespace="globus" key="maxwalltime">1</profile>
</pool>
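As a quick sanity check that this configuration really should exceed the wallclock limit, here is a minimal Python sketch (not part of Swift) that reads the globus profiles out of the pool entry and compares the limit with the 999 second sleep. The helper name `globus_profiles` is hypothetical, and the conversion assumes that `maxwalltime` is interpreted as minutes.

```python
import xml.etree.ElementTree as ET

# The pool entry from the site definition above.
SITE_XML = """
<pool handle="hg">
  <gridftp url="gsiftp://grid-hg.ncsa.teragrid.org" />
  <jobmanager universe="vanilla" url="grid-hg.ncsa.teragrid.org/jobmanager-pbs" major="2" />
  <workdirectory>/home/ac/benc</workdirectory>
  <profile namespace="globus" key="queue">debug</profile>
  <profile namespace="globus" key="maxwalltime">1</profile>
</pool>
"""

def globus_profiles(xml_text):
    """Return the globus-namespace profile entries of a pool as a dict (hypothetical helper)."""
    pool = ET.fromstring(xml_text)
    return {
        p.get("key"): p.text
        for p in pool.findall("profile")
        if p.get("namespace") == "globus"
    }

profiles = globus_profiles(SITE_XML)

# Assumption: maxwalltime is given in minutes.
limit_seconds = int(profiles["maxwalltime"]) * 60
sleep_seconds = 999  # from: sleep "999s" in s.swift

print("queue:", profiles["queue"])
print("job outlives wallclock limit:", sleep_seconds > limit_seconds)
```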
the output:
Swift svn swift-r3006 cog-r2430
RunID: 20090714-0616-dgktv8b3
Progress:
Progress: Stage in:1
Progress: Submitted:1
Progress: Submitted:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Progress: Stage in:1
Progress: Submitted:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Progress: Submitted:1
Progress: Submitted:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Execution failed:
Exception in sleep:
Arguments: [999s]
Host: hg
Directory: s-20090714-0616-dgktv8b3/jobs/8/sleep-8h82cndj
stderr.txt:
stdout.txt:
----
Caused by:
No status file was found. Check the shared filesystem on hg
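To check the "fails three times" reading against the progress output above, a small Python sketch (not part of Swift; the function name `count_attempts` is hypothetical) can count how many separate runs of `Active` appear in the log. It assumes each progress line has the form `Progress: <state>:<count> ...` as shown here, and that every new run of `Active` corresponds to one execution attempt.

```python
import re

def count_attempts(log_text):
    """Count separate runs of 'Active' in Swift progress output (hypothetical helper)."""
    attempts = 0
    was_active = False
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("Progress:"):
            continue
        # Each state is the text before a ':<count>' pair, e.g. 'Stage in:1'.
        states = [s.strip() for s in re.findall(r"([A-Za-z ]+):\d+", line[len("Progress:"):])]
        is_active = "Active" in states
        # A new attempt starts when the job becomes active after not being active.
        if is_active and not was_active:
            attempts += 1
        was_active = is_active
    return attempts

# Shortened copy of the progress lines from the run above.
SAMPLE = """\
Progress:
Progress: Stage in:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Progress: Stage in:1
Progress: Submitted:1
Progress: Active:1
Progress: Checking status:1
Progress: Submitted:1
Progress: Active:1
Progress: Checking status:1
"""

print(count_attempts(SAMPLE))  # prints 3
```

A job that stayed "active" indefinitely, as described in the original report, would show a single unbroken run of `Active` lines and no final `Execution failed` message.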