[Swift-devel] Update on Teraport problems with wavelet workflow

Tiberiu Stef-Praun tiberius at ci.uchicago.edu
Wed Feb 28 12:21:37 CST 2007


Nothing gets generated in the individual jobs' temporary directories,
and there is no kickstart record for them.
It would be really useful to find out the hostname of the node on
which these jobs ran.

Let me retry some more workflow runs.

On 2/28/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>
>
> On Wed, 28 Feb 2007, Ben Clifford wrote:
>
> > do you have kickstart records for the jobs that are failing?
>
> if you do, then:
>
> > > Summary/Speculation: bad teraport node causes job to be declared as
> > > done even though the execution failed
>
> this speculation can be investigated further by:
>
> finding a job that breaks. finding the node name from the kickstart
> record. grepping all the kickstart records to find other kickstart records
> for those jobs. looking to see if they all fail, or if some work and some
> fail. then report back findings here.
>
> --
>
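
For jobs that do leave a kickstart record, the per-node check quoted above
could be sketched roughly as follows. This is only an illustration under
assumptions that should be checked against real records: that each record is
an XML file whose root element carries a `hostname` attribute, and that exit
status appears in `<regular exitcode="..."/>` elements. The function names
and the `*.xml` file pattern are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_record(path):
    """Return (hostname, succeeded) for one record, or None if it cannot be parsed.

    Assumed layout: root element has a 'hostname' attribute and the record
    contains one or more <regular exitcode="N"/> elements.
    """
    try:
        root = ET.parse(str(path)).getroot()
    except (ET.ParseError, OSError):
        return None
    host = root.get("hostname", "unknown")
    codes = [el.get("exitcode") for el in root.iter()
             if local_name(el.tag) == "regular"]
    # A record with no exit code at all is counted as a failure.
    succeeded = bool(codes) and all(code == "0" for code in codes)
    return host, succeeded


def tally_by_host(record_dir, pattern="*.xml"):
    """Count successful and failed records per hostname under record_dir."""
    counts = defaultdict(lambda: {"ok": 0, "failed": 0})
    for path in sorted(Path(record_dir).rglob(pattern)):
        result = read_record(path)
        if result is None:
            continue
        host, succeeded = result
        counts[host]["ok" if succeeded else "failed"] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    for host, c in sorted(tally_by_host(sys.argv[1]).items()):
        print(f"{host}: {c['ok']} ok, {c['failed']} failed")
```

A node that shows only failures would support the bad-node speculation; a
node with a mix of successes and failures would point elsewhere. Jobs that
leave no record at all, as in the runs described here, will not show up in
this tally.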


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


