[Swift-devel] swift pbs/beagle broken

Ketan Maheshwari ketancmaheshwari at gmail.com
Sun Nov 13 09:20:25 CST 2011


I tried a simple /bin/date command at the end of the submit script, in
place of the call to worker.pl:

#CoG This script generated by CoG
#CoG   by class: class org.globus.cog.abstraction.impl.scheduler.pbs.PBSExecutor
#CoG   on date: 2011/11/13 02:16:54

#PBS -S /bin/bash
#PBS -N Block-1113-1602
#PBS -m n
#PBS -A CI-DEB000002
#PBS -l mppwidth=3,mppnppn=1,mppdepth=24
#PBS -l walltime=00:10:00
#PBS -o /home/ketan/.globus/scripts/PBS2583661693904024220.submit.stdout
#PBS -e /home/ketan/.globus/scripts/PBS2583661693904024220.submit.stderr
WORKER_LOGGING_LEVEL=NONE
#PBS -v WORKER_LOGGING_LEVEL
cd / && aprun -n 3 -N 1 -cc none -d 24 -F exclusive /bin/sh -c /bin/date

=======

This fails too. The queue cancels the job as soon as it starts running,
without writing anything to stdout or stderr.
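
A rough way to check the two possibilities Mike lists in the quoted message
below, as a sketch only. The function names and the host/port arguments are
made up for illustration; the only path taken from this thread is
$HOME/.globus/coasters. I am also assuming the worker script lands there with
a .pl extension, which the thread does not confirm, and that bash and the
coreutils timeout command are available.

```shell
#!/bin/bash
# Sketch only: function names, host and port are hypothetical.

# Report any non-empty .pl script in the coaster directory
# (default: $HOME/.globus/coasters, the path named in the thread).
check_worker_scripts() {
    local dir="${1:-$HOME/.globus/coasters}"
    local f found=0
    for f in "$dir"/*.pl; do
        # An unmatched glob stays literal and simply fails the -s test.
        if [ -s "$f" ]; then
            echo "present: $f"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no non-empty .pl script in $dir"
        return 1
    fi
}

# Try to open a TCP connection to host:port through bash's /dev/tcp
# redirection, giving up after 5 seconds.
check_service_reachable() {
    local host="$1" port="$2"
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}
```

The first check can run on the login node. The second one only says something
useful when it runs on a compute node, so it would have to go inside the job
(for example in place of /bin/date), with the coaster service host and port
filled in by hand.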


On Sun, Nov 13, 2011 at 12:54 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> OK, I don't need these; I can reproduce the problem as well.
>
> For some reason, the coaster worker is exiting immediately.
>
> I see a few possibilities:
>
> - Beagle networking may have changed, making it no longer possible to
> reach the coaster service from the compute nodes using the previous IP
> address ranges.
>
> - the worker.pl script is not being created in $HOME/.globus/coasters
>
> Mike
>
>
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, November 12, 2011 8:39:36 PM
> > Subject: Re: [Swift-devel] swift pbs/beagle broken
> > Ketan, can you post the submit script and site file?
> >
> > On 11/12/11, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
> > > Hi,
> > >
> > > It seems the pbs-coaster provider (local:pbs) is broken for swift. I
> > > tried swift trunk, the 0.93 svn branch, 0.93RC3 and 0.93RC4, but I get
> > > the same response each time:
> > >
> > > Swift svn swift-r5205 cog-r3293
> > >
> > > RunID: 20111113-0216-1d35h7eb
> > > Progress: time: Sun, 13 Nov 2011 02:16:54 +0000
> > > site setting workersPerNode has been replaced with jobsPerNode!
> > > Progress: time: Sun, 13 Nov 2011 02:17:05 +0000 Active:1
> > > Failed to transfer wrapper log for job cat-1hg8aoik
> > > Exception in cat:
> > > Arguments: [data.txt]
> > > Host: pbs
> > > Directory: catsn-20111113-0216-1d35h7eb/jobs/1/cat-1hg8aoik
> > > stderr.txt:
> > >
> > > stdout.txt:
> > >
> > > ----
> > >
> > > Caused by: Task failed: 1113-160254-000000 Block task ended prematurely
> > >
> > >
> > > Final status: time: Sun, 13 Nov 2011 02:17:05 +0000 Failed:1
> > > The following errors have occurred:
> > > 1. Task failed: 1113-160254-000000 Block task ended prematurely
> > >
> > >
> > >
> > > Running the submit script outside of swift does not work either.
> > > The script gets submitted to the queue and exits immediately without
> > > writing anything to stdout or stderr.
> > >
> > > Were there any recent changes that could have affected this?
> > >
> > > I remember trying this successfully in the last week of last
> > > month.
> > >
> > > Regards,
> > > --
> > > Ketan
> > >
> >
> > --
> > Sent from my mobile device
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>


-- 
Ketan