[Swift-devel] swift pbs/beagle broken

Michael Wilde wilde at mcs.anl.gov
Sun Nov 13 09:18:58 CST 2011


It now seems less likely that the problem is related to network connectivity. I tested access from a compute node to a login host, and that still works as required (i.e., both netcat and a worker.pl can connect to a login host at the 192.5.86.10X addresses). Manual coasters and workers also seem to be able to connect and run jobs.
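
A minimal sketch of the kind of reachability check described above, written in Python rather than netcat. The function name can_connect and its parameters are hypothetical. The host and port must be supplied by whoever runs it, since the thread gives neither the full login-host address nor the coaster service port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that a TCP handshake completes; it says nothing about
    whether the coaster service on the other end is healthy.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from a compute node, can_connect(login_host, service_port) returning True would be consistent with the netcat result reported above; False would point back toward a networking change.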

I'm not sure why we are not seeing any output in the .stdout and .stderr files from the jobs that Swift is generating. Simple submit-file tests show the same odd behavior. I'm assuming for now that this is due to my incorrect usage or wrong assumptions rather than a PBS issue.
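
For reference, a simple submit-file test of the sort mentioned above could be generated as follows. This is only a sketch: the helper name make_pbs_script, the job name, the walltime, and the output paths are all assumptions, and any site-specific resource directives that Beagle requires are deliberately left out because the thread does not state them.

```python
def make_pbs_script(command, job_name="swift-test", walltime="00:05:00",
                    stdout_path="test.stdout", stderr_path="test.stderr"):
    """Build the text of a minimal PBS submit script.

    The script names its stdout and stderr files explicitly and echoes a
    line before and after the command, so an empty output file indicates
    that the job body never ran at all.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",            # job name
        f"#PBS -l walltime={walltime}",   # wall-clock limit
        f"#PBS -o {stdout_path}",         # where PBS should write stdout
        f"#PBS -e {stderr_path}",         # where PBS should write stderr
        "",
        'echo "started on $(hostname) at $(date)"',
        command,
        'echo "exit code: $?"',
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of make_pbs_script("/bin/hostname") to a file and submitting it with qsub gives a test whose output files should never be empty if the job actually starts.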

I'll next try to re-create the failure that the Swift-generated jobs are seeing.

- Mike


----- Original Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>, "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> Sent: Saturday, November 12, 2011 11:23:27 PM
> Subject: Re: [Swift-devel] swift pbs/beagle broken
> I'm seeing the same thing: coasters is failing immediately.
> 
> My first thought is that it might be some type of new network
> restriction, or possibly something with the recent authentication
> changes.
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Sunday, November 13, 2011 12:54:56 AM
> > Subject: Re: [Swift-devel] swift pbs/beagle broken
> > OK, I don't need these; I can reproduce the problem as well.
> >
> > For some reason, the coaster worker is exiting immediately.
> >
> > I see a few possibilities:
> >
> > - Beagle networking may have changed, making it no longer possible to
> > reach the coaster service from the compute nodes using the previous IP
> > address ranges.
> >
> > - the worker.pl script is not being created in
> > $HOME/.globus/coasters
> >
> > Mike
> >
> >
> > ----- Original Message -----
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Saturday, November 12, 2011 8:39:36 PM
> > > Subject: Re: [Swift-devel] swift pbs/beagle broken
> > > Ketan, can you post the submit script and site file?
> > >
> > > On 11/12/11, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
> > > > Hi,
> > > >
> > > > It seems the pbs-coaster provider (local:pbs) is broken for swift. I
> > > > tried swift trunk, the 0.93 svn branch, 0.93RC3 and 0.93RC4, but I am
> > > > getting the same response:
> > > >
> > > > Swift svn swift-r5205 cog-r3293
> > > >
> > > > RunID: 20111113-0216-1d35h7eb
> > > > Progress: time: Sun, 13 Nov 2011 02:16:54 +0000
> > > > site setting workersPerNode has been replaced with jobsPerNode!
> > > > Progress: time: Sun, 13 Nov 2011 02:17:05 +0000 Active:1
> > > > Failed to transfer wrapper log for job cat-1hg8aoik
> > > > Exception in cat:
> > > > Arguments: [data.txt]
> > > > Host: pbs
> > > > Directory: catsn-20111113-0216-1d35h7eb/jobs/1/cat-1hg8aoik
> > > > stderr.txt:
> > > >
> > > > stdout.txt:
> > > >
> > > > ----
> > > >
> > > > Caused by: Task failed: 1113-160254-000000 Block task ended prematurely
> > > >
> > > >
> > > > Final status: time: Sun, 13 Nov 2011 02:17:05 +0000 Failed:1
> > > > The following errors have occurred:
> > > > 1. Task failed: 1113-160254-000000 Block task ended prematurely
> > > >
> > > >
> > > >
> > > > Trying the submit script outside of swift also does not seem to work.
> > > > The script gets submitted to the queue and immediately exits without
> > > > writing anything to stdout or stderr.
> > > >
> > > > Were there any recent changes that could have affected this?
> > > >
> > > > I remember trying this successfully in the last week of last month.
> > > >
> > > > Regards,
> > > > --
> > > > Ketan
> > > >
> > >
> > > --
> > > Sent from my mobile device
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



