[Swift-devel] Re: [VDL2-user] GridFTP timeout exception

Tiberiu Stef-Praun tiberius at ci.uchicago.edu
Mon Feb 26 14:00:08 CST 2007


I fixed that, and I am getting back some of the results.
Apparently the workflow is stuck at the point where it needs to delete the
remote files. That might not be the actual root cause, though, because when
running on a single site (teraport), several iterations of sets of
jobs were sent out before the workflow stopped completely.



On 2/26/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> IP address maybe?
>
> On Mon, 2007-02-26 at 13:53 -0600, Tiberiu Stef-Praun wrote:
> > How do I know that a GridFTP client timeout occurs?
> > It seems that my SIDGrid workflow has stopped performing.
> > Normally it should process 200 parallel tasks, each of which produces
> > 1GB in 28 files.
> >
> > I am testing with 0.1 rc 1, but the same has happened to me before
> > (SVN checkout, v0.rc3, etc.).
> >
> > The workflow freezes after a while (after processing the first round
> > of jobs submitted? I received 16G and that's it). The current
> > scenario is me using 3 teragrid sites (UC, Purdue, NCSA), but the same
> > behavior (workflow freeze) happened when I ran the workflow on
> > teraport only. Since it hung, I was always forced to terminate it, so
> > we never had a full SIDGrid run.
> >
> > Any suggestions?
> >
> > BTW, the NCSA problem is a non-issue, I solved it.  The only other
> > small issue is taking full advantage of all the sites in the
> > sites.xml. And the big issue is what I listed above.
> >
> > Next I will try running the workflow fully at the UC teragrid site.
> >
> > Tibi
> >
> >
> > On 2/26/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> > >
> > > On Mon, 26 Feb 2007, Mihael Hategan wrote:
> > >
> > > > On Fri, 2007-02-23 at 16:58 +0000, Ben Clifford wrote:
> > > > >
> > > > > On Fri, 23 Feb 2007, Mihael Hategan wrote:
> > > > >
> > > > > > Since this is non-functional (failing to shut down a GridFTP client
> > > > > > that's not in use any more), I think the message could be moved to info,
> > > > > > and the stack trace to debug.
> > > > >
> > > > > sounds good.
> > > >
> > > > Seems like it was at info for about a month now. I split it however to
> > > > only log the exception in debug.
> > >
> > > ok. I guess this error message came from something like 0rc3, as Chad
> > > reported it originally, I think.
> > > --
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >
> >
> >
>
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


