[Swift-devel] Re: [VDL2-user] GridFTP timeout exception

Tiberiu Stef-Praun tiberius at ci.uchicago.edu
Mon Feb 26 14:22:48 CST 2007


There is a limit on attachments on the mailing list, so you can see
the log here:
http://teraport.uchicago.edu/~tiberius/sid-wf-pers-channel-type-rcm0oqd5bk4l1.log


On 2/26/07, Tiberiu Stef-Praun <tiberius at ci.uchicago.edu> wrote:
> Here it is, attached.
>
> I run the command this way:
> swift -d -v sid-wf-pers-channel-type.dtm
> on teraport in /home/tiberius/scratch
>
>
> On 2/26/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > Well, post the log.
> >
> > On Mon, 2007-02-26 at 14:00 -0600, Tiberiu Stef-Praun wrote:
> > > I fixed that, and I am getting back some of the results.
> > > Apparently the wf is stuck at the point where it needs to delete the
> > > remote files.
> > > That might not be the actual root cause, though, because when
> > > running on a single site (teraport), several iterations of sets of
> > > jobs were sent out before the wf stopped completely.
> > >
> > >
> > >
> > > On 2/26/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > IP address maybe?
> > > >
> > > > On Mon, 2007-02-26 at 13:53 -0600, Tiberiu Stef-Praun wrote:
> > > > > How do I know when a GridFTP client timeout occurs?
> > > > > It seems that my SIDGrid workflow has stopped performing.
> > > > > Normally it should process 200 parallel tasks, each of which produces
> > > > > 1 GB in 28 files.
> > > > >
> > > > > I am testing with 0.1 rc 1, but the same has happened to me before
> > > > > (SVN checkout, v0.rc3, etc.).
> > > > >
> > > > > The workflow freezes after a while (after processing the first round
> > > > > of jobs submitted? I received 16 GB and that's it). The current
> > > > > scenario is me using 3 teragrid sites (UC, Purdue, NCSA), but the same
> > > > > behavior (workflow freeze) happened when I ran the workflow on
> > > > > teraport only. Since it hung, I was always forced to terminate it, so
> > > > > we never had a full SIDGrid run.
> > > > >
> > > > > Any suggestions?
> > > > >
> > > > > BTW, the NCSA problem is a non-issue; I solved it. The only other
> > > > > small issue is taking full advantage of all the sites in the
> > > > > sites.xml. The big issue is what I listed above.
> > > > >
> > > > > Next I will try running the workflow fully at the UC teragrid site.
> > > > >
> > > > > Tibi
> > > > >
> > > > >
> > > > > On 2/26/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> > > > > >
> > > > > > On Mon, 26 Feb 2007, Mihael Hategan wrote:
> > > > > >
> > > > > > > On Fri, 2007-02-23 at 16:58 +0000, Ben Clifford wrote:
> > > > > > > >
> > > > > > > > On Fri, 23 Feb 2007, Mihael Hategan wrote:
> > > > > > > >
> > > > > > > > > Since this is non-functional (failing to shut down a GridFTP client
> > > > > > > > > that's not in use any more), I think the message could be moved to info,
> > > > > > > > > and the stack trace to debug.
> > > > > > > >
> > > > > > > > sounds good.
> > > > > > >
> > > > > > > > Seems like it has been at info for about a month now. However, I
> > > > > > > > split it so that the exception is only logged at debug.
> > > > > >
> > > > > > > OK. I guess this error message came from something like 0rc3, as Chad
> > > > > > > reported it originally, I think.
> > > > > > --
> > > > > >
> > > > > > _______________________________________________
> > > > > > Swift-devel mailing list
> > > > > > Swift-devel at ci.uchicago.edu
> > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/
