[Swift-devel] coaster status summary
Mihael Hategan
hategan at mcs.anl.gov
Tue Apr 8 10:00:48 CDT 2008
You may want to try lowering the window size. The default is on the
order of 100K (as far as I understand from various sources). That may add
up to quite a bit of memory if you have many connections. A window that
large may also be fairly useless for LAN connections used to send short
messages (i.e. less than the MTU/MSS).
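
For what it's worth, here is a rough sketch of what I mean, in Python
rather than Java just to keep it short. It assumes the usual approach of
shrinking the kernel socket buffers, since the receive buffer bounds the
window a socket can advertise. The helper name and the 16K figure are
made up for illustration, not taken from the coaster or Falkon code, and
the OS is free to round or clamp whatever you ask for.

```python
import socket


def make_small_buffer_socket(rcvbuf_bytes=16 * 1024, sndbuf_bytes=16 * 1024):
    """Create a TCP socket with reduced kernel buffers.

    Hypothetical helper: the name and default sizes are illustrative only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request smaller buffers before connect()/listen(), so the sizes are
    # already in place when the connection is negotiated.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    return sock


if __name__ == "__main__":
    s = make_small_buffer_socket()
    # The kernel may adjust the requested sizes, so read back what it
    # actually granted instead of assuming the request was honored as-is.
    print("SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```

The values read back will likely differ from the ones requested, so it
is worth checking them on the actual machines before drawing conclusions.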
On Tue, 2008-04-08 at 08:31 -0500, Ioan Raicu wrote:
> We use the default. For the SiCortex, we had to tweak the TCP
> keepalives to ensure that the TCP connections were not getting
> disconnected by the firewall on the SiCortex, which only allowed 180
> seconds of inactivity before it disconnected connections. This meant
> that any job that took more than 180 seconds, or any Falkon idleness for
> more than 180 seconds resulted in TCP connection terminations. BTW, we
> did not run into this kind of firewall rule when running in other
> environments, so it took us a week to debug and find the root of the
> problem. This also happened because the Falkon service was running
> outside the SiCortex home network, but we had to do this as the SiCortex
> doesn't support Java, and at the time we didn't have access to any system
> within the internal network that supported Java.
>
> Ioan
>
> Mihael Hategan wrote:
> > Do you tweak the TCP window size or do you use the default?
> >
> > On Mon, 2008-04-07 at 13:18 -0500, Ioan Raicu wrote:
> >
> >> I agree that the BG/P is the only system I can think of right now that
> >> won't work with the UDP scheme you currently have, assuming that you
> >> will run the service on a login node that has access to both compute
> >> nodes and the external world (i.e. Swift). The compute nodes don't
> >> support Java, so you'd have to have some C/Fortran code, or maybe some
> >> scripting language (though I don't know what kind of support there is).
> >> If you use C or Fortran, MPI becomes a viable alternative. TCP has
> >> always been an alternative. Anyways, if UDP doesn't work on the BG/P,
> >> and the BG/P is the only scale large enough (today) that warrants a
> >> connectionless protocol, then I suggest you switch to TCP (which has
> >> worked for us well on the BG/P, and is general enough to work in most
> >> environments) or even MPI (you lose the generality of TCP, but
> >> might gain performance).
> >>
> >> Ioan
> >>
> >> Mihael Hategan wrote:
> >>
> >>> On Mon, 2008-04-07 at 12:49 +0000, Ben Clifford wrote:
> >>>
> >>>
> >>>> Wary of excessive optimisation of job completion notification speed in
> >>>> order to get high 'trivial/useless job' numbers, when there also seem to
> >>>> be problems getting shared filesystem access fast enough for non-useless
> >>>> jobs. Getting a ridiculously high trivial job throughput is not (in my
> >>>> eyes) a design goal of this coaster work.
> >>>>
> >>>>
> >>> 200 j/s should be enough for anybody.
> >>>
> >>> Joking aside, the issue was the ability to scale to a large number of jobs
> >>> rather than speed. But it looks like the issue is only an issue for
> >>> monsters such as the BG/P.
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Swift-devel mailing list
> >>> Swift-devel at ci.uchicago.edu
> >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>
> >>>
> >>>
> >> --
> >> ===================================================
> >> Ioan Raicu
> >> Ph.D. Candidate
> >> ===================================================
> >> Distributed Systems Laboratory
> >> Computer Science Department
> >> University of Chicago
> >> 1100 E. 58th Street, Ryerson Hall
> >> Chicago, IL 60637
> >> ===================================================
> >> Email: iraicu at cs.uchicago.edu
> >> Web: http://www.cs.uchicago.edu/~iraicu
> >> http://dev.globus.org/wiki/Incubator/Falkon
> >> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
> >> ===================================================
> >>
> >>
> >
> >
> >
>