[Swift-devel] coaster status summary

Ioan Raicu iraicu at cs.uchicago.edu
Tue Apr 8 08:31:59 CDT 2008


We use the default window size.  For the SiCortex, we had to tweak the 
TCP keepalives so that the firewall on the SiCortex would not drop our 
TCP connections; it allowed only 180 seconds of inactivity before it 
disconnected them.  This meant that any job that ran for more than 180 
seconds, or any Falkon idle period longer than 180 seconds, resulted in 
a terminated TCP connection.  BTW, we did not run into this kind of 
firewall rule in other environments, so it took us a week to debug and 
find the root of the problem.  This also happened because the Falkon 
service was running outside the SiCortex home network, but we had to do 
that because the SiCortex doesn't support Java and, at the time, we 
didn't have access to any system within the internal network that 
supported Java.
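
For illustration only, here is a minimal sketch of this kind of keepalive 
tweak.  It is not the actual Falkon change (Falkon is written in Java); 
it is Python, the function name enable_keepalive is hypothetical, and 
the 60/30/3 timer values are assumptions picked only so that a probe goes 
out well inside a 180-second idle limit.  The TCP_KEEPIDLE, TCP_KEEPINTVL 
and TCP_KEEPCNT options are the Linux names and may be missing on other 
platforms, in which case the sketch falls back to the system-wide timers.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=30, probes=3):
    """Enable TCP keepalive on sock and tune its timers where possible.

    Hypothetical helper: the default values are assumptions chosen to stay
    below a 180-second firewall idle timeout.
    Returns a dict of the per-socket timer options that were applied.
    """
    # Always available: ask the kernel to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    applied = {}
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            continue  # option not exposed on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # platform refused the option; keep system default
        applied[name] = value
    return applied
```

Calling enable_keepalive(sock) on a client socket before connect() should 
be enough on Linux; the returned dict shows which timers were actually 
set on the socket.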

Ioan

Mihael Hategan wrote:
> Do you tweak the TCP window size or do you use the default?
>
> On Mon, 2008-04-07 at 13:18 -0500, Ioan Raicu wrote:
>   
>> I agree that the BG/P is the only system I can think of right now that
>> won't work with the UDP scheme you currently have, assuming that you
>> will run the service on a login node that has access to both compute
>> nodes and external world (i.e. Swift).  The compute nodes don't
>> support Java, so you'd have to have some C/Fortran code, or maybe some
>> scripting language (which I don't know what kind of support there is).
>> If you use C or Fortran, MPI becomes a viable alternative.  TCP has
>> always been an alternative.  Anyways, if UDP doesn't work on the BG/P,
>> and the BG/P is the only scale large enough (today) that warrants a
>> connectionless protocol, then I suggest you switch to TCP (which has
>> worked for us well on the BG/P, and is general enough to work in most
>> environments) or even MPI (you lose the generality of TCP, but
>> might gain performance).
>>
>> Ioan
>>
>> Mihael Hategan wrote: 
>>     
>>> On Mon, 2008-04-07 at 12:49 +0000, Ben Clifford wrote:
>>>   
>>>       
>>>> Wary of excessive optimisation of job completion notification speed in 
>>>> order to get high 'trivial/useless job' numbers, when there also seem to 
>>>> be problems getting shared filesystem access fast enough for non-useless 
>>>> jobs. Getting a ridiculously high trivial job throughput is not (in my 
>>>> eyes) a design goal of this coaster work.
>>>>     
>>>>         
>>> 200 j/s should be enough for anybody.
>>>
>>> Joking aside, the issue was ability to scale to large number of jobs
>>> rather than speed. But it looks like the issue is only an issue for
>>> monsters such as the BG/P.
>>>
>>>   
>>>
>>>
>>>   
>>>       
>> -- 
>> ===================================================
>> Ioan Raicu
>> Ph.D. Candidate
>> ===================================================
>> Distributed Systems Laboratory
>> Computer Science Department
>> University of Chicago
>> 1100 E. 58th Street, Ryerson Hall
>> Chicago, IL 60637
>> ===================================================
>> Email: iraicu at cs.uchicago.edu
>> Web:   http://www.cs.uchicago.edu/~iraicu
>> http://dev.globus.org/wiki/Incubator/Falkon
>> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
>> ===================================================
>> ===================================================
>>
>>     
>
>
>   





