[Swift-devel] coaster status summary

Ioan Raicu iraicu at cs.uchicago.edu
Sat Apr 5 08:36:18 CDT 2008



Mihael Hategan wrote:
> On Fri, 2008-04-04 at 19:02 -0500, Ioan Raicu wrote:
>   
>> You say that you use UDP on the workers.  This might be more
>> lightweight, but might also pose practical issues.  
>>     
>
> Of course. That is the trade-off.
>
>   
Right, there will be trade-offs either way; the key is to be able to 
switch between TCP and UDP easily.
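For instance, a thin transport abstraction would make that switch a 
one-line change at the call site. A minimal sketch (the Transport 
interface and class names here are hypothetical, not actual coaster 
code):

    import java.io.*;
    import java.net.*;

    interface Transport {
        void send(byte[] msg) throws IOException;
        byte[] receive() throws IOException;
    }

    // Datagram transport: one message per packet, no framing needed.
    class UdpTransport implements Transport {
        private final DatagramSocket sock;
        private final InetSocketAddress peer;

        UdpTransport(String host, int port) throws IOException {
            sock = new DatagramSocket();
            peer = new InetSocketAddress(host, port);
        }

        public void send(byte[] msg) throws IOException {
            sock.send(new DatagramPacket(msg, msg.length, peer));
        }

        public byte[] receive() throws IOException {
            byte[] buf = new byte[65536];
            DatagramPacket p = new DatagramPacket(buf, buf.length);
            sock.receive(p);
            byte[] msg = new byte[p.getLength()];
            System.arraycopy(p.getData(), 0, msg, 0, msg.length);
            return msg;
        }
    }

    // Stream transport: TCP has no message boundaries, so each
    // message is framed with a 4-byte length prefix.
    class TcpTransport implements Transport {
        private final DataInputStream in;
        private final DataOutputStream out;

        TcpTransport(String host, int port) throws IOException {
            Socket sock = new Socket(host, port);
            in = new DataInputStream(sock.getInputStream());
            out = new DataOutputStream(sock.getOutputStream());
        }

        public void send(byte[] msg) throws IOException {
            out.writeInt(msg.length);
            out.write(msg);
            out.flush();
        }

        public byte[] receive() throws IOException {
            byte[] msg = new byte[in.readInt()];
            in.readFully(msg);
            return msg;
        }
    }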
>> Some of those are:
>> - might not work well on any network other than a LAN
>>     
>
> It works exactly as it's supposed to: no guarantee of uniqueness, no
> guarantee of order, no guarantee of integrity, and no guarantee of
> reliability. One has to drop duplicates, do checksums, re-order, have
> time-outs.
>
>   
>> - won't be friendly to firewalls or NATs, no matter whether the
>> service pushes jobs or the workers pull them; the logic is that you
>> need 2-way communication, and using UDP (a connectionless protocol)
>> is like having a server socket and a client socket on both ends of
>> the communication at the same time.
>>     
>
> Precisely so. In Java you can use one UDP socket as both client and
> server. 
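A single DatagramSocket will indeed do both directions; a minimal 
sketch (host and ports are placeholders):

    import java.net.*;

    public class UdpBoth {
        public static void main(String[] args) throws Exception {
            // One socket, bound to one local port, used for both
            // sending and receiving.
            DatagramSocket sock = new DatagramSocket(4000);

            byte[] out = "ping".getBytes("US-ASCII");
            sock.send(new DatagramPacket(out, out.length,
                    new InetSocketAddress("service.example.org", 5000)));

            byte[] buf = new byte[65536];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            sock.receive(in); // replies come back on the same socket
            System.out.println(new String(in.getData(), 0,
                    in.getLength(), "US-ASCII"));
        }
    }
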
But even if the abstraction is OK and allows you to use the same socket 
for both reads and writes, that doesn't mean that the NAT will actually 
set up the corresponding entries for you to have 2-way communication.  
With TCP, the protocol being connection oriented, NATs are fine as long 
as the connection is initiated from inside the NAT; but with UDP, you 
will only be able to send outgoing messages, since incoming messages 
will find no rules set up at the NAT.  The only way I could see UDP 
working through the NAT is to have static rules set up ahead of time 
that map ports on the NAT to IP:port pairs on the compute nodes.
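To illustrate the contrast, here is a sketch of a worker that only 
ever dials out over TCP; the NAT creates the mapping for the outbound 
connection, and the service can push jobs back over that same 
connection (the REGISTER/DONE messages are a made-up toy protocol, 
not the coaster wire protocol, and the hostname is a placeholder):

    import java.io.*;
    import java.net.Socket;

    public class Worker {
        public static void main(String[] args) throws Exception {
            // Outbound connect from inside the NAT: the NAT sets up
            // the mapping for this flow, so no inbound rule is needed.
            Socket sock = new Socket("login.example.org", 5000);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            DataOutputStream out = new DataOutputStream(sock.getOutputStream());

            out.writeUTF("REGISTER worker-1");
            out.flush();

            while (true) {
                String job = in.readUTF();   // service pushes work to us
                out.writeUTF("DONE " + job); // results flow back out
                out.flush();
            }
        }
    }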
> Perl seems to be nastier as it won't let you send and receive on
> the same socket (at least in the implementation I've seen).
>
>   
>>   This might not matter if the service and the worker are on the same
>> LAN with no NATs or firewalls in the middle, but, it would matter on a
>> machine such as the BG/P, as there is a NAT in between the login nodes
>> and the compute nodes.
>>     
>
> That's odd. Do you have anything to back that up?
>
>   
Compute nodes have private addresses, one subnet per P-SET (64 nodes); 
there are 16 P-SETs in the machine we currently use, and there will be 
640 P-SETs in the final machine.  The I/O nodes (one per P-SET) act as 
NATs and have network connectivity on both the public and private 
networks, while the login nodes only have access to the public network.  
This has been our experience over the past 2 months of using TCP/IP on 
the BG/P.  Zhao, if you have anything else to add (especially links to 
docs confirming what I just said), please do so.
>>   In essence, for this to work on the BG/P, you'll need to avoid
>> having server-side sockets on the compute nodes (workers), and you'll
>> probably only be able to do that via a connection-oriented protocol
>> (i.e. TCP).  Is switching to TCP a relatively straightforward option?
>> If not, it might be worth implementing, to make the design more
>> flexible.
>> - Losing messages and recovering from them will likely be harder than
>> anticipated; I have a UDP version of the notification engine that
>> Falkon uses, and after much debugging, I gave up and switched over to
>> TCP.  It worked most of the time, but the occasional lost message (1
>> in 1000s, maybe even rarer) made Falkon unreliable, and hence I
>> stopped using it.
>>     
>
> Of course it's unreliable unless you deal with the reliability issues as
> outlined above.
>   
I did deal with them: duplicates, out-of-order delivery, retries, 
timeouts, etc.  Yet I still couldn't get a 100% reliable 
implementation, and I gave up.  In theory, UDP should work, provided 
you deal with all the reliability issues you outlined.  I am just 
pointing out that after lots of debugging, I gave in and swapped UDP 
for TCP to avoid the occasional unexplained lost message.  I am 
positive it was a bug in my code, so perhaps you'll have better luck!
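For reference, the general recipe looks roughly like this (a sketch, 
not my actual code): tag each datagram with a sequence number, 
retransmit until acked, and drop duplicates on the receiving side.

    import java.net.*;
    import java.nio.ByteBuffer;
    import java.util.HashSet;
    import java.util.Set;

    public class ReliableUdp {

        // Sender: prefix an 8-byte sequence number, retransmit every
        // 500 ms until the matching ack arrives. Assumes this socket
        // carries only this exchange, so anything received is an ack.
        static void send(DatagramSocket sock, SocketAddress peer,
                         long seq, byte[] payload) throws Exception {
            byte[] msg = ByteBuffer.allocate(8 + payload.length)
                    .putLong(seq).put(payload).array();
            sock.setSoTimeout(500);
            while (true) {
                sock.send(new DatagramPacket(msg, msg.length, peer));
                try {
                    byte[] buf = new byte[8];
                    DatagramPacket ack = new DatagramPacket(buf, buf.length);
                    sock.receive(ack);
                    if (ByteBuffer.wrap(buf).getLong() == seq)
                        return; // delivered and acknowledged
                } catch (SocketTimeoutException lost) {
                    // data or ack was dropped: retransmit
                }
            }
        }

        // Receiver: ack everything (acks can be lost too), but hand a
        // datagram to the application only the first time its sequence
        // number is seen, filtering out retransmitted duplicates. (A
        // real implementation would also bound the 'seen' set.)
        static void serve(DatagramSocket sock) throws Exception {
            Set<Long> seen = new HashSet<Long>();
            byte[] buf = new byte[65536];
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                sock.receive(p);
                ByteBuffer b = ByteBuffer.wrap(p.getData(), 0, p.getLength());
                long seq = b.getLong();
                byte[] ack = ByteBuffer.allocate(8).putLong(seq).array();
                sock.send(new DatagramPacket(ack, 8, p.getSocketAddress()));
                if (seen.add(seq)) {
                    byte[] payload = new byte[b.remaining()];
                    b.get(payload);
                    // hand payload to the application here
                }
            }
        }
    }

Even with all of that in place, my implementation still lost the 
occasional message, so the devil is clearly in the details.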
>   
>> Is the 180 tasks/sec the overall throughput measured from Swift's
>> point of view, including the overhead of wrapper.sh?  Or is that a
>> micro-benchmark measuring just the coaster performance?  
>>     
>
> It's at the provider level. No wrapper.sh.
>   
OK, great!

Ioan
>   
>> Ioan
>>
>>
>> Mihael Hategan wrote: 
>>     
>>> On Fri, 2008-04-04 at 06:59 -0500, Michael Wilde wrote:
>>>   
>>>       
>>>> Mihael, this is great progress - very exciting.
>>>> Some questions (don't need answers right away):
>>>>
>>>> How would the end user use it? Manually start a service?
>>>> Is the service a separate process, or in the swift jvm?
>>>>     
>>>>         
>>> I thought the lines below answered some of these.
>>>
>>> A user would specify the coaster provider in sites.xml. The provider
>>> will then automatically deploy a service on the target machine without
>>> the user having to do so. Given that the service is on a different
>>> machine than the client, they can't be in the same JVM.
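(For illustration, the sites.xml entry would presumably look 
something along these lines; the attribute values here are guesses, 
not tested against the prototype:

    <pool handle="mycluster">
      <execution provider="coaster" url="login.mycluster.edu"
                 jobmanager="local:pbs"/>
      <workdirectory>/home/me/swiftwork</workdirectory>
    </pool>
)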
>>>
>>>   
>>>       
>>>> How are the number of workers set or adjusted?
>>>>     
>>>>         
>>> Currently, as many workers as needed are requested, up to a maximum.
>>> This is preliminary, hence "Better allocation strategy for workers"
>>> in the to-do list below.
>>>
>>>   
>>>       
>>>> Does a service manage workers on one cluster or many?
>>>>     
>>>>         
>>> One service per cluster.
>>>
>>>   
>>>       
>>>> At 180 jobs/sec with 10 workers, what were the CPU loads on swift, 
>>>> worker and service?
>>>>     
>>>>         
>>> I faintly recall them being at less than 50% for some reason I don't
>>> understand.
>>>
>>>   
>>>       
>>>> Do you want to try this on the workflows we're running on Falkon on the 
>>>> BGP and SiCortex?
>>>>     
>>>>         
>>> Let me repeat "prototype" and "more testing". In no way do I want to do
>>> preliminary testing with an application that is shaky on an architecture
>>> that is also shaky.
>>>
>>> Mihael
>>>
>>>   
>>>       
>>>> I'm eager to try it when you feel it's ready for others to test.
>>>>
>>>> Nice work!
>>>>
>>>> - Mike
>>>>
>>>>
>>>>
>>>> On 4/4/08 4:39 AM, Mihael Hategan wrote:
>>>>     
>>>>         
>>>>> I've been asked for a summary of the status of the coaster prototype, so
>>>>> here it is:
>>>>> - It's a prototype, so bugs are plentiful
>>>>> - It's self deployed (you don't need to start a service on the target
>>>>> cluster)
>>>>> - You can also use it while starting a service on the target cluster
>>>>> - There is a worker written in Perl
>>>>> - It uses encryption between client and coaster service
>>>>> - It uses UDP between the service and the workers (this may prove
>>>>> to be a better or worse choice than TCP)
>>>>> - A preliminary test done locally shows an amortized throughput of
>>>>> around 180 jobs/s (/bin/date). This was done with encryption and with 10
>>>>> workers. Pretty picture attached (total time vs. # of jobs)
>>>>>
>>>>> To do:
>>>>> - The scheduling algorithm in the service needs a bit more work
>>>>> - When worker messages are lost, some jobs may get lost (i.e. needs more
>>>>> fault tolerance)
>>>>> - Start testing it on actual clusters
>>>>> - Do some memory consumption benchmarks
>>>>> - Better allocation strategy for workers
>>>>>
>>>>> Mihael
>>>>>

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================




