[Swift-devel] coaster status summary

Ioan Raicu iraicu at cs.uchicago.edu
Sat Apr 5 08:47:09 CDT 2008



Mihael Hategan wrote:
>>>   This might not matter if the service and the worker are on the same
>>> LAN with no NATs or firewalls in the middle, but, it would matter on a
>>> machine such as the BG/P, as there is a NAT in between the login nodes
>>> and the compute nodes.
>>>       
>> That's odd. Do you have anything to back that up?
>>
>>     
>
> Really really odd. I mean MPI has to work between any two worker nodes.
> If they are on separate networks with NAT in-between, this would be
> rather difficult.
>   
MPI doesn't use the Ethernet network.  There are 5 networks to choose
from (Torus, Tree, Barrier, RAS, 10Gig Ethernet), and I bet the NAT is
only on one of them.  The Ethernet network is still important, though,
because we want to use TCP/UDP/IP so we can leverage code and systems
that work in a typical Linux environment, which traditionally has only
Ethernet networks.  So, if you are willing to use MPI to communicate
between the service and the workers, you will likely not have to deal
with a NAT.  However, this might limit the generality of the
implementation, as some Linux clusters might not have the necessary MPI
packages installed.  The middle ground that we found useful is to use
TCP and initiate all communication from the workers; this approach has
worked great for us so far!  We have been able to scale to 4K workers
on the BG/P and to 5.8K workers on the SiCortex.  I expect our current
TCP-based implementation to scale to at least 10K workers per service,
maybe more.  More testing is needed to find the upper bound on how many
workers we can manage with the current login nodes' memory capacity
(4GB) and the quad-CPU systems we have.
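
To make the "initiate all communication from the workers" idea concrete,
here is a minimal sketch in Python.  It is not the Falkon or coaster
code; the function names (run_service, run_worker, demo), the one-line
text protocol, and the loopback address are all assumptions made for
illustration.  The point it shows is that the service only ever calls
accept(), and sends each task back over the connection the worker
opened, so a NAT in front of the workers never sees an inbound
connection attempt.

```python
import socket
import threading


def run_service(server_sock, tasks, results):
    """Hand out one task per worker-initiated connection (hypothetical protocol)."""
    for task in tasks:
        conn, _addr = server_sock.accept()  # the service never dials out
        with conn, conn.makefile("r") as reader:
            worker_id = reader.readline().strip()  # worker registers first
            conn.sendall((task + "\n").encode())   # task goes back on the worker's own connection
            results.append((worker_id, reader.readline().strip()))


def run_worker(host, port, worker_id):
    """Dial the service, register, receive one task, report completion."""
    with socket.create_connection((host, port)) as sock, sock.makefile("r") as reader:
        sock.sendall((worker_id + "\n").encode())
        task = reader.readline().strip()
        sock.sendall(("done:" + task + "\n").encode())


def demo():
    """Run one service and three workers on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]

    tasks = ["task-0", "task-1", "task-2"]
    results = []
    service = threading.Thread(target=run_service, args=(server, tasks, results))
    service.start()

    workers = [
        threading.Thread(target=run_worker, args=("127.0.0.1", port, f"w{i}"))
        for i in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    service.join()
    server.close()
    return results
```

Calling demo() returns one (worker id, reply) pair per task.  A real
service would keep the worker connections open and multiplex many of
them rather than serving them one at a time; this sketch only shows the
direction in which connections are opened.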

Ioan

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================




