[Swift-devel] Falkon and Coaster support for MPI

Ioan Raicu iraicu at cs.uchicago.edu
Mon Jun 30 12:40:51 CDT 2008


Right, but in implementing the glide-in scheduler, we will face many of 
the same challenges that more mature LRMs faced when implementing MPI 
support.  After thinking further about supporting MPI in Falkon, it 
might not be as hard as I thought it would be, but it will certainly 
take at least a day of work (an initial estimate; it could be longer) to 
get the first prototype ready, and then lots of testing to make sure it 
works well.  Perhaps we can wait until Ben does his one-week evaluation 
of MPI support in Coaster, and make a decision once we have a better 
understanding of how much effort it would take in either Coaster or Falkon.

Ioan

Mihael Hategan wrote:
> On Mon, 2008-06-30 at 03:43 -0500, Ian Foster wrote:
>
>   
>> 5) The question has been raised of how to implement (2). One proposal  
>> is to adapt coaster to support MPI jobs. I'm a bit concerned that this  
>> could be expensive: we already have Falkon running well on BG/P, and  
>> given our other commitments to support NSF user communities, putting  
>> scarce resources into replicating that work may not be optimal.
>>
>>     
>
> As far as I understand from what Ioan says, Falkon doesn't support MPI
> jobs. I think the requirement came from Benoit's group's realization that
> running applications on BG/P without MPI is rather slow. Not that I
> haven't said that.
>
> In any event, it seems like such support is necessary for achieving
> reasonable performance on BG/P. So it pretty much boils down to whether
> we want to reasonably support BG/P or not.
>
> In terms of the scheduling, what we must keep in mind is that
> coasters/glideins/falkons can be implemented in such a way as to ensure
> performance is never worse than without them. Suppose a 256-node job is
> submitted. Requesting 256 nodes from the queuing system, making sure
> that no other job gets them before the 256-node job does, and also
> submitting the job earlier if a sufficient number of nodes becomes
> available is never going to be worse than not having the mechanism. This
> also ensures that starvation does not happen unless the underlying
> system would have starved the job anyway.
>   
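
The scheduling argument in the quoted message (request the full node count from the queuing system, but submit the job earlier if enough held nodes become available) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Falkon or Coaster code: `IdleEvent`, `start_time`, and the example times are hypothetical, and the sketch assumes the dedicated request is granted at the same time the job would have started if it had been submitted to the queuing system directly.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IdleEvent:
    """Hypothetical record: from `time` on, the pool holds `idle_nodes` idle nodes."""
    time: float
    idle_nodes: int


def start_time(nodes_needed: int, queue_grant_time: float,
               idle_events: Iterable[IdleEvent]) -> float:
    """Earliest start of a job under the policy sketched above.

    queue_grant_time is when the queuing system grants the dedicated
    request for nodes_needed nodes (assumed equal to the start time the
    job would get without the glide-in layer).
    """
    # Earliest moment at which already-held idle nodes suffice, if any.
    early = min(
        (e.time for e in idle_events if e.idle_nodes >= nodes_needed),
        default=float("inf"),
    )
    # The job starts at whichever comes first, so it is never later
    # than the plain queue start time.
    return min(queue_grant_time, early)


if __name__ == "__main__":
    events = [IdleEvent(time=10.0, idle_nodes=128),
              IdleEvent(time=40.0, idle_nodes=256)]
    print(start_time(256, 100.0, events))      # enough idle nodes at t=40
    print(start_time(256, 100.0, events[:1]))  # falls back to the grant time
```

With a grant time of 100 and the pool reaching 256 idle nodes at time 40, the job starts at 40; if the pool never reaches 256 idle nodes, it starts at 100, which is the same as without the mechanism. The sketch leaves out how the 256 requested nodes are kept from other jobs, which is where most of the real work would be.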

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================