[Swift-devel] Re: scheduling

Michael Wilde wilde at mcs.anl.gov
Wed Mar 31 12:25:38 CDT 2010


Yeah, I agree with you on keeping scheduling decisions out of mappers.

But I think mappers could, and probably should, still be involved, if only to this extent: in a replica-based environment, the namespace they map into would be the logical namespace of some (abstract) replica catalog, which is itself another mapping.

----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:

> A few comments:
> 
> I think we need to simplify the model here, because it's already hard to
> reason about. I think we also need to keep it cohesive. So whether
> information provided by the mappers is used for scheduling, no
> scheduling decisions (logic) should be in mappers.
> 
> On Wed, 2010-03-31 at 07:28 -0500, wilde at mcs.anl.gov wrote:
> > To think through this question it helps to define the actors that are
> > involved:
> > 
> > - input mappers, which have the opportunity to do their mapping from a
> > replica catalog
> 
> Again, I do not believe that mappers should play a role in scheduling
> because without other knowledge about sites they would do a poor job at
> making decisions about where to run things. And if they made scheduling
> decisions, then they would be logically part of the scheduler which
> sounds fishy.

This would not be a scheduling decision. What I meant by this item is that, rather than returning physical file names as its mapping, the mapper would return logical file names (in the terminology of, say, the RLS).
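
A minimal sketch of that split, in Python purely for illustration. Every name here (ReplicaCatalog, logical_mapper, resolve, the lfn:// prefix, the example URL) is hypothetical; this is not Swift's mapper interface or the RLS client API. The point is only that the mapper emits logical names and knows nothing about sites, while a separate catalog lookup turns those names into physical replicas that a scheduler could later consult.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Hypothetical catalog: logical file name -> list of physical locations."""
    entries: dict = field(default_factory=dict)

    def register(self, lfn, pfn):
        # Record one more physical replica for a logical name.
        self.entries.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(lfn, []))


def logical_mapper(dataset, names):
    """Mapper: returns logical names only. No site knowledge, no scheduling."""
    return ["lfn://{}/{}".format(dataset, name) for name in names]


def resolve(catalog, lfns):
    """Separate step: ask the catalog where each logical file currently lives."""
    return {lfn: catalog.lookup(lfn) for lfn in lfns}


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("lfn://run42/a.dat", "gsiftp://siteA.example.org/data/a.dat")
    lfns = logical_mapper("run42", ["a.dat", "b.dat"])
    for lfn, replicas in resolve(catalog, lfns).items():
        print(lfn, replicas)
```

Whether and how the replica list feeds into site selection stays entirely outside the mapper in this sketch.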
> 
> > 
> > - the site selector in Swift's scheduler, which could factor into its
> > criteria where the data needed by an application lives (and perhaps
> > where the output must go, or where the user prefers that it go)
> > 
> > - the Swift execution wrapper _swiftwrap and/or the post-execution
> > logic and cache management logic in Swift which can influence whether
> > and how long a file stays in the site shared/ cache
> 
> Assuming that there is a shared cache. The trend we're seeing with EAGER
> is that there is no shared directory any more, which was used solely as
> a caching mechanism due to the fact that the client did the staging.

That makes sense for input data products. Output data products, though, could reasonably be copied from some local work directory to some locally accessible file cache, as well as to any remote file cache. 

There are also cases where neither the input nor the output file *fits* into such a local cache. So both CDM (what you called EAGER) and a future replication model may need to address local and/or distributed persistent storage caches.
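
To make the "does not fit" case concrete, here is a rough Python sketch of an output placement rule. The function name, the target labels and the size-only policy are all assumptions made up for illustration; nothing like this exists in Swift or CDM as far as this thread says. It copies an output to each cache that has room and falls back to persistent storage when no cache can hold it.

```python
def place_output(size_bytes, local_cache_free, remote_cache_free):
    """Hypothetical policy: choose where to copy an output file from the work dir.

    Returns a list of target labels. A real implementation would also need
    eviction, concurrency control and user preferences; this only checks size.
    """
    targets = []
    if size_bytes <= local_cache_free:
        targets.append("local-cache")
    if size_bytes <= remote_cache_free:
        targets.append("remote-cache")
    if not targets:
        # File fits in no cache: fall back to persistent storage.
        targets.append("persistent-storage")
    return targets


if __name__ == "__main__":
    # Sizes are arbitrary units chosen for the example.
    print(place_output(10, 100, 1000))
    print(place_output(500, 100, 1000))
    print(place_output(5000, 100, 1000))
```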

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



