From benc at hawaga.org.uk  Tue May  1 02:21:34 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 07:21:34 +0000 (GMT)
Subject: [Swift-devel] LQCD mapping
Message-ID: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>


yesterday evening I played some with Nika trying to get her LQCD workflow 
running some more.

It involved one code change to swift:

I put a 'create' option on the filesys_mapper so that one can do this:

  file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
  foreach i in range {
    int j=i-1;
    lattice[i] = lqcd_exec(test_in,lattice[j]);
  }

where lattice.* files don't exist, so that lattice[5] will map to 
"lattice.5". With create=false (the default) then the mapper behaves as 
before, which seems to be essentially an input-only mode where it creates 
an array based on existing files.
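
The difference between the two modes can be sketched in Python. This is only a toy model and not the Swift mapper code: the class name PrefixMapper and its methods are assumptions made for illustration, and it simply contrasts "map only what already exists" with "map any index on demand".

```python
import os
import tempfile


class PrefixMapper:
    """Toy model of a prefix-based array mapper (hypothetical, not Swift's code)."""

    def __init__(self, directory, prefix, create=False):
        self.directory = directory
        self.prefix = prefix
        self.create = create

    def existing(self):
        """Return the indices of files already on disk that match the prefix."""
        found = []
        for name in os.listdir(self.directory):
            tail = name[len(self.prefix):]
            if name.startswith(self.prefix) and tail.isdigit():
                found.append(int(tail))
        return sorted(found)

    def map(self, index):
        """Return the path for an array element, or None if it is unmapped."""
        if self.create or index in self.existing():
            return os.path.join(self.directory, f"{self.prefix}{index}")
        return None


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Only element 0 exists on disk.
        open(os.path.join(d, "lattice.0"), "w").close()
        strict = PrefixMapper(d, "lattice.", create=False)
        lazy = PrefixMapper(d, "lattice.", create=True)
        return (
            strict.map(0) is not None,        # existing file is mapped
            strict.map(5),                    # missing file: no mapping
            os.path.basename(lazy.map(5)),    # mapped to a name anyway
            os.path.exists(lazy.map(5)),      # the mapper creates no file
        )
```

In this model the create=True mapper hands out a name for lattice[5] without touching the filesystem; the file would only appear once some procedure writes to it.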

I think this is the mapping functionality that I want, but it's not clear 
to me whether filesys_mapper is the place for it, whether one of the other 
mappers already does this, or if it should go in a different place 
(another mapper or a new mapper). comments?

-- 


From itf at mcs.anl.gov  Tue May  1 02:47:03 2007
From: itf at mcs.anl.gov (itf at mcs.anl.gov)
Date: Tue, 1 May 2007 02:47:03 -0500 (CDT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
Message-ID: <3303.84.56.24.39.1178005623.squirrel@www-unix.mcs.anl.gov>

In this case, it seems that you know the number of files to be created
ahead of time. Should that information be specified in the definition (the
first line)?


> yesterday evening I played some with Nika trying to get her LQCD workflow
> running some more.
>
> It involved one code change to swift:
>
> I put a 'create' option on the filesys_mapper so that one can do this:
>
>   file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
>   foreach i in range {
>     int j=i-1;
>     lattice[i] = lqcd_exec(test_in,lattice[j]);
>   }
>
> where lattice.* files don't exist, so that lattice[5] will map to
> "lattice.5". With create=false (the default) then the mapper behaves as
> before, which seems to be essentially an input-only mode where it creates
> an array based on existing files.
>
> I think this is the mapping functionality that I want, but its not clear
> to me whether filesys_mapper is the place for it, whether one of the other
> mappers already does this, or if it should go in a different place
> (another mapper or a new mapper). comments?
>
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>


From benc at hawaga.org.uk  Tue May  1 04:00:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 09:00:38 +0000 (GMT)
Subject: [Swift-devel] Re: LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0705010858140.3117@dildano.hawaga.org.uk>


also, a couple of bugs I filed from this:

bug 54: expressions inside array indices do not work

and

bug 55: workflow hangs when accessing uninitialised array member

The problem with bug 54 has been discussed here already and I think is 
relatively straightforward to fix (though that bit of the code is a bit 
tangly). Bug 55 perhaps needs some kind of deadlock detection. I 
don't really know, but it's a bad user experience at the moment.
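
One possible shape for such a check, sketched in Python under loud assumptions: the function find_stuck and the dictionary layout are hypothetical and say nothing about how the Swift engine tracks data. The idea is to propagate availability through the statements that produce each element and report whatever can never be produced.

```python
def find_stuck(producers, initially_available):
    """Return the elements that can never become available.

    producers: dict mapping an element name to the list of elements that
               its producing statement reads (hypothetical representation).
    initially_available: elements that already have a value.
    """
    available = set(initially_available)
    changed = True
    while changed:
        changed = False
        for target, inputs in producers.items():
            if target not in available and all(i in available for i in inputs):
                available.add(target)
                changed = True
    return sorted(set(producers) - available)


# Models lattice[i] = lqcd_exec(test_in, lattice[i-1]) for i = 1..3.
# Nothing ever assigns lattice[0] here.
chain = {f"lattice[{i}]": ["test_in", f"lattice[{i - 1}]"] for i in range(1, 4)}
```

With only test_in available, find_stuck(chain, {"test_in"}) reports all three elements, because each one waits on a predecessor that is never written. If lattice[0] is also available, the list is empty.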

Neither of them is on the 0.2 feature list, though they should be fixed 
sooner rather than later.

-- 


From benc at hawaga.org.uk  Tue May  1 04:36:58 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 09:36:58 +0000 (GMT)
Subject: [Swift-devel] remote file/directory stuff (bug 22) (fwd)
Message-ID: <Pine.LNX.4.64.0705010925500.5347@dildano.hawaga.org.uk>


there was a thread a couple of months ago (almost to the day) about this 0.2 
feature.

I've done some work towards implementing this already, and I'm going to do 
some more now.

I'm planning on implementing the syntax Yong discusses below. I hit the 
problem that motivated this again yesterday, working with Nika.

I'm not sure what remote mapping should really look like - pretty much 
everyone seems in agreement about the general concept, with various 
differences, so I think it will be useful to actually have an 
implementation of *something* to get a practical feel, rather than our 
previous interminable discussion threads.

---------- Forwarded message ----------
Date: Sat, 3 Mar 2007 01:37:42 +0000 (GMT)
From: Ben Clifford <benc at hawaga.org.uk>
To: swft at ci.uchicago.edu
Subject: [Swft] Re: [Swift-devel] remote file/directory stuff (bug 22)


So the message below can be the beginnings of campaign definition for bug 
22 - 'execute-side protomappers'.

I think in terms of work, I (or Yong if he wants to) need to implement 
the swiftscript->kml bit to generate the kml syntax that Mihael suggested 
in an earlier message; and then Mihael should go from there to make it 
happen inside the execution engine.

On Fri, 2 Mar 2007, Ben Clifford wrote:

> 
> 
> On Fri, 2 Mar 2007, Yong Zhao wrote:
> 
> > Can you elaborate on this issue a little bit so that we can make a
> > unanimous decision:
> > 
> > 1. what was the problem exactly
> 
> Some programs that we run in swift do not use the traditional VDS-like API 
> of being told on the commandline the names of the files that they must 
> input and output to. Instead, they make up some of the names themselves.
> 
> For example, one of Nika's programs has the syntax:
> 
>  ./program inputfilename
> 
> and places its outputs in inputfile.stuff, inputfile.abc, inputfile.foo
> 
> > 2. what are you proposing
> 
> To extend the syntax of the app {} block to permit specification of the 
> above interface, with a syntax something like:
> 
> (stuffoutfile s, abcoutfile a, foooutfile f) myproc(inputfile i) {
>   app {
>     program @i;
>     s < @strcat(@i,".stuff");
>     a < @strcat(@i,".abc");
>     f < @strcat(@i,".foo");
>   }
> }
> 
> Meaning that rather than Swift specifying the remote name for s, a and f, 
> instead the app block specifies where those three files are.
> 
> These will be staged back into the submit-side location defined in the 
> existing mappers.
> 
> > 3. to what extent does the proposal solve the problem
> 
> It should solve Nika's immediate problem, I think.
> 
> > 4. what is the implication to the mapping interface
> 
> A longer term perspective is that this is the beginning of longer work to 
> implement fuller execute-side mappers (which have also been called 
> application mappers in some threads).
> 
> So it is mapping, but on the execute side. It fits in in a fairly 
> straightforward way with mapping on the submit side, which is what we have 
> now.
> 
> Submit side mapping maps between submit-side data and SwiftScript 
> variables/structures, so that the user can arrange his submit-side data in 
> a way that he wants (rather than swift compelling it to be in a particular 
> format)
> 
> Execute side mapping maps between SwiftScript variables and execute side 
> data, so that data can be laid out on the execute side in the way that the 
> program wants it (rather than swift compelling it to be in a particular 
> format)
> 
> With the present implementation, this amounts to being able to specify 
> different paths and filenames on the submit and execute side for each data 
> file.
> 
> In the longer term, it might also be useful in defining things like how to 
> map data on a submit-side database to some format on the execute side for 
> processing. If we have only submit side mappers, then we can map data 
> between a submit side database and SwiftScript structures, but not map 
> between those structures and the execute side...
> 
> 
> 
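
The execute-side naming described in the forwarded proposal can be sketched in Python. This is an illustrative model only: the function names, the example file names, and the idea of returning (execute-side, submit-side) pairs are assumptions, not part of the proposed SwiftScript syntax.

```python
def execute_side_outputs(input_name, suffixes=(".stuff", ".abc", ".foo")):
    """Names the program picks for itself: input name plus a fixed suffix."""
    return [input_name + suffix for suffix in suffixes]


def stage_out_plan(input_name, submit_side_names):
    """Pair each execute-side file with the submit-side name chosen by the
    existing (submit-side) mappers. Purely a hypothetical helper."""
    remote = execute_side_outputs(input_name)
    if len(remote) != len(submit_side_names):
        raise ValueError("expected one submit-side name per output")
    return list(zip(remote, submit_side_names))
```

For example, stage_out_plan("run1.in", ["s.out", "a.out", "f.out"]) pairs run1.in.stuff with s.out, run1.in.abc with a.out, and run1.in.foo with f.out, which is the "different names on the submit and execute side" idea in miniature.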


From nefedova at mcs.anl.gov  Tue May  1 07:06:09 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Tue, 01 May 2007 07:06:09 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
Message-ID: <6.2.1.2.2.20070501070147.020db328@pop.mcs.anl.gov>

I'd like to mention that as a result of our 'play' last evening - we have a 
working LQCD workflow (chained genU execution) so the workflow could be 
given to Xian-He once the changes Ben has made to swift make it into trunk. 
My special thanks to Ben who came up with the mapper construction below 
that made the workflow work!

Nika

At 02:21 AM 5/1/2007, Ben Clifford wrote:

>yesterday evening I played some with Nika trying to get her LQCD workflow
>running some more.
>
>It involved one code change to swift:
>
>I put a 'create' option on the filesys_mapper so that one can do this:
>
>   file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
>   foreach i in range {
>     int j=i-1;
>     lattice[i] = lqcd_exec(test_in,lattice[j]);
>   }
>
>where lattice.* files don't exist, so that lattice[5] will map to
>"lattice.5". With create=false (the default) then the mapper behaves as
>before, which seems to be essentially an input-only mode where it creates
>an array based on existing files.
>
>I think this is the mapping functionality that I want, but its not clear
>to me whether filesys_mapper is the place for it, whether one of the other
>mappers already does this, or if it should go in a different place
>(another mapper or a new mapper). comments?
>


From hategan at mcs.anl.gov  Tue May  1 08:21:33 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 08:21:33 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
Message-ID: <1178025693.26042.10.camel@blabla.mcs.anl.gov>

On Tue, 2007-05-01 at 07:21 +0000, Ben Clifford wrote:
> yesterday evening I played some with Nika trying to get her LQCD workflow 
> running some more.
> 
> It involved one code change to swift:
> 
> I put a 'create' option on the filesys_mapper so that one can do this:
> 
>   file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
>   foreach i in range {
>     int j=i-1;
>     lattice[i] = lqcd_exec(test_in,lattice[j]);
>   }
> 
> where lattice.* files don't exist, so that lattice[5] will map to 
> "lattice.5". With create=false (the default) then the mapper behaves as 
> before, which seems to be essentially an input-only mode where it creates 
> an array based on existing files.
> 
> I think this is the mapping functionality that I want, but its not clear 
> to me whether filesys_mapper is the place for it, whether one of the other 
> mappers already does this, or if it should go in a different place 
> (another mapper or a new mapper). comments?

The translator does static analysis to figure out which things are "read"
and which are "write". In this case it looks like it's figuring the
wrong thing, and I think that should be fixed.

Mihael

> 


From yongzh at cs.uchicago.edu  Tue May  1 09:42:04 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 1 May 2007 09:42:04 -0500 (CDT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>

does create mean creating an empty file?

On Tue, 1 May 2007, Ben Clifford wrote:

>
> yesterday evening I played some with Nika trying to get her LQCD workflow
> running some more.
>
> It involved one code change to swift:
>
> I put a 'create' option on the filesys_mapper so that one can do this:
>
>   file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
>   foreach i in range {
>     int j=i-1;
>     lattice[i] = lqcd_exec(test_in,lattice[j]);
>   }
>
> where lattice.* files don't exist, so that lattice[5] will map to
> "lattice.5". With create=false (the default) then the mapper behaves as
> before, which seems to be essentially an input-only mode where it creates
> an array based on existing files.
>
> I think this is the mapping functionality that I want, but its not clear
> to me whether filesys_mapper is the place for it, whether one of the other
> mappers already does this, or if it should go in a different place
> (another mapper or a new mapper). comments?
>


From benc at hawaga.org.uk  Tue May  1 09:44:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 14:44:02 +0000 (GMT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Yong Zhao wrote:

> does create mean creating an empty file?

it doesn't create any file - it creates mappings that this mapper appears 
to not make by default (though mihael suggests that might be a bug)

so that if I refer to an array element  lattice[787979]  it is mapped to a 
(possibly non-existing) file called prefix+"787979"+suffix

if that is then used as the output variable for a procedure, then yes the 
file gets created by the procedure. but not by the mapper.



From yongzh at cs.uchicago.edu  Tue May  1 09:54:38 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 1 May 2007 09:54:38 -0500 (CDT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>

I see. Then maybe you should not use filesys_mapper; can you try
simple_mapper instead?

Yong.

On Tue, 1 May 2007, Ben Clifford wrote:

>
>
> On Tue, 1 May 2007, Yong Zhao wrote:
>
> > does create mean creating an empty file?
>
> it doesn't create any file - it creates mappings that this mapper appears
> to not make by default (though mihael suggests that might be a bug)
>
> so that if I refer to an array element  lattice[787979]  it is mapped to a
> (possibly non-existing) file called prefix+"787979"+suffix
>
> if that is then used as the output variable for a procedure, then yes the
> file gets created by the procedure. but not by the mapper.
>


From hategan at mcs.anl.gov  Tue May  1 09:58:17 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 09:58:17 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
Message-ID: <1178031497.29391.7.camel@blabla.mcs.anl.gov>

On Tue, 2007-05-01 at 14:44 +0000, Ben Clifford wrote:
> 
> On Tue, 1 May 2007, Yong Zhao wrote:
> 
> > does create mean creating an empty file?
> 
> it doesn't create any file - it creates mappings that this mapper appears 
> to not make by default (though mihael suggests that might be a bug)
> 
> so that if I refer to an array element  lattice[787979]  it is mapped to a 
> (possibly non-existing) file called prefix+"787979"+suffix
> 
> if that is then used as the output variable for a procedure, then yes the 
> file gets created by the procedure. but not by the mapper.

Mappers are supposed to be lazy. They don't enforce sizes. They can be
used, if requested, to populate a data structure to reflect existing
data. That's what the existing() call does. The translator is supposed
to figure out whether that call should be made on initialization, and to
signal that using the "input" mapping parameter (which is used by the type
system to determine whether a call to existing() should be made). In a
sense, "input" is pretty much what your "create" does, but it already has
a large part of the complexities figured out.

Should "input" not be passed, existing() would not be called, and the
mapper should act in a fully lazy way, but that would also mean that no
bits in the array will be marked as available, unless assigned to
separately.
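
The behaviour described above can be modelled roughly in Python. This is a sketch under assumptions: LazyMappedArray, is_input and assign are hypothetical names, and the real type system is certainly more involved. It shows only that existing() is consulted when the "input" flag is set, and that otherwise elements become available through assignment alone.

```python
class LazyMappedArray:
    """Toy model of an array whose availability depends on an 'input' flag."""

    def __init__(self, existing, is_input=False):
        # `existing` stands in for the mapper's existing() call.
        self.available = set()
        if is_input:
            # Populate from data that is already there.
            self.available.update(existing())

    def assign(self, index):
        """Mark an element as available after something has written it."""
        self.available.add(index)

    def is_available(self, index):
        return index in self.available
```

Without the flag the callable is never invoked and nothing is marked available until assign() is called, which matches the "fully lazy" case described above.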



From benc at hawaga.org.uk  Tue May  1 10:01:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 15:01:25 +0000 (GMT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Yong Zhao wrote:

> I see, then maybe you should not use filesys_mapper, can you try
> simple_mapper instead?

The code in there looks like it does approximately the right stuff 
filename-wise for arrays.

Maybe Nika can try with her workflow - if not, I'll try it later.

-- 


From nefedova at mcs.anl.gov  Tue May  1 10:06:20 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Tue, 01 May 2007 10:06:20 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>
Message-ID: <6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>

At 10:01 AM 5/1/2007, Ben Clifford wrote:


>On Tue, 1 May 2007, Yong Zhao wrote:
>
> > I see, then maybe you should not use filesys_mapper, can you try
> > simple_mapper instead?
>
>The code in there looks like it does approximately the right stuff
>filename-wise for arrays.
>
>Maybe Nika can try with her workflow - if not, I'll try it later.
>

Sure, I can try that

Nika



From tiberius at ci.uchicago.edu  Tue May  1 10:09:04 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 1 May 2007 10:09:04 -0500
Subject: [Swift-devel] suggestion please on hanging/sleeping/slow wf
Message-ID: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>

I have a workflow that generates 5000 files.
The execution seems to have halted, for no obvious reason:
- there are no more jobs in the queue
- no errors are reported in the logfile
- NOTE: some of the input files have not been staged in yet, but the
  workflow is hanging
- NOTE: the remote application temp directory is GONE, only the
  shared directory is still there
- apparently all the output files that are in /shared have been sent
  back (staged out)

What to do, what to do ?

The workflow is sid-wf.dtm in ~tiberius/scratch on teraport
It uses the config files in ~tiberius/local/swift-conf


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From tiberius at ci.uchicago.edu  Tue May  1 10:15:37 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 1 May 2007 10:15:37 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>
	<6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
Message-ID: <fec1351f0705010815g485f525ej89f807bc6c50509e@mail.gmail.com>

Since the discussion seems to go in the direction I'm interested in, I
remember wishing for a filesys mapper extension where I can specify
some verification code (such as the number of inputs matched, or the
total file sizes) before the initialization is done.
This implies that the mapper retries initialization, and it only
succeeds when a previous step has produced the right number of
outputs. This enables sequential procedure invocation in the case when
I don't know the names or the number of output files from one stage to
the other.
Currently I work around this by archiving the output from stage one and
passing the archive to stage two (so the number of outputs is fixed and known).
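
The retry-with-verification idea could look roughly like this in Python. It is a sketch of the wish, not of any existing mapper feature: wait_for_inputs, its parameters, and the polling approach are all assumptions made for illustration.

```python
import os
import time


def wait_for_inputs(directory, prefix, verify, attempts=5, delay=0.1):
    """Retry matching files until `verify` accepts them.

    verify: callable that takes the sorted list of matched names and
            returns True when initialization may go ahead (for example,
            when the expected number of files is present).
    Returns the matched names, or None if every attempt failed.
    """
    for _ in range(attempts):
        matched = sorted(
            name for name in os.listdir(directory) if name.startswith(prefix)
        )
        if verify(matched):
            return matched
        time.sleep(delay)
    return None
```

A caller might pass verify=lambda names: len(names) == 3 to hold back stage two until stage one has produced three outputs. A check on total file size would fit the same hook.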

Tibi

On 5/1/07, Veronika V. Nefedova <nefedova at mcs.anl.gov> wrote:
> At 10:01 AM 5/1/2007, Ben Clifford wrote:
>
>
> >On Tue, 1 May 2007, Yong Zhao wrote:
> >
> > > I see, then maybe you should not use filesys_mapper, can you try
> > > simple_mapper instead?
> >
> >The code in there looks like it does approximately the right stuff
> >filename-wise for arrays.
> >
> >Maybe Nika can try with her workflow - if not, I'll try it later.
> >
>
> Sure, I can try that
>
> Nika
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Tue May  1 10:18:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 15:18:10 +0000 (GMT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <fec1351f0705010815g485f525ej89f807bc6c50509e@mail.gmail.com>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk> 
	<6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
	<fec1351f0705010815g485f525ej89f807bc6c50509e@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0705011516360.3117@dildano.hawaga.org.uk>


or you want to be able to return a variable length array from your app?

(which has been mumbled about but never talked about terribly concretely 
on one of the lists before).

anyway, it's my birthday somewhere in the world for the next 44 hours or 
so. woo woo party!

On Tue, 1 May 2007, Tiberiu Stef-Praun wrote:

> Since the discussion seems to go in the direction I'm interested in, I
> remember wishing for a filesys mapper extension where I can specify
> some verification code (such as the number of inputs matched, or the
> total file sizes) before the initialization is being done.
> This implies that the mapper retries initialization, and it only
> succeeds when a previous step has produced the right number of
> outputs. This enables sequential procedure invocation in the case when
> I don't know the names or the number of output files from one stage to
> the other.
> Currently I fixed this by archiving the output from stage one, and
> passing it to stage two (fixes the number of unknown outputs)
> 
> Tibi
> 


From nefedova at mcs.anl.gov  Tue May  1 12:22:25 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 12:22:25 -0500
Subject: [Swift-devel] terminable down
Message-ID: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>

Hi,

terminable has been down for at least an hour now, maybe longer. My email to 
ci.support was unanswered... I am wondering if anybody on this list is at 
UC now and knows why terminable is down (scheduled maintenance or 
something)? Could anybody access terminable -- maybe I just can't do it 
from ANL?

Thanks!

Nika


From iraicu at cs.uchicago.edu  Tue May  1 12:27:43 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 01 May 2007 12:27:43 -0500
Subject: [Swift-devel] terminable down
In-Reply-To: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
Message-ID: <4637788F.90707@cs.uchicago.edu>

Nika,
I can't access it either from the UChicago campus...  although I don't 
know why it's down.
Ioan

Veronika V. Nefedova wrote:
> Hi,
>
> terminable is down for at least an hour now, maybe longer. My email to 
> ci.support was unanswered... I am wondering if anybody on this list is 
> at UC now and know why terminable is down (scheduled maintenance or 
> something)? Could anybody access terminable -- maybe I just can't do 
> it from ANL ?
>
> Thanks!
>
> Nika
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From nefedova at mcs.anl.gov  Tue May  1 12:29:38 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 12:29:38 -0500
Subject: [Swift-devel] terminable down
In-Reply-To: <4637788F.90707@cs.uchicago.edu>
References: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
	<4637788F.90707@cs.uchicago.edu>
Message-ID: <6.0.0.22.2.20070501122822.03836ec0@mail.mcs.anl.gov>

Hmmm. Do you know of any other UC machine that I could try to log into 
(one that shares the same home dir w/ terminable)? I tried evitable and it's 
also down...

Thanks for the info!

Nika

At 12:27 PM 5/1/2007, Ioan Raicu wrote:
>Nika,
>I can't access it either from the UChicago campus...  although I don't 
>know why its down.
>Ioan
>
>Veronika V. Nefedova wrote:
>>Hi,
>>
>>terminable is down for at least an hour now, maybe longer. My email to 
>>ci.support was unanswered... I am wondering if anybody on this list is at 
>>UC now and know why terminable is down (scheduled maintenance or 
>>something)? Could anybody access terminable -- maybe I just can't do it 
>>from ANL ?
>>
>>Thanks!
>>
>>Nika
>>


From iraicu at cs.uchicago.edu  Tue May  1 12:39:34 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 01 May 2007 12:39:34 -0500
Subject: [Swift-devel] terminable down
In-Reply-To: <6.0.0.22.2.20070501122822.03836ec0@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
	<4637788F.90707@cs.uchicago.edu>
	<6.0.0.22.2.20070501122822.03836ec0@mail.mcs.anl.gov>
Message-ID: <46377B56.8080807@cs.uchicago.edu>

No, I don't, as I don't normally use the CI machines much.
Ioan

Veronika V. Nefedova wrote:
> Hmmm. Do you know of any other UC machine that I could try to login 
> into (that shares the same home dirw/terminable) ? I tried evitable 
> and its also down...
>
> Thanks for the info!
>
> Nika
>
> At 12:27 PM 5/1/2007, Ioan Raicu wrote:
>> Nika,
>> I can't access it either from the UChicago campus...  although I 
>> don't know why its down.
>> Ioan
>>
>> Veronika V. Nefedova wrote:
>>> Hi,
>>>
>>> terminable is down for at least an hour now, maybe longer. My email 
>>> to ci.support was unanswered... I am wondering if anybody on this 
>>> list is at UC now and know why terminable is down (scheduled 
>>> maintenance or something)? Could anybody access terminable -- maybe 
>>> I just can't do it from ANL ?
>>>
>>> Thanks!
>>>
>>> Nika
>>>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From nefedova at mcs.anl.gov  Tue May  1 12:43:38 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 12:43:38 -0500
Subject: [Swift-devel] terminable down
In-Reply-To: <46377B56.8080807@cs.uchicago.edu>
References: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
	<4637788F.90707@cs.uchicago.edu>
	<6.0.0.22.2.20070501122822.03836ec0@mail.mcs.anl.gov>
	<46377B56.8080807@cs.uchicago.edu>
Message-ID: <6.0.0.22.2.20070501124319.0381c020@mail.mcs.anl.gov>

It looks like I should avoid those as well (;

At 12:39 PM 5/1/2007, Ioan Raicu wrote:
>No, I don't, as I don't normally use the CI machines much.
>Ioan
>
>Veronika V. Nefedova wrote:
>>Hmmm. Do you know of any other UC machine that I could try to login into 
>>(that shares the same home dirw/terminable) ? I tried evitable and its 
>>also down...
>>
>>Thanks for the info!
>>
>>Nika
>>
>>At 12:27 PM 5/1/2007, Ioan Raicu wrote:
>>>Nika,
>>>I can't access it either from the UChicago campus...  although I don't 
>>>know why its down.
>>>Ioan
>>>
>>>Veronika V. Nefedova wrote:
>>>>Hi,
>>>>
>>>>terminable is down for at least an hour now, maybe longer. My email to 
>>>>ci.support was unanswered... I am wondering if anybody on this list is 
>>>>at UC now and know why terminable is down (scheduled maintenance or 
>>>>something)? Could anybody access terminable -- maybe I just can't do it 
>>>>from ANL ?
>>>>
>>>>Thanks!
>>>>
>>>>Nika
>>>>


From hategan at mcs.anl.gov  Tue May  1 12:46:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 12:46:08 -0500
Subject: [Swift-devel] terminable down
In-Reply-To: <6.0.0.22.2.20070501124319.0381c020@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070501121638.05c7bec0@mail.mcs.anl.gov>
	<4637788F.90707@cs.uchicago.edu>
	<6.0.0.22.2.20070501122822.03836ec0@mail.mcs.anl.gov>
	<46377B56.8080807@cs.uchicago.edu>
	<6.0.0.22.2.20070501124319.0381c020@mail.mcs.anl.gov>
Message-ID: <1178041568.3508.0.camel@blabla.mcs.anl.gov>

Somebody stepped on the power outlet that hosted the small switch that
terminable used. It should be back now.

On Tue, 2007-05-01 at 12:43 -0500, Veronika V. Nefedova wrote:
> It looks like I should avoid those as well (;
> 
> At 12:39 PM 5/1/2007, Ioan Raicu wrote:
> >No, I don't, as I don't normally use the CI machines much.
> >Ioan
> >
> >Veronika V. Nefedova wrote:
> >>Hmmm. Do you know of any other UC machine that I could try to login into 
> >>(that shares the same home dirw/terminable) ? I tried evitable and its 
> >>also down...
> >>
> >>Thanks for the info!
> >>
> >>Nika
> >>
> >>At 12:27 PM 5/1/2007, Ioan Raicu wrote:
> >>>Nika,
> >>>I can't access it either from the UChicago campus...  although I don't 
> >>>know why its down.
> >>>Ioan
> >>>
> >>>Veronika V. Nefedova wrote:
> >>>>Hi,
> >>>>
> >>>>terminable is down for at least an hour now, maybe longer. My email to 
> >>>>ci.support was unanswered... I am wondering if anybody on this list is 
> >>>>at UC now and know why terminable is down (scheduled maintenance or 
> >>>>something)? Could anybody access terminable -- maybe I just can't do it 
> >>>>from ANL ?
> >>>>
> >>>>Thanks!
> >>>>
> >>>>Nika
> >>>>


From nefedova at mcs.anl.gov  Tue May  1 13:25:16 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 13:25:16 -0500
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>
	<6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
Message-ID: <6.0.0.22.2.20070501132329.05842c20@mail.mcs.anl.gov>

Ok, I tried using a different mapper. It seems that replacing this line:

file lattice[] <filesys_mapper;prefix="lattice.",create=true>;

with this one:

file lattice[] <simple_mapper;prefix="lattice">;


works just fine. My workflow has finished without any errors.

Nika

At 10:06 AM 5/1/2007, Veronika V. Nefedova wrote:
>At 10:01 AM 5/1/2007, Ben Clifford wrote:
>
>
>>On Tue, 1 May 2007, Yong Zhao wrote:
>>
>> > I see, then maybe you should not use filesys_mapper, can you try
>> > simple_mapper instead?
>>
>>The code in there looks like it does approximately the right stuff
>>filename-wise for arrays.
>>
>>Maybe Nika can try with her workflow - if not, I'll try it later.
>
>Sure, I can try that
>
>Nika
>


From nefedova at mcs.anl.gov  Tue May  1 14:53:40 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 14:53:40 -0500
Subject: [Swift-devel] arguments to swift
Message-ID: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>

Hi,

I couldn't find info on how to pass the arguments to the swift script. For 
example, I need to pass an integer, say NUM=5 on the command line when I am 
invoking swift. And inside swift script I'd like to address that variable 
(is it @arg(NUM) ?)... I tried several variations but none seems to work. 
Could somebody please point me to the documentation or give me an example 
on how to do that?

This is one of the few syntaxes that I tried (and it didn't work):

inside swift script (how to address it):

type file{}
int N = @arg(NUM);
int range[] = [1:N];

foreach i in range {
BLA
}

and on the command line - how to specify an argument:
swift bla.swift NUM=2

Thanks!

Nika


From hategan at mcs.anl.gov  Tue May  1 14:55:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 14:55:30 -0500
Subject: [Swift-devel] arguments to swift
In-Reply-To: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>
Message-ID: <1178049330.8303.2.camel@blabla.mcs.anl.gov>

This is undocumented and unsupported, but:
@arg("NUM")

You pass it after the .dtm|.swift name:

swift x.swift -NUM=5
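
A rough illustration of the calling convention above, written in Python
rather than SwiftScript. This is not how swift itself parses its command
line; parse_named_args is a hypothetical helper that only assumes what the
message states: named arguments come after the script name and take the
form -NAME=value. Under that assumption, a bare NUM=2 (no leading dash)
is rejected.

```python
def parse_named_args(argv):
    """Split ['x.swift', '-NUM=5'] into the script name and a dict of named args.

    Hypothetical helper: assumes every argument after the script name has the
    form -NAME=value, as in the example in this message.
    """
    script, rest = argv[0], argv[1:]
    named = {}
    for token in rest:
        # Reject tokens without the leading dash or without an '=' sign.
        if not token.startswith("-") or "=" not in token:
            raise ValueError(f"expected -NAME=value, got {token!r}")
        name, value = token[1:].split("=", 1)
        named[name] = value  # values stay strings; the caller converts them
    return script, named


if __name__ == "__main__":
    script, named = parse_named_args(["x.swift", "-NUM=5"])
    print(script, int(named["NUM"]))
```

Values are kept as strings here, so a script wanting an integer would
convert explicitly; whether swift does that conversion for you is not
something this sketch can tell you.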


On Tue, 2007-05-01 at 14:53 -0500, Veronika V. Nefedova wrote:
> Hi,
> 
> I couldn't find info on how to pass the arguments to the swift script. For 
> example, I need to pass an integer, say NUM=5 on the command line when I am 
> invoking swift. And inside swift script I'd like to address that variable 
> (is it @arg(NUM) ?)... I tried several variations but none seems to work. 
> Could somebody please point me to the documentation or give me an example 
> on how to do that?
> 
> This is one of the few syntax that I tried (and it didn't work):
> 
> inside swift script (how to address it):
> 
> type file{}
> int N = @arg(NUM);
> int range[] = [1:N];
> 
> foreach i in range {
> BLA
> }
> 
> and on the command line - how to specify an argument:
> swift bla.swift NUM=2
> 
> Thanks!
> 
> Nika
> 
> 


From nefedova at mcs.anl.gov  Tue May  1 15:07:33 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 15:07:33 -0500
Subject: [Swift-devel] arguments to swift
In-Reply-To: <1178049330.8303.2.camel@blabla.mcs.anl.gov>
References: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>
	<1178049330.8303.2.camel@blabla.mcs.anl.gov>
Message-ID: <6.0.0.22.2.20070501150722.04c8b710@mail.mcs.anl.gov>

Great, thanks! It worked.

Nika

At 02:55 PM 5/1/2007, Mihael Hategan wrote:
>This is undocumented and unsupported, but:
>@arg("NUM")
>
>You pass it after the .dtm|.swift name:
>
>swift x.swift -NUM=5
>
>
>On Tue, 2007-05-01 at 14:53 -0500, Veronika V. Nefedova wrote:
> > Hi,
> >
> > I couldn't find info on how to pass the arguments to the swift script. For
> > example, I need to pass an integer, say NUM=5 on the command line when I am
> > invoking swift. And inside swift script I'd like to address that variable
> > (is it @arg(NUM) ?)... I tried several variations but none seems to work.
> > Could somebody please point me to the documentation or give me an example
> > on how to do that?
> >
> > This is one of the few syntax that I tried (and it didn't work):
> >
> > inside swift script (how to address it):
> >
> > type file{}
> > int N = @arg(NUM);
> > int range[] = [1:N];
> >
> > foreach i in range {
> > BLA
> > }
> >
> > and on the command line - how to specify an argument:
> > swift bla.swift NUM=2
> >
> > Thanks!
> >
> > Nika
> >
> >


From nefedova at mcs.anl.gov  Tue May  1 16:31:28 2007
From: nefedova at mcs.anl.gov (Veronika  V. Nefedova)
Date: Tue, 01 May 2007 16:31:28 -0500
Subject: [Swift-devel] Fwd: Re: chained genU workflow
Message-ID: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>

Hi, everybody:

I got this email from Xian-He (after I sent him the lqcd workflow) and I do 
not think I understand what exactly he is talking about.
Mihael and/or Yong -- you've worked with this group before I joined - maybe 
you know what exactly their problems are? Please give me any background 
information so I can help them proceed.

Thanks!

Nika


>Date: Tue, 01 May 2007 16:24:45 -0500
>From: Xian-He Sun <sun at iit.edu>
>Subject: Re: chained genU workflow
>To: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
>Cc: Don Holmgren <djholm at fnal.gov>, simone at fnal.gov,
>    Nirmal Seenu <nirmal at fnal.gov>, Mike Wilde <wilde at mcs.anl.gov>,
>    Ian Foster <foster at mcs.anl.gov>
>
>
>Thank you, Nika. It is a good achievement. Currently, we are still facing
>two technical issues:
>
>1. The lqcd computing environment is not a true Grid environment. We still
>need to modify your code to make it work. We have had some success with the
>hello example and will work on this one too.
>
>2. We have made Swift talk to PBS directly, but some efficiency issues
>remain at this time. Some modification on the Swift side is needed. Nirmal
>is working with Mihael Hategan in this regard.
>
>Thank you,
>
>Xian-He


From benc at hawaga.org.uk  Tue May  1 16:56:51 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 21:56:51 +0000 (GMT)
Subject: [Swift-devel] arguments to swift
In-Reply-To: <1178049330.8303.2.camel@blabla.mcs.anl.gov>
References: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>
	<1178049330.8303.2.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705012154140.20212@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Mihael Hategan wrote:

> This is undocumented and unsupported, but:
> @arg("NUM")
> 
> You pass it after the .dtm|.swift name:
> 
> swift x.swift -NUM=5

That should probably be a supported feature. I'll note it in the user 
guide.

-- 


From hategan at mcs.anl.gov  Tue May  1 16:56:42 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 16:56:42 -0500
Subject: [Swift-devel] Fwd: Re: chained genU workflow
In-Reply-To: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>
Message-ID: <1178056602.13231.5.camel@blabla.mcs.anl.gov>

This is the list of things I got from them OOB:
- MPI jobs with the PBS provider. They need to be able to run with more
than one version of MPI.
- Easier configuration of tc.data/sites.xml. Basically they need the
ability to use a global sites.xml while changing only things like the
project profile entry.
- The cleanup didn't work as it was. It would submit a job on the
default execution provider (whatever that was) which needed a project
profile entry, but the swift library didn't provide one. This was solved
by hacking the vdl lib and adding /bin/rm in tc.data.
- They would like the cleanup to be done without pbs in the future
(possibly fork or directly with the fileop provider). There's some
thinking that needs to go here.

That's it I think.

On Tue, 2007-05-01 at 16:31 -0500, Veronika V. Nefedova wrote:
> Hi, everybody:
> 
> I got this email from Xian-He (after i sent him the lqcd workflow) and I do 
> not think I understand what exactly is he talking about.
> Mihael and/or Yong -- you've worked with this group before I joined - maybe 
> you know what exactly are their problems ? Please give me any background 
> information so I could help them to proceed.
> 
> Thanks!
> 
> Nika
> 
> 
> >Date: Tue, 01 May 2007 16:24:45 -0500
> >From: Xian-He Sun <sun at iit.edu>
> >Subject: Re: chained genU workflow
> >To: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
> >Cc: Don Holmgren <djholm at fnal.gov>, simone at fnal.gov,
> >    Nirmal Seenu <nirmal at fnal.gov>, Mike Wilde <wilde at mcs.anl.gov>,
> >    Ian Foster <foster at mcs.anl.gov>
> >
> >
> >Thank you, Nika. It is a good achievement. Currently, we are still
> >facing
> >two technical issues,
> >
> >1. The lqcd computing environment is not an true Grid environment. We
> >still
> >need to modify your code to make it work. We have had some success of
> >the hello
> >example and will work on this one too.
> >
> >2. We have made Swift talking to PBS directly but some efficiency
> >issues remain
> >at this time. Some modification at the Swift side is needed. Nirmal is
> >working with Mihael Hategan on this regard.
> >
> >Thank you,
> >
> >Xian-He
> 
> 


From hategan at mcs.anl.gov  Tue May  1 16:57:27 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 16:57:27 -0500
Subject: [Swift-devel] arguments to swift
In-Reply-To: <Pine.LNX.4.64.0705012154140.20212@dildano.hawaga.org.uk>
References: <6.0.0.22.2.20070501144050.04c69c90@mail.mcs.anl.gov>
	<1178049330.8303.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705012154140.20212@dildano.hawaga.org.uk>
Message-ID: <1178056647.13231.7.camel@blabla.mcs.anl.gov>

On Tue, 2007-05-01 at 21:56 +0000, Ben Clifford wrote:
> 
> On Tue, 1 May 2007, Mihael Hategan wrote:
> 
> > This is undocumented and unsupported, but:
> > @arg("NUM")
> > 
> > You pass it after the .dtm|.swift name:
> > 
> > swift x.swift -NUM=5
> 
> That should probably be a supported feature. I'll note it in the user 
> guide.

makes sense.


From benc at hawaga.org.uk  Tue May  1 17:06:57 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 1 May 2007 22:06:57 +0000 (GMT)
Subject: [Swift-devel] Fwd: Re: chained genU workflow
In-Reply-To: <1178056602.13231.5.camel@blabla.mcs.anl.gov>
References: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>
	<1178056602.13231.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705012206500.20212@dildano.hawaga.org.uk>


What's a project profile entry?

On Tue, 1 May 2007, Mihael Hategan wrote:

> This is the list of things I got from them OOB:
> - MPI jobs with the PBS provider. They need to be able to run with more
> than one version of MPI.
> - Easier configuration of tc.data/sites.xml. Basically they need the
> ability to use a global sites.xml while changing only things like the
> project profile entry.
> - The cleanup didn't work as it was. It would submit a job on the
> default execution provider (whatever that was) which needed a project
> profile entry, but the swift library didn't provide one. This was solved
> by hacking the vdl lib and adding /bin/rm in tc.data.
> - They would like the cleanup to be done without pbs in the future
> (possibly fork or directly with the fileop provider). There's some
> thinking that needs to go here.
> 
> That's it I think.
> 
> On Tue, 2007-05-01 at 16:31 -0500, Veronika V. Nefedova wrote:
> > Hi, everybody:
> > 
> > I got this email from Xian-He (after i sent him the lqcd workflow) and I do 
> > not think I understand what exactly is he talking about.
> > Mihael and/or Yong -- you've worked with this group before I joined - maybe 
> > you know what exactly are their problems ? Please give me any background 
> > information so I could help them to proceed.
> > 
> > Thanks!
> > 
> > Nika
> > 
> > 
> > >Date: Tue, 01 May 2007 16:24:45 -0500
> > >From: Xian-He Sun <sun at iit.edu>
> > >Subject: Re: chained genU workflow
> > >To: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
> > >Cc: Don Holmgren <djholm at fnal.gov>, simone at fnal.gov,
> > >    Nirmal Seenu <nirmal at fnal.gov>, Mike Wilde <wilde at mcs.anl.gov>,
> > >    Ian Foster <foster at mcs.anl.gov>
> > >
> > >
> > >Thank you, Nika. It is a good achievement. Currently, we are still
> > >facing
> > >two technical issues,
> > >
> > >1. The lqcd computing environment is not an true Grid environment. We
> > >still
> > >need to modify your code to make it work. We have had some success of
> > >the hello
> > >example and will work on this one too.
> > >
> > >2. We have made Swift talking to PBS directly but some efficiency
> > >issues remain
> > >at this time. Some modification at the Swift side is needed. Nirmal is
> > >working with Mihael Hategan on this regard.
> > >
> > >Thank you,
> > >
> > >Xian-He
> > 
> > 


From hategan at mcs.anl.gov  Tue May  1 17:11:04 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2007 17:11:04 -0500
Subject: [Swift-devel] Fwd: Re: chained genU workflow
In-Reply-To: <Pine.LNX.4.64.0705012206500.20212@dildano.hawaga.org.uk>
References: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>
	<1178056602.13231.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705012206500.20212@dildano.hawaga.org.uk>
Message-ID: <1178057464.13872.4.camel@blabla.mcs.anl.gov>

On Tue, 2007-05-01 at 22:06 +0000, Ben Clifford wrote:
> whats a project profile entry?

That thing that gets translated into a 'project' RSL attribute and
eventually to the PBS equivalent.
In swift I suppose it's called a profile entry, since it's specified
with the <profile> element, and has both a key and a value which kinda
makes it an 'entry'. And the key in this case is "project".

> 
> On Tue, 1 May 2007, Mihael Hategan wrote:
> 
> > This is the list of things I got from them OOB:
> > - MPI jobs with the PBS provider. They need to be able to run with more
> > than one version of MPI.
> > - Easier configuration of tc.data/sites.xml. Basically they need the
> > ability to use a global sites.xml while changing only things like the
> > project profile entry.
> > - The cleanup didn't work as it was. It would submit a job on the
> > default execution provider (whatever that was) which needed a project
> > profile entry, but the swift library didn't provide one. This was solved
> > by hacking the vdl lib and adding /bin/rm in tc.data.
> > - They would like the cleanup to be done without pbs in the future
> > (possibly fork or directly with the fileop provider). There's some
> > thinking that needs to go here.
> > 
> > That's it I think.
> > 
> > On Tue, 2007-05-01 at 16:31 -0500, Veronika V. Nefedova wrote:
> > > Hi, everybody:
> > > 
> > > I got this email from Xian-He (after i sent him the lqcd workflow) and I do 
> > > not think I understand what exactly is he talking about.
> > > Mihael and/or Yong -- you've worked with this group before I joined - maybe 
> > > you know what exactly are their problems ? Please give me any background 
> > > information so I could help them to proceed.
> > > 
> > > Thanks!
> > > 
> > > Nika
> > > 
> > > 
> > > >Date: Tue, 01 May 2007 16:24:45 -0500
> > > >From: Xian-He Sun <sun at iit.edu>
> > > >Subject: Re: chained genU workflow
> > > >To: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
> > > >Cc: Don Holmgren <djholm at fnal.gov>, simone at fnal.gov,
> > > >    Nirmal Seenu <nirmal at fnal.gov>, Mike Wilde <wilde at mcs.anl.gov>,
> > > >    Ian Foster <foster at mcs.anl.gov>
> > > >
> > > >
> > > >Thank you, Nika. It is a good achievement. Currently, we are still
> > > >facing
> > > >two technical issues,
> > > >
> > > >1. The lqcd computing environment is not an true Grid environment. We
> > > >still
> > > >need to modify your code to make it work. We have had some success of
> > > >the hello
> > > >example and will work on this one too.
> > > >
> > > >2. We have made Swift talking to PBS directly but some efficiency
> > > >issues remain
> > > >at this time. Some modification at the Swift side is needed. Nirmal is
> > > >working with Mihael Hategan on this regard.
> > > >
> > > >Thank you,
> > > >
> > > >Xian-He
> > > 
> > > 


From benc at hawaga.org.uk  Wed May  2 01:27:27 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 06:27:27 +0000 (GMT)
Subject: [Swift-devel] LQCD mapping
In-Reply-To: <6.0.0.22.2.20070501132329.05842c20@mail.mcs.anl.gov>
References: <Pine.LNX.4.64.0705010708480.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010941500.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011442230.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705010953340.7444@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705011500150.3117@dildano.hawaga.org.uk>
	<6.2.1.2.2.20070501100604.020553e8@pop.mcs.anl.gov>
	<6.0.0.22.2.20070501132329.05842c20@mail.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705020626560.3117@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Veronika  V. Nefedova wrote:

> Ok, I tried using a different mapper. It seems that replacing this line:
> 
> file lattice[] <filesys_mapper;prefix="lattice.",create=true>;
> 
> with this one:
> 
> file lattice[] <simple_mapper;prefix="lattice">;
> 
> Works just fine. My workflow has finished without any errors.

ok cool.

I'll add more documentation to the userguide about the behaviour of the 
simple mapper.

-- 


From benc at hawaga.org.uk  Wed May  2 03:52:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 08:52:47 +0000 (GMT)
Subject: [Swift-devel] Fwd: Re: chained genU workflow
In-Reply-To: <1178056602.13231.5.camel@blabla.mcs.anl.gov>
References: <6.0.0.22.2.20070501162729.04c5e910@mail.mcs.anl.gov>
	<1178056602.13231.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705020744590.20212@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Mihael Hategan wrote:

These should probably go into the bugzilla as they look like app 
requirements that need tracking.


> - MPI jobs with the PBS provider. They need to be able to run with more
> than one version of MPI.

> - Easier configuration of tc.data/sites.xml. Basically they need the
> ability to use a global sites.xml while changing only things like the
> project profile entry.

It may make sense in general for a commandline-specified value to 
override a tc.data-specified value, which in turn overrides a 
sites.xml-specified value.

Though in this 'project' situation, that might look wrong in the 
multi-site case (though, given that they're using PBS on a single site, 
that isn't so much a problem at the moment).
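
The precedence suggested above can be sketched as a layered lookup. This
is a Python illustration only, not existing swift behaviour: the function
name effective_profile and the sample keys and values are made up for the
example, and only the ordering (command line, then tc.data, then
sites.xml) comes from the text.

```python
from collections import ChainMap


def effective_profile(command_line, tc_data, sites_xml):
    """Return a view where earlier sources override later ones.

    Lookup order: command line first, then tc.data, then sites.xml.
    The three arguments are plain dicts of profile key -> value.
    """
    return ChainMap(command_line, tc_data, sites_xml)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show which layer wins.
    profile = effective_profile(
        {"project": "from-command-line"},
        {"project": "from-tc-data", "queue": "from-tc-data"},
        {"project": "from-sites-xml", "maxwalltime": "from-sites-xml"},
    )
    for key in ("project", "queue", "maxwalltime"):
        print(key, "=", profile[key])
```

A single global sites.xml could then stay untouched while only the
'project' value is overridden per run, which is the use case in the list
above. The multi-site caveat still applies: one command-line value would
override the entry for every site.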

> - The cleanup didn't work as it was. It would submit a job on the
> default execution provider (whatever that was) which needed a project
> profile entry, but the swift library didn't provide one. This was solved
> by hacking the vdl lib and adding /bin/rm in tc.data.

mmm hacks.

anything useful from a production codebase perspective?

> - They would like the cleanup to be done without pbs in the future
> (possibly fork or directly with the fileop provider). There's some
> thinking that needs to go here.

VDS1's sites descriptions allowed different job submission mechanisms to 
be specified for different purposes - the 'vanilla' universe and the 
'transfer' universe with the intention that the vanilla universe is for 
running the meat of the workflow and would point at a batch system of some 
kind, whilst the transfer universe is intended for lighter weight jobs and 
would point at GRAM2's jobmanager-fork.

That's perhaps a starting point.

-- 


From benc at hawaga.org.uk  Wed May  2 04:08:34 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 09:08:34 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
Message-ID: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>


I'm trying to run softmean, one of the tools in the fmri workflow that I 
used before for tutorial purposes.

At present I run it like this:

softmean @atlas.img overwrite scalingsuffix @sliced[0].img @sliced[1].img 
@sliced[2].img @sliced[3].img;

so that each of four image filenames are passed in as separate parameters.

This isn't so nice when 'sliced' has != 4 elements.

I considered trying this:

softmean @atlas.img overwrite scalingsuffix @sliced[*].img;

But the @sliced[*].img appears to turn into a single string argument 
listing all of the filenames, which softmean finds displeasing.
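
To make the difference concrete, here is a small Python sketch of the two
argument vectors involved. It does not reproduce swift's expansion code;
the helper names and the example filenames are invented, and the only
thing taken from the description above is that the program receives one
joined string instead of one argument per file.

```python
def build_argv_joined(tool, fixed_args, filenames):
    """All filenames collapsed into ONE argument (the behaviour observed)."""
    return [tool, *fixed_args, " ".join(filenames)]


def build_argv_separate(tool, fixed_args, filenames):
    """One argument per filename (what the program expects)."""
    return [tool, *fixed_args, *filenames]


if __name__ == "__main__":
    # Hypothetical filenames standing in for the mapped 'sliced' elements.
    files = ["sliced0.img", "sliced1.img", "sliced2.img"]
    fixed = ["atlas.img", "overwrite", "scalingsuffix"]
    print(build_argv_joined("softmean", fixed, files))
    print(build_argv_separate("softmean", fixed, files))
```

In the joined form the program sees a single filename containing spaces,
however many elements the array has.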

-- 


From benc at hawaga.org.uk  Wed May  2 05:12:17 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 10:12:17 +0000 (GMT)
Subject: [Swift-devel] version numbering and directory naming.
Message-ID: <Pine.LNX.4.64.0705021008520.3117@dildano.hawaga.org.uk>


I just changed the project version number to 0.1-dev (previously it was 
1.0) as part of some version number stuff I did in r665.

A practical side effect of this is that when you build from source, this 
will change the directory in which the distribution will appear. It will 
now appear in dist/vdsk-0.1-dev instead of dist/vdsk-1.0.

Perhaps the nightly builds need tweaking to accommodate this too, but I 
can't remember where they happen...

-- 


From yongzh at cs.uchicago.edu  Wed May  2 09:27:19 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 2 May 2007 09:27:19 -0500 (CDT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>

Ben,

use @filenames(sliced[*].img).



From benc at hawaga.org.uk  Wed May  2 10:02:54 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 15:02:54 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>


On Wed, 2 May 2007, Yong Zhao wrote:

> use @filenames(sliced[*].img).

I get this:

Execution failed:
        org.griphyn.vdl.mapping.InvalidPathException: Invalid path (*.img) 
for type volume


I tried something a little simpler:


type file;

(file out) echo(file n[])
{
  app {
    echo @filenames(n) stdout=out;
  }
}


file f[] <fixed_array_mapper;files="a b c">;

file out;

out=echo(f);


but that hangs...

oof.

-- 


From yongzh at cs.uchicago.edu  Wed May  2 10:28:43 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 2 May 2007 10:28:43 -0500 (CDT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>

That's strange. I used @filenames a lot a while ago and never had any
problems. Check the kml translation, maybe you added the getfieldvalue
stuff to getFilenames, which should not happen. i.e.

It needs to be
	<vdl:getFilenames var="{sliced}">
		<argument name="path"> ....</...>
	</...>

not
	<vdl:getFilenames><vdl:getFieldvalue ....>


Yong.



From benc at hawaga.org.uk  Wed May  2 10:37:30 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 15:37:30 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>


On Wed, 2 May 2007, Yong Zhao wrote:

> That's strange. I used @filenames a lot a while ago and never had any
> problems. Check the kml translation, maybe you added the getfieldvalue
> stuff to getFilenames, which should not happen. i.e.

yeah, I was just checking that out as a probable cause.

As of about r650, getField is used on all function invocations when a 
variable/path name is supplied as a parameter, no matter which function 
name is used.

The semantics of these 'multiple valued' language constructs ([*] and how 
that passes through @filenames, for example) seems (still) quite poorly 
defined...



From benc at hawaga.org.uk  Wed May  2 11:03:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 16:03:10 +0000 (GMT)
Subject: [Swift-devel] suggestion please on hanging/sleeping/slow wf
In-Reply-To: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>
References: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0705021554410.20212@dildano.hawaga.org.uk>


On Tue, 1 May 2007, Tiberiu Stef-Praun wrote:

> I have a workflow that generates 5000 files.
> The execution seems to have halted, for no obvious reason:

In the past few days, I've hit hangs a bunch of times in various places - 
more than I've ever seen before, but I am doing more complicated things 
recently compared to before (which was running a few relatively trivial 
jobs in a bunch of relatively trivial workflows).

It's an awkward user experience. In some cases, the code should perhaps 
detect such hangs; and in other cases, perhaps different logging info in 
the -debug output would be useful...
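
One possible shape for such hang detection, sketched in Python and not 
taken from the Swift code: record a timestamp whenever something happens 
(a job finishes, a file is staged) and complain when nothing has happened 
for too long. The class name, the 60 second timeout and the fake clock are 
all assumptions made for the example.

```python
import time


class StallDetector:
    """Flags a run as stalled when no progress is recorded for `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_progress = clock()

    def progress(self):
        """Call whenever the workflow makes visible progress."""
        self.last_progress = self.clock()

    def stalled(self):
        """True if more than `timeout` seconds passed since the last progress."""
        return self.clock() - self.last_progress > self.timeout


# Demonstration with a fake clock so the example runs instantly.
now = [0.0]
detector = StallDetector(timeout=60, clock=lambda: now[0])
now[0] = 30.0
early = detector.stalled()   # 30 s of silence: not yet considered stalled
detector.progress()          # something finished at t=30
now[0] = 95.0
late = detector.stalled()    # 65 s since the last event: stalled
print(early, late)
```

This only notices silence; it says nothing about why the run is silent, 
which is where better -debug output would still be needed.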

> - there are no more jobs in the queue
> - no error are reported in the logfile
> - NOTE: some of the input files have not been staged in yet , yet the
> workflow is hanging
> -  NOTE: the remote application temp directory is GONE, only the
> shared directory is still there
> - apparently all the output files that are in /shared have been sent
> back (staged out)
> 
> What to do, what to do ?
> 
> The workflow is sid-wf.dtm in ~tiberius/scratch on teraport
> It uses the config files in ~tiberius/local/swift-conf
> 
> 
> 


From hategan at mcs.anl.gov  Wed May  2 11:16:52 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 02 May 2007 11:16:52 -0500
Subject: [Swift-devel] suggestion please on hanging/sleeping/slow wf
In-Reply-To: <Pine.LNX.4.64.0705021554410.20212@dildano.hawaga.org.uk>
References: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>
	<Pine.LNX.4.64.0705021554410.20212@dildano.hawaga.org.uk>
Message-ID: <1178122612.31984.0.camel@blabla.mcs.anl.gov>

On Wed, 2007-05-02 at 16:03 +0000, Ben Clifford wrote:
> 
> On Tue, 1 May 2007, Tiberiu Stef-Praun wrote:
> 
> > I have a workflow that generates 5000 files.
> > The execution seems to have halted, for no obvious reason:
> 
> In the past few days, I've hit hangs a bunch of times in various places - 
> more than I've ever seen before, but I am doing more complicated things 
> recently compared to before (which was running a few relatively trivial 
> jobs in a bunch of relatively trivial workflows).
> 
> Its an awkward user experience. In some cases, the code should perhaps 
> detect such hangs; and in other cases, perhaps different logging info in 
> the -debug output would be useful...

Yep. The question is how.



From benc at hawaga.org.uk  Wed May  2 18:28:20 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2007 23:28:20 +0000 (GMT)
Subject: [Swift-devel] suggestion please on hanging/sleeping/slow wf
In-Reply-To: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>
References: <fec1351f0705010809u73c5e786mef466b2d16ccbff7@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0705022321190.20212@dildano.hawaga.org.uk>

so one of the things I suggested tibi do is change logging to trace level 
for everything, which has resulted in a 400mb log file for his workflow.

Of course, I don't know what this should really look like if it was 
healthy, but I notice a few hundred exceptions of the form:

Caused by: org.globus.ftp.exception.ServerException: Server refused 
performing the request. Custom message:  (error code 1) [Nested exception 
message:  Custom message: Unexpected reply: 425 
globus_ftp_control_local_pasv(): Handle not in the proper state 
CONNECT_WRITE.: Success.] [Nested exception is 
org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: 
Unexpected reply: 425 globus_ftp_control_local_pasv(): Handle not in the 
proper state CONNECT_WRITE.: Success.]


towards the end of the log file, which may indicate a problem.

For anyone interested in the full 400mb, the log file is on teraport at: 
/home/tiberius/scratch/sid-wf-1yrnoadiq0940.log

-- 


From benc at hawaga.org.uk  Thu May  3 07:24:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 3 May 2007 12:24:29 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705031224080.22628@dildano.hawaga.org.uk>


On Wed, 2 May 2007, Yong Zhao wrote:

> That's strange. I used @filenames a lot a while ago and never had any
> problems. Check the kml translation, maybe you added the getfieldvalue
> stuff to getFilenames, which should not happen. i.e.
> 
> It needs to be
> 	<vdl:getFilenames var="{sliced}">
> 		<argument name="path"> ....</...>
> 	</...>
> 
> not
> 	<vdl:getFilenames><vdl:getFieldvalue ....>
> 

I noted this problem as bug 59 so it doesn't get forgotten.

-- 


From nefedova at mcs.anl.gov  Thu May  3 09:27:35 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Thu, 03 May 2007 09:27:35 -0500
Subject: [Swift-devel] Fwd: Re: Fwd: chained genU workflow
Message-ID: <6.2.1.2.2.20070503092456.0214dc30@pop.mcs.anl.gov>

Does anybody have any idea why swift is failing? It seems pretty 
straightforward but I do not see what's wrong here...
He is using 070429 nightly build (I am using the same build and it works 
for me).

Nika

>Date: Wed, 02 May 2007 20:44:32 -0500
>From: Luciano Piccoli <piccoli at fnal.gov>
>Subject: Re: Fwd: chained genU workflow
>To: nefedova at mcs.anl.gov
>
>Hi Nika,
>
>I tried to run the genU workflow, but I believe I have some configuration 
>problem. I did download and install the same vdsk version that you used.
>
>My sites.xml has only the localhost provider:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <config xmlns="http://www.griphyn.org/chimera/GVDS-PoolConfig"
>     xsi:schemaLocation="http://www.griphyn.org/chimera/GVDS 
> http://www.griphyn.org/chimera/gvds-poolcfg-1.5.xsd"
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.5">
>
>     <!-- This localhost entry should work on most linux-like systems. It may
>     be necessary to change the two occurences of /var/tmp to a different
>     working directory. -->
>       <pool handle="localhost" sysinfo="INTEL64::LINUX">
>         <lrc url="local://localhost"/>
>         <gridftp  url="local://localhost" storage="{user.home}" major="1" 
> minor="0" patch="0"/>
>         <jobmanager universe="vanilla" url="local://localhost" major="1" 
> minor="0" patch="0" />
>         <workdirectory >{user.home}</workdirectory>
>       </pool>
>     </config>
>
>I reproduced the problem using the q1.swift example. My tc.data looks like 
>this:
>
>     localhost       echo            /home/piccoli/bin/myecho 
> INSTALLED       INTEL64::LINUX  null
>     localhost       echoecho        /home/piccoli/bin/myecho 
> INSTALLED       INTEL64::LINUX  null
>
>In the q1.swift workflow, when I replace the echo command with the 
>echoecho command I get the following error message:
>
>     bash-3.00$ swift q1.swift
>     Swift V 0.0405
>     RunID: 59888ief8zsp1
>     echoecho started
>     echoecho failed
>     The following errors have occurred:
>     1. The requested application (echoecho) cannot be found installed on 
> any of the sites.
>       You should check your tc.data and sites.xml files, and make sure 
> that the name (echoecho) is not misspelled.
>
>This is the swift script:
>
>     type messagefile {}
>
>     (messagefile t) greeting() {
>         app {
>             echoecho "Hello, world!" stdout=@filename(t);
>         }
>     }
>
>     messagefile outfile <"hello.txt">;
>
>     outfile = greeting();
>
>Do you have any idea why this happens? The same error message shows up 
>when I run the genU script. Swift complains that mode_test_in cannot be 
>found, even though tc.data is correct...
>
>Thanks!
>Luciano

From benc at hawaga.org.uk  Thu May  3 09:43:55 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 3 May 2007 14:43:55 +0000 (GMT)
Subject: [Swift-devel] Fwd: Re: Fwd: chained genU workflow
In-Reply-To: <6.2.1.2.2.20070503092456.0214dc30@pop.mcs.anl.gov>
References: <6.2.1.2.2.20070503092456.0214dc30@pop.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705031442210.20212@dildano.hawaga.org.uk>


On Thu, 3 May 2007, Veronika V. Nefedova wrote:

> Does anybody have any idea why swift is failing ? It seems pretty
> straightforward but I do not see whats wrong here...
> He is using 070429 nightly build (I am using the same build and it works for
> me).

try explicitly indicating which tc.data to use like this:

swift -tc.file /path/to/tc.data myprogram.swift
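
Before rerunning, it can also help to confirm that the tc.data file you 
intend to pass really lists the transformation. A small sketch in Python, 
assuming the whitespace-separated columns seen in the quoted tc.data (site, 
transformation name, path, then the remaining fields); parse_tc is a 
made-up helper, not part of swift:

```python
def parse_tc(text):
    """Map each transformation name to the (site, path) pairs that provide it."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything too short to be an entry.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        site, name, path = fields[:3]
        entries.setdefault(name, []).append((site, path))
    return entries


sample = """
localhost  echo      /home/piccoli/bin/myecho  INSTALLED  INTEL64::LINUX  null
localhost  echoecho  /home/piccoli/bin/myecho  INSTALLED  INTEL64::LINUX  null
"""

entries = parse_tc(sample)
print(sorted(entries))
print(entries.get("echoecho"))
```

If the name shows up here but swift still cannot find it, that points at 
swift reading a different tc.data than the one you checked.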

you can encourage people to engage directly on swift-user too!

-- 


From hategan at mcs.anl.gov  Thu May  3 10:35:13 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 03 May 2007 10:35:13 -0500
Subject: [Swift-devel] Fwd: Re: Fwd: chained genU workflow
In-Reply-To: <Pine.LNX.4.64.0705031442210.20212@dildano.hawaga.org.uk>
References: <6.2.1.2.2.20070503092456.0214dc30@pop.mcs.anl.gov>
	<Pine.LNX.4.64.0705031442210.20212@dildano.hawaga.org.uk>
Message-ID: <1178206513.19768.0.camel@blabla.mcs.anl.gov>

Or use -v to see what sites file is being used.



From benc at hawaga.org.uk  Fri May  4 05:06:06 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 10:06:06 +0000 (GMT)
Subject: [Swift-devel] limiting simultaneous jobs using the local provider.
Message-ID: <Pine.LNX.4.64.0705041000111.22628@dildano.hawaga.org.uk>


Is there a way to limit the number of jobs that will be executing 
simultaneously with the local provider? (or perhaps with swift as a 
whole?)

There are a few throttle parameters in the configuration file but I find 
them slightly confusing, and setting them all to 1 appears to not have the 
effect I want - I think I understand them to limit the number of jobs that 
will be in a particular internal gram 'in process of being submitted' 
state, rather than the total number of actually executing jobs.

My immediate motivation for this is because the fmri workflow (which runs 
up to the incredible number of four cpu intensive executables 
simultaneously) pretty much kills my laptop for other purposes whilst it's 
running. I'd much rather be able to limit it to one (or perhaps two).

-- 


From benc at hawaga.org.uk  Fri May  4 06:15:07 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 11:15:07 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>


On Wed, 2 May 2007, Ben Clifford wrote:

> The semantics of these 'multiple valued' language constructs ([*] and how 
> that passes through @filenames, for example) seems (still) quite poorly 
> defined...

So I thought about this for a while.

I believe that the problem is there is a tension between C/Java like 
structure/array access constructs (which is the syntax, but not 
necessarily the semantics, that SwiftScript uses) and XPath-like XML 
selection constructs (which is the semantics that let you write [*]).

That tension for the most part has not been a problem in the way that 
we've written code so far, except in the presence of [*].

In the C/Java like model, an expression like

  v

or 

  v.image

identifies exactly one entity - in the first case to a variable v, in the 
second case (assuming that v is a structure) to the unique element of the 
structure in variable v that is called image.

In the XML/XPath model, we can make similar looking expressions, such as:

  v/image.

However, XPath expressions do not identify exactly one entity. XPath 
expressions select nodes in an xml document; the identified entities are 
XML nodes. But they are not constrained to selecting exactly one node. 
They can select none, or they can select one, or they can select many.

Consider the XPath query:  v/image   when applied to the XML document:

 <v>
  <image>theimage</image>
  <header>theheader</header>
 </v>

The above query will select the node <image>theimage</image>

However, consider the same query with the document:

  <v>
   <image>theimage</image>
   <header>theheader</header>
   <image>more</image>
   <foo>bar</foo>
  </v>

The query will select two nodes. One of the nodes selected will be 
<image>theimage</image> and the other will be <image>more</image>.

We have not uniquely identified an entity. We have selected several.

Similar things happen if we use what Swift refers to as arrays. Consider 
an XML document like this:

 <lifeforms>
   <foo>tree</foo>
   <foo>plant</foo>
   <foo>fish</foo>
   <foo>dog</foo>
 </lifeforms>

We can say lifeforms[1] and have the element <foo>plant</foo> uniquely 
identified.

or we can say lifeforms[*] and have all four foo elements selected.

But what is the 'value' and 'type' of lifeforms[*], for the purposes of 
feeding into other swift expressions?

When we say lifeforms[1] we can say the 'value' is the uniquely selected 
node <foo>plant</foo>, and that lifeforms[1] evaluates to 
<foo>plant</foo>.

But there is no definition of 'value' at the moment in SwiftScript for 
expressions like this that select multiple nodes. And without a 
definition of what the value of such an expression is, then we can't use 
such an expression as a value to pass into some other bigger expression, 
for example @filenames(lifeforms[*]).

One solution is to define a data type that can hold (as a single value) 
the complete set of results (for example an unbounded sequence of XML 
<any>, or in something more like SwiftScript syntax  any[]  ).

This would allow expressions such as lifeforms[*] to return a single 
value (an instance of the above type, containing all of the selected 
nodes) and would give a stronger formalisation of what expressions like 
@filenames(lifeforms[*]) actually mean.

There may be other ways, which I'd be interested to hear about.
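
To make the selection behaviour concrete, here is a small Python sketch 
using the standard xml.etree.ElementTree module on the documents above. 
The select helper is made up for the example; it returns a list, which is 
roughly the 'single value holding all selected nodes' idea.

```python
import xml.etree.ElementTree as ET

one = ET.fromstring("<v><image>theimage</image><header>theheader</header></v>")
many = ET.fromstring(
    "<v><image>theimage</image><header>theheader</header>"
    "<image>more</image><foo>bar</foo></v>"
)
lifeforms = ET.fromstring(
    "<lifeforms><foo>tree</foo><foo>plant</foo>"
    "<foo>fish</foo><foo>dog</foo></lifeforms>"
)


def select(root, path):
    """Return the text of every node the path selects, always as a list."""
    return [node.text for node in root.findall(path)]


# Paths are relative to the root element <v>, so v/image is written "image".
print(select(one, "image"))       # exactly one node selected
print(select(many, "image"))      # two nodes selected
print(select(one, "missing"))     # no nodes selected
print(select(lifeforms, "*"))     # all four children, like lifeforms[*]
print(select(lifeforms, "*")[1])  # 0-based list index, like lifeforms[1]
```

Because select always returns a list, the caller gets one well-defined 
value whether the path matched zero, one or many nodes.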

-- 


From nefedova at mcs.anl.gov  Fri May  4 07:58:51 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Fri, 04 May 2007 07:58:51 -0500
Subject: [Swift-devel] limiting simultaneous jobs using the local provider.
In-Reply-To: <Pine.LNX.4.64.0705041000111.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705041000111.22628@dildano.hawaga.org.uk>
Message-ID: <6.2.1.2.2.20070504075722.0216f740@pop.mcs.anl.gov>

You can limit it with:
                 <property name="jobThrottle" value="1"/>
<property name="maxSimultaneousJobs" value="1"/>

in scheduler.xml

Nika



From benc at hawaga.org.uk  Fri May  4 08:38:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 13:38:10 +0000 (GMT)
Subject: [Swift-devel] remote file/directory stuff (bug 22)
In-Reply-To: <1172604377.25936.2.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0702261818090.6810@dildano.hawaga.org.uk> 
	<1172521676.27811.9.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0702271913170.6810@dildano.hawaga.org.uk>
	<1172604377.25936.2.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>


On Tue, 27 Feb 2007, Mihael Hategan wrote:

> If you can make this translate into something like vdl:(in|
> out)appmapping(var, path, dest), preferably after the stagein/stageout
> directives, I can probably make it work.

I have a patch that (at least for input files) makes this:

type file;

(file o) cat(file f) {
  app {
    cat "in.txt" stdout=@o;
    f < "in.txt";
  }
}

file a <"hello.txt">;
file b <"output.txt">;

b=cat(a);


turn into this kml (fragment):

    <vdl:execute>
      <vdl:tr>cat</vdl:tr>
      <vdl:stagein var="{f}"/>
      <vdl:stageout var="{o}"/>
      <vdl:arguments>
        <string>in.txt</string>
      </vdl:arguments>
      <vdl:stdout>
        <vdl:filename>
            <vdl:getfield path=""><variable>o</variable></vdl:getfield>
        </vdl:filename>
      </vdl:stdout>
      <vdl:inappmapping>
        <variable>f</variable>
        <string></string>
        <string>"in.txt"</string>
      </vdl:inappmapping>
    </vdl:execute>

However, I have no implementation of the <vdl:inappmapping> element. I 
guess it's time for me to poke around at the guts of vdl.k a bit more.


-- 


From hategan at mcs.anl.gov  Fri May  4 09:04:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 09:04:44 -0500
Subject: [Swift-devel] limiting simultaneous jobs using the local provider.
In-Reply-To: <6.2.1.2.2.20070504075722.0216f740@pop.mcs.anl.gov>
References: <Pine.LNX.4.64.0705041000111.22628@dildano.hawaga.org.uk>
	<6.2.1.2.2.20070504075722.0216f740@pop.mcs.anl.gov>
Message-ID: <1178287485.14998.1.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-04 at 07:58 -0500, Veronika V. Nefedova wrote:
> You can limit it with:
>                  <property name="jobThrottle" value="1"/>

Right. That one above limits the jobs for a site based on its score.
It's supposed to provide load balancing with multiple sites, so it's
likely not what you want.

> <property name="maxSimultaneousJobs" value="1"/>

That one, on the other hand, enforces a hard limit on the number of total
concurrent jobs.
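
For what a hard limit means in practice, here is a sketch in Python. It is
not how the Swift scheduler is implemented; run_jobs, Gauge and the job
body are made up for the example. A worker pool of fixed size never runs
more than `limit` jobs at once, however many are queued.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Gauge:
    """Counts how many jobs are running and remembers the peak."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)

    def __exit__(self, *exc):
        with self._lock:
            self.running -= 1


def run_jobs(n_jobs, limit):
    """Run n_jobs dummy jobs with at most `limit` executing concurrently."""
    gauge = Gauge()

    def job(n):
        with gauge:
            sum(range(10000))  # stand-in for a CPU-heavy executable
        return n

    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(job, range(n_jobs)))
    return results, gauge.peak


results, peak = run_jobs(n_jobs=4, limit=1)
print(results, peak)
```

With limit=1 the four jobs run strictly one after another, which is the
behaviour wanted for the laptop case.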

> 
> in scheduler.xml


From hategan at mcs.anl.gov  Fri May  4 09:10:57 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 09:10:57 -0500
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
Message-ID: <1178287857.14998.7.camel@blabla.mcs.anl.gov>

I'd say it's somewhat simpler.
Since data in Swift is recursive, non-leaf paths make sense by
themselves. What may not make sense is translating them to arguments to
an application. For example, passing a complex type as an argument to an
application is not well defined.

This can be restricted to, say, passing single values or arrays. In the
case of arrays, a space separated list is the implicit conversion
scheme, and functions could be provided to pass them as comma or
something-else-separated-lists. Passing the whole fringe as a list is a
possibility, too.

When it comes to files, the scheme was a little simpler. @filename would
pass the file names of the fringe of a particular data tree. And
@filenames would do the same, but each leaf is a single argument.
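
A sketch of that scheme in Python, under my reading of the paragraph above
(one space-separated argument for @filename, one argument per leaf for
@filenames). The function names and the sample data are made up and this is
not the Swift implementation.

```python
def fringe(node):
    """Yield the leaves of a nested dict/list data tree, in order."""
    if isinstance(node, dict):
        for key in node:
            yield from fringe(node[key])
    elif isinstance(node, (list, tuple)):
        for child in node:
            yield from fringe(child)
    else:
        yield node


def filename_args(node):
    """Model of @filename: the whole fringe as one space-separated argument."""
    return [" ".join(fringe(node))]


def filenames_args(node):
    """Model of @filenames: each leaf of the fringe is its own argument."""
    return list(fringe(node))


# An array of two structures, each with an image and a header file.
volume = [
    {"img": "a.img", "hdr": "a.hdr"},
    {"img": "b.img", "hdr": "b.hdr"},
]

print(filename_args(volume))
print(filenames_args(volume))
```

Either way the non-leaf path (the array, or one structure in it) is a
perfectly good value; only the conversion to command-line arguments differs.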

On Fri, 2007-05-04 at 11:15 +0000, Ben Clifford wrote:
> 
> On Wed, 2 May 2007, Ben Clifford wrote:
> 
> > The semantics of these 'multiple valued' language constructs ([*] and how 
> > that passes through @filenames, for example) seems (still) quite poorly 
> > defined...
> 
> So I thought about this for a while.
> 
> I believe that the problem is there is a tension between C/Java like 
> structure/array access constructs (which is the syntax, but not 
> necessarily the semantics, that SwiftScript uses) and XPath-like XML 
> selection constructs (which is the semantics that let you write [*]).
> 
> That tension for the most part has not been a problem in the way that 
> we've written code so far, except in the presence of [*].
> 
> In the C/Java like model, an expression like
> 
>   v
> 
> or 
> 
>   v.image
> 
> identifies exactly one entity - in the first case to a variable v, in the 
> second case (assuming that v is a structure) to the unique element of the 
> structure in variable v that is called image.
> 
> In the XML/XPath model, we can make similar looking expressions, such as:
> 
>   v/image.
> 
> However, XPath expressions do not identify exactly one entity. XPath 
> expressions select nodes in an xml document; the identified entities are 
> XML nodes. But they are not constrained to selecting exactly one node. 
> They can select none, or they can select one, or they can select many.
> 
> Consider the XPath query:  v/image   when applied to the XML document:
> 
>  <v>
>   <image>theimage</image>
>   <header>theheader</header>
>  </v>
> 
> The above query will select the node <image>theimage</image>
> 
> However, consider the same query with the document:
> 
>   <v>
>    <image>theimage</image>
>    <header>theheader</header>
>    <image>more</image>
>    <foo>bar</foo>
>   </v>
> 
> The query will select two nodes. One of the nodes selected will be 
> <image>theimage</image> and the other will be <image>more</image>.
> 
> We have not uniquely identified an entity. We have selected several.
> 
> Similar things happen if we use what Swift refers to as arrays. Consider 
> an XML document like this:
> 
>  <lifeforms>
>    <foo>tree</foo>
>    <foo>plant</foo>
>    <foo>fish</foo>
>    <foo>dog</foo>
>  </lifeforms>
> 
> We can say lifeforms[1] and have the element <foo>plant</foo> uniquely 
> identified.
> 
> or we can say lifeforms[*] and have all four foo elements selected.
> 
> But what is the 'value' and 'type' of lifeforms[*], for the purposes of 
> feeding into other swift expressions?
> 
> When we say lifeforms[1] we can say the 'value' is the uniquely selected 
> node <foo>plant</foo>, and that lifeforms[1] evaluates to 
> <foo>plant</foo>.
> 
> But there is no definition of 'value' at the moment in SwiftScript for 
> expressions like this that select multiple expressions. And without a 
> definition of what the value of such an expression is, then we can't use 
> such an expression as a value to pass into some other bigger expression, 
> for example @filenames(lifeforms[*]).
> 
> One solution is to define a data type that can hold (as a single value) 
> the complete set of results (for example an unbounded sequence of XML 
> <any>, or in something more like SwiftScript syntax  any[]  ).
> 
> This would allow expressions such as lifeforms[*] to return a single 
> value (an instance of the above type, containing all of the selected 
> nodes) and would give a stronger formalisation of what expressions like 
> @filenames(lifeforms[*]) actually mean.
> 
> There may be other ways, which I'd be interested to hear about.
> 
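
A minimal sketch of the selection behaviour described above, written in
Python rather than SwiftScript. It relies on the limited XPath subset that
the standard xml.etree.ElementTree module supports; the helper name
select_images and the two document constants are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two example documents from the message, condensed onto fewer lines.
DOC_ONE = "<v><image>theimage</image><header>theheader</header></v>"
DOC_TWO = (
    "<v><image>theimage</image><header>theheader</header>"
    "<image>more</image><foo>bar</foo></v>"
)


def select_images(xml_text):
    """Return the text of every <image> child of the root <v> element."""
    root = ET.fromstring(xml_text)
    # The root is <v>, so the relative path "image" plays the role of v/image.
    # findall always returns a list: it may hold zero, one, or many nodes.
    return [node.text for node in root.findall("image")]
```

The point is that the result is always a list (possibly empty), which is
close to the "single value holding all selected nodes" idea floated above.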


From hategan at mcs.anl.gov  Fri May  4 09:17:19 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 09:17:19 -0500
Subject: [Swift-devel] remote file/directory stuff (bug 22)
In-Reply-To: <Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0702261818090.6810@dildano.hawaga.org.uk>
	<1172521676.27811.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0702271913170.6810@dildano.hawaga.org.uk>
	<1172604377.25936.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
Message-ID: <1178288239.14998.13.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-04 at 13:38 +0000, Ben Clifford wrote:
> 
> On Tue, 27 Feb 2007, Mihael Hategan wrote:
> 
> > If you can make this translate into something like vdl:(in|
> > out)appmapping(var, path, dest), preferably after the stagein/stageout
> > directives, I can probably make it work.
> 
> I have a patch that (at least for input files) makes this:
> 
> type file;
> 
> (file o) cat(file f) {
>   app {
>     cat "in.txt" stdout=@o;
>     f < "in.txt";
>   }
> }

Shouldn't that be f > "in.txt" and perhaps before "cat"? In a strict
language style, that would vaguely suggest "dump f into 'in.txt'" before
running cat...

> 
> file a <"hello.txt">;
> file b <"output.txt">;
> 
> b=cat(a);
> 
> 
> turn into this kml (fragment):
> 
>     <vdl:execute>
>       <vdl:tr>cat</vdl:tr>
>       <vdl:stagein var="{f}"/>
>       <vdl:stageout var="{o}"/>
>       <vdl:arguments>
>         <string>in.txt</string>
>       </vdl:arguments>
>       <vdl:stdout>
>         <vdl:filename>
>             <vdl:getfield path=""><variable>o</variable></vdl:getfield>
>         </vdl:filename>
>       </vdl:stdout>
>       <vdl:inappmapping>
>         <variable>f</variable>
>         <string></string>
>         <string>"in.txt"</string>
>       </vdl:inappmapping>
>     </vdl:execute>
> 
> However, I have no implementation of the <vdl:inappmapping> element. I 
> guess it's time for me to poke round at the guts of vdl.k a bit more.

1. Using attributes instead of sub-elements may be a little faster.
2. I think the best deal would be to perhaps extend stagein with having
pairs of [localName, remoteName] and deal with that appropriately.

> 
> 


From benc at hawaga.org.uk  Fri May  4 09:26:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 14:26:32 +0000 (GMT)
Subject: [Swift-devel] remote file/directory stuff (bug 22)
In-Reply-To: <1178288239.14998.13.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0702261818090.6810@dildano.hawaga.org.uk> 
	<1172521676.27811.9.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0702271913170.6810@dildano.hawaga.org.uk> 
	<1172604377.25936.2.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
	<1178288239.14998.13.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705041424400.22628@dildano.hawaga.org.uk>


On Fri, 4 May 2007, Mihael Hategan wrote:

> Shouldn't that be f > "in.txt" and perhaps before "cat"? In a strict
> language style, that would vaguely suggest "dump f into 'in.txt'" before
> running cat...

I pondered a while over which way to put the arrow when I was writing the 
parser, and also where in the text it should go. Then decided that it was 
a waste of time to think about it too much when I could be playing with 
the code and picked a configuration at random...

I'm not even sure the a>b syntax is the right way anyway, but I'm more 
interested in getting some implementation done for now than pondering 
syntax.

-- 


From benc at hawaga.org.uk  Fri May  4 09:33:04 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 14:33:04 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <1178287857.14998.7.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
	<1178287857.14998.7.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>


On Fri, 4 May 2007, Mihael Hategan wrote:

> When it comes to files, the scheme was a little simpler. @filename would
> pass the file names of the fringe of a particular data tree. And
> @filenames would do the same, but each leaf is a single argument.

when used in an app block something like:

 app { 
   myapp "-in" @filenames(myarray[*]) "-type" "fast";
 }

then @filenames needs to be able to return something that gets passed to 
myapp as multiple parameters, rather than a single parameter with spaces 
in it.

I think (?) that this is hard to do if @filenames returns a single value, 
from a SwiftScript-theory perspective (though I think in the karajan 
implementation, @filenames can return as many values as it wants?)

-- 


From benc at hawaga.org.uk  Fri May  4 09:48:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 14:48:29 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu> 
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
	<1178287857.14998.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0705041447280.22628@dildano.hawaga.org.uk>


On Fri, 4 May 2007, Ben Clifford wrote:

> > When it comes to files, the scheme was a little simpler. @filename would
> > pass the file names of the fringe of a particular data tree. And
> > @filenames would do the same, but each leaf is a single argument.
> 
> when used in an app block something like:
> 
>  app { 
>    myapp "-in" @filenames(myarray[*]) "-type" "fast";
>  }
> 
> then @filenames needs to be able to return something that gets passed to 
> myapp as multiple parameters, rather than a single parameter with spaces 
> in it.
> 
> I think (?) that this is hard to do if @filenames returns a single value, 
> from a SwiftScript-theory perspective (though I think in the karajan 
> implementation, @filenames can return as many values as it wants?)

so perhaps what we should say is

@filenames(myarray)

returns an array of strings (so  @filenames(myarray)  has type string[])

and then say that the behaviour for string arrays being used in an 
application line is to make each element into its own argument.
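
A rough sketch of that rule, in Python rather than SwiftScript or Karajan:
scalar parameters become one argument each, and an array of strings
contributes one argument per element. The function name build_argv is
hypothetical and this is not how the Swift app block is implemented.

```python
def build_argv(executable, *params):
    """Build an argument vector for an application invocation.

    Scalar parameters contribute exactly one argument; list or tuple
    parameters (standing in for string[]) contribute one per element.
    """
    argv = [executable]
    for param in params:
        if isinstance(param, (list, tuple)):
            # Each array element becomes its own argument, so a file name
            # containing spaces is never split or merged with its neighbours.
            argv.extend(str(item) for item in param)
        else:
            argv.append(str(param))
    return argv
```

For instance, build_argv("myapp", "-in", ["a.txt", "b c.txt"], "-type",
"fast") keeps "b c.txt" as a single argument.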

-- 


From benc at hawaga.org.uk  Fri May  4 09:56:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 14:56:47 +0000 (GMT)
Subject: [Swift-devel] limiting simultaneous jobs using the local provider.
In-Reply-To: <1178287485.14998.1.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705041000111.22628@dildano.hawaga.org.uk> 
	<6.2.1.2.2.20070504075722.0216f740@pop.mcs.anl.gov>
	<1178287485.14998.1.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705041455050.22628@dildano.hawaga.org.uk>


On Fri, 4 May 2007, Mihael Hategan wrote:

> > <property name="maxSimultaneousJobs" value="1"/>
> 
> That on the other hand enforces a hard limit on the number of total
> concurrent jobs.

setting this and leaving jobThrottle as it was appears to have caused the 
desired effect.
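
For anyone wondering what a hard limit on concurrent jobs amounts to, here
is a small Python sketch using a semaphore. It only illustrates the idea;
the name run_with_limit is invented and it says nothing about how the
local provider actually enforces maxSimultaneousJobs.

```python
import threading


def run_with_limit(jobs, max_simultaneous):
    """Run callables in threads, never more than max_simultaneous at once.

    Returns the peak number of jobs that were observed running together.
    """
    slots = threading.Semaphore(max_simultaneous)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker(job):
        with slots:  # blocks while max_simultaneous jobs are already running
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                job()
            finally:
                with lock:
                    state["running"] -= 1

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```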

-- 


From yongzh at cs.uchicago.edu  Fri May  4 10:22:19 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 4 May 2007 10:22:19 -0500 (CDT)
Subject: [Swift-devel] remote file/directory stuff (bug 22)
In-Reply-To: <Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0702261818090.6810@dildano.hawaga.org.uk> 
	<1172521676.27811.9.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0702271913170.6810@dildano.hawaga.org.uk>
	<1172604377.25936.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705041021240.30122@classes.cs.uchicago.edu>

what does f < "in.txt" mean here? wouldn't it be placed before the call?

Yong.

On Fri, 4 May 2007, Ben Clifford wrote:

>
>
> On Tue, 27 Feb 2007, Mihael Hategan wrote:
>
> > If you can make this translate into something like vdl:(in|
> > out)appmapping(var, path, dest), preferably after the stagein/stageout
> > directives, I can probably make it work.
>
> I have a patch that (at least for input files) makes this:
>
> type file;
>
> (file o) cat(file f) {
>   app {
>     cat "in.txt" stdout=@o;
>     f < "in.txt";
>   }
> }
>
> file a <"hello.txt">;
> file b <"output.txt">;
>
> b=cat(a);
>
>
> turn into this kml (fragment):
>
>     <vdl:execute>
>       <vdl:tr>cat</vdl:tr>
>       <vdl:stagein var="{f}"/>
>       <vdl:stageout var="{o}"/>
>       <vdl:arguments>
>         <string>in.txt</string>
>       </vdl:arguments>
>       <vdl:stdout>
>         <vdl:filename>
>             <vdl:getfield path=""><variable>o</variable></vdl:getfield>
>         </vdl:filename>
>       </vdl:stdout>
>       <vdl:inappmapping>
>         <variable>f</variable>
>         <string></string>
>         <string>"in.txt"</string>
>       </vdl:inappmapping>
>     </vdl:execute>
>
> However, I have no implementation of the <vdl:inappmapping> element. I
> guess it's time for me to poke round at the guts of vdl.k a bit more.
>
>
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From benc at hawaga.org.uk  Fri May  4 10:25:19 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 15:25:19 +0000 (GMT)
Subject: [Swift-devel] remote file/directory stuff (bug 22)
In-Reply-To: <Pine.LNX.4.58.0705041021240.30122@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0702261818090.6810@dildano.hawaga.org.uk> 
	<1172521676.27811.9.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0702271913170.6810@dildano.hawaga.org.uk>
	<1172604377.25936.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041335030.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705041021240.30122@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705041522390.22628@dildano.hawaga.org.uk>


On Fri, 4 May 2007, Yong Zhao wrote:

> what does f < "in.txt" mean here? wouldn't it be placed before the call?

It means the input file f goes into a file called "in.txt" in the remote 
run directory, rather than into a file with the same name as whatever it 
happens to have on the submit side.

I can tweak the syntax easily enough by moving round production rules and 
templates with cut-n-paste - the semantics are more something I'm 
concerned about, in terms of actually being useful.
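
To make the intended semantics concrete, here is a sketch in Python of
staging an input file into a run directory under a fixed name. The names
stage_in and demo are hypothetical, and a local copy stands in for the
real transfer to the remote site.

```python
import os
import shutil
import tempfile


def stage_in(submit_path, run_dir, remote_name):
    """Copy submit_path into run_dir under remote_name; return the new path."""
    os.makedirs(run_dir, exist_ok=True)
    dest = os.path.join(run_dir, remote_name)
    # The application sees remote_name, whatever the file was called
    # on the submit side.
    shutil.copyfile(submit_path, dest)
    return dest


def demo():
    """Stage hello.txt as in.txt in a scratch directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.txt")
        with open(src, "w") as handle:
            handle.write("hello\n")
        dest = stage_in(src, os.path.join(tmp, "run"), "in.txt")
        with open(dest) as handle:
            return os.path.basename(dest), handle.read()
```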

-- 


From hategan at mcs.anl.gov  Fri May  4 10:24:46 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 10:24:46 -0500
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
	<1178287857.14998.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>
Message-ID: <1178292286.17541.8.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-04 at 14:33 +0000, Ben Clifford wrote:
> 
> On Fri, 4 May 2007, Mihael Hategan wrote:
> 
> > When it comes to files, the scheme was a little simpler. @filename would
> > pass the file names of the fringe of a particular data tree. And
> > @filenames would do the same, but each leaf is a single argument.
> 
> when used in an app block something like:
> 
>  app { 
>    myapp "-in" @filenames(myarray[*]) "-type" "fast";
>  }
> 
> then @filenames needs to be able to return something that gets passed to 
> myapp as multiple parameters, rather than a single parameter with spaces 
> in it.
> 
> I think (?) that this is hard to do if @filenames returns a single value, 
> from a SwiftScript-theory perspective (though I think in the karajan 
> implementation, @filenames can return as many values as it wants?

Which is exactly what's happening.
I'm not sure if we need to go into that much detail on that one.
@filenames returns something that app{} knows how to interpret as
meaning multiple arguments rather than one.

> )
> 


From hategan at mcs.anl.gov  Fri May  4 10:25:54 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 10:25:54 -0500
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.64.0705041447280.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021529110.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705041033200.22628@dildano.hawaga.org.uk>
	<1178287857.14998.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705041429580.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705041447280.22628@dildano.hawaga.org.uk>
Message-ID: <1178292354.17541.10.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-04 at 14:48 +0000, Ben Clifford wrote:
> 
> On Fri, 4 May 2007, Ben Clifford wrote:
> 
> > > When it comes to files, the scheme was a little simpler. @filename would
> > > pass the file names of the fringe of a particular data tree. And
> > > @filenames would do the same, but each leaf is a single argument.
> > 
> > when used in an app block something like:
> > 
> >  app { 
> >    myapp "-in" @filenames(myarray[*]) "-type" "fast";
> >  }
> > 
> > then @filenames needs to be able to return something that gets passed to 
> > myapp as multiple parameters, rather than a single parameter with spaces 
> > in it.
> > 
> > I think (?) that this is hard to do if @filenames returns a single value, 
> > from a SwiftScript-theory perspective (though I think in the karajan 
> > implementation, @filenames can return as many values as it wants?)
> 
> so perhaps what we should say is
> 
> @filenames(myarray)
> 
> returns an array of strings (so  @filenames(myarray)  has type string[])
> 
> and then say that the behaviour for string arrays being used in an 
> application line is to make each element into its own argument.

Exactly.

> 


From benc at hawaga.org.uk  Fri May  4 11:46:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2007 16:46:32 +0000 (GMT)
Subject: [Swift-devel] swift-on-windows
Message-ID: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>


out of interest, has anyone ever run swift on a Windows OS?

-- 


From hategan at mcs.anl.gov  Fri May  4 20:51:16 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 20:51:16 -0500
Subject: [Swift-devel] swift-on-windows
In-Reply-To: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>
Message-ID: <1178329876.19509.4.camel@blabla.mcs.anl.gov>

I think our own Yong has.
The trick, if you run locally, is the wrapper which is a bash script. If
not, I can see no obvious problems.

On Fri, 2007-05-04 at 16:46 +0000, Ben Clifford wrote:
> out of interest, has anyone ever run swift on a Windows OS?
> 


From yongzh at cs.uchicago.edu  Fri May  4 21:14:26 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 4 May 2007 21:14:26 -0500 (CDT)
Subject: [Swift-devel] swift-on-windows
In-Reply-To: <1178329876.19509.4.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>
	<1178329876.19509.4.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0705042113040.24157@classes.cs.uchicago.edu>

Yeah, I did run swift on my windows laptop a while ago before we
introduced the shell wrapper. We can have a windows wrapper in place of
that to run on windows.

Yong.

On Fri, 4 May 2007, Mihael Hategan wrote:

> I think our own Yong has.
> The trick, if you run locally, is the wrapper which is a bash script. If
> not, I can see no obvious problems.
>
> On Fri, 2007-05-04 at 16:46 +0000, Ben Clifford wrote:
> > out of interest, has anyone ever run swift on a Windows OS?
> >
>
>


From hategan at mcs.anl.gov  Fri May  4 21:12:49 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 21:12:49 -0500
Subject: [Swift-devel] swift-on-windows
In-Reply-To: <Pine.LNX.4.58.0705042113040.24157@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>
	<1178329876.19509.4.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.58.0705042113040.24157@classes.cs.uchicago.edu>
Message-ID: <1178331169.20702.0.camel@blabla.mcs.anl.gov>

Again, if the jobs themselves are NOT run locally, then the wrapper
problem does not apply.

On Fri, 2007-05-04 at 21:14 -0500, Yong Zhao wrote:
> Yeah, I did run swift on my windows laptop a while ago before we
> introduced the shell wrapper. We can have a windows wrapper in place of
> that to run on windows.
> 
> Yong.
> 
> On Fri, 4 May 2007, Mihael Hategan wrote:
> 
> > I think our own Yong has.
> > The trick, if you run locally, is the wrapper which is a bash script. If
> > not, I can see no obvious problems.
> >
> > On Fri, 2007-05-04 at 16:46 +0000, Ben Clifford wrote:
> > > out of interest, has anyone ever run swift on a Windows OS?
> > >
> >
> >
> 


From iraicu at cs.uchicago.edu  Fri May  4 21:29:12 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Fri, 04 May 2007 21:29:12 -0500
Subject: [Swift-devel] swift-on-windows
In-Reply-To: <1178331169.20702.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>	<1178329876.19509.4.camel@blabla.mcs.anl.gov>	<Pine.LNX.4.58.0705042113040.24157@classes.cs.uchicago.edu>
	<1178331169.20702.0.camel@blabla.mcs.anl.gov>
Message-ID: <463BEBF8.10108@cs.uchicago.edu>

What about cygwin? Linux scripts work unchanged in cygwin... for 
example, I can run my GT4 clients from windows under cygwin with no 
modifications to any of my scripts or code (non-swift related).
Ioan

Mihael Hategan wrote:
> Again, if the jobs themselves are NOT run locally, then the wrapper
> problem does not apply.
>
> On Fri, 2007-05-04 at 21:14 -0500, Yong Zhao wrote:
>   
>> Yeah, I did run swift on my windows laptop a while ago before we
>> introduced the shell wrapper. We can have a windows wrapper in place of
>> that to run on windows.
>>
>> Yong.
>>
>> On Fri, 4 May 2007, Mihael Hategan wrote:
>>
>>     
>>> I think our own Yong has.
>>> The trick, if you run locally, is the wrapper which is a bash script. If
>>> not, I can see no obvious problems.
>>>
>>> On Fri, 2007-05-04 at 16:46 +0000, Ben Clifford wrote:
>>>       
>>>> out of interest, has anyone ever run swift on a Windows OS?
>>>>
>>>>         
>>>
>>>       
>
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From hategan at mcs.anl.gov  Fri May  4 22:12:25 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 04 May 2007 22:12:25 -0500
Subject: [Swift-devel] swift-on-windows
In-Reply-To: <463BEBF8.10108@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705041645580.22628@dildano.hawaga.org.uk>
	<1178329876.19509.4.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.58.0705042113040.24157@classes.cs.uchicago.edu>
	<1178331169.20702.0.camel@blabla.mcs.anl.gov>
	<463BEBF8.10108@cs.uchicago.edu>
Message-ID: <1178334745.22667.2.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-04 at 21:29 -0500, Ioan Raicu wrote:
> What about cygwin? Linux scripts work unchanged in cygwin... for
> example, I can run my GT4 clients from windows under cygwin with no
> modifications to any of my scripts or code (non-swift related).

Right. That should work.

There's one other thing. I'm now a little more convinced that perl may
be a better option for a wrapper. It's a little more strict than Bash,
and Jens seems to think it's not as wasteful of resources (although that
should not be that much of an issue if it's running on a worker node).

Mihael

> Ioan
> 
> Mihael Hategan wrote: 
> > Again, if the jobs themselves are NOT run locally, then the wrapper
> > problem does not apply.
> > 
> > On Fri, 2007-05-04 at 21:14 -0500, Yong Zhao wrote:
> >   
> > > Yeah, I did run swift on my windows laptop a while ago before we
> > > introduced the shell wrapper. We can have a windows wrapper in place of
> > > that to run on windows.
> > > 
> > > Yong.
> > > 
> > > On Fri, 4 May 2007, Mihael Hategan wrote:
> > > 
> > >     
> > > > I think our own Yong has.
> > > > The trick, if you run locally, is the wrapper which is a bash script. If
> > > > not, I can see no obvious problems.
> > > > 
> > > > On Fri, 2007-05-04 at 16:46 +0000, Ben Clifford wrote:
> > > >       
> > > > > out of interest, has anyone ever run swift on a Windows OS?
> > > > > 
> > > > >         
> > > > 
> > > >       
> > 
> > 
> >   
> 


From nefedova at mcs.anl.gov  Wed May  9 11:19:20 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 9 May 2007 11:19:20 -0500
Subject: [Swift-devel] MolDyn at Purdue
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
Message-ID: <AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>

Hi,

I am wondering if somebody knows where in swift we could specify the  
project name (allocation) for PBS ? Without that we can't submit to  
PBS at Purdue... In VDL you'd specify that in 'properties' file.

Thanks!

Nika

Begin forwarded message:

> From: "Yuqing Deng" <yuqing.deng at gmail.com>
> Date: May 9, 2007 10:21:50 AM CDT
> To: "Veronika Nefedova" <nefedova at mcs.anl.gov>
> Subject: Re: you allocation at Purdue
>
>
> Is there a way to specify command line argument to the scheduler  
> from swift
> config files?  I need to use -A account with qsub.  Purdue site  
> does not support
> default account with pbs.
>
> Yuqing
>


From benc at hawaga.org.uk  Wed May  9 11:23:26 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 9 May 2007 16:23:26 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>


are they submitting to GRAM (and thence to PBS) or to PBS via some cog pbs 
provider?

On Wed, 9 May 2007, Veronika Nefedova wrote:

> Hi,
> 
> I am wondering if somebody knows where in swift we could specify the project
> name (allocation) for PBS ? Without that we can't submit to PBS at Purdue...
> In VDL you'd specify that in 'properties' file.
> 
> Thanks!
> 
> Nika
> 
> Begin forwarded message:
> 
> > From: "Yuqing Deng" <yuqing.deng at gmail.com>
> > Date: May 9, 2007 10:21:50 AM CDT
> > To: "Veronika Nefedova" <nefedova at mcs.anl.gov>
> > Subject: Re: you allocation at Purdue
> > 
> > 
> > Is there a way to specify command line argument to the scheduler from swift
> > config files?  I need to use -A account with qsub.  Purdue site does not
> > support
> > default account with pbs.
> > 
> > Yuqing
> > 
> 


From itf at mcs.anl.gov  Wed May  9 11:34:02 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Wed, 9 May 2007 16:34:02 +0000
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com><AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
Message-ID: <233492970-1178728474-cardhu_blackberry.rim.net-3812750-@bwe026-cell00.bisx.prod.on.blackberry>

I'm surprised that the purdue gram interface is different to that at ncsa



-----Original Message-----
From: Ben Clifford <benc at hawaga.org.uk>
Date: Wed, 9 May 2007 16:23:26 
To:Veronika Nefedova <nefedova at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] MolDyn at Purdue


are they submitting to GRAM (and thence to PBS) or to PBS via some cog pbs 
provider?

On Wed, 9 May 2007, Veronika Nefedova wrote:

> Hi,
> 
> I am wondering if somebody knows where in swift we could specify the project
> name (allocation) for PBS ? Without that we can't submit to PBS at Purdue...
> In VDL you'd specify that in 'properties' file.
> 
> Thanks!
> 
> Nika
> 
> Begin forwarded message:
> 
> > From: "Yuqing Deng" <yuqing.deng at gmail.com>
> > Date: May 9, 2007 10:21:50 AM CDT
> > To: "Veronika Nefedova" <nefedova at mcs.anl.gov>
> > Subject: Re: you allocation at Purdue
> > 
> > 
> > Is there a way to specify command line argument to the scheduler from swift
> > config files?  I need to use -A account with qsub.  Purdue site does not
> > support
> > default account with pbs.
> > 
> > Yuqing
> > 
> 


From nefedova at mcs.anl.gov  Wed May  9 11:38:23 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 9 May 2007 11:38:23 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
Message-ID: <1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>

we are submitting with swift, to GRAM.

purdue ~$ globus-job-run tg-gatekeeper.purdue.teragrid.org/jobmanager- 
pbs /bin/hostname
Please specify a TG project number.
GRAM Job failed because the job failed when the job manager attempted  
to run it (error code 17)

While you can specify that for globusrun on a command line, there has
to be a way to specify it somewhere inside swift?

Thanks!

Nika

On May 9, 2007, at 11:23 AM, Ben Clifford wrote:

>
> are they submitting to GRAM (and thence to PBS) or to PBS via some  
> cog pbs
> provider?
>
> On Wed, 9 May 2007, Veronika Nefedova wrote:
>
>> Hi,
>>
>> I am wondering if somebody knows where in swift we could specify  
>> the project
>> name (allocation) for PBS ? Without that we can't submit to PBS at  
>> Purdue...
>> In VDL you'd specify that in 'properties' file.
>>
>> Thanks!
>>
>> Nika
>>
>> Begin forwarded message:
>>
>>> From: "Yuqing Deng" <yuqing.deng at gmail.com>
>>> Date: May 9, 2007 10:21:50 AM CDT
>>> To: "Veronika Nefedova" <nefedova at mcs.anl.gov>
>>> Subject: Re: you allocation at Purdue
>>>
>>>
>>> Is there a way to specify command line argument to the scheduler  
>>> from swift
>>> config files?  I need to use -A account with qsub.  Purdue site  
>>> does not
>>> support
>>> default account with pbs.
>>>
>>> Yuqing
>>>
>>
>


From benc at hawaga.org.uk  Wed May  9 11:37:16 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 9 May 2007 16:37:16 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <233492970-1178728474-cardhu_blackberry.rim.net-3812750-@bwe026-cell00.bisx.prod.on.blackberry>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com><AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<233492970-1178728474-cardhu_blackberry.rim.net-3812750-@bwe026-cell00.bisx.prod.on.blackberry>
Message-ID: <Pine.LNX.4.64.0705091636460.22628@dildano.hawaga.org.uk>


On Wed, 9 May 2007, Ian Foster wrote:

> I'm surprised that the purdue gram interface is different to that at ncsa

some differences are documented here.

http://www.teragrid.org/docs/jobs/index.php#NCSA

what tibi and nika are encountering appears to be documented.

-- 


From benc at hawaga.org.uk  Wed May  9 11:39:37 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 9 May 2007 16:39:37 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>


On Wed, 9 May 2007, Veronika Nefedova wrote:

> While you can specify that for globusrun on a command line, there has to 
> be a way to specify it somewhere inside swift?

mihael talked about being able to specify it as a profile entry perhaps, 
in a thread the other day on this list.

that might work - check out the VDS docs for how to specify globus RSL 
extension attributes in the site or transformation catalogs (or if you 
can't find, I can have a look).

-- 


From benc at hawaga.org.uk  Wed May  9 12:16:42 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 9 May 2007 17:16:42 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>


Do this:

edit your site catalog to add a project profile entry, e.g. <profile 
namespace="globus" key="project">TG-STA040020N</profile>. so for the 
purdue site, add an entry <profile namespace="globus" 
key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
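
As a rough sketch of where such an entry might sit in a VDS-style site 
catalog: only the profile element is taken from this message. The pool 
handle "purdue" and the comment standing in for the rest of the entry are 
assumptions for illustration, and the grant number is a placeholder. This 
has not been tested against Purdue.

```xml
<!-- Hypothetical site catalog entry; the handle name is assumed. -->
<pool handle="purdue">
  <!-- existing gridftp, jobmanager and workdirectory entries stay as they are -->
  <profile namespace="globus" key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
</pool>
```

Putting the profile in the site catalog applies it to every job sent to 
that site, which fits an allocation that is tied to the site rather than 
to one application.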

-- 


From tiberius at ci.uchicago.edu  Wed May  9 12:21:41 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Wed, 9 May 2007 12:21:41 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
Message-ID: <fec1351f0705091021r22302442ma9227430a4e6e67@mail.gmail.com>

alternatively do this:
in your tc.data, append GLOBUS::project=TGxxxxxxx to your application definition


On 5/9/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>
> Do this:
>
> edit your site catalog to add an entry <profile namespace="globus"
> key="project">TG-STA040020N</profile> for the purdue site, add an entry
> <profile namespace="globus"
> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>
> --
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From nefedova at mcs.anl.gov  Wed May  9 12:30:42 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 9 May 2007 12:30:42 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <fec1351f0705091021r22302442ma9227430a4e6e67@mail.gmail.com>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<fec1351f0705091021r22302442ma9227430a4e6e67@mail.gmail.com>
Message-ID: <4F78FC2F-11EC-49FE-B9A1-FAD0ABCDFDE6@mcs.anl.gov>

So does any of it work? Have you tested it successfully?

(-;

Nika

On May 9, 2007, at 12:21 PM, Tiberiu Stef-Praun wrote:

> alternatively do this:
> in your tc.data, append GLOBUS::project=TGxxxxxxx to your  
> application definition
>
>
>
> On 5/9/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>>
>> Do this:
>>
>> edit your site catalog to add an entry <profile namespace="globus"
>> key="project">TG-STA040020N</profile> for the purdue site, add an  
>> entry
>> <profile namespace="globus"
>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>
>> --
>>
>>
>
>
> -- 
> Tiberiu (Tibi) Stef-Praun, PhD
> Research Staff, Computation Institute
> 5640 S. Ellis Ave, #405
> University of Chicago
> http://www-unix.mcs.anl.gov/~tiberius/
>


From benc at hawaga.org.uk  Wed May  9 12:30:20 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 9 May 2007 17:30:20 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <4F78FC2F-11EC-49FE-B9A1-FAD0ABCDFDE6@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<fec1351f0705091021r22302442ma9227430a4e6e67@mail.gmail.com>
	<4F78FC2F-11EC-49FE-B9A1-FAD0ABCDFDE6@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705091729350.22628@dildano.hawaga.org.uk>


I haven't.

On Wed, 9 May 2007, Veronika Nefedova wrote:

> So does any of it work? Have you tested it successfully?
> 
> (-;
> 
> Nika
> 
> On May 9, 2007, at 12:21 PM, Tiberiu Stef-Praun wrote:
> 
> > alternatively do this:
> > in your tc.data, append GLOBUS::project=TGxxxxxxx to your application
> > definition
> > 
> > 
> > 
> > On 5/9/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> > > 
> > > Do this:
> > > 
> > > edit your site catalog to add an entry <profile namespace="globus"
> > > key="project">TG-STA040020N</profile> for the purdue site, add an entry
> > > <profile namespace="globus"
> > > key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
> > > 
> > > --
> > > 
> > > 
> > 
> > 
> > -- 
> > Tiberiu (Tibi) Stef-Praun, PhD
> > Research Staff, Computation Institute
> > 5640 S. Ellis Ave, #405
> > University of Chicago
> > http://www-unix.mcs.anl.gov/~tiberius/
> > 
> 


From nefedova at mcs.anl.gov  Wed May  9 15:16:46 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 9 May 2007 15:16:46 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
Message-ID: <BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>

Hi, Ioan:

How do I add my project info into Falkon? (I can't submit anything to
the PBS queue unless I specify the project)

Nika

On May 9, 2007, at 12:16 PM, Ben Clifford wrote:

>
> Do this:
>
> edit your site catalog to add an entry <profile namespace="globus"
> key="project">TG-STA040020N</profile> for the purdue site, add an  
> entry
> <profile namespace="globus"
> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>
> -- 
>


From iraicu at cs.uchicago.edu  Wed May  9 15:19:56 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 09 May 2007 15:19:56 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
Message-ID: <46422CEC.5050807@cs.uchicago.edu>

Hmmm... I don't know.  If you know what modifications we need to make to 
the GRAM4 RSL to include the specific project, we can simply modify the 
RSL by changing the create function to add the extra info into the 
RSL... it's a minor change, but we'd have to recompile the DRP stuff 
again.  If we can't add it to the RSL, then I don't know any other place 
to put this.  Anyone have any ideas?  I am using the GRAM4 Java API 
directly in the DRP code.

Ioan

PS: Here is a sample RSL...

iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat RSL.0.0.ia32-compute.1.120.14898358.xml
<job>
    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
    <argument>6900000</argument>
    <argument>1500000</argument>
    <maxWallTime>120</maxWallTime>
    <extensions>
        <resourceAllocationGroup>
            <hostType>ia32-compute</hostType>
            <hostCount>1</hostCount>
            <cpuCount>1</cpuCount>
            <processCount>1</processCount>
        </resourceAllocationGroup>
    </extensions>
</job>

My guess is that we could add something like
<project> ... </project>
but I am not sure...

If no one knows how to do this off the top of their heads, I'll look it up!

Veronika Nefedova wrote:
> Hi, Ioan:
>
> How do I add my project info into Falcon? (I can't submit anything to 
> PBS queue unless I specify the project)
>
> Nika
>
> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>
>>
>> Do this:
>>
>> edit your site catalog to add an entry <profile namespace="globus"
>> key="project">TG-STA040020N</profile> for the purdue site, add an entry
>> <profile namespace="globus"
>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>
>> -- 
>>
>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From nefedova at mcs.anl.gov  Wed May  9 15:30:22 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 9 May 2007 15:30:22 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <46422CEC.5050807@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
	<46422CEC.5050807@cs.uchicago.edu>
Message-ID: <BF10AA75-06DB-46D5-B163-64AE15B16D71@mcs.anl.gov>

Ioan,

It's at the very bottom of this thread (that's what Ben is suggesting
for Swift). We need to include just one line, similar to that one. But
I do not know where (-;

Nika

On May 9, 2007, at 3:19 PM, Ioan Raicu wrote:

> Hmmm... I don't know.  If you know what modifications we need to  
> make to the GRAM4 RSL to include the specific project, we can  
> simply modify the RSL by changing the create function to add the  
> extra info into the RSL... its a minor change, but we'd have to  
> recompile the DRP stuff again.  If we can't add it to the RSL, then  
> I don't know any other place to put this.  Anyone have any ideas?   
> I am using the GRAM4 Java API directly in the DRP code.
>
> Ioan
>
> PS: Here is a sample RSL...
>
> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat RSL.0.0.ia32- 
> compute.1.120.14898358.xml
> <job>
>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</ 
> executable>
>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>    <argument>6900000</argument>
>    <argument>1500000</argument>
>    <maxWallTime>120</maxWallTime>
>    <extensions>
>         <resourceAllocationGroup>
>            <hostType>ia32-compute</hostType>
>            <hostCount>1</hostCount>
>            <cpuCount>1</cpuCount>
>            <processCount>1</processCount>
>         </resourceAllocationGroup>
>     </extensions>
> </job>
>
> My guess is that we could add something like
> <project> ... </project>
> but I am not sure...
>
> If no one knows how to do this off the top of their heads, I'll  
> look it up!
>
> Veronika Nefedova wrote:
>> Hi, Ioan:
>>
>> How do I add my project info into Falcon? (I can't submit anything  
>> to PBS queue unless I specify the project)
>>
>> Nika
>>
>> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>>
>>>
>>> Do this:
>>>
>>> edit your site catalog to add an entry <profile namespace="globus"
>>> key="project">TG-STA040020N</profile> for the purdue site, add an  
>>> entry
>>> <profile namespace="globus"
>>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>>
>>> -- 
>>>
>>
>>
>
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>       http://dsl.cs.uchicago.edu/
> ============================================
> ============================================
>


From benc at hawaga.org.uk  Thu May 10 01:47:55 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 10 May 2007 06:47:55 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <46422CEC.5050807@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
	<46422CEC.5050807@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705100645150.20212@dildano.hawaga.org.uk>


As you suggested, the GRAM4 RSL extension to use is 
<project>whatever</project> according to 
http://teragrid.org/userinfo/jobs/

Probably needs to go under extensions, (in xpath, extensions/project)
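
A sketch of what that would look like, based on Ioan's sample RSL quoted 
below. The placement of <project> under <extensions> is only the guess 
made above and is not confirmed in this thread (it may instead belong 
directly under <job>); the grant number is a placeholder.

```xml
<job>
    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
    <argument>6900000</argument>
    <argument>1500000</argument>
    <maxWallTime>120</maxWallTime>
    <extensions>
        <!-- Assumed placement (extensions/project); placeholder grant number. -->
        <project>TG-WHATEVERYOURGRANTNUMBERIS</project>
        <resourceAllocationGroup>
            <hostType>ia32-compute</hostType>
            <hostCount>1</hostCount>
            <cpuCount>1</cpuCount>
            <processCount>1</processCount>
        </resourceAllocationGroup>
    </extensions>
</job>
```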

On Wed, 9 May 2007, Ioan Raicu wrote:

> Hmmm... I don't know.  If you know what modifications we need to make to the
> GRAM4 RSL to include the specific project, we can simply modify the RSL by
> changing the create function to add the extra info into the RSL... its a minor
> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
> the RSL, then I don't know any other place to put this.  Anyone have any
> ideas?  I am using the GRAM4 Java API directly in the DRP code.
> 
> Ioan
> 
> PS: Here is a sample RSL...
> 
> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
> RSL.0.0.ia32-compute.1.120.14898358.xml
> <job>
>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>    <argument>6900000</argument>
>    <argument>1500000</argument>
>    <maxWallTime>120</maxWallTime>
>    <extensions>
>         <resourceAllocationGroup>
>            <hostType>ia32-compute</hostType>
>            <hostCount>1</hostCount>
>            <cpuCount>1</cpuCount>
>            <processCount>1</processCount>
>         </resourceAllocationGroup>
>     </extensions>
> </job>
> 
> My guess is that we could add something like
> <project> ... </project>
> but I am not sure...
> 
> If no one knows how to do this off the top of their heads, I'll look it up!
> 
> Veronika Nefedova wrote:
> > Hi, Ioan:
> > 
> > How do I add my project info into Falcon? (I can't submit anything to PBS
> > queue unless I specify the project)
> > 
> > Nika
> > 
> > On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
> > 
> > > 
> > > Do this:
> > > 
> > > edit your site catalog to add an entry <profile namespace="globus"
> > > key="project">TG-STA040020N</profile> for the purdue site, add an entry
> > > <profile namespace="globus"
> > > key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
> > > 
> > > -- 
> > > 
> > 
> > 
> 
> 


From benc at hawaga.org.uk  Thu May 10 05:21:44 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 10 May 2007 10:21:44 +0000 (GMT)
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <46422CEC.5050807@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
	<46422CEC.5050807@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>


not sure whether you're looking to make this code more closely integrated 
with swift and/or a product rather than a research project, but you might 
make the below submission use profile information from the site catalog 
(and transformation catalog?) - it doesn't look like you're doing anything 
fancy in the submission.

On Wed, 9 May 2007, Ioan Raicu wrote:

> Hmmm... I don't know.  If you know what modifications we need to make to the
> GRAM4 RSL to include the specific project, we can simply modify the RSL by
> changing the create function to add the extra info into the RSL... its a minor
> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
> the RSL, then I don't know any other place to put this.  Anyone have any
> ideas?  I am using the GRAM4 Java API directly in the DRP code.
> 
> Ioan
> 
> PS: Here is a sample RSL...
> 
> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
> RSL.0.0.ia32-compute.1.120.14898358.xml
> <job>
>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>    <argument>6900000</argument>
>    <argument>1500000</argument>
>    <maxWallTime>120</maxWallTime>
>    <extensions>
>         <resourceAllocationGroup>
>            <hostType>ia32-compute</hostType>
>            <hostCount>1</hostCount>
>            <cpuCount>1</cpuCount>
>            <processCount>1</processCount>
>         </resourceAllocationGroup>
>     </extensions>
> </job>
> 
> My guess is that we could add something like
> <project> ... </project>
> but I am not sure...
> 
> If no one knows how to do this off the top of their heads, I'll look it up!
> 
> Veronika Nefedova wrote:
> > Hi, Ioan:
> > 
> > How do I add my project info into Falcon? (I can't submit anything to PBS
> > queue unless I specify the project)
> > 
> > Nika
> > 
> > On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
> > 
> > > 
> > > Do this:
> > > 
> > > edit your site catalog to add an entry <profile namespace="globus"
> > > key="project">TG-STA040020N</profile> for the purdue site, add an entry
> > > <profile namespace="globus"
> > > key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
> > > 
> > > -- 
> > > 
> > 
> > 
> 
> 


From iraicu at cs.uchicago.edu  Thu May 10 12:39:28 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 10 May 2007 12:39:28 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>
	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>
	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>
	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>
	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>
	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>
	<46422CEC.5050807@cs.uchicago.edu>
	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
Message-ID: <464358D0.1060402@cs.uchicago.edu>

So I made the modifications to the code that generates the GRAM RSL to 
take a command-line argument -project ..., which then is simply passed 
to the RSL file as <project> ... </project>.  At the moment, this is 
something that needs to be set in Falkon at startup, and all resources 
provisioned by Falkon will use the same project.   For now I think we 
have a solution that works at the various TG sites, and it is not 
tightly integrated with Swift. 

The issue is much more complex if you want Swift to carry the project 
information on a per job basis, and charge it to a (potentially) 
different project each job. 

Ioan

Ben Clifford wrote:
> not sure whether you're looking to make this code more closely integrated 
> with swift and/or a product rather than a research project, but you might 
> make the below submission use profile information from the site catalog 
> (and transformation catalog?) - it doesn't look like you're doing anything 
> fancy in the submission.
>
> On Wed, 9 May 2007, Ioan Raicu wrote:
>
>   
>> Hmmm... I don't know.  If you know what modifications we need to make to the
>> GRAM4 RSL to include the specific project, we can simply modify the RSL by
>> changing the create function to add the extra info into the RSL... its a minor
>> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
>> the RSL, then I don't know any other place to put this.  Anyone have any
>> ideas?  I am using the GRAM4 Java API directly in the DRP code.
>>
>> Ioan
>>
>> PS: Here is a sample RSL...
>>
>> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
>> RSL.0.0.ia32-compute.1.120.14898358.xml
>> <job>
>>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>>    <argument>6900000</argument>
>>    <argument>1500000</argument>
>>    <maxWallTime>120</maxWallTime>
>>    <extensions>
>>         <resourceAllocationGroup>
>>            <hostType>ia32-compute</hostType>
>>            <hostCount>1</hostCount>
>>            <cpuCount>1</cpuCount>
>>            <processCount>1</processCount>
>>         </resourceAllocationGroup>
>>     </extensions>
>> </job>
>>
>> My guess is that we could add something like
>> <project> ... </project>
>> but I am not sure...
>>
>> If no one knows how to do this off the top of their heads, I'll look it up!
>>
>> Veronika Nefedova wrote:
>>     
>>> Hi, Ioan:
>>>
>>> How do I add my project info into Falcon? (I can't submit anything to PBS
>>> queue unless I specify the project)
>>>
>>> Nika
>>>
>>> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>>>
>>>       
>>>> Do this:
>>>>
>>>> edit your site catalog to add an entry <profile namespace="globus"
>>>> key="project">TG-STA040020N</profile> for the purdue site, add an entry
>>>> <profile namespace="globus"
>>>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>>>
>>>> -- 
>>>>
>>>>         
>>>       
>>     
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From foster at mcs.anl.gov  Thu May 10 13:34:18 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Thu, 10 May 2007 13:34:18 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <464358D0.1060402@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu>
Message-ID: <464365AA.50407@mcs.anl.gov>

we don't want to do that ("carry the project information on a per job 
basis, and charge it to a (potentially) different project each job")

Ioan Raicu wrote:
> So I made the modifications to the code that generates the GRAM RSL to 
> take a command line arguement -project ..., which then is simply 
> passed to the RSL file as <project> ... </project>.  At the moment, 
> this is something that needs to be set in Falkon at startup, and all 
> resources provisioned by Falkon will use the same project.   For now I 
> think we have a solution that works at the various TG sites, and it is 
> not tightly integrated with Swift. 
>
> The issue is much more complex if you want Swift to carry the project 
> information on a per job basis, and charge it to a (potentially) 
> different project each job. 
>
> Ioan
>
> Ben Clifford wrote:
>> not sure whether you're looking to make this code more closely integrated 
>> with swift and/or a product rather than a research project, but you might 
>> make the below submission use profile information from the site catalog 
>> (and transformation catalog?) - it doesn't look like you're doing anything 
>> fancy in the submission.
>>
>> On Wed, 9 May 2007, Ioan Raicu wrote:
>>
>>   
>>> Hmmm... I don't know.  If you know what modifications we need to make to the
>>> GRAM4 RSL to include the specific project, we can simply modify the RSL by
>>> changing the create function to add the extra info into the RSL... its a minor
>>> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
>>> the RSL, then I don't know any other place to put this.  Anyone have any
>>> ideas?  I am using the GRAM4 Java API directly in the DRP code.
>>>
>>> Ioan
>>>
>>> PS: Here is a sample RSL...
>>>
>>> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
>>> RSL.0.0.ia32-compute.1.120.14898358.xml
>>> <job>
>>>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>>>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>>>    <argument>6900000</argument>
>>>    <argument>1500000</argument>
>>>    <maxWallTime>120</maxWallTime>
>>>    <extensions>
>>>         <resourceAllocationGroup>
>>>            <hostType>ia32-compute</hostType>
>>>            <hostCount>1</hostCount>
>>>            <cpuCount>1</cpuCount>
>>>            <processCount>1</processCount>
>>>         </resourceAllocationGroup>
>>>     </extensions>
>>> </job>
>>>
>>> My guess is that we could add something like
>>> <project> ... </project>
>>> but I am not sure...
>>>
>>> If no one knows how to do this off the top of their heads, I'll look it up!
>>>
>>> Veronika Nefedova wrote:
>>>     
>>>> Hi, Ioan:
>>>>
>>>> How do I add my project info into Falcon? (I can't submit anything to PBS
>>>> queue unless I specify the project)
>>>>
>>>> Nika
>>>>
>>>> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>>>>
>>>>       
>>>>> Do this:
>>>>>
>>>>> edit your site catalog to add an entry <profile namespace="globus"
>>>>> key="project">TG-STA040020N</profile> for the purdue site, add an entry
>>>>> <profile namespace="globus"
>>>>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>>>>
>>>>> -- 
>>>>>
>>>>>         
>>>>       
>>>     
>>
>>   
>
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
> ============================================
>   
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From iraicu at cs.uchicago.edu  Thu May 10 13:39:57 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 10 May 2007 13:39:57 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <464365AA.50407@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
Message-ID: <464366FD.8060008@cs.uchicago.edu>

Great, then we are set: the project is configurable at Falkon startup!
Ioan

Ian Foster wrote:
> we don't want to do that ("carry the project information on a per job 
> basis, and charge it to a (potentially) different project each job")
>
> Ioan Raicu wrote:
>> So I made the modifications to the code that generates the GRAM RSL 
>> to take a command line arguement -project ..., which then is simply 
>> passed to the RSL file as <project> ... </project>.  At the moment, 
>> this is something that needs to be set in Falkon at startup, and all 
>> resources provisioned by Falkon will use the same project.   For now 
>> I think we have a solution that works at the various TG sites, and it 
>> is not tightly integrated with Swift. 
>>
>> The issue is much more complex if you want Swift to carry the project 
>> information on a per job basis, and charge it to a (potentially) 
>> different project each job. 
>>
>> Ioan
>>
>> Ben Clifford wrote:
>>> not sure whether you're looking to make this code more closely integrated 
>>> with swift and/or a product rather than a research project, but you might 
>>> make the below submission use profile information from the site catalog 
>>> (and transformation catalog?) - it doesn't look like you're doing anything 
>>> fancy in the submission.
>>>
>>> On Wed, 9 May 2007, Ioan Raicu wrote:
>>>
>>>   
>>>> Hmmm... I don't know.  If you know what modifications we need to make to the
>>>> GRAM4 RSL to include the specific project, we can simply modify the RSL by
>>>> changing the create function to add the extra info into the RSL... its a minor
>>>> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
>>>> the RSL, then I don't know any other place to put this.  Anyone have any
>>>> ideas?  I am using the GRAM4 Java API directly in the DRP code.
>>>>
>>>> Ioan
>>>>
>>>> PS: Here is a sample RSL...
>>>>
>>>> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
>>>> RSL.0.0.ia32-compute.1.120.14898358.xml
>>>> <job>
>>>>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>>>>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>>>>    <argument>6900000</argument>
>>>>    <argument>1500000</argument>
>>>>    <maxWallTime>120</maxWallTime>
>>>>    <extensions>
>>>>         <resourceAllocationGroup>
>>>>            <hostType>ia32-compute</hostType>
>>>>            <hostCount>1</hostCount>
>>>>            <cpuCount>1</cpuCount>
>>>>            <processCount>1</processCount>
>>>>         </resourceAllocationGroup>
>>>>     </extensions>
>>>> </job>
>>>>
>>>> My guess is that we could add something like
>>>> <project> ... </project>
>>>> but I am not sure...
>>>>
>>>> If no one knows how to do this off the top of their heads, I'll look it up!
>>>>
>>>> Veronika Nefedova wrote:
>>>>     
>>>>> Hi, Ioan:
>>>>>
>>>>> How do I add my project info into Falcon? (I can't submit anything to PBS
>>>>> queue unless I specify the project)
>>>>>
>>>>> Nika
>>>>>
>>>>> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>>>>>
>>>>>       
>>>>>> Do this:
>>>>>>
>>>>>> edit your site catalog to add an entry <profile namespace="globus"
>>>>>> key="project">TG-STA040020N</profile> for the purdue site, add an entry
>>>>>> <profile namespace="globus"
>>>>>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>>         
>>>>>       
>>>>     
>>>
>>>   
>>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================

From foster at mcs.anl.gov  Fri May 11 08:31:11 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Fri, 11 May 2007 08:31:11 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <464366FD.8060008@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu>
Message-ID: <4644701F.4040007@mcs.anl.gov>

I note that we have stopped running at NCSA and switched to trying to 
run at Purdue. A good thing to try, certainly.

However, could we not have had a big job in the queue at NCSA all this 
time, also, using Falkon, which would have run by now?

Ian.

Ioan Raicu wrote:
> Great, than we are set, the project is configurable at the Falkon startup!
> Ioan
>
> Ian Foster wrote:
>> we don't want to do that ("carry the project information on a per job 
>> basis, and charge it to a (potentially) different project each job")
>>
>> Ioan Raicu wrote:
>>> So I made the modifications to the code that generates the GRAM RSL 
>>> to take a command line arguement -project ..., which then is simply 
>>> passed to the RSL file as <project> ... </project>.  At the moment, 
>>> this is something that needs to be set in Falkon at startup, and all 
>>> resources provisioned by Falkon will use the same project.   For now 
>>> I think we have a solution that works at the various TG sites, and 
>>> it is not tightly integrated with Swift. 
>>>
>>> The issue is much more complex if you want Swift to carry the 
>>> project information on a per job basis, and charge it to a 
>>> (potentially) different project each job. 
>>>
>>> Ioan
>>>
>>> Ben Clifford wrote:
>>>> not sure whether you're looking to make this code more closely integrated 
>>>> with swift and/or a product rather than a research project, but you might 
>>>> make the below submission use profile information from the site catalog 
>>>> (and transformation catalog?) - it doesn't look like you're doing anything 
>>>> fancy in the submission.
>>>>
>>>> On Wed, 9 May 2007, Ioan Raicu wrote:
>>>>
>>>>   
>>>>> Hmmm... I don't know.  If you know what modifications we need to make to the
>>>>> GRAM4 RSL to include the specific project, we can simply modify the RSL by
>>>>> changing the create function to add the extra info into the RSL... its a minor
>>>>> change, but we'd have to recompile the DRP stuff again.  If we can't add it to
>>>>> the RSL, then I don't know any other place to put this.  Anyone have any
>>>>> ideas?  I am using the GRAM4 Java API directly in the DRP code.
>>>>>
>>>>> Ioan
>>>>>
>>>>> PS: Here is a sample RSL...
>>>>>
>>>>> iraicu at tg-viz-login1:~/java/Falkon_v0.8/worker> cat
>>>>> RSL.0.0.ia32-compute.1.120.14898358.xml
>>>>> <job>
>>>>>    <executable>/home/iraicu/java/Falkon_v0.8/worker/run.worker.sh</executable>
>>>>>    <directory>/home/iraicu/java/Falkon_v0.8/worker</directory>
>>>>>    <argument>6900000</argument>
>>>>>    <argument>1500000</argument>
>>>>>    <maxWallTime>120</maxWallTime>
>>>>>    <extensions>
>>>>>         <resourceAllocationGroup>
>>>>>            <hostType>ia32-compute</hostType>
>>>>>            <hostCount>1</hostCount>
>>>>>            <cpuCount>1</cpuCount>
>>>>>            <processCount>1</processCount>
>>>>>         </resourceAllocationGroup>
>>>>>     </extensions>
>>>>> </job>
>>>>>
>>>>> My guess is that we could add something like
>>>>> <project> ... </project>
>>>>> but I am not sure...
>>>>>
>>>>> If no one knows how to do this off the top of their heads, I'll look it up!
>>>>>
>>>>> Veronika Nefedova wrote:
>>>>>     
>>>>>> Hi, Ioan:
>>>>>>
>>>>>> How do I add my project info into Falcon? (I can't submit anything to PBS
>>>>>> queue unless I specify the project)
>>>>>>
>>>>>> Nika
>>>>>>
>>>>>> On May 9, 2007, at 12:16 PM, Ben Clifford wrote:
>>>>>>
>>>>>>       
>>>>>>> Do this:
>>>>>>>
>>>>>>> edit your site catalog to add an entry <profile namespace="globus"
>>>>>>> key="project">TG-STA040020N</profile> for the purdue site, add an entry
>>>>>>> <profile namespace="globus"
>>>>>>> key="project">TG-WHATEVERYOURGRANTNUMBERIS</profile>
>>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>>         
>>>>>>       
>>>>>     
>>>>
>>>>   
>>>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.

From nefedova at mcs.anl.gov  Fri May 11 08:58:12 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 11 May 2007 08:58:12 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <4644701F.4040007@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
Message-ID: <B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>

I think we had a problem submitting a big reservation to NCSA - even  
the smaller ones were in the queue for more than a week at that time.  
When we asked for a queue-time estimate, it said something like  
'unable to predict' or 'unable to accept'...
Ioan - do you remember what the exact problem was?

Nika


On May 11, 2007, at 8:31 AM, Ian Foster wrote:

> I note that we have stopped running at NCSA and switched to trying  
> to run at Purdue. A good thing to try, certainly.
>
> However, could we not have had a big job in the queue at NCSA all  
> this time, also, using Falkon, which would have run by now?
>
> Ian.
>
> Ioan Raicu wrote:
>> Great, than we are set, the project is configurable at the Falkon  
>> startup!
>> Ioan

From itf at mcs.anl.gov  Fri May 11 09:01:27 2007
From: itf at mcs.anl.gov (Ian Foster)
Date: Fri, 11 May 2007 14:01:27 +0000
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
Message-ID: <1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>

It seems unlikely to me that you can't even submit it?

Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Veronika Nefedova <nefedova at mcs.anl.gov>
Date: Fri, 11 May 2007 08:58:12 
To:Ian Foster <foster at mcs.anl.gov>
Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] MolDyn at Purdue

I think we had a problem submitting a big reservation to NCSA - even a smaller ones were in the queue for more then a week at that time. When we did a time estimate on a queue time it said something like 'unable to predict' or 'unable to accept'...
Ioan - do you remember what was the exact problem?


Nika


On May 11, 2007, at 8:31 AM, Ian Foster wrote:
I note that we have stopped running at NCSA and switched to trying to run at Purdue. A good thing to try, certainly.

However, could we not have had a big job in the queue at NCSA all this time, also, using Falkon, which would have run by now?

Ian.

Ioan Raicu wrote:Great, than we are set, the project is configurable at the Falkon startup!
Ioan


From nefedova at mcs.anl.gov  Fri May 11 09:18:35 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 11 May 2007 09:18:35 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
Message-ID: <D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>

Nope, it's quite possible. Last week I couldn't submit a single job  
for almost a day -- their queue was completely full! The message was  
something like 'not accepting new jobs in a queue'. The cluster is  
ridiculously busy. I could try to submit a reservation today for,  
say, 20 molecules...

Nika

On May 11, 2007, at 9:01 AM, Ian Foster wrote:

> It seems unlikely to me that you can't even submit it?
>
> Sent via BlackBerry from T-Mobile
>
> -----Original Message-----
> From: Veronika Nefedova <nefedova at mcs.anl.gov>
> Date: Fri, 11 May 2007 08:58:12
> To:Ian Foster <foster at mcs.anl.gov>
> Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] MolDyn at Purdue
>
> I think we had a problem submitting a big reservation to NCSA -  
> even a smaller ones were in the queue for more then a week at that  
> time. When we did a time estimate on a queue time it said something  
> like 'unable to predict' or 'unable to accept'...
> Ioan - do you remember what was the exact problem?
>
>
> Nika
>
>
>
>
> On May 11, 2007, at 8:31 AM, Ian Foster wrote:
> I note that we have stopped running at NCSA and switched to trying  
> to run at Purdue. A good thing to try, certainly.
>
> However, could we not have had a big job in the queue at NCSA all  
> this time, also, using Falkon, which would have run by now?
>
> Ian.
>
> Ioan Raicu wrote:Great, than we are set, the project is  
> configurable at the Falkon startup!
> Ioan
>


From iraicu at cs.uchicago.edu  Fri May 11 10:18:20 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Fri, 11 May 2007 10:18:20 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
Message-ID: <4644893C.6050300@cs.uchicago.edu>

I remember using the batch queue prediction system to try to estimate how 
long the queue waits would be, and we were getting relatively long waits 
(on the order of days) for just a few dozen processors for a 24-hour 
period. If we asked for anything significant (100+ processors), the 
prediction system said that it could not give us a prediction... my 
guess is that the queue wait would have been longer than the maximum the 
prediction models were designed for.  The site was really busy, with 
hundreds of large jobs involving 100~1000 processors each, running 
for days at a time.  We were essentially discouraged by all this, and 
decided that it was not worth trying to do any large runs at NCSA (at 
that time), and that Nika would try to install the application at Purdue 
and try some larger-scale runs there, as the Purdue site seemed to be 
relatively idle.  So, we never tried to submit a large allocation at 
NCSA... but maybe we should have; maybe we would have gotten it by now.

Ioan

Ian Foster wrote:
> It seems unlikely to me that you can't even submit it?
>
> Sent via BlackBerry from T-Mobile  
>
> -----Original Message-----
> From: Veronika Nefedova <nefedova at mcs.anl.gov>
> Date: Fri, 11 May 2007 08:58:12 
> To:Ian Foster <foster at mcs.anl.gov>
> Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] MolDyn at Purdue
>
> I think we had a problem submitting a big reservation to NCSA - even a smaller ones were in the queue for more then a week at that time. When we did a time estimate on a queue time it said something like 'unable to predict' or 'unable to accept'... 
> Ioan - do you remember what was the exact problem?
>
>
> Nika
>
>
>
>
> On May 11, 2007, at 8:31 AM, Ian Foster wrote:
> I note that we have stopped running at NCSA and switched to trying to run at Purdue. A good thing to try, certainly.
>
> However, could we not have had a big job in the queue at NCSA all this time, also, using Falkon, which would have run by now?
>
> Ian.
>
> Ioan Raicu wrote:Great, than we are set, the project is configurable at the Falkon startup!
> Ioan
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Fri May 11 10:24:16 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Fri, 11 May 2007 10:24:16 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
Message-ID: <46448AA0.5030705@cs.uchicago.edu>

Right, so if we want to get roughly the same execution time of 77 
minutes, we would need 34*20 = 680 machines for 2 hours, right?  If we 
halve the machine numbers, we can double the time reservation, right?
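
To make the arithmetic above concrete, here is a rough sketch in Python. It assumes that 34 is the number of machines used per molecule and 20 is the number of molecules (my reading of the numbers, not something measured), and that the work scales ideally, so total machine-hours stay constant when we trade machines for time. The function names are made up for illustration.

```python
def machines_needed(machines_per_molecule: int, molecules: int) -> int:
    """Machines required to run all molecules at once (assumes no sharing)."""
    return machines_per_molecule * molecules


def reservation_options(machines: int, hours: float, halvings: int = 2):
    """List (machines, hours) pairs that keep machine-hours constant.

    Each step halves the machine count and doubles the reservation time.
    This assumes ideal scaling, which a real run may not achieve.
    """
    return [(machines // 2**k, hours * 2**k) for k in range(halvings + 1)]


total = machines_needed(34, 20)  # 34 * 20 machines
for m, h in reservation_options(total, 2.0):
    print(f"{m} machines for {h} hours = {m * h} machine-hours")
```

In practice a smaller, longer reservation may get through the queue sooner, but that depends on the site's scheduler and is not captured here.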

Let me know if you need help with the Falkon settings!

Ioan


Veronika Nefedova wrote:
> Nope, its quite possible. Last week I couldn't submit a single job for 
> almost a day -- their queue was completely full! The message was 
> something like 'not accepting new jobs in a queue' - or something like 
> that. The cluster is ridiculously busy. I could try to submit today a 
> reservation for , say, 20 molecules...
>
> Nika
>
> On May 11, 2007, at 9:01 AM, Ian Foster wrote:
>
>> It seems unlikely to me that you can't even submit it?
>>
>> Sent via BlackBerry from T-Mobile
>>
>> -----Original Message-----
>> From: Veronika Nefedova <nefedova at mcs.anl.gov>
>> Date: Fri, 11 May 2007 08:58:12
>> To:Ian Foster <foster at mcs.anl.gov>
>> Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
>> Subject: Re: [Swift-devel] MolDyn at Purdue
>>
>> I think we had a problem submitting a big reservation to NCSA - even 
>> a smaller ones were in the queue for more then a week at that time. 
>> When we did a time estimate on a queue time it said something like 
>> 'unable to predict' or 'unable to accept'...
>> Ioan - do you remember what was the exact problem?
>>
>>
>> Nika
>>
>>
>>
>>
>> On May 11, 2007, at 8:31 AM, Ian Foster wrote:
>> I note that we have stopped running at NCSA and switched to trying to 
>> run at Purdue. A good thing to try, certainly.
>>
>> However, could we not have had a big job in the queue at NCSA all 
>> this time, also, using Falkon, which would have run by now?
>>
>> Ian.
>>
>> Ioan Raicu wrote:Great, than we are set, the project is configurable 
>> at the Falkon startup!
>> Ioan
>>
>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From nefedova at mcs.anl.gov  Fri May 11 10:46:56 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 11 May 2007 10:46:56 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <46448AA0.5030705@cs.uchicago.edu>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
Message-ID: <C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>

Interesting...
Apparently, I did submit the reservation for a big run back on Monday  
(I thought it didn't go through at that time). And it is still in the  
queue.

tg-login1 nefedova/Falkon_v0.8> showq | grep nefedova
995068             nefedova       Idle   286     2:00:00  Mon May  7 10:02:28
1000628            nefedova       Idle   340     4:00:00  Fri May 11 10:41:14
tg-login1 nefedova/Falkon_v0.8>
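
For anyone wanting to total up what is sitting in the queue, a small Python sketch follows. It assumes the columns are job id, user, state, processor count, wall-clock limit (H:MM:SS) and queue time, which is my reading of the output above rather than something checked against the scheduler documentation. The helper name is made up.

```python
def parse_showq_line(line: str) -> dict:
    """Split one idle-job line into named fields (column meaning is assumed)."""
    fields = line.split()
    job_id, user, state, procs, wclimit = fields[:5]
    # Assumes the limit is H:MM:SS; a limit with a days field would need more handling.
    hours, minutes, seconds = (int(part) for part in wclimit.split(":"))
    return {
        "job_id": job_id,
        "user": user,
        "state": state,
        "procs": int(procs),
        "limit_hours": hours + minutes / 60 + seconds / 3600,
        "queued": " ".join(fields[5:]),
    }


sample = [
    "995068             nefedova       Idle   286     2:00:00  Mon May  7 10:02:28",
    "1000628            nefedova       Idle   340     4:00:00  Fri May 11 10:41:14",
]

for line in sample:
    job = parse_showq_line(line)
    # Processor-hours requested = processors * wall-clock limit in hours.
    print(job["job_id"], job["state"], job["procs"] * job["limit_hours"])
```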


Nika

On May 11, 2007, at 10:24 AM, Ioan Raicu wrote:

> Right, so if we want to get roughly the same execution time of 77  
> minutes, we would need 34*20 = 680 machines for 2 hours, right?  If  
> we halve the machine numbers, we can double the time reservation,  
> right?
>
> Let me know if you need help with the Falkon settings!
>
> Ioan
>
>
> Veronika Nefedova wrote:
>> Nope, its quite possible. Last week I couldn't submit a single job  
>> for almost a day -- their queue was completely full! The message  
>> was something like 'not accepting new jobs in a queue' - or  
>> something like that. The cluster is ridiculously busy. I could try  
>> to submit today a reservation for , say, 20 molecules...
>>
>> Nika
>>
>> On May 11, 2007, at 9:01 AM, Ian Foster wrote:
>>
>>> It seems unlikely to me that you can't even submit it?
>>>
>>> Sent via BlackBerry from T-Mobile
>>>
>>> -----Original Message-----
>>> From: Veronika Nefedova <nefedova at mcs.anl.gov>
>>> Date: Fri, 11 May 2007 08:58:12
>>> To:Ian Foster <foster at mcs.anl.gov>
>>> Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
>>> Subject: Re: [Swift-devel] MolDyn at Purdue
>>>
>>> I think we had a problem submitting a big reservation to NCSA -  
>>> even a smaller ones were in the queue for more then a week at  
>>> that time. When we did a time estimate on a queue time it said  
>>> something like 'unable to predict' or 'unable to accept'...
>>> Ioan - do you remember what was the exact problem?
>>>
>>>
>>> Nika
>>>
>>>
>>>
>>>
>>> On May 11, 2007, at 8:31 AM, Ian Foster wrote:
>>> I note that we have stopped running at NCSA and switched to  
>>> trying to run at Purdue. A good thing to try, certainly.
>>>
>>> However, could we not have had a big job in the queue at NCSA all  
>>> this time, also, using Falkon, which would have run by now?
>>>
>>> Ian.
>>>
>>> Ioan Raicu wrote:Great, than we are set, the project is  
>>> configurable at the Falkon startup!
>>> Ioan
>>>
>>
>>
>


From foster at mcs.anl.gov  Fri May 11 14:04:52 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Fri, 11 May 2007 14:04:52 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
	<C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
Message-ID: <4644BE54.4040202@mcs.anl.gov>

that's scary!

Just out of interest, how big was it? (cpus, time?)

Veronika Nefedova wrote:
> Interesting...
> Apparently, I did submit the reservation for a big run back on Monday 
> (I thought it didn't go through at that time). And it is still in the 
> queue..
>
> tg-login1 nefedova/Falkon_v0.8> showq | grep nefedova
> 995068             nefedova       Idle   286     2:00:00  Mon May  7 
> 10:02:28
> 1000628            nefedova       Idle   340     4:00:00  Fri May 11 
> 10:41:14
> tg-login1 nefedova/Falkon_v0.8>
>
>
> Nika
>
> On May 11, 2007, at 10:24 AM, Ioan Raicu wrote:
>
>> Right, so if we want to get roughly the same execution time of 77 
>> minutes, we would need 34*20 = 680 machines for 2 hours, right?  If 
>> we halve the machine numbers, we can double the time reservation, right?
>>
>> Let me know if you need help with the Falkon settings!
>>
>> Ioan
>>
>>
>> Veronika Nefedova wrote:
>>> Nope, its quite possible. Last week I couldn't submit a single job 
>>> for almost a day -- their queue was completely full! The message was 
>>> something like 'not accepting new jobs in a queue' - or something 
>>> like that. The cluster is ridiculously busy. I could try to submit 
>>> today a reservation for , say, 20 molecules...
>>>
>>> Nika
>>>
>>> On May 11, 2007, at 9:01 AM, Ian Foster wrote:
>>>
>>>> It seems unlikely to me that you can't even submit it?
>>>>
>>>> Sent via BlackBerry from T-Mobile
>>>>
>>>> -----Original Message-----
>>>> From: Veronika Nefedova <nefedova at mcs.anl.gov>
>>>> Date: Fri, 11 May 2007 08:58:12
>>>> To:Ian Foster <foster at mcs.anl.gov>
>>>> Cc:iraicu at cs.uchicago.edu, swift-devel at ci.uchicago.edu
>>>> Subject: Re: [Swift-devel] MolDyn at Purdue
>>>>
>>>> I think we had a problem submitting a big reservation to NCSA - 
>>>> even a smaller ones were in the queue for more then a week at that 
>>>> time. When we did a time estimate on a queue time it said something 
>>>> like 'unable to predict' or 'unable to accept'...
>>>> Ioan - do you remember what was the exact problem?
>>>>
>>>>
>>>> Nika
>>>>
>>>>
>>>>
>>>>
>>>> On May 11, 2007, at 8:31 AM, Ian Foster wrote:
>>>> I note that we have stopped running at NCSA and switched to trying 
>>>> to run at Purdue. A good thing to try, certainly.
>>>>
>>>> However, could we not have had a big job in the queue at NCSA all 
>>>> this time, also, using Falkon, which would have run by now?
>>>>
>>>> Ian.
>>>>
>>>> Ioan Raicu wrote:Great, than we are set, the project is 
>>>> configurable at the Falkon startup!
>>>> Ioan
>>>>
>>>
>>>
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From nefedova at mcs.anl.gov  Fri May 11 14:11:41 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 11 May 2007 14:11:41 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <4644BE54.4040202@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
	<C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
	<4644BE54.4040202@mcs.anl.gov>
Message-ID: <8C852C48-7251-4E13-A9A0-E2AFF4B4F8F0@mcs.anl.gov>

The request was for 286 CPUs for 2 hours. Still in the queue!
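
For scale, the two reservations in the showq listing quoted below can be
compared in CPU-hours. This is only a rough sketch: the numbers are copied
from that listing, and cpu_hours is a made-up helper, not part of any
scheduler tool.

```python
# CPU-hours for the two queued reservations in the showq listing.
# Each entry is (job id, CPUs, wall-clock hours), copied from the listing.
# cpu_hours is a hypothetical helper used only for this illustration.
reservations = [
    ("995068", 286, 2.0),
    ("1000628", 340, 4.0),
]


def cpu_hours(cpus, hours):
    """Total CPU-hours requested by one reservation."""
    return cpus * hours


totals = {job: cpu_hours(cpus, hours) for job, cpus, hours in reservations}

for job, total in totals.items():
    print(f"{job}: {total:.0f} CPU-hours")
```

The second request (340 CPUs for 4 hours) comes to the same total as the
680 machines for 2 hours in Ioan's estimate quoted below, with half the
machines and double the time.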

On May 11, 2007, at 2:04 PM, Ian Foster wrote:

> that's scary!
>
> Just out of interest, how big was it? (cpus, time?)
>
> Veronika Nefedova wrote:
>> Interesting...
>> Apparently, I did submit the reservation for a big run back on  
>> Monday (I thought it didn't go through at that time). And it is  
>> still in the queue..
>>
>> tg-login1 nefedova/Falkon_v0.8> showq | grep nefedova
>> 995068             nefedova       Idle   286     2:00:00  Mon May   
>> 7 10:02:28
>> 1000628            nefedova       Idle   340     4:00:00  Fri May  
>> 11 10:41:14
>> tg-login1 nefedova/Falkon_v0.8>
>>
>>
>> Nika
>>
>> On May 11, 2007, at 10:24 AM, Ioan Raicu wrote:
>>
>>> Right, so if we want to get roughly the same execution time of 77  
>>> minutes, we would need 34*20 = 680 machines for 2 hours, right?   
>>> If we halve the machine numbers, we can double the time  
>>> reservation, right?
>>>
>>> Let me know if you need help with the Falkon settings!
>>>
>>> Ioan


From foster at mcs.anl.gov  Fri May 11 14:11:51 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Fri, 11 May 2007 14:11:51 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
	<C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
Message-ID: <4644BFF7.5020106@mcs.anl.gov>

One more question: should we be trying the TG-Argonne cluster? 
Apparently it is fairly idle?

Veronika Nefedova wrote:
> Interesting...
> Apparently, I did submit the reservation for a big run back on Monday 
> (I thought it didn't go through at that time). And it is still in the 
> queue..
>
> tg-login1 nefedova/Falkon_v0.8> showq | grep nefedova
> 995068             nefedova       Idle   286     2:00:00  Mon May  7 
> 10:02:28
> 1000628            nefedova       Idle   340     4:00:00  Fri May 11 
> 10:41:14
> tg-login1 nefedova/Falkon_v0.8>
>
>
> Nika
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From nefedova at mcs.anl.gov  Fri May 11 14:18:10 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 11 May 2007 14:18:10 -0500
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <4644BFF7.5020106@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
	<C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
	<4644BFF7.5020106@mcs.anl.gov>
Message-ID: <2E404D2F-960D-45AC-BED0-31BBA4657C8C@mcs.anl.gov>

I think Benoit's group doesn't have any allocation at TG-ANL (they  
have a good allocation at Purdue). It takes quite an effort to  
compile their tools, so I am not sure if Yuqing will be interested in  
trying TG-ANL...
I could try to move the apps to TG-ANL on Monday and see if they run  
there. Hopefully the Purdue guys will be able to resolve their GT4  
GRAM issues by then...

Nika

On May 11, 2007, at 2:11 PM, Ian Foster wrote:

> One more question: should we be trying the TG-Argonne cluster?  
> Apparently it is fairly idle?
>


From itf at mcs.anl.gov  Fri May 11 14:55:40 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Fri, 11 May 2007 19:55:40 +0000
Subject: [Swift-devel] MolDyn at Purdue
In-Reply-To: <2E404D2F-960D-45AC-BED0-31BBA4657C8C@mcs.anl.gov>
References: <d4ca48c0705090821m5ea97d18u62d4dfc21ef6f9cd@mail.gmail.com>	<AE46CE67-9532-4569-89C1-0B6B3AB0B28B@mcs.anl.gov>	<Pine.LNX.4.64.0705091623000.20212@dildano.hawaga.org.uk>	<1E5A7292-AC04-49BE-B017-6A4D8C28F32F@mcs.anl.gov>	<Pine.LNX.4.64.0705091637250.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0705091716060.22628@dildano.hawaga.org.uk>	<BCBF81B6-B9E7-47D1-B2D9-B3F515E06600@mcs.anl.gov>	<46422CEC.5050807@cs.uchicago.edu>	<Pine.LNX.4.64.0705101018570.20212@dildano.hawaga.org.uk>
	<464358D0.1060402@cs.uchicago.edu> <464365AA.50407@mcs.anl.gov>
	<464366FD.8060008@cs.uchicago.edu> <4644701F.4040007@mcs.anl.gov>
	<B9AC4D80-5BAB-4894-B5C0-6318BA6F9FAA@mcs.anl.gov>
	<1819896301-1178892121-cardhu_blackberry.rim.net-1642341055-@bwe017-cell00.bisx.prod.on.blackberry>
	<D67D65D9-80AC-4254-A8F8-A43DA0D6FF44@mcs.anl.gov>
	<46448AA0.5030705@cs.uchicago.edu>
	<C35E7692-2EE9-4225-8134-869CBCFA2C89@mcs.anl.gov>
	<4644BFF7.5020106@mcs.anl.gov>
	<2E404D2F-960D-45AC-BED0-31BBA4657C8C@mcs.anl.gov>
Message-ID: <1987384684-1178913374-cardhu_blackberry.rim.net-877925243-@bwe005-cell00.bisx.prod.on.blackberry>

Ok. I think I should stop asking questions (-:

Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Veronika Nefedova <nefedova at mcs.anl.gov>
Date: Fri, 11 May 2007 14:18:10 
To:Ian Foster <foster at mcs.anl.gov>
Cc:iraicu at cs.uchicago.edu, itf at mcs.anl.gov, swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] MolDyn at Purdue

I think Benoit's group doesn't have any allocation at TG-ANL (they  
have a good allocation at Purdue). It takes quite an effort to  
compile their tools, so I am not sure if Yuqing will be interested in  
trying TG-ANL...
I could try to move apps to TG/ANL on Monday and it see if it runs  
there. Hopefully the Purdue guys will be able to resolve their GT4  
GRAM issues by then...

Nika

On May 11, 2007, at 2:11 PM, Ian Foster wrote:

> One more question: should we be trying the TG-Argonne cluster?  
> Apparently it is fairly idle?
>


From benc at hawaga.org.uk  Tue May 15 11:20:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 15 May 2007 16:20:03 +0000 (GMT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <4649D280.5080906@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>


Ian asked about this elsewhere, but it's perhaps interesting for 
swift-devel people to look at the questions too.

On Tue, 15 May 2007, Ian Foster wrote:

> Dear All:
                                                                                
> I asked Kate if she and Tim could look into creating VM images that 
> would allow us to run Swift applications on Amazon EC2. I think Kate is 
> meeting with Ioan about this on Thursday (?).
                                                                                
> One issue that I thought would be good to discuss is what we'd want in 
> that VM image. Perhaps this is obvious to the rest of you, but it isn't 
> to me. A few thoughts:

> * I'm assuming that we want to run "workers" on EC2 nodes, and have the "task
> dispatch" logic run on some external frontend system outside EC2.

> * I would think that we want to use Falkon to do the task dispatch. If so, we
> need a Falkon executor on each VM, configured to check in with the Falkon
> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
> want an SGE agent.)

> *  We need a way of getting data to and from the worker nodes. Do we want to
> run a file system across the EC2 nodes and the external frontend node? That
> seems rather inefficient. Other options?

> * Should we preload the application code on each EC2 node?

Here's a couple of approaches:

 1) swift regards all the EC2 nodes that we are paying for as a single 
    site.

Something like falkon handles all the task dispatch and worker node 
management. I don't know what that looks like at the moment in Falkon, but 
the interface for Swift to send jobs into Falkon sounds pretty 
straightforward and shouldn't need changing.

All the nodes in a site are required by our site model to have a shared 
filesystem - we've talked about removing it, but I think that is still the 
case and if so, isn't going to change soon. timf probably knows more than 
the people on this list about making shared filesystems.

In this case, falkon would be doing the site selection.

 2) swift regards each EC2 node as a separate site.

So Swift would be doing site selection between each site (i.e. between 
each EC2 node), and then submitting to that site.

I don't know if the interface between Swift and e.g. Falkon allows Swift 
to tell Falkon which remote node to run on.

However, Swift would then be able to use something like gridftp to stage 
to each EC2 node (assuming that EC2 nodes can act as ftp servers - I don't 
know what their network connectivity is like) - a shared filesystem 
between all nodes in a site is pretty simple when there is only a single 
node in the site.
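
To make the difference between the two approaches concrete, here is a
rough sketch of both layouts as plain data. This is not Swift's site
catalog format; the field names and node names are made up for
illustration.

```python
# Hypothetical model of the two approaches; not Swift's real site catalog.
def single_site(nodes):
    """Approach 1: one site holding every EC2 node.

    Something like Falkon picks the node, and the site model expects a
    shared filesystem across all the nodes.
    """
    nodes = list(nodes)
    return [{
        "site": "ec2",
        "nodes": nodes,
        "node_chosen_by": "falkon",
        "needs_shared_fs": len(nodes) > 1,
    }]


def site_per_node(nodes):
    """Approach 2: one site per EC2 node, with Swift choosing the site.

    A one-node site needs no filesystem shared between nodes.
    """
    return [{
        "site": f"ec2-{n}",
        "nodes": [n],
        "node_chosen_by": "swift",
        "needs_shared_fs": False,
    } for n in nodes]


nodes = ["node-a", "node-b", "node-c"]
print(len(single_site(nodes)), "site vs", len(site_per_node(nodes)), "sites")
```

With three nodes the first layout yields one site that needs a shared
filesystem, and the second yields three sites that do not.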


Amazon also has a storage cloud, alongside its compute cloud. I know very 
little about that and have never thought about how it would fit into the 
above (if at all). Maybe someone else knows more.

-- 


From tfreeman at mcs.anl.gov  Tue May 15 15:45:00 2007
From: tfreeman at mcs.anl.gov (Tim Freeman)
Date: Tue, 15 May 2007 15:45:00 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
Message-ID: <20070515154500.ad1600bf.tfreeman@mcs.anl.gov>

On Tue, 15 May 2007 16:20:03 +0000 (GMT)
Ben Clifford <benc at hawaga.org.uk> wrote:

> 
> Ian asked about this elsewhere, but its perhaps interesting for 
> swift-devel people to look at the questions too.
> 
> On Tue, 15 May 2007, Ian Foster wrote:
> 
> > Dear All:
>                                                                                 
> > I asked Kate if she and Tim could look into creating VM images that 
> > would allow us to run Swift applications on Amazon EC2. I think Kate is 
> > meeting with Ioan about this on Thursday (?).
>                                                                                 
> > One issue that I thought would be good to discuss is what we'd want in 
> > that VM image. Perhaps this is obvious to the rest of you, but it isn't 
> > to me. A few thoughts:
> 
> > * I'm assuming that we want to run "workers" on EC2 nodes, and have the
> > "task dispatch" logic run on some external frontend system outside EC2.
> 
> > * I would think that we want to use Falkon to do the task dispatch. If so,
> > we need a Falkon executor on each VM, configured to check in with the Falkon
> > dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
> > want an SGE agent.)
> 
> > *  We need a way of getting data to and from the worker nodes. Do we want to
> > run a file system across the EC2 nodes and the external frontend node? That
> > seems rather inefficient. Other options?
> 
> > * Should we preload the application code on each EC2 node?
> 
> Here's a couple of approaches:
> 
>  1) swift regards all the EC2 nodes that we are paying for as a single 
>     site.
> 
> Something like falkon handles all the task dispatch and worker node 
> management. I don't know what that looks like at the moment in Falkon, but 
> the interface for Swift to send jobs into Falkon sounds pretty 
> straightforward and shouldn't need changing.

So if I understand, here there would be no gateway+LRM but each EC2 node +
Falkon would need a port open to receive tasks?  Or does each node pull down
instructions OK from behind a firewall?

Is there a latency problem with running each node as an independent task
receiver with the dispatcher off-site from EC2?  I would think it would be
better to put the queues to fill with tasks on EC2 so it can more quickly get
the task going when a node is done with a previous task (I may be missing some
nuances here with respect to Falkon; I don't know much about this yet!).

If a gateway node is desired, this option sounds a lot like the GRAM+LRM
situation we use on VMs with the workspace service and will soon use on EC2 via
the workspace EC2 gateway we're implementing.  Start up one gateway node and
then add compute nodes, which dynamically join the pool; they are pointed to
the GRAM node.

> All the nodes in a site are required by our site model to have a shared 
> filesystem - we've talked about removing it, but I think that is still the 
> case and if so, isn't going to change soon. 

Setting up a shared filesystem in this environment is akin to setting up the
compute nodes to join an LRM pool.  The VMs can communicate over the private
network at EC2: you can instruct EC2 to let all the nodes be open to each other
(while simultaneously keeping a separate policy of blocking ports from being
open from the internet and other people's EC2 nodes).  The non-file-serving
nodes would simply need to know the private address of the filesystem server
(unless you are using a fancier network file system than NFS-style ones).

For background: every VM on EC2 currently gets a public address -- NAT'd to a
private address which is actually what the VM's one NIC is configured with.
There is a facility to open/forward specific network ports on the public
address to each VM either via a group policy or on a VM by VM basis.
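
As a toy illustration of the policy described above (nodes in the group
open to each other, everything else blocked unless a port is explicitly
opened), something like the following captures the rule. This is not the
EC2 API; the function name and the addresses are invented for the example.

```python
# Toy model of the access policy described above; not the EC2 API.
def allowed(source, port, group_members, open_public_ports):
    """Return True if a connection to a node in the group is permitted.

    source            -- address of the connecting host
    port              -- destination port on the node
    group_members     -- addresses of nodes that are open to each other
    open_public_ports -- ports explicitly opened to outside hosts
    """
    if source in group_members:
        return True  # nodes in the group may reach each other on any port
    return port in open_public_ports  # outsiders only on opened ports


group = {"10.0.0.1", "10.0.0.2"}   # made-up private addresses
public_ports = {22}                # e.g. only ssh opened to the outside

print(allowed("10.0.0.2", 2049, group, public_ports))     # NFS between nodes
print(allowed("203.0.113.5", 2049, group, public_ports))  # NFS from outside
print(allowed("203.0.113.5", 22, group, public_ports))    # ssh from outside
```

So an NFS-style server on one node would be reachable from the other nodes
in the group but not from the internet.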

[...] 
> Amazon also has a storage cloud, alongside its compute cloud. I know very 
> little about that and have never thought about how it would fit into the 
> above (if at all). Maybe someone else knows more.

A VM template on EC2 is called an AMI which stands for Amazon Machine Image.
This is just a packaging thing but what it mostly means is that the VM is
stored on S3 and also registered into the EC2 system.

When starting an instance of an AMI, the file is copied from S3 to the
hypervisor node (what we call propagation in the workspace service).  After it
is used, this file is deleted (an option in the workspace service but there is
also an option to save it back with any changes).  

So the VMs are stored in S3 but anything that happens on them after being
started is lost unless you manually do something about it.

As for free scratch space, you get a good amount per node, 140G.  But the node
could go down at any moment just like a physical resource.

To harness S3 for safely persisting any data (or if you need more space) you
would need to actually run S3 clients on the VMs when they are run on EC2.  You
could alternatively mirror data between nodes assuming that all would not go
down at once. 

The good thing is that you do not pay transfer costs between S3 and EC2 if you
choose to use S3 for big storage; you would only pay the "housing fees", so to
speak.

Tim


From iraicu at cs.uchicago.edu  Tue May 15 16:16:14 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 15 May 2007 16:16:14 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
Message-ID: <464A231E.9040708@cs.uchicago.edu>

Hi,
See below:

Ben Clifford wrote:
> Ian asked about this elsewhere, but its perhaps interesting for 
> swift-devel people to look at the questions too.
>
> On Tue, 15 May 2007, Ian Foster wrote:
>
>> Dear All:
>>
>> I asked Kate if she and Tim could look into creating VM images that 
>> would allow us to run Swift applications on Amazon EC2. I think Kate is 
>> meeting with Ioan about this on Thursday (?).
>>
>> One issue that I thought would be good to discuss is what we'd want in 
>> that VM image. Perhaps this is obvious to the rest of you, but it isn't 
>> to me. A few thoughts:
>>
>> * I'm assuming that we want to run "workers" on EC2 nodes, and have the "task
>> dispatch" logic run on some external frontend system outside EC2.
>>
>> * I would think that we want to use Falkon to do the task dispatch. If so, we
>> need a Falkon executor on each VM, configured to check in with the Falkon
>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
>> want an SGE agent.)
>>
>> *  We need a way of getting data to and from the worker nodes. Do we want to
>> run a file system across the EC2 nodes and the external frontend node? That
>> seems rather inefficient. Other options?
>>
>> * Should we preload the application code on each EC2 node?
>
> Here's a couple of approaches:
>
>  1) swift regards all the EC2 nodes that we are paying for as a single 
>     site.
>
> Something like falkon handles all the task dispatch and worker node 
> management. I don't know what that looks like at the moment in Falkon, but 
> the interface for Swift to send jobs into Falkon sounds pretty 
> straightforward and shouldn't need changing.
>
> All the nodes in a site are required by our site model to have a shared 
> filesystem - we've talked about removing it, but I think that is still the 
> case and if so, isn't going to change soon. timf probably knows more than 
> the people on this list about making shared filesystems.
>   
If we can get the data caching working in Falkon, we might be able to 
run Swift over Falkon without a shared file system.  This is still work 
in progress, but we might be closer to achieving this than not.  BTW, 
the data caching would mean that Swift does not stage in any data 
anymore, but would essentially stand up a GridFTP server from where 
Falkon workers would get the needed data just when they need it.  We are 
still ironing out all this stuff, but it could potentially do away with 
the shared file system assumption.
> In this case, falkon would be doing the site selection.
>
>  2) swift regards each EC2 node as a separate site.
>
> So Swift would be doing site selection between each site (i.e. between 
> each EC2 node), and then submitting to that site.
>
> I don't know if the interface between Swift and eg. Falkon allows swift to 
> tell Falkon which remote node to run on.
>   
No, it does not... but the data caching work has added a data-aware 
scheduler that allows jobs to be run on nodes that already have the data 
and, if a node doesn't have the data, allows that node to fetch it.
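
To make the data-aware idea concrete, here is a rough Python sketch of
that kind of placement decision. It is not Falkon code: the names
pick_node and files_to_fetch, and the notion of a per-node cache set,
are assumptions made up for illustration only.

```python
from typing import Dict, List, Set


def pick_node(needed: Set[str], caches: Dict[str, Set[str]], idle: List[str]) -> str:
    """Return the idle node that already caches the most of the needed files.

    Hypothetical helper: `caches` maps a node name to the set of file
    names that node is assumed to hold locally.
    """
    if not idle:
        raise ValueError("no idle nodes available")
    # max() returns the first best candidate, so the order of `idle` breaks ties.
    return max(idle, key=lambda node: len(needed & caches.get(node, set())))


def files_to_fetch(needed: Set[str], caches: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the files the chosen node would still have to pull in itself."""
    return needed - caches.get(node, set())
```

With something like this, a job lands where most of its inputs already
are, and only the missing files are pulled by that node.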
> However, Swift would then be able to use something like gridftp to stage 
> to each EC2 node (assuming that EC2 nodes can act as ftp servers - I don't 
> know what their network connectivity is like) - a shared filesystem 
> between all nodes in a site is pretty simple when there is only a single 
> node in the site.
>
>
> Amazon also has a storage cloud, alongside its compute cloud. I know very 
> little about that and have never thought about how it would fit into the 
> above (if at all). Maybe someone else knows more.
>   
I think the idea would be to use the Amazon S3 storage service as a 
common medium from where to get data and where to put it back.

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Tue May 15 16:22:55 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 15 May 2007 16:22:55 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
Message-ID: <464A24AF.7080801@cs.uchicago.edu>

Hi,
See below:

Tim Freeman wrote:
> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
> Ben Clifford <benc at hawaga.org.uk> wrote:
>
>   
>> Ian asked about this elsewhere, but it's perhaps interesting for 
>> swift-devel people to look at the questions too.
>>
>> On Tue, 15 May 2007, Ian Foster wrote:
>>
>>> Dear All:
>>>
>>> I asked Kate if she and Tim could look into creating VM images that 
>>> would allow us to run Swift applications on Amazon EC2. I think Kate is 
>>> meeting with Ioan about this on Thursday (?).
>>>
>>> One issue that I thought would be good to discuss is what we'd want in 
>>> that VM image. Perhaps this is obvious to the rest of you, but it isn't 
>>> to me. A few thoughts:
>>>
>>> * I'm assuming that we want to run "workers" on EC2 nodes, and have the
>>> "task dispatch" logic run on some external frontend system outside EC2.
>>>
>>> * I would think that we want to use Falkon to do the task dispatch. If so,
>>> we need a Falkon executor on each VM, configured to check in with the Falkon
>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
>>> want an SGE agent.)
>>>
>>> * We need a way of getting data to and from the worker nodes. Do we want to
>>> run a file system across the EC2 nodes and the external frontend node? That
>>> seems rather inefficient. Other options?
>>>
>>> * Should we preload the application code on each EC2 node?
>>
>> Here's a couple of approaches:
>>
>>  1) swift regards all the EC2 nodes that we are paying for as a single 
>>     site.
>>
>> Something like falkon handles all the task dispatch and worker node 
>> management. I don't know what that looks like at the moment in Falkon, but 
>> the interface for Swift to send jobs into Falkon sounds pretty 
>> straightforward and shouldn't need changing.
>>     
>
> So if I understand, here there would be no gateway+LRM but each EC2 node +
> Falkon would need a port open to receive tasks?  Or does each node pull down
> instructions OK from behind a firewall?
>   
Falkon supports both polling and notifications.  To use notifications, 
there needs to be an open port on the worker :(
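
The polling mode is the one that works from behind a firewall, since
the worker only ever makes outbound calls. A minimal in-process Python
sketch of that pull model is below. The Dispatcher class and
worker_poll_loop function are hypothetical stand-ins, not the Falkon
interfaces, and a queue.Queue takes the place of the network.

```python
import queue
from typing import Any, Callable, List, Optional


class Dispatcher:
    """Toy dispatcher: holds tasks until some worker asks for one."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue[Callable[[], Any]]" = queue.Queue()

    def submit(self, task: Callable[[], Any]) -> None:
        self._tasks.put(task)

    def poll(self) -> Optional[Callable[[], Any]]:
        # Called by the worker (outbound from the worker's side), so the
        # worker needs no open inbound port.
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None


def worker_poll_loop(dispatcher: Dispatcher, max_polls: int) -> List[Any]:
    """Poll up to max_polls times, run whatever comes back, collect results."""
    results = []
    for _ in range(max_polls):
        task = dispatcher.poll()
        if task is None:
            continue  # a real worker would sleep before polling again
        results.append(task())
    return results
```

With notifications the direction is reversed (the dispatcher contacts
the worker), which is why that mode needs the open port.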
> Is there a latency problem with running each node as an independent task
> receiver with the dispatcher off-site from EC2?  I would think it would be
> better to put the queues to fill with tasks on EC2 so it can more quickly get
> the task going when a node is done with a previous task (I may be missing some
> nuances here with respect to Falkon, don't know much about this yet!). 
>   
We have run the Falkon dispatcher at UChicago and workers at ANL without 
any issues, so it can easily tolerate a few ms of latency.  We haven't 
tried it across 10s of ms of latency links, but my instinct says that if 
you have enough workers, you might be able to hide the latency.  We'd 
have to experiment with it to see what happens.  We could potentially do 
some experiments between SDSC and ANL over a 50+ ms link, and see what 
difference in throughputs we get.
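
As a back-of-the-envelope version of the "enough workers hide the
latency" argument, here is a small Python sketch. It assumes a very
simple model that I have not validated against Falkon: each worker
handles one task at a time and pays one full round trip per task, with
no batching or prefetching. The function name and parameters are
illustrative only.

```python
def throughput(workers: int, task_s: float, rtt_s: float) -> float:
    """Tasks per second under the simple model described above.

    workers: number of workers, each running one task at a time
    task_s:  execution time of one task, in seconds
    rtt_s:   dispatcher <-> worker round-trip time paid per task, in seconds
    """
    if workers < 0 or task_s + rtt_s <= 0:
        raise ValueError("need workers >= 0 and a positive per-task cycle time")
    return workers / (task_s + rtt_s)
```

Under this model, 20 workers running 1 s tasks over a 50 ms link would
lose about 5% of their throughput, and adding one more worker would make
up the difference. For very short tasks the round trip dominates, which
is the case the experiments would need to look at.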

Ioan
> If a gateway node is desired, this option sounds a lot like the GRAM+LRM
> situation we use on VMs with the workspace service and will soon use on EC2 via
> the workspace EC2 gateway we're implementing.  Start up one gateway node and
> then add compute nodes which dynamically join the pool, they are pointed to the
> GRAM node.
>
>   
>> All the nodes in a site are required by our site model to have a shared 
>> filesystem - we've talked about removing it, but I think that is still the 
>> case and if so, isn't going to change soon. 
>>     
>
> Setting up a shared filesystem in this environment is akin to setting up the
> compute nodes to join an LRM pool.  The VMs can communicate over the private
> network at EC2, you can instruct EC2 to let all the nodes be open to each other
> (while simultaneously keeping a separate policy of blocking ports from being
> open from the internet and other people's EC2 nodes).  The non-file-serving
> nodes would simply need to know the private address of the filesystem server
> (unless you are using a fancier network file system than NFS-style ones). 
>
> For background: every VM on EC2 currently gets a public address -- NAT'd to a
> private address which is actually what the VM's one NIC is configured with.
> There is a facility to open/forward specific network ports on the public
> address to each VM either via a group policy or on a VM by VM basis.
>
> [...] 
>   
>> Amazon also has a storage cloud, alongside its compute cloud. I know very 
>> little about that and have never thought about how it would fit into the 
>> above (if at all). Maybe someone else knows more.
>>     
>
> A VM template on EC2 is called an AMI which stands for Amazon Machine Image.
> This is just a packaging thing but what it mostly means is that the VM is
> stored on S3 and also registered into the EC2 system.
>
> When starting an instance of an AMI, the file is copied from S3 to the
> hypervisor node (what we call propagation in the workspace service).  After it
> is used, this file is deleted (an option in the workspace service but there is
> also an option to save it back with any changes).  
>
> So the VMs are stored in S3 but anything that happens on them after being
> started is lost unless you manually do something about it.
>
> As for free scratch space, you get a good amount per node, 140G.  But the node
> could go down at any moment just like a physical resource.
>
> To harness S3 for safely persisting any data (or if you need more space) you
> would need to actually run S3 clients on the VMs when they are run on EC2.  You
> could alternatively mirror data between nodes assuming that all would not go
> down at once. 
>
> The good thing is that you do not pay transfer costs between S3 and EC2 if you
> choose to use S3 for big storage, you would only pay the "housing fees" so to
> speak. 
>
> Tim
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From benc at hawaga.org.uk  Tue May 15 18:24:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 15 May 2007 23:24:03 +0000 (GMT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464A231E.9040708@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<464A231E.9040708@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>


On Tue, 15 May 2007, Ioan Raicu wrote:

> If we can get the data caching working in Falkon, we might be able to 
> run Swift over Falkon without a shared file system.  This is still work 
> in progress, but we might be closer to achieving this than not.  BTW, 
> the data caching would mean that Swift does not stage in any data 
> anymore, but would essentially stand up a GridFTP server from where 
> Falkon workers would get the needed data just when they need it.  We are 
> still ironing out all this stuff, but it could potentially do away with 
> the shared file system assumption.

In the longer term, Swift possibly won't have its input data on the 
submitting system - for example, if data is mapped from remote gridftp 
servers, then it should be transferred directly from those ftp servers to 
the execute side (perhaps to a shared filesystem, perhaps direct to a 
worker node), and output data should be transferred back fairly directly, 
rather than going via the submit system.

If Falkon is doing its own 'interesting' data movement stuff, then it 
would probably be a good idea for it to mesh in with what Swift does (e.g. 
Swift provides a list of stage-these-in and stage-these-out URLs or 
something like that and has various ways of performing that, such as 
submitting a transfer job, or passing that information on to Falkon).

-- 


From iraicu at cs.uchicago.edu  Tue May 15 18:40:15 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 15 May 2007 18:40:15 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<464A231E.9040708@cs.uchicago.edu>
	<Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>
Message-ID: <464A44DF.5030600@cs.uchicago.edu>


Ben Clifford wrote:
> On Tue, 15 May 2007, Ioan Raicu wrote:
>
>   
>> If we can get the data caching working in Falkon, we might be able to 
>> run Swift over Falkon without a shared file system.  This is still work 
>> in progress, but we might be closer to achieving this than not.  BTW, 
>> the data caching would mean that Swift does not stage in any data 
>> anymore, but would essentially stand up a GridFTP server from where 
>> Falkon workers would get the needed data just when they need it.  We are 
>> still ironing out all this stuff, but it could potentially do away with 
>> the shared file system assumption.
>>     
>
> In the longer term, Swift possibly won't have its input data on the 
> submitting system - for example, if data is mapped from remote gridftp 
> servers, then it should be transferred directly from those ftp servers to 
> the execute side (perhaps to a shared filesystem, perhaps direct to a 
> worker node), and output data should be transferred back fairly directly, 
> rather than going via the submit system.
>   
Right, from Falkon's point of view, this would not be any different than 
having the GridFTP server at the submit host. 
> If Falkon is doing its own 'interesting' data movement stuff, then it 
> would probably be a good idea for it to mesh in with what Swift does (e.g. 
> Swift provides a list of stage-these-in and stage-these-out URLs or 
> something like that and has various ways of performing that, such as 
> submitting a transfer job, or passing that information on to Falkon).
>   
The idea is to do just this!  Get Swift to pass in its normal URLs of 
input and output data, and then have Falkon do its own data management 
using those URLs!  The idea is to not change anything fundamental in 
Swift, but ensure that enough information is passed to Falkon so it can 
operate properly, and do its own data management! 
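
Roughly, the information handed over per task could look like the
Python sketch below. This is only an illustration of the idea: the
TaskRequest type, its field names, and the example URL are all made up
here and are not the actual Swift or Falkon interface.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TaskRequest:
    """Hypothetical task description passed from Swift to Falkon."""
    command: List[str]
    stage_in: List[str] = field(default_factory=list)   # input URLs
    stage_out: List[str] = field(default_factory=list)  # output URLs


def plan_transfers(task: TaskRequest, cached: Set[str]) -> List[str]:
    """Return the input URLs a worker would still have to fetch.

    `cached` is the set of URLs assumed to be in that worker's local
    cache already; everything else has to be pulled before the task runs.
    """
    return [url for url in task.stage_in if url not in cached]
```

For example, a task with stage_in set to
["gsiftp://submit.example.org/data/in.dat"] needs one fetch on a worker
with an empty cache, and none on a worker that already holds that URL.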

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From keahey at mcs.anl.gov  Tue May 15 23:28:07 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Tue, 15 May 2007 23:28:07 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464A24AF.7080801@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu>
Message-ID: <464A8857.90800@mcs.anl.gov>

First -- this is a very useful discussion; would it be possible to see 
all of it? We need to understand the requirements and trade-offs in some 
detail to figure out the best way to make this work. I see a few 
different interaction threads somewhat mixed up here though so just to 
make sure we are all on the same wavelength, here is some context.

Ian and I have been talking on and off about providing a workspace 
service implementation with EC2 backend. The benefit for that would be 
that users could deploy the same VMs using the same interface to either 
TeraPort or EC2 or yet another resource provider. The workspace service 
would also provide some features on top of EC2 (translating between PKI 
credentials and Amazon's paying accounts, contextualization as needed to 
make deployment dynamic). One application of interest for this was 
Swift. Last time we chatted about this though was in the context of 
using EC2 to provide a production platform for STAR runs (since 
virtualizing enough TeraPort to provide a production platform is taking 
a long time). This is what Tim and I are trying to make happen now.

Since there was also interest in running Swift in VMs, Mike, Tibi and I 
met around February/March and agreed that a reasonable way to proceed 
will be for us to stand up a base virtual cluster somewhere locally 
(e.g., a static deployment on TeraPort) so that they can finish the 
configuration according to their needs, look at performance, figure out 
the best way to interact with it, and make sure that there are no 
VM-induced gotchas. All of this will be much easier to assess locally 
and on a static deployment. Then we'd make sure the cluster is 
dynamically deployable using the workspace service (on EC2 or whatever 
other provider). During the meeting (and over following emails) we 
agreed that the required "base cluster" would be configured with 
GRAM/Torque on the headnode plus a number of worker nodes, plus root 
privileges. We configured this cluster and it is ready to deploy. Are 
you saying now that in fact something different is needed?

As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
discuss interaction between Falkon and the workspace service (not 
necessarily/exclusively in the EC2 context). I don't completely 
understand the relationship between swift and falkon -- are there 
specific applications or scenarios that you are trying to target in this 
exercise?

Ioan Raicu wrote:
> Hi,
> See below:
> 
> Tim Freeman wrote:
>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>
>>  
>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>> swift-devel people to look at the questions too.
>>>
>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>
>>>> Dear All:
>>>>
>>>> I asked Kate if she and Tim could look into creating VM images that 
>>>> would allow us to run Swift applications on Amazon EC2. I think Kate 
>>>> is meeting with Ioan about this on Thursday (?).
>>>>
>>>> One issue that I thought would be good to discuss is what we'd want 
>>>> in that VM image. Perhaps this is obvious to the rest of you, but it 
>>>> isn't to me. A few thoughts:
>>>>
>>>> * I'm assuming that we want to run "workers" on EC2 nodes, and have the
>>>> "task dispatch" logic run on some external frontend system outside EC2.
>>>>
>>>> * I would think that we want to use Falkon to do the task dispatch. If so,
>>>> we need a Falkon executor on each VM, configured to check in with the Falkon
>>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
>>>> want an SGE agent.)
>>>>
>>>> * We need a way of getting data to and from the worker nodes. Do we want to
>>>> run a file system across the EC2 nodes and the external frontend node? That
>>>> seems rather inefficient. Other options?
>>>>
>>>> * Should we preload the application code on each EC2 node?
>>>
>>> Here's a couple of approaches:
>>>
>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>> single     site.
>>>
>>> Something like falkon handles all the task dispatch and worker node 
>>> management. I don't know what that looks like at the moment in 
>>> Falkon, but the interface for Swift to send jobs into Falkon sounds 
>>> pretty straightforward and shouldn't need changing.
>>>     
>>
>> So if I understand, here there would be no gateway+LRM but each EC2 
>> node +
>> Falkon would need a port open to receive tasks?  Or does each node 
>> pull down
>> instructions OK from behind a firewall?
>>   
> Falkon supports both polling and notifications.  To use notifications, 
> there needs to be an open port on the worker :(
>> Is there a latency problem with running each node as an independent task
>> receiver with the dispatcher off-site from EC2?  I would think it 
>> would be
>> better to put the queues to fill with tasks on EC2 so it can more 
>> quickly get
>> the task going when a node is done with a previous task (I may be 
>> missing some
>> nuances here with respect to Falkon, don't know much about this yet!).   
> We have run the Falkon dispatcher at UChicago and workers at ANL without 
> any issues, so it can easily tolerate a few ms of latency.  We haven't 
> tried it across 10s of ms of latency links, but my instinct says that if 
> you have enough workers, you might be able to hide the latency.  We'd 
> have to experiment with it to see what happens.  We could potentially do 
> some experiments between SDSC and ANL over a 50+ ms link, and see what 
> difference in throughputs we get.
> 
> Ioan
>> If a gateway node is desired, this option sounds a lot like the GRAM+LRM
>> situation we use on VMs with the workspace service and will soon use 
>> on EC2 via
>> the workspace EC2 gateway we're implementing.  Start up one gateway 
>> node and
>> then add compute nodes which dynamically join the pool, they are 
>> pointed to the
>> GRAM node.
>>
>>  
>>> All the nodes in a site are required by our site model to have a 
>>> shared filesystem - we've talked about removing it, but I think that 
>>> is still the case and if so, isn't going to change soon.     
>>
>> Setting up a shared filesystem in this environment is akin to setting 
>> up the
>> compute nodes to join an LRM pool.  The VMs can communicate over the 
>> private
>> network at EC2, you can instruct EC2 to let all the nodes be open to 
>> each other
>> (while simultaneously keeping a separate policy of blocking ports from 
>> being
>> open from the internet and other people's EC2 nodes).  The 
>> non-file-serving
>> nodes would simply need to know the private address of the filesystem 
>> server
>> (unless you are using a fancier network file system than NFS-style ones).
>> For background: every VM on EC2 currently gets a public address -- 
>> NAT'd to a
>> private address which is actually what the VM's one NIC is configured 
>> with.
>> There is a facility to open/forward specific network ports on the public
>> address to each VM either via a group policy or on a VM by VM basis.
>>
>> [...]  
>>> Amazon also has a storage cloud, alongside its compute cloud. I know 
>>> very little about that and have never thought about how it would fit 
>>> into the above (if at all). Maybe someone else knows more.
>>>     
>>
>> A VM template on EC2 is called an AMI which stands for Amazon Machine 
>> Image.
>> This is just a packaging thing but what it mostly means is that the VM is
>> stored on S3 and also registered into the EC2 system.
>>
>> When starting an instance of an AMI, the file is copied from S3 to the
>> hypervisor node (what we call propagation in the workspace service).  
>> After it
>> is used, this file is deleted (an option in the workspace service but 
>> there is
>> also an option to save it back with any changes). 
>> So the VMs are stored in S3 but anything that happens on them after being
>> started is lost unless you manually do something about it.
>>
>> As for free scratch space, you get a good amount per node, 140G.  But 
>> the node
>> could go down at any moment just like a physical resource.
>>
>> To harness S3 for safely persisting any data (or if you need more 
>> space) you
>> would need to actually run S3 clients on the VMs when they are run on 
>> EC2.  You
>> could alternatively mirror data between nodes assuming that all would 
>> not go
>> down at once.
>> The good thing is that you do not pay transfer costs between S3 and 
>> EC2 if you
>> choose to use S3 for big storage, you would only pay the "housing fees" 
>> so to
>> speak.
>> Tim
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>   
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From itf at mcs.anl.gov  Wed May 16 02:44:59 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Wed, 16 May 2007 07:44:59 +0000
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464A8857.90800@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
Message-ID: <356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>

Kate:

If we configure the virtual cluster with a full LRM, as you propose (and, it seems, have already done--great work!), then we can use this to start Falkon executors--as we do today on regular clusters. So it seems to me that we have all we need. How about you and Ioan spend your time on Thursday running something on EC2, to make sure it works?

Regarding choice of LRM: have you looked at SGE? That is what quite a few others seem to be using.

Ian



-----Original Message-----
From: Kate Keahey <keahey at mcs.anl.gov>
Date: Tue, 15 May 2007 23:28:07 
To:iraicu at cs.uchicago.edu
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] swift-on-ec2

First -- this is a very useful discussion; would it be possible to see 
all of it? We need to understand the requirements and trade-offs in some 
detail to figure out the best way to make this work. I see a few 
different interaction threads somewhat mixed up here though so just to 
make sure we are all on the same wavelength, here is some context.

Ian and I have been talking on and off about providing a workspace 
service implementation with EC2 backend. The benefit for that would be 
that users could deploy the same VMs using the same interface to either 
TeraPort or EC2 or yet another resource provider. The workspace service 
would also provide some features on top of EC2 (translating between PKI 
credentials and Amazon's paying accounts, contextualization as needed to 
make deployment dynamic). One application of interest for this was 
Swift. Last time we chatted about this though was in the context of 
using EC2 to provide a production platform for STAR runs (since 
virtualizing enough TeraPort to provide a production platform is taking 
a long time). This is what Tim and I are trying to make happen now.

Since there was also interest in running Swift in VMs, Mike, Tibi and I 
met around February/March and agreed that a reasonable way to proceed 
will be for us to stand up a base virtual cluster somewhere locally 
(e.g., a static deployment on TeraPort) so that they can finish the 
configuration according to their needs, look at performance, figure out 
the best way to interact with it, and make sure that there are no 
VM-induced gotchas. All of this will be much easier to assess locally 
and on a static deployment. Then we'd make sure the cluster is 
dynamically deployable using the workspace service (on EC2 or whatever 
other provider). During the meeting (and over following emails) we 
agreed that the required "base cluster" would be configured with 
GRAM/Torque on the headnode plus a number of worker nodes, plus root 
privileges. We configured this cluster and it is ready to deploy. Are 
you saying now that in fact something different is needed?

As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
discuss interaction between Falkon and the workspace service (not 
necessarily/exclusively in the EC2 context). I don't completely 
understand the relationship between swift and falkon -- are there 
specific applications or scenarios that you are trying to target in this 
exercise?

Ioan Raicu wrote:
> Hi,
> See below:
> 
> Tim Freeman wrote:
>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>
>>  
>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>> swift-devel people to look at the questions too.
>>>
>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>
>>>> Dear All:
>>>>
>>>> I asked Kate if she and Tim could look into creating VM images that 
>>>> would allow us to run Swift applications on Amazon EC2. I think Kate 
>>>> is meeting with Ioan about this on Thursday (?).
>>>>
>>>> One issue that I thought would be good to discuss is what we'd want 
>>>> in that VM image. Perhaps this is obvious to the rest of you, but it 
>>>> isn't to me. A few thoughts:
>>>>
>>>> * I'm assuming that we want to run "workers" on EC2 nodes, and have the
>>>> "task dispatch" logic run on some external frontend system outside EC2.
>>>>
>>>> * I would think that we want to use Falkon to do the task dispatch. If so,
>>>> we need a Falkon executor on each VM, configured to check in with the Falkon
>>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
>>>> want an SGE agent.)
>>>>
>>>> * We need a way of getting data to and from the worker nodes. Do we want to
>>>> run a file system across the EC2 nodes and the external frontend node? That
>>>> seems rather inefficient. Other options?
>>>>
>>>> * Should we preload the application code on each EC2 node?
>>>
>>> Here's a couple of approaches:
>>>
>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>> single     site.
>>>
>>> Something like falkon handles all the task dispatch and worker node 
>>> management. I don't know what that looks like at the moment in 
>>> Falkon, but the interface for Swift to send jobs into Falkon sounds 
>>> pretty straightforward and shouldn't need changing.
>>>     
>>
>> So if I understand, here there would be no gateway+LRM but each EC2 
>> node +
>> Falkon would need a port open to receive tasks?  Or does each node 
>> pull down
>> instructions OK from behind a firewall?
>>   
> Falkon supports both polling and notifications.  To use notifications, 
> there needs to be an open port on the worker :(
>> Is there a latency problem with running each node as an independent task
>> receiver with the dispatcher off-site from EC2?  I would think it 
>> would be
>> better to put the queues to fill with tasks on EC2 so it can more 
>> quickly get
>> the task going when a node is done with a previous task (I may be 
>> missing some
>> nuances here with respect to Falkon, don't know much about this yet!).   
> We have run the Falkon dispatcher at UChicago and workers at ANL without 
> any issues, so it can easily tolerate a few ms of latency.  We haven't 
> tried it across 10s of ms of latency links, but my instinct says that if 
> you have enough workers, you might be able to hide the latency.  We'd 
> have to experiment with it to see what happens.  We could potentially do 
> some experiments between SDSC and ANL over a 50+ ms link, and see what 
> difference in throughputs we get.
> 
> Ioan
>> If a gateway node is desired, this option sounds a lot like the GRAM+LRM
>> situation we use on VMs with the workspace service and will soon use 
>> on EC2 via
>> the workspace EC2 gateway we're implementing.  Start up one gateway 
>> node and
>> then add compute nodes which dynamically join the pool, they are 
>> pointed to the
>> GRAM node.
>>
>>  
>>> All the nodes in a site are required by our site model to have a 
>>> shared filesystem - we've talked about removing it, but I think that 
>>> is still the case and if so, isn't going to change soon.     
>>
>> Setting up a shared filesystem in this environment is akin to setting 
>> up the
>> compute nodes to join an LRM pool.  The VMs can communicate over the 
>> private
>> network at EC2, you can instruct EC2 to let all the nodes be open to 
>> each other
>> (while simultaneously keeping a separate policy of blocking ports from 
>> being
>> open from the internet and other people's EC2 nodes).  The 
>> non-file-serving
>> nodes would simply need to know the private address of the filesystem 
>> server
>> (unless you are using a fancier network file system than NFS-style ones).
>> For background: every VM on EC2 currently gets a public address -- 
>> NAT'd to a
>> private address which is actually what the VM's one NIC is configured 
>> with.
>> There is a facility to open/forward specific network ports on the public
>> address to each VM either via a group policy or on a VM by VM basis.
>>
>> [...]  
>>> Amazon also has a storage cloud, alongside its compute cloud. I know 
>>> very little about that and have never thought about how it would fit 
>>> into the above (if at all). Maybe someone else knows more.
>>>     
>>
>> A VM template on EC2 is called an AMI which stands for Amazon Machine 
>> Image.
>> This is just a packaging thing but what it mostly means is that the VM is
>> stored on S3 and also registered into the EC2 system.
>>
>> When starting an instance of an AMI, the file is copied from S3 to the
>> hypervisor node (what we call propagation in the workspace service).  
>> After it
>> is used, this file is deleted (an option in the workspace service but 
>> there is
>> also an option to save it back with any changes). 
>> So the VMs are stored in S3 but anything that happens on them after being
>> started is lost unless you manually do something about it.
>>
>> As for free scratch space, you get a good amount per node, 140G.  But 
>> the node
>> could go down at any moment just like a physical resource.
>>
>> To harness S3 for safely persisting any data (or if you need more 
>> space) you
>> would need to actually run S3 clients on the VMs when they are run on 
>> EC2.  You
>> could alternatively mirror data between nodes assuming that all would 
>> not go
>> down at once.
>> The good thing is that you do not pay transfer costs between S3 and 
>> EC2 if you
>> chose to use S3 for big storage, you would only pay the "housing fees" 
>> so to
>> speak.
>> Tim
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel


From benc at hawaga.org.uk  Wed May 16 03:52:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 08:52:02 +0000 (GMT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464A8857.90800@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>


On Tue, 15 May 2007, Kate Keahey wrote:

> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
> discuss interaction between Falkon and the workspace service (not 
> necessarily/exclusively in the EC2 context). I don't completely 
> understand the relationship between swift and falkon -- are there 
> specific applications or scenarios that you are trying to target in this 
> exercise?

By virtue of the fact that they come from pretty much the same group of 
people, they're somewhat fuzzily related - but pretty much swift is 
generating (over the duration of its execution, rather than in one batch) 
a bunch of jobs that need executing (as well as various things like file 
transfers). As it generates them, it sends them off to be executed. The 
official ways that are 'supported' by Swift are by executing them on the 
local machine and by sending them off through GRAM; however, people can 
plug in whatever they want to do submissions.

I know less about Falkon because it isn't Swift, but the Falkon side of 
things is pretty much about running a bunch of jobs - it plugs into the 
abovementioned place in Swift so that Swift gives Falkon jobs to run, and 
Falkon runs them (with a goal of Falkon being, presumably, to run them much 
more efficiently than if they were submitted straight through GRAM - it 
seems to do pretty well).
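
To make that plug-in point concrete, here is a minimal sketch in Python. It 
is not Swift's actual provider API: JobProvider, RecordingProvider and 
run_workflow are hypothetical names made up for illustration. It only shows 
the shape of the idea, namely that the workflow engine hands over jobs one 
at a time as it generates them and does not care what sits behind submit().

```python
import subprocess
from abc import ABC, abstractmethod


class JobProvider(ABC):
    """Hypothetical submission interface; not Swift's real provider API."""

    @abstractmethod
    def submit(self, argv):
        """Run one job described by an argument vector; return its exit code."""


class LocalProvider(JobProvider):
    """Executes the job directly on the submitting machine."""

    def submit(self, argv):
        return subprocess.run(argv).returncode


class RecordingProvider(JobProvider):
    """Stand-in for a remote service (GRAM, Falkon, ...): only records jobs."""

    def __init__(self):
        self.received = []

    def submit(self, argv):
        self.received.append(list(argv))
        return 0


def run_workflow(jobs, provider):
    """Hand jobs to the provider one at a time, as they are generated."""
    return [provider.submit(argv) for argv in jobs]
```

Because run_workflow accepts any iterable, the jobs can come from a 
generator, which mirrors jobs appearing over the duration of a run rather 
than in one batch.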

There are two things going on with swift - one is about making it 
straightforward to use at the low end of things, so that people can start 
using it easily - for the most part, that isn't interesting in itself; the 
other is about getting it to perform well at the high end of things, which 
is where the fun research is. Using Falkon and using EC2 are both on that 
side of things.

-- 


From benc at hawaga.org.uk  Wed May 16 04:04:11 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 09:04:11 +0000 (GMT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
Message-ID: <Pine.LNX.4.64.0705160902110.20212@dildano.hawaga.org.uk>


On Wed, 16 May 2007, Ian Foster wrote:

> If we configure the virtual cluster with a full LRM, as you propose (and 
> it seems have already done--great work!), then we can use this to start 
> Falkon executors--as we do today on regular clusters. So it seems to me 
> that we have all we need. How about you and Ioan spend your time on 
> Thursday running something on EC2, to make sure it works?

> Regarding choice of LRM: have you looked at SGE? That is what quite a 
> few others seem to be using.

That's probably a bunch of mostly unnecessary extra weight (== trouble) if 
the images are specifically intended for use as swift+falkon. But useful 
to have round if people want to do other things too.

-- 


From hategan at mcs.anl.gov  Wed May 16 04:07:01 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 16 May 2007 12:07:01 +0300
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<464A231E.9040708@cs.uchicago.edu>
	<Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>
Message-ID: <1179306421.2402.12.camel@blabla.mcs.anl.gov>

I think we're moving towards a scenario in which Falkon does
increasingly more things that it wasn't supposed to do. That includes
scheduling and data management (which is a tricky business if we look
at the necessity for throttling, error handling and other management
issues).
I'm not sure if this is a good idea from an engineering standpoint.

Mihael

On Tue, 2007-05-15 at 23:24 +0000, Ben Clifford wrote:
> On Tue, 15 May 2007, Ioan Raicu wrote:
> 
> > If we can get the data caching working in Falkon, we might be able to 
> > run Swift over Falkon without a shared file system.  This is still work 
> > in progress, but we might be closer to achieving this than not.  BTW, 
> > the data caching would mean that Swift does not stage in any data 
> > anymore, but would essentially stand up a GridFTP server from where 
> > Falkon workers would get the needed data just when they need it.  We are 
> > still ironing out all this stuff, but it could potentially do away with 
> > the shared file system assumption.
> 
> In the longer term, Swift possibly won't have its input data on the 
> submitting system - for example, if data is mapped from remote gridftp 
> servers, then it should be transferred directly from those ftp servers to 
> the execute side (perhaps to a shared filesystem, perhaps direct to a 
> worker node), and output data should be transferred back fairly directly, 
> rather than going via the submit system.
> 
> If Falkon is doing its own 'interesting' data movement stuff, then it 
> would probably be a good idea for it to mesh in with what Swift does (e.g. 
> swift provides a list of stage-these-in and stage-these-out URLs or 
> something like that and has various ways of performing that, such as 
> submitting a transfer job, or passing that information on to falkon).
> 


From keahey at mcs.anl.gov  Wed May 16 09:24:02 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 09:24:02 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
Message-ID: <464B1402.9040405@mcs.anl.gov>


Ian Foster wrote:
> Kate:
> 
> If we configure the virtual cluster with a full LRM, as you propose (and it seems have already done--great work!), then we can use this to start Falkon executors--as we do today on regular clusters. So it seems to me that we have all we need. How about you and Ioan spend your time on Thursday running something on EC2, to make sure it works?

As I suggest below, I think it would be easiest if we could deploy and 
debug a small static cluster locally first, and we can probably give it 
a shot tomorrow. We still don't have access to the Xen nodes on TeraPort 
(although hopefully that might change by tomorrow) but I asked Rick to 
rebuild a couple of nodes at ANL and he did; I think that should give 
us enough resources to play with for a test.

At the same time -- if there are multiple ways of doing this, and 
perhaps better ways than simply using a virtual cluster, we should 
discuss them now. It is not completely clear to me what the relationship 
between Falkon and Swift is, and what the specific objectives are (other 
than that dynamically provisioning resources is required). It looks at 
this point like the objectives probably overlap with what Ioan, Borja 
and I wanted to talk about (which I thought was a separate project, but 
am thrilled to find out is related) so how about we come up with a 
design tomorrow and post the notes on this list (is this a good venue 
for that?) and then others can shoot them down.

> Regarding choice of LRM: have you looked at SGE? That is what quite a few others seem to be using.

Yes, we have. We also collaborate with others who do, as well as with 
Sun... As you may remember, Borja did the scheduling work for his thesis 
in the context of SGE. Last time we talked though, Torque was the 
scheduler of choice for the virtual cluster LRM so we used that.

The usage of SGE you are referring to above -- is this in the context of 
virtualization projects, or as LRM for various Falkon-related applications?

> 
> Ian
> 
> 
> 
> Sent via BlackBerry from T-Mobile  
> 
> -----Original Message-----
> From: Kate Keahey <keahey at mcs.anl.gov>
> Date: Tue, 15 May 2007 23:28:07 
> To:iraicu at cs.uchicago.edu
> Cc:swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] swift-on-ec2
> 
> First -- this is a very useful discussion, would it be possible to see 
> all of it. We need to understand the requirements and trade-offs in some 
> detail to figure out the best way to make this work. I see a few 
> different interaction threads somewhat mixed up here though so just to 
> make sure we are all on the same wavelength, here is some context.
> 
> Ian and I have been talking on and off about providing a workspace 
> service implementation with EC2 backend. The benefit for that would be 
> that users could deploy the same VMs using the same interface to either 
> TeraPort or EC2 or yet another resource provider. The workspace service 
> would also provide some features on top of EC2 (translating between PKI 
> credentials and Amazon's paying accounts, contextualization as needed to 
> make deployment dynamic). One application of interest for this was 
> Swift. Last time we chatted about this though was in the context of 
> using EC2 to provide a production platform for STAR runs (since 
> virtualizing enough TeraPort to provide a production platform is taking 
> a long time). This is what Tim and I are trying to make happen now.
> 
> Since there was also interest in running Swift in VMs, Mike, Tibi and I 
> met around February/March and agreed that a reasonable way to proceed 
> will be for us to stand up a base virtual cluster somewhere locally 
> (e.g., a static deployment on TeraPort) so that they can finish the 
> configuration according to their needs, look at performance, figure out 
> the best way to interact with it, and make sure that there are no 
> VM-induced gotchas. All of this will be much easier to assess locally 
> and on a static deployment. Then we'd make sure the cluster is 
> dynamically deployable using the workspace service (on EC2 or whatever 
> other provider). During the meeting (and over following emails) we 
> agreed that the required "base cluster" would be configured with 
> GRAM/Torque on the headnode plus a number of worker nodes, plus root 
> privileges. We configured this cluster and it is ready to deploy. Are 
> you saying now that in fact something different is needed?
> 
> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
> discuss interaction between Falkon and the workspace service (not 
> necessarily/exclusively in the EC2 context). I don't completely 
> understand the relationship between swift and falkon -- are there 
> specific applications or scenarios that you are trying to target in this 
> exercise?
> 
> Ioan Raicu wrote:
>> Hi,
>> See below:
>>
>> Tim Freeman wrote:
>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>
>>>  
>>>> Ian asked about this elsewhere, but its perhaps interesting for 
>>>> swift-devel people to look at the questions too.
>>>>
>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>
>>>>    
>>>>> Dear All:
>>>>>       
>>>>
>>>>    
>>>>> I asked Kate if she and Tim could look into creating VM images that 
>>>>> would allow us to run Swift applications on Amazon EC2. I think Kate 
>>>>> is meeting with Ioan about this on Thursday (?).
>>>>>       
>>>>
>>>>    
>>>>> One issue that I thought would be good to discuss is what we'd want 
>>>>> in that VM image. Perhaps this is obvious to the rest of you, but it 
>>>>> isn't to me. A few thoughts:
>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, and 
>>>>> have the
>>>>> "task dispatch" logic run on some external frontend system outside EC2.
>>>>>       * I would think that we want to use Falkon to do the task 
>>>>> dispatch. If so,
>>>>> we need a Falkon executor on each VM, configured to check in with 
>>>>> the Falkon
>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we 
>>>>> would
>>>>> want an SGE agent.)
>>>>>       *  We need a way of getting data to and from the worker nodes. 
>>>>> Do we want to
>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>> node? That
>>>>> seems rather inefficient. Other options?
>>>>>       * Should we preload the application code on each EC2 node?
>>>>>       
>>>> Here's a couple of approaches:
>>>>
>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>> single     site.
>>>>
>>>> Something like falkon handles all the task dispatch and worker node 
>>>> management. I don't know what that looks like at the moment in 
>>>> Falkon, but the interface for Swift to send jobs into Falkon sounds 
>>>> pretty straightforward and shouldn't need changing.
>>>>     
>>> So if I understand, here there would be no gateway+LRM but each EC2 
>>> node +
>>> Falkon would need a port open to receive tasks?  Or does each node 
>>> pull down
>>> instructions OK from behind a firewall?
>>>   
>> Falkon supports both polling and notifications.  To use notifications, 
>> there needs to be an open port on the worker :(
>>> Is there a latency problem with running each node as an independent task
>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>> would be
>>> better to put the queues to fill with tasks on EC2 so it can more 
>>> quickly get
>>> the task going when a node is done with a previous task (I may be 
>>> missing some
>>> nuances here with respect to Falkon, don't know much about this yet!).   
>> We have run the Falkon dispatcher at UChicago and workers at ANL without 
>> any issues, so it can easily tolerate a few ms of latency.  We haven't 
>> tried it across 10s of ms of latency links, but my instinct says that if 
>> you have enough workers, you might be able to hide the latency.  We'd 
>> have to experiment with it to see what happens.  We could potentially do 
>> some experiments between SDSC and ANL over a 50+ ms link, and see what 
>> difference in throughputs we get.
>>
>> Ioan
>>> If a gateway node is desired, this option sounds a lot like the GRAM+LRM
>>> situation we use on VMs with the workspace service and will soon use 
>>> on EC2 via
>>> the workspace EC2 gateway we're implementing.  Start up one gateway 
>>> node and
>>> then add compute nodes which dynamically join the pool, they are 
>>> pointed to the
>>> GRAM node.
>>>
>>>  
>>>> All the nodes in a site are required by our site model to have a 
>>>> shared filesystem - we've talked about removing it, but I think that 
>>>> is still the case and if so, isn't going to change soon.     
>>> Setting up a shared filesystem in this environment is akin to setting 
>>> up the
>>> compute nodes to join an LRM pool.  The VMs can communicate over the 
>>> private
>>> network at EC2, you can instruct EC2 to let all the nodes be open to 
>>> each other
>>> (while simultaneously keeping a separate policy of blocking ports from 
>>> being
>>> open from the internet and other people's EC2 nodes).  The 
>>> non-file-serving
>>> nodes would simply need to know the private address of the filesystem 
>>> server
>>> (unless you are using a fancier network file system than NFS-style ones).
>>> For background: every VM on EC2 currently gets a public address -- 
>>> NAT'd to a
>>> private address which is actually what the VM's one NIC is configured 
>>> with.
>>> There is a facility to open/forward specific network ports on the public
>>> address to each VM either via a group policy or on a VM by VM basis.
>>>
>>> [...]  
>>>> Amazon also has a storage cloud, alongside its compute cloud. I know 
>>>> very little about that and have never thought about how it would fit 
>>>> into the above (if at all). Maybe someone else knows more.
>>>>     
>>> A VM template on EC2 is called an AMI which stands for Amazon Machine 
>>> Image.
>>> This is just a packaging thing but what it mostly means is that the VM is
>>> stored on S3 and also registered into the EC2 system.
>>>
>>> When starting an instance of an AMI, the file is copied from S3 to the
>>> hypervisor node (what we call propagation in the workspace service).  
>>> After it
>>> is used, this file is deleted (an option in the workspace service but 
>>> there is
>>> also an option to save it back with any changes). 
>>> So the VMs are stored in S3 but anything that happens on them after being
>>> started is lost unless you manually do something about it.
>>>
>>> As for free scratch space, you get a good amount per node, 140G.  But 
>>> the node
>>> could go down at any moment just like a physical resource.
>>>
>>> To harness S3 for safely persisting any data (or if you need more 
>>> space) you
>>> would need to actually run S3 clients on the VMs when they are run on 
>>> EC2.  You
>>> could alternatively mirror data between nodes assuming that all would 
>>> not go
>>> down at once.
>>> The good thing is that you do not pay transfer costs between S3 and 
>>> EC2 if you
>>> chose to use S3 for big storage, you would only pay the "housing fees" 
>>> so to
>>> speak.
>>> Tim
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From keahey at mcs.anl.gov  Wed May 16 09:37:52 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 09:37:52 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
Message-ID: <464B1740.3060808@mcs.anl.gov>

Thanks Ben, this helps a lot! So it seems to me like we are talking 
about combining dynamic provisioning with lightweight job management 
which should be pluggable into swift.

Ben Clifford wrote:
> On Tue, 15 May 2007, Kate Keahey wrote:
> 
>> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
>> discuss interaction between Falkon and the workspace service (not 
>> necessarily/exclusively in the EC2 context). I don't completely 
>> understand the relationship between swift and falkon -- are there 
>> specific applications or scenarios that you are trying to target in this 
>> exercise?
> 
> By virtue of the fact that they come from pretty much the same group of 
> people, they're somewhat fuzzily related - but pretty much swift is 
> generating (over the duration of its execution, rather than in one batch) 
> a bunch of jobs that need executing (as well, as various things like file 
> transfers). As it generates them, it sends them off to be executed. The 
> official ways that are 'supported' by Swift are by executing them on the 
> local machine and by sending them off through GRAM; however, people can 
> plug in whatever they want to do submissions.
> 
> I know less about Falkon because it isn't Swift, but the Falkon side of 
> things is pretty much about running a bunch of jobs - it plugs into the 
> abovementioned place in Swift so that Swift gives Falkon jobs to run, and 
> Falkon runs them (with a goal of Falkon being, presumably, to run it much 
> more efficiently than if they were submitted straight through GRAM - it 
> seems to do pretty well).
> 
> There's two things going on with swift - one is about making it 
> straightforward to use at the low end of things, so that people can start 
> using it easily - for the most part, that isn't interesting in itself; the 
> other is about getting it to perform well at the high end of things, which 
> is where the fun research is. Using Falkon and using EC2 are both on that 
> side of things.
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From benc at hawaga.org.uk  Wed May 16 09:55:45 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 14:55:45 +0000 (GMT)
Subject: [Swift-devel] 0.2 release
Message-ID: <Pine.LNX.4.64.0705161449560.22628@dildano.hawaga.org.uk>


It's been a while since there was a non-SVN based release. Not all the 
features that are on the milestone list for 0.2 have been done; however, 
what is in SVN now has a bunch of extra stuff that wasn't in 0.1 that 
would be good to make available to casual downloaders.

So I'm planning to put out whatever is at the head of SVN some time in the 
middle of next week as 0.2 (in the same fairly lightweight process as 
happened for 0.1) and to move the remaining 0.2 milestones to be 0.3 
milestones.

It would be good if you're using SVN to start updating at least daily 
until that time.

Separately, I'll send a note about remaining milestones for people to 
discuss how much they still want them in relation to other features.

-- 


From benc at hawaga.org.uk  Wed May 16 10:26:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 15:26:10 +0000 (GMT)
Subject: [Swift-devel] mappers on files that are inputs and outputs
Message-ID: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>


Here's a code fragment:

  type volume {
      imagefile img;
      headerfile hdr;
  };

  volume atlas <simple_mapper;prefix="atlas">;
  atlas = softmean(slices);

  string directions[] = [ "x", "y", "z"];

  foreach direction in directions {
      giffile outputgif 
          <single_file_mapper;file=@strcat("atlas-",direction,".gif")>;
      string option = @strcat("-",direction);
      outputgif = slice_to_gif(atlas, option, ".5");
  }

When this is run as part of a workflow, there are no atlas.* files and the 
atlas = softmean(slices) line causes atlas.hdr and atlas.img files to be 
created and placed in my working directory, and also used in the 
subsequent slice_to_gif calls.

If I prune the program in a text editor so that the atlas = ... line is 
not called, and leave the atlas.hdr and atlas.img files in place in my 
current directory (so that the files are now input files, rather than 
intermediate files), I get this error:

  $ swift -debug -tc.file tc.data play.swift 
  WARN   - Failed to configure log file name

  Swift v0.1-dev

  RunID: mx49u8a36d1m0
  Execution failed:
          java.lang.RuntimeException: Data set initialization failed for 
  true. Missing required field: img mapped to atlas


I think it's probably desirable that the same mapping that works for 
intermediate files also works for input files.
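
As a rough illustration of what "the same mapping for both roles" could 
mean, here is a small Python sketch. It is not the simple_mapper 
implementation: map_struct and missing_inputs are hypothetical helpers, and 
the only thing taken from the example above is the naming rule that the 
fields img and hdr of a struct mapped with prefix "atlas" live in atlas.img 
and atlas.hdr.

```python
import os


def map_struct(prefix, fields, directory="."):
    """Map each struct field to '<prefix>.<field>' (assumed naming rule)."""
    return {f: os.path.join(directory, f"{prefix}.{f}") for f in fields}


def missing_inputs(mapping):
    """Return the fields whose mapped file does not exist on disk."""
    return sorted(f for f, path in mapping.items() if not os.path.exists(path))
```

With this split, the mapping itself is identical whether atlas is produced 
by softmean or supplied by the user; only the existence check in 
missing_inputs depends on the variable being used as an input.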

-- 


From hategan at mcs.anl.gov  Wed May 16 10:29:02 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 16 May 2007 18:29:02 +0300
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
Message-ID: <1179329342.4368.0.camel@blabla.mcs.anl.gov>

You should probably also add the input=true mapping parameter?

Mihael

On Wed, 2007-05-16 at 15:26 +0000, Ben Clifford wrote:
> Here's a code fragment:
> 
>   type volume {
>       imagefile img;
>       headerfile hdr;
>   };
> 
>   volume atlas <simple_mapper;prefix="atlas">;
>   atlas = softmean(slices);
> 
>   string directions[] = [ "x", "y", "z"];
> 
>   foreach direction in directions {
>       giffile outputgif 
>           <single_file_mapper;file=@strcat("atlas-",direction,".gif")>;
>       string option = @strcat("-",direction);
>       outputgif = slice_to_gif(atlas, option, ".5");
>   }
> 
> When this is run as part of a workflow, there are no atlas.* files and the 
> atlas = softmean(slices) line causes atlas.hdr and atlas.img files to be 
> created and placed in my working directory, and also used in the 
> subsequent slice_to_gif calls.
> 
> If I prune the program in a text editor so that the atlas = ... line is 
> not called, and leave the atlas.hdr and atlas.img files in place in my 
> current directory (so that the files are now input files, rather than 
> intermediate files), I get this error:
> 
>   $ swift -debug -tc.file tc.data play.swift 
>   WARN   - Failed to configure log file name
> 
>   Swift v0.1-dev
> 
>   RunID: mx49u8a36d1m0
>   Execution failed:
>           java.lang.RuntimeException: Data set initialization failed for 
>   true. Missing required field: img mapped to atlas
> 
> 
> I think its probably a desirable feature that the same mapping that maps 
> ok for intermediate files to map for input files too.
> 


From benc at hawaga.org.uk  Wed May 16 10:37:36 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 15:37:36 +0000 (GMT)
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <1179329342.4368.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
	<1179329342.4368.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705161536450.22628@dildano.hawaga.org.uk>


On Wed, 16 May 2007, Mihael Hategan wrote:

> You should probably also add the input=true mapping parameter?

shouldn't really need that in the language though.

-- 


From hategan at mcs.anl.gov  Wed May 16 10:43:15 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 16 May 2007 18:43:15 +0300
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <Pine.LNX.4.64.0705161536450.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
	<1179329342.4368.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0705161536450.22628@dildano.hawaga.org.uk>
Message-ID: <1179330195.4473.0.camel@blabla.mcs.anl.gov>

The translator does that bit. You hacked the translated file, but
incompletely.

Mihael

On Wed, 2007-05-16 at 15:37 +0000, Ben Clifford wrote:
> On Wed, 16 May 2007, Mihael Hategan wrote:
> 
> > You should probably also add the input=true mapping parameter?
> 
> shouldn't really need that in the language though.
> 


From iraicu at cs.uchicago.edu  Wed May 16 11:55:18 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 11:55:18 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
Message-ID: <464B3776.2010700@cs.uchicago.edu>

Hi,
I am just catching up with emails from last night...

Ben Clifford wrote:
> On Tue, 15 May 2007, Kate Keahey wrote:
>
>   
>> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
>> discuss interaction between Falkon and the workspace service (not 
>> necessarily/exclusively in the EC2 context). I don't completely 
>> understand the relationship between swift and falkon -- are there 
>> specific applications or scenarios that you are trying to target in this 
>> exercise?
>>     
>
> By virtue of the fact that they come from pretty much the same group of 
> people, they're somewhat fuzzily related - but pretty much swift is 
> generating (over the duration of its execution, rather than in one batch) 
> a bunch of jobs that need executing (as well, as various things like file 
> transfers). As it generates them, it sends them off to be executed. The 
> official ways that are 'supported' by Swift are by executing them on the 
> local machine and by sending them off through GRAM; however, people can 
> plug in whatever they want to do submissions.
>
> I know less about Falkon because it isn't Swift, but the Falkon side of 
> things is pretty much about running a bunch of jobs - it plugs into the 
> abovementioned place in Swift so that Swift gives Falkon jobs to run, and 
> Falkon runs them (with a goal of Falkon being, presumably, to run it much 
> more efficiently than if they were submitted straight through GRAM - it 
> seems to do pretty well).
>   
We intentionally made Falkon's interface and semantics as similar as 
possible to that of GRAM, so applications that normally used GRAM could 
easily change to Falkon.
> There's two things going on with swift - one is about making it 
> straightforward to use at the low end of things, so that people can start 
> using it easily - for the most part, that isn't interesting in itself; the 
> other is about getting it to perform well at the high end of things, which 
> is where the fun research is. Using Falkon and using EC2 are both on that 
> side of things.
>   
Right! 

Falkon is certainly about getting more performance from the same hardware. 

EC2 on the other hand is more about a new paradigm of how resources are 
acquired.  In the batch-scheduled world, the demand for resources is 
usually higher than the supply.  In EC2, it's likely that the supply of 
resources is higher than the demand.  That means that with 
EC2, you could probably always get more resources right away if you 
were willing to pay for them... this could have implications for the 
resource allocation and management policies that govern when it makes 
sense to get more resources and when not to.  Using EC2 might be about 
performance, but the really interesting part that I see emerging is a 
new model that deviates from the traditional batch-scheduled systems the 
Grid community has grown accustomed to.
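
To make the policy question concrete, here is a minimal sketch of one way to
decide when acquiring more nodes makes sense.  Nothing here is Falkon or EC2
code: the function name, the drain-time target, the per-started-hour billing,
and the prices in integer cents are all assumptions made for illustration.

```python
import math


def nodes_to_acquire(queued_tasks, task_seconds, current_nodes,
                     target_drain_seconds, price_cents_per_node_hour,
                     budget_cents):
    """Return how many extra nodes to request (hypothetical policy).

    Assumes one task per node at a time and billing per started hour.
    Money is kept in integer cents to avoid floating-point rounding.
    """
    # Nodes needed to finish the queued work within the target time.
    needed = math.ceil(queued_tasks * task_seconds / target_drain_seconds)
    extra = max(0, needed - current_nodes)

    # Each extra node is paid for every started hour of the drain window.
    hours = math.ceil(target_drain_seconds / 3600)
    affordable = budget_cents // (price_cents_per_node_hour * hours)

    # Never request more than the budget can cover.
    return min(extra, affordable)
```

With 1000 queued one-minute tasks, 4 nodes, a one-hour target, 10 cents per
node-hour and a 100-cent budget, the work calls for 13 more nodes but the
budget only covers 10, so the function returns 10.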

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From tfreeman at mcs.anl.gov  Wed May 16 12:03:04 2007
From: tfreeman at mcs.anl.gov (Tim Freeman)
Date: Wed, 16 May 2007 12:03:04 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3776.2010700@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B3776.2010700@cs.uchicago.edu>
Message-ID: <20070516120304.19151d46.tfreeman@mcs.anl.gov>

On Wed, 16 May 2007 11:55:18 -0500
Ioan Raicu <iraicu at cs.uchicago.edu> wrote:

> Hi,
> I am just catching up with emails from last night...
> 
> Ben Clifford wrote:
> > On Tue, 15 May 2007, Kate Keahey wrote:
> >
> >   
> >> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
> >> discuss interaction between Falkon and the workspace service (not 
> >> necessarily/exclusively in the EC2 context). I don't completely 
> >> understand the relationship between swift and falkon -- are there 
> >> specific applications or scenarios that you are trying to target in this 
> >> exercise?
> >>     
> >
> > By virtue of the fact that they come from pretty much the same group of 
> > people, they're somewhat fuzzily related - but pretty much swift is 
> > generating (over the duration of its execution, rather than in one batch) 
> > a bunch of jobs that need executing (as well, as various things like file 
> > transfers). As it generates them, it sends them off to be executed. The 
> > official ways that are 'supported' by Swift are by executing them on the 
> > local machine and by sending them off through GRAM; however, people can 
> > plug in whatever they want to do submissions.
> >
> > I know less about Falkon because it isn't Swift, but the Falkon side of 
> > things is pretty much about running a bunch of jobs - it plugs into the 
> > abovementioned place in Swift so that Swift gives Falkon jobs to run, and 
> > Falkon runs them (with a goal of Falkon being, presumably, to run it much 
> > more efficiently than if they were submitted straight through GRAM - it 
> > seems to do pretty well).
> >   
> We intentionally made Falkon's interface and semantics as similar as 
> possible to that of GRAM, so applications that normally used GRAM could 
> easily change to Falkon.
> > There's two things going on with swift - one is about making it 
> > straightforward to use at the low end of things, so that people can start 
> > using it easily - for the most part, that isn't interesting in itself; the 
> > other is about getting it to perform well at the high end of things, which 
> > is where the fun research is. Using Falkon and using EC2 are both on that 
> > side of things.
> >   
> Right! 
> 
> Falkon is certainly about getting more performance from the same hardware. 
> 
> EC2 on the other hand is more about a new paradigm of how resources are 
> acquired.  In the batch-scheduled world, the demand for resources is 
> usually higher than the supply.  In EC2, its likely that the supply for 
> resources is higher than the demand.  With that said, it means that with 
> EC2, it is likely that you could always get more resources now if you 
> were willing to pay for them

That's not entirely true at this particular point in time:

http://www.pcworld.com/article/id,130832-c,webservices/article.html

"We hate being capacity-constrained," Bezos said. "It's not the right way to
run a business. We are trying to get ourselves in a position with EC2 where we
will be demand-constrained instead of capacity-constrained."


> ... this could have implications on the 
> resource allocation and management policies that govern when it makes 
> sense to get more resources and when not to.

Right now for example, we're programming a little feature into the workspace-EC2
gateway that limits the amount of money an entity can spend :-) 
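
A minimal sketch of the idea, not the gateway's actual code: the class name,
the method name, and the use of integer cents are all assumptions made for
illustration.

```python
class SpendingCap:
    """Per-entity spending limit, tracked in integer cents (illustrative only)."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent_cents = {}  # entity name -> cents charged so far

    def authorize(self, entity, cost_cents):
        """Record the charge and return True if it fits under the cap."""
        spent = self.spent_cents.get(entity, 0)
        if spent + cost_cents > self.limit_cents:
            return False  # refuse: this request would exceed the limit
        self.spent_cents[entity] = spent + cost_cents
        return True
```

A request that would push an entity over its limit is refused and leaves the
running total unchanged, so a smaller request can still succeed afterwards.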

Tim 


>   Using EC2 might be about 
> performance, but the really interesting part that I see emerging is a 
> new model that deviates from the traditional batch-scheduled systems the 
> Grid community has grown accustomed to.
> 
> Ioan


From iraicu at cs.uchicago.edu  Wed May 16 12:04:22 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:04:22 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705160902110.20212@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<Pine.LNX.4.64.0705160902110.20212@dildano.hawaga.org.uk>
Message-ID: <464B3996.9030305@cs.uchicago.edu>

If an LRM can be configured by the virtual workspace service as part of 
the VMs, then it's even easier for Falkon to work!  I don't see an LRM as 
being central and necessary though: when the VM starts up, we could 
easily bootstrap the Falkon executors to start up and live forever (or 
at least while the VM is running).  We could host the dispatcher 
off-site, or even on another EC2 VM... it all depends on how much the 
latency affects Falkon performance. 
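
As a rough way to reason about the latency question, here is a
back-of-the-envelope model, not a measurement.  It assumes each worker sits
idle for one dispatcher round trip between tasks (no prefetching), which may
not match how Falkon actually behaves; both function names are made up for
this sketch.

```python
import math


def tasks_per_second(workers, task_seconds, round_trip_seconds):
    """Aggregate throughput when each worker waits one round trip per task."""
    return workers / (task_seconds + round_trip_seconds)


def workers_to_hide_latency(workers, task_seconds, round_trip_seconds):
    """Workers needed to match the zero-latency throughput of `workers`."""
    target = workers / task_seconds  # throughput with no latency at all
    return math.ceil(target * (task_seconds + round_trip_seconds))
```

Under this model, 100 workers running one-second tasks over a 50 ms round
trip give roughly 95 tasks per second instead of 100, so about five percent
more workers would cover the gap.  Shorter tasks make the penalty
proportionally larger.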

I assume that EC2 VMs have a public IP space that is not behind some 
site-wide firewall, right?  If not, then this could be a problem.

Ioan

Ben Clifford wrote:
> On Wed, 16 May 2007, Ian Foster wrote:
>
>   
>> If we configure the virtual cluster with a full LRM, as you propose (and 
>> it seems you have already done--great work!), then we can use this to start 
>> Falkon executors--as we do today on regular clusters. So it seems to me 
>> that we have all we need. How about you and Ioan spend your time on 
>> Thursday running something on EC2, to make sure it works?
>>     
>
>   
>> Regarding choice of LRM: have you looked at SGE? That is what quite a 
>> few others seem to be using.
>>     
>
> That's probably a bunch of most unnecessary extra weight (== trouble) if 
> the images are specifically intended for use as swift+falkon. But useful 
> to have round if people want to do other things too.
>
>   



From iraicu at cs.uchicago.edu  Wed May 16 12:09:54 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:09:54 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <1179306421.2402.12.camel@blabla.mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>	
	<464A231E.9040708@cs.uchicago.edu>	
	<Pine.LNX.4.64.0705152317520.22628@dildano.hawaga.org.uk>
	<1179306421.2402.12.camel@blabla.mcs.anl.gov>
Message-ID: <464B3AE2.6020300@cs.uchicago.edu>

One of the two main motivations for Falkon was data management.  We 
saw early on that we need to couple compute and data resource 
management, and that is what we are doing as we push forward with 
Falkon.  Falkon should be usable by other 
applications that don't have all the smarts of Swift and simply want 
to run jobs efficiently with the data management abstracted away. 

The main idea is that Swift's data management will likely still be 
needed (at a site level), but Falkon can push that further to the 
physical node level.  Swift and Falkon will likely evolve independently, 
but if we work together, we can ensure that they can inter-operate, as 
they do today!
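
To illustrate what node-level data management could mean, here is a minimal
sketch of a per-node cache.  It is not Falkon's design: the class, the
least-recently-used eviction, and the injected `fetch` callable (standing in
for a transfer from the site-level store) are all assumptions.

```python
from collections import OrderedDict


class NodeCache:
    """Tiny least-recently-used cache keyed by file URL (illustrative only)."""

    def __init__(self, fetch, max_entries):
        self.fetch = fetch            # callable: url -> bytes (hypothetical)
        self.max_entries = max_entries
        self.entries = OrderedDict()  # url -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.entries:
            self.hits += 1
            self.entries.move_to_end(url)     # mark as most recently used
            return self.entries[url]
        self.misses += 1
        data = self.fetch(url)                # pull from the site-level store
        self.entries[url] = data
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used
        return data
```

A task that reuses an input already on the node is served locally; only a
miss triggers a transfer from the site level.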

Ioan

Mihael Hategan wrote:
> I think we're moving towards a scenario in which Falkon does
> increasingly more things that it wasn't supposed to do. That includes
> scheduling and data management (which, is a tricky business if we look
> at the necessity for throttling, error handling and other management
> issues).
> I'm not sure if this is a good idea from an engineering standpoint.
>
> Mihael
>
> On Tue, 2007-05-15 at 23:24 +0000, Ben Clifford wrote:
>   
>> On Tue, 15 May 2007, Ioan Raicu wrote:
>>
>>     
>>> If we can get the data caching working in Falkon, we might be able to 
>>> run Swift over Falkon without a shared file system.  This is still work 
>>> in progress, but we might be closer to achieving this than not.  BTW, 
>>> the data caching would mean that Swift does not stage in any data 
>>> anymore, but would essentially stand up a GridFTP server from where 
>>> Falkon workers would get the needed data just when they need it.  We are 
>>> still ironing out all this stuff, but it could potentially do away with 
>>> the shared file system assumption.
>>>       
>> In the longer term, Swift possibly won't have its input data on the 
>> submitting system - for example, if data is mapped from remote gridftp 
>> servers, then it should be transferred directly from those ftp servers to 
>> the execute side (perhaps to a shared filesystem, perhaps direct to a 
>> worker node), and output data should be transferred back fairly directly, 
>> rather than going via the submit system.
>>
>> If Falkon is doing its own 'interesting' data movement stuff, then it 
>> would probably be a good idea for it to mesh in with what Swift does (e.g. Swift 
>> provides a list of stage-these-in and stage-these-out URLs or something 
>> like that and has various ways of performing that, such as submitting a 
>> transfer job, or passing that information on to Falkon).
>>
>>     
>
>
>   



From iraicu at cs.uchicago.edu  Wed May 16 12:15:07 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:15:07 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B1402.9040405@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
Message-ID: <464B3C1B.2040506@cs.uchicago.edu>


Kate Keahey wrote:
>
>
> Ian Foster wrote:
>> Kate:
>>
>> If we configure the virtual cluster with a full LRM, as you propose 
>> (and it seems you have already done--great work!), then we can use this 
>> to start Falkon executors--as we do today on regular clusters. So it 
>> seems to me that we have all we need. How about you and Ioan spend 
>> your time on Thursday running something on EC2, to make sure it works?
>
> As I suggest below, I think it would be easiest if we could deploy and 
> debug a small static cluster locally first, and we can probably give 
> it a shot tomorrow. We still don't have access to the Xen nodes on 
> TeraPort (although hopefully that might change by tomorrow) but I 
> asked Rick to rebuild a couple of nodes at ANL and he did, I think for 
> a test that should give us enough resources to play with.
If someone (Kate, Borja, Ian, anyone) has an account on EC2 and S3 so we 
can try a demo run tomorrow, I think it would be very beneficial!  Do we 
have images created that would run on EC2?  Can we easily modify them to 
include the necessary software, or at least, once we start them 
up, upload the software needed by Falkon (JVM, Falkon 
executor, some GT4 libraries)?
>
> At the same time -- if there are multiple ways of doing this, and 
> perhaps better ways than simply using a virtual cluster, we should 
> discuss them now. It is not completely clear to me what the 
> relationship between Falkon and Swift is, and what the specific 
> objectives are (other than that dynamically provisioning resources is 
> required). It looks at this point like the objectives probably overlap 
> with what Ioan, Borja and I wanted to talk about (which I thought was 
> a separate project, but am thrilled to find out is related) so how 
> about we come up with a design tomorrow and post the notes on this 
> list (is this a good venue for that?) and then others can shoot them 
> down.
>
>> Regarding choice of LRM: have you looked at SGE? That is what quite a 
>> few others seem to be using.
>
> Yes, we have. We also collaborate with others who do, as well as with 
> Sun... As you may remember, Borja did the scheduling work for his 
> thesis in the context of SGE. Last time we talked though, Torque was 
> the scheduler of choice for the virtual cluster LRM so we used that.
>
> The usage of SGE you are referring to above -- is this in the context 
> of virtualization projects, or as LRM for various Falkon-related 
> applications?
Falkon relies on LRMs to obtain resource allocations and to bootstrap its 
executors.  We have not interfaced with any specific LRM directly; we use 
GRAM to abstract this away. 
>
>>
>> Ian
>>
>>
>>
>> Sent via BlackBerry from T-Mobile 
>> -----Original Message-----
>> From: Kate Keahey <keahey at mcs.anl.gov>
>> Date: Tue, 15 May 2007 23:28:07 To:iraicu at cs.uchicago.edu
>> Cc:swift-devel at ci.uchicago.edu
>> Subject: Re: [Swift-devel] swift-on-ec2
>>
>> First -- this is a very useful discussion, would it be possible to 
>> see all of it. We need to understand the requirements and trade-offs 
>> in some detail to figure out the best way to make this work. I see a 
>> few different interaction threads somewhat mixed up here though so 
>> just to make sure we are all on the same wavelength, here is some 
>> context.
>>
>> Ian and I have been talking on and off about providing a workspace 
>> service implementation with EC2 backend. The benefit for that would 
>> be that users could deploy the same VMs using the same interface to 
>> either TeraPort or EC2 or yet another resource provider. The 
>> workspace service would also provide some features on top of EC2 
>> (translating between PKI credentials and Amazon's paying accounts, 
>> contextualization as needed to make deployment dynamic). One 
>> application of interest for this was Swift. Last time we chatted 
>> about this though was in the context of using EC2 to provide a 
>> production platform for STAR runs (since virtualizing enough TeraPort 
>> to provide a production platform is taking a long time). This is what 
>> Tim and I are trying to make happen now.
>>
>> Since there was also interest in running Swift in VMs, Mike, Tibi and 
>> I met around February/March and agreed that a reasonable way to 
>> proceed will be for us to stand up a base virtual cluster somewhere 
>> locally (e.g., a static deployment on TeraPort) so that they can 
>> finish the configuration according to their needs, look at 
>> performance, figure out the best way to interact with it, and make 
>> sure that there are no VM-induced gotchas. All of this will be much 
>> easier to assess locally and on a static deployment. Then we'd make 
>> sure the cluster is dynamically deployable using the workspace 
>> service (on EC2 or whatever other provider). During the meeting (and 
>> over following emails) we agreed that the required "base cluster" 
>> would be configured with GRAM/Torque on the headnode plus a number of 
>> worker nodes, plus root privileges. We configured this cluster and it 
>> is ready to deploy. Are you saying now that in fact something 
>> different is needed?
>>
>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>> to discuss interaction between Falkon and the workspace service (not 
>> necessarily/exclusively in the EC2 context). I don't completely 
>> understand the relationship between swift and falkon -- are there 
>> specific applications or scenarios that you are trying to target in 
>> this exercise?
>>
>> Ioan Raicu wrote:
>>> Hi,
>>> See below:
>>>
>>> Tim Freeman wrote:
>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>
>>>>  
>>>>> Ian asked about this elsewhere, but its perhaps interesting for 
>>>>> swift-devel people to look at the questions too.
>>>>>
>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>
>>>>>   
>>>>>> Dear All:
>>>>>>       
>>>>>                                                                                 
>>>>>   
>>>>>> I asked Kate if she and Tim could look into creating VM images 
>>>>>> that would allow us to run Swift applications on Amazon EC2. I 
>>>>>> think Kate is meeting with Ioan about this on Thursday (?).
>>>>>>       
>>>>>                                                                                 
>>>>>   
>>>>>> One issue that I thought would be good to discuss is what we'd 
>>>>>> want in that VM image. Perhaps this is obvious to the rest of 
>>>>>> you, but it isn't to me. A few thoughts:
>>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, 
>>>>>> and have the
>>>>>> "task dispatch" logic run on some external frontend system 
>>>>>> outside EC2.
>>>>>>       * I would think that we want to use Falkon to do the task 
>>>>>> dispatch. If so,
>>>>>> we need a Falkon executor on each VM, configured to check in with 
>>>>>> the Falkon
>>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, 
>>>>>> we would
>>>>>> want an SGE agent.)
>>>>>>       *  We need a way of getting data to and from the worker 
>>>>>> nodes. Do we want to
>>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>>> node? That
>>>>>> seems rather inefficient. Other options?
>>>>>>       * Should we preload the application code on each EC2 node?
>>>>>>       
>>>>> Here's a couple of approaches:
>>>>>
>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>> single     site.
>>>>>
>>>>> Something like falkon handles all the task dispatch and worker 
>>>>> node management. I don't know what that looks like at the moment 
>>>>> in Falkon, but the interface for Swift to send jobs into Falkon 
>>>>> sounds pretty straightforward and shouldn't need changing.
>>>>>     
>>>> So if I understand, here there would be no gateway+LRM but each EC2 
>>>> node +
>>>> Falkon would need a port open to receive tasks?  Or does each node 
>>>> pull down
>>>> instructions OK from behind a firewall?
>>>>   
>>> Falkon supports both polling and notifications.  To use 
>>> notifications, there needs to be an open port on the worker :(
>>>> Is there a latency problem with running each node as an independent task
>>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>>> would be
>>>> better to put the queues to fill with tasks on EC2 so it can more 
>>>> quickly get
>>>> the task going when a node is done with a previous task (I may be 
>>>> missing some
>>>> nuances here with respect to Falkon, don't know much about this 
>>>> yet!).   
>>> We have run the Falkon dispatcher at UChicago and workers at ANL 
>>> without any issues, so it can easily tolerate a few ms of latency.  
>>> We haven't tried it across 10s of ms of latency links, but my 
>>> instinct says that if you have enough workers, you might be able to 
>>> hide the latency.  We'd have to experiment with it to see what 
>>> happens.  We could potentially do some experiments between SDSC and 
>>> ANL over a 50+ ms link, and see what difference in throughputs we get.
>>>
>>> Ioan
>>>> If a gateway node is desired, this option sounds a lot like the 
>>>> GRAM+LRM
>>>> situation we use on VMs with the workspace service and will soon 
>>>> use on EC2 via
>>>> the workspace EC2 gateway we're implementing.  Start up one gateway 
>>>> node and
>>>> then add compute nodes which dynamically join the pool, they are 
>>>> pointed to the
>>>> GRAM node.
>>>>
>>>>  
>>>>> All the nodes in a site are required by our site model to have a 
>>>>> shared filesystem - we've talked about removing it, but I think 
>>>>> that is still the case and if so, isn't going to change soon.     
>>>> Setting up a shared filesystem in this environment is akin to 
>>>> setting up the
>>>> compute nodes to join an LRM pool.  The VMs can communicate over 
>>>> the private
>>>> network at EC2, you can instruct EC2 to let all the nodes be open 
>>>> to each other
>>>> (while simultaneously keeping a separate policy of blocking ports 
>>>> from being
>>>> open from the internet and other people's EC2 nodes).  The 
>>>> non-file-serving
>>>> nodes would simply need to know the private address of the 
>>>> filesystem server
>>>> (unless you are using a fancier network file system than NFS-style 
>>>> ones).
>>>> For background: every VM on EC2 currently gets a public address -- 
>>>> NAT'd to a
>>>> private address which is actually what the VM's one NIC is 
>>>> configured with.
>>>> There is a facility to open/forward specific network ports on the 
>>>> public
>>>> address to each VM either via a group policy or on a VM by VM basis.
>>>>
>>>> [...] 
>>>>> Amazon also has a storage cloud, alongside its compute cloud. I 
>>>>> know very little about that and have never thought about how it 
>>>>> would fit into the above (if at all). Maybe someone else knows more.
>>>>>     
>>>> A VM template on EC2 is called an AMI which stands for Amazon 
>>>> Machine Image.
>>>> This is just a packaging thing but what it mostly means is that the 
>>>> VM is
>>>> stored on S3 and also registered into the EC2 system.
>>>>
>>>> When starting an instance of an AMI, the file is copied from S3 to the
>>>> hypervisor node (what we call propagation in the workspace 
>>>> service).  After it
>>>> is used, this file is deleted (an option in the workspace service 
>>>> but there is
>>>> also an option to save it back with any changes). So the VMs are 
>>>> stored in S3 but anything that happens on them after being
>>>> started is lost unless you manually do something about it.
>>>>
>>>> As for free scratch space, you get a good amount per node, 140G.  
>>>> But the node
>>>> could go down at any moment just like a physical resource.
>>>>
>>>> To harness S3 for safely persisting any data (or if you need more 
>>>> space) you
>>>> would need to actually run S3 clients on the VMs when they are run 
>>>> on EC2.  You
>>>> could alternatively mirror data between nodes assuming that all 
>>>> would not go
>>>> down at once.
>>>> The good thing is that you do not pay transfer costs between S3 and 
>>>> EC2 if you
>>>> chose to use S3 for big storage, you would only pay the "housing 
>>>> fees" so to
>>>> speak.
>>>> Tim
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>>   
>>
>



From keahey at mcs.anl.gov  Wed May 16 12:19:23 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 12:19:23 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3776.2010700@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B3776.2010700@cs.uchicago.edu>
Message-ID: <464B3D1B.7080908@mcs.anl.gov>

I agree that we are talking of a new model that allows a better 
separation between provisioning resources and task management -- the 
interesting aspect is that we are combining coarse-grained 
provisioning with very lightweight task management.

In terms of "always" being able to get resources if you pay for them -- 
there really are no miracles though. EC2 will run out of resources 
eventually just like any other provider does; payment is just a 
different way of managing policies. There is interesting work out of the 
HP Quartermaster project though that predicts resource demand, and the 
Tycoon work of course shows how people who need resources could 
always just bid higher.

And then -- although we deviate from the traditional batch-scheduling, I 
don't think it will go away anytime soon ;-). The interesting challenge 
(what Borja is working on) is how to combine those two models for Grid 
communities.

Ioan Raicu wrote:

> 
> Falkon is certainly about getting more performance from the same hardware. 
> 
> EC2 on the other hand is more about a new paradigm of how resources are 
> acquired.  In the batch-scheduled world, the demand for resources is 
> usually higher than the supply.  In EC2, its likely that the supply for 
> resources is higher than the demand.  With that said, it means that with 
> EC2, it is likely that you could always get more resources now if you 
> were willing to pay for them... this could have implications on the 
> resource allocation and management policies that govern when it makes 
> sense to get more resources and when not to.  Using EC2 might be about 
> performance, but the really interesting part that I see emerging is a 
> new model that deviates from the traditional batch-scheduled systems the 
> Grid community has grown accustomed to.
> 
> Ioan
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From keahey at mcs.anl.gov  Wed May 16 12:20:43 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 12:20:43 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <20070516120304.19151d46.tfreeman@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>	<464A24AF.7080801@cs.uchicago.edu>	<464A8857.90800@mcs.anl.gov>	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>	<464B3776.2010700@cs.uchicago.edu>
	<20070516120304.19151d46.tfreeman@mcs.anl.gov>
Message-ID: <464B3D6B.5090801@mcs.anl.gov>

Ah, yes, the next thing will be that they allow people to bid... ;-).

Tim Freeman wrote:

> 
> That's not entirely true at this particular point in time:
> 
> http://www.pcworld.com/article/id,130832-c,webservices/article.html
> 
> "We hate being capacity-constrained," Bezos said. "It's not the right way to
> run a business. We are trying to get ourselves in a position with EC2 where we
> will be demand-constrained instead of capacity-constrained."
> 
> 
>> ... this could have implications on the 
>> resource allocation and management policies that govern when it makes 
>> sense to get more resources and when not to.
> 
> Right now for example, we're programming a little feature into the workspace-EC2
> gateway that limits the amount of money an entity can spend :-) 
> 
> Tim 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From iraicu at cs.uchicago.edu  Wed May 16 12:29:43 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:29:43 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B1740.3060808@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B1740.3060808@mcs.anl.gov>
Message-ID: <464B3F87.9090708@cs.uchicago.edu>

Well, the dynamic provisioning assumes that Falkon is acquiring 
resources when it needs them.  This implies that it knows how to talk to 
the EC2 service, and it knows how to bootstrap a VM that has the 
necessary Falkon software stack.

I was actually hoping (at least in the short term) that static resource 
provisioning could be handled by the workspace service, which would talk 
to the EC2 service and bootstrap the VM (with the necessary Falkon stack).  
Once the Falkon executors register with the Falkon dispatcher, 
Falkon handles the lightweight job management (in place of a 
traditional LRM). 
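
The division of labour could be sketched roughly as follows.  This is a toy
model under my own assumptions, not Falkon's actual interface: the class and
method names are invented, and the executors are assumed to poll for work
after registering.

```python
from collections import deque


class Dispatcher:
    """Toy dispatcher: executors register, then poll for queued tasks."""

    def __init__(self):
        self.executors = set()
        self.queue = deque()

    def register(self, executor_id):
        # Called once by each executor after its VM has booted.
        self.executors.add(executor_id)

    def submit(self, task):
        # Called by the client (Swift, in this scenario) to queue work.
        self.queue.append(task)

    def next_task(self, executor_id):
        # Polled by a registered executor; returns None when idle.
        if executor_id not in self.executors:
            raise KeyError(f"executor {executor_id!r} is not registered")
        return self.queue.popleft() if self.queue else None
```

Whoever starts the VMs (the workspace service here) never touches the task
queue; it only has to make sure each executor knows where to register.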

The provisioning to EC2 could be pushed onto Falkon in the future, but 
it is not currently on my immediate to-do list.

Ioan

Kate Keahey wrote:
> Thanks Ben, this helps a lot! So it seems to me like we are talking 
> about combining dynamic provisioning with lightweight job management 
> which should be pluggable into swift.
>
> Ben Clifford wrote:
>> On Tue, 15 May 2007, Kate Keahey wrote:
>>
>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>> to discuss interaction between Falkon and the workspace service (not 
>>> necessarily/exclusively in the EC2 context). I don't completely 
>>> understand the relationship between swift and falkon -- are there 
>>> specific applications or scenarios that you are trying to target in 
>>> this exercise?
>>
>> By virtue of the fact that they come from pretty much the same group 
>> of people, they're somewhat fuzzily related - but pretty much swift 
>> is generating (over the duration of its execution, rather than in one 
>> batch) a bunch of jobs that need executing (as well, as various 
>> things like file transfers). As it generates them, it sends them off 
>> to be executed. The official ways that are 'supported' by Swift are 
>> by executing them on the local machine and by sending them off 
>> through GRAM; however, people can plug in whatever they want to do 
>> submissions.
>>
>> I know less about Falkon because it isn't Swift, but the Falkon side 
>> of things is pretty much about running a bunch of jobs - it plugs 
>> into the abovementioned place in Swift so that Swift gives Falkon 
>> jobs to run, and Falkon runs them (with a goal of Falkon being, 
>> presumably, to run it much more efficiently than if they were 
>> submitted straight through GRAM - it seems to do pretty well).
>>
>> There's two things going on with swift - one is about making it 
>> straightforward to use at the low end of things, so that people can 
>> start using it easily - for the most part, that isn't interesting in 
>> itself; the other is about getting it to perform well at the high end 
>> of things, which is where the fun research is. Using Falkon and using 
>> EC2 are both on that side of things.
>>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From yongzh at cs.uchicago.edu  Wed May 16 12:31:05 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 16 May 2007 12:31:05 -0500 (CDT)
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705161230160.3799@classes.cs.uchicago.edu>

This sounds strange; it should be able to map the img and hdr files
correctly to the fields atlas.img and atlas.hdr.
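
For reference, the behaviour I would expect can be sketched roughly as below. This is only an illustration in Python, not the mapper's actual code: the helper name is made up, and the prefix.field naming rule is inferred from the atlas.img / atlas.hdr files in the report.

```python
import os
import tempfile


def map_struct_fields(prefix, fields, directory="."):
    """Map each struct field to the file named prefix.field.

    Hypothetical helper; raises if a required file does not exist.
    """
    mapping = {}
    missing = []
    for field in fields:
        path = os.path.join(directory, f"{prefix}.{field}")
        if os.path.exists(path):
            mapping[field] = path
        else:
            missing.append(field)
    if missing:
        raise FileNotFoundError(f"missing required field(s): {', '.join(missing)}")
    return mapping


# Simulate a working directory that already holds the two input files.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("atlas.img", "atlas.hdr"):
        open(os.path.join(workdir, name), "w").close()
    mapping = map_struct_fields("atlas", ["img", "hdr"], workdir)
    mapped_names = {field: os.path.basename(path) for field, path in mapping.items()}
```

With both files present, each field resolves to its file; the reported error looks like the lookup for img came back empty even though atlas.img exists.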

Can you enable detailed logging?

Yong.

On Wed, 16 May 2007, Ben Clifford wrote:

>
> Here's a code fragment:
>
>   type volume {
>       imagefile img;
>       headerfile hdr;
>   };
>
>   volume atlas <simple_mapper;prefix="atlas">;
>   atlas = softmean(slices);
>
>   string directions[] = [ "x", "y", "z"];
>
>   foreach direction in directions {
>       giffile outputgif
>           <single_file_mapper;file=@strcat("atlas-",direction,".gif")>;
>       string option = @strcat("-",direction);
>       outputgif = slice_to_gif(atlas, option, ".5");
>   }
>
> When this is run as part of a workflow, there are no atlas.* files and the
> atlas = softmean(slices) line causes atlas.hdr and atlas.img files to be
> created and placed in my working directory, and also used in the
> subsequent slice_to_gif calls.
>
> If I prune the program in a text editor so that the atlas = ... line is
> not called, and leave the atlas.hdr and atlas.img files in place in my
> current directory (so that the files are now input files, rather than
> intermediate files), I get this error:
>
>   $ swift -debug -tc.file tc.data play.swift
>   WARN   - Failed to configure log file name
>
>   Swift v0.1-dev
>
>   RunID: mx49u8a36d1m0
>   Execution failed:
>           java.lang.RuntimeException: Data set initialization failed for
>   true. Missing required field: img mapped to atlas
>
>
> I think it's probably desirable that a mapping which works for
> intermediate files also works for input files.
>
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From itf at mcs.anl.gov  Wed May 16 12:20:57 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Wed, 16 May 2007 17:20:57 +0000
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B1402.9040405@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
Message-ID: <1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>

Kate:

I personally would be delighted if you could run the virtual cluster on ec2 tomorrow. I know that there are lots of ways that you could refine its config, local experiments that could be performed, etc., but perhaps we could try bypassing those things, which seem somewhat like distractions to me?

Ian


Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Kate Keahey <keahey at mcs.anl.gov>
Date: Wed, 16 May 2007 09:24:02 
To:itf at mcs.anl.gov
Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja Sotomayor <borja at borjanet.com>
Subject: Re: [Swift-devel] swift-on-ec2


Ian Foster wrote:
> Kate:
> 
> If we configure the virtual cluster with a full LRM, as you propose (and it seems you have already done--great work!), then we can use this to start Falkon executors--as we do today on regular clusters. So it seems to me that we have all we need. How about you and Ioan spend your time on Thursday running something on EC2, to make sure it works?

As I suggest below, I think it would be easiest if we could deploy and 
debug a small static cluster locally first, and we can probably give it 
a shot tomorrow. We still don't have access to the Xen nodes on TeraPort 
(although hopefully that might change by tomorrow) but I asked Rick to 
rebuild a couple of nodes at ANL and he did, I think for a test that 
should give us enough resources to play with.

At the same time -- if there are multiple ways of doing this, and 
perhaps better ways than simply using a virtual cluster, we should 
discuss them now. It is not completely clear to me what the relationship 
between Falkon and Swift is, and what the specific objectives are (other 
than that dynamically provisioning resources is required). It looks at 
this point like the objectives probably overlap with what Ioan, Borja 
and I wanted to talk about (which I thought was a separate project, but 
am thrilled to find out is related) so how about we come up with a 
design tomorrow and post the notes on this list (is this a good venue 
for that?) and then others can shoot them down.

> Regarding choice of LRM: have you looked at SGE? That is what quite a few others seem to be using.

Yes, we have. We also collaborate with others who do, as well as with 
Sun... As you may remember, Borja did the scheduling work for his thesis 
in the context of SGE. Last time we talked though, Torque was the 
scheduler of choice for the virtual cluster LRM so we used that.

The usage of SGE you are referring to above -- is this in the context of 
virtualization projects, or as LRM for various Falkon-related applications?

> 
> Ian
> 
> 
> 
> Sent via BlackBerry from T-Mobile  
> 
> -----Original Message-----
> From: Kate Keahey <keahey at mcs.anl.gov>
> Date: Tue, 15 May 2007 23:28:07 
> To:iraicu at cs.uchicago.edu
> Cc:swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] swift-on-ec2
> 
> First -- this is a very useful discussion; would it be possible to see 
> all of it? We need to understand the requirements and trade-offs in some 
> detail to figure out the best way to make this work. I see a few 
> different interaction threads somewhat mixed up here though so just to 
> make sure we are all on the same wavelength, here is some context.
> 
> Ian and I have been talking on and off about providing a workspace 
> service implementation with EC2 backend. The benefit for that would be 
> that users could deploy the same VMs using the same interface to either 
> TeraPort or EC2 or yet another resource provider. The workspace service 
> would also provide some features on top of EC2 (translating between PKI 
> credentials and Amazon's paying accounts, contextualization as needed to 
> make deployment dynamic). One application of interest for this was 
> Swift. Last time we chatted about this though was in the context of 
> using EC2 to provide a production platform for STAR runs (since 
> virtualizing enough TeraPort to provide a production platform is taking 
> a long time). This is what Tim and I are trying to make happen now.
> 
> Since there was also interest in running Swift in VMs, Mike, Tibi and I 
> met around February/March and agreed that a reasonable way to proceed 
> will be for us to stand up a base virtual cluster somewhere locally 
> (e.g., a static deployment on TeraPort) so that they can finish the 
> configuration according to their needs, look at performance, figure out 
> the best way to interact with it, and make sure that there are no 
> VM-induced gotchas. All of this will be much easier to assess locally 
> and on a static deployment. Then we'd make sure the cluster is 
> dynamically deployable using the workspace service (on EC2 or whatever 
> other provider). During the meeting (and over following emails) we 
> agreed that the required "base cluster" would be configured with 
> GRAM/Torque on the headnode plus a number of worker nodes, plus root 
> privileges. We configured this cluster and it is ready to deploy. Are 
> you saying now that in fact something different is needed?
> 
> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
> discuss interaction between Falkon and the workspace service (not 
> necessarily/exclusively in the EC2 context). I don't completely 
> understand the relationship between swift and falkon -- are there 
> specific applications or scenarios that you are trying to target in this 
> exercise?
> 
> Ioan Raicu wrote:
>> Hi,
>> See below:
>>
>> Tim Freeman wrote:
>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>
>>>  
>>>> Ian asked about this elsewhere, but its perhaps interesting for 
>>>> swift-devel people to look at the questions too.
>>>>
>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>
>>>>    
>>>>> Dear All:
>>>>>       
>>>>                                                                                 
>>>>    
>>>>> I asked Kate if she and Tim could look into creating VM images that 
>>>>> would allow us to run Swift applications on Amazon EC2. I think Kate 
>>>>> is meeting with Ioan about this on Thursday (?).
>>>>>       
>>>>                                                                                 
>>>>    
>>>>> One issue that I thought would be good to discuss is what we'd want 
>>>>> in that VM image. Perhaps this is obvious to the rest of you, but it 
>>>>> isn't to me. A few thoughts:
>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, and 
>>>>> have the
>>>>> "task dispatch" logic run on some external frontend system outside EC2.
>>>>>       * I would think that we want to use Falkon to do the task 
>>>>> dispatch. If so,
>>>>> we need a Falkon executor on each VM, configured to check in with 
>>>>> the Falkon
>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we 
>>>>> would
>>>>> want an SGE agent.)
>>>>>       *  We need a way of getting data to and from the worker nodes. 
>>>>> Do we want to
>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>> node? That
>>>>> seems rather inefficient. Other options?
>>>>>       * Should we preload the application code on each EC2 node?
>>>>>       
>>>> Here's a couple of approaches:
>>>>
>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>> single     site.
>>>>
>>>> Something like falkon handles all the task dispatch and worker node 
>>>> management. I don't know what that looks like at the moment in 
>>>> Falkon, but the interface for Swift to send jobs into Falkon sounds 
>>>> pretty straightforward and shouldn't need changing.
>>>>     
>>> So if I understand, here there would be no gateway+LRM but each EC2 
>>> node +
>>> Falkon would need a port open to receive tasks?  Or does each node 
>>> pull down
>>> instructions OK from behind a firewall?
>>>   
>> Falkon supports both polling and notifications.  To use notifications, 
>> there needs to be an open port on the worker :(
>>> Is there a latency problem with running each node as an independent task
>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>> would be
>>> better to put the queues to fill with tasks on EC2 so it can more 
>>> quickly get
>>> the task going when a node is done with a previous task (I may be 
>>> missing some
>>> nuances here with respect to Falkon, don't know much about this yet!).   
>> We have run the Falkon dispatcher at UChicago and workers at ANL without 
>> any issues, so it can easily tolerate a few ms of latency.  We haven't 
>> tried it across 10s of ms of latency links, but my instinct says that if 
>> you have enough workers, you might be able to hide the latency.  We'd 
>> have to experiment with it to see what happens.  We could potentially do 
>> some experiments between SDSC and ANL over a 50+ ms link, and see what 
>> difference in throughputs we get.
>>
>> Ioan
>>> If a gateway node is desired, this option sounds a lot like the GRAM+LRM
>>> situation we use on VMs with the workspace service and will soon use 
>>> on EC2 via
>>> the workspace EC2 gateway we're implementing.  Start up one gateway 
>>> node and
>>> then add compute nodes which dynamically join the pool, they are 
>>> pointed to the
>>> GRAM node.
>>>
>>>  
>>>> All the nodes in a site are required by our site model to have a 
>>>> shared filesystem - we've talked about removing it, but I think that 
>>>> is still the case and if so, isn't going to change soon.     
>>> Setting up a shared filesystem in this environment is akin to setting 
>>> up the
>>> compute nodes to join an LRM pool.  The VMs can communicate over the 
>>> private
>>> network at EC2, you can instruct EC2 to let all the nodes be open to 
>>> each other
>>> (while simultaneously keeping a separate policy of blocking ports from 
>>> being
>>> open from the internet and other people's EC2 nodes).  The 
>>> non-file-serving
>>> nodes would simply need to know the private address of the filesystem 
>>> server
>>> (unless you are using a fancier network file system than NFS-style ones).
>>> For background: every VM on EC2 currently gets a public address -- 
>>> NAT'd to a
>>> private address which is actually what the VM's one NIC is configured 
>>> with.
>>> There is a facility to open/forward specific network ports on the public
>>> address to each VM either via a group policy or on a VM by VM basis.
>>>
>>> [...]  
>>>> Amazon also has a storage cloud, alongside its compute cloud. I know 
>>>> very little about that and have never thought about how it would fit 
>>>> into the above (if at all). Maybe someone else knows more.
>>>>     
>>> A VM template on EC2 is called an AMI which stands for Amazon Machine 
>>> Image.
>>> This is just a packaging thing but what it mostly means is that the VM is
>>> stored on S3 and also registered into the EC2 system.
>>>
>>> When starting an instance of an AMI, the file is copied from S3 to the
>>> hypervisor node (what we call propagation in the workspace service).  
>>> After it
>>> is used, this file is deleted (an option in the workspace service but 
>>> there is
>>> also an option to save it back with any changes). 
>>> So the VMs are stored in S3 but anything that happens on them after being
>>> started is lost unless you manually do something about it.
>>>
>>> As for free scratch space, you get a good amount per node, 140G.  But 
>>> the node
>>> could go down at any moment just like a physical resource.
>>>
>>> To harness S3 for safely persisting any data (or if you need more 
>>> space) you
>>> would need to actually run S3 clients on the VMs when they are run on 
>>> EC2.  You
>>> could alternatively mirror data between nodes assuming that all would 
>>> not go
>>> down at once.
>>> The good thing is that you do not pay transfer costs between S3 and 
>>> EC2 if you
>>> chose to use S3 for big storage, you would only pay the "housing fees" 
>>> so to
>>> speak.
>>> Tim
>>>
>>>   
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From yongzh at cs.uchicago.edu  Wed May 16 12:34:53 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 16 May 2007 12:34:53 -0500 (CDT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3F87.9090708@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B1740.3060808@mcs.anl.gov> <464B3F87.9090708@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.58.0705161233520.3799@classes.cs.uchicago.edu>

I'd think the workspace manager should be able to do that, and not
statically, but by allocating new virtual nodes as requested.

Yong.

On Wed, 16 May 2007, Ioan Raicu wrote:

> Well, the dynamic provisioning assumes that Falkon is acquiring
> resources when it needs them.  This implies that it knows how to talk to
> the EC2 service, and it knows how to bootstrap a VM that has the
> necessary Falkon software stack.
>
> I was actually hoping (at least in the short term) that static resource
> provisioning could be handled by the workspace service, talking to the
> EC2 service and bootstrapping the VM (with the necessary Falkon stack),
> and then once the Falkon executors register with the Falkon dispatcher,
> Falkon handles the lightweight job management (in place of a
> traditional LRM).
>
> The provisioning to EC2 could be pushed onto Falkon in the future, but
> it is not currently on my immediate to-do list.
>
> Ioan
>
> Kate Keahey wrote:
> > Thanks Ben, this helps a lot! So it seems to me like we are talking
> > about combining dynamic provisioning with lightweight job management
> > which should be pluggable into swift.
> >
> > Ben Clifford wrote:
> >> On Tue, 15 May 2007, Kate Keahey wrote:
> >>
> >>> As Ian says, Borja and I were planning to meet with Ioan on Thursday
> >>> to discuss interaction between Falkon and the workspace service (not
> >>> necessarily/exclusively in the EC2 context). I don't completely
> >>> understand the relationship between swift and falkon -- are there
> >>> specific applications or scenarios that you are trying to target in
> >>> this exercise?
> >>
> >> By virtue of the fact that they come from pretty much the same group
> >> of people, they're somewhat fuzzily related - but pretty much swift
> >> is generating (over the duration of its execution, rather than in one
> >> batch) a bunch of jobs that need executing (as well, as various
> >> things like file transfers). As it generates them, it sends them off
> >> to be executed. The official ways that are 'supported' by Swift are
> >> by executing them on the local machine and by sending them off
> >> through GRAM; however, people can plug in whatever they want to do
> >> submissions.
> >>
> >> I know less about Falkon because it isn't Swift, but the Falkon side
> >> of things is pretty much about running a bunch of jobs - it plugs
> >> into the abovementioned place in Swift so that Swift gives Falkon
> >> jobs to run, and Falkon runs them (with a goal of Falkon being,
> >> presumably, to run it much more efficiently than if they were
> >> submitted straight through GRAM - it seems to do pretty well).
> >>
> >> There's two things going on with swift - one is about making it
> >> straightforward to use at the low end of things, so that people can
> >> start using it easily - for the most part, that isn't interesting in
> >> itself; the other is about getting it to perform well at the high end
> >> of things, which is where the fun research is. Using Falkon and using
> >> EC2 are both on that side of things.
> >>
> >
>
>


From iraicu at cs.uchicago.edu  Wed May 16 12:35:32 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:35:32 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <20070516120304.19151d46.tfreeman@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>	<464A24AF.7080801@cs.uchicago.edu>	<464A8857.90800@mcs.anl.gov>	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>	<464B3776.2010700@cs.uchicago.edu>
	<20070516120304.19151d46.tfreeman@mcs.anl.gov>
Message-ID: <464B40E4.3020706@cs.uchicago.edu>


Tim Freeman wrote:
> On Wed, 16 May 2007 11:55:18 -0500
> Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
>
>   
>> Hi,
>> I am just catching up with emails from last night...
>>
>> Ben Clifford wrote:
>>     
>>> On Tue, 15 May 2007, Kate Keahey wrote:
>>>
>>>   
>>>       
>>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
>>>> discuss interaction between Falkon and the workspace service (not 
>>>> necessarily/exclusively in the EC2 context). I don't completely 
>>>> understand the relationship between swift and falkon -- are there 
>>>> specific applications or scenarios that you are trying to target in this 
>>>> exercise?
>>>>     
>>>>         
>>> By virtue of the fact that they come from pretty much the same group of 
>>> people, they're somewhat fuzzily related - but pretty much swift is 
>>> generating (over the duration of its execution, rather than in one batch) 
>>> a bunch of jobs that need executing (as well, as various things like file 
>>> transfers). As it generates them, it sends them off to be executed. The 
>>> official ways that are 'supported' by Swift are by executing them on the 
>>> local machine and by sending them off through GRAM; however, people can 
>>> plug in whatever they want to do submissions.
>>>
>>> I know less about Falkon because it isn't Swift, but the Falkon side of 
>>> things is pretty much about running a bunch of jobs - it plugs into the 
>>> abovementioned place in Swift so that Swift gives Falkon jobs to run, and 
>>> Falkon runs them (with a goal of Falkon being, presumably, to run it much 
>>> more efficiently than if they were submitted straight through GRAM - it 
>>> seems to do pretty well).
>>>   
>>>       
>> We intentionally made Falkon's interface and semantics as similar as 
>> possible to that of GRAM, so applications that normally used GRAM could 
>> easily change to Falkon.
>>     
>>> There's two things going on with swift - one is about making it 
>>> straightforward to use at the low end of things, so that people can start 
>>> using it easily - for the most part, that isn't interesting in itself; the 
>>> other is about getting it to perform well at the high end of things, which 
>>> is where the fun research is. Using Falkon and using EC2 are both on that 
>>> side of things.
>>>   
>>>       
>> Right! 
>>
>> Falkon is certainly about getting more performance from the same hardware. 
>>
>> EC2 on the other hand is more about a new paradigm of how resources are 
>> acquired.  In the batch-scheduled world, the demand for resources is 
>> usually higher than the supply.  In EC2, it's likely that the supply of 
>> resources is higher than the demand.  With that said, it means that with 
>> EC2, it is likely that you could always get more resources now if you 
>> were willing to pay for them
>>     
>
> That's not entirely true at this particular point in time:
>
> http://www.pcworld.com/article/id,130832-c,webservices/article.html
>
> "We hate being capacity-constrained," Bezos said. "It's not the right way to
> run a business. We are trying to get ourselves in a position with EC2 where we
> will be demand-constrained instead of capacity-constrained."
>
>   
But this doesn't make much sense.  I think these guys get $700 or so a 
year for each VM they run, which means that they are charging more money 
over the lifetime of the machine than it costs to purchase and maintain 
the machine (assuming they are cheap computers).  Given that, it 
seems that they should be adding more resources as the demand grows, so 
they always have resources available if someone asks for them... at 
least that is what I am expecting from such a service.  If this is not 
the case now, I hope it will be in the future!
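
As a back-of-the-envelope check, the argument can be written out as below. Only the $700 per VM per year figure is my estimate from above; the purchase price, upkeep, lifetime, and the one-VM-per-machine ratio are made-up placeholders, not real Amazon numbers, so the result shows the shape of the calculation and nothing more.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_margin(revenue_per_vm_year, vms_per_machine, lifetime_years,
                    purchase_cost, upkeep_per_year):
    """Revenue minus cost for one physical machine over its lifetime."""
    revenue = revenue_per_vm_year * vms_per_machine * lifetime_years
    cost = purchase_cost + upkeep_per_year * lifetime_years
    return revenue - cost


# $700/year is the estimate quoted above; everything else is a placeholder.
revenue_per_vm_year = 700
implied_hourly_rate = revenue_per_vm_year / HOURS_PER_YEAR

margin = lifetime_margin(
    revenue_per_vm_year=revenue_per_vm_year,
    vms_per_machine=1,      # assumption: one VM per physical machine
    lifetime_years=3,       # assumption
    purchase_cost=1000,     # assumption: a cheap computer
    upkeep_per_year=200,    # assumption: power, space, administration
)
print(f"implied hourly rate: ${implied_hourly_rate:.3f}")
print(f"lifetime margin per machine: ${margin}")
```

With these placeholders the margin comes out positive, which is the point: as long as it stays positive, adding machines to meet demand should pay for itself.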

Ioan
>   
>> ... this could have implications on the 
>> resource allocation and management policies that govern when it makes 
>> sense to get more resources and when not to.
>>     
>
> Right now for example, we're programming a little feature into the workspace-EC2
> gateway that limits the amount of money an entity can spend :-) 
>
> Tim 
>
>
>   
>>   Using EC2 might be about 
>> performance, but the really interesting part that I see emerging is a 
>> new model that deviates from the traditional batch-scheduled systems the 
>> Grid community has grown accustomed to.
>>
>> Ioan
>>     
>
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Wed May 16 12:37:50 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:37:50 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3D6B.5090801@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>	<464A24AF.7080801@cs.uchicago.edu>	<464A8857.90800@mcs.anl.gov>	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>	<464B3776.2010700@cs.uchicago.edu>
	<20070516120304.19151d46.tfreeman@mcs.anl.gov>
	<464B3D6B.5090801@mcs.anl.gov>
Message-ID: <464B416E.30205@cs.uchicago.edu>

Yes, that could certainly make their resource capacity planning easier, 
since as their resource consumption reaches critical levels, they could just 
charge more and more, making it unrealistic that all resources will ever 
be consumed!

Ioan

Kate Keahey wrote:
> Ah, yes, the next thing they will allow people to bid... ;-).
>
> Tim Freeman wrote:
>
>>
>> That's not entirely true at this particular point in time:
>>
>> http://www.pcworld.com/article/id,130832-c,webservices/article.html
>>
>> "We hate being capacity-constrained," Bezos said. "It's not the right 
>> way to
>> run a business. We are trying to get ourselves in a position with EC2 
>> where we
>> will be demand-constrained instead of capacity-constrained."
>>
>>
>>> ... this could have implications on the resource allocation and 
>>> management policies that govern when it makes sense to get more 
>>> resources and when not to.
>>
>> Right now for example, we're programming a little feature into the 
>> workspace-EC2
>> gateway that limits the amount of money an entity can spend :-)
>> Tim 
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From itf at mcs.anl.gov  Wed May 16 12:45:35 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Wed, 16 May 2007 17:45:35 +0000
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3F87.9090708@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov><Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov><Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk><464B1740.3060808@mcs.anl.gov>
	<464B3F87.9090708@cs.uchicago.edu>
Message-ID: <1154116369-1179337647-cardhu_blackberry.rim.net-1593412775-@bwe059-cell00.bisx.prod.on.blackberry>

Yes, that is all true. But let's focus on getting a static virtual cluster on ec2, with swift apps running on it. I am sure this can be done tomorrow!


Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Ioan Raicu <iraicu at cs.uchicago.edu>
Date: Wed, 16 May 2007 12:29:43 
To:Kate Keahey <keahey at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] swift-on-ec2

Well, the dynamic provisioning assumes that Falkon is acquiring 
resources when it needs them.  This implies that it knows how to talk to 
the EC2 service, and it knows how to bootstrap a VM that has the 
necessary Falkon software stack.

I was actually hoping (at least in the short term) that static resource 
provisioning could be handled by the workspace service, talking to the 
EC2 service and bootstrapping the VM (with the necessary Falkon stack), 
and then once the Falkon executors register with the Falkon dispatcher, 
Falkon handles the lightweight job management (in place of a 
traditional LRM). 

The provisioning to EC2 could be pushed onto Falkon in the future, but 
it is not currently on my immediate to-do list.

Ioan

Kate Keahey wrote:
> Thanks Ben, this helps a lot! So it seems to me like we are talking 
> about combining dynamic provisioning with lightweight job management 
> which should be pluggable into swift.
>
> Ben Clifford wrote:
>> On Tue, 15 May 2007, Kate Keahey wrote:
>>
>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>> to discuss interaction between Falkon and the workspace service (not 
>>> necessarily/exclusively in the EC2 context). I don't completely 
>>> understand the relationship between swift and falkon -- are there 
>>> specific applications or scenarios that you are trying to target in 
>>> this exercise?
>>
>> By virtue of the fact that they come from pretty much the same group 
>> of people, they're somewhat fuzzily related - but pretty much swift 
>> is generating (over the duration of its execution, rather than in one 
>> batch) a bunch of jobs that need executing (as well as various 
>> things like file transfers). As it generates them, it sends them off 
>> to be executed. The official ways that are 'supported' by Swift are 
>> by executing them on the local machine and by sending them off 
>> through GRAM; however, people can plug in whatever they want to do 
>> submissions.
>>
>> I know less about Falkon because it isn't Swift, but the Falkon side 
>> of things is pretty much about running a bunch of jobs - it plugs 
>> into the abovementioned place in Swift so that Swift gives Falkon 
>> jobs to run, and Falkon runs them (with a goal of Falkon being, 
>> presumably, to run them much more efficiently than if they were 
>> submitted straight through GRAM - it seems to do pretty well).
>>
>> There's two things going on with swift - one is about making it 
>> straightforward to use at the low end of things, so that people can 
>> start using it easily - for the most part, that isn't interesting in 
>> itself; the other is about getting it to perform well at the high end 
>> of things, which is where the fun research is. Using Falkon and using 
>> EC2 are both on that side of things.
>>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel


From iraicu at cs.uchicago.edu  Wed May 16 12:49:49 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 16 May 2007 12:49:49 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.58.0705161233520.3799@classes.cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B1740.3060808@mcs.anl.gov> <464B3F87.9090708@cs.uchicago.edu>
	<Pine.LNX.4.58.0705161233520.3799@classes.cs.uchicago.edu>
Message-ID: <464B443D.3050708@cs.uchicago.edu>

That would make things much simpler, from Falkon's perspective.  
Essentially, if the workspace service offered an interface that Falkon 
could use to allocate and de-allocate resources (VMs) on demand, then the 
Falkon dynamic resource provisioning could be used as long as Falkon 
implements this new workspace interface instead of the current GRAM 
interface it uses!  Then, the whole EC2 deployment and bootstrapping would 
be offloaded to the workspace service, and only the resource provisioning 
and task dispatch would be done at Falkon, the same as it is today when 
we use GRAM!
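
To make the idea concrete, here is a minimal sketch of what such an
allocate/de-allocate interface might look like. It is an illustration only:
ResourceProvisioner, InMemoryProvisioner and scale_to_queue are hypothetical
names, not part of Falkon, GRAM or the workspace service, and the in-memory
backend just stands in for whatever would really start VMs.

```python
from abc import ABC, abstractmethod
from typing import List


class ResourceProvisioner(ABC):
    """Hypothetical interface a dispatcher could code against."""

    @abstractmethod
    def allocate(self, count: int) -> List[str]:
        """Request up to `count` workers; return the ids actually granted."""

    @abstractmethod
    def release(self, worker_ids: List[str]) -> None:
        """Give the named workers back to the provider."""


class InMemoryProvisioner(ResourceProvisioner):
    """Stand-in backend; a real one would talk to GRAM or a workspace service."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.active = set()
        self._next_id = 0

    def allocate(self, count: int) -> List[str]:
        granted = []
        # Grant workers until the request is met or capacity is exhausted.
        while len(granted) < count and len(self.active) < self.capacity:
            worker_id = f"vm-{self._next_id}"
            self._next_id += 1
            self.active.add(worker_id)
            granted.append(worker_id)
        return granted

    def release(self, worker_ids: List[str]) -> None:
        for worker_id in worker_ids:
            self.active.discard(worker_id)


def scale_to_queue(provisioner: ResourceProvisioner,
                   queued_tasks: int,
                   idle_workers: List[str]) -> List[str]:
    """Grow when tasks outnumber idle workers; shrink when the queue is empty."""
    if queued_tasks > len(idle_workers):
        return provisioner.allocate(queued_tasks - len(idle_workers))
    if queued_tasks == 0 and idle_workers:
        provisioner.release(idle_workers)
    return []
```

The point of the split is that the scaling logic only sees allocate() and
release(), so swapping a GRAM backend for a workspace-service backend would
not touch the dispatcher side.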

Ioan

Yong Zhao wrote:
> I'd think the workspace manager should be able to do that, and not
> statically, but allocate new virtual nodes as requested.
>
> Yong.
>
> On Wed, 16 May 2007, Ioan Raicu wrote:
>
>   
>> Well, the dynamic provisioning assumes that Falkon is acquiring
>> resources when it needs them.  This implies that it knows how to talk to
>> the EC2 service, and it knows how to bootstrap a VM that has the
>> necessary Falkon software stack.
>>
>> I was actually hoping (at least in the short term) that static resource
>> provisioning could be handled by the workspace service, talking to the
>> EC2 service and bootstrapping the VM (with the necessary Falkon stack),
>> and then once the Falkon executors register with the Falkon dispatcher,
>> then Falkon handles the lightweight job management (in place of a
>> traditional LRM).
>>
>> The provisioning to EC2 could be pushed onto Falkon in the future, but
>> it is not currently on my immediate to-do list.
>>
>> Ioan
>>
>> Kate Keahey wrote:
>>     
>>> Thanks Ben, this helps a lot! So it seems to me like we are talking
>>> about combining dynamic provisioning with lightweight job management
>>> which should be pluggable into swift.
>>>
>>> Ben Clifford wrote:
>>>       
>>>> On Tue, 15 May 2007, Kate Keahey wrote:
>>>>
>>>>         
>>>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday
>>>>> to discuss interaction between Falkon and the workspace service (not
>>>>> necessarily/exclusively in the EC2 context). I don't completely
>>>>> understand the relationship between swift and falkon -- are there
>>>>> specific applications or scenarios that you are trying to target in
>>>>> this exercise?
>>>>>           
>>>> By virtue of the fact that they come from pretty much the same group
>>>> of people, they're somewhat fuzzily related - but pretty much swift
>>>> is generating (over the duration of its execution, rather than in one
>>>> batch) a bunch of jobs that need executing (as well as various
>>>> things like file transfers). As it generates them, it sends them off
>>>> to be executed. The official ways that are 'supported' by Swift are
>>>> by executing them on the local machine and by sending them off
>>>> through GRAM; however, people can plug in whatever they want to do
>>>> submissions.
>>>>
>>>> I know less about Falkon because it isn't Swift, but the Falkon side
>>>> of things is pretty much about running a bunch of jobs - it plugs
>>>> into the abovementioned place in Swift so that Swift gives Falkon
>>>> jobs to run, and Falkon runs them (with a goal of Falkon being,
>>>> presumably, to run them much more efficiently than if they were
>>>> submitted straight through GRAM - it seems to do pretty well).
>>>>
>>>> There's two things going on with swift - one is about making it
>>>> straightforward to use at the low end of things, so that people can
>>>> start using it easily - for the most part, that isn't interesting in
>>>> itself; the other is about getting it to perform well at the high end
>>>> of things, which is where the fun research is. Using Falkon and using
>>>> EC2 are both on that side of things.
>>>>
>>>>         
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From keahey at mcs.anl.gov  Wed May 16 12:52:01 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 12:52:01 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B3F87.9090708@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<Pine.LNX.4.64.0705160829390.22628@dildano.hawaga.org.uk>
	<464B1740.3060808@mcs.anl.gov> <464B3F87.9090708@cs.uchicago.edu>
Message-ID: <464B44C1.302@mcs.anl.gov>

Ioan,

Ioan Raicu wrote:
> Well, the dynamic provisioning assumes that Falkon is acquiring 
> resources when it needs them.  This implies that it knows how to talk to 
> the EC2 service, and it knows how to bootstrap a VM that has the 
> necessary Falkon software stack.
> 
> I was actually hoping (at least in the short term) that static resource 
> provisioning could be handled by the workspace service, talking to the 
> EC2 service and bootstrapping the VM (with the necessary Falkon stack), 
> and then once the Falkon executors register with the Falkon dispatcher, 
> then Falkon handles the lightweight job management (in place of a 
> traditional LRM).

Yes, this is exactly what I was also thinking. My point below is that 
the combined infrastructure would fit into Swift.

> The provisioning to EC2 could be pushed onto Falkon in the future, but 
> it is not currently on my immediate to-do list.
> 
> Ioan
> 
> Kate Keahey wrote:
>> Thanks Ben, this helps a lot! So it seems to me like we are talking 
>> about combining dynamic provisioning with lightweight job management 
>> which should be pluggable into swift.
>>
>> Ben Clifford wrote:
>>> On Tue, 15 May 2007, Kate Keahey wrote:
>>>
>>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>>> to discuss interaction between Falkon and the workspace service (not 
>>>> necessarily/exclusively in the EC2 context). I don't completely 
>>>> understand the relationship between swift and falkon -- are there 
>>>> specific applications or scenarios that you are trying to target in 
>>>> this exercise?
>>>
>>> By virtue of the fact that they come from pretty much the same group 
>>> of people, they're somewhat fuzzily related - but pretty much swift 
>>> is generating (over the duration of its execution, rather than in one 
>>> batch) a bunch of jobs that need executing (as well as various 
>>> things like file transfers). As it generates them, it sends them off 
>>> to be executed. The official ways that are 'supported' by Swift are 
>>> by executing them on the local machine and by sending them off 
>>> through GRAM; however, people can plug in whatever they want to do 
>>> submissions.
>>>
>>> I know less about Falkon because it isn't Swift, but the Falkon side 
>>> of things is pretty much about running a bunch of jobs - it plugs 
>>> into the abovementioned place in Swift so that Swift gives Falkon 
>>> jobs to run, and Falkon runs them (with a goal of Falkon being, 
>>> presumably, to run them much more efficiently than if they were 
>>> submitted straight through GRAM - it seems to do pretty well).
>>>
>>> There's two things going on with swift - one is about making it 
>>> straightforward to use at the low end of things, so that people can 
>>> start using it easily - for the most part, that isn't interesting in 
>>> itself; the other is about getting it to perform well at the high end 
>>> of things, which is where the fun research is. Using Falkon and using 
>>> EC2 are both on that side of things.
>>>
>>
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From keahey at mcs.anl.gov  Wed May 16 13:16:14 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Wed, 16 May 2007 13:16:14 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
Message-ID: <464B4A6E.2040804@mcs.anl.gov>

Ian,

you seem to be referring to the necessary /etc/hosts configuration as 
well as workers registering with the torque headnode below as 
"distractions" -- I agree they can be very distracting, but in my 
experience without these distractions a cluster (virtual or physical) 
won't work in the way such clusters are typically expected to work.

What I said in my mail is that we can set up a base cluster locally so 
that somebody like Ioan can finish the configuration (i.e., install 
Falkon on it). We will configure this cluster once and leave it deployed 
as long as needed.

Once we have the front-end to EC2 working (which we don't have yet 
although we are close) we will deploy this cluster on EC2 and provide 
methods that will automate this last little bit of configuration that 
*always* has to be done on deployment.
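
As a rough illustration of the /etc/hosts part of that per-deployment
configuration, here is a sketch that renders the file contents from the
addresses assigned at deployment. The function name, hostnames and addresses
are made up for the example; registering workers with the Torque headnode is
a separate step that is not shown.

```python
def render_hosts(headnode, workers):
    """Return /etc/hosts text for one headnode and its workers.

    headnode is a (hostname, ip) pair; workers is a list of such pairs.
    """
    lines = ["127.0.0.1\tlocalhost"]
    # Headnode first, then workers in hostname order so output is stable.
    for name, ip in [headnode] + sorted(workers):
        lines.append(f"{ip}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses, as they might be handed out at deployment.
    print(render_hosts(("head", "10.0.0.1"),
                       [("w2", "10.0.0.3"), ("w1", "10.0.0.2")]), end="")
```

Every node in the cluster would get the same rendered file, which is why the
step has to be repeated whenever a deployment hands out new addresses.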

I also think it is quite important that we spend the time tomorrow 
discussing what exactly we are trying to do -- right now, it looks to me 
like it might make more sense to not use clusters (it will help with the 
"distractions" if we don't).

I realize that you are eager for us to get things to run -- I am eager 
too, but I honestly think we will get there faster if we plan better.

Ian Foster wrote:
> Kate:
> 
> I personally will be delighted if you could run the virtual cluster on ec2 tomorrow. I know that there are lots of ways that you could refine its config, local expts that could be performed, etc., but perhaps we could try bypassing those things, which seem somewhat like distractions to me?
> 
> Ian
> 
> 
> Sent via BlackBerry from T-Mobile  
> 
> -----Original Message-----
> From: Kate Keahey <keahey at mcs.anl.gov>
> Date: Wed, 16 May 2007 09:24:02 
> To:itf at mcs.anl.gov
> Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja Sotomayor <borja at borjanet.com>
> Subject: Re: [Swift-devel] swift-on-ec2
> 
> 
> 
> Ian Foster wrote:
>> Kate:
>>
>> If we configure the virtual cluster with a full LRM, as you propose (and it seems you have already done--great work!), then we can use this to start Falkon executors--as we do today on regular clusters. So it seems to me that we have all we need. How about you and Ioan spend your time on Thursday running something on EC2, to make sure it works?
> 
> As I suggest below, I think it would be easiest if we could deploy and 
> debug a small static cluster locally first, and we can probably give it 
> a shot tomorrow. We still don't have access to the Xen nodes on TeraPort 
> (although hopefully that might change by tomorrow) but I asked Rick to 
> rebuild a couple of nodes at ANL and he did, I think for a test that 
> should give us enough resources to play with.
> 
> At the same time -- if there are multiple ways of doing this, and 
> perhaps better ways than simply using a virtual cluster, we should 
> discuss them now. It is not completely clear to me what the relationship 
> between Falkon and Swift is, and what the specific objectives are (other 
> than that dynamically provisioning resources is required). It looks at 
> this point like the objectives probably overlap with what Ioan, Borja 
> and I wanted to talk about (which I thought was a separate project, but 
> am thrilled to find out is related) so how about we come up with a 
> design tomorrow and post the notes on this list (is this a good venue 
> for that?) and then others can shoot them down.
> 
>> Regarding choice of LRM: have you looked at SGE? That is what quite a few others seem to be using.
> 
> Yes, we have. We also collaborate with others who do, as well as with 
> Sun... As you may remember, Borja did the scheduling work for his thesis 
> in the context of SGE. Last time we talked though, Torque was the 
> scheduler of choice for the virtual cluster LRM so we used that.
> 
> The usage of SGE you are referring to above -- is this in the context of 
> virtualization projects, or as LRM for various Falkon-related applications?
> 
>> Ian
>>
>>
>>
>> Sent via BlackBerry from T-Mobile  
>>
>> -----Original Message-----
>> From: Kate Keahey <keahey at mcs.anl.gov>
>> Date: Tue, 15 May 2007 23:28:07 
>> To:iraicu at cs.uchicago.edu
>> Cc:swift-devel at ci.uchicago.edu
>> Subject: Re: [Swift-devel] swift-on-ec2
>>
>> First -- this is a very useful discussion; would it be possible to see 
>> all of it? We need to understand the requirements and trade-offs in some 
>> detail to figure out the best way to make this work. I see a few 
>> different interaction threads somewhat mixed up here though so just to 
>> make sure we are all on the same wavelength, here is some context.
>>
>> Ian and I have been talking on and off about providing a workspace 
>> service implementation with EC2 backend. The benefit for that would be 
>> that users could deploy the same VMs using the same interface to either 
>> TeraPort or EC2 or yet another resource provider. The workspace service 
>> would also provide some features on top of EC2 (translating between PKI 
>> credentials and Amazon's paying accounts, contextualization as needed to 
>> make deployment dynamic). One application of interest for this was 
>> Swift. Last time we chatted about this though was in the context of 
>> using EC2 to provide a production platform for STAR runs (since 
>> virtualizing enough TeraPort to provide a production platform is taking 
>> a long time). This is what Tim and I are trying to make happen now.
>>
>> Since there was also interest in running Swift in VMs, Mike, Tibi and I 
>> met around February/March and agreed that a reasonable way to proceed 
>> will be for us to stand up a base virtual cluster somewhere locally 
>> (e.g., a static deployment on TeraPort) so that they can finish the 
>> configuration according to their needs, look at performance, figure out 
>> the best way to interact with it, and make sure that there are no 
>> VM-induced gotchas. All of this will be much easier to assess locally 
>> and on a static deployment. Then we'd make sure the cluster is 
>> dynamically deployable using the workspace service (on EC2 or whatever 
>> other provider). During the meeting (and over following emails) we 
>> agreed that the required "base cluster" would be configured with 
>> GRAM/Torque on the headnode plus a number of worker nodes, plus root 
>> privileges. We configured this cluster and it is ready to deploy. Are 
>> you saying now that in fact something different is needed?
>>
>> As Ian says, Borja and I were planning to meet with Ioan on Thursday to 
>> discuss interaction between Falkon and the workspace service (not 
>> necessarily/exclusively in the EC2 context). I don't completely 
>> understand the relationship between swift and falkon -- are there 
>> specific applications or scenarios that you are trying to target in this 
>> exercise?
>>
>> Ioan Raicu wrote:
>>> Hi,
>>> See below:
>>>
>>> Tim Freeman wrote:
>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>
>>>>  
>>>>> Ian asked about this elsewhere, but its perhaps interesting for 
>>>>> swift-devel people to look at the questions too.
>>>>>
>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>
>>>>>    
>>>>>> Dear All:
>>>>>>
>>>>>> I asked Kate if she and Tim could look into creating VM images that 
>>>>>> would allow us to run Swift applications on Amazon EC2. I think Kate 
>>>>>> is meeting with Ioan about this on Thursday (?).
>>>>>>
>>>>>> One issue that I thought would be good to discuss is what we'd want 
>>>>>> in that VM image. Perhaps this is obvious to the rest of you, but it 
>>>>>> isn't to me. A few thoughts:
>>>>>>   * I'm assuming that we want to run "workers" on EC2 nodes, and
>>>>>>     have the "task dispatch" logic run on some external frontend
>>>>>>     system outside EC2.
>>>>>>   * I would think that we want to use Falkon to do the task
>>>>>>     dispatch. If so, we need a Falkon executor on each VM,
>>>>>>     configured to check in with the Falkon dispatcher.
>>>>>>     (Alternatively, we could use, say, SGE: in that case, we
>>>>>>     would want an SGE agent.)
>>>>>>   * We need a way of getting data to and from the worker nodes.
>>>>>>     Do we want to run a file system across the EC2 nodes and the
>>>>>>     external frontend node? That seems rather inefficient. Other
>>>>>>     options?
>>>>>>   * Should we preload the application code on each EC2 node?
>>>>>>       
>>>>> Here's a couple of approaches:
>>>>>
>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>> single site.
>>>>>
>>>>> Something like falkon handles all the task dispatch and worker node 
>>>>> management. I don't know what that looks like at the moment in 
>>>>> Falkon, but the interface for Swift to send jobs into Falkon sounds 
>>>>> pretty straightforward and shouldn't need changing.
>>>>>     
>>>> So if I understand, here there would be no gateway+LRM but each EC2
>>>> node + Falkon would need a port open to receive tasks?  Or does each
>>>> node pull down instructions OK from behind a firewall?
>>>>   
>>> Falkon supports both polling and notifications.  To use notifications, 
>>> there needs to be an open port on the worker :(
>>>> Is there a latency problem with running each node as an independent
>>>> task receiver with the dispatcher off-site from EC2?  I would think it
>>>> would be better to put the queues to fill with tasks on EC2 so it can
>>>> more quickly get the task going when a node is done with a previous
>>>> task (I may be missing some nuances here with respect to Falkon, don't
>>>> know much about this yet!).
>>> We have run the Falkon dispatcher at UChicago and workers at ANL without 
>>> any issues, so it can easily tolerate a few ms of latency.  We haven't 
>>> tried it across 10s of ms of latency links, but my instinct says that if 
>>> you have enough workers, you might be able to hide the latency.  We'd 
>>> have to experiment with it to see what happens.  We could potentially do 
>>> some experiments between SDSC and ANL over a 50+ ms link, and see what 
>>> difference in throughputs we get.
>>>
>>> Ioan
>>>> If a gateway node is desired, this option sounds a lot like the
>>>> GRAM+LRM situation we use on VMs with the workspace service and will
>>>> soon use on EC2 via the workspace EC2 gateway we're implementing.
>>>> Start up one gateway node and then add compute nodes which dynamically
>>>> join the pool, they are pointed to the GRAM node.
>>>>
>>>>  
>>>>> All the nodes in a site are required by our site model to have a 
>>>>> shared filesystem - we've talked about removing it, but I think that 
>>>>> is still the case and if so, isn't going to change soon.     
>>>> Setting up a shared filesystem in this environment is akin to setting
>>>> up the compute nodes to join an LRM pool.  The VMs can communicate over
>>>> the private network at EC2, you can instruct EC2 to let all the nodes
>>>> be open to each other (while simultaneously keeping a separate policy
>>>> of blocking ports from being open from the internet and other people's
>>>> EC2 nodes).  The non-file-serving nodes would simply need to know the
>>>> private address of the filesystem server (unless you are using a
>>>> fancier network file system than NFS-style ones).
>>>>
>>>> For background: every VM on EC2 currently gets a public address --
>>>> NAT'd to a private address which is actually what the VM's one NIC is
>>>> configured with.  There is a facility to open/forward specific network
>>>> ports on the public address to each VM either via a group policy or on
>>>> a VM by VM basis.
>>>>
>>>> [...]  
>>>>> Amazon also has a storage cloud, alongside its compute cloud. I know 
>>>>> very little about that and have never thought about how it would fit 
>>>>> into the above (if at all). Maybe someone else knows more.
>>>>>     
>>>> A VM template on EC2 is called an AMI which stands for Amazon Machine
>>>> Image.  This is just a packaging thing but what it mostly means is
>>>> that the VM is stored on S3 and also registered into the EC2 system.
>>>>
>>>> When starting an instance of an AMI, the file is copied from S3 to the
>>>> hypervisor node (what we call propagation in the workspace service).
>>>> After it is used, this file is deleted (an option in the workspace
>>>> service but there is also an option to save it back with any changes).
>>>>
>>>> So the VMs are stored in S3 but anything that happens on them after
>>>> being started is lost unless you manually do something about it.
>>>>
>>>> As for free scratch space, you get a good amount per node, 140G.  But
>>>> the node could go down at any moment just like a physical resource.
>>>>
>>>> To harness S3 for safely persisting any data (or if you need more
>>>> space) you would need to actually run S3 clients on the VMs when they
>>>> are run on EC2.  You could alternatively mirror data between nodes
>>>> assuming that all would not go down at once.
>>>>
>>>> The good thing is that you do not pay transfer costs between S3 and
>>>> EC2 if you choose to use S3 for big storage, you would only pay the
>>>> "housing fees" so to speak.
>>>>
>>>> Tim
>>>>
>>>>   
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From nefedova at mcs.anl.gov  Wed May 16 15:07:50 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 16 May 2007 15:07:50 -0500
Subject: [Swift-devel] Teragrid usage
Message-ID: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>

Hi,

I checked my Teragrid accounts and it looks like Swift's 
allocation is almost completely used by now (or is it just for me?):

Account: TG-CDA060004T
Title: TeraGrid:  Development Account for Multiple Grid Science Projects
Resource: teragrid_roaming
Allocation Period: 2006-08-30 to 2007-08-31

Name (Last First) or Account       Total      Remaining        Usage
----------------------------     ----------  ------------   ----------
    Nefedova  Veronika             30000 SU         0 SU     27491 SU
----------------------------------------------------------------------

Fortunately, Benoit has added me to his group's allocation - so I can  
continue testing on TG. But it looks like Swift's allocation is  
almost gone... Should we renew it?

Nika


From foster at mcs.anl.gov  Wed May 16 15:19:18 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Wed, 16 May 2007 15:19:18 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B4A6E.2040804@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov>
Message-ID: <464B6746.7050907@mcs.anl.gov>

Kate:

I want to emphasize that I was *not* dismissing the issues below as 
distractions.

What I meant was: given that you are working on developing a "virtual 
cluster", which I am pretty sure will be able to execute Swift apps, 
let's focus on getting that done, rather than worrying about "special 
casing" it for Falkon, adding dynamic node acquisition, or the other 
things that people started discussing as potential extensions.

I understand from our IM conversation today that the "virtual cluster" 
is ready for use in a "static environment" such as some machines in our 
lab. In a "dynamic environment" such as EC2, it is not quite ready for 
use yet. Thus, you won't be able to get Swift running on EC2 tomorrow.

Ian.


Kate Keahey wrote:
> Ian,
>
> you seem to be referring to the necessary /etc/hosts configuration as 
> well as workers registering with the torque headnode below as 
> "distractions" -- I agree they can be very distracting, but in my 
> experience without these distractions a cluster (virtual or physical) 
> won't work in the way such clusters are typically expected to work.
>
> What I said in my mail is that we can set up a base cluster locally so 
> that somebody like Ioan can finish the configuration (i.e., install 
> Falkon on it). We will configure this cluster once and leave it 
> deployed as long as needed.
>
> Once we have the front-end to EC2 working (which we don't have yet 
> although we are close) we will deploy this cluster on EC2 and provide 
> methods that will automate this last little bit of configuration that 
> *always* has to be done on deployment.
>
> I also think it is quite important that we spend the time tomorrow 
> discussing what exactly we are trying to do -- right now, it looks to 
> me like it might make more sense to not use clusters (it will help 
> with the "distractions" if we don't).
>
> I realize that you are eager for us to get things to run -- I am eager 
> too, but I honestly think we will get there faster if we plan better.
>
> Ian Foster wrote:
>> Kate:
>>
>> I personally will be delighted if you could run the virtual cluster 
>> on ec2 tomorrow. I know that there are lots of ways that you could 
>> refine its config, local expts that could be performed, etc., but 
>> perhaps we could try bypassing those things, which seem somewhat like 
>> distractions to me?
>>
>> Ian
>>
>>
>> Sent via BlackBerry from T-Mobile
>>
>> -----Original Message-----
>> From: Kate Keahey <keahey at mcs.anl.gov>
>> Date: Wed, 16 May 2007 09:24:02
>> To:itf at mcs.anl.gov
>> Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu 
>> <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja 
>> Sotomayor <borja at borjanet.com>
>> Subject: Re: [Swift-devel] swift-on-ec2
>>
>>
>>
>> Ian Foster wrote:
>>> Kate:
>>>
>>> If we configure the virtual cluster with a full LRM, as you propose 
>>> (and it seems you have already done--great work!), then we can use this 
>>> to start Falkon executors--as we do today on regular clusters. So it 
>>> seems to me that we have all we need. How about you and Ioan spend 
>>> your time on Thursday running something on EC2, to make sure it works?
>>
>> As I suggest below, I think it would be easiest if we could deploy 
>> and debug a small static cluster locally first, and we can probably 
>> give it a shot tomorrow. We still don't have access to the Xen nodes 
>> on TeraPort (although hopefully that might change by tomorrow) but I 
>> asked Rick to rebuild a couple of nodes at ANL and he did, I think 
>> for a test that should give us enough resources to play with.
>>
>> At the same time -- if there are multiple ways of doing this, and 
>> perhaps better ways than simply using a virtual cluster, we should 
>> discuss them now. It is not completely clear to me what the 
>> relationship between Falkon and Swift is, and what the specific 
>> objectives are (other than that dynamically provisioning resources is 
>> required). It looks at this point like the objectives probably 
>> overlap with what Ioan, Borja and I wanted to talk about (which I 
>> thought was a separate project, but am thrilled to find out is 
>> related) so how about we come up with a design tomorrow and post the 
>> notes on this list (is this a good venue for that?) and then others 
>> can shoot them down.
>>
>>> Regarding choice of LRM: have you looked at SGE? That is what quite 
>>> a few others seem to be using.
>>
>> Yes, we have. We also collaborate with others who do, as well as with 
>> Sun... As you may remember, Borja did the scheduling work for his 
>> thesis in the context of SGE. Last time we talked though, Torque was 
>> the scheduler of choice for the virtual cluster LRM so we used that.
>>
>> The usage of SGE you are referring to above -- is this in the context 
>> of virtualization projects, or as LRM for various Falkon-related 
>> applications?
>>
>>> Ian
>>>
>>>
>>>
>>> Sent via BlackBerry from T-Mobile 
>>> -----Original Message-----
>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>> Date: Tue, 15 May 2007 23:28:07
>>> To: iraicu at cs.uchicago.edu
>>> Cc:swift-devel at ci.uchicago.edu
>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>
>>> First -- this is a very useful discussion; would it be possible to 
>>> see all of it? We need to understand the requirements and trade-offs 
>>> in some detail to figure out the best way to make this work. I see a 
>>> few different interaction threads somewhat mixed up here though so 
>>> just to make sure we are all on the same wavelength, here is some 
>>> context.
>>>
>>> Ian and I have been talking on and off about providing a workspace 
>>> service implementation with EC2 backend. The benefit for that would 
>>> be that users could deploy the same VMs using the same interface to 
>>> either TeraPort or EC2 or yet another resource provider. The 
>>> workspace service would also provide some features on top of EC2 
>>> (translating between PKI credentials and Amazon's paying accounts, 
>>> contextualization as needed to make deployment dynamic). One 
>>> application of interest for this was Swift. Last time we chatted 
>>> about this though was in the context of using EC2 to provide a 
>>> production platform for STAR runs (since virtualizing enough 
>>> TeraPort to provide a production platform is taking a long time). 
>>> This is what Tim and I are trying to make happen now.
>>>
>>> Since there was also interest in running Swift in VMs, Mike, Tibi 
>>> and I met around February/March and agreed that a reasonable way to 
>>> proceed will be for us to stand up a base virtual cluster somewhere 
>>> locally (e.g., a static deployment on TeraPort) so that they can 
>>> finish the configuration according to their needs, look at 
>>> performance, figure out the best way to interact with it, and make 
>>> sure that there are no VM-induced gotchas. All of this will be much 
>>> easier to assess locally and on a static deployment. Then we'd make 
>>> sure the cluster is dynamically deployable using the workspace 
>>> service (on EC2 or whatever other provider). During the meeting (and 
>>> over following emails) we agreed that the required "base cluster" 
>>> would be configured with GRAM/Torque on the headnode plus a number 
>>> of worker nodes, plus root privileges. We configured this cluster 
>>> and it is ready to deploy. Are you saying now that in fact something 
>>> different is needed?
>>>
>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>> to discuss interaction between Falkon and the workspace service (not 
>>> necessarily/exclusively in the EC2 context). I don't completely 
>>> understand the relationship between swift and falkon -- are there 
>>> specific applications or scenarios that you are trying to target in 
>>> this exercise?
>>>
>>> Ioan Raicu wrote:
>>>> Hi,
>>>> See below:
>>>>
>>>> Tim Freeman wrote:
>>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>>
>>>>>  
>>>>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>>>>> swift-devel people to look at the questions too.
>>>>>>
>>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>>
>>>>>>   
>>>>>>> Dear All:
>>>>>>>
>>>>>>> I asked Kate if she and Tim could look into creating VM images 
>>>>>>> that would allow us to run Swift applications on Amazon EC2. I 
>>>>>>> think Kate is meeting with Ioan about this on Thursday (?).
>>>>>>>
>>>>>>> One issue that I thought would be good to discuss is what we'd 
>>>>>>> want in that VM image. Perhaps this is obvious to the rest of 
>>>>>>> you, but it isn't to me. A few thoughts:
>>>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, 
>>>>>>> and have the
>>>>>>> "task dispatch" logic run on some external frontend system 
>>>>>>> outside EC2.
>>>>>>>       * I would think that we want to use Falkon to do the task 
>>>>>>> dispatch. If so,
>>>>>>> we need a Falkon executor on each VM, configured to check in 
>>>>>>> with the Falkon
>>>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that 
>>>>>>> case, we would
>>>>>>> want an SGE agent.)
>>>>>>>       *  We need a way of getting data to and from the worker 
>>>>>>> nodes. Do we want to
>>>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>>>> node? That
>>>>>>> seems rather inefficient. Other options?
>>>>>>>       * Should we preload the application code on each EC2 node?
>>>>>>>       
>>>>>> Here's a couple of approaches:
>>>>>>
>>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>>> single     site.
>>>>>>
>>>>>> Something like falkon handles all the task dispatch and worker 
>>>>>> node management. I don't know what that looks like at the moment 
>>>>>> in Falkon, but the interface for Swift to send jobs into Falkon 
>>>>>> sounds pretty straightforward and shouldn't need changing.
>>>>>>     
>>>>> So if I understand, here there would be no gateway+LRM but each 
>>>>> EC2 node +
>>>>> Falkon would need a port open to receive tasks?  Or does each node 
>>>>> pull down
>>>>> instructions OK from behind a firewall?
>>>>>   
>>>> Falkon supports both polling and notifications.  To use 
>>>> notifications, there needs to be an open port on the worker :(
>>>>> Is there a latency problem with running each node as an independent 
>>>>> task
>>>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>>>> would be
>>>>> better to put the queues to fill with tasks on EC2 so it can more 
>>>>> quickly get
>>>>> the task going when a node is done with a previous task (I may be 
>>>>> missing some
>>>>> nuances here with respect to Falkon, don't know much about this 
>>>>> yet!).   
>>>> We have run the Falkon dispatcher at UChicago and workers at ANL 
>>>> without any issues, so it can easily tolerate a few ms of latency.  
>>>> We haven't tried it across 10s of ms of latency links, but my 
>>>> instinct says that if you have enough workers, you might be able to 
>>>> hide the latency.  We'd have to experiment with it to see what 
>>>> happens.  We could potentially do some experiments between SDSC and 
>>>> ANL over a 50+ ms link, and see what difference in throughputs we get.
>>>>
>>>> Ioan
>>>>> If a gateway node is desired, this option sounds a lot like the 
>>>>> GRAM+LRM
>>>>> situation we use on VMs with the workspace service and will soon 
>>>>> use on EC2 via
>>>>> the workspace EC2 gateway we're implementing.  Start up one 
>>>>> gateway node and
>>>>> then add compute nodes which dynamically join the pool, they are 
>>>>> pointed to the
>>>>> GRAM node.
>>>>>
>>>>>  
>>>>>> All the nodes in a site are required by our site model to have a 
>>>>>> shared filesystem - we've talked about removing it, but I think 
>>>>>> that is still the case and if so, isn't going to change soon.     
>>>>> Setting up a shared filesystem in this environment is akin to 
>>>>> setting up the
>>>>> compute nodes to join an LRM pool.  The VMs can communicate over 
>>>>> the private
>>>>> network at EC2, you can instruct EC2 to let all the nodes be open 
>>>>> to each other
>>>>> (while simultaneously keeping a separate policy of blocking ports 
>>>>> from being
>>>>> open from the internet and other people's EC2 nodes).  The 
>>>>> non-file-serving
>>>>> nodes would simply need to know the private address of the 
>>>>> filesystem server
>>>>> (unless you are using a fancier network file system than NFS-style 
>>>>> ones).
>>>>> For background: every VM on EC2 currently gets a public address -- 
>>>>> NAT'd to a
>>>>> private address which is actually what the VM's one NIC is 
>>>>> configured with.
>>>>> There is a facility to open/forward specific network ports on the 
>>>>> public
>>>>> address to each VM either via a group policy or on a VM by VM basis.
>>>>>
>>>>> [...] 
>>>>>> Amazon also has a storage cloud, alongside its compute cloud. I 
>>>>>> know very little about that and have never thought about how it 
>>>>>> would fit into the above (if at all). Maybe someone else knows more.
>>>>>>     
>>>>> A VM template on EC2 is called an AMI which stands for Amazon 
>>>>> Machine Image.
>>>>> This is just a packaging thing but what it mostly means is that 
>>>>> the VM is
>>>>> stored on S3 and also registered into the EC2 system.
>>>>>
>>>>> When starting an instance of an AMI, the file is copied from S3 to 
>>>>> the
>>>>> hypervisor node (what we call propagation in the workspace 
>>>>> service).  After it
>>>>> is used, this file is deleted (an option in the workspace service 
>>>>> but there is
>>>>> also an option to save it back with any changes). So the VMs are 
>>>>> stored in S3 but anything that happens on them after being
>>>>> started is lost unless you manually do something about it.
>>>>>
>>>>> As for free scratch space, you get a good amount per node, 140G.  
>>>>> But the node
>>>>> could go down at any moment just like a physical resource.
>>>>>
>>>>> To harness S3 for safely persisting any data (or if you need more 
>>>>> space) you
>>>>> would need to actually run S3 clients on the VMs when they are run 
>>>>> on EC2.  You
>>>>> could alternatively mirror data between nodes assuming that all 
>>>>> would not go
>>>>> down at once.
>>>>> The good thing is that you do not pay transfer costs between S3 
>>>>> and EC2 if you
>>>>> choose to use S3 for big storage, you would only pay the "housing 
>>>>> fees" so to
>>>>> speak.
>>>>> Tim
>>>>> _______________________________________________
>>>>> Swift-devel mailing list
>>>>> Swift-devel at ci.uchicago.edu
>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>
>>>>>   
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From foster at mcs.anl.gov  Wed May 16 15:22:09 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Wed, 16 May 2007 15:22:09 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B4A6E.2040804@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov>
Message-ID: <464B67F1.5060408@mcs.anl.gov>

The people using SGE were just using it as an LRM, I think.

Ian.
>>> Regarding choice of LRM: have you looked at SGE? That is what quite 
>>> a few others seem to be using.
>>
>> Yes, we have. We also collaborate with others who do, as well as with 
>> Sun... As you may remember, Borja did the scheduling work for his 
>> thesis in the context of SGE. Last time we talked though, Torque was 
>> the scheduler of choice for the virtual cluster LRM so we used that.
>>
>> The usage of SGE you are referring to above -- is this in the context 
>> of virtualization projects, or as LRM for various Falkon-related 
>> applications?
>>


From benc at hawaga.org.uk  Wed May 16 15:44:48 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 20:44:48 +0000 (GMT)
Subject: [Swift-devel] Teragrid usage
In-Reply-To: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
References: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705162044290.20212@dildano.hawaga.org.uk>


On Wed, 16 May 2007, Veronika Nefedova wrote:

> I checked my Teragrid accounts and it looks like the Swift's allocation is
> almost completely used by now (or is it just for me ?):


I see different figures, but they also suggest that, yes, that account is empty.

-- 


From benc at hawaga.org.uk  Wed May 16 16:17:58 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 16 May 2007 21:17:58 +0000 (GMT)
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <1179330195.4473.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk> 
	<1179329342.4368.0.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0705161536450.22628@dildano.hawaga.org.uk>
	<1179330195.4473.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705162100150.22628@dildano.hawaga.org.uk>


On Wed, 16 May 2007, Mihael Hategan wrote:

> The translator does that bit. You hacked the translated file, but
> incompletely.

I used the translated file as it came out of Karajan.java - no manual 
editing.

So being worried that it had got broken, I made a test case that I think 
demonstrates my problem, and tried it on r740, r625 and r101 (those 
being an even spread over the evolution of Karajan.java) and got 
essentially the same results with all three of those versions (modulo 
output format changes).

I tried the following two programs on each of the above:

working:

   string m <simple_mapper;prefix="map1">;

   string f = @filename(m);

   print(f);

(it outputs map1)

not working:
   
   type foo {
     string txt;
   }

   foo m <simple_mapper;prefix="map1">;

   string f = @filename(m.txt);

   print(f);

(it gives the error I pasted before)

-- 


From wilde at mcs.anl.gov  Wed May 16 18:05:35 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Wed, 16 May 2007 18:05:35 -0500
Subject: [Swift-devel] Teragrid usage
In-Reply-To: <Pine.LNX.4.64.0705162044290.20212@dildano.hawaga.org.uk>
References: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
	<Pine.LNX.4.64.0705162044290.20212@dildano.hawaga.org.uk>
Message-ID: <464B8E3F.7020106@mcs.anl.gov>

Oi.  I'll see what I can do.

- Mike

Ben Clifford wrote, On 5/16/2007 3:44 PM:
> 
> On Wed, 16 May 2007, Veronika Nefedova wrote:
> 
>> I checked my Teragrid accounts and it looks like the Swift's allocation is
>> almost completely used by now (or is it just for me ?):
> 
> 
> I see different figures, but they also suggest that, yes, that account is empty.
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From benc at hawaga.org.uk  Thu May 17 05:08:19 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 17 May 2007 10:08:19 +0000 (GMT)
Subject: [Swift-devel] mappers on files that are inputs and outputs
In-Reply-To: <1179329342.4368.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0705161506410.22628@dildano.hawaga.org.uk>
	<1179329342.4368.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705171004360.22628@dildano.hawaga.org.uk>


On Wed, 16 May 2007, Mihael Hategan wrote:

> You should probably also add the input=true mapping parameter?

If I *remove* the input=true mapping parameter that the translator puts 
there, it works (which is consistent, I suppose, with this working when 
used as an output).

This is in bug 60, http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=60 
and I'll poke round at it more later - I can work round by using the CSV 
mapper for now.

-- 


From benc at hawaga.org.uk  Thu May 17 08:03:28 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 17 May 2007 13:03:28 +0000 (GMT)
Subject: [Swift-devel] tutorial code snippets look bad in Internet Explorer
Message-ID: <Pine.LNX.4.64.0705171302030.22628@dildano.hawaga.org.uk>


On at least one machine that I've seen, the code snippets at 
http://www.ci.uchicago.edu/swift/guides/tutorial.php come out all on one 
line. Does that happen for anyone here with that browser?

-- 


From hategan at mcs.anl.gov  Thu May 17 08:11:55 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 17 May 2007 16:11:55 +0300
Subject: [Swift-devel] tutorial code snippets look bad in Internet Explorer
In-Reply-To: <Pine.LNX.4.64.0705171302030.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705171302030.22628@dildano.hawaga.org.uk>
Message-ID: <1179407515.26179.0.camel@blabla.mcs.anl.gov>

Could be either the syntax highlighting or IE being what it is. Try
disabling JavaScript and see if it helps.

Mihael

On Thu, 2007-05-17 at 13:03 +0000, Ben Clifford wrote:
> On at least one machine that I've seen, the code snippets at 
> http://www.ci.uchicago.edu/swift/guides/tutorial.php come out all on one 
> line. Does that happen for anyone here with that browser?
> 


From keahey at mcs.anl.gov  Thu May 17 09:42:30 2007
From: keahey at mcs.anl.gov (Kate Keahey)
Date: Thu, 17 May 2007 09:42:30 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464B6746.7050907@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
Message-ID: <464C69D6.70909@mcs.anl.gov>


Ian Foster wrote:
> Kate:
> 
> I want to emphasize that I was *not* dismissing the issues below as 
> distractions.
> 
> What I meant was: given that you are working on developing a "virtual 
> cluster", which I am pretty sure will be able to execute Swift apps, 
> let's focus on getting that done, rather than worrying about "special 
> casing" it for Falkon, adding dynamic node acquisition, or the other 
> things that people started discussing as potential extensions.

We only now really began to discuss how to use VMs with Swift/Falkon -- 
the original set of issues you posted was just what was needed, it 
clearly inspired a very good discussion, and made me realize that I 
should have been talking to a wider set of people about this. Please, 
don't go back on us now... It also looks to me like there may be 
solutions that make more sense architecturally and are also easier to 
implement with the current state of virtualization tools. For example, 
if we can set up Falkon to provision single nodes operating in pull 
mode (pulling work from a "master"), various contextualization issues 
will become much easier.

> 
> I understand from our IM conversation today that the "virtual cluster" 
> is ready for us in a "static environment" such as some machines in our 
> lab. In a "dynamic environment" such as EC2, it is not quite ready for 
> use yet. Thus, you won't be able to get Swift running on EC2 tomorrow.

This is not quite accurate; static refers to statically assigned IPs -- 
we have control over our IPs and can assign them to the cluster nodes in 
the same way each time we deploy it. Amazon will choose new IPs for the 
nodes each time the cluster is deployed, so each time the configuration 
of the cluster will have to be adjusted to reflect different IP 
assignment to the nodes (but if we were to change the IPs on the cluster 
nodes in a local environment we would be just as dynamic).
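
To make that per-deployment adjustment concrete, here is a minimal 
sketch in Python of regenerating an /etc/hosts file from whatever 
addresses the provider hands out. It is an illustration only, not the 
workspace service's contextualization code: the function name 
render_hosts, the node names, the cluster.local domain and the 
addresses are all assumptions made up for the example.

```python
import ipaddress


def render_hosts(assignments, domain="cluster.local"):
    """Return /etc/hosts text for a {short_hostname: ip_string} mapping.

    The mapping and the domain are hypothetical; in a real deployment
    they would come from whatever the provider reports after launch.
    """
    lines = ["127.0.0.1\tlocalhost"]
    for name, ip in sorted(assignments.items()):
        # Raises ValueError early if the provider data is malformed.
        ipaddress.ip_address(ip)
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example addresses only; each new deployment would supply new ones.
    print(render_hosts({"headnode": "10.0.0.5", "worker1": "10.0.0.6"}), end="")
```

Rerunning it with the new name-to-address mapping after each deployment 
gives a hosts file that can be pushed to every node. Registering the 
workers with the Torque headnode would still be a separate step.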

But if you deploy just one node (e.g., a node operating in the pull mode 
as in the example above) the need for this configuration adjustment may 
go away (depending on what the node does) so everything may become much 
simpler.
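
As a toy picture of pull mode, here is a short Python sketch that uses 
threads and an in-process queue to stand in for the network. This is 
not Falkon's API or protocol; worker, run_pull_demo and the vm names 
are hypothetical. The point it tries to show is that each worker 
initiates every exchange, so the master never needs to know a worker's 
address or reach an open port on it.

```python
import queue
import threading


def worker(name, tasks, results):
    """Pull tasks from the master's queue until a None sentinel arrives."""
    while True:
        task = tasks.get()  # the worker asks for work; nothing is pushed to it
        if task is None:
            break
        results.put((name, task, task * task))  # stand-in for real work


def run_pull_demo(n_workers=3, n_tasks=10):
    """Run n_tasks through n_workers and return results ordered by task."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"vm{i}", tasks, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in range(n_tasks):
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected, key=lambda r: r[1])


if __name__ == "__main__":
    for name, task, value in run_pull_demo():
        print(name, task, value)
```

Which worker picks up which task varies from run to run; only the set 
of completed tasks is fixed.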

We can spend some time looking at deploying a VM on EC2 if it is of 
interest (as well as deploying a VM via the workspace service if that is 
of interest), we can run things on the deployed VM, etc. But I 
*strongly* argue that we spend at least some time defining what we want 
from this project, what is realistic to have in the short-term, what 
will be hard/impossible/inconvenient and try to build it systematically. 
Then we can figure out who does what and by when this is going to be done.


> 
> Ian.
> 
> 
> Kate Keahey wrote:
>> Ian,
>>
>> you seem to be referring to the necessary /etc/hosts configuration as 
>> well as workers registering with the torque headnode below as 
>> "distractions" -- I agree they can be very distracting, but in my 
>> experience without these distractions a cluster (virtual or physical) 
>> won't work in the way such clusters are typically expected to work.
>>
>> What I said in my mail is that we can set up a base cluster locally so 
>> that somebody like Ioan can finish the configuration (i.e., install 
>> Falkon on it). We will configure this cluster once and leave it 
>> deployed  as long as needed.
>>
>> Once we have the front-end to EC2 working (which we don't have yet 
>> although we are close) we will deploy this cluster on EC2 and provide 
>> methods that will automate this last little bit of configuration that 
>> *always* has to be done on deployment.
>>
>> I also think it is quite important that we spend the time tomorrow 
>> discussing what exactly we are trying to do -- right now, it looks to 
>> me like it might make more sense to not use clusters (it will help 
>> with the "distractions" if we don't).
>>
>> I realize that you are eager for us to get things to run -- I am eager 
>> too, but I honestly think we will get there faster if we plan better.
>>
>> Ian Foster wrote:
>>> Kate:
>>>
>>> I personally will be delighted if you could run the virtual cluster 
>>> on ec2 tomorrow. I know that there are lots of ways that you could 
>>> refine its config, local expts that could be performed, etc., but 
>>> perhaps we could try bypassing those things, which seem somewhat like 
>>> distractions to me?
>>>
>>> Ian
>>>
>>>
>>> Sent via BlackBerry from T-Mobile
>>> -----Original Message-----
>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>> Date: Wed, 16 May 2007 09:24:02
>>> To: itf at mcs.anl.gov
>>> Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu 
>>> <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja 
>>> Sotomayor <borja at borjanet.com>
>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>
>>>
>>>
>>> Ian Foster wrote:
>>>> Kate:
>>>>
>>>> If we configure the virtual cluster with a full LRM, as you propose 
>>>> (and, it seems, have already done--great work!), then we can use this 
>>>> to start Falkon executors--as we do today on regular clusters. So it 
>>>> seems to me that we have all we need. How about you and Ioan spend 
>>>> your time on Thursday running something on EC2, to make sure it works?
>>>
>>> As I suggest below, I think it would be easiest if we could deploy 
>>> and debug a small static cluster locally first, and we can probably 
>>> give it a shot tomorrow. We still don't have access to the Xen nodes 
>>> on TeraPort (although hopefully that might change by tomorrow) but I 
>>> asked Rick to rebuild a couple of nodes at ANL and he did, I think 
>>> for a test that should give us enough resources to play with.
>>>
>>> At the same time -- if there are multiple ways of doing this, and 
>>> perhaps better ways than simply using a virtual cluster, we should 
>>> discuss them now. It is not completely clear to me what the 
>>> relationship between Falkon and Swift is, and what the specific 
>>> objectives are (other than that dynamically provisioning resources is 
>>> required). It looks at this point like the objectives probably 
>>> overlap with what Ioan, Borja and I wanted to talk about (which I 
>>> thought was a separate project, but am thrilled to find out is 
>>> related) so how about we come up with a design tomorrow and post the 
>>> notes on this list (is this a good venue for that?) and then others 
>>> can shoot them down.
>>>
>>>> Regarding choice of LRM: have you looked at SGE? That is what quite 
>>>> a few others seem to be using.
>>>
>>> Yes, we have. We also collaborate with others who do, as well as with 
>>> Sun... As you may remember, Borja did the scheduling work for his 
>>> thesis in the context of SGE. Last time we talked though, Torque was 
>>> the scheduler of choice for the virtual cluster LRM so we used that.
>>>
>>> The usage of SGE you are referring to above -- is this in the context 
>>> of virtualization projects, or as LRM for various Falkon-related 
>>> applications?
>>>
>>>> Ian
>>>>
>>>>
>>>>
>>>> Sent via BlackBerry from T-Mobile
>>>> -----Original Message-----
>>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>>> Date: Tue, 15 May 2007 23:28:07
>>>> To: iraicu at cs.uchicago.edu
>>>> Cc:swift-devel at ci.uchicago.edu
>>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>>
>>>> First -- this is a very useful discussion; would it be possible to 
>>>> see all of it? We need to understand the requirements and trade-offs 
>>>> in some detail to figure out the best way to make this work. I see a 
>>>> few different interaction threads somewhat mixed up here though so 
>>>> just to make sure we are all on the same wavelength, here is some 
>>>> context.
>>>>
>>>> Ian and I have been talking on and off about providing a workspace 
>>>> service implementation with EC2 backend. The benefit for that would 
>>>> be that users could deploy the same VMs using the same interface to 
>>>> either TeraPort or EC2 or yet another resource provider. The 
>>>> workspace service would also provide some features on top of EC2 
>>>> (translating between PKI credentials and Amazon's paying accounts, 
>>>> contextualization as needed to make deployment dynamic). One 
>>>> application of interest for this was Swift. Last time we chatted 
>>>> about this though was in the context of using EC2 to provide a 
>>>> production platform for STAR runs (since virtualizing enough 
>>>> TeraPort to provide a production platform is taking a long time). 
>>>> This is what Tim and I are trying to make happen now.
>>>>
>>>> Since there was also interest in running Swift in VMs, Mike, Tibi 
>>>> and I met around February/March and agreed that a reasonable way to 
>>>> proceed will be for us to stand up a base virtual cluster somewhere 
>>>> locally (e.g., a static deployment on TeraPort) so that they can 
>>>> finish the configuration according to their needs, look at 
>>>> performance, figure out the best way to interact with it, and make 
>>>> sure that there are no VM-induced gotchas. All of this will be much 
>>>> easier to assess locally and on a static deployment. Then we'd make 
>>>> sure the cluster is dynamically deployable using the workspace 
>>>> service (on EC2 or whatever other provider). During the meeting (and 
>>>> over following emails) we agreed that the required "base cluster" 
>>>> would be configured with GRAM/Torque on the headnode plus a number 
>>>> of worker nodes, plus root privileges. We configured this cluster 
>>>> and it is ready to deploy. Are you saying now that in fact something 
>>>> different is needed?
>>>>
>>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>>> to discuss interaction between Falkon and the workspace service (not 
>>>> necessarily/exclusively in the EC2 context). I don't completely 
>>>> understand the relationship between swift and falkon -- are there 
>>>> specific applications or scenarios that you are trying to target in 
>>>> this exercise?
>>>>
>>>> Ioan Raicu wrote:
>>>>> Hi,
>>>>> See below:
>>>>>
>>>>> Tim Freeman wrote:
>>>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>>>
>>>>>>  
>>>>>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>>>>>> swift-devel people to look at the questions too.
>>>>>>>
>>>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>>>
>>>>>>>  
>>>>>>>> Dear All:
>>>>>>>>
>>>>>>>> I asked Kate if she and Tim could look into creating VM images 
>>>>>>>> that would allow us to run Swift applications on Amazon EC2. I 
>>>>>>>> think Kate is meeting with Ioan about this on Thursday (?).
>>>>>>>>
>>>>>>>> One issue that I thought would be good to discuss is what we'd 
>>>>>>>> want in that VM image. Perhaps this is obvious to the rest of 
>>>>>>>> you, but it isn't to me. A few thoughts:
>>>>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, 
>>>>>>>> and have the
>>>>>>>> "task dispatch" logic run on some external frontend system 
>>>>>>>> outside EC2.
>>>>>>>>       * I would think that we want to use Falkon to do the task 
>>>>>>>> dispatch. If so,
>>>>>>>> we need a Falkon executor on each VM, configured to check in 
>>>>>>>> with the Falkon
>>>>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that 
>>>>>>>> case, we would
>>>>>>>> want an SGE agent.)
>>>>>>>>       *  We need a way of getting data to and from the worker 
>>>>>>>> nodes. Do we want to
>>>>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>>>>> node? That
>>>>>>>> seems rather inefficient. Other options?
>>>>>>>>       * Should we preload the application code on each EC2 node?
>>>>>>>>       
>>>>>>> Here's a couple of approaches:
>>>>>>>
>>>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>>>> single     site.
>>>>>>>
>>>>>>> Something like falkon handles all the task dispatch and worker 
>>>>>>> node management. I don't know what that looks like at the moment 
>>>>>>> in Falkon, but the interface for Swift to send jobs into Falkon 
>>>>>>> sounds pretty straightforward and shouldn't need changing.
>>>>>>>     
>>>>>> So if I understand, here there would be no gateway+LRM but each 
>>>>>> EC2 node +
>>>>>> Falkon would need a port open to receive tasks?  Or does each node 
>>>>>> pull down
>>>>>> instructions OK from behind a firewall?
>>>>>>   
>>>>> Falkon supports both polling and notifications.  To use 
>>>>> notifications, there needs to be an open port on the worker :(
>>>>>> Is there a latency problem with running each node as an independent 
>>>>>> task
>>>>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>>>>> would be
>>>>>> better to put the queues to fill with tasks on EC2 so it can more 
>>>>>> quickly get
>>>>>> the task going when a node is done with a previous task (I may be 
>>>>>> missing some
>>>>>> nuances here with respect to Falkon, don't know much about this 
>>>>>> yet!).   
>>>>> We have run the Falkon dispatcher at UChicago and workers at ANL 
>>>>> without any issues, so it can easily tolerate a few ms of latency.  
>>>>> We haven't tried it across 10s of ms of latency links, but my 
>>>>> instinct says that if you have enough workers, you might be able to 
>>>>> hide the latency.  We'd have to experiment with it to see what 
>>>>> happens.  We could potentially do some experiments between SDSC and 
>>>>> ANL over a 50+ ms link, and see what difference in throughputs we get.
>>>>>
>>>>> Ioan
>>>>>> If a gateway node is desired, this option sounds a lot like the 
>>>>>> GRAM+LRM
>>>>>> situation we use on VMs with the workspace service and will soon 
>>>>>> use on EC2 via
>>>>>> the workspace EC2 gateway we're implementing.  Start up one 
>>>>>> gateway node and
>>>>>> then add compute nodes which dynamically join the pool, they are 
>>>>>> pointed to the
>>>>>> GRAM node.
>>>>>>
>>>>>>  
>>>>>>> All the nodes in a site are required by our site model to have a 
>>>>>>> shared filesystem - we've talked about removing it, but I think 
>>>>>>> that is still the case and if so, isn't going to change soon.     
>>>>>> Setting up a shared filesystem in this environment is akin to 
>>>>>> setting up the
>>>>>> compute nodes to join an LRM pool.  The VMs can communicate over 
>>>>>> the private
>>>>>> network at EC2, you can instruct EC2 to let all the nodes be open 
>>>>>> to each other
>>>>>> (while simultaneously keeping a separate policy of blocking ports 
>>>>>> from being
>>>>>> open from the internet and other people's EC2 nodes).  The 
>>>>>> non-file-serving
>>>>>> nodes would simply need to know the private address of the 
>>>>>> filesystem server
>>>>>> (unless you are using a fancier network file system than NFS-style 
>>>>>> ones).
>>>>>> For background: every VM on EC2 currently gets a public address -- 
>>>>>> NAT'd to a
>>>>>> private address which is actually what the VM's one NIC is 
>>>>>> configured with.
>>>>>> There is a facility to open/forward specific network ports on the 
>>>>>> public
>>>>>> address to each VM either via a group policy or on a VM by VM basis.
>>>>>>
>>>>>> [...]
>>>>>>> Amazon also has a storage cloud, alongside its compute cloud. I 
>>>>>>> know very little about that and have never thought about how it 
>>>>>>> would fit into the above (if at all). Maybe someone else knows more.
>>>>>>>     
>>>>>> A VM template on EC2 is called an AMI which stands for Amazon 
>>>>>> Machine Image.
>>>>>> This is just a packaging thing but what it mostly means is that 
>>>>>> the VM is
>>>>>> stored on S3 and also registered into the EC2 system.
>>>>>>
>>>>>> When starting an instance of an AMI, the file is copied from S3 to 
>>>>>> the
>>>>>> hypervisor node (what we call propagation in the workspace 
>>>>>> service).  After it
>>>>>> is used, this file is deleted (an option in the workspace service 
>>>>>> but there is
>>>>>> also an option to save it back with any changes). So the VMs are 
>>>>>> stored in S3 but anything that happens on them after being
>>>>>> started is lost unless you manually do something about it.
>>>>>>
>>>>>> As for free scratch space, you get a good amount per node, 140G.  
>>>>>> But the node
>>>>>> could go down at any moment just like a physical resource.
>>>>>>
>>>>>> To harness S3 for safely persisting any data (or if you need more 
>>>>>> space) you
>>>>>> would need to actually run S3 clients on the VMs when they are run 
>>>>>> on EC2.  You
>>>>>> could alternatively mirror data between nodes assuming that all 
>>>>>> would not go
>>>>>> down at once.
>>>>>> The good thing is that you do not pay transfer costs between S3 
>>>>>> and EC2 if you
>>>>>> choose to use S3 for big storage, you would only pay the "housing 
>>>>>> fees" so to
>>>>>> speak.
>>>>>> Tim
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>
>>>>>>   
>>>
>>
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From itf at mcs.anl.gov  Thu May 17 10:14:16 2007
From: itf at mcs.anl.gov (Ian Foster)
Date: Thu, 17 May 2007 15:14:16 +0000
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464C69D6.70909@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov>
Message-ID: <1350594402-1179414977-cardhu_blackberry.rim.net-526979985-@bwe005-cell00.bisx.prod.on.blackberry>

If the discussion is useful then by all means continue it.

I was concerned that if you were very close to having a virtual cluster that would work for us, then taking time to create a different virtual cluster design would slow us down. But maybe that won't happen.

Ian



-----Original Message-----
From: Kate Keahey <keahey at mcs.anl.gov>
Date: Thu, 17 May 2007 09:42:30 
To:Ian Foster <foster at mcs.anl.gov>
Cc:itf at mcs.anl.gov, swift-devel-bounces at ci.uchicago.edu,  Ioan Raicu <iraicu at cs.uchicago.edu>, swift-devel at ci.uchicago.edu,  Borja Sotomayor <borja at borjanet.com>
Subject: Re: [Swift-devel] swift-on-ec2


Ian Foster wrote:
> Kate:
> 
> I want to emphasize that I was *not* dismissing the issues below as 
> distractions.
> 
> What I meant was: given that you are working on developing a "virtual 
> cluster", which I am pretty sure will be able to execute Swift apps, 
> let's focus on getting that done, rather than worrying about "special 
> casing" it for Falkon, adding dynamic node acquisition, or the other 
> things that people started discussing as potential extensions.

We only now really began to discuss how to use VMs with Swift/Falkon -- 
the original set of issues you posted was just what was needed, it 
clearly inspired a very good discussion, and made me realize that I 
should have been talking to a wider set of people about this. Please, 
don't go back on us now... It also looks to me like there may be 
solutions that will make more sense both from the perspective of the 
architecture and will also be easier to implement with the current state 
of virtualization tools. For example, if we can set up Falkon to 
provision single nodes operating in pull mode (pulling work from a 
"master"), various contextualization issues will become much easier.

> 
> I understand from our IM conversation today that the "virtual cluster" 
> is ready for us in a "static environment" such as some machines in our 
> lab. In a "dynamic environment" such as EC2, it is not quite ready for 
> use yet. Thus, you won't be able to get Swift running on EC2 tomorrow.

This is not quite accurate; static refers to statically assigned IPs -- 
we have control over our IPs and can assign them to the cluster nodes in 
the same way each time we deploy it. Amazon will choose new IPs for the 
nodes each time the cluster is deployed, so each time the configuration 
of the cluster will have to be adjusted to reflect different IP 
assignment to the nodes (but if we were to change the IPs on the cluster 
nodes in a local environment we would be just as dynamic).

But if you deploy just one node (e.g., a node operating in the pull mode 
as in the example above) the need for this configuration adjustment may 
go away (depending on what the node does) so everything may become much 
simpler.
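
To make the "configuration adjustment" concrete, here is a minimal sketch 
of the kind of step that has to be redone on each deployment: regenerating 
/etc/hosts content from whatever addresses the provider handed out. The 
function name, the domain "cluster.local" and the node names are all 
assumptions made for illustration; this is not the workspace service's 
actual contextualization code.

```python
def render_hosts(assignments, domain="cluster.local"):
    """Build /etc/hosts text from a {node_name: ip_address} mapping.

    'domain' and the node names are hypothetical; the only point is that
    the file must be rebuilt whenever the IP assignment changes.
    """
    lines = ["127.0.0.1\tlocalhost"]
    # Sort by node name so the output is stable across deployments.
    for name, ip in sorted(assignments.items()):
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Addresses as they might look after one particular deployment.
    print(render_hosts({"head": "10.0.0.1", "worker1": "10.0.0.2"}), end="")
```

With statically assigned IPs the mapping never changes, so this runs once; 
on EC2 it would have to run on every node at each deployment. A single 
node that only pulls work needs no such mapping at all.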

We can spend some time looking at deploying a VM on EC2 if it is of 
interest (as well as deploying a VM via the workspace service if that is 
of interest), we can run things on the deployed VM, etc. But I 
*strongly* argue that we spend at least some time defining what we want 
from this project, what is realistic to have in the short-term, what 
will be hard/impossible/inconvenient and try to build it systematically. 
Then we can figure out who does what and by when this is going to be done.


> 
> Ian.
> 
> 
> Kate Keahey wrote:
>> Ian,
>>
>> you seem to be referring to the necessary /etc/hosts configuration as 
>> well as workers registering with the torque headnode below as 
>> "distractions" -- I agree they can be very distracting, but in my 
>> experience without these distractions a cluster (virtual or physical) 
>> won't work in the way such clusters are typically expected to work.
>>
>> What I said in my mail is that we can set up a base cluster locally so 
>> that somebody like Ioan can finish the configuration (i.e., install 
>> Falkon on it). We will configure this cluster once and leave it 
>> deployed  as long as needed.
>>
>> Once we have the front-end to EC2 working (which we don't have yet 
>> although we are close) we will deploy this cluster on EC2 and provide 
>> methods that will automate this last little bit of configuration that 
>> *always* has to be done on deployment.
>>
>> I also think it is quite important that we spend the time tomorrow 
>> discussing what exactly we are trying to do -- right now, it looks to 
>> me like it might make more sense to not use clusters (it will help 
>> with the "distractions" if we don't).
>>
>> I realize that you are eager for us to get things to run -- I am eager 
>> too, but I honestly think we will get there faster if we plan better.
>>
>> Ian Foster wrote:
>>> Kate:
>>>
>>> I personally will be delighted if you could run the virtual cluster 
>>> on ec2 tomorrow. I know that there are lots of ways that you could 
>>> refine its config, local expts that could be performed, etc., but 
>>> perhaps we could try bypassing those things, which seem somewhat like 
>>> distractions to me?
>>>
>>> Ian
>>>
>>>
>>> Sent via BlackBerry from T-Mobile -----Original Message-----
>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>> Date: Wed, 16 May 2007 09:24:02 To:itf at mcs.anl.gov
>>> Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu 
>>> <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja 
>>> Sotomayor <borja at borjanet.com>
>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>
>>>
>>>
>>> Ian Foster wrote:
>>>> Kate:
>>>>
>>>> If we configure the virtual cluster with a full LRM, as you propose 
>>>> (and it seems you have already done--great work!), then we can use this 
>>>> to start Falkon executors--as we do today on regular clusters. So it 
>>>> seems to me that we have all we need. How about you and Ioan spend 
>>>> your time on Thursday running something on EC2, to make sure it works?
>>>
>>> As I suggest below, I think it would be easiest if we could deploy 
>>> and debug a small static cluster locally first, and we can probably 
>>> give it a shot tomorrow. We still don't have access to the Xen nodes 
>>> on TeraPort (although hopefully that might change by tomorrow) but I 
>>> asked Rick to rebuild a couple of nodes at ANL and he did, I think 
>>> for a test that should give us enough resources to play with.
>>>
>>> At the same time -- if there are multiple ways of doing this, and 
>>> perhaps better ways than simply using a virtual cluster, we should 
>>> discuss them now. It is not completely clear to me what the 
>>> relationship between Falkon and Swift is, and what the specific 
>>> objectives are (other than that dynamically provisioning resources is 
>>> required). It looks at this point like the objectives probably 
>>> overlap with what Ioan, Borja and I wanted to talk about (which I 
>>> thought was a separate project, but am thrilled to find out is 
>>> related) so how about we come up with a design tomorrow and post the 
>>> notes on this list (is this a good venue for that?) and then others 
>>> can shoot them down.
>>>
>>>> Regarding choice of LRM: have you looked at SGE? That is what quite 
>>>> a few others seem to be using.
>>>
>>> Yes, we have. We also collaborate with others who do, as well as with 
>>> Sun... As you may remember, Borja did the scheduling work for his 
>>> thesis in the context of SGE. Last time we talked though, Torque was 
>>> the scheduler of choice for the virtual cluster LRM so we used that.
>>>
>>> The usage of SGE you are referring to above -- is this in the context 
>>> of virtualization projects, or as LRM for various Falkon-related 
>>> applications?
>>>
>>>> Ian
>>>>
>>>>
>>>>
>>>> Sent via BlackBerry from T-Mobile -----Original Message-----
>>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>>> Date: Tue, 15 May 2007 23:28:07 To:iraicu at cs.uchicago.edu
>>>> Cc:swift-devel at ci.uchicago.edu
>>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>>
>>>> First -- this is a very useful discussion, would it be possible to 
>>>> see all of it? We need to understand the requirements and trade-offs 
>>>> in some detail to figure out the best way to make this work. I see a 
>>>> few different interaction threads somewhat mixed up here though so 
>>>> just to make sure we are all on the same wavelength, here is some 
>>>> context.
>>>>
>>>> Ian and I have been talking on and off about providing a workspace 
>>>> service implementation with EC2 backend. The benefit for that would 
>>>> be that users could deploy the same VMs using the same interface to 
>>>> either TeraPort or EC2 or yet another resource provider. The 
>>>> workspace service would also provide some features on top of EC2 
>>>> (translating between PKI credentials and Amazon's paying accounts, 
>>>> contextualization as needed to make deployment dynamic). One 
>>>> application of interest for this was Swift. Last time we chatted 
>>>> about this though was in the context of using EC2 to provide a 
>>>> production platform for STAR runs (since virtualizing enough 
>>>> TeraPort to provide a production platform is taking a long time). 
>>>> This is what Tim and I are trying to make happen now.
>>>>
>>>> Since there was also interest in running Swift in VMs, Mike, Tibi 
>>>> and I met around February/March and agreed that a reasonable way to 
>>>> proceed will be for us to stand up a base virtual cluster somewhere 
>>>> locally (e.g., a static deployment on TeraPort) so that they can 
>>>> finish the configuration according to their needs, look at 
>>>> performance, figure out the best way to interact with it, and make 
>>>> sure that there are no VM-induced gotchas. All of this will be much 
>>>> easier to assess locally and on a static deployment. Then we'd make 
>>>> sure the cluster is dynamically deployable using the workspace 
>>>> service (on EC2 or whatever other provider). During the meeting (and 
>>>> over following emails) we agreed that the required "base cluster" 
>>>> would be configured with GRAM/Torque on the headnode plus a number 
>>>> of worker nodes, plus root privileges. We configured this cluster 
>>>> and it is ready to deploy. Are you saying now that in fact something 
>>>> different is needed?
>>>>
>>>> As Ian says, Borja and I were planning to meet with Ioan on Thursday 
>>>> to discuss interaction between Falkon and the workspace service (not 
>>>> necessarily/exclusively in the EC2 context). I don't completely 
>>>> understand the relationship between swift and falkon -- are there 
>>>> specific applications or scenarios that you are trying to target in 
>>>> this exercise?
>>>>
>>>> Ioan Raicu wrote:
>>>>> Hi,
>>>>> See below:
>>>>>
>>>>> Tim Freeman wrote:
>>>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>>>
>>>>>>  
>>>>>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>>>>>> swift-devel people to look at the questions too.
>>>>>>>
>>>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>>>
>>>>>>>  
>>>>>>>> Dear All:
>>>>>>>>       
>>>>>>>                                                                                 
>>>>>>>  
>>>>>>>> I asked Kate if she and Tim could look into creating VM images 
>>>>>>>> that would allow us to run Swift applications on Amazon EC2. I 
>>>>>>>> think Kate is meeting with Ioan about this on Thursday (?).
>>>>>>>>       
>>>>>>>                                                                                 
>>>>>>>  
>>>>>>>> One issue that I thought would be good to discuss is what we'd 
>>>>>>>> want in that VM image. Perhaps this is obvious to the rest of 
>>>>>>>> you, but it isn't to me. A few thoughts:
>>>>>>>>       * I'm assuming that we want to run "workers" on EC2 nodes, 
>>>>>>>> and have the
>>>>>>>> "task dispatch" logic run on some external frontend system 
>>>>>>>> outside EC2.
>>>>>>>>       * I would think that we want to use Falkon to do the task 
>>>>>>>> dispatch. If so,
>>>>>>>> we need a Falkon executor on each VM, configured to check in 
>>>>>>>> with the Falkon
>>>>>>>> dispatcher. (Alternatively, we could use, say, SGE: in that 
>>>>>>>> case, we would
>>>>>>>> want an SGE agent.)
>>>>>>>>       *  We need a way of getting data to and from the worker 
>>>>>>>> nodes. Do we want to
>>>>>>>> run a file system across the EC2 nodes and the external frontend 
>>>>>>>> node? That
>>>>>>>> seems rather inefficient. Other options?
>>>>>>>>       * Should we preload the application code on each EC2 node?
>>>>>>>>       
>>>>>>> Here's a couple of approaches:
>>>>>>>
>>>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>>>> single     site.
>>>>>>>
>>>>>>> Something like falkon handles all the task dispatch and worker 
>>>>>>> node management. I don't know what that looks like at the moment 
>>>>>>> in Falkon, but the interface for Swift to send jobs into Falkon 
>>>>>>> sounds pretty straightforward and shouldn't need changing.
>>>>>>>     
>>>>>> So if I understand, here there would be no gateway+LRM but each 
>>>>>> EC2 node +
>>>>>> Falkon would need a port open to receive tasks?  Or does each node 
>>>>>> pull down
>>>>>> instructions OK from behind a firewall?
>>>>>>   
>>>>> Falkon supports both polling and notifications.  To use 
>>>>> notifications, there needs to be an open port on the worker :(
>>>>>> Is there a latency problem with running each node as an independent 
>>>>>> task
>>>>>> receiver with the dispatcher off-site from EC2?  I would think it 
>>>>>> would be
>>>>>> better to put the queues to fill with tasks on EC2 so it can more 
>>>>>> quickly get
>>>>>> the task going when a node is done with a previous task (I may be 
>>>>>> missing some
>>>>>> nuances here with respect to Falkon, don't know much about this 
>>>>>> yet!).   
>>>>> We have run the Falkon dispatcher at UChicago and workers at ANL 
>>>>> without any issues, so it can easily tolerate a few ms of latency.  
>>>>> We haven't tried it across 10s of ms of latency links, but my 
>>>>> instinct says that if you have enough workers, you might be able to 
>>>>> hide the latency.  We'd have to experiment with it to see what 
>>>>> happens.  We could potentially do some experiments between SDSC and 
>>>>> ANL over a 50+ ms link, and see what difference in throughputs we get.
>>>>>
>>>>> Ioan
>>>>>> If a gateway node is desired, this option sounds a lot like the 
>>>>>> GRAM+LRM
>>>>>> situation we use on VMs with the workspace service and will soon 
>>>>>> use on EC2 via
>>>>>> the workspace EC2 gateway we're implementing.  Start up one 
>>>>>> gateway node and
>>>>>> then add compute nodes which dynamically join the pool, they are 
>>>>>> pointed to the
>>>>>> GRAM node.
>>>>>>
>>>>>>  
>>>>>>> All the nodes in a site are required by our site model to have a 
>>>>>>> shared filesystem - we've talked about removing it, but I think 
>>>>>>> that is still the case and if so, isn't going to change soon.     
>>>>>> Setting up a shared filesystem in this environment is akin to 
>>>>>> setting up the
>>>>>> compute nodes to join an LRM pool.  The VMs can communicate over 
>>>>>> the private
>>>>>> network at EC2, you can instruct EC2 to let all the nodes be open 
>>>>>> to each other
>>>>>> (while simultaneously keeping a separate policy of blocking ports 
>>>>>> from being
>>>>>> open from the internet and other people's EC2 nodes).  The 
>>>>>> non-file-serving
>>>>>> nodes would simply need to know the private address of the 
>>>>>> filesystem server
>>>>>> (unless you are using a fancier network file system than NFS-style 
>>>>>> ones).
>>>>>> For background: every VM on EC2 currently gets a public address -- 
>>>>>> NAT'd to a
>>>>>> private address which is actually what the VM's one NIC is 
>>>>>> configured with.
>>>>>> There is a facility to open/forward specific network ports on the 
>>>>>> public
>>>>>> address to each VM either via a group policy or on a VM by VM basis.
>>>>>>
>>>>>> [...]
>>>>>>> Amazon also has a storage cloud, alongside its compute cloud. I 
>>>>>>> know very little about that and have never thought about how it 
>>>>>>> would fit into the above (if at all). Maybe someone else knows more.
>>>>>>>     
>>>>>> A VM template on EC2 is called an AMI which stands for Amazon 
>>>>>> Machine Image.
>>>>>> This is just a packaging thing but what it mostly means is that 
>>>>>> the VM is
>>>>>> stored on S3 and also registered into the EC2 system.
>>>>>>
>>>>>> When starting an instance of an AMI, the file is copied from S3 to 
>>>>>> the
>>>>>> hypervisor node (what we call propagation in the workspace 
>>>>>> service).  After it
>>>>>> is used, this file is deleted (an option in the workspace service 
>>>>>> but there is
>>>>>> also an option to save it back with any changes). So the VMs are 
>>>>>> stored in S3 but anything that happens on them after being
>>>>>> started is lost unless you manually do something about it.
>>>>>>
>>>>>> As for free scratch space, you get a good amount per node, 140G.  
>>>>>> But the node
>>>>>> could go down at any moment just like a physical resource.
>>>>>>
>>>>>> To harness S3 for safely persisting any data (or if you need more 
>>>>>> space) you
>>>>>> would need to actually run S3 clients on the VMs when they are run 
>>>>>> on EC2.  You
>>>>>> could alternatively mirror data between nodes assuming that all 
>>>>>> would not go
>>>>>> down at once.
>>>>>> The good thing is that you do not pay transfer costs between S3 
>>>>>> and EC2 if you
>>>>>> choose to use S3 for big storage, you would only pay the "housing 
>>>>>> fees" so to
>>>>>> speak.
>>>>>> Tim
>>>>>>
>>>>>>   
>>>
>>
> 

-- 

Kate Keahey,
Mathematics & CS Division, Argonne National Laboratory
Computation Institute, University of Chicago


From iraicu at cs.uchicago.edu  Thu May 17 11:10:16 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 17 May 2007 11:10:16 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464C69D6.70909@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk><20070515154500.ad1600bf.tfreeman@mcs.anl.gov><464A24AF.7080801@cs.uchicago.edu>
	<464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov>
Message-ID: <464C7E68.1030400@cs.uchicago.edu>


Kate Keahey wrote:
>
>
> Ian Foster wrote:
>> Kate:
>>
>> I want to emphasize that I was *not* dismissing the issues below as 
>> distractions.
>>
>> What I meant was: given that you are working on developing a "virtual 
>> cluster", which I am pretty sure will be able to execute Swift apps, 
>> let's focus on getting that done, rather than worrying about "special 
>> casing" it for Falkon, adding dynamic node acquisition, or the other 
>> things that people started discussing as potential extensions.
>
> We only now really began to discuss how to use VMs with Swift/Falkon 
> -- the original set of issues you posted was just what was needed, it 
> clearly inspired a very good discussion, and made me realize that I 
> should have been talking to a wider set of people about this. Please, 
> don't go back on us now... It also looks to me like there may be 
> solutions that will make more sense both from the perspective of the 
> architecture and will also be easier to implement with the current 
> state of virtualization tools. For example, if we can set up Falkon to 
> provision single nodes operating in pull mode (pulling work from a 
> "master"), various contextualization issues will become much easier.
>
>>
>> I understand from our IM conversation today that the "virtual 
>> cluster" is ready for us in a "static environment" such as some 
>> machines in our lab. In a "dynamic environment" such as EC2, it is 
>> not quite ready for use yet. Thus, you won't be able to get Swift 
>> running on EC2 tomorrow.
>
> This is not quite accurate; static refers to statically assigned IPs 
> -- we have control over our IPs and can assign them to the cluster 
> nodes in the same way each time we deploy it. Amazon will choose new 
> IPs for the nodes each time the cluster is deployed, so each time the 
> configuration of the cluster will have to be adjusted to reflect 
> different IP assignment to the nodes (but if we were to change the IPs 
> on the cluster nodes in a local environment we would be just as dynamic).
>
> But if you deploy just one node (e.g., a node operating in the pull 
> mode as in the example above) the need for this configuration 
> adjustment may go away (depending on what the node does) so everything 
> may become much simpler.
Currently, a Falkon executor (the worker code), upon bootstrapping, makes 
one WS call to the Falkon dispatcher (running in a GT4 container) to 
register its name and the port on which its notification engine is 
listening.  Once this is done, the executor goes into a listen mode 
for notifications, and only acts (sends WS calls out) upon the reception 
of a notification.  So, the VMs that run the Falkon executors can get 
DHCP addresses, and the registration message will include all the 
information the Falkon dispatcher needs in order to contact the 
respective Falkon executor!  Now, the one configuration parameter 
that we must have is the location of the Falkon dispatcher.  If we have 
it running in a static location (a well-known machine and port), then 
this can be hard-coded into the bootstrapping scripts, and no other 
configuration is needed.  If the dispatcher does not have a static resource 
to run on (i.e., it runs in another VM), then this information needs to 
be passed to the executor bootstrapping scripts.
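
The registration flow described above can be sketched roughly as follows. 
This is an in-process model only, not Falkon's real WS interface: the 
class names, the FALKON_DISPATCHER environment variable and the default 
host/port are all made up for illustration. It shows the two points that 
matter: the executor reports its own (DHCP-assigned) address and 
notification port in a single registration call, and the only thing it has 
to be told in advance is where the dispatcher lives.

```python
import os
from dataclasses import dataclass, field

# Hypothetical hard-coded location for a dispatcher on a well-known machine.
DEFAULT_DISPATCHER = "dispatcher.example.org:50001"


def dispatcher_location(env=None):
    """Return (host, port) of the dispatcher.

    FALKON_DISPATCHER is an invented variable name: it stands for the value
    passed to the bootstrap script when the dispatcher runs in another VM.
    Without it, the hard-coded default is used.
    """
    env = os.environ if env is None else env
    host, _, port = env.get("FALKON_DISPATCHER", DEFAULT_DISPATCHER).rpartition(":")
    return host, int(port)


@dataclass
class Dispatcher:
    """Stand-in for the dispatcher: records where each executor listens."""
    executors: dict = field(default_factory=dict)

    def register(self, name, address, port):
        # The single registration call made by an executor at bootstrap.
        self.executors[name] = (address, port)

    def notification_target(self, name):
        # Where a notification for this executor would be sent.
        return self.executors[name]


@dataclass
class Executor:
    name: str
    address: str  # whatever address DHCP gave this VM
    port: int     # port the notification listener is bound to

    def bootstrap(self, dispatcher):
        # Register once, then (in the real system) wait for notifications.
        dispatcher.register(self.name, self.address, self.port)


if __name__ == "__main__":
    d = Dispatcher()
    Executor("vm-1", "10.0.0.7", 50100).bootstrap(d)
    print(dispatcher_location({}), d.notification_target("vm-1"))
```

Because the executor supplies its own address at registration time, 
nothing on the dispatcher side has to be reconfigured when the VM's IP 
changes between deployments.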

Ioan
>
> We can spend some time looking at deploying a VM on EC2 if it is of 
> interest (as well as deploying a VM via the workspace service if that 
> is of interest), we can run things on the deployed VM, etc. But I 
> *strongly* argue that we spend at least some time defining what we 
> want from this project, what is realistic to have in the short-term, 
> what will be hard/impossible/inconvenient and try to build it 
> systematically. Then we can figure out who does what and by when this 
> is going to be done.
>
>
>>
>> Ian.
>>
>>
>> Kate Keahey wrote:
>>> Ian,
>>>
>>> you seem to be referring to the necessary /etc/hosts configuration 
>>> as well as workers registering with the torque headnode below as 
>>> "distractions" -- I agree they can be very distracting, but in my 
>>> experience without these distractions a cluster (virtual or 
>>> physical) won't work in the way such clusters are typically expected 
>>> to work.
>>>
>>> What I said in my mail is that we can set up a base cluster locally 
>>> so that somebody like Ioan can finish the configuration (i.e., 
>>> install Falkon on it). We will configure this cluster once and leave 
>>> it deployed  as long as needed.
>>>
>>> Once we have the front-end to EC2 working (which we don't have yet 
>>> although we are close) we will deploy this cluster on EC2 and 
>>> provide methods that will automate this last little bit of 
>>> configuration that *always* has to be done on deployment.
>>>
>>> I also think it is quite important that we spend the time tomorrow 
>>> discussing what exactly we are trying to do -- right now, it looks 
>>> to me like it might make more sense to not use clusters (it will 
>>> help with the "distractions" if we don't).
>>>
>>> I realize that you are eager for us to get things to run -- I am 
>>> eager too, but I honestly think we will get there faster if we plan 
>>> better.
>>>
>>> Ian Foster wrote:
>>>> Kate:
>>>>
>>>> I personally will be delighted if you could run the virtual cluster 
>>>> on ec2 tomorrow. I know that there are lots of ways that you could 
>>>> refine its config, local expts that could be performed, etc., but 
>>>> perhaps we could try bypassing those things, which seem somewhat 
>>>> like distractions to me?
>>>>
>>>> Ian
>>>>
>>>>
>>>> Sent via BlackBerry from T-Mobile -----Original Message-----
>>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>>> Date: Wed, 16 May 2007 09:24:02 To:itf at mcs.anl.gov
>>>> Cc:swift-devel-bounces at ci.uchicago.edu, Ioan Raicu 
>>>> <iraicu at cs.uchicago.edu>,  swift-devel at ci.uchicago.edu, Borja 
>>>> Sotomayor <borja at borjanet.com>
>>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>>
>>>>
>>>>
>>>> Ian Foster wrote:
>>>>> Kate:
>>>>>
>>>>> If we configure the virtual cluster with a full LRM, as you 
>>>>> propose (and it seems you have already done--great work!), then we can 
>>>>> use this to start Falkon executors--as we do today on regular 
>>>>> clusters. So it seems to me that we have all we need. How about 
>>>>> you and Ioan spend your time on Thursday running something on EC2, 
>>>>> to make sure it works?
>>>>
>>>> As I suggest below, I think it would be easiest if we could deploy 
>>>> and debug a small static cluster locally first, and we can probably 
>>>> give it a shot tomorrow. We still don't have access to the Xen 
>>>> nodes on TeraPort (although hopefully that might change by 
>>>> tomorrow) but I asked Rick to rebuild a couple of nodes at ANL and 
>>>> he did, I think for a test that should give us enough resources to 
>>>> play with.
>>>>
>>>> At the same time -- if there are multiple ways of doing this, and 
>>>> perhaps better ways than simply using a virtual cluster, we should 
>>>> discuss them now. It is not completely clear to me what the 
>>>> relationship between Falkon and Swift is, and what the specific 
>>>> objectives are (other than that dynamically provisioning resources 
>>>> is required). It looks at this point like the objectives probably 
>>>> overlap with what Ioan, Borja and I wanted to talk about (which I 
>>>> thought was a separate project, but am thrilled to find out is 
>>>> related) so how about we come up with a design tomorrow and post 
>>>> the notes on this list (is this a good venue for that?) and then 
>>>> others can shoot them down.
>>>>
>>>>> Regarding choice of LRM: have you looked at SGE? That is what 
>>>>> quite a few others seem to be using.
>>>>
>>>> Yes, we have. We also collaborate with others who do, as well as 
>>>> with Sun... As you may remember, Borja did the scheduling work for 
>>>> his thesis in the context of SGE. Last time we talked though, 
>>>> Torque was the scheduler of choice for the virtual cluster LRM so 
>>>> we used that.
>>>>
>>>> The usage of SGE you are referring to above -- is this in the 
>>>> context of virtualization projects, or as LRM for various 
>>>> Falkon-related applications?
>>>>
>>>>> Ian
>>>>>
>>>>>
>>>>>
>>>>> Sent via BlackBerry from T-Mobile
>>>>>
>>>>> -----Original Message-----
>>>>> From: Kate Keahey <keahey at mcs.anl.gov>
>>>>> Date: Tue, 15 May 2007 23:28:07
>>>>> To: iraicu at cs.uchicago.edu
>>>>> Cc: swift-devel at ci.uchicago.edu
>>>>> Subject: Re: [Swift-devel] swift-on-ec2
>>>>>
>>>>> First -- this is a very useful discussion; would it be possible to 
>>>>> see all of it? We need to understand the requirements and 
>>>>> trade-offs in some detail to figure out the best way to make this 
>>>>> work. I see a few different interaction threads somewhat mixed up 
>>>>> here though so just to make sure we are all on the same 
>>>>> wavelength, here is some context.
>>>>>
>>>>> Ian and I have been talking on and off about providing a workspace 
>>>>> service implementation with EC2 backend. The benefit for that 
>>>>> would be that users could deploy the same VMs using the same 
>>>>> interface to either TeraPort or EC2 or yet another resource 
>>>>> provider. The workspace service would also provide some features 
>>>>> on top of EC2 (translating between PKI credentials and Amazon's 
>>>>> paying accounts, contextualization as needed to make deployment 
>>>>> dynamic). One application of interest for this was Swift. Last 
>>>>> time we chatted about this though was in the context of using EC2 
>>>>> to provide a production platform for STAR runs (since virtualizing 
>>>>> enough TeraPort to provide a production platform is taking a long 
>>>>> time). This is what Tim and I are trying to make happen now.
>>>>>
>>>>> Since there was also interest in running Swift in VMs, Mike, Tibi 
>>>>> and I met around February/March and agreed that a reasonable way 
>>>>> to proceed will be for us to stand up a base virtual cluster 
>>>>> somewhere locally (e.g., a static deployment on TeraPort) so that 
>>>>> they can finish the configuration according to their needs, look 
>>>>> at performance, figure out the best way to interact with it, and 
>>>>> make sure that there are no VM-induced gotchas. All of this will 
>>>>> be much easier to assess locally and on a static deployment. Then 
>>>>> we'd make sure the cluster is dynamically deployable using the 
>>>>> workspace service (on EC2 or whatever other provider). During the 
>>>>> meeting (and over following emails) we agreed that the required 
>>>>> "base cluster" would be configured with GRAM/Torque on the 
>>>>> headnode plus a number of worker nodes, plus root privileges. We 
>>>>> configured this cluster and it is ready to deploy. Are you saying 
>>>>> now that in fact something different is needed?
>>>>>
>>>>> As Ian says, Borja and I were planning to meet with Ioan on 
>>>>> Thursday to discuss interaction between Falkon and the workspace 
>>>>> service (not necessarily/exclusively in the EC2 context). I don't 
>>>>> completely understand the relationship between swift and falkon -- 
>>>>> are there specific applications or scenarios that you are trying 
>>>>> to target in this exercise?
>>>>>
>>>>> Ioan Raicu wrote:
>>>>>> Hi,
>>>>>> See below:
>>>>>>
>>>>>> Tim Freeman wrote:
>>>>>>> On Tue, 15 May 2007 16:20:03 +0000 (GMT)
>>>>>>> Ben Clifford <benc at hawaga.org.uk> wrote:
>>>>>>>
>>>>>>>  
>>>>>>>> Ian asked about this elsewhere, but it's perhaps interesting for 
>>>>>>>> swift-devel people to look at the questions too.
>>>>>>>>
>>>>>>>> On Tue, 15 May 2007, Ian Foster wrote:
>>>>>>>>
>>>>>>>>  
>>>>>>>>> Dear All:
>>>>>>>>>
>>>>>>>>> I asked Kate if she and Tim could look into creating VM images
>>>>>>>>> that would allow us to run Swift applications on Amazon EC2. I
>>>>>>>>> think Kate is meeting with Ioan about this on Thursday (?).
>>>>>>>>>
>>>>>>>>> One issue that I thought would be good to discuss is what we'd
>>>>>>>>> want in that VM image. Perhaps this is obvious to the rest of
>>>>>>>>> you, but it isn't to me. A few thoughts:
>>>>>>>>>   * I'm assuming that we want to run "workers" on EC2 nodes,
>>>>>>>>>     and have the "task dispatch" logic run on some external
>>>>>>>>>     frontend system outside EC2.
>>>>>>>>>   * I would think that we want to use Falkon to do the task
>>>>>>>>>     dispatch. If so, we need a Falkon executor on each VM,
>>>>>>>>>     configured to check in with the Falkon dispatcher.
>>>>>>>>>     (Alternatively, we could use, say, SGE: in that case, we
>>>>>>>>>     would want an SGE agent.)
>>>>>>>>>   * We need a way of getting data to and from the worker nodes.
>>>>>>>>>     Do we want to run a file system across the EC2 nodes and the
>>>>>>>>>     external frontend node? That seems rather inefficient. Other
>>>>>>>>>     options?
>>>>>>>>>   * Should we preload the application code on each EC2 node?
>>>>>>>>>
>>>>>>>> Here's a couple of approaches:
>>>>>>>>
>>>>>>>>  1) swift regards all the EC2 nodes that we are paying for as a 
>>>>>>>> single site.
>>>>>>>>
>>>>>>>> Something like falkon handles all the task dispatch and worker 
>>>>>>>> node management. I don't know what that looks like at the 
>>>>>>>> moment in Falkon, but the interface for Swift to send jobs into 
>>>>>>>> Falkon sounds pretty straightforward and shouldn't need changing.
>>>>>>>>     
>>>>>>> So if I understand, here there would be no gateway+LRM but each EC2
>>>>>>> node + Falkon would need a port open to receive tasks?  Or does each
>>>>>>> node pull down instructions OK from behind a firewall?
>>>>>>>   
>>>>>> Falkon supports both polling and notifications.  To use 
>>>>>> notifications, there needs to be an open port on the worker :(
>>>>>>> Is there a latency problem with running each node as an independent
>>>>>>> task receiver with the dispatcher off-site from EC2?  I would think
>>>>>>> it would be better to put the queues to fill with tasks on EC2 so it
>>>>>>> can more quickly get the task going when a node is done with a
>>>>>>> previous task (I may be missing some nuances here with respect to
>>>>>>> Falkon, don't know much about this yet!).
>>>>>> We have run the Falkon dispatcher at UChicago and workers at ANL 
>>>>>> without any issues, so it can easily tolerate a few ms of 
>>>>>> latency.  We haven't tried it across 10s of ms of latency links, 
>>>>>> but my instinct says that if you have enough workers, you might 
>>>>>> be able to hide the latency.  We'd have to experiment with it to 
>>>>>> see what happens.  We could potentially do some experiments 
>>>>>> between SDSC and ANL over a 50+ ms link, and see what difference 
>>>>>> in throughputs we get.
>>>>>>
>>>>>> Ioan
>>>>>>> If a gateway node is desired, this option sounds a lot like the 
>>>>>>> GRAM+LRM
>>>>>>> situation we use on VMs with the workspace service and will soon 
>>>>>>> use on EC2 via
>>>>>>> the workspace EC2 gateway we're implementing.  Start up one 
>>>>>>> gateway node and
>>>>>>> then add compute nodes which dynamically join the pool, they are 
>>>>>>> pointed to the
>>>>>>> GRAM node.
>>>>>>>
>>>>>>>  
>>>>>>>> All the nodes in a site are required by our site model to have 
>>>>>>>> a shared filesystem - we've talked about removing it, but I 
>>>>>>>> think that is still the case and if so, isn't going to change 
>>>>>>>> soon.     
>>>>>>> Setting up a shared filesystem in this environment is akin to 
>>>>>>> setting up the
>>>>>>> compute nodes to join an LRM pool.  The VMs can communicate over 
>>>>>>> the private
>>>>>>> network at EC2, you can instruct EC2 to let all the nodes be 
>>>>>>> open to each other
>>>>>>> (while simultaneously keeping a separate policy of blocking 
>>>>>>> ports from being
>>>>>>> open from the internet and other people's EC2 nodes).  The 
>>>>>>> non-file-serving
>>>>>>> nodes would simply need to know the private address of the 
>>>>>>> filesystem server
>>>>>>> (unless you are using a fancier network file system than 
>>>>>>> NFS-style ones).
>>>>>>> For background: every VM on EC2 currently gets a public address 
>>>>>>> -- NAT'd to a
>>>>>>> private address which is actually what the VM's one NIC is 
>>>>>>> configured with.
>>>>>>> There is a facility to open/forward specific network ports on 
>>>>>>> the public
>>>>>>> address to each VM either via a group policy or on a VM by VM 
>>>>>>> basis.
>>>>>>>
>>>>>>> [...]
>>>>>>>> Amazon also has a storage cloud, alongside its compute cloud. I 
>>>>>>>> know very little about that and have never thought about how it 
>>>>>>>> would fit into the above (if at all). Maybe someone else knows 
>>>>>>>> more.
>>>>>>>>     
>>>>>>> A VM template on EC2 is called an AMI which stands for Amazon 
>>>>>>> Machine Image.
>>>>>>> This is just a packaging thing but what it mostly means is that 
>>>>>>> the VM is
>>>>>>> stored on S3 and also registered into the EC2 system.
>>>>>>>
>>>>>>> When starting an instance of an AMI, the file is copied from S3 
>>>>>>> to the
>>>>>>> hypervisor node (what we call propagation in the workspace 
>>>>>>> service).  After it
>>>>>>> is used, this file is deleted (an option in the workspace 
>>>>>>> service but there is
>>>>>>> also an option to save it back with any changes). So the VMs are 
>>>>>>> stored in S3 but anything that happens on them after being
>>>>>>> started is lost unless you manually do something about it.
>>>>>>>
>>>>>>> As for free scratch space, you get a good amount per node, 
>>>>>>> 140G.  But the node
>>>>>>> could go down at any moment just like a physical resource.
>>>>>>>
>>>>>>> To harness S3 for safely persisting any data (or if you need 
>>>>>>> more space) you
>>>>>>> would need to actually run S3 clients on the VMs when they are 
>>>>>>> run on EC2.  You
>>>>>>> could alternatively mirror data between nodes assuming that all 
>>>>>>> would not go
>>>>>>> down at once.
>>>>>>> The good thing is that you do not pay transfer costs between S3 and
>>>>>>> EC2 if you choose to use S3 for big storage; you would only pay the
>>>>>>> "housing fees" so to speak.
>>>>>>> Tim
>>>>>>> _______________________________________________
>>>>>>> Swift-devel mailing list
>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>
>>>>>>>   
>>>>
>>>
>>
>
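
The latency question quoted above (a few ms between UChicago and ANL
versus a 50+ ms link) can be put in rough numbers. This is a
back-of-the-envelope Python sketch, not a Falkon measurement: it assumes
each worker handles one task at a time and pays one full round trip to
the dispatcher per task, and it ignores any limit on the dispatcher
side. The function names and the sample numbers are made up for
illustration.

```python
import math


def estimated_throughput(workers, task_seconds, rtt_seconds):
    """Tasks per second if every task costs its run time plus one round trip."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    per_task = task_seconds + rtt_seconds
    if per_task <= 0:
        raise ValueError("task time plus round trip must be positive")
    return workers / per_task


def workers_to_match(local_workers, task_seconds, local_rtt, remote_rtt):
    """Workers needed behind the slower link to match the local throughput."""
    target = estimated_throughput(local_workers, task_seconds, local_rtt)
    return math.ceil(target * (task_seconds + remote_rtt))


if __name__ == "__main__":
    # 100 workers, 1 s tasks: compare a 2 ms link with a 50 ms link.
    print(estimated_throughput(100, 1.0, 0.002))
    print(estimated_throughput(100, 1.0, 0.050))
    print(workers_to_match(100, 1.0, 0.002, 0.050))
```

Under these assumptions the extra round-trip time matters less as tasks
get longer, and a few extra workers can make up the difference, which
matches the instinct above. Only an experiment will show how close the
real system comes to this.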

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From tfreeman at mcs.anl.gov  Thu May 17 11:24:49 2007
From: tfreeman at mcs.anl.gov (Tim Freeman)
Date: Thu, 17 May 2007 11:24:49 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464C7E68.1030400@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>
Message-ID: <20070517112449.3a856f70.tfreeman@mcs.anl.gov>

On Thu, 17 May 2007 11:10:16 -0500
Ioan Raicu <iraicu at cs.uchicago.edu> wrote:

> 
> 
> Kate Keahey wrote:
> >
> >
> > Ian Foster wrote:
> >> Kate:
> >>
> >> I want to emphasize that I was *not* dismissing the issues below as 
> >> distractions.
> >>
> >> What I meant was: given that you are working on developing a "virtual 
> >> cluster", which I am pretty sure will be able to execute Swift apps, 
> >> let's focus on getting that done, rather than worrying about "special 
> >> casing" it for Falkon, adding dynamic node acquisition, or the other 
> >> things that people started discussing as potential extensions.
> >
> > We only now really began to discuss how to use VMs with Swift/Falkon 
> > -- the original set of issues you posted was just what was needed, it 
> > clearly inspired a very good discussion, and made me realize that I 
> > should have been talking to a wider set of people about this. Please, 
> > don't go back on us now... It also looks to me like there may be 
> > solutions that will make more sense both from the perspective of the 
> > architecture and will also be easier to implement with the current 
> > state of virtualization tools. For example, if we can set up Falkon to 
> > provision single nodes operating in pull mode (pulling work from a 
> > "master") various contextualization issues will have become much easier.
> >
> >>
> >> I understand from our IM conversation today that the "virtual 
> >> cluster" is ready for us in a "static environment" such as some 
> >> machines in our lab. In a "dynamic environment" such as EC2, it is 
> >> not quite ready for use yet. Thus, you won't be able to get Swift 
> >> running on EC2 tomorrow.
> >
> > This is not quite accurate; static refers to statically assigned IPs 
> > -- we have control over our IPs and can assign them to the cluster 
> > nodes in the same way each time we deploy it. Amazon will choose new 
> > IPs for the nodes each time the cluster is deployed, so each time the 
> > configuration of the cluster will have to be adjusted to reflect 
> > different IP assignment to the nodes (but if we were to change the IPs 
> > on the cluster nodes in a local environment we would be just as dynamic).
> >
> > But if you deploy just one node (e.g., a node operating in the pull 
> > mode as in the example above) the need for this configuration 
> > adjustment may go away (depending on what the node does) so everything 
> > may become much simpler.
> Currently, a Falkon executor (the worker code), upon bootstrapping, makes 
> 1 WS call to the Falkon dispatcher (running in a GT4 container) to 
> register its name and the port on which the notification engine is 
> listening.  Once this is done, the executors go into a listen mode 
> for notifications, and only act (send WS calls out) upon the reception 
> of notifications.  So, the VMs that run the Falkon executors can get 
> DHCP addresses, and the registration message will include all the 
> necessary information about where the Falkon dispatcher needs to contact 
> the respective Falkon executor.

On EC2 the VM has a private address with a corresponding public one that it
can discover (through very EC2-specific mechanisms).  We've been working on
abstractions and software for doing this in a non ad-hoc way.  I'll let Kate
expound at your meeting. 
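
To make the registration step quoted above concrete, here is a minimal
in-memory sketch in Python. It is not Falkon code: the class and method
names are invented for illustration, plain method calls stand in for the
WS calls, and it assumes the executor fetches its task from the
dispatcher after being notified (the message above only says that it
sends WS calls out at that point).

```python
class Dispatcher:
    """Stand-in for the dispatcher: tracks executors, tasks and results."""

    def __init__(self):
        self.registry = {}  # executor name -> (host, port) it listens on
        self.queue = []     # pending tasks (zero-argument callables)
        self.results = []   # (executor name, result) pairs

    def register(self, name, host, port):
        # The one call an executor makes when it boots.
        self.registry[name] = (host, port)

    def submit(self, task):
        self.queue.append(task)

    def get_work(self, name):
        if name not in self.registry:
            raise KeyError("unregistered executor: %s" % name)
        return self.queue.pop(0) if self.queue else None

    def deliver_result(self, name, result):
        self.results.append((name, result))


class Executor:
    """Stand-in for a worker: registers once, then reacts to notifications."""

    def __init__(self, name, host, port, dispatcher):
        self.name = name
        self.host = host
        self.port = port
        self.dispatcher = dispatcher
        self.calls_out = 0  # counts calls made to the dispatcher

    def bootstrap(self):
        self.dispatcher.register(self.name, self.host, self.port)
        self.calls_out += 1

    def on_notification(self):
        # Assumed behaviour: a notification triggers a fetch, run, report cycle.
        task = self.dispatcher.get_work(self.name)
        self.calls_out += 1
        if task is None:
            return
        self.dispatcher.deliver_result(self.name, task())
        self.calls_out += 1
```

Because the executor reports its own address when it registers, the
dispatcher does not need to know that address in advance, which is why
DHCP-assigned addresses are fine. If the dispatcher sits outside EC2,
the address passed to register would have to be the public one the VM
discovers, not the private one configured on its NIC.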

> Now, the one configuration parameter 
> that we must have is the location of the Falkon dispatcher.  If we have 
> it running in a static location (a well known machine and port), then 
> this can be hard coded into the bootstrapping scripts, and there is no 
> configuration needed!  If the dispatcher does not have a static resource 
> to run on (i.e. it runs in another VM), then this information needs to 
> be passed to the executor bootstrapping scripts

Through those EC2-specific mechanisms you can push per VM instance deployment
and the VM instance can be coded to discover this bit of information just like
its public IP.

Tying VMs + grid computing to EC2-specific mechanisms is totally the wrong way
to go, but it may be necessary to special-case it in the VM's boot +
contextualization process since we (the grid computing people) don't control
the middleware there.
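
The fallback Ioan describes (a hard-coded dispatcher location unless
something is passed in at boot) can be sketched without tying it to any
one provider. In this Python sketch the per-instance information arrives
as a plain string of key=value lines. That format, the key names
dispatcher_host and dispatcher_port, and the default host and port are
all assumptions made for illustration. How the string reaches the VM
(the EC2-specific mechanism, or the workspace service) is deliberately
left out.

```python
# Hypothetical well-known location, used when nothing is passed in.
DEFAULT_DISPATCHER = ("dispatcher.example.org", 50001)


def parse_instance_data(text):
    """Parse 'key=value' lines, skipping blanks, comments and malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def dispatcher_location(instance_data=None, default=DEFAULT_DISPATCHER):
    """Return (host, port), preferring per-instance data over the default."""
    settings = parse_instance_data(instance_data or "")
    host = settings.get("dispatcher_host", default[0])
    port = int(settings.get("dispatcher_port", default[1]))
    return host, port
```

Keeping the lookup behind one function means the EC2-specific part, if
it is needed at all, stays in whatever code produces the string, and the
executor bootstrap script does not change between providers.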

Tim


From tfreeman at mcs.anl.gov  Thu May 17 11:26:37 2007
From: tfreeman at mcs.anl.gov (Tim Freeman)
Date: Thu, 17 May 2007 11:26:37 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <20070517112449.3a856f70.tfreeman@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>
	<Pine.LNX.4.64.0705151600410.20212@dildano.hawaga.org.uk>
	<20070515154500.ad1600bf.tfreeman@mcs.anl.gov>
	<464A24AF.7080801@cs.uchicago.edu> <464A8857.90800@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>
Message-ID: <20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>

On Thu, 17 May 2007 11:24:49 -0500
Tim Freeman <tfreeman at mcs.anl.gov> wrote:

> 
> Through those EC2-specific mechanisms you can push per VM instance deployment

s/per VM instance deployment/per-VM-instance deployment information/ 

Tim 


From tiberius at ci.uchicago.edu  Thu May 17 11:29:55 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Thu, 17 May 2007 11:29:55 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
References: <4649D280.5080906@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>
	<20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
Message-ID: <fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>

Since there are dependencies in setting up the Falkon-enabled cluster
(essentially passing the IP of the headnode to the workers), maybe we
can have a Swift workflow that starts up the whole EC2 grid shebang.
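
Whatever tool ends up owning this, the dependency itself is small: the
headnode has to be up and its address known before any worker is
started. A rough Python sketch of that ordering follows. The launch
callable is supplied by the caller and is purely hypothetical (no real
EC2 or workspace-service call is made here), and the fake launcher only
exists so that the sketch can be run.

```python
def start_cluster(launch, worker_count):
    """Start the headnode first, then workers that are told its address.

    launch(role, config) is a caller-supplied function that starts one
    node and returns its address.
    """
    head_address = launch("head", {})
    workers = [
        launch("worker", {"head_address": head_address})
        for _ in range(worker_count)
    ]
    return head_address, workers


def fake_launcher():
    """Return a launch function that hands out made-up addresses, plus its log."""
    log = []

    def launch(role, config):
        address = "10.0.0.%d" % (len(log) + 1)
        log.append((role, dict(config), address))
        return address

    return launch, log
```

With a real launcher the addresses would differ on every deployment,
which is the situation described for EC2 earlier in the thread.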

Tibi

On 5/17/07, Tim Freeman <tfreeman at mcs.anl.gov> wrote:
> On Thu, 17 May 2007 11:24:49 -0500
> Tim Freeman <tfreeman at mcs.anl.gov> wrote:
>
> >
> > Through those EC2-specific mechanisms you can push per VM instance deployment
>
> s/per VM instance deployment/per-VM-instance deployment information/
>
> Tim
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From iraicu at cs.uchicago.edu  Thu May 17 11:51:10 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 17 May 2007 11:51:10 -0500
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
References: <4649D280.5080906@mcs.anl.gov>	
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>	
	<464B1402.9040405@mcs.anl.gov>	
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>	
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>	
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>	
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>	
	<20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
	<fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
Message-ID: <464C87FE.5050006@cs.uchicago.edu>

If Swift can do it (through an LRM presumably), then Falkon could do it 
as well!
This should either be done from Falkon, or from the workspace service 
itself, but not from Swift...
Ioan

Tiberiu Stef-Praun wrote:
> Since there are dependencies in setting up the Falkon-enabled cluster
> (essentially passing the IP of the headnode to the workers), maybe we
> can have a Swift workflow that starts up the whole EC2 grid shebang.
>
> Tibi
>
> On 5/17/07, Tim Freeman <tfreeman at mcs.anl.gov> wrote:
>> On Thu, 17 May 2007 11:24:49 -0500
>> Tim Freeman <tfreeman at mcs.anl.gov> wrote:
>>
>> >
>> > Through those EC2-specific mechanisms you can push per VM instance 
>> deployment
>>
>> s/per VM instance deployment/per-VM-instance deployment information/
>>
>> Tim
>>
>
>



From benc at hawaga.org.uk  Thu May 17 12:04:05 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 17 May 2007 17:04:05 +0000 (GMT)
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464C87FE.5050006@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov> 
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov> 
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov> 
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu> 
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>
	<20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
	<fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
	<464C87FE.5050006@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705171703300.20212@dildano.hawaga.org.uk>


Management of remote virtual machine start-and-config on EC2 strikes me as 
being almost entirely out of scope for both swift and falkon...

-- 


From itf at mcs.anl.gov  Thu May 17 12:11:21 2007
From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=)
Date: Thu, 17 May 2007 17:11:21 +0000
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <Pine.LNX.4.64.0705171703300.20212@dildano.hawaga.org.uk>
References: <4649D280.5080906@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry><464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry><464B4A6E.2040804@mcs.anl.gov>
	<464B6746.7050907@mcs.anl.gov> <464C69D6.70909@mcs.anl.gov>
	<464C7E68.1030400@cs.uchicago.edu>
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov><20070517112637.70ae6c9f.tfreeman@mcs.anl.gov><fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com><464C87FE.5050006@cs.uchicago.edu>
	<Pine.LNX.4.64.0705171703300.20212@dildano.hawaga.org.uk>
Message-ID: <1608928822-1179422001-cardhu_blackberry.rim.net-20619473-@bwe035-cell00.bisx.prod.on.blackberry>

Indeed .... 

Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Ben Clifford <benc at hawaga.org.uk>
Date: Thu, 17 May 2007 17:04:05 
To:Ioan Raicu <iraicu at cs.uchicago.edu>
Cc:swift-devel at ci.uchicago.edu, borja at borjanet.com, itf at mcs.anl.gov
Subject: Re: [Swift-devel] swift-on-ec2


Management of remote virtual machine start-and-config on EC2 strikes me as 
being almost entirely out of scope for both swift and falkon...

-- 



From hategan at mcs.anl.gov  Thu May 17 16:27:51 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 18 May 2007 00:27:51 +0300
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
References: <4649D280.5080906@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>
	<20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
	<fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
Message-ID: <1179437271.27959.14.camel@blabla.mcs.anl.gov>

On Thu, 2007-05-17 at 11:29 -0500, Tiberiu Stef-Praun wrote:
> Since there are dependencies in setting up the Falkon-enabled cluster
> (essentially passing the IP of the headnode to the workers), maybe we
> can have a Swift workflow that starts up the whole EC2 grid shebang.

We're running into the issue that a "workflow" might be a "program" (or
the other way around - I get confused).

Yes. It would make sense to deal with parallelism/concurrency/RPC in a
system suitable for those kinds of things.

Mihael

> 
> Tibi
> 
> On 5/17/07, Tim Freeman <tfreeman at mcs.anl.gov> wrote:
> > On Thu, 17 May 2007 11:24:49 -0500
> > Tim Freeman <tfreeman at mcs.anl.gov> wrote:
> >
> > >
> > > Through those EC2-specific mechanisms you can push per VM instance deployment
> >
> > s/per VM instance deployment/per-VM-instance deployment information/
> >
> > Tim
> >
> 
> 


From hategan at mcs.anl.gov  Thu May 17 16:31:03 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 18 May 2007 00:31:03 +0300
Subject: [Swift-devel] swift-on-ec2
In-Reply-To: <464C87FE.5050006@cs.uchicago.edu>
References: <4649D280.5080906@mcs.anl.gov>
	<356127187-1179301594-cardhu_blackberry.rim.net-179336256-@bwe047-cell00.bisx.prod.on.blackberry>
	<464B1402.9040405@mcs.anl.gov>
	<1583680979-1179336692-cardhu_blackberry.rim.net-2135204572-@bwe032-cell00.bisx.prod.on.blackberry>
	<464B4A6E.2040804@mcs.anl.gov> <464B6746.7050907@mcs.anl.gov>
	<464C69D6.70909@mcs.anl.gov> <464C7E68.1030400@cs.uchicago.edu>
	<20070517112449.3a856f70.tfreeman@mcs.anl.gov>
	<20070517112637.70ae6c9f.tfreeman@mcs.anl.gov>
	<fec1351f0705170929m5feac6c3w90077796c789d7ff@mail.gmail.com>
	<464C87FE.5050006@cs.uchicago.edu>
Message-ID: <1179437463.27959.18.camel@blabla.mcs.anl.gov>

On Thu, 2007-05-17 at 11:51 -0500, Ioan Raicu wrote:
> If Swift can do it (through an LRM presumably), then Falkon could do it 
> as well!

I think Falkon could do it by virtue of the fact that Java can
eventually do it if somebody writes the right bits of code.

Mihael

> This should either be done from Falkon, or from the workspace service 
> itself, but not from Swift...
> Ioan
> 
> Tiberiu Stef-Praun wrote:
> > Since there are dependencies in setting up the Falkon-enabled cluster
> > (essentially passing the IP of the headnode to the workers), maybe we
> > can have a Swift workflow that starts up the whole EC2 grid shebang.
> >
> > Tibi
> >
> > On 5/17/07, Tim Freeman <tfreeman at mcs.anl.gov> wrote:
> >> On Thu, 17 May 2007 11:24:49 -0500
> >> Tim Freeman <tfreeman at mcs.anl.gov> wrote:
> >>
> >> >
> >> > Through those EC2-specific mechanisms you can push per VM instance 
> >> deployment
> >>
> >> s/per VM instance deployment/per-VM-instance deployment information/
> >>
> >> Tim
> >>
> >
> >
> 


From benc at hawaga.org.uk  Mon May 21 06:55:00 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 21 May 2007 11:55:00 +0000 (GMT)
Subject: [Swift-devel] multiple arguments
In-Reply-To: <Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705020906040.3117@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705020926570.25235@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705021500100.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705021024440.25235@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705211152510.20212@dildano.hawaga.org.uk>


r752 reintroduces (in a different, equally unspecified manner) support for 
[*], at least to the extent that I have seen it used.

I opened bug 61 to track the fact that this is not properly specified in 
the language in terms of the data model / type system and (perhaps as a 
consequence) messily implemented.

On Wed, 2 May 2007, Yong Zhao wrote:

> That's strange. I used @filenames a lot a while ago and never had any
> problems. Check the kml translation, maybe you added the getfieldvalue
> stuff to getFilenames, which should not happen. i.e.
> 
> It needs to be
> 	<vdl:getFilenames var="{sliced}">
> 		<argument name="path"> ....</...>
> 	</...>
> 
> not
> 	<vdl:getFilenames><vdl:getFieldvalue ....>
> 
> 
> Yong.
> 
> On Wed, 2 May 2007, Ben Clifford wrote:
> 
> >
> >
> > On Wed, 2 May 2007, Yong Zhao wrote:
> >
> > > use @filenames(sliced[*].img).
> >
> > I get this:
> >
> > Execution failed:
> >         org.griphyn.vdl.mapping.InvalidPathException: Invalid path (*.img)
> > for type volume
> >
> >
> > I tried something a little simpler:
> >
> >
> > type file;
> >
> > (file out) echo(file n[])
> > {
> >   app {
> >     echo @filenames(n) stdout=out;
> >   }
> > }
> >
> >
> > file f[] <fixed_array_mapper;files="a b c">;
> >
> > file out;
> >
> > out=echo(f);
> >
> >
> > but that hangs...
> >
> > oof.
> >
> > --
> >
> 
> 


From dvezendla.savithri at gmail.com  Tue May 22 10:08:27 2007
From: dvezendla.savithri at gmail.com (DVezendla)
Date: Tue, 22 May 2007 11:08:27 -0400
Subject: [Swift-devel] New to Swift
Message-ID: <4262c2820705220808m26a5774cjd25f817907ea0e00@mail.gmail.com>

Hi there,
I am new to the Swift scripting language.
Please help me with how to start and proceed.


Thanks & Regards,
--DVezendla

From benc at hawaga.org.uk  Tue May 22 10:38:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 15:38:10 +0000 (GMT)
Subject: [Swift-devel] swift + gram2 + condor
Message-ID: <Pine.LNX.4.64.0705221535540.20212@dildano.hawaga.org.uk>


I just started playing with swift submitting through gram2 to a condor 
installation, as that is my preferred queueing system for training 
systems. wrapper.sh seems to go awry, though, reporting strange errors 
where it seems to be interpreting parameters out of place. Perhaps a 
quoting problem.

This seems vaguely familiar - I think maybe I tried it before and gave up.

Has anyone else used swift->gram2->condor successfully?
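
A small Python illustration of the kind of quoting failure suspected
here. It does not reproduce wrapper.sh, GRAM2 or Condor. It only shows
what happens when an argument list is joined into one string and then
re-split by a layer that follows POSIX shell rules, using shlex.split
from the standard library as a stand-in for that layer. The sample
arguments are invented.

```python
import shlex


def naive_command_line(args):
    """Join arguments with spaces and no quoting (the failure mode)."""
    return " ".join(args)


def quoted_command_line(args):
    """Join arguments so that a POSIX shell re-split gives them back intact."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    args = ["wrapper.sh", "-a", "hello world", "*.img"]

    # The argument containing a space comes back as two separate parameters.
    print(shlex.split(naive_command_line(args)))

    # With quoting, the original list survives the round trip.
    print(shlex.split(quoted_command_line(args)))
```

If parameters arrive out of place on the Condor side, comparing the
argument list before submission with what the wrapper actually receives
would show whether a re-split like this is happening somewhere along the
way.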

-- 


From benc at hawaga.org.uk  Tue May 22 10:40:01 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 15:40:01 +0000 (GMT)
Subject: [Swift-devel] New to Swift
In-Reply-To: <4262c2820705220808m26a5774cjd25f817907ea0e00@mail.gmail.com>
References: <4262c2820705220808m26a5774cjd25f817907ea0e00@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0705221538200.20212@dildano.hawaga.org.uk>


On Tue, 22 May 2007, DVezendla wrote:

> Hi there,
> I am new to Swift Scripting language.
> Please help me how to start and proceed.

Hi. There is a quickstart guide at 
http://www.ci.uchicago.edu/swift/guides/quickstartguide.php which should 
talk you through getting a simple hello world program running, and then 
other documentation (the beginnings of a tutorial, and the user guide) at 
http://www.ci.uchicago.edu/swift/guides/

-- 


From yongzh at cs.uchicago.edu  Tue May 22 10:50:27 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 22 May 2007 10:50:27 -0500 (CDT)
Subject: [Swift-devel] swift + gram2 + condor
In-Reply-To: <Pine.LNX.4.64.0705221535540.20212@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705221535540.20212@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705221047430.18474@classes.cs.uchicago.edu>

I've had some success using gram + condor, but I think that was before we
introduced wrapper.sh.

Condor does have a quoting problem; I do not remember exactly how we dealt
with that in VDS1.

Yong.

On Tue, 22 May 2007, Ben Clifford wrote:

>
> I just started playing with swift submitting through gram2 to a condor
> installation, as that is my preferred queueing system for training
> systems. wrapper.sh seems to go awry, though, reporting strange errors
> where it seems to be interpreting parameters out of place. Perhaps a
> quoting problem.
>
> This seems vaguely familiar - I think maybe I tried it before and gave up.
>
> Has anyone else used swift->gram2->condor successfully?
>
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From wilde at mcs.anl.gov  Tue May 22 13:00:20 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Tue, 22 May 2007 13:00:20 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
Message-ID: <46532FB4.5070707@mcs.anl.gov>

Stu, sorry - I missed this message until you mentioned it to me just 
now.

Thinking about it, I'd like to have Ben and Mihael involved as well 
as all the local Swift and Falkon people. Ben will be arriving this 
Thu, I think, but I'm not sure what time. Mihael is working from 
Romania through late June, and can join, I hope, via Skype or telecon. 
(I'm looking for a good Skype speakerphone.)

Since the people you mentioned are mostly from within the DSL, e.g. 
Joe Bester, perhaps we can schedule this meeting by email for a date 
around June 12-14, the last few days before Ben heads back. June 
13 I think is best for me.

Does anyone see a pressing reason to do this meeting earlier?

- Mike


Stuart Martin wrote, On 5/20/2007 10:51 PM:
> Hi Mike,
> 
> Will you or the Swift folks be at the committers' all-hands meeting?  Does 
> it make sense to sync up on plans for GRAM and Swift?  We could have 
> this as a GRAM meeting on Thursday the 24th?  We could also invite the 
> GridWay guys.  What do you think?
> 
> -Stu
> 
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From benc at hawaga.org.uk  Tue May 22 13:10:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 18:10:03 +0000 (GMT)
Subject: [Swift-devel] swift + gram2 + condor
In-Reply-To: <Pine.LNX.4.58.0705221047430.18474@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705221535540.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705221047430.18474@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705221808510.22628@dildano.hawaga.org.uk>


On Tue, 22 May 2007, Yong Zhao wrote:

> Condor does have a quoting problem; I do not remember exactly how we dealt
> with that in VDS1.

I think really it's GRAM that has the problem, not Condor - GRAM is meant 
to abstract away stuff like this.

-- 


From benc at hawaga.org.uk  Tue May 22 13:10:44 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 18:10:44 +0000 (GMT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <46532FB4.5070707@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705221810190.20212@dildano.hawaga.org.uk>


On Tue, 22 May 2007, Mike Wilde wrote:

> Since the people you mentioned are mostly from within the DSL, eg Joe Bester,
> perhaps we can schedule this meeting by email for a date  around June 12-14,
> the last few days before Ben heads back. June 13 I think is best for me.
> 
> Does anyone see a pressing reason to do this meeting earlier?

12th-14th is best for me - I'll have my mind on many other things until 
then.

-- 


From wilde at mcs.anl.gov  Tue May 22 13:17:31 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Tue, 22 May 2007 13:17:31 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705221810190.20212@dildano.hawaga.org.uk>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov>
	<Pine.LNX.4.64.0705221810190.20212@dildano.hawaga.org.uk>
Message-ID: <465333BB.2070600@mcs.anl.gov>

So let's shoot for June 13 then and see if that works for everyone 
who's interested.

- Mike

Ben Clifford wrote, On 5/22/2007 1:10 PM:
> 
> On Tue, 22 May 2007, Mike Wilde wrote:
> 
>> Since the people you mentioned are mostly from within the DSL, eg Joe Bester,
>> perhaps we can schedule this meeting by email for a date  around June 12-14,
>> the last few days before Ben heads back. June 13 I think is best for me.
>>
>> Does anyone see a pressing reason to do this meeting earlier?
> 
> 12th-14th is best for me - I'll have my mind on many other things until 
> then.
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From tiberius at ci.uchicago.edu  Tue May 22 13:26:07 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 22 May 2007 13:26:07 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <465333BB.2070600@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov>
	<Pine.LNX.4.64.0705221810190.20212@dildano.hawaga.org.uk>
	<465333BB.2070600@mcs.anl.gov>
Message-ID: <fec1351f0705221126n186f74d4pdbcad3eeb94bd875@mail.gmail.com>

I seem to be available on that date.
Can I attend as well ?

Tibi

On 5/22/07, Mike Wilde <wilde at mcs.anl.gov> wrote:
> So let's shoot for June 13 then and see if that works for everyone
> who's interested.
>
> - Mike
>
> Ben Clifford wrote, On 5/22/2007 1:10 PM:
> >
> > On Tue, 22 May 2007, Mike Wilde wrote:
> >
> >> Since the people you mentioned are mostly from within the DSL, eg Joe Bester,
> >> perhaps we can schedule this meeting by email for a date  around June 12-14,
> >> the last few days before Ben heads back. June 13 I think is best for me.
> >>
> >> Does anyone see a pressing reason to do this meeting earlier?
> >
> > 12th-14th is best for me - I'll have my mind on many other things until
> > then.
> >
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From foster at mcs.anl.gov  Tue May 22 13:26:25 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 22 May 2007 13:26:25 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <46532FB4.5070707@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov>
Message-ID: <465335D1.2040306@mcs.anl.gov>

It would be interesting to hear what issues are of interest on each side.

Are there WS-GRAM issues that are causing problems for Swift?

Is advance reservation important for Swift?

Swift is increasingly using Falkon to handle submissions, which reduces 
the number of GRAM operations performed significantly.

Ian.


Mike Wilde wrote:
> Stu, sorry - I missed this message until you mentioned it to me just now.
>
> Thinking about it, I'd like to have Ben and Mihael involved as well as 
> all the local Swift and Falkon people. Ben will be arriving this Thu I 
> think but Im not sure what time. Mihael is working from Romania 
> through late June, and can join I hope via skype or telecon. (Im 
> looking for a good Skype speakerphone).
>
> Since the people you mentioned are mostly from within the DSL, eg Joe 
> Bester, perhaps we can schedule this meeting by email for a date 
>  around June 12-14, the last few days before Ben heads back. June 13 I 
> think is best for me.
>
> Does anyone see a pressing reason to do this meeting earlier?
>
> - Mike
>
>
> Stuart Martin wrote, On 5/20/2007 10:51 PM:
>> Hi Mike,
>>
>> Will you or the Swift folks be at the committers' all-hands meeting?  
>> Does it make sense to sync up on plans for GRAM and Swift?  We could 
>> have this as a GRAM meeting on Thursday the 24th?  We could also 
>> invite the GridWay guys.  What do you think?
>>
>> -Stu
>>
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From benc at hawaga.org.uk  Tue May 22 13:43:15 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 18:43:15 +0000 (GMT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <465335D1.2040306@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>


On Tue, 22 May 2007, Ian Foster wrote:

> Are there WS-GRAM issues that are causing problems for Swift?

No one uses WS-GRAM with Swift, so we aren't really uncovering issues 
there.


> Is advance reservation important for Swift?

We haven't really talked about that. I'm not sure how it would fit in, but 
if people want it, it would be nice to accommodate it somehow.


> Swift is increasingly using Falkon to handle submissions, which reduces 
> the number of GRAM operations performed significantly.

At the high/experimental end, yes. However, if we have any expectation of 
people downloading and using it by themselves without us providing 
professional services-style consultancy, then those users won't be going 
anywhere near Falkon any time soon.

--


From yongzh at cs.uchicago.edu  Tue May 22 14:05:49 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 22 May 2007 14:05:49 -0500 (CDT)
Subject: [Swift-devel] swift + gram2 + condor
In-Reply-To: <Pine.LNX.4.64.0705221808510.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705221535540.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705221047430.18474@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705221808510.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705221405240.26170@classes.cs.uchicago.edu>

yep, maybe we can talk with people that developed the interface between
gram and condor?

Yong.

On Tue, 22 May 2007, Ben Clifford wrote:

>
>
> On Tue, 22 May 2007, Yong Zhao wrote:
>
> > Condor does have a quoting problem; I do not remember exactly how we dealt
> > with that in VDS1.
>
> I think really it's GRAM that has the problem, not Condor - GRAM is meant
> to abstract away stuff like this.
>
> --
>


From yongzh at cs.uchicago.edu  Tue May 22 14:09:35 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 22 May 2007 14:09:35 -0500 (CDT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705221408220.26170@classes.cs.uchicago.edu>

I used WS-GRAM a while ago with Swift; I did not encounter any specific
WS-GRAM problem then.

Yong.

On Tue, 22 May 2007, Ben Clifford wrote:

>
> On Tue, 22 May 2007, Ian Foster wrote:
>
> > Are there WS-GRAM issues that are causing problems for Swift?
>
> No one uses WS-GRAM with Swift, so we aren't really uncovering issues
> there.
>
>
> > Is advance reservation important for Swift?
>
> We haven't really talked about that. I'm not sure how it would fit in, but
> if people want it, it would be nice to accommodate it somehow.
>
>
> > Swift is increasingly using Falkon to handle submissions, which reduces
> > the number of GRAM operations performed significantly.
>
> At the high/experimental end, yes. However, if we have any expectation of
> people downloading and using it by themselves without us providing
> professional services-style consultancy, then those users won't be going
> anywhere near Falkon any time soon.
>
> --
>


From smartin at mcs.anl.gov  Tue May 22 14:10:48 2007
From: smartin at mcs.anl.gov (Stuart Martin)
Date: Tue, 22 May 2007 14:10:48 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
Message-ID: <DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>

On May 22, 2007, at 1:43 PM, Ben Clifford wrote:
>
> On Tue, 22 May 2007, Ian Foster wrote:
>
>> Are there WS-GRAM issues that are causing problems for Swift?
>
> No one uses WS-GRAM with Swift, so we aren't really uncovering issues
> there.

Why not?  What are you using?  GRAM2?  local executions?  Other  
services?

>
>
>> Is advance reservation important for Swift?
>
> We haven't really talked about that. I'm not sure how it would fit  
> in, but
> if people want it, it would be nice to accommodate it somehow.
>
>
>> Swift is increasingly using Falkon to handle submissions, which  
>> reduces
>> the number of GRAM operations performed significantly.
>
> At the high/experimental end, yes. However, if we have any  
> expectation of
> people downloading and using it by themselves without us providing
> professional services-style consultancy, then those users won't be  
> going
> anywhere near Falkon any time soon.
>
> --
>


From tiberius at ci.uchicago.edu  Tue May 22 14:17:18 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 22 May 2007 14:17:18 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
Message-ID: <fec1351f0705221217q3882537v8868117e3a8692e1@mail.gmail.com>

I might have used WS-GRAM at the TACC site.
It was quite a while ago, so I am not 100% sure that I actually used WS-GRAM.

Tibi

On 5/22/07, Stuart Martin <smartin at mcs.anl.gov> wrote:
> On May 22, 2007, at May 22, 1:43 PM, Ben Clifford wrote:
> >
> > On Tue, 22 May 2007, Ian Foster wrote:
> >
> >> Are there WS-GRAM issues that are causing problems for Swift?
> >
> > No one uses WS-GRAM with Swift, so we aren't really uncovering issues
> > there.
>
> Why not?  What are you using?  GRAM2?  local executions?  Other
> services?
>
> >
> >
> >> Is advance reservation important for Swift?
> >
> > We haven't really talked about that. I'm not sure how it would fit
> > in, but
> > if people want it, it would be nice to accommodate it somehow.
> >
> >
> >> Swift is increasingly using Falkon to handle submissions, which
> >> reduces
> >> the number of GRAM operations performed significantly.
> >
> > At the high/experimental end, yes. However, if we have any
> > expectation of
> > people downloading and using it by themselves without us providing
> > professional services-style consultancy, then those users won't be
> > going
> > anywhere near Falkon any time soon.
> >
> > --
> >
>
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Tue May 22 14:22:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 22 May 2007 19:22:32 +0000 (GMT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705221910500.22628@dildano.hawaga.org.uk>


On Tue, 22 May 2007, Stuart Martin wrote:

> On May 22, 2007, at May 22, 1:43 PM, Ben Clifford wrote:
> > 
> > On Tue, 22 May 2007, Ian Foster wrote:
> > 
> > > Are there WS-GRAM issues that are causing problems for Swift?
> > 
> > No one uses WS-GRAM with Swift, so we aren't really uncovering issues
> > there.
> 

> Why not?  What are you using?  GRAM2?  local executions?  Other services?

For the high end stuff, Swift submits jobs to Falkon. Falkon, I think, 
uses WS-GRAM to start up its own workers, but that startup is part of 
Falkon, not Swift.

For low end stuff, the two providers that I think people use much are 
local exec and GRAM2.

Local exec is not in the space that GRAM is addressing, so ignore.

The GRAM2 vs GRAM4 question pretty much comes down to the fact that people 
in production (at least as far as I encounter them) tend to use GRAM2 
rather than GRAM4, and so Swift tends to get used that way too. There's no 
real motivation to push people to use a different submission system than 
what they're used to, and one thing we decided within our group (after we 
had spent rather a long time pontificating and debating) is that we would 
concentrate on being very application focused. GRAM2 -> GRAM4 doesn't 
provide enough incentive (in the way that a GRAM2 -> Falkon change does) 
for our actual apps (for example, those that Tibi and Nika work on).

At some point, perhaps, GRAM2 will decay or GRAM4 will become tantalising, 
at which point it would be in the interests of being app-focused to shift. 
Or we might change our priorities to be less app focused.

-- 


From iraicu at cs.uchicago.edu  Tue May 22 14:34:07 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 22 May 2007 14:34:07 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>	<46532FB4.5070707@mcs.anl.gov>
	<465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
Message-ID: <465345AF.3010201@cs.uchicago.edu>

See below:

Ben Clifford wrote:
> On Tue, 22 May 2007, Ian Foster wrote:
>
>   
>> Are there WS-GRAM issues that are causing problems for Swift?
>>     
>
> No one uses WS-GRAM with Swift, so we aren't really uncovering issues 
> there.
>
>
>   
>> Is advance reservation important for Swift?
>>     
>
> We haven't really talked about that. I'm not sure how it would fit in, but 
> if people want it, it would be nice to accommodate it somehow.
>
>
>   
>> Swift is increasingly using Falkon to handle submissions, which reduces 
>> the number of GRAM operations performed significantly.
>>     
>
> At the high/experimental end, yes. However, if we have any expectation of 
> people downloading and using it by themselves without us providing 
> professional services-style consultancy, then those users won't be going 
> anywhere near Falkon any time soon.
>   
We have learned quite a bit about setting up Falkon at different sites 
across the TG.  The caveats that we have to watch out for are:

   1. platform-specific JVM location: this is not set correctly in the
      remote machine's environment, and is different from site to site;
      this remains an issue that needs to be addressed per site
   2. some sites require the project to be explicitly specified; this has
      been fixed
   3. expired-credential errors don't get propagated to the user's
      screen; they are simply written to logs
   4. some sites (ANL) support GRAM4 extensions, while other sites do
      not; we now support both RSL formats
   5. the many logs that we generate are quite hard for people to
      follow, and it is hard to keep track of what each one contains; we
      fixed this by developing a GUI that can connect to the GT4
      container remotely and display relevant information
   6. TG machines have an old kernel that does not support changing the
      thread stack size
          * this has implications for the number of threads a JVM can
            create before running out of memory
          * we have observed that we can create about 100-200 threads
            per JVM on most TG nodes
          * the GT4 container operates on a pool of threads for
            everything it does, so the max number of threads it will
            create is bounded
          * the provisioner currently creates a new thread for every job
            (resource allocation) it sends to GRAM4
                o depending on which allocation strategy is used, this
                  might or might not be a problem on TG nodes
                o in theory, we don't want more than 100 or so GRAM4
                  jobs running in parallel, but if we choose the policy
                  in which each job allocates a single machine, then we
                  can easily surpass 100 jobs in parallel
                o all the other policies would be able to allocate 1K+,
                  even 10K+ machines with fewer than 100 jobs in
                  parallel, so they could work perfectly fine even with
                  the current implementation
                o in the long run, the provisioner could be changed to
                  use a pool of threads
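
The thread-per-allocation concern above can be bounded with a fixed-size
pool. What follows is a minimal Python sketch of that idea, not Falkon's
provisioner code; submit_allocation, MAX_WORKERS and the job count are
hypothetical placeholders, and the sleep merely stands in for a remote call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap, chosen well below the 100-200 thread limit

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of allocations in flight at once

def submit_allocation(job_id):
    """Hypothetical stand-in for sending one resource allocation to GRAM4."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # pretend to wait on the remote service
    with _lock:
        _active -= 1
    return job_id

def run_allocations(n_jobs, max_workers=MAX_WORKERS):
    """Run n_jobs allocations using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_allocation, range(n_jobs)))

results = run_allocations(200)
print(len(results), "allocations done; peak threads in use:", peak_active)
```

With a pool, the one-machine-per-job policy can queue far more than 100
allocations without the thread count ever exceeding the cap; the cost is
that allocations beyond the cap wait for a free worker.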

The things that I believe are needed for it to be more friendly to 
new/existing users outside of the core developers:

   1. A suite of tests that will ensure everything is set correctly,
      before using Falkon
          * we could check against grid-proxy-info in a script
          * make sure GRAM4 works at the particular site by using
            globusrun-ws
          * check the JAVA_HOME and java commands from within a GRAM4
            submitted job
          * check if ANT is installed; this is needed to recompile the
            Falkon service
   2. get more of the Falkon configuration parameters into config files,
      rather than scripts or code!
   3. clean up the scripts, and make them more robust and user friendly
   4. make an interface into the provisioning component and Falkon to
      allow the live configuration of Falkon without requiring restarts
   5. Documentation well beyond the current 1 page readme that is only
      sufficient if everything works!
   6. There is no documentation on how to set up the needed security if
      a user wants to enable security in Falkon; the default is no security
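
The local part of the test suite in item 1 could start as small as the
following Python sketch. It is an assumption-laden illustration, not an
existing Falkon script: it only checks that the named commands are on the
PATH and that JAVA_HOME points at a directory, and it does not run any of
them. The checks that must happen inside a GRAM4-submitted job would need
the same function to be executed remotely.

```python
import os
import shutil

# Commands named in the list above; presence on PATH is all that is checked.
REQUIRED_COMMANDS = ["grid-proxy-info", "globusrun-ws", "java", "ant"]

def preflight(which=shutil.which, environ=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for cmd in REQUIRED_COMMANDS:
        if which(cmd) is None:
            problems.append(f"command not found on PATH: {cmd}")
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME is not a directory: {java_home}")
    return problems

if __name__ == "__main__":
    for line in preflight() or ["all local checks passed"]:
        print(line)
```

The which and environ parameters exist only so the function can be
exercised without a Globus installation; a real check would also want to
verify that the proxy is still valid and that a trivial GRAM4 job runs.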

Maybe there are others that I missed, but I don't think we are that far 
from people being able to use it without us taking them by the hand the 
entire way.  The things that would be good to do are not at the top of 
my to-do list, but in time, I'll get them done.  If anyone wants 
to help with these, I would not refuse anyone's help.

Ioan
> --
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================


From smartin at mcs.anl.gov  Tue May 22 14:41:05 2007
From: smartin at mcs.anl.gov (Stuart Martin)
Date: Tue, 22 May 2007 14:41:05 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705221910500.22628@dildano.hawaga.org.uk>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
	<Pine.LNX.4.64.0705221910500.22628@dildano.hawaga.org.uk>
Message-ID: <EBCA77FC-971B-4537-8FB8-EF17032D2DD3@mcs.anl.gov>


On May 22, 2007, at 2:22 PM, Ben Clifford wrote:

>
>
> On Tue, 22 May 2007, Stuart Martin wrote:
>
>> On May 22, 2007, at May 22, 1:43 PM, Ben Clifford wrote:
>>>
>>> On Tue, 22 May 2007, Ian Foster wrote:
>>>
>>>> Are there WS-GRAM issues that are causing problems for Swift?
>>>
>>> No one uses WS-GRAM with Swift, so we aren't really uncovering issues
>>> there.
>>
>
>> Why not?  What are you using?  GRAM2?  local executions?  Other  
>> services?
>
> for the high end stuff, Swift submits jobs to Falkon. Falkon, I think,
> uses WS-GRAM to start up its own workers, but that startup part of  
> Falkon
> not Swift.
>
> For low end stuff, the two providers that I think people use much are
> local exec and GRAM2.
>
> Local exec is not in the space that GRAM is addressing, so ignore.

Agreed.  Just trying to learn what people are doing.

>
> The GRAM2 vs GRAM4 question pretty much comes down to the fact that  
> people
> in production (at least as far as I encounter them) tend to use GRAM2
> rather than GRAM4 and so Swift tends to get used that way too -  
> there's no
> real motivation to push people to use a different submission system  
> than
> what they're used to, and one thing we decided within our group is  
> that we
> would concentrate on being very application focused (after we had  
> spent
> rather a long time pontificating and debating). GRAM2 -> GRAM4 doesn't
> provide enough incentive (in the way that a GRAM2 -> Falkon change  
> does)
> for our actual apps (for example that Tibi and Nika work on).

Fair enough.  GRAM4 is deployed on most of TG and OSG now.  It would  
be good to push jobs to GRAM4 when reasonable/possible.  The apps  
folks should not care which service is used.  It should be hidden by  
Swift.  Or are the apps folks you're working with also dictating what  
GRAM service is deployed/used?

>
> At some point, perhaps, GRAM2 will decay or GRAM4 will become  
> tantalising,
> at which point it would be in the interests of being app-focused to  
> shift.
> Or we might change our priorities to be less app focused.

Some are quite happy with GRAM4 in 4.0.3.  We're improving things  
right now to make GRAM4 outperform GRAM2 in almost all of the important  
benchmarks.  This should be in 4.0.5.  I think things at that point  
become "tantalizing".

>
> -- 
>


From yongzh at cs.uchicago.edu  Tue May 22 14:51:22 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 22 May 2007 14:51:22 -0500 (CDT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <EBCA77FC-971B-4537-8FB8-EF17032D2DD3@mcs.anl.gov>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<DA40B2E2-8A00-43E0-B76F-0EB9A16EFAF6@mcs.anl.gov>
	<Pine.LNX.4.64.0705221910500.22628@dildano.hawaga.org.uk>
	<EBCA77FC-971B-4537-8FB8-EF17032D2DD3@mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0705221449200.26170@classes.cs.uchicago.edu>

Swift does hide which provider the app uses (local, GT2, GT4 or Falkon).
I think the major reasons they are not using WS-GRAM are:

- WS-GRAM is not configured
- WS-GRAM is slower than GT2

But as you've pointed out, as things improve, we should shift to WS-GRAM
gradually.

Yong.

On Tue, 22 May 2007, Stuart Martin wrote:

>
> On May 22, 2007, at May 22, 2:22 PM, Ben Clifford wrote:
>
> >
> >
> > On Tue, 22 May 2007, Stuart Martin wrote:
> >
> >> On May 22, 2007, at May 22, 1:43 PM, Ben Clifford wrote:
> >>>
> >>> On Tue, 22 May 2007, Ian Foster wrote:
> >>>
> >>>> Are there WS-GRAM issues that are causing problems for Swift?
> >>>
> >>> No one uses WS-GRAM with Swift, so we aren't really uncovering issues
> >>> there.
> >>
> >
> >> Why not?  What are you using?  GRAM2?  local executions?  Other
> >> services?
> >
> > for the high end stuff, Swift submits jobs to Falkon. Falkon, I think,
> > uses WS-GRAM to start up its own workers, but that startup part of
> > Falkon
> > not Swift.
> >
> > For low end stuff, the two providers that I think people use much are
> > local exec and GRAM2.
> >
> > Local exec is not in the space that GRAM is addressing, so ignore.
>
> Agreed.  Just trying to learn what people are doing.
>
> >
> > The GRAM2 vs GRAM4 question pretty much comes down to the fact that
> > people
> > in production (at least as far as I encounter them) tend to use GRAM2
> > rather than GRAM4 and so Swift tends to get used that way too -
> > there's no
> > real motivation to push people to use a different submission system
> > than
> > what they're used to, and one thing we decided within our group is
> > that we
> > would concentrate on being very application focused (after we had
> > spent
> > rather a long time pontificating and debating). GRAM2 -> GRAM4 doesn't
> > provide enough incentive (in the way that a GRAM2 -> Falkon change
> > does)
> > for our actual apps (for example that Tibi and Nika work on).
>
> Fair enough.  GRAM4 is deployed on most of TG and OSG now.  It would
> be good to push jobs to GRAM4 when reasonable/possible.  The apps
> folks should not care which service is used.  It should be hidden by
> Swift.  Or are the apps folks you're working with also dictating what
> GRAM service is deployed/used?
>
> >
> > At some point, perhaps, GRAM2 will decay or GRAM4 will become
> > tantalising,
> > at which point it would be in the interests of being app-focused to
> > shift.
> > Or we might change our priorities to be less app focused.
>
> Some are quite happy with GRAM4 in 4.0.3.  We're improving things
> right now to make GRAM4 outperform GRAM2 in most all the important
> benchmarks.  This should be in 4.0.5.  I think things at that point
> become "tantalizing".
>
> >
> > --
> >
>
>


From tfreeman at mcs.anl.gov  Tue May 22 22:10:59 2007
From: tfreeman at mcs.anl.gov (Tim Freeman)
Date: Tue, 22 May 2007 22:10:59 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <465345AF.3010201@cs.uchicago.edu>
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<465345AF.3010201@cs.uchicago.edu>
Message-ID: <20070522221059.3e992405.tfreeman@mcs.anl.gov>

On Tue, 22 May 2007 14:34:07 -0500
Ioan Raicu <iraicu at cs.uchicago.edu> wrote:

> the many logs that we generate are quite hard for people to
>       follow, and keep track of what each one contains; we fixed this by
>       developing a GUI that  can connect to the GT4 container remotely
>       and display relevant information!

Is that something that could be used for other GT services?


>           * the provisioner currently creates a new thread for every job
>             (resource allocation) it sends to GRAM4
>                 o depending on which allocation strategy is used, this
>                   might/might not be a problem on TG nodes
>                 o in theory, we don't want more than 100 or so GRAM4
>                   jobs in parallel running, but  if we choose the policy
>                   in which each job allocates a single machine, then we
>                   can easily surpass 100 jobs in parallel... all the
>                   other policies, would be able to allocate 1K+, even
>                   10K+ machines with less than 100 jobs in parallel, so
>                   it could work perfectly fine even with the current
>                   implementation; in the long run, this might be able to
>                   be changed to a pool of threads in the provisioner!

Are these threads just waiting on notifications?  

If so: you should be able to reduce this to one thread by subscribing with the
same notification consumer EPR for each GRAM job and demuxing the result (demux
based on the producer EPR that is passed to the class implementing
NotifyCallback).  That way the thread that creates the GRAM job can disappear
once it is done with the create call. 
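
A minimal Java sketch of the demultiplexing idea, making no claims about the real GT4 classes: `JobListener`, `NotificationDemux`, and the string job keys are hypothetical stand-ins for the class implementing NotifyCallback and for the producer EPR, not the Globus API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-job handler; stands in for whatever the provisioner
// does when a GRAM job changes state.
interface JobListener {
    void stateChanged(String jobKey, String newState);
}

// One shared consumer for all jobs. The job key plays the role of the
// producer EPR that identifies which job sent a notification.
class NotificationDemux {
    private final Map<String, JobListener> listeners = new ConcurrentHashMap<>();

    // Called by the submitting thread right after the create call; that
    // thread can then exit instead of blocking on the job.
    void register(String jobKey, JobListener listener) {
        listeners.put(jobKey, listener);
    }

    // Called once a job reaches a terminal state.
    void unregister(String jobKey) {
        listeners.remove(jobKey);
    }

    // Single entry point for every incoming notification: look up the
    // job by its key and hand the event to that job's listener.
    // Returns false if no job is registered under the key.
    boolean deliver(String producerKey, String newState) {
        JobListener listener = listeners.get(producerKey);
        if (listener == null) {
            return false;
        }
        listener.stateChanged(producerKey, newState);
        return true;
    }

    // Number of jobs still waiting for notifications.
    int pendingJobs() {
        return listeners.size();
    }
}
```

With this shape the number of threads no longer grows with the number of outstanding jobs; only the map does.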

Tim 


From hategan at mcs.anl.gov  Wed May 23 04:10:42 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 23 May 2007 12:10:42 +0300
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <20070522221059.3e992405.tfreeman@mcs.anl.gov> (from
	tfreeman@mcs.anl.gov on Wed May 23 06:10:59 2007)
References: <685C1420-03DE-4F2E-BDC7-A8A2C5636154@mcs.anl.gov>
	<46532FB4.5070707@mcs.anl.gov> <465335D1.2040306@mcs.anl.gov>
	<Pine.LNX.4.64.0705221837010.20212@dildano.hawaga.org.uk>
	<465345AF.3010201@cs.uchicago.edu>
	<20070522221059.3e992405.tfreeman@mcs.anl.gov>
Message-ID: <1179911442l.13147l.0l@blabla>

On 05/23/2007 06:10:59 AM, Tim Freeman wrote:
> On Tue, 22 May 2007 14:34:07 -0500
> Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
> 
> > the many logs that we generate are quite hard for people to
> >       follow, and keep track of what each one contains; we fixed
> this by
> >       developing a GUI that  can connect to the GT4 container
> remotely
> >       and display relevant information!
> 
> Is that something that could be used for other GT services?
> 
> 
> >           * the provisioner currently creates a new thread for every
> job
> >             (resource allocation) it sends to GRAM4
> >                 o depending on which allocation strategy is used,
> this
> >                   might/might not be a problem on TG nodes
> >                 o in theory, we don't want more than 100 or so GRAM4
> >                   jobs in parallel running, but  if we choose the
> policy
> >                   in which each job allocates a single machine, then
> we
> >                   can easily surpass 100 jobs in parallel... all the
> >                   other policies, would be able to allocate 1K+,
> even
> >                   10K+ machines with less than 100 jobs in parallel,
> so
> >                   it could work perfectly fine even with the current
> >                   implementation; in the long run, this might be
> able to
> >                   be changed to a pool of threads in the
> provisioner!
> 
> Are these threads just waiting on notifications?
> 
> If so: you should be able to reduce this to one thread by subscribing
> with the
> same notification consumer EPR for each GRAM job and demuxing the
> result (demux
> based on the producer EPR that is passed to the class implementing
> NotifyCallback).  That way the thread that creates the GRAM job can
> disappear
> once it is done with the create call.

I'd recommend the CoG abstractions. They do exactly that, but hide all  
the details.

Mihael

> 
> Tim


From benc at hawaga.org.uk  Wed May 23 09:15:33 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 14:15:33 +0000 (GMT)
Subject: [Swift-devel] swift after 2007-04-29
Message-ID: <Pine.LNX.4.64.0705231414530.22628@dildano.hawaga.org.uk>


Has anyone on this list used a Swift source base more recent than the 29th of 
April?

-- 


From benc at hawaga.org.uk  Wed May 23 10:19:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 15:19:03 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
Message-ID: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>


I hear a rumour that it's sufficiently unclear how to wire Swift and Falkon 
together that people are avoiding testing Swift code (more recent than the 
8th of March build that Yong made).

That is lame - it means large chunks of our app testing are being done 
with code that is 2.5 months old.

I don't know how Falkon gets deployed alongside Swift at the moment, so I 
don't know what to do to make this easier - is the procedure written down 
anywhere?

-- 


From tiberius at ci.uchicago.edu  Wed May 23 10:57:08 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Wed, 23 May 2007 10:57:08 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
Message-ID: <fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>

It seems that Yong's Falkon provider is working (according to Nika),
so I was wondering when it will make it into Swift. At that point
it will be more convenient for me to test it (as I would only have to handle
the Falkon backend configuration).

Tibi

On 5/23/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>
> i hear rumour that its sufficiently unclear how to wire swift and falkon
> together that people are avoiding testing swift code (more recent than the
> 8th of march build that Yong made)
>
> that is lame - it means large chunks of our app testing are being done
> with code that is 2.5 months old.
>
> I don't know how Falkon gets deployed alongside swift at the moment, so I
> don't know what to do to make this easier - are they written down
> anywhere?
>
> --


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Wed May 23 11:01:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 16:01:47 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0705231600340.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Tiberiu Stef-Praun wrote:

> It seems that Yong's Falkon provider is working (according to Nika),
> so I was wondering when will it make it into the Swift ? At that point
> it's more convenient for me to test it (as I would only have to handle
> the Falkon backend configuration).

Does it have build dependencies on Falkon code?

-- 


From wilde at mcs.anl.gov  Wed May 23 11:03:29 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Wed, 23 May 2007 11:03:29 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
Message-ID: <465465D1.4060002@mcs.anl.gov>

I'm in favor of asking Ioan and possibly Yong - to the extent he has 
time - to push forward on this, to specifications from Ben and 
Mihael, and based on usability feedback from Nika and Tibi who need 
to speak for users' needs. Ben's specs should also address code 
quality, testing/certification and maintainability.

- Mike


Tiberiu Stef-Praun wrote, On 5/23/2007 10:57 AM:
> It seems that Yong's Falkon provider is working (according to Nika),
> so I was wondering when will it make it into the Swift ? At that point
> it's more convenient for me to test it (as I would only have to handle
> the Falkon backend configuration).
> 
> Tibi
> 
> On 5/23/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>>
>> i hear rumour that its sufficiently unclear how to wire swift and falkon
>> together that people are avoiding testing swift code (more recent than 
>> the
>> 8th of march build that Yong made)
>>
>> that is lame - it means large chunks of our app testing are being done
>> with code that is 2.5 months old.
>>
>> I don't know how Falkon gets deployed alongside swift at the moment, so I
>> don't know what to do to make this easier - are they written down
>> anywhere?
>>
>> -- 
> 
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From benc at hawaga.org.uk  Wed May 23 11:12:08 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 16:12:08 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <465465D1.4060002@mcs.anl.gov>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
	<465465D1.4060002@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705231610040.20212@dildano.hawaga.org.uk>


What does Falkon deployment look like at the moment? (in terms of 
procedures to deploy it from an empty computer, and in terms of how files 
are laid out, and in terms of how things get configured)?

I think it doesn't make sense to look at the falkon/swift interface code 
without looking at the whole deployment process for both Swift and Falkon 
together.

On Wed, 23 May 2007, Mike Wilde wrote:

> Im in favor of asking Ioan and possibly Yong - to the extent he has time - to
> push forward on this, to specifications from Ben and Mihael, and based on
> usability feedback from Nika and Tibi who need to speak for users' needs.
> Ben's specs should also address code quality, testing/certification and
> maintainability.
> 
> - Mike
> 
> 
> Tiberiu Stef-Praun wrote, On 5/23/2007 10:57 AM:
> > It seems that Yong's Falkon provider is working (according to Nika),
> > so I was wondering when will it make it into the Swift ? At that point
> > it's more convenient for me to test it (as I would only have to handle
> > the Falkon backend configuration).
> > 
> > Tibi
> > 
> > On 5/23/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> > > 
> > > i hear rumour that its sufficiently unclear how to wire swift and falkon
> > > together that people are avoiding testing swift code (more recent than the
> > > 8th of march build that Yong made)
> > > 
> > > that is lame - it means large chunks of our app testing are being done
> > > with code that is 2.5 months old.
> > > 
> > > I don't know how Falkon gets deployed alongside swift at the moment, so I
> > > don't know what to do to make this easier - are they written down
> > > anywhere?
> > > 
> > > -- 
> > 
> > 
> 
> 


From iraicu at cs.uchicago.edu  Wed May 23 11:28:51 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 11:28:51 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
Message-ID: <46546BC3.4070600@cs.uchicago.edu>

See below:

Tim Freeman wrote:
> On Tue, 22 May 2007 14:34:07 -0500
> Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
>
>   
>> the many logs that we generate are quite hard for people to
>>       follow, and keep track of what each one contains; we fixed this by
>>       developing a GUI that  can connect to the GT4 container remotely
>>       and display relevant information!
>>     
>
> Is that something that could be used for other GT services?
>
>   
I guess so...
Here is a screen shot:
http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_GUI.gif


Essentially, it uses Java Swing to paint the GUI, which has a bunch of 
text fields that get populated with the results of web service calls 
that poll the GT4 service in question (Falkon in our case).  It's 
nothing fancy, but I bet something like this could be made for the GT4 
container in general that would give basic container and host 
statistics!
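
The polling pattern described here could be sketched roughly as follows. This is an illustration under assumptions, not the Falkon GUI code: `StatusPoller` is a hypothetical name, the `Supplier` stands in for one web service call, and the `Consumer` stands in for a text field's `setText`. Real Swing code should apply the update on the event dispatch thread, for example via `SwingUtilities.invokeLater`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class StatusPoller {
    private final Supplier<String> query;    // stands in for one web service call
    private final Consumer<String> display;  // stands in for JTextField::setText
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    StatusPoller(Supplier<String> query, Consumer<String> display) {
        this.query = query;
        this.display = display;
    }

    // One poll cycle: fetch the current value and push it to the display.
    void pollOnce() {
        display.accept(query.get());
    }

    // Repeat pollOnce() at a fixed interval on a background thread.
    // Note: if pollOnce() throws, the executor cancels later runs.
    void start(long periodMillis) {
        timer.scheduleAtFixedRate(this::pollOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stop polling and release the background thread.
    void stop() {
        timer.shutdownNow();
    }
}
```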
>   
>>           * the provisioner currently creates a new thread for every job
>>             (resource allocation) it sends to GRAM4
>>                 o depending on which allocation strategy is used, this
>>                   might/might not be a problem on TG nodes
>>                 o in theory, we don't want more than 100 or so GRAM4
>>                   jobs in parallel running, but  if we choose the policy
>>                   in which each job allocates a single machine, then we
>>                   can easily surpass 100 jobs in parallel... all the
>>                   other policies, would be able to allocate 1K+, even
>>                   10K+ machines with less than 100 jobs in parallel, so
>>                   it could work perfectly fine even with the current
>>                   implementation; in the long run, this might be able to
>>                   be changed to a pool of threads in the provisioner!
>>     
>
> Are these threads just waiting on notifications?  
>
>   
Right... I took the easy way out and just created 1 thread per GRAM job, 
but it doesn't have to be this way, as you pointed out below.
> If so: you should be able to reduce this to one thread by subscribing with the
> same notification consumer EPR for each GRAM job and demuxing the result (demux
> based on the producer EPR that is passed to the class implementing
> NotifyCallback).  That way the thread that creates the GRAM job can disappear
> once it is done with the create call. 
>
>   
This is on my list of things to do, but I just haven't gotten around to 
fixing this!  It hasn't really been an issue with my current tests and 
usage scenarios, but needs to be addressed for the general case as 
people will likely hit this problem if we have enough users using Falkon.
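
For the thread-per-job concern, one possible shape is a bounded pool. The sketch below uses only the standard `java.util.concurrent` executors; `PooledSubmitter` and the `Runnable` jobs are hypothetical placeholders for the provisioner's GRAM4 submissions, and the pool size is an arbitrary assumption rather than a measured value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PooledSubmitter {
    private final ExecutorService pool;

    // poolSize caps the number of threads no matter how many jobs are queued.
    PooledSubmitter(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    // Queue every job; at most poolSize of them run at once.
    List<Future<?>> submitAll(List<Runnable> jobs) {
        List<Future<?>> handles = new ArrayList<>();
        for (Runnable job : jobs) {
            handles.add(pool.submit(job));
        }
        return handles;
    }

    // Stop accepting work and wait for the queued jobs to finish.
    // Returns false if the timeout elapsed first.
    boolean shutdownAndWait(long timeoutSeconds) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

A pool like this bounds thread usage even under the policy in which each job allocates a single machine, at the cost of queueing submissions once all pool threads are busy.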

Thanks,
Ioan
> Tim 
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================




From benc at hawaga.org.uk  Wed May 23 11:35:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 16:35:25 +0000 (GMT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <46546BC3.4070600@cs.uchicago.edu>
References: <46546BC3.4070600@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> I guess so...
> Here is a screen shot:
> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_GUI.gif
> 
> 
> Essentially, all it does is it uses Java swing to paint the GUI, which has a
> bunch of text fields that get populated from data from the results of web
> service calls which are being polled against the GT4 service in question
> (Falkon in our case).  Its nothing fancy, but I bet something like this could
> be made for the GT4 container in general that would give basic container and
> host statistics!

Does it use WS-Resource Properties?  If it doesn't, it probably should. If 
it does, it overlaps strongly with the work of the Globus MDS group and 
it might be interesting to interact with them.

-- 


From iraicu at cs.uchicago.edu  Wed May 23 11:36:06 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 11:36:06 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
Message-ID: <46546D76.3020904@cs.uchicago.edu>

Hmmm... from my understanding, the Falkon provider is independent of the 
fact that Swift will actually use Falkon or not.  There is no 
requirement that Falkon be used, even if you have the Falkon provider 
installed!

With that said, the statement about people avoiding the latest version 
of Swift due to the Falkon provider doesn't make any sense. Maybe Yong 
has more input on this...

About how Falkon gets deployed: it is simply uncompressed, you modify 1 
or 2 config files, and use the included scripts to start everything!  
All this is in the included readme.txt in the Falkon archive, 
downloadable online on my web site.  Once again, if someone is not 
interested in using Falkon, then I see no reason why they would be doing 
anything different than before just because there is now a Falkon 
provider in Swift.

Ioan

Ben Clifford wrote:
> i hear rumour that its sufficiently unclear how to wire swift and falkon 
> together that people are avoiding testing swift code (more recent than the 
> 8th of march build that Yong made)
>
> that is lame - it means large chunks of our app testing are being done 
> with code that is 2.5 months old.
>
> I don't know how Falkon gets deployed alongside swift at the moment, so I 
> don't know what to do to make this easier - are they written down 
> anywhere?
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From nefedova at mcs.anl.gov  Wed May 23 11:39:45 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 23 May 2007 11:39:45 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46546D76.3020904@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
Message-ID: <4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>

I think Yong told me to use his Swift install from terminable when I  
started using Falkon. I am not sure why -- I presumed there were some  
specifics in that install.

Nika

On May 23, 2007, at 11:36 AM, Ioan Raicu wrote:

> Hmmm... from my understanding, the Falkon provider is independent  
> of the fact that Swift will actually use Falkon or not.  There is  
> no requirement that Falkon be used, even if you have the Falkon  
> provider installed!
>
> With that said, our statement about people avoiding the latest  
> version of Swift due to the Falkon provider doesn't make any sense.  
> Maybe Yong has more input on this...
>
> About how Falkon gets deplyed, it is simply uncompressed, you  
> modify 1 or 2 config files, and use the included scripts to start  
> everything!  All this is in the included readme.txt in the Falkon  
> archive, downloadable online on my web site.  Once again, if  
> someone is not intersted in using Falkon, then I see no reason why  
> they would be doing anything different than before just because  
> there is now a Falkon provider in Swift.
>
> Ioan
>
> Ben Clifford wrote:
>> i hear rumour that its sufficiently unclear how to wire swift and  
>> falkon together that people are avoiding testing swift code (more  
>> recent than the 8th of march build that Yong made)
>>
>> that is lame - it means large chunks of our app testing are being  
>> done with code that is 2.5 months old.
>>
>> I don't know how Falkon gets deployed alongside swift at the  
>> moment, so I don't know what to do to make this easier - are they  
>> written down anywhere?
>>
>>
>
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>       http://dsl.cs.uchicago.edu/
> ============================================
> ============================================
>


From iraicu at cs.uchicago.edu  Wed May 23 11:42:31 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 11:42:31 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231600340.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
	<Pine.LNX.4.64.0705231600340.22628@dildano.hawaga.org.uk>
Message-ID: <46546EF7.5050506@cs.uchicago.edu>

Yes, it needs the stubs generated from the WSDL file which defines the 
interface into Falkon.  These stubs can simply be generated on the fly 
from the WSDL file, or copied from the Falkon service after compilation 
of the service.

So, there are dependencies, but nothing that requires the Falkon 
distribution :)

Ioan

Ben Clifford wrote:
> On Wed, 23 May 2007, Tiberiu Stef-Praun wrote:
>
>   
>> It seems that Yong's Falkon provider is working (according to Nika),
>> so I was wondering when will it make it into the Swift ? At that point
>> it's more convenient for me to test it (as I would only have to handle
>> the Falkon backend configuration).
>>     
>
> Does it have build dependencies on Falkon code?
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From benc at hawaga.org.uk  Wed May 23 11:44:42 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 16:44:42 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46546D76.3020904@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> Hmmm... from my understanding, the Falkon provider is independent of the fact
> that Swift will actually use Falkon or not.  There is no requirement that
> Falkon be used, even if you have the Falkon provider installed!

Hopefully it's that way, configurable by e.g. a site catalog setting. I 
don't know if that is the case right now, though. If not, we should make it 
that way.

> About how Falkon gets deplyed, it is simply uncompressed, you modify 1 or 2
> config files, and use the included scripts to start everything!  All this is
> in the included readme.txt in the Falkon archive, downloadable online on my
> web site.  Once again, if someone is not intersted in using Falkon, then I see
> no reason why they would be doing anything different than before just because
> there is now a Falkon provider in Swift.

ok. Does the swift/falkon provider need to be told an EPR to the Falkon 
web service?

My concerns are mostly not about having a provider in the source tree 
that people aren't going to use; that's fine. But the code needs to 
not be in the form of some random jar file without it being clear where it 
came from. If the code can build without needing Falkon code around (which 
I suspect it can't), then it's simple to put it in the Swift codebase. If 
it has Falkon build dependencies (e.g. for web service stubs) then that's 
more stuff accumulating in the codebase that needs long term management 
(and brings in incompatibilities if you want to modify the Falkon web 
services API).

-- 


From yongzh at cs.uchicago.edu  Wed May 23 12:04:59 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 23 May 2007 12:04:59 -0500 (CDT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705231201530.22237@classes.cs.uchicago.edu>

I would say testing swift has nothing to do with the Falkon provider. The
provider is just one of the many providers that you can choose to use or
not, such as local, GT2, GT4, PBS etc.
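
To illustrate the "one of many providers" point, here is a sketch of selecting an execution provider by a configured name. Everything in it (`ExecutionProvider`, `ProviderRegistry`, the name strings) is hypothetical and is not the CoG provider interface; the CoG documentation describes the real one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal provider contract.
interface ExecutionProvider {
    String submit(String command);
}

class ProviderRegistry {
    private final Map<String, ExecutionProvider> providers = new HashMap<>();

    // Installing a provider only makes it available; nothing uses it
    // until a site's configuration names it.
    void install(String name, ExecutionProvider provider) {
        providers.put(name, provider);
    }

    // Look up the provider named in (for example) a site catalog entry.
    ExecutionProvider forName(String name) {
        ExecutionProvider provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("no provider installed for: " + name);
        }
        return provider;
    }
}
```

Under this arrangement, a site that names "local" never touches the "falkon" entry, even though both are installed.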

I would strongly encourage people to look at the CoG documentation about
providers and related topics. The provider interface is nothing specific to
Falkon; I am frustrated that you guys mix the provider issue with Falkon
and make claims without looking deep into related documents.

Yong.

On Wed, 23 May 2007, Ben Clifford wrote:

>
> i hear rumour that its sufficiently unclear how to wire swift and falkon
> together that people are avoiding testing swift code (more recent than the
> 8th of march build that Yong made)
>
> that is lame - it means large chunks of our app testing are being done
> with code that is 2.5 months old.
>
> I don't know how Falkon gets deployed alongside swift at the moment, so I
> don't know what to do to make this easier - are they written down
> anywhere?
>
> --


From yongzh at cs.uchicago.edu  Wed May 23 12:07:48 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 23 May 2007 12:07:48 -0500 (CDT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231600340.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
	<Pine.LNX.4.64.0705231600340.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705231205120.22237@classes.cs.uchicago.edu>

It of course needs something from Falkon, for instance, the service stubs
for the Falkon service. But it does not require Falkon to be deployed or
bundled together with Swift. This applies to other providers such as GT2
or GT4. You can use them, but you do not need to package them with Swift.

Yong.

On Wed, 23 May 2007, Ben Clifford wrote:

>
>
> On Wed, 23 May 2007, Tiberiu Stef-Praun wrote:
>
> > It seems that Yong's Falkon provider is working (according to Nika),
> > so I was wondering when will it make it into the Swift ? At that point
> > it's more convenient for me to test it (as I would only have to handle
> > the Falkon backend configuration).
>
> Does it have build dependencies on Falkon code?
>
> --


From iraicu at cs.uchicago.edu  Wed May 23 12:08:57 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 12:08:57 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <465465D1.4060002@mcs.anl.gov>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
	<465465D1.4060002@mcs.anl.gov>
Message-ID: <46547529.9010809@cs.uchicago.edu>

Hi,
I am certainly swamped right now for the next month or so at the very 
least (data caching support in Falkon, working with Nika on her apps, 
Kate+Borja for possibly using VMs and EC2, DSL Workshop, SC challenge 
brainstorming, HPDC hot topics paper, etc...). 

I could certainly use some help from developers who might be much more 
familiar with what it takes to get a prototype from research to 
production ready.  I am willing to work with these developers, but if I 
have to do it myself (and with Yong's help), then I can't promise anything 
about the timeline on which I can have something more production ready.

Can any resources (developer power) be devoted to getting Falkon 
production ready?

Ioan

Mike Wilde wrote:
> Im in favor of asking Ioan and possibly Yong - to the extent he has 
> time - to push forward on this, to specifications from Ben and Mihael, 
> and based on usability feedback from Nika and Tibi who need to speak 
> for users' needs. Ben's specs should also address code quality, 
> testing/certification and maintainability.
>
> - Mike
>
>
> Tiberiu Stef-Praun wrote, On 5/23/2007 10:57 AM:
>> It seems that Yong's Falkon provider is working (according to Nika),
>> so I was wondering when will it make it into the Swift ? At that point
>> it's more convenient for me to test it (as I would only have to handle
>> the Falkon backend configuration).
>>
>> Tibi
>>
>> On 5/23/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>>>
>>> i hear rumour that its sufficiently unclear how to wire swift and 
>>> falkon
>>> together that people are avoiding testing swift code (more recent 
>>> than the
>>> 8th of march build that Yong made)
>>>
>>> that is lame - it means large chunks of our app testing are being done
>>> with code that is 2.5 months old.
>>>
>>> I don't know how Falkon gets deployed alongside swift at the moment, 
>>> so I
>>> don't know what to do to make this easier - are they written down
>>> anywhere?
>>>
>>> -- 
>>
>>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From yongzh at cs.uchicago.edu  Wed May 23 12:10:48 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 23 May 2007 12:10:48 -0500 (CDT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>

Because the Falkon provider code is not in SVN, and that install is where
you can get the Falkon provider code. But you did update your Swift code
to the latest source code in SVN at that time (as you needed some
new features) as I told you to. So in that sense you were testing at least
some relatively new Swift code.

Yong.

On Wed, 23 May 2007, Veronika Nefedova wrote:

> I think Yong told me to use his swift install from terminable when I
> started using Falkon. I am not sure why -- I presumed there were some
> specifics in that install.
>
> Nika
>
> On May 23, 2007, at 11:36 AM, Ioan Raicu wrote:
>
> > Hmmm... from my understanding, the Falkon provider is independent
> > of the fact that Swift will actually use Falkon or not.  There is
> > no requirement that Falkon be used, even if you have the Falkon
> > provider installed!
> >
> > With that said, our statement about people avoiding the latest
> > version of Swift due to the Falkon provider doesn't make any sense.
> > Maybe Yong has more input on this...
> >
> > About how Falkon gets deployed, it is simply uncompressed, you
> > modify 1 or 2 config files, and use the included scripts to start
> > everything!  All this is in the included readme.txt in the Falkon
> > archive, downloadable online on my web site.  Once again, if
> > someone is not interested in using Falkon, then I see no reason why
> > they would be doing anything different than before just because
> > there is now a Falkon provider in Swift.
> >
> > Ioan
> >
> > Ben Clifford wrote:
> >> i hear rumour that its sufficiently unclear how to wire swift and
> >> falkon together that people are avoiding testing swift code (more
> >> recent than the 8th of march build that Yong made)
> >>
> >> that is lame - it means large chunks of our app testing are being
> >> done with code that is 2.5 months old.
> >>
> >> I don't know how Falkon gets deployed alongside swift at the
> >> moment, so I don't know what to do to make this easier - are they
> >> written down anywhere?
> >>
> >>
> >


From tiberius at ci.uchicago.edu  Wed May 23 12:40:24 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Wed, 23 May 2007 12:40:24 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
Message-ID: <fec1351f0705231040k1807b78dw54a7a00de4d1bdca@mail.gmail.com>

So to summarize:
- apparently there are small changes needed by Swift's Falkon provider
- will anyone get these into SVN, and make Swift "Falkon-ready"?
Note that I did not say "Falkon-enabled", because the latter means
that a Falkon service is installed somewhere and ready to run

Tibi

On 5/23/07, Yong Zhao <yongzh at cs.uchicago.edu> wrote:
> Because the Falkon provider code is not in SVN, and that install is where
> you can get the Falkon provider code. But you did update your Swift code
> to the lastest source code in SVN at that time (as you needed some
> new features) as I told you to. So in that sense you were testing at least
> some relative new Swift code.
>
> Yong.
>
> On Wed, 23 May 2007, Veronika Nefedova wrote:
>
> > I think Yong told me to use his swift install from terminable when I
> > started using Falkon. I am not sure why -- I presumed there were some
> > specifics in that install.
> >
> > Nika
> >
> > On May 23, 2007, at 11:36 AM, Ioan Raicu wrote:
> >
> > > Hmmm... from my understanding, the Falkon provider is independent
> > > of the fact that Swift will actually use Falkon or not.  There is
> > > no requirement that Falkon be used, even if you have the Falkon
> > > provider installed!
> > >
> > > With that said, our statement about people avoiding the latest
> > > version of Swift due to the Falkon provider doesn't make any sense.
> > > Maybe Yong has more input on this...
> > >
> > > About how Falkon gets deployed, it is simply uncompressed, you
> > > modify 1 or 2 config files, and use the included scripts to start
> > > everything!  All this is in the included readme.txt in the Falkon
> > > archive, downloadable online on my web site.  Once again, if
> > > someone is not interested in using Falkon, then I see no reason why
> > > they would be doing anything different than before just because
> > > there is now a Falkon provider in Swift.
> > >
> > > Ioan
> > >
> > > Ben Clifford wrote:
> > >> i hear rumour that its sufficiently unclear how to wire swift and
> > >> falkon together that people are avoiding testing swift code (more
> > >> recent than the 8th of march build that Yong made)
> > >>
> > >> that is lame - it means large chunks of our app testing are being
> > >> done with code that is 2.5 months old.
> > >>
> > >> I don't know how Falkon gets deployed alongside swift at the
> > >> moment, so I don't know what to do to make this easier - are they
> > >> written down anywhere?
> > >>
> > >>
> > >


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Wed May 23 12:42:26 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 17:42:26 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>


so a relatively straightforward thing to do would be to put the source 
code into the swift SVN, put the stubs in jar form into the swift SVN, 
have the falkon provider built as part of the swift build and made 
available for use.

another way would be for it to go into cog. but that's for cog to decide, 
not me.

either way looks pretty much the same when swift is deployed.

how does a user specify that jobs should go through falkon rather than the 
other mechanisms?

-- 


From yongzh at cs.uchicago.edu  Wed May 23 13:09:08 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 23 May 2007 13:09:08 -0500 (CDT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>

Currently the provider resides in the cog branch. I'm not quite sure how
to put it into another branch.

In the sites.xml, if there is a Falkon service URL, then the Falkon
provider is selected.

Yong.

On Wed, 23 May 2007, Ben Clifford wrote:

>
> so a relatively straightforward thing to do would be to put the source
> code into the swift SVN, put the stubs in jar form into the swift SVN,
> have the falkon provider built as part of the swift build and made
> available for use.
>
> another way would be for it to go into cog. but that's for cog to decide,
> not me.
>
> either way looks pretty much the same when swift is deployed.
>
> how does a user specify that jobs should go through falkon rather than the
> other mechanisms?
>
> --
>


From benc at hawaga.org.uk  Wed May 23 13:11:58 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:11:58 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46547529.9010809@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>
	<465465D1.4060002@mcs.anl.gov> <46547529.9010809@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231806320.20212@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> I could certainly use some help from developers which might be much more
> familiar with what it takes to get a prototype from research to production
> ready.

what it takes, perhaps more than anything, is a bunch of time, both as a 
one off occurrence and as an on-going concern.  something no-one has much 
of :-(

most often underestimated is the on-going time - I've seen plenty of stuff 
being made "production ready and released" and then left to rot, which it 
will do within months without constant care and attention.

-- 


From benc at hawaga.org.uk  Wed May 23 13:12:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:12:25 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Yong Zhao wrote:

> In the sites.xml, if there is a Falkon service URL, then the Falkon
> provider is selected.

how is it determined that its a falkon url?

-- 


From iraicu at cs.uchicago.edu  Wed May 23 13:14:54 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 13:14:54 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231610040.20212@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>	<fec1351f0705230857s2bb734d1td28b93a97dc841a3@mail.gmail.com>	<465465D1.4060002@mcs.anl.gov>
	<Pine.LNX.4.64.0705231610040.20212@dildano.hawaga.org.uk>
Message-ID: <4654849E.9030902@cs.uchicago.edu>


Ben Clifford wrote:
> What does Falkon deployment look like at the moment? (in terms of 
> procedures to deploy it from an empty computer, and in terms of how files 
> are laid out, and in terms of how things get configured)?
>   
there is an archive that has everything one might need: a GT4 container, 
the service code (which is already deployed in the GT4 container), the 
executor code, the client code, and the monitor GUI code.  We also 
bundle a 1.4 32bit JRE.  There is a configuration file that points to 
where the logs are supposed to go, and another one with what security 
mechanisms you want to use.  There are also a bunch of settable 
parameters in the startup scripts, and a few obscure settable parameters 
in the code that require recompiling the service.  To recompile the 
service, you need ANT installed and configured as well as a 1.4+ JDK; to 
recompile anything else, you just need a 1.4+ JDK.  With a single script, 
and a single argument (the port number), you can start the entire 
Falkon system!
> I think it doesn't make sense to look at the falkon/swift interface code 
> without looking at the whole deployment process for both Swift and Falkon 
> together.
>   
My understanding is that the Falkon provider can be specified in the 
sites.xml, including where the Falkon dispatcher will be found.  Other 
than that, everything else should be straightforward.

Ioan
> On Wed, 23 May 2007, Mike Wilde wrote:
>
>   
>> Im in favor of asking Ioan and possibly Yong - to the extent he has time - to
>> push forward on this, to specifications from Ben and Mihael, and based on
>> usability feedback from Nika and Tibi who need to speak for users' needs.
>> Ben's specs should also address code quality, testing/certification and
>> maintainability.
>>
>> - Mike
>>
>>
>> Tiberiu Stef-Praun wrote, On 5/23/2007 10:57 AM:
>>     
>>> It seems that Yong's Falkon provider is working (according to Nika),
>>> so I was wondering when will it make it into the Swift ? At that point
>>> it's more convenient for me to test it (as I would only have to handle
>>> the Falkon backend configuration).
>>>
>>> Tibi
>>>
>>> On 5/23/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>>>       
>>>> i hear rumour that its sufficiently unclear how to wire swift and falkon
>>>> together that people are avoiding testing swift code (more recent than the
>>>> 8th of march build that Yong made)
>>>>
>>>> that is lame - it means large chunks of our app testing are being done
>>>> with code that is 2.5 months old.
>>>>
>>>> I don't know how Falkon gets deployed alongside swift at the moment, so I
>>>> don't know what to do to make this easier - are they written down
>>>> anywhere?
>>>>
>>>> -- 
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>>         
>>>       
>>     
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Wed May 23 13:17:30 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 13:17:30 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>
References: <46546BC3.4070600@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>
Message-ID: <4654853A.6000104@cs.uchicago.edu>


Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> I guess so...
>> Here is a screen shot:
>> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_GUI.gif
>>
>>
>> Essentially, all it does is it uses Java swing to paint the GUI, which has a
>> bunch of text fields that get populated from data from the results of web
>> service calls which are being polled against the GT4 service in question
>> (Falkon in our case).  Its nothing fancy, but I bet something like this could
>> be made for the GT4 container in general that would give basic container and
>> host statistics!
>>     
>
> Does it use WS-Resource Properties?  
No, but it could... the GUI was a 1 day hack, and I found it simpler to 
simply add a monitorStatus function that returned a bunch of system 
metrics! 
> If it doesn't, it probably should. If 
> it does, it overlaps strongly with the work of the Globus MDS group and 
> it might be interesting to interact with them.
>   
I never meant for the monitor GUI to be anything fancy, it was simply to 
give me a more efficient way of looking at the log files.  I intended it 
to be a poll driven GUI, rather than notification driven, for 
simplicity!  If anyone wants to extend this, feel free!

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From yongzh at cs.uchicago.edu  Wed May 23 13:18:07 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 23 May 2007 13:18:07 -0500 (CDT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0705231316080.22237@classes.cs.uchicago.edu>

It is a WSRF service EPR with something like this:

http://tg-login1.uc.teragrid.org:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService

Although the GenericPortal stuff needs to be changed to Falkon soon.

Yong.

On Wed, 23 May 2007, Ben Clifford wrote:

>
>
> On Wed, 23 May 2007, Yong Zhao wrote:
>
> > In the sites.xml, if there is a Falkon service URL, then the Falkon
> > provider is selected.
>
> how is it determined that its a falkon url?
>
> --
>
>


From benc at hawaga.org.uk  Wed May 23 13:24:31 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:24:31 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.58.0705231316080.22237@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231316080.22237@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231822310.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Yong Zhao wrote:

> It is a WSRF service EPR with something like this:
> 
> http://tg-login1.uc.teragrid.org:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService
> 
> Although the GenericPortal stuff needs to be changed to Falkon soon.

An http URI doesn't really indicate that it's Falkon compared to some other 
system that also chooses to use web services to submit. Perhaps there 
should be a site catalog entry to pick providers - there already sort-of is 
that in the legacy GRAM version parameter.
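
A minimal sketch of the difference between the two selection approaches,
in Python for brevity. Everything here is hypothetical: the SiteEntry
class, the "provider" field and the function names are illustrative
assumptions, not the actual sites.xml schema or Swift code. It only shows
why matching on a substring of the URL breaks when the service path is
renamed, while an explicit catalog setting does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteEntry:
    """Hypothetical stand-in for one site catalog entry."""
    name: str
    url: str
    provider: Optional[str] = None  # assumed explicit setting, not a real attribute


def pick_by_url_substring(site: SiteEntry) -> str:
    # Fragile: treats the service URL as meaningful text rather than opaque.
    return "falkon" if "GenericPortal" in site.url else "gram"


def pick_by_explicit_setting(site: SiteEntry, default: str = "gram") -> str:
    # Robust: the catalog entry says which provider to use; the URL is never parsed.
    return site.provider if site.provider is not None else default
```

With the explicit setting, renaming GenericPortal to Falkon in the service
path would not change which provider is chosen.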

-- 


From benc at hawaga.org.uk  Wed May 23 13:30:21 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:30:21 +0000 (GMT)
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <4654853A.6000104@cs.uchicago.edu>
References: <46546BC3.4070600@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>
	<4654853A.6000104@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231827580.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> I found it simpler to simply add a monitorStatus function that returned 
> a bunch of system metrics!

A damnation of the GT WS Resource Properties implementation!

-- 


From benc at hawaga.org.uk  Wed May 23 13:38:15 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:38:15 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231837560.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Yong Zhao wrote:

> Currently the provider resides in the cog branch. I'm not quite sure how
> to put it into another branch.

It is in the cog svn at the moment?

-- 


From foster at mcs.anl.gov  Wed May 23 13:38:47 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Wed, 23 May 2007 13:38:47 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705231827580.22628@dildano.hawaga.org.uk>
References: <46546BC3.4070600@cs.uchicago.edu>	<Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>	<4654853A.6000104@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231827580.22628@dildano.hawaga.org.uk>
Message-ID: <46548A37.7070708@mcs.anl.gov>

maybe ... or maybe an indication that Ioan is an inveterate NIHer ...

Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> I found it simpler to simply add a monitorStatus function that returned 
>> a bunch of system metrics!
>>     
>
> A damnation of the GT WS Resource Properties implementation!
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From iraicu at cs.uchicago.edu  Wed May 23 13:51:23 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 13:51:23 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
Message-ID: <46548D2B.1010404@cs.uchicago.edu>


Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> Hmmm... from my understanding, the Falkon provider is independent of the fact
>> that Swift will actually use Falkon or not.  There is no requirement that
>> Falkon be used, even if you have the Falkon provider installed!
>>     
>
> Hopefully its that way, configurable by eg. a site catalog setting. I 
> don't know if that is the case though right now. If not, we should make it 
> that way.
>
>   
>> About how Falkon gets deployed, it is simply uncompressed, you modify 1 or 2
>> config files, and use the included scripts to start everything!  All this is
>> in the included readme.txt in the Falkon archive, downloadable online on my
>> web site.  Once again, if someone is not interested in using Falkon, then I see
>> no reason why they would be doing anything different than before just because
>> there is now a Falkon provider in Swift.
>>     
>
> ok. Does the swift/falkon provider need to be told an EPR to the Falkon 
> web service?
>   
No, it creates a new resource for which the EPR is returned, and that is 
used over and over again until Swift shuts down and the resource is 
destroyed.  Basically, the service URL is all that is needed!
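
The lifecycle described above (create one resource from the service URL,
reuse its EPR for every submission, destroy it at shutdown) could be
sketched roughly as follows. This is Python pseudocode-made-runnable, not
the provider's real code: FakeFactory, ProviderSession and their methods
are invented names, and the "EPR" is just a string.

```python
class FakeFactory:
    """Stand-in for a remote factory service; tracks the resources it creates."""

    def __init__(self, service_url):
        self.service_url = service_url
        self.created = 0
        self.live = set()

    def create_resource(self):
        # Returns a made-up reference string in place of a real EPR.
        self.created += 1
        epr = f"{self.service_url}#resource-{self.created}"
        self.live.add(epr)
        return epr

    def destroy_resource(self, epr):
        self.live.discard(epr)


class ProviderSession:
    """Holds one resource reference for the lifetime of a run."""

    def __init__(self, factory):
        self._factory = factory
        self._epr = None

    def __enter__(self):
        # Created once, when the run starts.
        self._epr = self._factory.create_resource()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Destroyed once, when the run shuts down.
        self._factory.destroy_resource(self._epr)
        self._epr = None
        return False

    def submit(self, task):
        # Every submission reuses the same reference.
        return (self._epr, task)
```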
> My concerns mostly are not so much about having a provider in the source 
> tree when people aren't going to use; that's fine. But the code needs to 
> not be in the form of some random jar file without it being clear where it 
> came from. If the code can build without needing Falkon code around (which 
> I suspect it can't), then its simple to put it in the Swift codebase. If 
> it has Falkon build dependencies (eg for web service stubs) then thats 
> more stuff accumulating in the codebase that needs long term management 
> (and brings in incompatibilities if you want to modify the Falkon web 
> services API)
>   
I was able to generate stubs from a command line tool bundled with GT4 a 
while back, so I don't see why you couldn't just have it all Falkon 
independent!

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From benc at hawaga.org.uk  Wed May 23 13:55:28 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 18:55:28 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46548D2B.1010404@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
	<46548D2B.1010404@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> I was able to generate stubs from a command line tool bundled with GT4 a while
> back, so I don't see why you couldn't just have it all Falkon independent!

needs the wsdl though?

-- 


From iraicu at cs.uchicago.edu  Wed May 23 13:58:21 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 13:58:21 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231822310.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>	<46546D76.3020904@cs.uchicago.edu>	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>	<Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>	<Pine.LNX.4.58.0705231316080.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231822310.22628@dildano.hawaga.org.uk>
Message-ID: <46548ECD.8060103@cs.uchicago.edu>

I believe that it's searching not just for the http:// url, but for a 
specific string (i.e. GenericPortal currently, Falkon soon)...
Ioan

Ben Clifford wrote:
> On Wed, 23 May 2007, Yong Zhao wrote:
>
>   
>> It is a WSRF service EPR with something like this:
>>
>> http://tg-login1.uc.teragrid.org:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService
>>
>> Although the GenericPortal stuff needs to be changed to Falkon soon.
>>     
>
> An http URI doesn't really indicate that it's Falkon compared to some other 
> system that also chooses to use web services to submit. Perhaps there 
> should be a site catalog entry to pick providers - there already sort-of is 
> that in the legacy GRAM version parameter.
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From benc at hawaga.org.uk  Wed May 23 14:00:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 19:00:38 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46548ECD.8060103@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<4955391B-5395-4F90-852C-BC06908FBD20@mcs.anl.gov>
	<Pine.LNX.4.58.0705231208110.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231307330.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231812110.22628@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0705231316080.22237@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0705231822310.22628@dildano.hawaga.org.uk>
	<46548ECD.8060103@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231900110.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> I believe that it's searching not just for the http:// url, but for a specific
> string (i.e. GenericPortal currently, Falkon soon)...

evil!

service URLs are (should be) opaque.

-- 


From iraicu at cs.uchicago.edu  Wed May 23 14:01:47 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 14:01:47 -0500
Subject: [Swift-devel] Re: GRAM and Swift discussion this week?
In-Reply-To: <Pine.LNX.4.64.0705231827580.22628@dildano.hawaga.org.uk>
References: <46546BC3.4070600@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231633440.22628@dildano.hawaga.org.uk>
	<4654853A.6000104@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231827580.22628@dildano.hawaga.org.uk>
Message-ID: <46548F9B.8040303@cs.uchicago.edu>

I did not give it much thought, and did not look into how it would look 
as resource properties.  My service does expose some resource 
properties, but I found them to be harder to configure, and in my 
current way of handling them, I would have had to retrieve each resource 
property in a separate WS call, which is very inefficient :(  Maybe I could 
have made an encapsulating object that held all the system metrics, 
similar to what my function does, but ah well... in the next version...

Ioan

Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> I found it simpler to simply add a monitorStatus function that returned 
>> a bunch of system metrics!
>>     
>
> A damnation of the GT WS Resource Properties implementation!
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Wed May 23 14:04:24 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 14:04:24 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
	<46548D2B.1010404@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>
Message-ID: <46549038.5020009@cs.uchicago.edu>

You can get the WSDL from a running service by querying the service in a 
standard WS way....
Ioan

Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> I was able to generate stubs from a command line tool bundled with GT4 a while
>> back, so I don't see why you couldn't just have it all Falkon independent!
>>     
>
> needs the wsdl though?
>
>   



From benc at hawaga.org.uk  Wed May 23 14:05:12 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 19:05:12 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <46549038.5020009@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
	<46548D2B.1010404@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>
	<46549038.5020009@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705231905040.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> You can get the WSDL from a running service by querying the service in a
> standard WS way....

but not at compile time.

-- 


From iraicu at cs.uchicago.edu  Wed May 23 15:23:55 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 23 May 2007 15:23:55 -0500
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231905040.22628@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
	<46548D2B.1010404@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>
	<46549038.5020009@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231905040.22628@dildano.hawaga.org.uk>
Message-ID: <4654A2DB.7030904@cs.uchicago.edu>

It all depends on how complicated your compile scripts are, and whether 
Falkon is operational anywhere... it could be done at compile time... if 
not, then you'd have to package it with Swift... these are probably 
small details, and I bet we could work around them if we know exactly what 
end result we want. 

Also, what is so bad about including the WSDL definition of Falkon with 
Swift, so you can generate the stubs at compile time?

Ioan

Ben Clifford wrote:
> On Wed, 23 May 2007, Ioan Raicu wrote:
>
>   
>> You can get the WSDL from a running service by querying the service in a
>> standard WS way....
>>     
>
> but not at compile time.
>
>   



From benc at hawaga.org.uk  Wed May 23 18:36:09 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 23 May 2007 23:36:09 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <4654A2DB.7030904@cs.uchicago.edu>
References: <Pine.LNX.4.64.0705231511380.22628@dildano.hawaga.org.uk>
	<46546D76.3020904@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231637230.22628@dildano.hawaga.org.uk>
	<46548D2B.1010404@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231855170.22628@dildano.hawaga.org.uk>
	<46549038.5020009@cs.uchicago.edu>
	<Pine.LNX.4.64.0705231905040.22628@dildano.hawaga.org.uk>
	<4654A2DB.7030904@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0705232334390.22628@dildano.hawaga.org.uk>


On Wed, 23 May 2007, Ioan Raicu wrote:

> Also, what is so bad about including the WSDL definition of Falkon with 
> Swift, so you can generate the stubs at compile time?

Not so much of an issue.

Pretty much the main consideration is that you will have difficulty 
changing the interface once people start taking copies of the interface 
definition.

But I think that is the way to go for now.

-- 


From hategan at mcs.anl.gov  Thu May 24 03:49:06 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 24 May 2007 11:49:06 +0300
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705231739300.20212@dildano.hawaga.org.uk> (from
	benc@hawaga.org.uk on Wed May 23 20:42:26 2007)
Message-ID: <1179996546l.17759l.0l@blabla>

On 05/23/2007 08:42:26 PM, Ben Clifford wrote:
> 
> so a relatively straightforward thing to do would be to put the source
> code into the swift SVN, put the stubs in jar form into the swift SVN,
> have the falkon provider built as part of the swift build and made
> available for use.

I'm not sure if that is wise. The falkon provider should be a separate  
module (build entity). In other words, it should be straightforward to  
either build it or not build it.

> 
> another way would be for it to go into cog. but that's for cog to
> decide, not me.

It's somewhat unlikely.

> 
> either way looks pretty much the same when swift is deployed.
> 
> how does a user specify that jobs should go through falkon rather than
> the other mechanisms?
> 
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
> 


From benc at hawaga.org.uk  Thu May 24 16:31:58 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 24 May 2007 21:31:58 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <1179996546l.17759l.0l@blabla>
References: <1179996546l.17759l.0l@blabla>
Message-ID: <Pine.LNX.4.64.0705242128020.22628@dildano.hawaga.org.uk>


On Thu, 24 May 2007, Mihael Hategan wrote:

> I'm not sure if that is wise. The falkon provider should be a separate module
> (build entity). In other words, it should be straightforward to either build
> it or not build it.

needs a sensible deployment mechanism then, which pretty much means some 
nicer way of plugging in new providers to swift than manually editing 
config files / source code each time. there's a feature req for something 
like that for plugging in new mappers, but it should cover both.

-- 


From hategan at mcs.anl.gov  Fri May 25 02:48:07 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 25 May 2007 10:48:07 +0300
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705242128020.22628@dildano.hawaga.org.uk> (from
	benc@hawaga.org.uk on Fri May 25 00:31:58 2007)
References: <1179996546l.17759l.0l@blabla>
	<Pine.LNX.4.64.0705242128020.22628@dildano.hawaga.org.uk>
Message-ID: <1180079287l.22334l.0l@blabla>

On 05/25/2007 12:31:58 AM, Ben Clifford wrote:
> 
> 
> On Thu, 24 May 2007, Mihael Hategan wrote:
> 
> > I'm not sure if that is wise. The falkon provider should be a
> > separate module (build entity). In other words, it should be
> > straightforward to either build it or not build it.
> 
> needs a sensible deployment mechanism then, which pretty much means
> some nicer way of plugging in new providers to swift than manually
> editing config files / source code each time. there's a feature req
> for something like that for plugging in new mappers, but it should
> cover both.
> 

Providers in cog are dynamically loaded. Assuming that swift handles  
the sites.xml entries correctly, all it takes is to have the relevant  
jars and config files on the classpath (assuming that the provider  
itself does not have funny requirements).

Mihael


From benc at hawaga.org.uk  Fri May 25 06:45:37 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 25 May 2007 11:45:37 +0000 (GMT)
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <1180079287l.22334l.0l@blabla>
References: <1179996546l.17759l.0l@blabla>
	<Pine.LNX.4.64.0705242128020.22628@dildano.hawaga.org.uk>
	<1180079287l.22334l.0l@blabla>
Message-ID: <Pine.LNX.4.64.0705251144030.22628@dildano.hawaga.org.uk>


On Fri, 25 May 2007, Mihael Hategan wrote:

> Providers in cog are dynamically loaded. Assuming that swift handles the
> sites.xml entries correctly

which I guess isn't the case at the moment because providers are 
explicitly named in libexec/scheduler.xml and in vdl-sc.k.

-- 


From hategan at mcs.anl.gov  Wed May 30 06:03:52 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 May 2007 14:03:52 +0300
Subject: [Swift-devel] wiring swift and falkon together
In-Reply-To: <Pine.LNX.4.64.0705251144030.22628@dildano.hawaga.org.uk>
References: <1179996546l.17759l.0l@blabla>
	<Pine.LNX.4.64.0705242128020.22628@dildano.hawaga.org.uk>
	<1180079287l.22334l.0l@blabla>
	<Pine.LNX.4.64.0705251144030.22628@dildano.hawaga.org.uk>
Message-ID: <1180523032.2501.10.camel@blabla.mcs.anl.gov>

On Fri, 2007-05-25 at 11:45 +0000, Ben Clifford wrote:
> 
> On Fri, 25 May 2007, Mihael Hategan wrote:
> 
> > Providers in cog are dynamically loaded. Assuming that swift handles the
> > sites.xml entries correctly
> 
> which I guess isn't the case at the moment because providers are 
> explicitly named in libexec/scheduler.xml and in vdl-sc.k.

Right. Let me rephrase: if vdl-sc.k has the right stuff, then deployment
of the falkon provider should consist of sticking the jar & config files
in lib and etc, respectively.
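
As a rough sketch of that idea (in Python rather than the Java that cog actually uses, and not the real cog loader): the caller resolves a provider name to a class at run time through a lookup table, so adding a provider means adding a table entry and the code on the path, not editing the caller. The `PROVIDERS` table, the `load_provider` function and the `falkon_provider` module name are hypothetical; `queue:Queue` is only a stand-in for a real provider class.

```python
import importlib

# Hypothetical provider table, standing in for config files picked up
# from etc/. "queue:Queue" is only a stand-in for a real provider class.
PROVIDERS = {
    "local": "queue:Queue",
    "falkon": "falkon_provider:FalkonTaskHandler",  # assumed name, not installed
}


def load_provider(name: str):
    """Resolve a provider name to a class at run time, without the
    caller naming any provider in its own source."""
    try:
        module_name, class_name = PROVIDERS[name].split(":")
    except KeyError:
        raise LookupError(f"no provider registered as {name!r}") from None
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The analogue of a missing jar: registered but not deployed.
        raise LookupError(f"provider {name!r} is not installed") from exc
    return getattr(module, class_name)


handler_class = load_provider("local")
print(handler_class.__name__)
```

In this sketch, asking for "falkon" fails with a lookup error until a module of that name is actually on the path, which mirrors the "drop the jar in lib" deployment step.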

> 


From benc at hawaga.org.uk  Thu May 31 16:45:14 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 31 May 2007 21:45:14 +0000 (GMT)
Subject: [Swift-devel] Teragrid usage
In-Reply-To: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
References: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0705312144500.20212@dildano.hawaga.org.uk>


Anyone know if it's possible to see how those units were spent? (eg userid? 
job logs?)

On Wed, 16 May 2007, Veronika Nefedova wrote:

> Hi,
> 
> I checked my Teragrid accounts and it looks like the Swift's allocation is
> almost completely used by now (or is it just for me ?):
> 
> Account: TG-CDA060004T
> Title: TeraGrid:  Development Account for Multiple Grid Science Projects
> Resource: teragrid_roaming
> Allocation Period: 2006-08-30 to 2007-08-31
> 
> Name (Last First) or Account       Total      Remaining        Usage
> ----------------------------     ----------  ------------   ----------
>   Nefedova  Veronika             30000 SU         0 SU     27491 SU
> ----------------------------------------------------------------------
> 
> Fortunately, Benoit has added me to his group's allocation - so I can continue
> testing on TG. But it looks like Swift's allocation is almost gone... Should
> we renew it ?
> 
> Nika
> 
> 


From wilde at mcs.anl.gov  Thu May 31 17:05:09 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Thu, 31 May 2007 17:05:09 -0500
Subject: [Swift-devel] Teragrid usage
In-Reply-To: <Pine.LNX.4.64.0705312144500.20212@dildano.hawaga.org.uk>
References: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
	<Pine.LNX.4.64.0705312144500.20212@dildano.hawaga.org.uk>
Message-ID: <465F4695.7040505@mcs.anl.gov>

Here's a first approximation:

Account: TG-CDA060004T
Title: TeraGrid:  Development Account for Multiple Grid Science Projects
Resource: teragrid_roaming
Local project name on dtf.ncsa.teragrid is kgx
Allocation Period: 2006-08-30 to 2007-08-31

Name (Last First) or Account       Total      Remaining        Usage
----------------------------     ----------  ------------   ----------
  Clifford  Ben                  30000 SU         0 SU         0 SU
  Jamieson  Andrew               30000 SU         0 SU         0 SU
  Nefedova  Veronika             30000 SU         0 SU     31147 SU
  Stef-praun  Tiberiu            30000 SU         0 SU       568 SU
  PI-Wilde  Michael              30000 SU         0 SU         0 SU
  Zhao  Yong                     30000 SU         0 SU     13664 SU
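
Adding up the Usage column gives 45379 SU in total. A small Python sketch of the arithmetic follows; the figures are copied from the report above, and treating the 30000 SU shown on every row as one shared project total is my assumption, not something the report states.

```python
# Usage column of the report above, in service units (SU).
usage_su = {
    "Clifford Ben": 0,
    "Jamieson Andrew": 0,
    "Nefedova Veronika": 31147,
    "Stef-praun Tiberiu": 568,
    "Wilde Michael": 0,
    "Zhao Yong": 13664,
}

ALLOCATION_SU = 30000  # assumption: one shared total, repeated on every row

total = sum(usage_su.values())
print(f"total usage: {total} SU")

# Per-user share of the total, largest first, skipping users with no usage.
for name, su in sorted(usage_su.items(), key=lambda kv: -kv[1]):
    if su:
        print(f"{name:20s} {su:6d} SU  {100 * su / total:5.1f}%")

print(f"usage minus allocation: {total - ALLOCATION_SU} SU")
```

If the 30000 SU really is a shared total, the account is over by 15379 SU; either way, this only breaks usage down per user, not per job.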


- Mike

Ben Clifford wrote, On 5/31/2007 4:45 PM:
> Anyone know if it's possible to see how those units were spent? (eg userid? 
> job logs?)
> 
> On Wed, 16 May 2007, Veronika Nefedova wrote:
> 
>> Hi,
>>
>> I checked my Teragrid accounts and it looks like the Swift's allocation is
>> almost completely used by now (or is it just for me ?):
>>
>> Account: TG-CDA060004T
>> Title: TeraGrid:  Development Account for Multiple Grid Science Projects
>> Resource: teragrid_roaming
>> Allocation Period: 2006-08-30 to 2007-08-31
>>
>> Name (Last First) or Account       Total      Remaining        Usage
>> ----------------------------     ----------  ------------   ----------
>>   Nefedova  Veronika             30000 SU         0 SU     27491 SU
>> ----------------------------------------------------------------------
>>
>> Fortunately, Benoit has added me to his group's allocation - so I can continue
>> testing on TG. But it looks like Swift's allocation is almost gone... Should
>> we renew it ?
>>
>> Nika
>>
> 
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From foster at mcs.anl.gov  Thu May 31 20:16:20 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Thu, 31 May 2007 20:16:20 -0500
Subject: [Swift-devel] Teragrid usage
In-Reply-To: <Pine.LNX.4.64.0705312144500.20212@dildano.hawaga.org.uk>
References: <EF2846E6-E97F-4A62-AF3F-7ED481AFD1C8@mcs.anl.gov>
	<Pine.LNX.4.64.0705312144500.20212@dildano.hawaga.org.uk>
Message-ID: <465F7364.9000808@mcs.anl.gov>

We shouldn't be using the "Swift development" account for application 
work. We should have a CNARI allocation, an economics allocation, a 
MolDyn allocation, etc.

Ben Clifford wrote:
> Anyone know if it's possible to see how those units were spent? (eg userid? 
> job logs?)
>
> On Wed, 16 May 2007, Veronika Nefedova wrote:
>
>   
>> Hi,
>>
>> I checked my Teragrid accounts and it looks like the Swift's allocation is
>> almost completely used by now (or is it just for me ?):
>>
>> Account: TG-CDA060004T
>> Title: TeraGrid:  Development Account for Multiple Grid Science Projects
>> Resource: teragrid_roaming
>> Allocation Period: 2006-08-30 to 2007-08-31
>>
>> Name (Last First) or Account       Total      Remaining        Usage
>> ----------------------------     ----------  ------------   ----------
>>   Nefedova  Veronika             30000 SU         0 SU     27491 SU
>> ----------------------------------------------------------------------
>>
>> Fortunately, Benoit has added me to his group's allocation - so I can continue
>> testing on TG. But it looks like Swift's allocation is almost gone... Should
>> we renew it ?
>>
>> Nika
>>
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.
