From dennis at ucar.edu  Wed Sep  1 09:59:35 2010
From: dennis at ucar.edu (John Dennis)
Date: Wed, 1 Sep 2010 08:59:35 -0600
Subject: [Swift-user] Deleting no longer necessary anonymous files in
	_concurrent
In-Reply-To: <alpine.DEB.2.00.1008271502220.3319@wozniak-desktop.mcs.anl.gov>
References: <AANLkTinZSyRjt=FGxmXpgOfOQe1CNQo0P3vvwEEGfpcf@mail.gmail.com>
	<alpine.DEB.2.00.1008271502220.3319@wozniak-desktop.mcs.anl.gov>
Message-ID: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>

Justin,

	I am a little confused by your response that cleaning up temporary
files is not the responsibility of the Swift language.  We did not
create the file
'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array'; Swift
did.  I certainly have no use for it.  It was created
as part of the parallelization process.  Consider the following bit
of pseudo-Swift code:

  foreach years {
  	file wgt_files[];
  	foreach month {
  		wgt_files[] = DoSomething();
  	}	
  }

	The 'wgt_files' array is only in scope within the 'foreach years' loop.
Once all iterations of the 'foreach years' loop have completed,
I would expect 'wgt_files' to be deleted, since the variable/file has gone
out of scope.  Isn't this really an issue of garbage collection
for the Swift language?
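
The behaviour being asked for here (intermediate files disappearing when
their scope ends) can be sketched in Python. This is only an analogy, not
Swift code and not something Swift does; process_year, the file names and
the stand-in for DoSomething() are all hypothetical.

```python
import os
import tempfile


def process_year(year: int, months: range) -> tuple:
    """Create per-month intermediates, consume them, then let them vanish.

    Returns (bytes_consumed, scratch_dir_still_exists).
    """
    # Everything written under scratch_dir is deleted when the with-block
    # exits, which plays the role of 'wgt_files' going out of scope.
    with tempfile.TemporaryDirectory(prefix=f"wgt_files-{year}-") as scratch_dir:
        paths = []
        for month in months:
            path = os.path.join(scratch_dir, f"wgt_{month:02d}.dat")
            with open(path, "w") as handle:
                # Hypothetical stand-in for the output of DoSomething().
                handle.write(f"{year}-{month:02d}\n")
            paths.append(path)
        # Consume the intermediates while they still exist.
        bytes_consumed = sum(os.path.getsize(p) for p in paths)
    # The directory and its contents have been removed at this point.
    return bytes_consumed, os.path.exists(scratch_dir)


if __name__ == "__main__":
    for year in (2008, 2009):
        consumed, leftover = process_year(year, range(1, 13))
        print(year, "consumed", consumed, "bytes; scratch left behind:", leftover)
```

The point of the sketch is only that disk usage stays bounded by one
year's worth of intermediates rather than growing with the whole run.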

	While I do see how we could use external variables to manage this
all ourselves, that would significantly complicate the
source code and remove much of the simplicity and elegance that
Swift provides.

	Matthew and I are concerned about this because of the impact this has
on disk usage.  For example, our Swift script
requires temporary space of size 4x the input data.  Our generated
data is tiny, while the size of the _concurrent directory
is 2x the size of the input data.  Now we want to execute the Swift
script on ~30 TB of data.  So just to enable parallel execution
with Swift would require an extra 120 TB of disk space.  I realize that
parallel execution will consume more disk space, but this seems
excessive.
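
The estimate above can be reproduced with a few lines of Python. This is a
back-of-the-envelope sketch that assumes the 4x and 2x factors scale
linearly with input size and that the _concurrent share is counted inside
the 4x total; both are assumptions, and the names are illustrative.

```python
# Figures quoted in the message; the linear scaling is an assumption.
TEMP_SPACE_FACTOR = 4   # temporary space needed, as a multiple of input size
CONCURRENT_FACTOR = 2   # size of _concurrent, as a multiple of input size


def extra_disk_tb(input_tb: float) -> dict:
    """Estimate the extra disk space (in TB) for a given input size."""
    total = TEMP_SPACE_FACTOR * input_tb
    concurrent = CONCURRENT_FACTOR * input_tb
    return {
        "total_temporary_tb": total,
        "concurrent_tb": concurrent,
        # Assumes _concurrent is part of the 4x total, not in addition to it.
        "other_temporary_tb": total - concurrent,
    }


if __name__ == "__main__":
    for name, value in extra_disk_tb(30).items():
        print(f"{name}: {value} TB")
```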

Thanks,
John Dennis
	

On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote:

> Hi Matthew
> 	Deleting files is out of the scope of the Swift language.  You can  
> of course remove them yourself in your scripts, and as long as Swift  
> does not try to stage them out you should be fine.
> 	You may want to look at external variables as another way to  
> approach this (manual 2.5).  Using external variables you can manage  
> the files in your scripts while maintaining the Swift progress model.
> 	Justin
>
> On Fri, 27 Aug 2010, Matthew Woitaszek wrote:
>> Good afternoon,
>>
>> I'm working with a script that creates arrays of intermediate files
>> using the anonymous concurrent mapper, such as:
>>
>> file wgt_file[];
>>
>> As I expect, all of these files get generated in the remote swift
>> temporary directory and are then returned to the _concurrent  
>> directory
>> on the host executing Swift. However, in this particular application,
>> they're then immediately consumed by a subsequent procedure and never
>> needed again.
>>
>> Is there a way to configure Swift or the file mapper declaration to
>> delete these files after the remaining script "consumes" them? (That
>> is, after all procedures relying on them as inputs have been
>> executed?) Or can (should?) that be done manually?
>>
>> More speculatively, is there a way to keep files like these on the
>> execution host and not even bring them back to _concurrent? (With  
>> loss
>> of generality, I'm executing on a single site, and don't really ever
>> need the file locally, for restarts or staging to another site.)
>>
>> Any advice about managing copies of large intermediate data files in
>> the Swift execution context would be appreciated!
>>
>> Matthew
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>
>
> -- 
> Justin M Wozniak


From wozniak at mcs.anl.gov  Fri Sep  3 13:50:57 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 3 Sep 2010 13:50:57 -0500 (Central Daylight Time)
Subject: [Swift-user] Deleting no longer necessary anonymous files in
	_concurrent
In-Reply-To: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
References: <AANLkTinZSyRjt=FGxmXpgOfOQe1CNQo0P3vvwEEGfpcf@mail.gmail.com>
	<alpine.DEB.2.00.1008271502220.3319@wozniak-desktop.mcs.anl.gov>
	<8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
Message-ID: <alpine.WNT.2.00.1009031328380.3844@JWOZNIAK-DESK>


First off, I definitely recognize the importance of doing this 
efficiently.  I also realize you may be thinking of certain "make" 
functionality that does something like this.

We are currently working on improvements to Swift's data access 
mechanisms.  Many applications create temporary intermediate data, so we 
are definitely looking at this.

From Swift's perspective, there are two aspects: the garbage collection (or
automated delete) and the data placement between job executions.  The
garbage collection is something that could be handled in a few ways (such as
cache management) or simply by tearing down the intermediate storage system.
The data placement involves using an intermediate storage system that is 
at the compute site, preventing full stage out to the client, and ensuring 
that this storage system is accessible to both the producer and consumer 
of the pipeline data.  (Swift assumes that there is one permanent 
filesystem, the one from which it is run, and uses staging for everything 
else.  A given pair of jobs could execute at separate sites with 
different filesystems.)

There is "beta" functionality in the Swift trunk to directly utilize a
local filesystem (that at least two applications are using).  If there is
a "scratch" filesystem that you can use, I can direct you to that.  We are
also productizing the ability to set up a temporary storage system for use
by Swift, but that is not available yet.


-- 
Justin M Wozniak


From matthew.woitaszek at gmail.com  Tue Sep  7 13:22:42 2010
From: matthew.woitaszek at gmail.com (Matthew Woitaszek)
Date: Tue, 7 Sep 2010 12:22:42 -0600
Subject: [Swift-user] Deleting no longer necessary anonymous files in
	_concurrent
In-Reply-To: <alpine.WNT.2.00.1009031328380.3844@JWOZNIAK-DESK>
References: <AANLkTinZSyRjt=FGxmXpgOfOQe1CNQo0P3vvwEEGfpcf@mail.gmail.com>
	<alpine.DEB.2.00.1008271502220.3319@wozniak-desktop.mcs.anl.gov>
	<8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
	<alpine.WNT.2.00.1009031328380.3844@JWOZNIAK-DESK>
Message-ID: <AANLkTikxC1pQT5ivK=wjUNU3YBMU6zGyd9ApriE0Kj8o@mail.gmail.com>

Hi Justin,

Thanks for your reply -- I'd definitely like to learn more about the
alternate staging/scratch options.

> There is "beta" functionality in the Swift trunk to directly utilize a local
> filesystem (that at least two applications are using).  If there is a
> "scratch" filesystem that you can use, I can direct you to that.

By this, do you mean something like a node-local scratch system,
where files could be staged directly from _concurrent to a node
instead of a "site", or is it something else?

If node-local, I fear that might be a step backwards for our
application. In our case, the staging time vs. capacity tradeoff is
becoming quite problematic. On one hand, I really only want to keep
one copy of everything (_concurrent), but limiting the amount of
storage at a site might increase staging, which negates the
parallelism, so I'm back to preferring a big site cache to minimize
that.

Is there a way to get tasks to read/write directly out of _concurrent
without the staging to the remote site at all? I suspect the answer is
"no" due to your description of _concurrent's importance as the
permanent file system and its use in staging to site file systems. But
in our case, we're coincidentally at one site, so the big GPFS scratch
file system area ends up holding both _concurrent as well as the swift
site temporary directory in different paths.

> The
> data placement involves using an intermediate storage system that is at the
> compute site, preventing full stage out to the client, and ensuring that
> this storage system is accessible to both the producer and consumer of the
> pipeline data.

This sounds like a feature that John and I would sign up for. :-)

I see the new use.provider.staging option in the trunk, and "sfs" is
very tempting...

(Also, thanks for your thoughts on garbage collection; I'll stick with
the possibilities in the staging arena for now!)

Thanks for your time,

Matthew




From hategan at mcs.anl.gov  Tue Sep  7 13:53:16 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 07 Sep 2010 11:53:16 -0700
Subject: [Swift-user] Deleting no longer necessary anonymous files in
	_concurrent
In-Reply-To: <AANLkTikxC1pQT5ivK=wjUNU3YBMU6zGyd9ApriE0Kj8o@mail.gmail.com>
References: <AANLkTinZSyRjt=FGxmXpgOfOQe1CNQo0P3vvwEEGfpcf@mail.gmail.com>
	<alpine.DEB.2.00.1008271502220.3319@wozniak-desktop.mcs.anl.gov>
	<8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
	<alpine.WNT.2.00.1009031328380.3844@JWOZNIAK-DESK>
	<AANLkTikxC1pQT5ivK=wjUNU3YBMU6zGyd9ApriE0Kj8o@mail.gmail.com>
Message-ID: <1283885596.10503.25.camel@blabla2.none>

On Tue, 2010-09-07 at 12:22 -0600, Matthew Woitaszek wrote:
[...]
> 
> > There is "beta" functionality in the Swift trunk to directly utilize a local
> > filesystem (that at least two applications are using).  If there is a
> > "scratch" filesystem that you can use, I can direct you to that.
> 
> By this, do you mean something like a node-local scratch system,
> where files could be staged directly from _concurrent to a node
> instead of a "site", or is it something else?
> 
> If node-local, I fear that might be a step backwards for our
> application. In our case, the staging time vs. capacity tradeoff is
> becoming quite problematic. On one hand, I really only want to keep
> one copy of everything (_concurrent), but limiting the amount of
> storage at a site might increase staging, which negates the
> parallelism, so I'm back to preferring a big site cache to minimize
> that.

The data, intermediate or not, has to be in at least one place.
The stable/traditional version of swift tends to have at least 3 copies
of each piece of data:
- on the client (1)
- on the shared fs of a target cluster (2)
- on the compute node (3)

(3) is arguable. One can run apps using data directly from (2). However,
it's been our experience that, due to the way shared file systems (SFSes)
work, copying the data to the compute node yields better performance in
most cases (actually pretty much all cases we've measured). This may not
necessarily apply to your case, and we'd like to hear about it if it
doesn't. You can switch between the two behaviors by specifying an
additional <scratch> directory in sites.xml. If that's there, (3)
applies. If not, symlinks to (2) are used instead. I'll call this issue
(A).
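
As a rough illustration of issue (A), the sketch below models in Python
which places hold a real copy of an input file while an app runs. It is a
toy model of the description above, not Swift's implementation; the
function and constant names are made up for the example.

```python
# Labels follow the numbering used in the list above.
CLIENT = "(1) client"
SHARED_FS = "(2) shared fs of the target cluster"
COMPUTE_NODE = "(3) compute node"


def input_locations(scratch_configured: bool) -> list:
    """Places holding a real copy of an input file while an app runs."""
    locations = [CLIENT, SHARED_FS]
    if scratch_configured:
        # With a <scratch> directory, the file is copied to the node.
        locations.append(COMPUTE_NODE)
    # Without <scratch>, the job only sees symlinks into (2), so there is
    # no third copy.
    return locations


if __name__ == "__main__":
    for scratch in (True, False):
        print("scratch configured:", scratch, "->", input_locations(scratch))
```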

Stuff we're working on currently includes bypassing (2) and copying data
directly between (1) and (3). It turns out that shared file systems are
pretty poor when it comes to parallelism, due to the distributed
consistency they have to enforce. However, given that in swift all
data is single-assignment (which translates into files being written at
most once), most of the problems that SFSes need to deal with don't
really exist, but there is no way to tell them that. So we've got some
prototypes there. At least on the BG/P we get clear performance
improvements (a few times faster) if we do (1) <-> (3).

Ideally we would also want to bypass (3) -> (1) -> (3) for intermediate
data, since we can do (3) -> (3) instead. This is something Justin has
been working on, I believe on single clusters. I'd personally like to
see it working between multiple clusters, too.

> 
> Is there a way to get tasks to read/write directly out of _concurrent
> without the staging to the remote site at all? I suspect the answer is
> "no" due to your description of _concurrent's importance as the
> permanent file system and its use in staging to site file systems. But
> in our case, we're coincidentally at one site, so the big GPFS scratch
> file system area ends up holding both _concurrent as well as the swift
> site temporary directory in different paths.

It is possible, but not currently there. Again, issue (A) may apply
here, so provider.staging/sfs may be better.

Mihael


From jon.monette at gmail.com  Wed Sep  8 18:45:49 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 18:45:49 -0500
Subject: [Swift-user] Swift app question
Message-ID: <4C88202D.3010406@gmail.com>

Hello,
     This is probably a simple question, but when are app functions 
executed?  Take a look at this pseudocode.


foreach y in years
{
         Month m1< "month1.txt">;
         Month m2 <"month2.txt">;

         Year y = calculate( m1, m2 );
}

When will the app "calculate" be executed?  Will it execute as soon as 
m1 and m2 for a given iteration are mapped, or will it wait until each 
thread has mapped its own m1 and m2 and then execute the apps all together?


From hategan at mcs.anl.gov  Wed Sep  8 19:51:15 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 17:51:15 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <4C88202D.3010406@gmail.com>
References: <4C88202D.3010406@gmail.com>
Message-ID: <1283993475.4340.2.camel@blabla2.none>

On Wed, 2010-09-08 at 18:45 -0500, Jonathan Monette wrote:
> Hello,
>      This is probably a simple question but when are app functions 
> executed?  Take a look at this pseudocode.
> 
> 
> foreach y in years
> {
>          Month m1< "month1.txt">;
>          Month m2 <"month2.txt">;
> 
>          Year y = calculate( m1, m2 );
> }
> 
> When will the app "calculate" be executed?  Will it execute as soon as 
> m1 and m2 for a given iteration are mapped, or will it wait until each 
> thread has mapped its own m1 and m2 and then execute the apps all together?

Is the use of y twice (once in the foreach and once for the result of
calculate()) accidental?

Mihael


From jon.monette at gmail.com  Wed Sep  8 19:52:56 2010
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Thu, 9 Sep 2010 00:52:56 +0000
Subject: [Swift-user] Re: Swift app question
Message-ID: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>

Yes. This is the general workflow for part of my problem.

From hategan at mcs.anl.gov  Wed Sep  8 21:21:32 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 19:21:32 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
Message-ID: <1283998892.4442.3.camel@blabla2.none>

What the code says there is: iterate through some list, holding the
current iteration variable in "y", and then declare an entirely new
variable, also named "y", in the inner scope of the iteration and assign
it the value of some computation. Is this what you meant?

In other words, variables in swift are single assignment. You can't
successively assign different values to the same variable in the same
scope.
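
For readers more used to imperative languages, single assignment can be
mimicked in Python with a write-once holder. This is a minimal sketch of
the idea, not how Swift implements its variables; the SingleAssignment
class is hypothetical.

```python
import threading


class SingleAssignment:
    """A write-once variable: readers block until the one allowed write."""

    def __init__(self, name: str):
        self.name = name
        self._value = None
        self._is_set = threading.Event()
        self._lock = threading.Lock()

    def assign(self, value) -> None:
        """Store the value; a second assignment is an error."""
        with self._lock:
            if self._is_set.is_set():
                raise RuntimeError(f"{self.name} is already assigned")
            self._value = value
            self._is_set.set()

    def get(self, timeout=None):
        """Return the value, waiting until it has been assigned."""
        if not self._is_set.wait(timeout):
            raise TimeoutError(f"{self.name} was never assigned")
        return self._value


if __name__ == "__main__":
    x = SingleAssignment("x")
    x.assign(2010)
    print(x.get())
    try:
        x.assign(2011)
    except RuntimeError as err:
        print("rejected:", err)
```

Because a reader simply waits for the single write, the order in which
statements are evaluated can be driven by data availability alone.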



From jon.monette at gmail.com  Wed Sep  8 21:28:35 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 21:28:35 -0500
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <1283998892.4442.3.camel@blabla2.none>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
	<1283998892.4442.3.camel@blabla2.none>
Message-ID: <4C884653.3000902@gmail.com>


This is what I meant.

foreach y in years
{
     Month m1<"month1.txt">;
     Month m2<"month2.txt">;

     Year x = calculate( m1, m2 );
}

I know that threads will be created and each iteration of the foreach 
loop will run in parallel.  What I am trying to understand is when the 
calculate app is executed.  This is a very dumbed down example, but I 
want to know whether x will be mapped to the output of calculate once m1 
and m2 are closed, or whether there is a "barrier" that blocks until all 
threads have finished mapping m1 and m2 before the apps are run in parallel.


-- 
Jon

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein


From hategan at mcs.anl.gov  Wed Sep  8 21:37:38 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 19:37:38 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <4C884653.3000902@gmail.com>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
	<1283998892.4442.3.camel@blabla2.none>  <4C884653.3000902@gmail.com>
Message-ID: <1283999858.4542.7.camel@blabla2.none>

Theoretically, since there is no dependency between m1, m2 and y, it
should run right ahead. Practically, each invocation will probably wait
for values in years.
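
The "no barrier" behaviour can be imitated with Python threads. This is
only an analogy for dataflow-driven execution, not Swift's scheduler; the
iteration and run_all functions and the delays are invented for the
example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def iteration(year: int, input_delay: float) -> str:
    """One foreach iteration: wait for its own inputs, then run the app."""
    # Stand-in for this iteration's m1 and m2 becoming available.
    time.sleep(input_delay)
    return f"calculate() finished for {year}"


def run_all(delays: dict) -> list:
    """Run every iteration concurrently; return results in completion order."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(iteration, year, delay)
                   for year, delay in delays.items()]
        # No barrier: each result is collected as soon as it is ready.
        return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    # The 2009 iteration has its inputs immediately and does not wait for 2008.
    for line in run_all({2008: 0.5, 2009: 0.0}):
        print(line)
```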

But I have to ask. Why bother doing this for every year if, at least
from your code, x would have the same value every time (i.e. there is no
actual dependency on y)?

Mihael



From jon.monette at gmail.com  Wed Sep  8 21:40:03 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 21:40:03 -0500
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <1283999858.4542.7.camel@blabla2.none>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>	
	<1283998892.4442.3.camel@blabla2.none> <4C884653.3000902@gmail.com>
	<1283999858.4542.7.camel@blabla2.none>
Message-ID: <4C884903.2010908@gmail.com>

  Ok.  And like I said, this was a dumbed down example.  I just needed to 
show a mapping and didn't want to use a fancy mapper.  In my code x 
will be a different value for each iteration.  Thanks though.  That 
clears things up.


-- 
Jon

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein


From hategan at mcs.anl.gov  Wed Sep  8 22:24:56 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 20:24:56 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <4C884903.2010908@gmail.com>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
	<1283998892.4442.3.camel@blabla2.none>  <4C884653.3000902@gmail.com>
	<1283999858.4542.7.camel@blabla2.none>  <4C884903.2010908@gmail.com>
Message-ID: <1284002696.4694.3.camel@blabla2.none>

On Wed, 2010-09-08 at 21:40 -0500, Jonathan Monette wrote:
> Ok.  And like I said, this was a dumbed-down example.  I just needed to 
> show a mapping and didn't want to use a fancy mapper.  In my code x 
> will be a different value for each iteration.  Thanks though.  That 
> clears things up.

I was a bit confused. If there is a dependency relation between y and
the inputs to an app (whether through a mapper or directly) then it has
to be satisfied for the app to run. However, when it comes to mappers,
Swift allows some hidden dependencies to be expressed. For example, when
some app produces "a.txt" and somewhere else you say f <"a.txt">, Swift
won't enforce that ordering.

Mihael

[...]


From jon.monette at gmail.com  Wed Sep  8 22:32:48 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 22:32:48 -0500
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <1284002696.4694.3.camel@blabla2.none>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>	
	<1283998892.4442.3.camel@blabla2.none>
	<4C884653.3000902@gmail.com>	
	<1283999858.4542.7.camel@blabla2.none> <4C884903.2010908@gmail.com>
	<1284002696.4694.3.camel@blabla2.none>
Message-ID: <4C885560.9040108@gmail.com>

Here is my actual code I was referencing:

( Image diff_imgs[] ) mDiffBatch( Table diff_tbl, MosaicData hdr )
{
     DiffStruct diffs[] <csv_mapper; file = diff_tbl, skip = 1, hdelim="| ">;

     tracef( "%s is closed %k\n", @filename( hdr ), hdr );    //1
     tracef( "Mapped %i files from the csv_mapper and \"%s\"\n", @length( diffs ), @diff_tbl );    //2

     foreach d_entry, i in diffs
     {
         tracef( "%s is closed on iteration %i%k\n", @d_entry.plus, i, d_entry.plus );    //3
         tracef( "%s is closed on iteration %i%k\n", @d_entry.minus, i, d_entry.minus );    //4

         Image proj_1 <single_file_mapper; file = @strcat( "proj_dir/", @d_entry.plus )>;
         Image proj_2 <single_file_mapper; file = @strcat( "proj_dir/", @d_entry.minus )>;

         tracef( "%s is closed on iteration %i%k\n", @proj_1, i, proj_1 );    //5
         tracef( "%s is closed on iteration %i%k\n", @proj_2, i, proj_2 );    //6

         Image diff_img <single_file_mapper; file = @strcat( "diff_dir/", @d_entry.diff )>;
         tracef( "diff_img was mapped to %s on iteration %i\n", @diff_img, i );    //7
         diff_img = mDiff( proj_1, proj_2, hdr );

         tracef( "DIFFERENCED %s on iteration %i%k\n", @filename( diff_img ), i, diff_img );    //8
         diff_imgs[ i ] = diff_img;
     }
}

tracef 1 and 2 always print.  tracef 3 through 7 print for some of the 
iterations but never for all of them.  tracef 8 never prints, because 
the script hangs and the mDiff app is never executed.  This is the 
behavior I have been trying to recreate.  But if I simply replace the 
mDiff app with a script that basically does a cat, the script runs to 
completion.  So I have been trying to understand what Swift is actually 
doing.  The code above, as written, hangs.

On 09/08/2010 10:24 PM, Mihael Hategan wrote:
> On Wed, 2010-09-08 at 21:40 -0500, Jonathan Monette wrote:
>    
>> Ok.  And like I said, this was a dumbed-down example.  I just needed to
>> show a mapping and didn't want to use a fancy mapper.  In my code x
>> will be a different value for each iteration.  Thanks though.  That
>> clears things up.
>>      
> I was a bit confused. If there is a dependency relation between y and
> the inputs to an app (whether through a mapper or directly) then it has
> to be satisfied for the app to run. However, when it comes to mappers,
> Swift allows some hidden dependencies to be expressed. For example, when
> some app produces "a.txt" and somewhere else you say f <"a.txt">, Swift
> won't enforce that ordering.
>
> Mihael
>
> [...]
>
>    


From jon.monette at gmail.com  Thu Sep  9 11:50:19 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Thu, 09 Sep 2010 11:50:19 -0500
Subject: [Swift-user] external mapper
Message-ID: <4C89104B.6030201@gmail.com>

Hello,
     How do I pass parameters to the script in the external mapper?  I 
see there is a -symbol option but how does that work?  What does the 
syntax look like?


From hategan at mcs.anl.gov  Thu Sep  9 12:01:55 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 09 Sep 2010 10:01:55 -0700
Subject: [Swift-user] external mapper
In-Reply-To: <4C89104B.6030201@gmail.com>
References: <4C89104B.6030201@gmail.com>
Message-ID: <1284051715.6082.2.camel@blabla2.none>

On Thu, 2010-09-09 at 11:50 -0500, Jonathan Monette wrote:
> Hello,
>      How do I pass parameters to the script in the external mapper?  I 
> see there is a -symbol option but how does that work?  What does the 
> syntax look like?

Any mapper parameter will be passed as "-param" "value" to the argv of
the script.

So:

file f <external_mapper;exec="myscript",a="v1",b="v2">;

should result in
"myscript -a v1 -b v2"
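
For illustration, here is a minimal sketch of what the script side of
that "-param value" convention could look like, written in Python.  It
is not taken from the Swift sources: the function names
parse_mapper_args and mapping_lines are made up, and the "[index] path"
lines printed on stdout are an assumption about the output the mapper
expects, so check that against the user guide before relying on it.

```python
#!/usr/bin/env python3
"""Hypothetical mapper script: reads "-name value" pairs from argv."""
import sys


def parse_mapper_args(argv):
    """Turn ['-a', 'v1', '-b', 'v2'] into {'a': 'v1', 'b': 'v2'}."""
    if len(argv) % 2 != 0:
        raise ValueError("arguments must come in '-name value' pairs")
    params = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("-") or len(flag) < 2:
            raise ValueError("expected '-name', got %r" % flag)
        params[flag[1:]] = value
    return params


def mapping_lines(params):
    """Assumed output format: one '[index] path' line per mapped file.

    Purely for illustration, each parameter value is treated as a path,
    ordered by parameter name.
    """
    return ["[%d] %s" % (i, params[name])
            for i, name in enumerate(sorted(params))]


def main(argv):
    try:
        params = parse_mapper_args(argv)
    except ValueError as err:
        # Report bad invocations on stderr so stdout stays clean.
        print("mapper: %s" % err, file=sys.stderr)
        return 1
    for line in mapping_lines(params):
        print(line)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run with the arguments "-a v1 -b v2", this sketch prints "[0] v1" and
"[1] v2".  A real script would use the parameters to decide which
files to list rather than echoing the values back.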

Mihael


From jon.monette at gmail.com  Thu Sep  9 12:01:39 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Thu, 09 Sep 2010 12:01:39 -0500
Subject: [Swift-user] external mapper
In-Reply-To: <1284051715.6082.2.camel@blabla2.none>
References: <4C89104B.6030201@gmail.com> <1284051715.6082.2.camel@blabla2.none>
Message-ID: <4C8912F3.3010301@gmail.com>

Alright.  Thanks.

On 09/09/2010 12:01 PM, Mihael Hategan wrote:
> On Thu, 2010-09-09 at 11:50 -0500, Jonathan Monette wrote:
>    
>> Hello,
>>       How do I pass parameters to the script in the external mapper?  I
>> see there is a -symbol option but how does that work?  What does the
>> syntax look like?
>>      
> Any mapper parameter will be passed as "-param" "value" to the argv of
> the script.
>
> So:
>
> file f<external_mapper;exec="myscript",a="v1",b="v2">;
>
> should result in
> "exec -a v1 -b v2"
>
> Mihael
>
>
>    


From iraicu at cs.uchicago.edu  Tue Sep 21 11:49:57 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 21 Sep 2010 11:49:57 -0500
Subject: [Swift-user] CFP: Workshop on Data Intensive Computing in the
 Clouds (DataCloud) 2011, co-located with IEEE IPDPS 2011
Message-ID: <4C98E235.70907@cs.uchicago.edu>

 
---------------------------------------------------------------------------------
                             *** Call for Papers ***
        WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011)
           In conjunction with IPDPS 2011, May 16, Anchorage, Alaska
                  http://www.cct.lsu.edu/~kosar/DataCloud2011
---------------------------------------------------------------------------------

The First International Workshop on Data Intensive Computing in the Clouds
(DataCloud2011) will be held in conjunction with the 25th IEEE International
Parallel and Distributed Processing Symposium (IPDPS 2011), in Anchorage,
Alaska.

Applications and experiments in all areas of science are becoming increasingly
complex and more demanding in terms of their computational and data
requirements. Some applications generate data volumes reaching hundreds of
terabytes and even petabytes. As scientific applications become more data
intensive, the management of data resources and dataflow between the storage
and compute resources is becoming the main bottleneck. Analyzing, visualizing,
and disseminating these large data sets has become a major challenge and data
intensive computing is now considered as the "fourth paradigm" in scientific
discovery after theoretical, experimental, and computational science.

DataCloud2011 will provide the scientific community a dedicated forum for
discussing new research, development, and deployment efforts in running
data-intensive computing workloads on Cloud Computing infrastructures. The
DataCloud2011 workshop will focus on the use of cloud-based technologies to
meet the new data intensive scientific challenges that are not well served by
the current supercomputers, grids or compute-intensive clouds. We believe the
workshop will be an excellent place to help the community define the current
state, determine future goals, and present architectures and services for
future clouds supporting data intensive computing.

TOPICS
---------------------------------------------------------------------------------
- Data-intensive cloud computing applications, characteristics, challenges
- Case studies of data intensive computing in the clouds
- Performance evaluation of data clouds, data grids, and data centers
- Energy-efficient data cloud design and management
- Data placement, scheduling, and interoperability in the clouds
- Accountability, QoS, and SLAs
- Data privacy and protection in a public cloud environment
- Distributed file systems for clouds
- Data streaming and parallelization
- New programming models for data-intensive cloud computing
- Scalability issues in clouds
- Social computing and massively social gaming
- 3D Internet and implications
- Future research challenges in data-intensive cloud computing

IMPORTANT DATES
---------------------------------------------------------------------------------
Abstract submission: December 1, 2010
Paper submission: December 8, 2010
Acceptance notification: January 7, 2011
Final papers due: February 1, 2011

PAPER SUBMISSION
---------------------------------------------------------------------------------
DataCloud2011 invites authors to submit original and unpublished technical
papers. All submissions will be peer-reviewed and judged on correctness,
originality, technical strength, significance, quality of presentation, and
relevance to the workshop topics of interest. Submitted papers may not have
appeared in or be under consideration for another workshop, conference or a
journal, nor may they be under review or submitted to another forum during the
DataCloud2011 review process. Submitted papers may not exceed 10 single-spaced
double-column pages using 10-point size font on 8.5x11 inch pages (IEEE
conference style, document templates can be found at
ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.pdf and
ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.doc),
including figures, tables, and references. A 250 word abstract (PDF format)
must be submitted online at https://cmt.research.microsoft.com/DataCloud2011/
before the deadline of December 1st, 2010 at 11:59PM PST; the final 10 page
papers in PDF format will be due on December 8th, 2010 at 11:59PM PST.

WORKSHOP and PROGRAM CHAIRS
---------------------------------------------------------------------------------
Tevfik Kosar, Louisiana State University
Ioan Raicu, Illinois Institute of Technology

STEERING COMMITTEE
---------------------------------------------------------------------------------
Ian Foster, Univ of Chicago & Argonne National Lab
Geoffrey Fox, Indiana University
James Hamilton, Amazon Web Services
Manish Parashar, Rutgers University & NSF
Dan Reed, Microsoft Research
Rich Wolski, University of California, Santa Barbara
Liang-Jie Zhang, IBM Research

PROGRAM COMMITTEE
---------------------------------------------------------------------------------
David Abramson, Monash University, Australia
Roger Barga, Microsoft Research
John Bent, Los Alamos National Laboratory
Umit Catalyurek, Ohio State University
Abhishek Chandra, University of Minnesota
Rong N. Chang, IBM Research
Alok Choudhary, Northwestern University
Brian Cooper, Google
Ewa Deelman, University of Southern California
Murat Demirbas, University at Buffalo
Adriana Iamnitchi, University of South Florida
Maria Indrawan, Monash University, Australia
Alexandru Iosup, Delft University of Technology, Netherlands
Peter Kacsuk, Hungarian Academy of Sciences, Hungary
Dan Katz, University of Chicago
Steven Ko, University at Buffalo
Gregor von Laszewski, Rochester Institute of Technology
Erwin Laure, CERN, Switzerland
Ignacio Llorente, Universidad Complutense de Madrid, Spain
Reagan Moore, University of North Carolina
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory
Ian Taylor, Cardiff University, UK
Douglas Thain, University of Notre Dame
Bernard Traversat, Oracle
Yong Zhao, Univ of Electronic Science & Tech of China


-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu at cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================


From iraicu at cs.uchicago.edu  Fri Sep 24 12:45:16 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Fri, 24 Sep 2010 12:45:16 -0500
Subject: [Swift-user] CFP: Special Issue on Science-driven Cloud Computing, 
 in the Scientific Programming Journal
Message-ID: <4C9CE3AC.5000605@cs.uchicago.edu>


Call for Papers

---------------------------------------------------------------------------------
Scientific Programming Journal
Special Issue on Science-driven Cloud Computing
http://www.cs.iit.edu/~iraicu/SPJ_ScienceCloud_2011/

Overview
---------------------------------------------------------------------------------
Cloud computing, first established in the business computing domain, is now a
topic of research in computer science and an interesting execution platform for
science applications. Today there are a number of commercial and science cloud
deployments, including those provided by Amazon, Google, IBM, Microsoft, and
others. Campus and national labs are also deploying their own cloud solutions.
The ability to control the resources and the pay-as-you-go usage model enables
new approaches to application development and resource provisioning. Science
applications are looking towards the cloud to provide a stable and customizable
execution environment. This special issue of the Scientific Programming Journal
is dedicated to the computational challenges and opportunities of cloud
computing.

Topics
---------------------------------------------------------------------------------
We invite the submission of original work that is related to the topics below.
Topics of interest include (in the context of Cloud Computing):
*	Scientific cloud applications
*	Novel programming models
*	High-performance computing
*	Many-task computing
*	Resource scheduling
*	Compute resource management
*	Resource provisioning and configuration (compute, data, and network)
*	Adaptive computing and resource usage
*	Power-aware use of cloud computing
*	Storage cloud architectures and implementations
*	Cloud scalability and elasticity
*	Performance Evaluations and Benchmarks
*	Quality of service and SLA management
*	Cloud heterogeneity
*	Charging models
*	Models, frameworks and systems for cloud security and privacy
*	Monitoring

Paper Submission
---------------------------------------------------------------------------------
Authors are encouraged to submit high quality, original work that has neither
appeared in, nor is under consideration by, other journals. The manuscript must
follow the formatting instructions found at the Scientific Programming site at
http://www.iospress.nl/html/10589244_ita.html. Papers should be no more than 25
pages of single column text using double spaced 10 point size on 8.5 x 11 inch
pages and 1" margins (including all text, figures, and references). A 250 word
abstract (PDF format) must be submitted online at
https://cmt.research.microsoft.com/SPJ_ScienceCloud_2011/ before the deadline of
October 22nd, 2010 at 11:59PM PST; the final 25 page papers in PDF format will be
due on October 29th, 2010 at 11:59PM PST. Papers will be peer-reviewed, and
notifications of the paper decisions will be sent out by December 1st, 2010.
Accepted papers will be published by IOS Press without any fees to the authors.

Important dates
---------------------------------------------------------------------------------
*	Abstract Due:			October 22nd, 2010
*	Papers Due:			October 29th, 2010
*	Reviews Completed:		December 1st, 2010
*	Publication Date:		Early 2011

Guest Editors:
---------------------------------------------------------------------------------
Ivona Brandic, Vienna University of Technology, ivona at infosys.tuwien.ac.at
Ewa Deelman, University of Southern California, deelman at isi.edu
Ioan Raicu, Illinois Institute of Technology, iraicu at cs.iit.edu

For more information on this special issue in Scientific Programming Journal,
please visit http://www.cs.iit.edu/~iraicu/SPJ_ScienceCloud_2011/.

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu at cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================


From aespinosa at cs.uchicago.edu  Tue Sep 28 17:09:37 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 28 Sep 2010 17:09:37 -0500
Subject: [Swift-user] persitent service + manual workers
Message-ID: <AANLkTim=GT-AakH4w5MNTFEJ6aw4EWeaTXPQ3Z=nYZOr@mail.gmail.com>

Hi,

I'm having trouble with the workers registering with the coaster service:

$ ./trunk/bin/coaster-service -nosec
Local contacts: [http://128.135.125.18:50000]
Started local service: http://128.135.125.18:50000
Started coaster service: http://128.135.125.18:1984
Started coaster service: http://128.135.125.18:1984
SC-null: Disabling heartbeats (config is null)
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Unknown handler: REGISTER. Available handlers: {CHMOD=class
org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler,
ISDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler,
LIST=class org.globus.cog.abstraction.impl.file.coaster.handlers.ListHandler,
SUBMITJOB=class
org.globus.cog.abstraction.coaster.service.SubmitJobHandler,
MKDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.MkdirHandler,
PUT=class org.globus.cog.abstraction.impl.file.coaster.handlers.PutFileHandler,
DEL=class org.globus.cog.abstraction.impl.file.coaster.handlers.DeleteHandler,
HEARTBEAT=class
org.globus.cog.karajan.workflow.service.handlers.HeartBeatHandler,
CONFIGSERVICE=class
org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler,
FILEINFO=class org.globus.cog.abstraction.impl.file.coaster.handlers.FileInfoHandler,
SHUTDOWNSERVICE=class
org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler,
SHUTDOWN=class org.globus.cog.karajan.workflow.service.handlers.ShutdownHandler,
EXISTS=class org.globus.cog.abstraction.impl.file.coaster.handlers.ExistsHandler,
CHANNELCONFIG=class
org.globus.cog.karajan.workflow.service.handlers.ChannelConfigurationHandler,
RMDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.RmdirHandler,
RENAME=class org.globus.cog.abstraction.impl.file.coaster.handlers.RenameHandler,
VERSION=class org.globus.cog.karajan.workflow.service.handlers.VersionHandler,
WORKERSHELLCMD=class
org.globus.cog.abstraction.coaster.service.WorkerShellHandler,
GET=class org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler}


Worker connection invocation:
$ /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
http://128.135.125.18:1984  foo /home/aespinosa/tmp
Failed to process data: Failed to register (service returned error:
Unknown command: REGISTER) at
/home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
line 676.


Invocation through the other port:
$ /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
http://128.135.125.18:50000  foo /home/aespinosa/tmp
Failed to process data: Failed to register (service returned error:
java.lang.NullPointerException) at
/home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
line 676.

service dump:
$ ./trunk/bin/coaster-service -nosec
Local contacts: [http://128.135.125.18:50000]
Started local service: http://128.135.125.18:50000
Started coaster service: http://128.135.125.18:1984
Started coaster service: http://128.135.125.18:1984
SC-null: Disabling heartbeats (config is null)
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Received registration: blockid = foo, url =
Avg stream buf: 0
Avg stream buf: 0
Avg stream buf: 0


I'm using the latest trunk code from Swift and CoG.


Just to confirm: the "local service" is where Swift submits the jobs
and the "coaster service" is the one the workers connect to, correct?

Thanks,
-Allan

-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


From wilde at mcs.anl.gov  Tue Sep 28 21:11:10 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Tue, 28 Sep 2010 20:11:10 -0600 (GMT-06:00)
Subject: [Swift-user] persitent service + manual workers
In-Reply-To: <1957630536.424041285726207329.JavaMail.root@zimbra.anl.gov>
Message-ID: <2008803707.424091285726270469.JavaMail.root@zimbra.anl.gov>

Allan, I think the worker should connect on port 50000 and Swift should connect on port 1984.

I further think that if you are running the service and the workers manually, you probably want to set the sites.xml entry for Swift to coasters-persistent and passive mode; otherwise the service seems to enforce some kind of state machine whereby it expects Swift to operate in automatic mode.

I've started trying to document this on page 1 of the attached document, but this needs more work.

What I found works for me is to run a dummy Swift job set to persistent+passive mode, as there is no way to force the standalone service into passive mode by a command-line option. My R scripts in ~wilde/SwiftR/swift/exec/start-swift-workers do this, but they are a bit complex and in the process of being cleaned up.  If I get something from the R scripts that you can use, I'll send it asap.

- Mike


----- "Allan Espinosa" <aespinosa at cs.uchicago.edu> wrote:

> Hi,
> 
> I'm having trouble with the workers registering with the coaster
> service:
> 
> $ ./trunk/bin/coaster-service -nosec
> Local contacts: [http://128.135.125.18:50000]
> Started local service: http://128.135.125.18:50000
> Started coaster service: http://128.135.125.18:1984
> Started coaster service: http://128.135.125.18:1984
> SC-null: Disabling heartbeats (config is null)
> Multiplexer 0 started
> (0) Scheduling SC-null for addition
> nullChannel started
> Multiplexer 1 started
> Unknown handler: REGISTER. Available handlers: {CHMOD=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler,
> ISDIR=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler,
> LIST=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.ListHandler,
> SUBMITJOB=class
> org.globus.cog.abstraction.coaster.service.SubmitJobHandler,
> MKDIR=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.MkdirHandler,
> PUT=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.PutFileHandler,
> DEL=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.DeleteHandler,
> HEARTBEAT=class
> org.globus.cog.karajan.workflow.service.handlers.HeartBeatHandler,
> CONFIGSERVICE=class
> org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler,
> FILEINFO=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.FileInfoHandler,
> SHUTDOWNSERVICE=class
> org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler,
> SHUTDOWN=class
> org.globus.cog.karajan.workflow.service.handlers.ShutdownHandler,
> EXISTS=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.ExistsHandler,
> CHANNELCONFIG=class
> org.globus.cog.karajan.workflow.service.handlers.ChannelConfigurationHandler,
> RMDIR=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.RmdirHandler,
> RENAME=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.RenameHandler,
> VERSION=class
> org.globus.cog.karajan.workflow.service.handlers.VersionHandler,
> WORKERSHELLCMD=class
> org.globus.cog.abstraction.coaster.service.WorkerShellHandler,
> GET=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler}
> 
> 
> Worker connection invocation:
> $
> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
> http://128.135.125.18:1984  foo /home/aespinosa/tmp
> Failed to process data: Failed to register (service returned error:
> Unknown command: REGISTER) at
> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
> line 676.
> 
> 
> 
> Invocation through the other port:
> $
> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
> http://128.135.125.18:50000  foo /home/aespinosa/tmp
> Failed to process data: Failed to register (service returned error:
> java.lang.NullPointerException) at
> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl
> line 676.
> 
> service dump:
> $ ./trunk/bin/coaster-service -nosec
> Local contacts: [http://128.135.125.18:50000]
> Started local service: http://128.135.125.18:50000
> Started coaster service: http://128.135.125.18:1984
> Started coaster service: http://128.135.125.18:1984
> SC-null: Disabling heartbeats (config is null)
> Multiplexer 0 started
> (0) Scheduling SC-null for addition
> nullChannel started
> Multiplexer 1 started
> Received registration: blockid = foo, url =
> Avg stream buf: 0
> Avg stream buf: 0
> Avg stream buf: 0
> 
> 
> I'm using the latest trunk code from swift and cog
> 
> 
> 
> 
> Just to confirm the "local service" is where swift submits the jobs
> and "coaster service" is the one the workers connect to, correct?
> 
> Thanks,
> -Allan
> 
> -- 
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SwiftConfigurations.odg
Type: application/vnd.oasis.opendocument.graphics
Size: 21664 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100928/d8b47534/attachment.odg>