From dennis at ucar.edu Wed Sep 1 09:59:35 2010
From: dennis at ucar.edu (John Dennis)
Date: Wed, 1 Sep 2010 08:59:35 -0600
Subject: [Swift-user] Deleting no longer necessary anonymous files in _concurrent
In-Reply-To:
References:
Message-ID: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>

Justin,

I am a little confused by your response that cleaning up temporary files is not the responsibility of the Swift language. We did not create the file 'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array'; Swift did. I certainly have no use for it. It was created as part of the parallelization process. Consider the following bit of pseudo-Swift code:

foreach years {
    file wgt_files[];
    foreach month {
        wgt_files[] = DoSomething();
    }
}

'wgt_files' is only in scope within the 'foreach years' loop. Once all iterations of the 'foreach years' loop have completed, I would expect 'wgt_files' to be deleted, since a variable/file should be deleted once it goes out of scope. Isn't this really an issue of garbage collection for the Swift language?

While I do see how we could use external variables to manage this all ourselves, that would significantly complicate the source code and remove much of the simplicity and elegance that Swift provides.

Matthew and I are concerned about this because of the impact it has on disk usage. For example, our Swift script requires temporary space of 4x the size of the input data. Our generated data is tiny, while the size of the _concurrent directory is 2x the size of the input data. Now we want to execute the Swift script on ~30 TB of data. So just enabling parallel execution with Swift would require an extra 120 TB of disk space (4 x 30 TB). I realize that parallel execution will consume more disk space, but this seems excessive.

Thanks,
John Dennis

On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote:

> Hi Matthew
> Deleting files is out of the scope of the Swift language.
You can > of course remove them yourself in your scripts, and as long as Swift > does not try to stage them out you should be fine. > You may want to look at external variables as another way to > approach this (manual 2.5). Using external variables you can manage > the files in your scripts while maintaining the Swift progress model. > Justin > > On Fri, 27 Aug 2010, Matthew Woitaszek wrote: >> Good afternoon, >> >> I'm working with a script that creates arrays of intermediate files >> using the anonymous concurrent mapper, such as: >> >> file wgt_file[]; >> >> As I expect, all of these files get generated in the remote swift >> temporary directory and are then returned to the _concurrent >> directory >> on the host executing Swift. However, in this particular application, >> they're then immediately consumed by a subsequent procedure and never >> needed again. >> >> Is there a way to configure Swift or the file mapper declaration to >> delete these files after the remaining script "consumes" them? (That >> is, after all procedures relying on them as inputs have been >> executed?) Or can (should?) that be done manually? >> >> More speculatively, is there a way to keep files like these on the >> execution host and not even bring them back to _concurrent? (With >> loss >> of generality, I'm executing on a single site, and don't really ever >> need the file locally, for restarts or staging to another site.) >> >> Any advice about managing copies of large intermediate data files in >> the Swift execution context would be appreciated! 
>>
>> Matthew
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>
>
> --
> Justin M Wozniak

From wozniak at mcs.anl.gov Fri Sep 3 13:50:57 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 3 Sep 2010 13:50:57 -0500 (Central Daylight Time)
Subject: [Swift-user] Deleting no longer necessary anonymous files in _concurrent
In-Reply-To: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
References: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
Message-ID:

First off, I definitely recognize the importance of doing this efficiently. I also realize you may be thinking of certain "make" functionality that does something like this.

We are currently working on improvements to Swift's data access mechanisms. Many applications create temporary intermediate data, so we are definitely looking at this.

From Swift's perspective, there are two aspects: garbage collection (automated deletion) and data placement between job executions. The garbage collection is something that could be handled in a few ways (cache management) or simply by tearing down the intermediate storage system. The data placement involves using an intermediate storage system that is at the compute site, preventing full stage out to the client, and ensuring that this storage system is accessible to both the producer and consumer of the pipeline data. (Swift assumes that there is one permanent filesystem, the one from which it is run, and uses staging for everything else. A given pair of jobs could execute at separate sites with different filesystems.)

There is "beta" functionality in the Swift trunk to directly utilize a local filesystem (that at least two applications are using). If there is a "scratch" filesystem that you can use, I can direct you to that. We are also productizing the ability to set up a temporary storage system for use by Swift, but that is not available yet.

On Wed, 1 Sep 2010, John Dennis wrote:

> Justin,
>
> I am a little confused by your response that cleaning up temporary
> files is not the responsibility of the Swift language. We did not
> create the file
> 'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array' Swift did. I
> certainly have not use for it. It was created
> as part of the parallelization process. Consider the following bit of
> pseudo swift code
>
> foreach years {
>     file wgt_files[];
>     foreach month {
>         wgt_files[] = DoSomething();
>     }
> }
>
> The 'wgt_files' is only in scope within the 'foreach years' loop.
> Once all iterations of 'foreach years' loop has completed,
> I would expect the 'wgt_files' to be deleted once a variable/file goes out of
> scope. Isn't this really an issue of garbage collection
> for the Swift language?
>
> While I do see how you could use the external variable to manage this
> all ourselves that would significantly complicate the
> source code and remove much of the simple and elegant solution that Swift
> provides.
>
> Matthew and I are concerned about this because of the impact this has
> on disk usage. For example our Swift script
> requires temporary space of size 4x the input data. Our generated data is
> tiny, while the size of the _concurrent directory
> is 2x the size of the input data. Now we want to execute the Swift script on
> ~30 TB of data. So just to enable parallel execution
> with Swift would require an extra 120TB of disk space. I realize that
> parallel execution will consume more disk space but this seems
> excessive.
> > Thanks, > John Dennis > > > > On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote: > >> Hi Matthew >> Deleting files is out of the scope of the Swift language. You can of >> course remove them yourself in your scripts, and as long as Swift does not >> try to stage them out you should be fine. >> You may want to look at external variables as another way to approach >> this (manual 2.5). Using external variables you can manage the files in >> your scripts while maintaining the Swift progress model. >> Justin >> >> On Fri, 27 Aug 2010, Matthew Woitaszek wrote: >>> Good afternoon, >>> >>> I'm working with a script that creates arrays of intermediate files >>> using the anonymous concurrent mapper, such as: >>> >>> file wgt_file[]; >>> >>> As I expect, all of these files get generated in the remote swift >>> temporary directory and are then returned to the _concurrent directory >>> on the host executing Swift. However, in this particular application, >>> they're then immediately consumed by a subsequent procedure and never >>> needed again. >>> >>> Is there a way to configure Swift or the file mapper declaration to >>> delete these files after the remaining script "consumes" them? (That >>> is, after all procedures relying on them as inputs have been >>> executed?) Or can (should?) that be done manually? >>> >>> More speculatively, is there a way to keep files like these on the >>> execution host and not even bring them back to _concurrent? (With loss >>> of generality, I'm executing on a single site, and don't really ever >>> need the file locally, for restarts or staging to another site.) >>> >>> Any advice about managing copies of large intermediate data files in >>> the Swift execution context would be appreciated! 
>>>
>>> Matthew
>>>
>>
>> --
>> Justin M Wozniak
>
--
Justin M Wozniak

From matthew.woitaszek at gmail.com Tue Sep 7 13:22:42 2010
From: matthew.woitaszek at gmail.com (Matthew Woitaszek)
Date: Tue, 7 Sep 2010 12:22:42 -0600
Subject: [Swift-user] Deleting no longer necessary anonymous files in _concurrent
In-Reply-To:
References: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu>
Message-ID:

Hi Justin,

Thanks for your reply -- I'd definitely like to learn more about the alternate staging/scratch options.

> There is "beta" functionality in the Swift trunk to directly utilize a local
> filesystem (that at least two applications are using). If there is a
> "scratch" filesystem that you can use, I can direct you to that.

By this, do you mean something like a node-local scratch system, where files could be staged directly from _concurrent to a node instead of a "site", or is it something else?

If node-local, I fear that might be a step backwards for our application. In our case, the staging time vs. capacity tradeoff is becoming quite problematic. On one hand, I really only want to keep one copy of everything (_concurrent), but limiting the amount of storage on a site might increase staging, which negates the parallelism, so I'm back to preferring a big site cache to minimize that.

Is there a way to get tasks to read/write directly out of _concurrent without the staging to the remote site at all? I suspect the answer is "no" due to your description of _concurrent's importance as the permanent file system and its use in staging to site file systems.
But in our case, we're coincidentally at one site, so the big GPFS scratch file system area ends up holding both _concurrent and the swift site temporary directory in different paths.

> The
> data placement involves using an intermediate storage system that is at the
> compute site, preventing full stage out to the client, and ensuring that
> this storage system is accessible to both the producer and consumer of the
> pipeline data.

This sounds like a feature that John and I would sign up for. :-) I see the new use.provider.staging option in the trunk, and "sfs" is very tempting...

(Also, thanks for your thoughts on garbage collection; I'll stick with the possibilities in the staging arena for now!)

Thanks for your time,

Matthew

On Fri, Sep 3, 2010 at 12:50 PM, Justin M Wozniak wrote:
>
> First off, I definitely recognize the importance of doing this efficiently.
> I also realize you may be thinking of certain "make" functionality that
> does something like this.
>
> We are currently working on improvements to Swift's data access mechanisms.
> Many applications create temporary intermediate data, so we are definitely
> looking at this.
>
> From Swift's perspective, there are two aspects- the garbage collection or
> automated delete and the data placement between job executions. The garbage
> collection is something that could be handled in a few ways (cache
> management) or simply by tearing down the intermediate storage system. The
> data placement involves using an intermediate storage system that is at the
> compute site, preventing full stage out to the client, and ensuring that
> this storage system is accessible to both the producer and consumer of the
> pipeline data. (Swift assumes that there is one permanent filesystem, the
> one from which it is run, and uses staging for everything else. A given
> pair of jobs could execute at separate sites with different filesystems.)
>
> There is "beta" functionality in the Swift trunk to directly utilize a local
> filesystem (that at least two applications are using). If there is a
> "scratch" filesystem that you can use, I can direct you to that. We are
> also productizing the ability to setup an temporary storage system for use
> by Swift, but that is not available yet.
>
> On Wed, 1 Sep 2010, John Dennis wrote:
>
>> Justin,
>>
>> I am a little confused by your response that cleaning up temporary
>> files is not the responsibility of the Swift language. We did not
>> create the file
>> 'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array' Swift did. I
>> certainly have not use for it. It was created
>> as part of the parallelization process. Consider the following bit of
>> pseudo swift code
>>
>> foreach years {
>>     file wgt_files[];
>>     foreach month {
>>         wgt_files[] = DoSomething();
>>     }
>> }
>>
>> The 'wgt_files' is only in scope within the 'foreach years' loop.
>> Once all iterations of 'foreach years' loop has completed,
>> I would expect the 'wgt_files' to be deleted once a variable/file goes out
>> of scope. Isn't this really an issue of garbage collection
>> for the Swift language?
>>
>> While I do see how you could use the external variable to manage
>> this all ourselves that would significantly complicate the
>> source code and remove much of the simple and elegant solution that Swift
>> provides.
>>
>> Matthew and I are concerned about this because of the impact this
>> has on disk usage. For example our Swift script
>> requires temporary space of size 4x the input data. Our generated data is
>> tiny, while the size of the _concurrent directory
>> is 2x the size of the input data. Now we want to execute the Swift script
>> on ~30 TB of data. So just to enable parallel execution
>> with Swift would require an extra 120TB of disk space.
>> I realize that
>> parallel execution will consume more disk space but this seems
>> excessive.
>>
>> Thanks,
>> John Dennis
>>
>> On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote:
>>
>>> Hi Matthew
>>> Deleting files is out of the scope of the Swift language. You can
>>> of course remove them yourself in your scripts, and as long as Swift does
>>> not try to stage them out you should be fine.
>>> You may want to look at external variables as another way to
>>> approach this (manual 2.5). Using external variables you can manage the
>>> files in your scripts while maintaining the Swift progress model.
>>> Justin
>>>
>>> On Fri, 27 Aug 2010, Matthew Woitaszek wrote:
>>>>
>>>> Good afternoon,
>>>>
>>>> I'm working with a script that creates arrays of intermediate files
>>>> using the anonymous concurrent mapper, such as:
>>>>
>>>> file wgt_file[];
>>>>
>>>> As I expect, all of these files get generated in the remote swift
>>>> temporary directory and are then returned to the _concurrent directory
>>>> on the host executing Swift. However, in this particular application,
>>>> they're then immediately consumed by a subsequent procedure and never
>>>> needed again.
>>>>
>>>> Is there a way to configure Swift or the file mapper declaration to
>>>> delete these files after the remaining script "consumes" them? (That
>>>> is, after all procedures relying on them as inputs have been
>>>> executed?) Or can (should?) that be done manually?
>>>>
>>>> More speculatively, is there a way to keep files like these on the
>>>> execution host and not even bring them back to _concurrent? (With loss
>>>> of generality, I'm executing on a single site, and don't really ever
>>>> need the file locally, for restarts or staging to another site.)
>>>>
>>>> Any advice about managing copies of large intermediate data files in
>>>> the Swift execution context would be appreciated!
>>>> >>>> Matthew >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>>> >>> >>> -- >>> Justin M Wozniak >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> > > -- > Justin M Wozniak > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From hategan at mcs.anl.gov Tue Sep 7 13:53:16 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Sep 2010 11:53:16 -0700 Subject: [Swift-user] Deleting no longer necessary anonymous files in _concurrent In-Reply-To: References: <8FDB154D-9A93-4424-B0FE-8C188162C606@ucar.edu> Message-ID: <1283885596.10503.25.camel@blabla2.none> On Tue, 2010-09-07 at 12:22 -0600, Matthew Woitaszek wrote: [...] > > > There is "beta" functionality in the Swift trunk to directly utilize a local > > filesystem (that at least two applications are using). If there is a > > "scratch" filesystem that you can use, I can direct you to that. > > By this, do you mean a something like a node-local scratch system, > where files could be staged directly from _concurrent to a node > instead of a "site", or is it something else? > > If node-local, I fear that might be a step backwards for our > application. In our case, the staging time vs. capacity tradeoff is > becoming quite problematic. On one hand, I really only want to keep > one copy of everything (_concurrent), but limiting the amount of > storage on the a site might increase staging, which negates the > parallelism, so I'm back to prefering a big site cache to minimize > that. The data, intermediate or not, has to be at least in one place. 
The stable/traditional version of swift tends to have at least 3 copies of each piece of data:
- on the client (1)
- on the shared fs of a target cluster (2)
- on the compute node (3)

(3) is arguable. One can run apps using data directly from (2). However, it's been our experience that, due to the way SFSes work, copying the data to the compute node yields better performance in most cases (actually pretty much all cases we've measured). This may not necessarily apply to your case, and we'd like to hear if it doesn't. You can switch between the two behaviors by specifying an additional directory in sites.xml. If that's there, (3) applies. If not, symlinks to (2) are used instead. I'll call this issue (A).

Stuff we're working on currently includes bypassing (2) and copying data directly between (1) and (3). It turns out that shared file systems are pretty poor when it comes to parallelism, due to the distributed consistency they have to enforce. However, given that in swift all data is single-assignment (which translates into files being written at most once), most of the problems that SFSes need to deal with don't really exist, but there is no way to tell them that. So we've got some prototypes there. At least on the BG/P we get clear (severalfold) performance improvements if we do (1) <-> (3).

Ideally we would also want to bypass (3) -> (1) -> (3) for intermediate data, since we can do (3) -> (3) instead. This is something Justin has been working on, I believe on single clusters. I'd personally like to see it working between multiple clusters, too.

>
> Is there a way to get tasks to read/write directly out of _concurrent
> without the staging to the remote site at all? I suspect the answer is
> "no" due to your description of _concurrent's importance as the
> permanent file system and its use in staging to site file systems.
> But in our case, we're coincidentally at one site, so the big GPFS scratch
> file system area ends up holding both _concurrent as well as the swift
> site temporary directory in different paths.

It is possible, but not currently available. Again, issue (A) may apply here, so provider.staging/sfs may be better.

Mihael

From jon.monette at gmail.com Wed Sep 8 18:45:49 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 18:45:49 -0500
Subject: [Swift-user] Swift app question
Message-ID: <4C88202D.3010406@gmail.com>

Hello,
This is probably a simple question, but when are app functions executed? Take a look at this pseudocode.

foreach y in years
{
    Month m1 <"month1.txt">;
    Month m2 <"month2.txt">;

    Year y = calculate( m1, m2 );
}

When will the app "calculate" be executed? Will it execute as soon as m1 and m2 for a given iteration are mapped, or will it wait until each thread has mapped its own m1 and m2 and then execute the apps all together?

From hategan at mcs.anl.gov Wed Sep 8 19:51:15 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 17:51:15 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <4C88202D.3010406@gmail.com>
References: <4C88202D.3010406@gmail.com>
Message-ID: <1283993475.4340.2.camel@blabla2.none>

On Wed, 2010-09-08 at 18:45 -0500, Jonathan Monette wrote:
> Hello,
> This is probably a simple question but when are app functions
> executed? Take a look at this pseudocode.
>
>
> foreach y in years
> {
>     Month m1 <"month1.txt">;
>     Month m2 <"month2.txt">;
>
>     Year y = calculate( m1, m2 );
> }
>
> When will the app "calculate" be executed? Will it execute as soon as
> m1 and m2 for a given iteration are mapped or will it wait till each
> thread has mapped its own m1 and m2 and execute the apps all together.

Is the use of y twice (once in the foreach and once for the result of calculate()) accidental?

Mihael

From jon.monette at gmail.com Wed Sep 8 19:52:56 2010
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Thu, 9 Sep 2010 00:52:56 +0000
Subject: [Swift-user] Re: Swift app question
Message-ID: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>

Yes. This is the general workflow for part of my problem.

------Original Message------
From: Mihael Hategan
To: Jonathan Monette
Cc: swift-user at ci.uchicago.edu
Cc: Justin M Wozniak
Subject: Re: Swift app question
Sent: Sep 8, 2010 7:51 PM

On Wed, 2010-09-08 at 18:45 -0500, Jonathan Monette wrote:
> Hello,
> This is probably a simple question but when are app functions
> executed? Take a look at this pseudocode.
>
>
> foreach y in years
> {
>     Month m1 <"month1.txt">;
>     Month m2 <"month2.txt">;
>
>     Year y = calculate( m1, m2 );
> }
>
> When will the app "calculate" be executed? Will it execute as soon as
> m1 and m2 for a given iteration are mapped or will it wait till each
> thread has mapped its own m1 and m2 and execute the apps all together.

Is the use of y twice (once in the foreach and once for the result of calculate()) accidental?

Mihael

Sent on the Sprint® Now Network from my BlackBerry®

From hategan at mcs.anl.gov Wed Sep 8 21:21:32 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 19:21:32 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry>
Message-ID: <1283998892.4442.3.camel@blabla2.none>

What the code says there is: iterate through some list, holding the current iteration variable in "y", and then declare an entirely new variable, also named "y", in the inner scope of the iteration and assign it the value of some computation. Is this what you meant?

In other words variables in swift are single assignment. You can't successively assign different values to the same variable in the same scope. On Thu, 2010-09-09 at 00:52 +0000, jon.monette at gmail.com wrote: > Yes. This is the general work flow to part of my problem > ------Original Message------ > From: Mihael Hategan > To: Jonathan Monette > Cc: swift-user at ci.uchicago.edu > Cc: Justin M Wozniak > Subject: Re: Swift app question > Sent: Sep 8, 2010 7:51 PM > > On Wed, 2010-09-08 at 18:45 -0500, Jonathan Monette wrote: > > Hello, > > This is probably a simple question but when are app functions > > executed? Take a look at this psuedocode. > > > > > > foreach y in years > > { > > Month m1< "month1.txt">; > > Month m2 <"month2.txt">; > > > > Year y = calculate( m1, m2 ); > > } > > > > When will the app "calculate" be executed? Will it execute as soon as > > m1 and m2 for a given iteration are mapped or will it wait till each > > thread has mapped its own m1 and m2 and execute the apps all together. > > Is the use of y twice (once in the foreach and once for the result of > calculate()) accidental? > > Mihael > > > > Sent on the Sprint? Now Network from my BlackBerry? From jon.monette at gmail.com Wed Sep 8 21:28:35 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Wed, 08 Sep 2010 21:28:35 -0500 Subject: [Swift-user] Re: Swift app question In-Reply-To: <1283998892.4442.3.camel@blabla2.none> References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry> <1283998892.4442.3.camel@blabla2.none> Message-ID: <4C884653.3000902@gmail.com> This is what I meant. foreach y in years { Month m1<"month1.txt">; Month m2<"month2.txt">; Year x = calculate( m1, m2 ); } I know that threads will be created and each iteration for the foreach loop will run in parallel. What I am trying to understand is when is the calculate app executed. 
This is a very dumbed down example but I want to know will x be mapped to the output of calculate once m1 and m2 are closed or is there a "barrier" that blocks until all threads have finished mapping m1 and m2 before the apps are run in parallel? On 9/8/10 9:21 PM, Mihael Hategan wrote: > foreach y in years > > > { > > > Month m1< "month1.txt">; > > > Month m2<"month2.txt">; > > > > > > Year y = calculate( m1, m2 ); > > > } -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Wed Sep 8 21:37:38 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Sep 2010 19:37:38 -0700 Subject: [Swift-user] Re: Swift app question In-Reply-To: <4C884653.3000902@gmail.com> References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry> <1283998892.4442.3.camel@blabla2.none> <4C884653.3000902@gmail.com> Message-ID: <1283999858.4542.7.camel@blabla2.none> Theoretically since there is no dependency between m1, m2 and y, it should run right ahead. Practically each invocation will probably wait for values in years. But I have to ask. Why bother doing this for every year if, at least from your code, x would have the same value every time (i.e. there is no actual dependency on y)? Mihael On Wed, 2010-09-08 at 21:28 -0500, Jonathan Monette wrote: > > This is what I meant. > > foreach y in years > { > Month m1<"month1.txt">; > Month m2<"month2.txt">; > > Year x = calculate( m1, m2 ); > } > > I know that threads will be created and each iteration for the foreach > loop will run in parallel. What I am trying to understand is when is > the calculate app executed. 
This is a very dumbed down example but I > want to know will x be mapped to the output of calculate once m1 and m2 > are closed or is there a "barrier" that blocks until all threads have > finished mapping m1 and m2 before the apps are run in parallel? > > On 9/8/10 9:21 PM, Mihael Hategan wrote: > > foreach y in years > > > > { > > > > Month m1< "month1.txt">; > > > > Month m2<"month2.txt">; > > > > > > > > Year y = calculate( m1, m2 ); > > > > } > From jon.monette at gmail.com Wed Sep 8 21:40:03 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Wed, 08 Sep 2010 21:40:03 -0500 Subject: [Swift-user] Re: Swift app question In-Reply-To: <1283999858.4542.7.camel@blabla2.none> References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry> <1283998892.4442.3.camel@blabla2.none> <4C884653.3000902@gmail.com> <1283999858.4542.7.camel@blabla2.none> Message-ID: <4C884903.2010908@gmail.com> Ok. And like I said this was a dumbed down example. I just needed to show a mappings and didn't want to use a fancy mapper. In my code x will be a different value for each iteration. Thanks though. That clears things up. On 9/8/10 9:37 PM, Mihael Hategan wrote: > Theoretically since there is no dependency between m1, m2 and y, it > should run right ahead. Practically each invocation will probably wait > for values in years. > > But I have to ask. Why bother doing this for every year if, at least > from your code, x would have the same value every time (i.e. there is no > actual dependency on y)? > > Mihael > > On Wed, 2010-09-08 at 21:28 -0500, Jonathan Monette wrote: >> This is what I meant. >> >> foreach y in years >> { >> Month m1<"month1.txt">; >> Month m2<"month2.txt">; >> >> Year x = calculate( m1, m2 ); >> } >> >> I know that threads will be created and each iteration for the foreach >> loop will run in parallel. What I am trying to understand is when is >> the calculate app executed. 
>> This is a very dumbed down example but I
>> want to know will x be mapped to the output of calculate once m1 and m2
>> are closed or is there a "barrier" that blocks until all threads have
>> finished mapping m1 and m2 before the apps are run in parallel?
>>
>> On 9/8/10 9:21 PM, Mihael Hategan wrote:
>>> foreach y in years
>>> > {
>>> >     Month m1 <"month1.txt">;
>>> >     Month m2 <"month2.txt">;
>>> >
>>> >     Year y = calculate( m1, m2 );
>>> > }
>

--
Jon

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein

From hategan at mcs.anl.gov Wed Sep 8 22:24:56 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Sep 2010 20:24:56 -0700
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <4C884903.2010908@gmail.com>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry> <1283998892.4442.3.camel@blabla2.none> <4C884653.3000902@gmail.com> <1283999858.4542.7.camel@blabla2.none> <4C884903.2010908@gmail.com>
Message-ID: <1284002696.4694.3.camel@blabla2.none>

On Wed, 2010-09-08 at 21:40 -0500, Jonathan Monette wrote:
> Ok. And like I said this was a dumbed down example. I just needed to
> show a mappings and didn't want to use a fancy mapper. In my code x
> will be a different value for each iteration. Thanks though. That
> clears things up.

I was a bit confused. If there is a dependency relation between y and the inputs to an app (whether through a mapper or directly), then it has to be satisfied for the app to run. However, when it comes to mappers, swift allows some hidden dependencies to be expressed. For example, some app may produce "a.txt" while somewhere else you say f <"a.txt">. Swift won't enforce that dependency.

Mihael

[...]

From jon.monette at gmail.com Wed Sep 8 22:32:48 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 08 Sep 2010 22:32:48 -0500
Subject: [Swift-user] Re: Swift app question
In-Reply-To: <1284002696.4694.3.camel@blabla2.none>
References: <803451294-1283993577-cardhu_decombobulator_blackberry.rim.net-2140308723-@bda2073.bisx.prod.on.blackberry> <1283998892.4442.3.camel@blabla2.none> <4C884653.3000902@gmail.com> <1283999858.4542.7.camel@blabla2.none> <4C884903.2010908@gmail.com> <1284002696.4694.3.camel@blabla2.none>
Message-ID: <4C885560.9040108@gmail.com>

Here is the actual code I was referencing:

( Image diff_imgs[] ) mDiffBatch( Table diff_tbl, MosaicData hdr )
{
    DiffStruct diffs[] ;
    tracef( "%s is closed %k\n", @filename( hdr ), hdr ); //1
    tracef( "Mapped %i files from the csv_mapper and \"%s\"\n", @length( diffs ), @diff_tbl ); //2

    foreach d_entry, i in diffs
    {
        tracef( "%s is closed on iteration %i%k\n", @d_entry.plus, i, d_entry.plus ); //3
        tracef( "%s is closed on iteration %i%k\n", @d_entry.minus, i, d_entry.minus ); //4

        Image proj_1 ;
        Image proj_2 ;
        tracef( "%s is closed on iteration %i%k\n", @proj_1, i, proj_1 ); //5
        tracef( "%s is closed on iteration %i%k\n", @proj_2, i, proj_2 ); //6

        Image diff_img ;
        tracef( "diff_img was mapped to %s on iteration %i\n", @diff_img, i ); //7
        diff_img = mDiff( proj_1, proj_2, hdr );
        tracef( "DIFFERENCED %s on iteration %i%k\n", @filename( diff_img ), i, diff_img ); //8

        diff_imgs[ i ] = diff_img;
    }
}

tracef 1 and 2 always print. tracef 3, 4, 5, 6, and 7 print for some of the iterations but never for all of them. tracef 8 never prints, because the script hangs and the app mDiff is never executed. This is what I have been trying to recreate. But simply taking out the mDiff app and replacing it with a script that basically does a cat lets the script run to completion. So I have been trying to understand what Swift is actually doing. This code hangs.


On 09/08/2010 10:24 PM, Mihael Hategan wrote:
> On Wed, 2010-09-08 at 21:40 -0500, Jonathan Monette wrote:
>
>> Ok. And like I said this was a dumbed down example. I just needed to
>> show a mappings and didn't want to use a fancy mapper. In my code x
>> will be a different value for each iteration. Thanks though. That
>> clears things up.
>>
> I was a bit confused. If there is a dependency relation between y and
> the inputs to an app (whether through a mapper or directly) then it has
> to be satisfied for the app to run. However, when it comes to mappers,
> Swift allows some hidden dependencies to be expressed. For example, when
> some app produces "a.txt" and somewhere you say f<"a.txt">, Swift won't
> enforce that.
>
> Mihael
>
> [...]
>
>

From jon.monette at gmail.com Thu Sep 9 11:50:19 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Thu, 09 Sep 2010 11:50:19 -0500
Subject: [Swift-user] external mapper
Message-ID: <4C89104B.6030201@gmail.com>

Hello,
How do I pass parameters to the script in the external mapper? I see there is a -symbol option but how does that work? What does the syntax look like?

From hategan at mcs.anl.gov Thu Sep 9 12:01:55 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 09 Sep 2010 10:01:55 -0700
Subject: [Swift-user] external mapper
In-Reply-To: <4C89104B.6030201@gmail.com>
References: <4C89104B.6030201@gmail.com>
Message-ID: <1284051715.6082.2.camel@blabla2.none>

On Thu, 2010-09-09 at 11:50 -0500, Jonathan Monette wrote:
> Hello,
> How do I pass parameters to the script in the external mapper? I
> see there is a -symbol option but how does that work? What does the
> syntax look like?

Any mapper parameter will be passed as "-param" "value" to the argv of the script.
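
(Illustration, not part of the original thread.) The argv convention described above can be sketched from the script's side. The following is a minimal Python sketch of a hypothetical mapper script: the function names parse_mapper_args and mapping_lines are made up for this example, and the "[index] filename" output format is an assumption about what the external mapper reads back from stdout, so it should be checked against the user guide for the Swift version in use.

```python
#!/usr/bin/env python3
"""Hypothetical external-mapper script; names and output format are assumptions."""
import sys


def parse_mapper_args(argv):
    """Collect "-param value" pairs, e.g. ['-a', 'v1', '-b', 'v2'], into a dict."""
    params = {}
    args = iter(argv)
    for flag in args:
        if not flag.startswith("-") or len(flag) < 2:
            raise ValueError(f"expected -param, got {flag!r}")
        try:
            # The value is the argument immediately following the flag.
            params[flag[1:]] = next(args)
        except StopIteration:
            raise ValueError(f"missing value for {flag}") from None
    return params


def mapping_lines(params):
    """Made-up policy for illustration: array element i maps to '<value>.txt'.

    Assumption: the mapper expects one "[index] filename" pair per line.
    """
    return [f"[{i}] {value}.txt" for i, value in enumerate(params.values())]


if __name__ == "__main__":
    for line in mapping_lines(parse_mapper_args(sys.argv[1:])):
        print(line)
```

Saved as mapper.py (a hypothetical name) and run as "mapper.py -a v1 -b v2", the script prints two lines, "[0] v1.txt" and "[1] v2.txt".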

So:

file f ;

should result in
"exec -a v1 -b v2"

Mihael

From jon.monette at gmail.com Thu Sep 9 12:01:39 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Thu, 09 Sep 2010 12:01:39 -0500
Subject: [Swift-user] external mapper
In-Reply-To: <1284051715.6082.2.camel@blabla2.none>
References: <4C89104B.6030201@gmail.com> <1284051715.6082.2.camel@blabla2.none>
Message-ID: <4C8912F3.3010301@gmail.com>

Alright. Thanks.

On 09/09/2010 12:01 PM, Mihael Hategan wrote:
> On Thu, 2010-09-09 at 11:50 -0500, Jonathan Monette wrote:
>
>> Hello,
>> How do I pass parameters to the script in the external mapper? I
>> see there is a -symbol option but how does that work? What does the
>> syntax look like?
>>
> Any mapper parameter will be passed as "-param" "value" to the argv of
> the script.
>
> So:
>
> file f;
>
> should result in
> "exec -a v1 -b v2"
>
> Mihael
>
>
>

From iraicu at cs.uchicago.edu Tue Sep 21 11:49:57 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 21 Sep 2010 11:49:57 -0500
Subject: [Swift-user] CFP: Workshop on Data Intensive Computing in the Clouds (DataCloud) 2011, co-located with IEEE IPDPS 2011
Message-ID: <4C98E235.70907@cs.uchicago.edu>

---------------------------------------------------------------------------------
*** Call for Papers ***
WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011)
In conjunction with IPDPS 2011, May 16, Anchorage, Alaska
http://www.cct.lsu.edu/~kosar/DataCloud2011
---------------------------------------------------------------------------------

The First International Workshop on Data Intensive Computing in the Clouds (DataCloud2011) will be held in conjunction with the 25th IEEE International Parallel and Distributed Computing Symposium (IPDPS 2011), in Anchorage, Alaska.

Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements.
Some applications generate data volumes reaching hundreds of terabytes and even petabytes. As scientific applications become more data intensive, the management of data resources and dataflow between the storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge and data intensive computing is now considered as the "fourth paradigm" in scientific discovery after theoretical, experimental, and computational science.

DataCloud2011 will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running data-intensive computing workloads on Cloud Computing infrastructures. The DataCloud2011 workshop will focus on the use of cloud-based technologies to meet the new data intensive scientific challenges that are not well served by the current supercomputers, grids or compute-intensive clouds. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and present architectures and services for future clouds supporting data intensive computing.

TOPICS
---------------------------------------------------------------------------------
- Data-intensive cloud computing applications, characteristics, challenges
- Case studies of data intensive computing in the clouds
- Performance evaluation of data clouds, data grids, and data centers
- Energy-efficient data cloud design and management
- Data placement, scheduling, and interoperability in the clouds
- Accountability, QoS, and SLAs
- Data privacy and protection in a public cloud environment
- Distributed file systems for clouds
- Data streaming and parallelization
- New programming models for data-intensive cloud computing
- Scalability issues in clouds
- Social computing and massively social gaming
- 3D Internet and implications
- Future research challenges in data-intensive cloud computing

IMPORTANT DATES
---------------------------------------------------------------------------------
Abstract submission: December 1, 2010
Paper submission: December 8, 2010
Acceptance notification: January 7, 2011
Final papers due: February 1, 2011

PAPER SUBMISSION
---------------------------------------------------------------------------------
DataCloud2011 invites authors to submit original and unpublished technical papers. All submissions will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and relevance to the workshop topics of interest. Submitted papers may not have appeared in or be under consideration for another workshop, conference or a journal, nor may they be under review or submitted to another forum during the DataCloud2011 review process.
Submitted papers may not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style, document templates can be found at ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.pdf and ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.doc), including figures, tables, and references. A 250 word abstract (PDF format) must be submitted online at https://cmt.research.microsoft.com/DataCloud2011/ before the deadline of December 1st, 2010 at 11:59PM PST; the final 10 page papers in PDF format will be due on December 8th, 2010 at 11:59PM PST.

WORKSHOP and PROGRAM CHAIRS
---------------------------------------------------------------------------------
Tevfik Kosar, Louisiana State University
Ioan Raicu, Illinois Institute of Technology

STEERING COMMITTEE
---------------------------------------------------------------------------------
Ian Foster, Univ of Chicago & Argonne National Lab
Geoffrey Fox, Indiana University
James Hamilton, Amazon Web Services
Manish Parashar, Rutgers University & NSF
Dan Reed, Microsoft Research
Rich Wolski, University of California, Santa Barbara
Liang-Jie Zhang, IBM Research

PROGRAM COMMITTEE
---------------------------------------------------------------------------------
David Abramson, Monash University, Australia
Roger Barga, Microsoft Research
John Bent, Los Alamos National Laboratory
Umit Catalyurek, Ohio State University
Abhishek Chandra, University of Minnesota
Rong N. Chang, IBM Research
Alok Choudhary, Northwestern University
Brian Cooper, Google
Ewa Deelman, University of Southern California
Murat Demirbas, University at Buffalo
Adriana Iamnitchi, University of South Florida
Maria Indrawan, Monash University, Australia
Alexandru Iosup, Delft University of Technology, Netherlands
Peter Kacsuk, Hungarian Academy of Sciences, Hungary
Dan Katz, University of Chicago
Steven Ko, University at Buffalo
Gregor von Laszewski, Rochester Institute of Technology
Erwin Laure, CERN, Switzerland
Ignacio Llorente, Universidad Complutense de Madrid, Spain
Reagan Moore, University of North Carolina
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory
Ian Taylor, Cardiff University, UK
Douglas Thain, University of Notre Dame
Bernard Traversat, Oracle
Yong Zhao, Univ of Electronic Science & Tech of China

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From iraicu at cs.uchicago.edu Fri Sep 24 12:45:16 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Fri, 24 Sep 2010 12:45:16 -0500
Subject: [Swift-user] CFP: Special Issue on Science-driven Cloud Computing, in the Scientific Programming Journal
Message-ID: <4C9CE3AC.5000605@cs.uchicago.edu>

Call for Papers
---------------------------------------------------------------------------------
Scientific Programming Journal
Special Issue on Science-driven Cloud Computing
http://www.cs.iit.edu/~iraicu/SPJ_ScienceCloud_2011/

Overview
---------------------------------------------------------------------------------
Cloud computing, first established in the business computing domain, is now a topic of research in computer science and an interesting execution platform for science applications. Today there are a number of commercial and science cloud deployments, including those provided by Amazon, Google, IBM, Microsoft, and others. Campus and national labs are also deploying their own cloud solutions. The ability to control the resources and the pay-as-you-go usage model enables new approaches to application development and resource provisioning. Science applications are looking towards the cloud to provide a stable and customizable execution environment. This special issue of the Scientific Programming Journal is dedicated to the computational challenges and opportunities of cloud computing.

Topics
---------------------------------------------------------------------------------
We invite the submission of original work that is related to the topics below.
Topics of interest include (in the context of Cloud Computing):
* Scientific cloud applications
* Novel programming models
* High-performance computing
* Many-task computing
* Resource scheduling
* Compute resource management
* Resource provisioning and configuration (compute, data, and network)
* Adaptive computing and resource usage
* Power-aware use of cloud computing
* Storage cloud architectures and implementations
* Cloud scalability and elasticity
* Performance Evaluations and Benchmarks
* Quality of service and SLA management
* Cloud heterogeneity
* Charging models
* Models, frameworks and systems for cloud security and privacy
* Monitoring

Paper Submission
---------------------------------------------------------------------------------
Authors are encouraged to submit high quality, original work that has neither appeared in, nor is under consideration by other journals. The manuscript must follow the formatting instructions found at the Scientific Programming site at http://www.iospress.nl/html/10589244_ita.html. Papers should be not more than 25 pages of single column text using double spaced 10 point size on 8.5 x 11 inch pages and 1" margins (including all text, figures, and references). A 250 word abstract (PDF format) must be submitted online at https://cmt.research.microsoft.com/SPJ_ScienceCloud_2011/ before the deadline of October 22nd, 2010 at 11:59PM PST; the final 25 page papers in PDF format will be due on October 29th, 2010 at 11:59PM PST. Papers will be peer-reviewed, and accepted papers will be published by IOS Press. Notifications of the paper decisions will be sent out by December 1st, 2010. Accepted papers will be published by IOS Press without any fees to the authors.

Important dates
---------------------------------------------------------------------------------
* Abstract Due: October 22nd, 2010
* Papers Due: October 29th, 2010
* Reviews Completed: December 1st, 2010
* Publication Date: Early 2011

Guest Editors:
---------------------------------------------------------------------------------
Ivona Brandic, Vienna University of Technology, ivona at infosys.tuwien.ac.at
Ewa Deelman, University of Southern California, deelman at isi.edu
Ioan Raicu, Illinois Institute of Technology, iraicu at cs.iit.edu

For more information on this special issue in Scientific Programming Journal, please visit http://www.cs.iit.edu/~iraicu/SPJ_ScienceCloud_2011/.

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From aespinosa at cs.uchicago.edu Tue Sep 28 17:09:37 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 28 Sep 2010 17:09:37 -0500
Subject: [Swift-user] persitent service + manual workers
Message-ID:

Hi,

I'm having trouble with the workers registering with the coaster service:

$ ./trunk/bin/coaster-service -nosec
Local contacts: [http://128.135.125.18:50000]
Started local service: http://128.135.125.18:50000
Started coaster service: http://128.135.125.18:1984
Started coaster service: http://128.135.125.18:1984
SC-null: Disabling heartbeats (config is null)
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Unknown handler: REGISTER.
Available handlers: {CHMOD=class org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler, ISDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler, LIST=class org.globus.cog.abstraction.impl.file.coaster.handlers.ListHandler, SUBMITJOB=class org.globus.cog.abstraction.coaster.service.SubmitJobHandler, MKDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.MkdirHandler, PUT=class org.globus.cog.abstraction.impl.file.coaster.handlers.PutFileHandler, DEL=class org.globus.cog.abstraction.impl.file.coaster.handlers.DeleteHandler, HEARTBEAT=class org.globus.cog.karajan.workflow.service.handlers.HeartBeatHandler, CONFIGSERVICE=class org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler, FILEINFO=class org.globus.cog.abstraction.impl.file.coaster.handlers.FileInfoHandler, SHUTDOWNSERVICE=class org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler, SHUTDOWN=class org.globus.cog.karajan.workflow.service.handlers.ShutdownHandler, EXISTS=class org.globus.cog.abstraction.impl.file.coaster.handlers.ExistsHandler, CHANNELCONFIG=class org.globus.cog.karajan.workflow.service.handlers.ChannelConfigurationHandler, RMDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.RmdirHandler, RENAME=class org.globus.cog.abstraction.impl.file.coaster.handlers.RenameHandler, VERSION=class org.globus.cog.karajan.workflow.service.handlers.VersionHandler, WORKERSHELLCMD=class org.globus.cog.abstraction.coaster.service.WorkerShellHandler, GET=class org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler} Worker connection invocation: $ /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl http://128.135.125.18:1984 foo /home/aespinosa/tmp Failed to process data: Failed to register (service returned error: Unknown command: REGISTER) at /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl line 676. 

Invocation through the other port:
$ /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl http://128.135.125.18:50000 foo /home/aespinosa/tmp
Failed to process data: Failed to register (service returned error: java.lang.NullPointerException) at /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl line 676.

service dump:
$ ./trunk/bin/coaster-service -nosec
Local contacts: [http://128.135.125.18:50000]
Started local service: http://128.135.125.18:50000
Started coaster service: http://128.135.125.18:1984
Started coaster service: http://128.135.125.18:1984
SC-null: Disabling heartbeats (config is null)
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Received registration: blockid = foo, url =
Avg stream buf: 0
Avg stream buf: 0
Avg stream buf: 0

I'm using the latest trunk code from swift and cog.

Just to confirm: the "local service" is where swift submits the jobs and "coaster service" is the one the workers connect to, correct?

Thanks,
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From wilde at mcs.anl.gov Tue Sep 28 21:11:10 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Tue, 28 Sep 2010 20:11:10 -0600 (GMT-06:00)
Subject: [Swift-user] persitent service + manual workers
In-Reply-To: <1957630536.424041285726207329.JavaMail.root@zimbra.anl.gov>
Message-ID: <2008803707.424091285726270469.JavaMail.root@zimbra.anl.gov>

Allan, I think the worker should connect on port 50000 and Swift should connect on port 1984.

I further think that if you are running the service and the workers manually, you probably want to set the sites.xml entry for Swift to coasters-persistent and passive mode, otherwise the service seems to enforce some kind of state machine whereby it expects Swift to operate in automatic mode.

I've started trying to document this on page 1 of the attached document but this needs more work.

What I found works for me is to run a dummy swift job set to persistent+passive mode, as there is no way to force the standalone service to passive mode by command line option.

My R scripts in ~wilde/SwiftR/swift/exec/start-swift-workers do this but are a bit complex and in the process of cleanup. If I get something from the R scripts that you can use I'll send asap.

- Mike

----- "Allan Espinosa" wrote:
> Hi,
>
> I'm having trouble with the workers registering with the coaster
> service:
>
> $ ./trunk/bin/coaster-service -nosec
> Local contacts: [http://128.135.125.18:50000]
> Started local service: http://128.135.125.18:50000
> Started coaster service: http://128.135.125.18:1984
> Started coaster service: http://128.135.125.18:1984
> SC-null: Disabling heartbeats (config is null)
> Multiplexer 0 started
> (0) Scheduling SC-null for addition
> nullChannel started
> Multiplexer 1 started
> Unknown handler: REGISTER. Available handlers: {CHMOD=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler,
> ISDIR=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler,
> LIST=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.ListHandler,
> SUBMITJOB=class
> org.globus.cog.abstraction.coaster.service.SubmitJobHandler,
> MKDIR=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.MkdirHandler,
> PUT=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.PutFileHandler,
> DEL=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.DeleteHandler,
> HEARTBEAT=class
> org.globus.cog.karajan.workflow.service.handlers.HeartBeatHandler,
> CONFIGSERVICE=class
> org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler,
> FILEINFO=class
> org.globus.cog.abstraction.impl.file.coaster.handlers.FileInfoHandler,
> SHUTDOWNSERVICE=class
> org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler,
> SHUTDOWN=class
> org.globus.cog.karajan.workflow.service.handlers.ShutdownHandler,
>
EXISTS=class > org.globus.cog.abstraction.impl.file.coaster.handlers.ExistsHandler, > CHANNELCONFIG=class > org.globus.cog.karajan.workflow.service.handlers.ChannelConfigurationHandler, > RMDIR=class > org.globus.cog.abstraction.impl.file.coaster.handlers.RmdirHandler, > RENAME=class > org.globus.cog.abstraction.impl.file.coaster.handlers.RenameHandler, > VERSION=class > org.globus.cog.karajan.workflow.service.handlers.VersionHandler, > WORKERSHELLCMD=class > org.globus.cog.abstraction.coaster.service.WorkerShellHandler, > GET=class > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler} > > > Worker connection invocation: > $ > /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > http://128.135.125.18:1984 foo /home/aespinosa/tmp > Failed to process data: Failed to register (service returned error: > Unknown command: REGISTER) at > /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > line 676. > > > > Invocation through the other port: > $ > /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > http://128.135.125.18:50000 foo /home/aespinosa/tmp > Failed to process data: Failed to register (service returned error: > java.lang.NullPointerException) at > /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > line 676. 
> > service dump: > $ ./trunk/bin/coaster-service -nosec > Local contacts: [http://128.135.125.18:50000] > Started local service: http://128.135.125.18:50000 > Started coaster service: http://128.135.125.18:1984 > Started coaster service: http://128.135.125.18:1984 > SC-null: Disabling heartbeats (config is null) > Multiplexer 0 started > (0) Scheduling SC-null for addition > nullChannel started > Multiplexer 1 started > Received registration: blockid = foo, url = > Avg stream buf: 0 > Avg stream buf: 0 > Avg stream buf: 0 > > > I'm using the latest trunk code from swift and cog > > > > > Just to confirm the "local service" is where swift submits the jobs > and "coaster service" is the one the workers connect to, correct? > > Thanks, > -Allan > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: SwiftConfigurations.odg Type: application/vnd.oasis.opendocument.graphics Size: 21664 bytes Desc: not available URL: