From wilde at mcs.anl.gov Tue May 1 12:30:37 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 May 2012 12:30:37 -0500 (CDT)
Subject: [Swift-devel] java installations on Beagle
In-Reply-To: <58DAA4C9-0A51-45C8-8298-4A8D26095380@uchicago.edu>
Message-ID: <863082116.1166.1335893437549.JavaMail.root@zimbra.anl.gov>

Hi Lorenzo,

Just a few notes re your java emails (sorry, we are all swamped right now with multiple colliding deadlines).

- You should never need to set JAVA_HOME manually; in fact it is usually dangerous to do so. Java will set it to the dir containing the command you run. (Just get the right Java bin/ in your PATH.)

- We are still experimenting with which Java works best on Beagle, so I am cc'ing this to swift-devel for comments. I recall that Ketan found some issues with the default IBM Java that were remedied by using a Sun Java.

- *Usually* it's best to use the latest Sun Java available.

- I am assuming for Beagle (and most modern machines) we always want to use a 64-bit Java.

Mike

----- Original Message -----
> From: "Lorenzo Pesce"
> To: "Michael Wilde"
> Sent: Tuesday, May 1, 2012 10:30:18 AM
> Subject: java installations on Beagle
> Mike,
>
> No rush here. It looks like there is some mess in the java installations on Beagle. For one there are a bunch:
>
> lpesce at login5:~> module avail java
>
> ------------------------------- /opt/modulefiles -------------------------------
> java/jdk1.6.0_20        java/jdk1.6.0_24(default)
>
> lpesce at login5:~> module avail sun-java
>
> --------------------------- /soft/modulefiles/tools ----------------------------
> sun-java/jdk1.7.0_02(default)
>
> I thought that the module called "java" was the Sun Java. Now I am utterly perplexed.
>
> Moreover, $JAVA_HOME doesn't seem to be affected by which java is loaded, which seems dubious to me.
>
> I am going to force a new definition of JAVA_HOME *within* the ant module when I load sun-java

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From wilde at mcs.anl.gov Tue May 1 12:48:52 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 May 2012 12:48:52 -0500 (CDT)
Subject: [Swift-devel] Streams
In-Reply-To:
Message-ID: <906416717.1211.1335894532731.JavaMail.root@zimbra.anl.gov>

Hi Ketan,

I'll try to reply to this and Ben's followup some time next week. I'm very interested in the streams paradigm.

Note that we have at least one example of a Swift script running in an infinite loop: it's the Swift remote function call server in the SwiftR package (Swift for the R language). I think that uses an infinite iterate loop that exits when it gets an exit command from the FIFO it's reading. We talked about adding simple socket interfaces via builtin functions.

The streams question is of strong interest for the future.

- Mike

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "Swift Devel"
> Sent: Monday, April 30, 2012 1:39:54 PM
> Subject: [Swift-devel] Streams
> Hi,
>
> I am working on a DOE powergrid related project here at Cornell.
>
> An aim of the project is to compute power grid state estimation and react in time-critical fashion.
>
> The application, at a very high level, is a distributed producer/consumer system where multiple producers produce data streams consumed by multiple consumers in a publish-subscribe model of data flow.
>
> The producers (phasor measurement units) produce streams continuously and consumers (State Estimators) can subscribe to the producers.
> There can be multiple consumers consuming from a single producer for performance and consistency purposes.
>
> Can Swift support this model of computation? In particular, I am wondering how to go about the following aspects with Swift:
>
> 1. Describe an application which could run in an 'infinite' loop.
> 2. Mappers to streams. I think these streams should be some kind of named buffers. A memory-to-memory stream model is what I vaguely view this as.
>
> The streams are binary encoded ones in big-endian format and could be parsed (by consumers) as id'd tuples each containing 5-6 fixed-width fields of timestamp, voltage, current, delta etc. data.
>
> There are other requirements of the application and plenty of low-level nitty gritty, but I think Swift could handle all of 'em. I am just unsure of the above two at the moment.
>
> We are in discussion with WSU collaborators to deliver some of the 'real' parts of the application. However, in the meantime we do have toy components to test and play with.
>
> Any input is greatly appreciated.
>
> Regards,
> --
> Ketan
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From jonmon at mcs.anl.gov Tue May 1 12:38:05 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Tue, 1 May 2012 12:38:05 -0500
Subject: [Swift-devel] java installations on Beagle
In-Reply-To: <863082116.1166.1335893437549.JavaMail.root@zimbra.anl.gov>
References: <863082116.1166.1335893437549.JavaMail.root@zimbra.anl.gov>
Message-ID: <3FDEAE64-E05B-4FE5-94D9-751E0F7BC3EC@mcs.anl.gov>

It's always best to use Sun Java, but IBM's Java works fine. The only issue we have with IBM's Java is when submitting jobs to Beagle remotely with automatic coasters; in that scenario an EOF exception is thrown.
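If it helps untangle the module confusion, here is a quick one-file check - just a sketch, nothing Beagle-specific assumed - that prints which JVM and settings a given module actually hands you; run it once per module on a login or compute node:

    // JvmProbe.java - report which JVM this really is and what it sees
    public class JvmProbe {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("java.vm.vendor = " + System.getProperty("java.vm.vendor"));
            System.out.println("java.version   = " + System.getProperty("java.version"));
            System.out.println("java.home      = " + System.getProperty("java.home"));
            System.out.println("os.arch        = " + System.getProperty("os.arch"));
            System.out.println("processors     = " + rt.availableProcessors());
            System.out.println("max heap (MB)  = " + rt.maxMemory() / (1024 * 1024));
        }
    }

java.home in particular shows what Java itself thinks its home is, independent of whatever $JAVA_HOME happens to be set to, and os.arch hints at whether you really got a 64-bit JVM.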
On May 1, 2012, at 12:30, Michael Wilde wrote:

> Hi Lorenzo,
>
> Just a few notes re your java emails (sorry, we are all swamped right now with multiple colliding deadlines).
>
> - You should never need to set JAVA_HOME manually; in fact it is usually dangerous to do so. Java will set it to the dir containing the command you run. (Just get the right Java bin/ in your PATH.)
>
> - We are still experimenting with which Java works best on Beagle, so I am cc'ing this to swift-devel for comments. I recall that Ketan found some issues with the default IBM Java that were remedied by using a Sun Java.
>
> - *Usually* it's best to use the latest Sun Java available.
>
> - I am assuming for Beagle (and most modern machines) we always want to use a 64-bit Java.
>
> Mike
>
>
> ----- Original Message -----
>> From: "Lorenzo Pesce"
>> To: "Michael Wilde"
>> Sent: Tuesday, May 1, 2012 10:30:18 AM
>> Subject: java installations on Beagle
>> Mike,
>>
>> No rush here. It looks like there is some mess in the java installations on Beagle. For one there are a bunch:
>>
>> lpesce at login5:~> module avail java
>>
>> ------------------------------- /opt/modulefiles -------------------------------
>> java/jdk1.6.0_20        java/jdk1.6.0_24(default)
>>
>> lpesce at login5:~> module avail sun-java
>>
>> --------------------------- /soft/modulefiles/tools ----------------------------
>> sun-java/jdk1.7.0_02(default)
>>
>> I thought that the module called "java" was the Sun Java. Now I am utterly perplexed.
>>
>> Moreover, $JAVA_HOME doesn't seem to be affected by which java is loaded, which seems dubious to me.
>>
>> I am going to force a new definition of JAVA_HOME *within* the ant module when I load sun-java
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel


From tim.g.armstrong at gmail.com Tue May 1 13:52:01 2012
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Tue, 1 May 2012 13:52:01 -0500
Subject: [Swift-devel] Streams
In-Reply-To: <906416717.1211.1335894532731.JavaMail.root@zimbra.anl.gov>
References: <906416717.1211.1335894532731.JavaMail.root@zimbra.anl.gov>
Message-ID:

My instinct is that if you wanted to do this in Swift, then the right way would be to model a stream as a variable in the same way as files. Swift would be responsible for hooking up applications to different streams... You can't do this in current Swift; I'm guessing it would need major extensions.

- Tim


On Tue, May 1, 2012 at 12:48 PM, Michael Wilde wrote:

> Hi Ketan,
>
> I'll try to reply to this and Ben's followup some time next week. I'm very interested in the streams paradigm.
>
> Note that we have at least one example of a Swift script running in an infinite loop: it's the Swift remote function call server in the SwiftR package (Swift for the R language). I think that uses an infinite iterate loop that exits when it gets an exit command from the FIFO it's reading. We talked about adding simple socket interfaces via builtin functions.
>
> The streams question is of strong interest for the future.
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Ketan Maheshwari"
> > To: "Swift Devel"
> > Sent: Monday, April 30, 2012 1:39:54 PM
> > Subject: [Swift-devel] Streams
> > Hi,
> >
> > I am working on a DOE powergrid related project here at Cornell.
> >
> > An aim of the project is to compute power grid state estimation and react in time-critical fashion.
> >
> > The application, at a very high level, is a distributed producer/consumer system where multiple producers produce data streams consumed by multiple consumers in a publish-subscribe model of data flow.
> >
> > The producers (phasor measurement units) produce streams continuously and consumers (State Estimators) can subscribe to the producers. There can be multiple consumers consuming from a single producer for performance and consistency purposes.
> >
> > Can Swift support this model of computation? In particular, I am wondering how to go about the following aspects with Swift:
> >
> > 1. Describe an application which could run in an 'infinite' loop.
> > 2. Mappers to streams. I think these streams should be some kind of named buffers. A memory-to-memory stream model is what I vaguely view this as.
> >
> > The streams are binary encoded ones in big-endian format and could be parsed (by consumers) as id'd tuples each containing 5-6 fixed-width fields of timestamp, voltage, current, delta etc. data.
> >
> > There are other requirements of the application and plenty of low-level nitty gritty, but I think Swift could handle all of 'em. I am just unsure of the above two at the moment.
> > We are in discussion with WSU collaborators to deliver some of the 'real' parts of the application. However, in the meantime we do have toy components to test and play with.
> >
> > Any input is greatly appreciated.
> >
> > Regards,
> > --
> > Ketan
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel


From hategan at mcs.anl.gov Tue May 1 13:57:37 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 May 2012 11:57:37 -0700
Subject: [Swift-devel] Streams
In-Reply-To:
References: <906416717.1211.1335894532731.JavaMail.root@zimbra.anl.gov>
Message-ID: <1335898657.6052.3.camel@blabla>

On Tue, 2012-05-01 at 13:52 -0500, Tim Armstrong wrote:
> My instinct is that if you wanted to do this in Swift, then the right way would be to model a stream as a variable in the same way as files. Swift would be responsible for hooking up applications to different streams... You can't do this in current Swift; I'm guessing it would need major extensions.

Not really. The arrays already work that way (as Ben points out). What is needed is an external notification mechanism (i.e. something to tell Swift when a new file comes in - in other words, that an array got a new element). That's easy conceptually.

The more difficult part is, and Ben also points that out, the "garbage collection" issue, i.e. the ability to process an infinite number of objects in a loop without running out of resources.


From wilde at mcs.anl.gov Tue May 1 14:00:41 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 May 2012 14:00:41 -0500 (CDT)
Subject: [Swift-devel] Streams
In-Reply-To:
Message-ID: <264423839.1326.1335898841890.JavaMail.root@zimbra.anl.gov>

Maybe we could come close with an (~infinite) server loop, and an ext mapper invocation in that loop that pulled data from some IPC interface and created files for Swift to process on each trip around the loop. That's essentially what the SwiftR service does, using FIFOs to do it all with no code changes to Swift itself (for the moment, just to play with the mechanism and paradigm).

I think you could experiment with this approach as a first approximation, Ketan. Getting and adapting the SwiftR code *might* be a start. In fact you could simulate the input stream by creating it or reading it from an R driver program (which can do socket calls).

- Mike

----- Original Message -----
> From: "Tim Armstrong"
> To: "Michael Wilde"
> Cc: "Ketan Maheshwari" , "Swift Devel"
> Sent: Tuesday, May 1, 2012 1:52:01 PM
> Subject: Re: [Swift-devel] Streams
> My instinct is that if you wanted to do this in Swift, then the right way would be to model a stream as a variable in the same way as files. Swift would be responsible for hooking up applications to different streams...
> You can't do this in current Swift; I'm guessing it would need major extensions.
>
> - Tim
>
>
> On Tue, May 1, 2012 at 12:48 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> Hi Ketan,
>
> I'll try to reply to this and Ben's followup some time next week. I'm very interested in the streams paradigm.
>
> Note that we have at least one example of a Swift script running in an infinite loop: it's the Swift remote function call server in the SwiftR package (Swift for the R language). I think that uses an infinite iterate loop that exits when it gets an exit command from the FIFO it's reading. We talked about adding simple socket interfaces via builtin functions.
>
> The streams question is of strong interest for the future.
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > To: "Swift Devel" < swift-devel at ci.uchicago.edu >
> > Sent: Monday, April 30, 2012 1:39:54 PM
> > Subject: [Swift-devel] Streams
> > Hi,
> >
> > I am working on a DOE powergrid related project here at Cornell.
> >
> > An aim of the project is to compute power grid state estimation and react in time-critical fashion.
> >
> > The application, at a very high level, is a distributed producer/consumer system where multiple producers produce data streams consumed by multiple consumers in a publish-subscribe model of data flow.
> >
> > The producers (phasor measurement units) produce streams continuously and consumers (State Estimators) can subscribe to the producers. There can be multiple consumers consuming from a single producer for performance and consistency purposes.
> >
> > Can Swift support this model of computation? In particular, I am wondering how to go about the following aspects with Swift:
> >
> > 1. Describe an application which could run in an 'infinite' loop.
> > 2. Mappers to streams. I think these streams should be some kind of named buffers. A memory-to-memory stream model is what I vaguely view this as.
> >
> > The streams are binary encoded ones in big-endian format and could be parsed (by consumers) as id'd tuples each containing 5-6 fixed-width fields of timestamp, voltage, current, delta etc. data.
> >
> > There are other requirements of the application and plenty of low-level nitty gritty, but I think Swift could handle all of 'em. I am just unsure of the above two at the moment.
> >
> > We are in discussion with WSU collaborators to deliver some of the 'real' parts of the application. However, in the meantime we do have toy components to test and play with.
> >
> > Any input is greatly appreciated.
> > Regards,
> > --
> > Ketan
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From benc at hawaga.org.uk Wed May 2 12:54:58 2012
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 2 May 2012 19:54:58 +0200
Subject: [Swift-devel] Streams
In-Reply-To: <264423839.1326.1335898841890.JavaMail.root@zimbra.anl.gov>
References: <264423839.1326.1335898841890.JavaMail.root@zimbra.anl.gov>
Message-ID: <53C6666E-C3A8-41B3-8924-CD9BFE929A7C@hawaga.org.uk>

<kinda pie in the sky>

Relevant (at a language design level) to the problem of arrays getting almost-infinitely big and how to garbage collect.

When you write:

    output = map f input

then f is applied element at a time.

The code in f doesn't do the indexing into 'input' or 'output' itself - instead the language handles that inside 'map'.

What makes that interesting in this case is that you can change input/output to be something that isn't an indexable array - a "stream", for example.

So the function f can only ever deal with the 'current position' in the stream - it can't go off trying to access either the future or the past.

Which means, then, that the runtime system can assume more about how data will be used, and what it can forget about - once "now" has been processed completely, then the runtime can forget about "now".

The application code can't go back and refer to the past. But that's sort of the point...

There are other things on top of this to allow bounded access to the past so that you can do things like compare to the previous timestep, but they are bounded so that the runtime can still make useful inferences about when data is no longer needed.

--


From ketancmaheshwari at gmail.com Wed May 2 13:38:29 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 2 May 2012 14:38:29 -0400
Subject: [Swift-devel] Streams
In-Reply-To: <53C6666E-C3A8-41B3-8924-CD9BFE929A7C@hawaga.org.uk>
References: <264423839.1326.1335898841890.JavaMail.root@zimbra.anl.gov> <53C6666E-C3A8-41B3-8924-CD9BFE929A7C@hawaga.org.uk>
Message-ID:

Here is how I see this from Swift's point of view (may be impractical in terms of implementation):

Swift provides a type called "stream" with the following properties:

1. Fixed size: solves the garbage collection and resource over-allocation issues
2. Always open: fixes the contradiction with the "future" variables policy
3. Lossy: things get overwritten as they arrive

which means that Swift allocates a *fixed* amount of memory to a named buffer mapped to a variable of type stream. Swift implements read and write primitives on streams: write overwrites on the stream, and read returns what is current in the stream. Swift does not hang when stream types are open.

Agreed, this would be a 'lossy' implementation, but it could suit many types of applications where each bit of input data is not important, just the ones that were captured on a best-effort basis.
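To make properties 1-3 concrete, here is a rough, untested Java sketch of what I mean (the record layout - a big-endian tuple of id, timestamp, voltage, current, delta - is my guess at what the PMU feeds look like, not a real spec):

    import java.nio.ByteBuffer;

    // Fixed-size, always-open, lossy stream buffer: writes overwrite the
    // oldest slot, reads return whatever is current, and nothing blocks.
    public class LossyStream {
        private final byte[][] slots;   // fixed allocation, never grows
        private int head = -1;          // index of the most recent record

        public LossyStream(int capacity) {
            slots = new byte[capacity][];
        }

        // write(): overwrite the oldest slot with the new record
        public synchronized void write(byte[] record) {
            head = (head + 1) % slots.length;
            slots[head] = record.clone();
        }

        // read(): return the current record, or null if none has arrived yet
        public synchronized byte[] read() {
            return head < 0 ? null : slots[head];
        }

        // Decode one fixed-width tuple (field layout guessed):
        // int id, long timestamp, float voltage, float current, float delta
        public static String decode(byte[] record) {
            ByteBuffer b = ByteBuffer.wrap(record); // ByteBuffer is big-endian by default
            return "id=" + b.getInt() + " ts=" + b.getLong()
                 + " v=" + b.getFloat() + " i=" + b.getFloat()
                 + " d=" + b.getFloat();
        }
    }

A stream variable in Swift would then just be a handle on a buffer like this, with the runtime doing the writes as data arrives and the mapper doing the reads.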
Meanwhile, as Mike suggested, it is a good idea to play with SwiftR, which I am about to do.

Regards,
Ketan

On Wed, May 2, 2012 at 1:54 PM, Ben Clifford wrote:

> <kinda pie in the sky>
>
> Relevant (at a language design level) to the problem of arrays getting almost-infinitely big and how to garbage collect.
>
> When you write:
>
>     output = map f input
>
> then f is applied element at a time.
>
> The code in f doesn't do the indexing into 'input' or 'output' itself - instead the language handles that inside 'map'.
>
> What makes that interesting in this case is that you can change input/output to be something that isn't an indexable array - a "stream", for example.
>
> So the function f can only ever deal with the 'current position' in the stream - it can't go off trying to access either the future or the past.
>
> Which means, then, that the runtime system can assume more about how data will be used, and what it can forget about - once "now" has been processed completely, then the runtime can forget about "now".
>
> The application code can't go back and refer to the past. But that's sort of the point...
>
> There are other things on top of this to allow bounded access to the past so that you can do things like compare to the previous timestep, but they are bounded so that the runtime can still make useful inferences about when data is no longer needed.
>
> --
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Ketan


From wilde at mcs.anl.gov Thu May 3 10:07:58 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 3 May 2012 10:07:58 -0500 (CDT)
Subject: [Swift-devel] Coaster work underway
In-Reply-To: <1495557330.4368.1336057285612.JavaMail.root@zimbra.anl.gov>
Message-ID: <675462943.4380.1336057678189.JavaMail.root@zimbra.anl.gov>

Mihael,

Thanks very much for your help on the Cray benchmarks and coaster speedup. The numbers were very impressive. We'll post the slides on the Swift and Beagle sites soon. A temporary link is:

http://www.ci.uchicago.edu/~wilde/Swift-CUG.2012.0503.v10.{pdf,pptx}

I think next steps would be:

- continue to work on bug 690 (provider staging issue)
- see if we can increase efficiency for larger numbers of smaller tasks on Crays
- support Borja's students (Jessica and Reed) in developing a Coaster client C API

For the efficiency benchmarks, it seems to me that Swift and the coaster service have some headroom in terms of task rates. It looks to me like the coaster worker is having a hard time keeping up and keeping the node saturated. I think we should re-run these tests with real CPU-burning app() calls, and see if we can confirm or refute this conjecture by measuring a single worker, e.g. at 32 cores. Unless you have other ideas on where the next bottleneck lies.

I think we can get fairly regular test runs on an 18K-core Cray (as batch jobs, which we can test on Raven and then pass to Cray for the benchmark runs). We should review the data we have, and then plan a set of proposed improvements and verifications, ideally measuring on some local repeatable test setup like the MCS servers or bridled/communicado, or multiple PADS nodes.

I think a good goal would be 95% efficiency for 60 sec tasks at 18K cores:
100%: 314 tasks/sec
 95%: 298 tasks/sec
 90%: 283 tasks/sec

Coasters seems to be able to do over 330 tasks/sec, so at least hypothetically this should be doable. And the 330/sec is with only 6 cores and an 8GB Java RAM limit. We can try to use a 32-core compute node to run Swift from, with up to 64GB RAM, so we should be able to get even more headroom on the Swift+CoasterService side.

Let's slate this for next week or later; till then, it would be good to resolve 690.

Regards,

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From benc at hawaga.org.uk Fri May 4 06:54:16 2012
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 4 May 2012 13:54:16 +0200
Subject: [Swift-devel] Fwd: [provenance-challenge] W3C Provenance Working Group Drafts -- ready for review
References:
Message-ID:

The Open Provenance Model that Swift helped make merged/turned into W3C provenance, which is still active...

Begin forwarded message:

> From: Paul Groth
> Date: May 4, 2012 11:48:05 AM GMT+02:00
> To: public-lod at w3.org, semantic-web at w3.org, provenance-challenge at ipaw.info
> Subject: [provenance-challenge] W3C Provenance Working Group Drafts -- ready for review
> Reply-To: provenance-challenge at ipaw.info
>
> Hello,
>
> The Provenance Working Group is happy to announce that 5 working drafts are available for review, including a primer, ontology and data model. These drafts define a model for interchanging provenance on the Web. We are looking for your input.
>
> To get into the specs, we've prepared some introductory blog posts. Please have a look.
>
> PROV: synchronized and ready for your input
> - http://www.w3.org/blog/SW/2012/05/03/prov-synchronized-and-ready-for-your-input/
>
> The PROV ontology - an update
> - http://www.w3.org/blog/SW/2012/05/04/the-prov-ontology-an-update/
>
> What is new in the Fourth Working Draft of the PROV provenance model?
> - http://www.w3.org/blog/SW/2012/05/03/what-is-new-in-the-fourth-working-draft-of-the-prov-provenance-model/
>
> Again, the group is looking for your feedback and is looking to finalize the interchange model soon.
>
> Thanks,
> Paul
> co-chair Provenance Working Group
>
> --
> Dr. Paul Groth (p.t.groth at vu.nl)
> http://www.few.vu.nl/~pgroth/
> Assistant Professor
> Knowledge Representation & Reasoning Group
> Artificial Intelligence Section
> Department of Computer Science
> VU University Amsterdam


From wilde at mcs.anl.gov Thu May 10 14:16:09 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 May 2012 14:16:09 -0500 (CDT)
Subject: [Swift-devel] Beagle needs from Swift
Message-ID: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>

Hi All,

Lorenzo and I met this morning to review Beagle needs from Swift. Here's a summary:

1. Lorenzo is organizing a Swift tutorial for Beagle users. Suggested dates are the last week of May. I prefer Wed-Fri that week, May 30-June 1; Fri June 1 would give the most time to prepare. David and Justin, can you attend (with me and Lorenzo) to help users hands-on? Proposed time is 2 hours; short talk and then hands-on.

2. Jon, Lorenzo is interested in a portable swiftrun command. Can you check in your start on that, ideally with some asciidoc user guide info?

3. Lorenzo is having trouble running Java bio apps (picard) on Beagle compute nodes. Memory crashes, runtime discrepancies, etc.
Mihael, all: I suggested Lorenzo send the issues to swift-devel to tap your Java expertise, especially on the memory and GC options.

4. One Swift script Lorenzo is constructing may need to control jobsPerNode on a per-app basis. Justin, I forgot: can your latest mod for per-app attributes handle that?

Lorenzo, did I miss anything?

We should revert discussion of items 2, 3, 4 to the swift-devel list, which I will add Lorenzo to.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From wilde at mcs.anl.gov Thu May 10 14:24:58 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 May 2012 14:24:58 -0500 (CDT)
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
Message-ID: <120389994.15381.1336677898989.JavaMail.root@zimbra.anl.gov>

I added Lorenzo to swift-devel.

- Mike

----- Original Message -----
> From: "Michael Wilde"
> To: "Jon Monette" , "Justin M Wozniak" , "David Kelly" , "Mihael Hategan" , "Lorenzo Pesce"
> Cc: "Swift Devel"
> Sent: Thursday, May 10, 2012 2:16:09 PM
> Subject: [Swift-devel] Beagle needs from Swift
> Hi All,
>
> Lorenzo and I met this morning to review Beagle needs from Swift. Here's a summary:
>
> 1. Lorenzo is organizing a Swift tutorial for Beagle users. Suggested dates are the last week of May. I prefer Wed-Fri that week, May 30-June 1; Fri June 1 would give the most time to prepare. David and Justin, can you attend (with me and Lorenzo) to help users hands-on? Proposed time is 2 hours; short talk and then hands-on.
>
> 2. Jon, Lorenzo is interested in a portable swiftrun command. Can you check in your start on that, ideally with some asciidoc user guide info?
>
> 3. Lorenzo is having trouble running Java bio apps (picard) on Beagle compute nodes. Memory crashes, runtime discrepancies, etc. Mihael, all: I suggested Lorenzo send the issues to swift-devel to tap your Java expertise, especially on the memory and GC options.
>
> 4. One Swift script Lorenzo is constructing may need to control jobsPerNode on a per-app basis. Justin, I forgot: can your latest mod for per-app attributes handle that?
>
> Lorenzo, did I miss anything?
>
> We should revert discussion of items 2, 3, 4 to the swift-devel list, which I will add Lorenzo to.
>
> - Mike
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From wozniak at mcs.anl.gov Thu May 10 14:35:26 2012
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 10 May 2012 14:35:26 -0500
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
References: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
Message-ID: <4FAC187E.7080502@mcs.anl.gov>

On 05/10/2012 02:16 PM, Michael Wilde wrote:
> Hi All,
>
> Lorenzo and I met this morning to review Beagle needs from Swift. Here's a summary:
>
> 1. Lorenzo is organizing a Swift tutorial for Beagle users. Suggested dates are the last week of May. I prefer Wed-Fri that week, May 30-June 1; Fri June 1 would give the most time to prepare.
> David and Justin, can you attend (with me and Lorenzo) to help users hands-on? Proposed time is 2 hours; short talk and then hands-on.

Sure.

> 4. One Swift script Lorenzo is constructing may need to control jobsPerNode on a per-app basis. Justin, I forgot: can your latest mod for per-app attributes handle that?

Not currently. That is not a simple per-job setting. To get that would take a more significant change to the Cpu allocation scheme. I would like to do that at some point.

--
Justin M Wozniak


From wilde at mcs.anl.gov Thu May 10 14:45:20 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 May 2012 14:45:20 -0500 (CDT)
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <4FAC187E.7080502@mcs.anl.gov>
Message-ID: <23762832.15426.1336679120869.JavaMail.root@zimbra.anl.gov>

> > 4. One Swift script Lorenzo is constructing may need to control jobsPerNode on a per-app basis. Justin, I forgot: can your latest mod for per-app attributes handle that?
>
> Not currently. That is not a simple per-job setting. To get that would take a more significant change to the Cpu allocation scheme. I would like to do that at some point.

So for now, Lorenzo, the way to do this is to use a separate site (sites.xml entry) for each different value of jobsPerNode that you require. I assume most will be 24, and only the large-memory apps will need a lower number (2, 4, 8, etc). Then in the tc.data file, list those exception applications as being associated with only the right pool entry.

- Mike


From davidk at ci.uchicago.edu Thu May 10 14:49:54 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 10 May 2012 14:49:54 -0500 (CDT)
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
Message-ID: <1076706016.43271.1336679394668.JavaMail.root@zimbra-mb2.anl.gov>

----- Original Message -----
> David and Justin, can you attend (with me and Lorenzo) to help users hands-on? Proposed time is 2 hours; short talk and then hands-on.

Sure, I'd be happy to help.

> 2. Jon, Lorenzo is interested in a portable swiftrun command. Can you check in your start on that, ideally with some asciidoc user guide info?

Yep, I'd like to start working on that again. We can probably start with one of the shell scripts we have for DSSAT/PsiColSim/SWAT.


From jonmon at mcs.anl.gov Thu May 10 14:53:47 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Thu, 10 May 2012 14:53:47 -0500
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <1076706016.43271.1336679394668.JavaMail.root@zimbra-mb2.anl.gov>
References: <1076706016.43271.1336679394668.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <505E748E-DE04-4F73-8E59-95FD354344F5@mcs.anl.gov>

On May 10, 2012, at 14:49, David Kelly wrote:

> ----- Original Message -----
>> David and Justin, can you attend (with me and Lorenzo) to help users hands-on? Proposed time is 2 hours; short talk and then hands-on.
>
> Sure, I'd be happy to help.
>
>> 2. Jon, Lorenzo is interested in a portable swiftrun command. Can you check in your start on that, ideally with some asciidoc user guide info?
>
> Yep, I'd like to start working on that again. We can probably start with one of the shell scripts we have for DSSAT/PsiColSim/SWAT.

So that is the one I have. It is based off of SciColSim. I used the same template to make the DSSAT run script and sleepn run script for the benchmarks. I will check in what I have and start on the Asciidoc.
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel


From hategan at mcs.anl.gov Thu May 10 20:23:33 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 10 May 2012 18:23:33 -0700
Subject: [Swift-devel] Beagle needs from Swift
In-Reply-To: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
References: <188082281.15367.1336677369889.JavaMail.root@zimbra.anl.gov>
Message-ID: <1336699413.8831.1.camel@blabla>

On Thu, 2012-05-10 at 14:16 -0500, Michael Wilde wrote:
> 3. Lorenzo is having trouble running Java bio apps (picard) on Beagle compute nodes. Memory crashes, runtime discrepancies, etc. Mihael, all: I suggested Lorenzo send the issues to swift-devel to tap your Java expertise, especially on the memory and GC options.

That seems like a good idea.

> 4. One Swift script Lorenzo is constructing may need to control jobsPerNode on a per-app basis. Justin, I forgot: can your latest mod for per-app attributes handle that?

Was that the issue about scheduling based on maxmem requirements?


From ketancmaheshwari at gmail.com Sun May 13 09:27:15 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sun, 13 May 2012 10:27:15 -0400
Subject: [Swift-devel] Comparing Swift and Hadoop
Message-ID:

Hi,

We are working on a project from GE Energy corporation which runs independent Monte Carlo simulations in order to estimate device reliability, leading to power-grid-wide device replacement decisions. The computation is repeated MC simulations done in parallel.

Currently, this is running under a Hadoop setup on Cornell Redcloud and EC2 (10 nodes). Looking at the computation, it struck me this is a good Swift candidate. And since the performance numbers etc. are already extracted for Hadoop, it might also be nice to have a comparison between Swift and Hadoop.

However, some reality check before diving in: has it been done before? Do we know how Swift fares against map-reduce? Are they even comparable? I have faced this question twice here: Why use Swift when you have Hadoop?

I could see Hadoop needs quite a bit of setup effort before getting it to run. Could we quantify usability and compare the two?

Any ideas and inputs are welcome.

Regards,
--
Ketan


From tim.g.armstrong at gmail.com Sun May 13 13:09:48 2012
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Sun, 13 May 2012 13:09:48 -0500
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

I've worked on both Swift and Hadoop implementations and my tendency is to say that there isn't actually any deep similarity beyond them both supporting distributed data processing/computation. They both make fundamentally different assumptions about the clusters they run on and the applications they're supporting.

Swift is mainly designed for time-shared clusters with reliable shared file systems. Hadoop assumes that it will be running on unreliable commodity machines with no shared file system, and will be running continuously on all machines on the cluster. Swift is designed for orchestrating existing executables with their own file formats, so mostly remains agnostic to the contents of the files it is processing.
Hadoop needs to have some understanding of the contents of the files it is processing, to be able to segment them into records and perform key comparisons so it can do a distributed sort, etc. It provides its own file formats (including compression, serialization, etc) that users can use, although it is extensible to custom file formats.

- Hadoop implements its own distributed file-system with software redundancy; Swift uses an existing cluster filesystem or node-local file systems. For bulk data processing, this means Hadoop will generally be able to deliver more disk bandwidth, and has a bunch of other implications.
- Hadoop has a record-oriented view of the world, i.e. it is built around the idea that you are processing a record at a time, rather than a file at a time as in Swift.
- As a result, Hadoop includes a bunch of functionality to do with file formats, compression, serialization etc: Swift is B.Y.O. file format.
- Hadoop's distributed sort is a core part of MapReduce (and something that a lot of effort has gone into implementing and optimizing); Swift doesn't have built-in support for anything similar.
- Swift lets you construct arbitrary dataflow graphs between tasks, so in some ways is less restrictive than the map-reduce pattern (although it doesn't directly support some things that the map-reduce pattern does, so I wouldn't say that it is strictly more general).

I'd say that some applications might fit in both paradigms, but that neither supports a superset of the applications that the other supports. Performance would depend to a large extent on the application. Swift might actually be quicker to start up a job and dispatch tasks (Hadoop is notoriously slow on that front), but otherwise I'd say it just depends on the application, how you implement the application, the cluster, etc. I'm not sure that there is a fair comparison between the two systems since they're just very different: most of the results would be predictable just by looking at the design of the system (e.g. if the application needs to do a big distributed sort, Hadoop is much better). If the application is embarrassingly parallel (like it sounds like your application is), then you could probably implement it in either, but I'm not sure that it would actually stress the differences between the systems if data sizes are small and runtime is mostly dominated by computation.
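For what it's worth, the embarrassingly parallel Monte Carlo case maps onto Hadoop as a map-only job over seeds (set the number of reducers to zero). A rough sketch - class names and the placeholder simulation are mine, not from the GE code:

    import java.io.IOException;
    import java.util.Random;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each input line holds a random seed; the mapper runs one independent
    // simulation per seed and emits (seed, result). With zero reducers the
    // outputs are written straight to HDFS.
    public class MonteCarloMapper
            extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            long seed = Long.parseLong(line.toString().trim());
            double result = simulate(new Random(seed));
            context.write(new LongWritable(seed), new DoubleWritable(result));
        }

        // Placeholder for the real device-reliability model.
        private double simulate(Random rng) {
            double acc = 0.0;
            int trials = 100000;
            for (int i = 0; i < trials; i++) {
                acc += rng.nextDouble();
            }
            return acc / trials;
        }
    }

The equivalent Swift script would be a foreach over the same seed list invoking the simulation as an app - which is part of why I don't think this particular application would stress the differences much.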
> > I could see Hadoop needs quite a bit of setup effort before getting it to > run. Could we quantify usability and compare the two? > > Any ideas and inputs are welcome. > > Regards, > -- > Ketan > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.iit.edu Sun May 13 14:19:27 2012 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Sun, 13 May 2012 14:19:27 -0500 Subject: [Swift-devel] Comparing Swift and Hadoop In-Reply-To: References: Message-ID: <86B8F55F-962C-42BC-B93F-981F426A852F@cs.iit.edu> Hi Ketan, We had Some preliminary work back in 2008 on this topic, we presented it as a poster. http://www.cs.iit.edu/~iraicu/research/publications/2008_TG08_swift+hadoop-poster.pdf Ioan -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 31st Street Chicago, IL 60616 ================================================================= Cel: 1-847-722-0876 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ ================================================================= ================================================================= On May 13, 2012, at 9:27 AM, Ketan Maheshwari wrote: > Hi, > > We are working on a project from GE Energy corporation which runs independent MonteCarlo simulations in order to find device reliability leading to a power grid wise device replacement decisions. The computation is repeated MC simulations done in parallel. > > Currently, this is running under Hadoop setup on Cornell Redcloud and EC2 (10 nodes). Looking at the computation, it struck me this is a good Swift candidate. And since the performance numbers etc are already extracted for Hadoop, it might also be nice to have a comparison between Swift and Hadoop. > > However, some reality check before diving in: has it been done before? Do we know how Swift fares against map-reduce? Are they even comparable? I have faced this question twice here: Why use Swift when you have Hadoop? > > I could see Hadoop needs quite a bit of setup effort before getting it to run. Could we quantify usability and compare the two? > > Any ideas and inputs are welcome. > > Regards, > -- > Ketan > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.iit.edu Sun May 13 15:57:54 2012 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Sun, 13 May 2012 15:57:54 -0500 Subject: [Swift-devel] Comparing Swift and Hadoop In-Reply-To: References: Message-ID: Hi Tim, I always thought of MapReduce being a subset of workflow systems. Can you give me an example of an application that can be implemented in MapReduce, but not a workflow system such as Swift? I can't think of any off the top of my head. Ioan -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 
10 W. 31st Street Chicago, IL 60616
=================================================================
Cel:   1-847-722-0876
Email: iraicu at cs.iit.edu
Web:   http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

On May 13, 2012, at 1:09 PM, Tim Armstrong wrote:

> I've worked on both Swift and Hadoop implementations and my tendency is to say that there isn't actually any deep similarity beyond them both supporting distributed data processing/computation. They both make fundamentally different assumptions about the clusters they run on and the applications they're supporting.
>
> Swift is mainly designed for time-shared clusters with reliable shared file systems. Hadoop assumes that it will be running on unreliable commodity machines with no shared file system, and will be running continuously on all machines on the cluster. Swift is designed for orchestrating existing executables with their own file formats, so mostly remains agnostic to the contents of the files it is processing. Hadoop needs to have some understanding of the contents of the files it is processing, to be able to segment them into records and perform key comparisons so it can do a distributed sort, etc. It provides its own file formats (including compression, serialization, etc) that users can use, although it is extensible to custom file formats.
>
> - Hadoop implements its own distributed file-system with software redundancy; Swift uses an existing cluster filesystem or node-local file systems. For bulk data processing, this means Hadoop will generally be able to deliver more disk bandwidth, and has a bunch of other implications.
> - Hadoop has a record-oriented view of the world, i.e. it is built around the idea that you are processing a record at a time, rather than a file at a time as in Swift.
> - As a result, Hadoop includes a bunch of functionality to do with file formats, compression, serialization etc: Swift is B.Y.O. file format.
> - Hadoop's distributed sort is a core part of MapReduce (and something that a lot of effort has gone into implementing and optimizing); Swift doesn't have built-in support for anything similar.
> - Swift lets you construct arbitrary dataflow graphs between tasks, so in some ways is less restrictive than the map-reduce pattern (although it doesn't directly support some things that the map-reduce pattern does, so I wouldn't say that it is strictly more general).
>
> I'd say that some applications might fit in both paradigms, but that neither supports a superset of the applications that the other supports. Performance would depend to a large extent on the application. Swift might actually be quicker to start up a job and dispatch tasks (Hadoop is notoriously slow on that front), but otherwise I'd say it just depends on the application, how you implement the application, the cluster, etc. I'm not sure that there is a fair comparison between the two systems since they're just very different: most of the results would be predictable just by looking at the design of the system (e.g. if the application needs to do a big distributed sort, Hadoop is much better). If the application is embarrassingly parallel (like it sounds like your application is), then you could probably implement it in either, but I'm not sure that it would actually stress the differences between the systems if data sizes are small and runtime is mostly dominated by computation.
> I think the Cloudera Hadoop distribution is well documented and reasonably easy to set up and run, provided that you're not on a time-shared cluster. Apache Hadoop is more of a pain to get working.
>
> - Tim
>
>
> On Sun, May 13, 2012 at 9:27 AM, Ketan Maheshwari wrote:
>> Hi,
>>
>> We are working on a project from GE Energy corporation which runs independent Monte Carlo simulations in order to estimate device reliability, leading to power-grid-wide device replacement decisions. The computation is repeated MC simulations done in parallel.
>>
>> Currently, this is running under a Hadoop setup on Cornell Redcloud and EC2 (10 nodes). Looking at the computation, it struck me this is a good Swift candidate. And since the performance numbers etc. are already extracted for Hadoop, it might also be nice to have a comparison between Swift and Hadoop.
>>
>> However, some reality check before diving in: has it been done before? Do we know how Swift fares against map-reduce? Are they even comparable? I have faced this question twice here: Why use Swift when you have Hadoop?
>>
>> I could see Hadoop needs quite a bit of setup effort before getting it to run. Could we quantify usability and compare the two?
>>
>> Any ideas and inputs are welcome.
>>
>> Regards,
>> --
>> Ketan
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel


From tim.g.armstrong at gmail.com Mon May 14 16:15:01 2012
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 14 May 2012 16:15:01 -0500
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

To be clear, I'm not making the case that it's *impossible* to implement things in Swift that are implemented in MapReduce, just that Swift isn't well suited to them, because it wasn't designed with them in mind. I've seen the argument before that MapReduce is a particular data flow DAG, and that you can express arbitrary data flow DAGs in other systems, but I think that somewhat misses the point of what MapReduce is trying to provide to application developers. By treating all tasks and data dependencies as equivalent, it ignores all of the runtime infrastructure that MapReduce inserts into the processes, and ignores, for example, some of the details of how data is moved between mappers and reducers.

For example, a substantial amount of code in the Hadoop MapReduce code base has to do with a) file formats, b) compression, c) checksums, d) serialization, e) buffering input and output data, and f) bucketing/sorting the data. This is all difficult to implement well and important for many big data applications. I think that scientific workflow systems don't take any of these things seriously, since they aren't important for most canonical scientific workflow applications.

I think one of the other big differences is that Hadoop assumes that all you have are a bunch of unreliable machines on a network, so that it must provide its own job scheduler and replicated distributed file system. Swift, in contrast, seems mostly designed for systems where there is a reliable shared file system, and where it acquires compute resources for fixed blocks of time from some existing cluster manager.
I know there are ways you can have Swift/Coaster/Falkon run on networks of unreliable machines, but it's not quite like Hadoop's job scheduler, which is designed to actually be the primary submission mechanism for a multi-user cluster.

I don't think it would make much sense to run Swift on a network of unreliable machines and then to just leave your data on those machines (you would normally stage the final data to some backed-up file system), but it would make perfect sense for Hadoop, especially if the data is so big that it's difficult to find someplace else to put it. In contrast, you can certainly stand up a Hadoop instance on a shared cluster for a few hours to run your jobs, and stage data in and out of HDFS, but that use case isn't what Hadoop was designed or optimized for. Most of the core developers on Hadoop are working in environments where they have dedicated Hadoop clusters, where they can't afford much cluster downtime and where they need to reliably persist huge amounts of data for years on unreliable hardware. E.g. at the extreme end, this is the kind of thing Hadoop developers are thinking about:
https://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920

- Tim


On Sun, May 13, 2012 at 3:57 PM, Ioan Raicu wrote:

> Hi Tim,
> I always thought of MapReduce being a subset of workflow systems. Can you give me an example of an application that can be implemented in MapReduce, but not a workflow system such as Swift? I can't think of any off the top of my head.
>
> Ioan
>
> --
> =================================================================
> Ioan Raicu, Ph.D.
> Assistant Professor
> =================================================================
> Computer Science Department
> Illinois Institute of Technology
> 10 W. 31st Street Chicago, IL 60616
> =================================================================
> Cel:   1-847-722-0876
> Email: iraicu at cs.iit.edu
> Web:   http://www.cs.iit.edu/~iraicu/
> =================================================================
> =================================================================
>
>
> On May 13, 2012, at 1:09 PM, Tim Armstrong wrote:
>
> I've worked on both Swift and Hadoop implementations and my tendency is to say that there isn't actually any deep similarity beyond them both supporting distributed data processing/computation. They both make fundamentally different assumptions about the clusters they run on and the applications they're supporting.
>
> Swift is mainly designed for time-shared clusters with reliable shared file systems. Hadoop assumes that it will be running on unreliable commodity machines with no shared file system, and will be running continuously on all machines on the cluster. Swift is designed for orchestrating existing executables with their own file formats, so mostly remains agnostic to the contents of the files it is processing. Hadoop needs to have some understanding of the contents of the files it is processing, to be able to segment them into records and perform key comparisons so it can do a distributed sort, etc. It provides its own file formats (including compression, serialization, etc) that users can use, although it is extensible to custom file formats.
>
> - Hadoop implements its own distributed file-system with software redundancy; Swift uses an existing cluster filesystem or node-local file systems.
> For bulk data processing, this means Hadoop will generally be able to deliver more disk bandwidth, and has a bunch of other implications.
> - Hadoop has a record-oriented view of the world, i.e. it is built around the idea that you are processing a record at a time, rather than a file at a time as in Swift.
> - As a result, Hadoop includes a bunch of functionality to do with file formats, compression, serialization etc: Swift is B.Y.O. file format.
> - Hadoop's distributed sort is a core part of MapReduce (and something that a lot of effort has gone into implementing and optimizing); Swift doesn't have built-in support for anything similar.
> - Swift lets you construct arbitrary dataflow graphs between tasks, so in some ways is less restrictive than the map-reduce pattern (although it doesn't directly support some things that the map-reduce pattern does, so I wouldn't say that it is strictly more general).
>
> I'd say that some applications might fit in both paradigms, but that neither supports a superset of the applications that the other supports. Performance would depend to a large extent on the application. Swift might actually be quicker to start up a job and dispatch tasks (Hadoop is notoriously slow on that front), but otherwise I'd say it just depends on the application, how you implement the application, the cluster, etc. I'm not sure that there is a fair comparison between the two systems since they're just very different: most of the results would be predictable just by looking at the design of the system (e.g. if the application needs to do a big distributed sort, Hadoop is much better). If the application is embarrassingly parallel (like it sounds like your application is), then you could probably implement it in either, but I'm not sure that it would actually stress the differences between the systems if data sizes are small and runtime is mostly dominated by computation.
>
> I think the Cloudera Hadoop distribution is well documented and reasonably easy to set up and run, provided that you're not on a time-shared cluster. Apache Hadoop is more of a pain to get working.
>
> - Tim
>
>
> On Sun, May 13, 2012 at 9:27 AM, Ketan Maheshwari wrote:
>> Hi,
>>
>> We are working on a project from GE Energy corporation which runs independent Monte Carlo simulations in order to estimate device reliability, leading to power-grid-wide device replacement decisions. The computation is repeated MC simulations done in parallel.
>>
>> Currently, this is running under a Hadoop setup on Cornell Redcloud and EC2 (10 nodes). Looking at the computation, it struck me this is a good Swift candidate. And since the performance numbers etc. are already extracted for Hadoop, it might also be nice to have a comparison between Swift and Hadoop.
>>
>> However, some reality check before diving in: has it been done before? Do we know how Swift fares against map-reduce? Are they even comparable? I have faced this question twice here: Why use Swift when you have Hadoop?
>>
>> I could see Hadoop needs quite a bit of setup effort before getting it to run. Could we quantify usability and compare the two?
>>
>> Any ideas and inputs are welcome.
>>
>> Regards,
>> --
>> Ketan

From aespinosa at cs.uchicago.edu Mon May 14 17:15:42 2012
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 14 May 2012 17:15:42 -0500
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

Also, the point of workflow systems was to make it easier to parallelize
loosely coupled apps. If one is thinking of investing in tight coupling
(be it MPI or Hadoop), then it can be assumed that you have specific
optimizations in mind for your app.

Allan

2012/5/14 Tim Armstrong :
> To be clear, I'm not making the case that it's impossible to implement
> things in Swift that are implemented in MapReduce, just that Swift isn't
> well suited to them, because it wasn't designed with them in mind. [...]

From tim.g.armstrong at gmail.com Mon May 14 17:49:22 2012
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 14 May 2012 17:49:22 -0500
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

I don't quite see how Hadoop is tightly coupled - it's data parallel and
there isn't any message passing...

On Mon, May 14, 2012 at 5:15 PM, Allan Espinosa wrote:
> Also, the point of workflow systems was to make it easier to parallelize
> loosely coupled apps. If one is thinking of investing in tight coupling
> (be it MPI or Hadoop), then it can be assumed that you have specific
> optimizations in mind for your app.
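Ketan's GE application above is exactly the embarrassingly parallel shape
Tim describes. For readers who want to see what that looks like in Swift,
here is a minimal sketch; the simulator name (mc_sim), its flags, the
mappers, and the iteration count are all assumptions, not the project's
actual code:

    type file;

    // One independent Monte Carlo run: reads a shared config, writes one
    // result file. Nothing is shared between runs.
    app (file out) mc_run (file config, int seed) {
        mc_sim "-c" @config "-s" seed "-o" @out;
    }

    file config <"mc.cfg">;
    file results[] <simple_mapper; prefix="result.", suffix=".dat">;

    // The iterations are independent, so Swift dispatches them all in
    // parallel, subject to available workers.
    foreach i in [0:999] {
        results[i] = mc_run(config, i);
    }

Per Tim's point above, whether this beats the Hadoop version would mostly
reflect the simulation's own runtime rather than either framework.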
From foster at anl.gov Mon May 14 17:50:57 2012
From: foster at anl.gov (Ian Foster)
Date: Mon, 14 May 2012 17:50:57 -0500
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID: <2DCCA0B5-D26F-4D6A-BD35-47AA44A44913@anl.gov>

I think that Allan means it in the sense that you can't directly compose
two executables that communicate only via files, as you can do with Swift.

On May 14, 2012, at 5:49 PM, Tim Armstrong wrote:

> I don't quite see how Hadoop is tightly coupled - it's data parallel and
> there isn't any message passing...
>
> On Mon, May 14, 2012 at 5:15 PM, Allan Espinosa wrote:
> > Also, the point of workflow systems was to make it easier to
> > parallelize loosely coupled apps. [...]
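Foster's point - composing stand-alone executables that communicate only
through files - is the core of Swift's app model. A minimal sketch of such
a two-stage composition (the executable and file names here are invented
for illustration):

    type file;

    // Two otherwise unrelated executables, coupled only by the file
    // passed between them.
    app (file inter) stage_one (file raw) {
        stage_one @raw @inter;
    }

    app (file result) stage_two (file inter) {
        stage_two @inter @result;
    }

    file raw <"input.dat">;
    file inter <"intermediate.dat">;
    file result <"output.dat">;

    inter = stage_one(raw);
    result = stage_two(inter);  // runs only once intermediate.dat exists

In Hadoop, by contrast, the units of composition are mapper and reducer
classes within a job, so chaining two arbitrary binaries like this needs
extra glue (e.g. Hadoop Streaming).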
From ketancmaheshwari at gmail.com Mon May 14 19:54:03 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 14 May 2012 20:54:03 -0400
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

Tim,

From your description and my limited experience (~3 weeks) with Hadoop, I
want to say that the differences between Hadoop and Swift are really
"soft" ones. I have a feeling that since MapReduce happened to be used for
internet-scale/style reliability, Hadoop developers built the tools you
described (compression, checksums, serialization, etc.) around it.

I want to think of Swift as, in a sense, a superset of Hadoop, or Hadoop+,
in that it essentially provides the same or similar functionality as one
would expect out of Hadoop, with the added advantage of being able to
express the computation as chained stages.

I do not really think the argument of running either on reliable or
unreliable systems holds, since Swift could easily be adapted to
unreliable systems by building functionality (e.g. data replication)
around it.

In another sense, I want to think of Hadoop and Swift as tools solving the
same class of problems, with a huge overlap between their functionalities
and only the extra 'muscles' making them different.

From a user's point of view, I still think Hadoop is difficult to set up
and work with on medium-sized applications (tens to hundreds of tasks).
In terms of application performance, I want to think it depends on how
well one tunes Hadoop and/or Swift for the application and infrastructure
at hand. This particular thing I am in the process of doing, and I will
soon come up with some concrete numbers.

Regards,
Ketan

On Mon, May 14, 2012 at 5:15 PM, Tim Armstrong wrote:

> To be clear, I'm not making the case that it's *impossible* to implement
> things in Swift that are implemented in MapReduce, just that Swift isn't
> well suited to them, because it wasn't designed with them in mind. [...]
From iraicu at cs.iit.edu Sat May 19 08:47:38 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sat, 19 May 2012 08:47:38 -0500
Subject: [Swift-devel] Call for Participation: ACM HPDC 2012 -- Early registration deadline May 25th
Message-ID: <4FB7A47A.7020109@cs.iit.edu>

Call for Participation
http://www.hpdc.org/2012/

The organizing committee is delighted to invite you to *HPDC'12*, the
/21st International ACM Symposium on High-Performance Parallel and
Distributed Computing/, to be held in *Delft, the Netherlands*, a
historic, picturesque city that is less than one hour away from
Amsterdam-Schiphol airport. HPDC is the premier annual conference on the
design, the implementation, the evaluation, and the use of parallel and
distributed systems for high-end computing. HPDC is sponsored by SIGARCH,
the Special Interest Group on Computer Architecture of the Association
for Computing Machinery.
*HPDC'12* will be held at Delft University of Technology, with the main
conference taking place on *June 20-22* (Wednesday to Friday 1 PM), and
with affiliated workshops on *June 18-19* (Monday and Tuesday).

Early registration closes on May 25th, so if you plan on attending, please
register now at http://www.hpdc.org/2012/registration/.

*Some highlights of the conference:*

* *Awards:*
  o Achievement Award - Ian Foster of the University of Chicago and
    Argonne National Laboratory, USA

* *Keynote Speakers:*
  o Mihai Budiu of Microsoft Research, Mountain View, USA.
    Title: Putting "Big-data" to Good Use: Building Kinect
  o Ricardo Bianchini of Rutgers University, USA.
    Title: "Leveraging Renewable Energy in Data Centers: Present and Future"

* *Accepted Papers:*
  1. vSlicer: Latency-aware Virtual Machine Scheduling via
     Differentiated-frequency CPU Slicing, Cong Xu (Purdue University),
     Sahan Gamage (Purdue University), Pawan N. Rao (Purdue University),
     Ardalan Kangarlou (NetApp), Ramana Kompella (Purdue University),
     Dongyan Xu (Purdue University)
  2. Singleton: System-wide Page Deduplication in Virtual Environments,
     Prateek Sharma, Purushottam Kulkarni (IIT Bombay)
  3. Locality-aware Dynamic VM Reconfiguration on MapReduce Clouds,
     Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng
     (KAIST)
  4. Achieving Application-Centric Performance Targets via Consolidation
     on Multicores: Myth or Reality?, Lydia Y. Chen (IBM Research Zurich
     Lab), Danilo Ansaloni (University of Lugano), Evgenia Smirni
     (College of William and Mary), Akira Yokokawa (University of
     Lugano), Walter Binder (University of Lugano)
  5. Enabling Event Tracing at Leadership-Class Scale through I/O
     Forwarding Middleware, Thomas Ilsche (Technische Universität
     Dresden), Joseph Schuchart (Technische Universität Dresden), Jason
     Cope (Argonne National Laboratory), Dries Kimpe (Argonne National
     Laboratory), Terry Jones (Oak Ridge National Laboratory), Andreas
     Knüpfer (Technische Universität Dresden), Kamil Iskra (Argonne
     National Laboratory), Robert Ross (Argonne National Laboratory),
     Wolfgang E. Nagel (Technische Universität Dresden), Stephen Poole
     (Oak Ridge National Laboratory)
  6. ISOBAR Hybrid Compression-I/O Interleaving for Large-scale Parallel
     I/O Optimization, Eric R. Schendel (North Carolina State
     University), Saurabh V. Pendse (North Carolina State University),
     John Jenkins (North Carolina State University), David A. Boyuka
     (North Carolina State University), Zhenhuan Gong (North Carolina
     State University), Sriram Lakshminarasimhan (North Carolina State
     University), Qing Liu (Oak Ridge National Laboratory), Scott Klasky
     (Oak Ridge National Laboratory), Robert Ross (Argonne National
     Laboratory), Nagiza F. Samatova (North Carolina State University)
  7. QBox: Guaranteeing I/O Performance on Black Box Storage Systems,
     Dimitris Skourtis, Shinpei Kato, Scott Brandt (University of
     California, Santa Cruz)
  8. Towards Efficient Live Migration of I/O Intensive Workloads: A
     Transparent Storage Transfer Proposal, Bogdan Nicolae (INRIA),
     Franck Cappello (INRIA/UIUC)
  9. A Virtual Memory Based Runtime to Support Multi-tenancy in Clusters
     with GPUs, Michela Becchi (University of Missouri), Kittisak
     Sajjapongse (University of Missouri), Ian Graves (University of
     Missouri), Adam Procter (University of Missouri), Vignesh Ravi
     (Ohio State University), Srimat Chakradhar (NEC Laboratories America)
  10. Interference-driven Scheduling and Resource Management for
     GPU-based Heterogeneous Clusters, Rajat Phull, Cheng-Hong Li, Kunal
     Rao, Hari Cadambi, Srimat Chakradhar (NEC Laboratories America)
  11. Work Stealing and Persistence-based Load Balancers for Iterative
     Overdecomposed Applications, Jonathan Lifflander (UIUC), Sriram
     Krishnamoorthy (PNNL), Laxmikant V. Kale (UIUC)
  12. Highly Scalable Graph Search for the Graph500 Benchmark, Koji Ueno
     (Tokyo Institute of Technology/JST CREST), Toyotaro Suzumura (Tokyo
     Institute of Technology/IBM Research Tokyo/JST CREST)
  13. PonD: Dynamic Creation of HTC Pool on Demand Using a Decentralized
     Resource Discovery System, Kyungyong Lee (University of Florida),
     David Wolinsky (Yale University), Renato Figueiredo (University of
     Florida)
  14. SpeQuloS: A QoS Service for BoT Applications Using Best Effort
     Distributed Computing Infrastructures, Simon Delamare (INRIA),
     Gilles Fedak (INRIA), Derrick Kondo (INRIA), Oleg Lodygensky (IN2P3)
  15. Understanding the Effects and Implications of Compute Node Related
     Failures in Hadoop, Florin Dinu, T. S. Eugene Ng (Rice University)
  16. Optimizing MapReduce for GPUs with Effective Shared Memory Usage,
     Linchuan Chen, Gagan Agrawal (The Ohio State University)
  17. CAM: A Topology Aware Minimum Cost Flow Based Resource Manager for
     MapReduce Applications in the Cloud, Min Li (Virginia Tech), Dinesh
     Subhraveti (IBM Almaden Research Center), Ali Butt (Virginia Tech),
     Aleksandr Khasymski (Virginia Tech), Prasenjit Sarkar (IBM Almaden
     Research Center)
  18. Distributed Approximate Spectral Clustering for Large-Scale
     Datasets, Fei Gao (Simon Fraser University), Wael Abd-Almageed
     (University of Maryland)
  19. Exploring Cross-layer Power Management for PGAS Applications on the
     SCC Platform, Marc Gamell (Rutgers University), Ivan Rodero (Rutgers
     University), Manish Parashar (Rutgers University), Rajeev Muralidhar
     (Intel India)
  20. Dynamic Adaptive Virtual Core Mapping to Improve Power, Energy, and
     Performance in Multi-socket Multicores, Chang Bae (Northwestern
     University), Lei Xia (Northwestern University), Peter Dinda
     (Northwestern University), John Lange (University of Pittsburgh)
  21. VNET/P: Bridging the Cloud and High Performance Computing Through
     Fast Overlay Networking, Lei Xia (Northwestern University), Zheng
     Cui (University of New Mexico), John Lange (University of
     Pittsburgh), Yuan Tang (UESTC, China), Peter Dinda (Northwestern
     University), Patrick Bridges (University of New Mexico)
  22. Massively-Parallel Stream Processing under QoS Constraints with
     Nephele, Björn Lohrmann, Daniel Warneke, Odej Kao (Technische
     Universität Berlin)
  23. A Resiliency Model for High Performance Infrastructure Based on
     Logical Encapsulation, James Moore (The University of Southern
     California/EMC Corporation), Carl Kesselman (The University of
     Southern California)

* *Workshops:*
  o Astro-HPC: Workshop on High-Performance Computing for Astronomy,
    Ana Lucia Varbanescu, Rob van Nieuwpoort, and Simon Portegies Zwart
  o ECMLS: 3rd Int'l Emerging Computational Methods for the Life Sciences
    Workshop, Carole Goble, Judy Qiu, and Ian Foster
  o ScienceCloud: 3rd Workshop on Scientific Cloud Computing, Yogesh
    Simmhan, Gabriel Antoniu, and Carole Goble
  o DIDC: Fifth Int'l Workshop on Data-Intensive Distributed Computing,
    Tevfik Kosar and Douglas Thain
  o MapReduce: The Third Int'l Workshop on MapReduce and its
    Applications, Gilles Fedak and Geoffrey Fox
  o VTDC: 6th Int'l Workshop on Virtualization Technologies in
    Distributed Computing, Frédéric Desprez and Adrien Lèbre

For more information on the full program, see
http://www.hpdc.org/2012/program/conference-program/.

Looking forward to seeing you in Delft!

Regards,
Ioan Raicu

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu at cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
Web:    http://datasys.cs.iit.edu/
=================================================================
=================================================================
From hategan at mcs.anl.gov Sat May 19 17:27:36 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 19 May 2012 15:27:36 -0700
Subject: [Swift-devel] osg sites
Message-ID: <1337466456.12330.1.camel@blabla>

So I'm trying to get OSG stuff going. I ran gen_gridsites and got a
total of zero sites working. Here's the log:
http://www.ci.uchicago.edu/~hategan/osggs.log

Any clues?

Mihael

From hategan at mcs.anl.gov Sat May 19 17:54:01 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 19 May 2012 15:54:01 -0700
Subject: [Swift-devel] osg sites
In-Reply-To: <1337466456.12330.1.camel@blabla>
References: <1337466456.12330.1.camel@blabla>
Message-ID: <1337468041.12558.0.camel@blabla>

Never mind. It needed GLOBUS_HOSTNAME and TCP_PORT_RANGE.

On Sat, 2012-05-19 at 15:27 -0700, Mihael Hategan wrote:
> So I'm trying to get OSG stuff going. I ran gen_gridsites and got a
> total of zero sites working. [...]

From hategan at mcs.anl.gov Sun May 20 14:41:13 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 20 May 2012 12:41:13 -0700
Subject: [Swift-devel] osg sites
In-Reply-To: <1337468041.12558.0.camel@blabla>
References: <1337466456.12330.1.camel@blabla> <1337468041.12558.0.camel@blabla>
Message-ID: <1337542873.9255.2.camel@blabla>

I'm trying the swift-workers and related scripts in trunk. Are those
supposed to work?

For one, swift-workers has "input = worker.pl", which seems to indicate
that worker.pl is sent to the condor job through stdin, but in
run-worker.sh I see the "cat >worker.pl" line commented out.

Mihael
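For context on the mechanism Mihael is describing: in a Condor submit
file, "input = worker.pl" makes worker.pl the job's standard input, so the
receiving script has to read stdin to materialize the file on the worker
side. A sketch of what run-worker.sh would need to contain for that path
to work (assumed, not the actual script):

    #!/bin/sh
    # worker.pl arrives on stdin, courtesy of "input = worker.pl" in the
    # Condor submit file; write it to disk, then run it.
    cat > worker.pl
    exec perl worker.pl "$@"

Which is why the commented-out "cat >worker.pl" line Mihael found would
break this transfer path.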
From davidk at ci.uchicago.edu Sun May 20 15:41:10 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Sun, 20 May 2012 15:41:10 -0500 (CDT)
Subject: [Swift-devel] osg sites
In-Reply-To: <1337542873.9255.2.camel@blabla>
Message-ID: <1250687552.25972.1337546470311.JavaMail.root@zimbra-mb2.anl.gov>

I would recommend trying the GlideinWMS scripts rather than the GRAM
scripts. run-gwms-workers should work as far as I know (you may need to
chmod +x it). The setup I was using to test provider staging on OSG is at
/scratch3/davidk/coaster-stress-tests on engage-submit3.

David

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Sunday, May 20, 2012 2:41:13 PM
> Subject: Re: [Swift-devel] osg sites
> I'm trying the swift-workers and related scripts in trunk. Are those
> supposed to work? [...]

From hategan at mcs.anl.gov Sun May 20 18:27:07 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 20 May 2012 16:27:07 -0700
Subject: [Swift-devel] coaster service runtime info table
Message-ID: <1337556427.15409.1.camel@blabla>

I added an option (-stats) to the coaster service to show various runtime
info. It's, I think, more useful than seeing a bunch of nonsensical
logging messages.

Mihael

[attachment: Screenshot from 2012-05-20 16:24:48.png]

From jonmon at mcs.anl.gov Sun May 20 21:04:03 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Sun, 20 May 2012 21:04:03 -0500
Subject: Re: [Swift-devel] coaster service runtime info table
In-Reply-To: <1337556427.15409.1.camel@blabla>
References: <1337556427.15409.1.camel@blabla>
Message-ID: <851CF569-0F49-47E9-AFF3-795F965B9431@mcs.anl.gov>

Is this an option we specify when starting a coaster service? If so, does
that screen take over the terminal, so that we need to run swift in a
separate terminal, or does it replace the swift stdout/stderr output?

On May 20, 2012, at 18:27, Mihael Hategan wrote:
> I added an option (-stats) to the coaster service to show various
> runtime info. [...]

From hategan at mcs.anl.gov Sun May 20 21:23:17 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 20 May 2012 19:23:17 -0700
Subject: Re: [Swift-devel] coaster service runtime info table
In-Reply-To: <851CF569-0F49-47E9-AFF3-795F965B9431@mcs.anl.gov>
References: <1337556427.15409.1.camel@blabla> <851CF569-0F49-47E9-AFF3-795F965B9431@mcs.anl.gov>
Message-ID: <1337566997.19236.1.camel@blabla>

On Sun, 2012-05-20 at 21:04 -0500, Jonathan Monette wrote:
> Is this an option we specify when starting a coaster service?

Yes (-stats).

> If so, does that screen take over the terminal, so that we need to run
> swift in a separate terminal, or does it replace the swift stdout/stderr
> output?

Yes. So if you want the old behavior, just don't add that option to the
service command line. In other words, you don't have to do anything if
you don't want to change anything.

From hategan at mcs.anl.gov Sat May 26 18:23:32 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 26 May 2012 16:23:32 -0700
Subject: [Swift-devel] wrapper staging
Message-ID: <1338074612.29839.2.camel@blabla>

There has been some discussion in the past where we agreed that it might
be a good idea to try a staging mechanism in which the wrapper (or some
other entity on the worker node) does the staging. I added a skeleton for
that in trunk. The relevant files are vdl-int-wrapper-staging.k and
_swiftwrap.wrapperstaging.

There is a basic implementation that recognizes "file://" URLs and does a
cp for stage-ins and stage-outs.

Mihael
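Since _swiftwrap is a shell script, the wrapper-staging idea boils down to
the wrapper resolving each input/output URL before and after the app runs.
A rough sketch of the cp-based case described above (function and variable
names are illustrative; this is not the actual _swiftwrap.wrapperstaging
code, and it assumes local file:// URLs):

    # Stage one input into the job directory; only file:// is handled so far.
    stagein() {
        url="$1"; dest="$2"
        case "$url" in
            file://*) cp "${url#file://}" "$dest" ;;
            *) echo "unsupported scheme: $url" >&2; exit 1 ;;
        esac
    }

    # Stage one output back out; the same scheme check in reverse.
    stageout() {
        src="$1"; url="$2"
        case "$url" in
            file://*) cp "$src" "${url#file://}" ;;
            *) echo "unsupported scheme: $url" >&2; exit 1 ;;
        esac
    }

Presumably other schemes would slot in later as additional case branches.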
From iraicu at cs.iit.edu Sun May 27 18:58:05 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sun, 27 May 2012 18:58:05 -0500
Subject: [Swift-devel] CFP: 8th IEEE International Conference on eScience -- Chicago IL USA, October 8th-12 2012
Message-ID: <4FC2BF8D.4060400@cs.iit.edu>

CALL FOR PAPERS
8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools,
techniques and practices, often in communities and projects that span
disciplines, laboratories, organizations, and national boundaries. The
eScience 2012 conference is designed to bring together leading
international and interdisciplinary research communities, developers, and
users of eScience applications and enabling IT technologies. The
conference serves as a forum to present the results of the latest
applications research and product/tool developments and to highlight
related activities from around the world. Also, we are now entering the
second decade of eScience, and the 2012 conference gives an opportunity to
take stock of what has been achieved so far and to look forward to the
challenges and opportunities the next decade will bring.

A special emphasis of the 2012 conference is on advances in the
application of technology in a particular discipline. Accordingly,
significant advances in applications science and technology will be
considered as important as the development of new technologies themselves.
Further, we welcome contributions in educational activities under any of
these disciplines. As a result, the conference will be structured around
two e-Science tracks:

* eScience Algorithms and Applications
  o eScience application areas, including:
    - Physical sciences
    - Biomedical sciences
    - Social sciences and humanities
  o Data-oriented approaches and applications
  o Compute-oriented approaches and applications
  o Extreme scale approaches and applications
* Cyberinfrastructure to support eScience
  o Novel hardware
  o Novel uses of production infrastructure
  o Software and services
  o Tools

The conference proceedings will be published by the IEEE Computer Society
Press, USA, and will be made available online through the IEEE Digital
Library. Selected papers will be invited to submit extended versions to a
special issue of the Future Generation Computer Systems (FGCS) journal.
From hategan at mcs.anl.gov Sat May 26 18:23:32 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 26 May 2012 16:23:32 -0700
Subject: [Swift-devel] wrapper staging
Message-ID: <1338074612.29839.2.camel@blabla>

There has been some discussion in the past where we agreed that it might be a good idea to try a staging mechanism in which the wrapper (or some other entity on the worker node) does the staging. I added a skeleton for that in trunk. The relevant files are vdl-int-wrapper-staging.k and _swiftwrap.wrapperstaging.

There is a basic implementation that recognizes "file://" URLs and does a cp for stage-ins and outs.

Mihael
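To make the shape of that concrete, here is a rough shell sketch of the file:// case. The helper names and paths are invented; only the behavior (recognize "file://" URLs, cp for stage-ins and stage-outs) is from the message above. This is not the actual _swiftwrap.wrapperstaging code:

    #!/bin/bash
    # Hypothetical worker-side staging helpers, for illustration only.

    stage_in() {    # copy a file:// source into the job directory
        local url="$1" dest="$2"
        case "$url" in
            file://*) cp "${url#file://}" "$dest" ;;
            *) echo "unsupported scheme: $url" >&2; return 1 ;;
        esac
    }

    stage_out() {   # copy a job-directory file back to a file:// destination
        local src="$1" url="$2"
        case "$url" in
            file://*) cp "$src" "${url#file://}" ;;
            *) echo "unsupported scheme: $url" >&2; return 1 ;;
        esac
    }

    stage_in "file:///home/user/input.dat" input.dat     # before the app runs
    ./app input.dat output.dat                           # the wrapped application
    stage_out output.dat "file:///home/user/output.dat"  # after it exits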
From iraicu at cs.iit.edu Sun May 27 18:58:05 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sun, 27 May 2012 18:58:05 -0500
Subject: [Swift-devel] CFP: 8th IEEE International Conference on eScience -- Chicago IL USA, October 8th-12th 2012
Message-ID: <4FC2BF8D.4060400@cs.iit.edu>

CALL FOR PAPERS

8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools, techniques, and practices, often in communities and projects that span disciplines, laboratories, organizations, and national boundaries. The eScience 2012 conference is designed to bring together leading international and interdisciplinary research communities, developers, and users of eScience applications and enabling IT technologies. The conference serves as a forum to present the results of the latest applications research and product/tool developments and to highlight related activities from around the world. Also, we are now entering the second decade of eScience, and the 2012 conference gives an opportunity to take stock of what has been achieved so far and to look forward to the challenges and opportunities the next decade will bring.

A special emphasis of the 2012 conference is on advances in the application of technology in a particular discipline. Accordingly, significant advances in applications science and technology will be considered as important as the development of new technologies themselves. Further, we welcome contributions in educational activities under any of these disciplines. As a result, the conference will be structured around two e-Science tracks:

- eScience Algorithms and Applications
  - eScience application areas, including:
    - Physical sciences
    - Biomedical sciences
    - Social sciences and humanities
  - Data-oriented approaches and applications
  - Compute-oriented approaches and applications
  - Extreme scale approaches and applications
- Cyberinfrastructure to support eScience
  - Novel hardware
  - Novel uses of production infrastructure
  - Software and services
  - Tools

The conference proceedings will be published by the IEEE Computer Society Press, USA, and will be made available online through the IEEE Digital Library. Selected papers will be invited to submit extended versions to a special issue of the Future Generation Computer Systems (FGCS) journal.

SUBMISSION PROCESS

Authors are invited to submit papers with unpublished, original work of not more than 8 pages of double-column text using single-spaced 10-point size on 8.5 x 11 inch pages, as per IEEE 8.5 x 11 manuscript guidelines. (Up to 2 additional pages may be purchased for US$150/page.) Templates are available from http://www.ieee.org/conferences_events/conferences/publishing/templates.html. Authors should submit a PDF file that will print on a PostScript printer to https://www.easychair.org/conferences/?conf=escience2012 (Note that paper submitters must also submit an abstract in advance of the paper deadline. This should be done through the same site where papers are submitted.) It is a requirement that at least one author of each accepted paper attend the conference.

IMPORTANT DATES

Abstract submission (required): 4 July 2012
Paper submission: 11 July 2012
Paper author notification: 22 August 2012
Camera-ready papers due: 10 September 2012
Conference: 8-12 October 2012

CONFERENCE ORGANIZATION

General Chair
Ian Foster, University of Chicago & Argonne National Laboratory, USA

Program Co-Chairs
Daniel S. Katz, University of Chicago & Argonne National Laboratory, USA
Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland

Program Vice Co-Chairs
eScience Algorithms and Applications Track
David Abramson, Monash University, Australia
Gabrielle Allen, Louisiana State University, USA
Cyberinfrastructure to support eScience Track
Rosa M. Badia, Barcelona Supercomputing Center / CSIC, Spain
Geoffrey Fox, Indiana University, USA

Sponsorship Chair
Charlie Catlett, Argonne National Laboratory, USA

Conference Manager and Finance Chair
Julie Wulf-Knoerzer, University of Chicago & Argonne National Laboratory, USA

Publicity Chairs
Kento Aida, National Institute of Informatics, Japan
Ioan Raicu, Illinois Institute of Technology, USA
David Wallom, Oxford e-Research Centre, UK

Local Organizing Committee
Ninfa Mayorga, University of Chicago, USA
Evelyn Rayburn, University of Chicago, USA
Lynn Valentini, Argonne National Laboratory, USA

Program Committee

eScience Algorithms and Applications Track
Srinivas Aluru, Iowa State University, USA
Ashiq Anjum, University of Derby, UK
David A. Bader, Georgia Institute of Technology, USA
Jon Blower, University of Reading, UK
Paul Bonnington, Monash University, Australia
Simon Cox, University of Southampton, UK
David De Roure, Oxford e-Research Centre, UK
George Djorgovski, California Institute of Technology, USA
Anshu Dubey, University of Chicago & Argonne National Laboratory, USA
Yuri Estrin, Monash University, Australia
Dan Fay, Microsoft, USA
Jeremy Frey, University of Southampton, UK
Wolfgang Gentzsch, HPC Consultant, Germany
Lutz Gross, The University of Queensland, Australia
Sverker Holmgren, Uppsala University, Sweden
Bill Howe, University of Washington, USA
Marina Jirotka, University of Oxford, UK
Timoleon Kipouros, University of Cambridge, UK
Kerstin Kleese van Dam, Pacific Northwest National Laboratory, USA
Arun S. Konagurthu, Monash University, Australia
Peter Kunszt, SystemsX.ch, Switzerland
Alexey Lastovetsky, University College Dublin, Ireland
Andrew Lewis, Griffith University, Australia
Sergio Maffioletti, University of Zurich, Switzerland
Amitava Majumdar, San Diego Supercomputer Center, University of California at San Diego, USA
Rui Mao, Shenzhen University, China
Madhav V. Marathe, Virginia Tech, USA
Maryann Martone, University of California at San Diego, USA
Louis Moresi, Monash University, Australia
Riccardo Murri, University of Zurich, Switzerland
Silvia D. Olabarriaga, Academic Medical Center of the University of Amsterdam, Netherlands
Enrique S. Quintana-Ortí, Universidad Jaume I, Spain
Abani Patra, University at Buffalo, USA
Rob Pennington, NSF, USA
Andrew Perry, Monash University, Australia
Beth Plale, Indiana University, USA
Michael Resch, University of Stuttgart, Germany
Adrian Sandu, Virginia Tech, USA
Mark Savill, Cranfield University, UK
Erik Schnetter, Perimeter Institute for Theoretical Physics, Canada
Edward Seidel, Louisiana State University, USA
Suzanne M. Shontz, The Pennsylvania State University, USA
David Skinner, Lawrence Berkeley National Laboratory, USA
Alan Sussman, University of Maryland, USA
Alex Szalay, Johns Hopkins University, USA
Domenico Talia, ICAR-CNR & University of Calabria, Italy
Jian Tao, Louisiana State University, USA
David Wallom, Oxford e-Research Centre, UK
Shaowen Wang, University of Illinois at Urbana-Champaign, USA
Michael Wilde, Argonne National Laboratory & University of Chicago, USA
Nancy Wilkins-Diehr, San Diego Supercomputer Center, University of California at San Diego, USA
Wu Zhang, Shanghai University, China
Yunquan Zhang, Chinese Academy of Sciences, China
Cyberinfrastructure to support eScience Track
Deb Agarwal, Lawrence Berkeley National Laboratory, USA
Ilkay Altintas, San Diego Supercomputer Center, University of California at San Diego, USA
Henri Bal, Vrije Universiteit, Netherlands
Roger Barga, Microsoft, USA
Martin Berzins, University of Utah, USA
John Brooke, University of Manchester, UK
Thomas Fahringer, University of Innsbruck, Austria
Gilles Fedak, INRIA, France
José A. B. Fortes, University of Florida, USA
Yolanda Gil, ISI/USC, USA
Madhusudhan Govindaraju, SUNY Binghamton, USA
Thomas Hacker, Purdue University, USA
Ken Hawick, Massey University, New Zealand
Marty Humphrey, University of Virginia, USA
Hai Jin, Huazhong University of Science and Technology, China
Thilo Kielmann, Vrije Universiteit, Netherlands
Scott Klasky, Oak Ridge National Laboratory, USA
Isao Kojima, AIST, Japan
Tevfik Kosar, University at Buffalo, USA
Dieter Kranzlmueller, LMU & LRZ Munich, Germany
Erwin Laure, KTH, Sweden
Jysoo Lee, KISTI, Korea
Li Xiaoming, Peking University, China
Bertram Ludäscher, University of California, Davis, USA
Andrew Lumsdaine, Indiana University, USA
Tanu Malik, University of Chicago, USA
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Reagan Moore, University of North Carolina at Chapel Hill, USA
Shirley Moore, University of Kentucky, USA
Steven Newhouse, EGI, Netherlands
Dhabaleswar K. (DK) Panda, The Ohio State University, USA
Manish Parashar, Rutgers University, USA
Ron Perrott, University of Oxford, UK
Depei Qian, Beihang University, China
Judy Qiu, Indiana University, USA
Ioan Raicu, Illinois Institute of Technology, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
Omer Rana, Cardiff University, UK
Paul Roe, Queensland University of Technology, Australia
Bruno Schulze, LNCC, Brazil
Marc Snir, Argonne National Laboratory & University of Illinois at Urbana-Champaign, USA
Xian-He Sun, Illinois Institute of Technology, USA
Yoshio Tanaka, AIST, Japan
Michela Taufer, University of Delaware, USA
Kerry Taylor, CSIRO, Australia
Douglas Thain, University of Notre Dame, USA
Paul Watson, Newcastle University, UK
Jun Zhao, University of Oxford, UK

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From davidk at ci.uchicago.edu Mon May 28 09:18:35 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Mon, 28 May 2012 09:18:35 -0500 (CDT)
Subject: [Swift-devel] wrapper staging
In-Reply-To: <1338074612.29839.2.camel@blabla>
Message-ID: <529903439.57426.1338214715353.JavaMail.root@zimbra-mb2.anl.gov>

Mihael,

Do you happen to have a script/config example that I can look at?
I think I have it working - I get the output file I expect, but I am seeing this error too:

Caused by: null
Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Illegal extra argument `cat' to vdl:checkjobstatus @ vdl-int-wrapper-staging.k, line: 389
        at org.globus.cog.karajan.arguments.NameBindingVariableArguments.append(NameBindingVariableArguments.java:51)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.ret(AbstractFunction.java:47)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27)
        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
        at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227)
        at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Execution failed: Illegal extra argument `cat' to vdl:checkjobstatus @ vdl-int-wrapper-staging.k, line: 389

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Saturday, May 26, 2012 6:23:32 PM
> Subject: [Swift-devel] wrapper staging
>
> There has been some discussion in the past where we agreed that it might be a good idea to try a staging mechanism in which the wrapper (or some other entity on the worker node) does the staging.
>
> I added a skeleton for that in trunk. The relevant files are vdl-int-wrapper-staging.k and _swiftwrap.wrapperstaging.
>
> There is a basic implementation that recognizes "file://" URLs and does a cp for stage-ins and outs.
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

From ketancmaheshwari at gmail.com Mon May 28 20:11:31 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 28 May 2012 21:11:31 -0400
Subject: [Swift-devel] Comparing Swift and Hadoop
In-Reply-To:
References:
Message-ID:

So, continuing this discussion further, I have been working on getting the application running under Swift on Cornell's Redcloud infrastructure. It works and seems to be as fast as Hadoop, but I have not yet done any measurements.

The application has 2 stages: for n first-stage (~map) instances there is one second (and final) stage (~reduce) instance. I am currently running this app under a coaster setup from a local workstation, and I see that there are unnecessary (from the app's point of view) data movements involved. In the first stage, n datasets are staged out to n cloud VMs to perform the computation, which is followed by staging in of the results of this stage back to the submit host. This is followed by staging out of these results back to a single cloud VM to perform the final stage.

Can we tell Swift to:

1. Not stage the data back at the end of the first stage, but keep it on the respective VMs on the cloud.
2. Predesignate a VM (with an IP) to perform the reduce step, and as soon as the first step is over, stage the results of this step to that predesignated VM.

I think, if we had some way of marking an app as a "map" or "reduce" kind, and a way of telling Swift whether we care about intermediate results, this could be easy to achieve. Wondering if there is some configuration that already does this today? I think 2 can be achieved with site and tc settings, maybe.

Regards,
Ketan
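For reference, here is the two-stage shape described above as a minimal SwiftScript sketch. The app names, executables, and mapper expressions are invented for illustration, and it captures only the dataflow; it does not answer the staging question, since by default Swift routes each stage's files through the submit side:

    type file;

    app (file o) mapStage (file i) {
        mc_sim @i @o;        // hypothetical per-dataset Monte Carlo executable
    }

    app (file o) reduceStage (file p[]) {
        mc_combine @p @o;    // hypothetical combiner executable
    }

    file inputs[] <filesys_mapper; pattern="input*.dat">;
    file parts[] <simple_mapper; prefix="part">;

    foreach f, i in inputs {
        parts[i] = mapStage(f);    // n independent first-stage (~map) instances
    }

    file result <"final.dat">;
    result = reduceStage(parts);   // one second-stage (~reduce) instance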
On Mon, May 14, 2012 at 8:54 PM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:

> Tim,
>
> From your description and my limited experience (~3 weeks) with Hadoop, I want to say that the differences between Hadoop and Swift are really "soft" ones. I have a feeling that since MapReduce happened to be used for internet-scale/style reliability, Hadoop developers built the tools you described (compression, checksums, serialization, etc.) around it.
>
> I want to think that Swift is in a sense a superset of Hadoop, or Hadoop+, in that it essentially provides the same or similar functionality as one would expect out of Hadoop, with the added advantage of being able to express the computation as chained stages.
>
> I do not really think the argument of running either on reliable or unreliable systems really holds, since Swift could easily be adapted to unreliable systems by building functionality (e.g. data replication) around it.
>
> In another sense, I want to think of Hadoop and Swift as tools solving the same class of problems, with a huge overlap in functionality between them and only the extra 'muscles' making them different.
>
> From a user's point of view, I still think Hadoop is difficult to set up and work with on medium-sized applications (tens to hundreds of tasks). In terms of application performance, I want to think it depends on how good a job one does tuning Hadoop and/or Swift for the application and infrastructure at hand. This particular thing I am in the process of doing, and I will soon come up with some concrete numbers.
>
> Regards,
> Ketan
>
> On Mon, May 14, 2012 at 5:15 PM, Tim Armstrong wrote:
>
>> To be clear, I'm not making the case that it's *impossible* to implement things in Swift that are implemented in MapReduce, just that Swift isn't well suited to them, because it wasn't designed with them in mind. I've seen the argument before that MapReduce is a particular data flow DAG, and that you can express arbitrary data flow DAGs in other systems, but I think that somewhat misses the point of what MapReduce is trying to provide to application developers. By treating all tasks and data dependencies as equivalent, it ignores all of the runtime infrastructure that MapReduce inserts into the process, and ignores, for example, some of the details of how data is moved between mappers and reducers.
>>
>> For example, a substantial amount of code in the Hadoop MapReduce code base has to do with a) file formats, b) compression, c) checksums, d) serialization, e) buffering input and output data, and f) bucketing/sorting the data. This is all difficult to implement well and important for many big data applications.
>> I think that scientific workflow systems don't take any of these things seriously, since they aren't important for most canonical scientific workflow applications.
>>
>> I think one of the other big differences is that Hadoop assumes that all you have are a bunch of unreliable machines on a network, so it must provide its own job scheduler and replicated distributed file system. Swift, in contrast, seems mostly designed for systems where there is a reliable shared file system, and where it acquires compute resources for fixed blocks of time from some existing cluster manager. I know there are ways you can have Swift/Coaster/Falkon run on networks of unreliable machines, but it's not quite like Hadoop's job scheduler, which is designed to actually be the primary submission mechanism for a multi-user cluster.
>>
>> I don't think it would make much sense to run Swift on a network of unreliable machines and then just leave your data on those machines (you would normally stage the final data to some backed-up file system), but it would make perfect sense for Hadoop, especially if the data is so big that it's difficult to find someplace else to put it. In contrast, you can certainly stand up a Hadoop instance on a shared cluster for a few hours to run your jobs, and stage data in and out of HDFS, but that use case isn't what Hadoop was designed or optimized for. Most of the core developers on Hadoop are working in environments where they have dedicated Hadoop clusters, where they can't afford much cluster downtime, and where they need to reliably persist huge amounts of data for years on unreliable hardware. E.g. at the extreme end, this is the kind of thing Hadoop developers are thinking about: https://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920
>>
>> - Tim
>>
>> On Sun, May 13, 2012 at 3:57 PM, Ioan Raicu wrote:
>>
>>> Hi Tim,
>>> I always thought of MapReduce as being a subset of workflow systems. Can you give me an example of an application that can be implemented in MapReduce, but not in a workflow system such as Swift? I can't think of any off the top of my head.
>>>
>>> Ioan
>>>
>>> --
>>> =================================================================
>>> Ioan Raicu, Ph.D.
>>> Assistant Professor
>>> =================================================================
>>> Computer Science Department
>>> Illinois Institute of Technology
>>> 10 W. 31st Street, Chicago, IL 60616
>>> =================================================================
>>> Cel: 1-847-722-0876
>>> Email: iraicu at cs.iit.edu
>>> Web: http://www.cs.iit.edu/~iraicu/
>>> =================================================================
>>> =================================================================
>>>
>>> On May 13, 2012, at 1:09 PM, Tim Armstrong wrote:
>>>
>>> I've worked on both Swift and Hadoop implementations, and my tendency is to say that there isn't actually any deep similarity beyond them both supporting distributed data processing/computation. They both make fundamentally different assumptions about the clusters they run on and the applications they're supporting.
>>>
>>> Swift is mainly designed for time-shared clusters with reliable shared file systems.
>>> Hadoop assumes that it will be running on unreliable commodity machines with no shared file system, and that it will be running continuously on all machines in the cluster. Swift is designed for orchestrating existing executables with their own file formats, so it mostly remains agnostic to the contents of the files it is processing. Hadoop needs to have some understanding of the contents of the files it is processing, to be able to segment them into records and perform key comparisons so it can do a distributed sort, etc. It provides its own file formats (including compression, serialization, etc.) that users can use, although it is extensible to custom file formats.
>>>
>>> - Hadoop implements its own distributed file system with software redundancy; Swift uses an existing cluster filesystem or node-local file systems. For bulk data processing, this means Hadoop will generally be able to deliver more disk bandwidth, and it has a bunch of other implications.
>>> - Hadoop has a record-oriented view of the world, i.e. it is built around the idea that you are processing a record at a time, rather than a file at a time as in Swift.
>>> - As a result, Hadoop includes a bunch of functionality to do with file formats, compression, serialization, etc.; Swift is B.Y.O. file format.
>>> - Hadoop's distributed sort is a core part of MapReduce (and something that a lot of effort has gone into implementing and optimizing); Swift doesn't have built-in support for anything similar.
>>> - Swift lets you construct arbitrary dataflow graphs between tasks, so in some ways it is less restrictive than the map-reduce pattern (although it doesn't directly support some things that the map-reduce pattern does, so I wouldn't say that it is strictly more general).
>>>
>>> I'd say that some applications might fit in both paradigms, but that neither supports a superset of the applications that the other supports. Performance would depend to a large extent on the application. Swift might actually be quicker to start up a job and dispatch tasks (Hadoop is notoriously slow on that front), but otherwise I'd say it just depends on the application, how you implement the application, the cluster, etc. I'm not sure that there is a fair comparison between the two systems, since they're just very different: most of the results would be predictable just by looking at the design of the system (e.g. if the application needs to do a big distributed sort, Hadoop is much better). If the application is embarrassingly parallel (like it sounds like your application is), then you could probably implement it in either, but I'm not sure that it would actually stress the differences between the systems if data sizes are small and runtime is mostly dominated by computation.
>>>
>>> I think the Cloudera Hadoop distribution is well documented and reasonably easy to set up and run, provided that you're not on a time-shared cluster. Apache Hadoop is more of a pain to get working.
>>>
>>> - Tim
>>>
>>> On Sun, May 13, 2012 at 9:27 AM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are working on a project from the GE Energy corporation which runs independent Monte Carlo simulations in order to estimate device reliability, leading to grid-wide device replacement decisions. The computation is repeated MC simulations done in parallel.
>>>> Currently, this is running under a Hadoop setup on Cornell Redcloud and EC2 (10 nodes). Looking at the computation, it struck me that this is a good Swift candidate. And since the performance numbers etc. have already been extracted for Hadoop, it might also be nice to have a comparison between Swift and Hadoop.
>>>>
>>>> However, some reality check before diving in: has it been done before? Do we know how Swift fares against map-reduce? Are they even comparable? I have faced this question twice here: why use Swift when you have Hadoop?
>>>>
>>>> I could see Hadoop needs quite a bit of setup effort before getting it to run. Could we quantify usability and compare the two?
>>>>
>>>> Any ideas and inputs are welcome.
>>>>
>>>> Regards,
>>>> --
>>>> Ketan
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> --
> Ketan

--
Ketan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: