From heather.stoller at gmail.com Wed May 2 12:47:53 2012
From: heather.stoller at gmail.com (Heather Stoller)
Date: Wed, 2 May 2012 12:47:53 -0500
Subject: [Swift-user] example of @length

Hello,

Does anyone have an example of how to use @length?

I have tried a few things, most recently
@length(inputfiles)
without success.

Cheers!
Heather

complete script below.

type file;

file inputfiles1[] ;
file inputfiles2[] ;

app (file o) cat (file i1, file i2)
{
cat @i1 @i2 stdout=@o;
}

foreach j in [0:@length(inputfiles1)] {
file c;
c = cat(inputfiles1[j], inputfiles2[j]);
}

From wilde at mcs.anl.gov Wed May 2 12:51:24 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 2 May 2012 12:51:24 -0500 (CDT)
Subject: [Swift-user] example of @length
Message-ID: <2100438239.2997.1335981084799.JavaMail.root@zimbra.anl.gov>

Heather, it's best to avoid the use of length() where possible.

Can you instead do (for example):

foreach j, i in inputfiles {
  x[i] = f(j);
}

?

- Mike

----- Original Message -----
> From: "Heather Stoller"
> To: swift-user at ci.uchicago.edu
> Sent: Wednesday, May 2, 2012 12:47:53 PM
> Subject: [Swift-user] example of @length
> Hello,
>
> Does anyone have an example of how to use @length?
>
> I have tried a few things, most recently
> @length(inputfiles)
> without success.
>
> Cheers!
> Heather
>
>
> complete script below.
>
> type file;
>
> file inputfiles1[] ;
> file inputfiles2[] ;
>
> app (file o) cat (file i1, file i2)
> {
> cat @i1 @i2 stdout=@o;
> }
>
> foreach j in [0:@length(inputfiles1)] {
> file c prefix="map.",
> suffix=".out">;
> c = cat(inputfiles1[j], inputfiles2[j]);
> }

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Fri May 4 09:33:39 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 4 May 2012 10:33:39 -0400
Subject: [Swift-user] swift build fail

I am trying to build Swift from trunk on a cluster head node, and the build
seems to be failing.

Here is more info:

$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 9)

$ which java
/usr/local/jre1.6.0_16/bin/java
$ java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)

$ which ant
/usr/bin/ant
$ ant -version
Apache Ant version 1.6.5 compiled on August 14 2006

Attached is the ant redist out/err.

Any idea what is happening?

Regards,
-- 
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: antredist.out
Type: application/octet-stream
Size: 16723 bytes
Desc: not available

From ketancmaheshwari at gmail.com Fri May 4 09:40:28 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 4 May 2012 10:40:28 -0400
Subject: [Swift-user] swift build fail

ahh! javac is missing. It's got only jre.
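
A check along these lines, run before ant, might catch the JRE-only case
earlier. This is only a sketch: check_build_tools is a hypothetical helper
name, not part of Swift or ant, and the tool list (java, javac, ant) is an
assumption taken from the commands shown in this thread.

```shell
#!/bin/sh
# check_build_tools: hypothetical helper (not part of Swift or ant).
# Prints "ok" if java, javac and ant are all found on PATH; otherwise
# prints "missing:" followed by the tools that were not found and returns 1.
check_build_tools() {
    missing=""
    for tool in java javac ant; do
        # command -v is the POSIX way to ask whether a command resolves on PATH
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
    return 0
}
```

Calling check_build_tools on a node that has only a JRE installed should
report javac as missing, which points at installing a full JDK (or putting
its bin directory on PATH) before retrying the build.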

-- 
Ketan

From ketancmaheshwari at gmail.com Fri May 4 13:42:35 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 4 May 2012 14:42:35 -0400
Subject: [Swift-user] manual coasters bag of workstations worker error

I was trying to run a catsnsleep app to sanity test the coasters passive
setup on a bag of workstations using the start-coaster-service script from
0.93.

The execution proceeds normally in the first run. However, after killing
this run and resubmitting, I see the following error message:

Failed to process data: at /home/ketan/work/worker.pl line 788

This means that in order to rerun, the coaster service has to be killed,
the workers killed and the setup restarted, which would be undesirable for a
long-running application that needs manual restarts.

This is followed by a stalled progress text from Swift:

Progress: time: Fri, 04 May 2012 14:28:40 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:29:10 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:29:40 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:30:10 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:30:40 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:31:10 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:31:40 -0400 Selecting site:875 Submitted:125
Progress: time: Fri, 04 May 2012 14:32:10 -0400 Selecting site:875 Submitted:125

Please find worker logs and coaster, app logs etc. attached.

Any suggestions for corrective measures welcome.

Regards,
-- 
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.tgz
Type: application/x-gzip
Size: 615910 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: work.tgz
Type: application/x-gzip
Size: 16970 bytes
Desc: not available

From heather.stoller at gmail.com Wed May 9 17:30:40 2012
From: heather.stoller at gmail.com (Heather Stoller)
Date: Wed, 9 May 2012 15:30:40 -0700
Subject: [Swift-user] MODIS demo
In-Reply-To: <4D712B0A-8F4C-4358-9237-01BFD7A46A26@mcs.anl.gov>
References: <1884709006.141180.1334377021910.JavaMail.root@zimbra-mb2.anl.gov>
 <4D712B0A-8F4C-4358-9237-01BFD7A46A26@mcs.anl.gov>

Hello Swift Group, coming back to MODIS:

I followed Jonathan's advice to set retries to 0 and looked at the wrapper
section in the files in the *.d directory as per David's advice and now get
an error right away, as follows:

hstoller at ubuntu:~/modis$ ./demo.local urban 4
runid=modis-2012.0509.1527-urban-4-10
Swift 0.93 swift-r5483 cog-r3339

RunID: 20120509-1527-bsxjpz4g
(input): found 0 files
Progress: time: Wed, 09 May 2012 15:27:53 -0700
Progress: time: Wed, 09 May 2012 15:27:54 -0700 Submitting:7 Submitted:1
Progress: time: Wed, 09 May 2012 15:27:55 -0700 Active:7 Checking status:1
Progress: time: Wed, 09 May 2012 15:27:56 -0700 Stage in:1 Checking status:3 Stage out:1 Finished successfully:4
Execution failed:
Progress: time: Wed, 09 May 2012 15:27:57 -0700 Checking status:1 Failed:4 Finished successfully:4
File not found: /home/hstoller/swiftwork/modis-20120509-1527-bsxjpz4g/shared/landuse/h15v05.color.png
Progress: time: Wed, 09 May 2012 15:28:00 -0700 Initializing:1 Failed:4 Finished successfully:5

I think my question is, where should that shared/landuse directory be
getting the missing file?

Thank you for your help!

Sincerely,
Heather

On Fri, Apr 13, 2012 at 10:17 PM, Jonathan Monette wrote:

> I echo David's suggestion but would like to add another. It looks like
> soft error handling is being used which may not be the best approach when
> exploring Swift. In the config file you should set the retry count to 0 and
> set lazy.errors=false.
> This will cause Swift to fail as soon as the first
> error is encountered and will provide an error message. This is useful for
> when you are exploring Swift behavior.
>
> On Apr 13, 2012, at 23:17, David Kelly wrote:
>
> > Heather,
> >
> > You might want to check the path names in the tc.local file. Since about
> > half the tasks fail, I'm guessing either colormodis or getlanduse is
> > pointing to the wrong place. You can verify this by looking at the
> > directory called modis-2012.d. In there is a list of files that
> > end with -info. Look at the "Wrapper" section of these files and you should
> > find more information about what is causing the failures.
> >
> > David
> >
> > ----- Original Message -----
> >> From: "Heather Stoller"
> >> To: swift-user at ci.uchicago.edu
> >> Sent: Thursday, April 5, 2012 9:37:33 AM
> >> Subject: [Swift-user] MODIS demo
> >> Hello,
> >>
> >> I'm a UC student working with Mike Wilde doing some Swift stuff - at
> >> present, I'm trying to run the demo to see what can be seen.
I get:
> >>
> >> ^Cheather at ubuntu:~/modis$ ./demo.local urban 10
> >> runid=modis-2012.0405.0704-urban-10-10
> >> Swift 0.93 swift-r5483 cog-r3339
> >>
> >> RunID: 20120405-0704-drh1g0ob
> >> (input): found 0 files
> >> Progress: time: Thu, 05 Apr 2012 07:04:48 -0700
> >> Progress: time: Thu, 05 Apr 2012 07:04:49 -0700 Stage in:19 Submitting:1
> >> Progress: time: Thu, 05 Apr 2012 07:04:50 -0700 Stage in:13 Submitting:1 Active:6
> >> Progress: time: Thu, 05 Apr 2012 07:04:53 -0700 Stage in:10 Submitting:2 Submitted:2 Active:6
> >> Progress: time: Thu, 05 Apr 2012 07:04:54 -0700 Stage in:6 Submitting:1 Submitted:2 Active:9 Checking status:2
> >> Progress: time: Thu, 05 Apr 2012 07:04:55 -0700 Stage in:2 Submitting:2 Submitted:1 Active:9 Checking status:3 Stage out:3
> >> Progress: time: Thu, 05 Apr 2012 07:04:57 -0700 Submitting:1 Submitted:2 Active:9 Checking status:1 Stage out:7
> >> Progress: time: Thu, 05 Apr 2012 07:04:58 -0700 Active:3 Checking status:2 Stage out:7 Finished successfully:7 Failed but can retry:1
> >> Progress: time: Thu, 05 Apr 2012 07:04:59 -0700 Stage in:2 Submitting:2 Submitted:1 Active:3 Stage out:1 Finished successfully:10 Failed but can retry:2
> >> Progress: time: Thu, 05 Apr 2012 07:05:00 -0700 Active:10 Stage out:1 Finished successfully:10
> >> Progress: time: Thu, 05 Apr 2012 07:05:01 -0700 Active:9 Checking status:2 Finished successfully:11
> >> Progress: time: Thu, 05 Apr 2012 07:05:02 -0700 Submitting:1 Active:3 Checking status:3 Stage out:2 Finished successfully:11 Failed but can retry:2
> >> Progress: time: Thu, 05 Apr 2012 07:05:03 -0700 Stage in:2 Active:6 Stage out:1 Finished successfully:11 Failed but can retry:2
> >> Execution failed:
> >> Progress: time: Thu, 05 Apr 2012 07:05:04 -0700 Active:7 Checking status:2 Stage out:1 Failed:1 Finished successfully:11
> >> Progress: time: Thu, 05 Apr 2012 07:05:05 -0700 Active:3 Checking status:1 Stage
out:3 Failed:4 Finished successfully:11
> >> Progress: time: Thu, 05 Apr 2012 07:05:06 -0700 Failed:11 Finished successfully:11
> >>
> >> lots of "failed but can retry". Does this look right?

From davidk at ci.uchicago.edu Wed May 9 17:56:25 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Wed, 9 May 2012 17:56:25 -0500 (CDT)
Subject: [Swift-user] MODIS demo
Message-ID: <1168589721.39072.1336604185429.JavaMail.root@zimbra-mb2.anl.gov>

Heather,

Can you please paste the contents of your tc.local file?

Thanks,
David

From jonmon at mcs.anl.gov Wed May 9 18:02:48 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 9 May 2012 18:02:48 -0500
Subject: [Swift-user] MODIS demo
In-Reply-To: <1168589721.39072.1336604185429.JavaMail.root@zimbra-mb2.anl.gov>
References: <1168589721.39072.1336604185429.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <076257D8-8F7A-4F77-A155-B27AF1BC87DF@mcs.anl.gov>

Also, to get a better error message (I think), set lazy.errors=false in your
configuration file.

On May 9, 2012, at 17:56, David Kelly wrote:

> Heather,
>
> Can you please paste the contents of your tc.local file?
>
> Thanks,
> David

From heather.stoller at gmail.com Wed May 9 18:46:24 2012
From: heather.stoller at gmail.com (Heather Stoller)
Date: Wed, 9 May 2012 16:46:24 -0700
Subject: [Swift-user] MODIS demo
In-Reply-To: <1168589721.39072.1336604185429.JavaMail.root@zimbra-mb2.anl.gov>
References: <1168589721.39072.1336604185429.JavaMail.root@zimbra-mb2.anl.gov>

Hi David,

Here are the contents of my tc.local file. Jonathan, I have duly checked
that lazy.errors = false. Thank you both!

# site transformation path obsolete fields for compatibility

localhost echo /bin/echo null null null
localhost cat /bin/cat null null null
localhost ls /bin/ls null null null
localhost grep /bin/grep null null null
localhost sort /bin/sort null null null
localhost paste /bin/paste null null null
localhost pwd /bin/pwd null null null

# For cluster usage

#pbs getlanduse /home/hstoller/swift/demo/modis/bin/getlanduse.sh null null null
#pbs analyzelanduse /home/hstoller/swift/demo/modis/bin/analyzelanduse.sh null null null
#pbs colormodis /home/hstoller/swift/demo/modis/bin/colormodis.sh null null null
#pbs assemble /home/hstoller/swift/demo/modis/bin/assemble.sh null null null

# For localhost testing

localhost getlanduse /home/hstoller/modis/bin/getlanduse.sh null null null
localhost analyzelanduse /home/hstoller/modis/bin/analyzelanduse2.sh null null null
localhost colormodis /home/hstoller/modis/bin/colormodis.sh null null null
localhost assemble /home/hstoller/modis/bin/assemble2.sh null null null
localhost markmap /home/hstoller/modis/bin/markmap.sh null null null

On Wed, May 9, 2012 at 3:56 PM, David Kelly wrote:

> Heather,
>
> Can you please paste the contents of your tc.local file?
>
> Thanks,
> David

From davidk at ci.uchicago.edu Wed May 9 18:54:11 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Wed, 9 May 2012 18:54:11 -0500 (CDT)
Subject: [Swift-user] MODIS demo
Message-ID: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov>

Thanks - which machine are you testing this on?
Failed:1 Finished successfully:11 > > >> Progress: time: Thu, 05 Apr 2012 07:05:05 -0700 Active:3 Checking > > >> status:1 Stage out:3 Failed:4 Finished successfully:11 > > >> Progress: time: Thu, 05 Apr 2012 07:05:06 -0700 Failed:11 > > >> Finished > > >> successfully:11 > > >> > > >> lots of "failed but can retry". Does this look right? > > >> > > >> --

From heather.stoller at gmail.com Wed May 9 18:56:40 2012
From: heather.stoller at gmail.com (Heather Stoller)
Date: Wed, 9 May 2012 16:56:40 -0700
Subject: [Swift-user] MODIS demo
In-Reply-To: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov>
References: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

Hi David,

Thank you - this is on my local machine. An ls from home gives:

hstoller at ubuntu:~$ ls
Desktop Documents Downloads modis Music Pictures Public swift-0.93 swiftwork Templates Videos

--Heather

On Wed, May 9, 2012 at 4:54 PM, David Kelly wrote: > Thanks - which machine are you testing this on? > > ----- Original Message ----- > > From: "Heather Stoller" > > To: "David Kelly" > > Cc: "Jonathan Monette" , swift-user at ci.uchicago.edu > > Sent: Wednesday, May 9, 2012 6:46:24 PM > > Subject: Re: [Swift-user] MODIS demo > > Hi David, > > > > Here are the contents of my tc.local file. Jonathan, I have duly > > checked that lazy.errors = false. Thank you both!

From jonmon at mcs.anl.gov Wed May 9 18:58:38 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 9 May 2012 18:58:38 -0500
Subject: [Swift-user] MODIS demo
In-Reply-To:
References: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <0C6FD5BA-1318-48F3-8781-A943452F4159@mcs.anl.gov>

Do you have imagemagick installed on your personal computer and in your PATH?

On May 9, 2012, at 18:56, Heather Stoller wrote: > Hi David, > > Thank you - this is on my local machine. An ls from home gives: > > hstoller at ubuntu:~$ ls > Desktop Documents Downloads modis Music Pictures Public swift-0.93 swiftwork Templates Videos > > --Heather > > On Wed, May 9, 2012 at 4:54 PM, David Kelly wrote: > Thanks - which machine are you testing this on? > > ----- Original Message ----- > > From: "Heather Stoller" > > To: "David Kelly" > > Cc: "Jonathan Monette" , swift-user at ci.uchicago.edu > > Sent: Wednesday, May 9, 2012 6:46:24 PM > > Subject: Re: [Swift-user] MODIS demo > > Hi David, > > > > Here are the contents of my tc.local file. Jonathan, I have duly > > checked that lazy.errors = false. Thank you both!
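
The ImageMagick question raised in this thread can be checked from a shell. The following is only a minimal sketch, assuming a POSIX shell; the helper name check_tools is made up for illustration, and convert and montage are the two ImageMagick commands named in the thread.

```shell
#!/bin/sh
# Hypothetical helper: report, for each command name given, whether it
# can be found on PATH. Returns non-zero if at least one is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # command -v prints the path (or definition) that would be used
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# The MODIS demo scripts are assumed here to call these two commands.
check_tools convert montage || echo "Install ImageMagick or add it to PATH."
```

If either command is reported missing, installing ImageMagick (or fixing PATH) before rerunning the demo is the obvious next step.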

From davidk at ci.uchicago.edu Wed May 9 19:00:48 2012
From: davidk at ci.uchicago.edu (David Kelly)
Date: Wed, 9 May 2012 19:00:48 -0500 (CDT)
Subject: [Swift-user] MODIS demo
In-Reply-To:
Message-ID: <1534926790.39233.1336608048089.JavaMail.root@zimbra-mb2.anl.gov>

Is ImageMagick installed on your machine? Can you try "which convert" and "which montage"?

----- Original Message ----- > From: "Heather Stoller" > To: "David Kelly" > Cc: "Jonathan Monette" , swift-user at ci.uchicago.edu > Sent: Wednesday, May 9, 2012 6:56:40 PM > Subject: Re: [Swift-user] MODIS demo > Hi David, > > Thank you - this is on my local machine. An ls from home gives: > > hstoller at ubuntu:~$ ls > Desktop Documents Downloads modis Music Pictures Public swift-0.93 > swiftwork Templates Videos > > --Heather > > > On Wed, May 9, 2012 at 4:54 PM, David Kelly < davidk at ci.uchicago.edu > > wrote: > > > Thanks - which machine are you testing this on? > > > ----- Original Message ----- > > From: "Heather Stoller" < heather.stoller at gmail.com > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > Cc: "Jonathan Monette" < jonmon at mcs.anl.gov >, > > swift-user at ci.uchicago.edu > > Sent: Wednesday, May 9, 2012 6:46:24 PM > > Subject: Re: [Swift-user] MODIS demo > > Hi David, > > > > Here are the contents of my tc.local file. Jonathan, I have duly > > checked that lazy.errors = false. Thank you both!

From heather.stoller at gmail.com Wed May 9 19:00:58 2012
From: heather.stoller at gmail.com (Heather Stoller)
Date: Wed, 9 May 2012 19:00:58 -0500
Subject: [Swift-user] MODIS demo
In-Reply-To: <0C6FD5BA-1318-48F3-8781-A943452F4159@mcs.anl.gov>
References: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov> <0C6FD5BA-1318-48F3-8781-A943452F4159@mcs.anl.gov>
Message-ID:

Ah yes, imagemagick. No I do not. I shall do that. Good idea. Thank you, Jonathan. I predict that half of the jobs will work once I do that. I seem to remember something of the sort from about a month ago.

Thank you!
Heather

On Wed, May 9, 2012 at 6:58 PM, Jonathan Monette wrote: > Do you have imagemagick installed on your personal computer and in your > PATH? > > > On May 9, 2012, at 18:56, Heather Stoller > wrote: > > Hi David, > > Thank you - this is on my local machine. An ls from home gives: > > hstoller at ubuntu:~$ ls > Desktop Documents Downloads modis Music Pictures Public swift-0.93 > swiftwork Templates Videos > > --Heather > > On Wed, May 9, 2012 at 4:54 PM, David Kelly wrote: > >> Thanks - which machine are you testing this on? >> >> ----- Original Message ----- >> > From: "Heather Stoller" >> > To: "David Kelly" >> > Cc: "Jonathan Monette" , swift-user at ci.uchicago.edu >> > Sent: Wednesday, May 9, 2012 6:46:24 PM >> > Subject: Re: [Swift-user] MODIS demo >> > Hi David, >> > >> > Here are the contents of my tc.local file.
Jonathan, I have duly >> > checked that lazy.errors = false. Thank you both! >> > >> > # site transformation path obsolete fields for compatibility >> > >> > localhost echo /bin/echo null null null >> > localhost cat /bin/cat null null null >> > localhost ls /bin/ls null null null >> > localhost grep /bin/grep null null null >> > localhost sort /bin/sort null null null >> > localhost paste /bin/paste null null null >> > localhost pwd /bin/pwd null null null >> > >> > # For cluster usage >> > >> > #pbs getlanduse /home/hstoller/swift/demo/modis/bin/getlanduse.sh null >> > null null >> > #pbs analyzelanduse >> > /home/hstoller/swift/demo/modis/bin/analyzelanduse.sh null null null >> > #pbs colormodis /home/hstoller/swift/demo/modis/bin/colormodis.sh null >> > null null >> > #pbs assemble /home/hstoller/swift/demo/modis/bin/assemble.sh null >> > null null >> > >> > # For localhost testing >> > >> > localhost getlanduse /home/hstoller/modis/bin/getlanduse.sh null null >> > null >> > localhost analyzelanduse /home/hstoller/modis/bin/analyzelanduse2.sh >> > null null null >> > localhost colormodis /home/hstoller/modis/bin/colormodis.sh null null >> > null >> > localhost assemble /home/hstoller/modis/bin/assemble2.sh null null >> > null >> > localhost markmap /home/hstoller/modis/bin/markmap.sh null null null >> > >> > >> > >> > On Wed, May 9, 2012 at 3:56 PM, David Kelly < davidk at ci.uchicago.edu > >> > wrote: >> > >> > >> > Heather, >> > >> > Can you please paste the contents of your tc.local file? 
>> > >> > Thanks, >> > >> > David >> > >> > ----- Original Message ----- >> > > From: "Heather Stoller" < heather.stoller at gmail.com > >> > > To: swift-user at ci.uchicago.edu >> > >> > >> > > Cc: "David Kelly" < davidk at ci.uchicago.edu >, "Jonathan Monette" < >> > > jonmon at mcs.anl.gov > >> > > Sent: Wednesday, May 9, 2012 5:30:40 PM >> > > Subject: Re: [Swift-user] MODIS demo >> > > Hello Swift Group, coming back to MODIS: >> > > >> > > I followed Jonathan's advice to set retries to 0 and looked at the >> > > wrapper section in the files in the *.d directory as per David's >> > > advice and now get an error right away, as follows: >> > > hstoller at ubuntu:~/modis$ ./demo.local urban 4 >> > > runid=modis-2012.0509.1527-urban-4-10 >> > > Swift 0.93 swift-r5483 cog-r3339 >> > > >> > > RunID: 20120509-1527-bsxjpz4g >> > > (input): found 0 files >> > > Progress: time: Wed, 09 May 2012 15:27:53 -0700 >> > > Progress: time: Wed, 09 May 2012 15:27:54 -0700 Submitting:7 >> > > Submitted:1 >> > > Progress: time: Wed, 09 May 2012 15:27:55 -0700 Active:7 Checking >> > > status:1 >> > > Progress: time: Wed, 09 May 2012 15:27:56 -0700 Stage in:1 Checking >> > > status:3 Stage out:1 Finished successfully:4 >> > > Execution failed: >> > > Progress: time: Wed, 09 May 2012 15:27:57 -0700 Checking status:1 >> > > Failed:4 Finished successfully:4 >> > > File not found: >> > > >> /home/hstoller/swiftwork/modis-20120509-1527-bsxjpz4g/shared/landuse/h15v05.color.png >> > > Progress: time: Wed, 09 May 2012 15:28:00 -0700 Initializing:1 >> > > Failed:4 Finished successfully:5 >> > > >> > > I think my question is, where should that shared/landuse directory >> > > be >> > > getting the missing file? >> > > >> > > Thank you for your help! >> > > >> > > Sincerely, >> > > Heather >> > > >> > > >> > > >> > > >> > > On Fri, Apr 13, 2012 at 10:17 PM, Jonathan Monette < >> > > jonmon at mcs.anl.gov > wrote: >> > > >> > > >> > > I echo David's suggestion but would like to add another. 
It looks >> > > like >> > > soft error handling is being used which may not be the best approach >> > > when exploring Swift. In the config file you should set the retry >> > > count to 0 and set lazy.errors=false. This will cause Swift to fail >> > > as >> > > soon as the first error is encountered and will provide an error >> > > message. This is useful for when you are exploring Swift behavior. >> > > >> > > >> > > >> > > On Apr 13, 2012, at 23:17, David Kelly < davidk at ci.uchicago.edu > >> > > wrote: >> > > >> > > > Heather, >> > > > >> > > > You might want to check the path names in the tc.local file. Since >> > > > about half the tasks fail, I'm guessing either colormodis or >> > > > getlanduse is pointing to the wrong place. You can verify this by >> > > > looking at the directory called modis-2012.d. In there >> > > > is >> > > > a list of files that end with -info. Look at the "Wrapper" section >> > > > of these files and you should find more information about what is >> > > > causing the failures. >> > > > >> > > > David >> > > > >> > > > ----- Original Message ----- >> > > >> From: "Heather Stoller" < heather.stoller at gmail.com > >> > > >> To: swift-user at ci.uchicago.edu >> > > >> Sent: Thursday, April 5, 2012 9:37:33 AM >> > > >> Subject: [Swift-user] MODIS demo >> > > >> Hello, >> > > >> >> > > >> I'm a UC student working with Mike Wilde doing some Swift stuff - >> > > >> at >> > > >> present, I'm trying to run the demo to see what can be seen. 
I >> > > >> get: >> > > >> >> > > >> ^Cheather at ubuntu:~/modis$ ./demo.local urban 10 >> > > >> runid=modis-2012.0405.0704-urban-10-10 >> > > >> Swift 0.93 swift-r5483 cog-r3339 >> > > >> >> > > >> RunID: 20120405-0704-drh1g0ob >> > > >> (input): found 0 files >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:48 -0700 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:49 -0700 Stage in:19 >> > > >> Submitting:1 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:50 -0700 Stage in:13 >> > > >> Submitting:1 Active:6 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:53 -0700 Stage in:10 >> > > >> Submitting:2 Submitted:2 Active:6 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:54 -0700 Stage in:6 >> > > >> Submitting:1 Submitted:2 Active:9 Checking status:2 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:55 -0700 Stage in:2 >> > > >> Submitting:2 Submitted:1 Active:9 Checking status:3 Stage out:3 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:57 -0700 Submitting:1 >> > > >> Submitted:2 Active:9 Checking status:1 Stage out:7 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:58 -0700 Active:3 Checking >> > > >> status:2 Stage out:7 Finished successfully:7 Failed but can >> > > >> retry:1 >> > > >> Progress: time: Thu, 05 Apr 2012 07:04:59 -0700 Stage in:2 >> > > >> Submitting:2 Submitted:1 Active:3 Stage out:1 Finished >> > > >> successfully:10 >> > > >> Failed but can retry:2 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:00 -0700 Active:10 Stage >> > > >> out:1 >> > > >> Finished successfully:10 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:01 -0700 Active:9 Checking >> > > >> status:2 Finished successfully:11 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:02 -0700 Submitting:1 >> > > >> Active:3 >> > > >> Checking status:3 Stage out:2 Finished successfully:11 Failed but >> > > >> can >> > > >> retry:2 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:03 -0700 Stage in:2 >> > > >> Active:6 >> > > >> Stage out:1 Finished successfully:11 Failed but can retry:2 >> > 
> >> Execution failed: >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:04 -0700 Active:7 Checking >> > > >> status:2 Stage out:1 Failed:1 Finished successfully:11 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:05 -0700 Active:3 Checking >> > > >> status:1 Stage out:3 Failed:4 Finished successfully:11 >> > > >> Progress: time: Thu, 05 Apr 2012 07:05:06 -0700 Failed:11 >> > > >> Finished >> > > >> successfully:11 >> > > >> >> > > >> lots of "failed but can retry". Does this look right? >> > > >> >> > > >> -- >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heather.stoller at gmail.com Wed May 9 19:13:19 2012 From: heather.stoller at gmail.com (Heather Stoller) Date: Wed, 9 May 2012 19:13:19 -0500 Subject: [Swift-user] MODIS demo In-Reply-To: References: <1811174490.39223.1336607651689.JavaMail.root@zimbra-mb2.anl.gov> <0C6FD5BA-1318-48F3-8781-A943452F4159@mcs.anl.gov> Message-ID: Thank you, David and Jonathan! It's looking good now: I've run ./demo.local urban -4 and ./demo.local urban -34 successfully. Thank you for your help! --Heather On Wed, May 9, 2012 at 7:00 PM, Heather Stoller wrote: > Ah yes, imagemagick. No I do not. I shall do that. Good idea. Thank > you, Jonathan. > > I predict that half of the jobs will work once I do that. I seem to > remember something of the sort from about a month ago. > > Thank you! > > Heather > > > On Wed, May 9, 2012 at 6:58 PM, Jonathan Monette wrote: > >> Do you have imagemagick installed on your personal computer and in you >> PATH? >> >> >> On May 9, 2012, at 18:56, Heather Stoller >> wrote: >> >> Hi David, >> >> Thank you - this is on my local machine. An ls from home gives: >> >> hstoller at ubuntu:~$ ls >> Desktop Documents Downloads modis Music Pictures Public >> swift-0.93 swiftwork Templates Videos >> >> --Heather >> >> On Wed, May 9, 2012 at 4:54 PM, David Kelly wrote: >> >>> Thanks - which machine are you testing this on? 
>>> >>> ----- Original Message ----- >>> > From: "Heather Stoller" >>> > To: "David Kelly" >>> > Cc: "Jonathan Monette" , >>> swift-user at ci.uchicago.edu >>> > Sent: Wednesday, May 9, 2012 6:46:24 PM >>> > Subject: Re: [Swift-user] MODIS demo >>> > Hi David, >>> > >>> > Here are the contents of my tc.local file. Jonathan, I have duly >>> > checked that lazy.errors = false. Thank you both! >>> > >>> > # site transformation path obsolete fields for compatibility >>> > >>> > localhost echo /bin/echo null null null >>> > localhost cat /bin/cat null null null >>> > localhost ls /bin/ls null null null >>> > localhost grep /bin/grep null null null >>> > localhost sort /bin/sort null null null >>> > localhost paste /bin/paste null null null >>> > localhost pwd /bin/pwd null null null >>> > >>> > # For cluster usage >>> > >>> > #pbs getlanduse /home/hstoller/swift/demo/modis/bin/getlanduse.sh null >>> > null null >>> > #pbs analyzelanduse >>> > /home/hstoller/swift/demo/modis/bin/analyzelanduse.sh null null null >>> > #pbs colormodis /home/hstoller/swift/demo/modis/bin/colormodis.sh null >>> > null null >>> > #pbs assemble /home/hstoller/swift/demo/modis/bin/assemble.sh null >>> > null null >>> > >>> > # For localhost testing >>> > >>> > localhost getlanduse /home/hstoller/modis/bin/getlanduse.sh null null >>> > null >>> > localhost analyzelanduse /home/hstoller/modis/bin/analyzelanduse2.sh >>> > null null null >>> > localhost colormodis /home/hstoller/modis/bin/colormodis.sh null null >>> > null >>> > localhost assemble /home/hstoller/modis/bin/assemble2.sh null null >>> > null >>> > localhost markmap /home/hstoller/modis/bin/markmap.sh null null null >>> > >>> > >>> > >>> > On Wed, May 9, 2012 at 3:56 PM, David Kelly < davidk at ci.uchicago.edu > >>> > wrote: >>> > >>> > >>> > Heather, >>> > >>> > Can you please paste the contents of your tc.local file? 
>>> > >>> > Thanks, >>> > >>> > David >>> > >>> > ----- Original Message ----- >>> > > From: "Heather Stoller" < heather.stoller at gmail.com > >>> > > To: swift-user at ci.uchicago.edu >>> > >>> > >>> > > Cc: "David Kelly" < davidk at ci.uchicago.edu >, "Jonathan Monette" < >>> > > jonmon at mcs.anl.gov > >>> > > Sent: Wednesday, May 9, 2012 5:30:40 PM >>> > > Subject: Re: [Swift-user] MODIS demo >>> > > Hello Swift Group, coming back to MODIS: >>> > > >>> > > I followed Jonathan's advice to set retries to 0 and looked at the >>> > > wrapper section in the files in the *.d directory as per David's >>> > > advice and now get an error right away, as follows: >>> > > hstoller at ubuntu:~/modis$ ./demo.local urban 4 >>> > > runid=modis-2012.0509.1527-urban-4-10 >>> > > Swift 0.93 swift-r5483 cog-r3339 >>> > > >>> > > RunID: 20120509-1527-bsxjpz4g >>> > > (input): found 0 files >>> > > Progress: time: Wed, 09 May 2012 15:27:53 -0700 >>> > > Progress: time: Wed, 09 May 2012 15:27:54 -0700 Submitting:7 >>> > > Submitted:1 >>> > > Progress: time: Wed, 09 May 2012 15:27:55 -0700 Active:7 Checking >>> > > status:1 >>> > > Progress: time: Wed, 09 May 2012 15:27:56 -0700 Stage in:1 Checking >>> > > status:3 Stage out:1 Finished successfully:4 >>> > > Execution failed: >>> > > Progress: time: Wed, 09 May 2012 15:27:57 -0700 Checking status:1 >>> > > Failed:4 Finished successfully:4 >>> > > File not found: >>> > > >>> /home/hstoller/swiftwork/modis-20120509-1527-bsxjpz4g/shared/landuse/h15v05.color.png >>> > > Progress: time: Wed, 09 May 2012 15:28:00 -0700 Initializing:1 >>> > > Failed:4 Finished successfully:5 >>> > > >>> > > I think my question is, where should that shared/landuse directory >>> > > be >>> > > getting the missing file? >>> > > >>> > > Thank you for your help! 
>>> > > >>> > > Sincerely, >>> > > Heather >>> > > >>> > > >>> > > >>> > > >>> > > On Fri, Apr 13, 2012 at 10:17 PM, Jonathan Monette < >>> > > jonmon at mcs.anl.gov > wrote: >>> > > >>> > > >>> > > I echo David's suggestion but would like to add another. It looks >>> > > like >>> > > soft error handling is being used which may not be the best approach >>> > > when exploring Swift. In the config file you should set the retry >>> > > count to 0 and set lazy.errors=false. This will cause Swift to fail >>> > > as >>> > > soon as the first error is encountered and will provide an error >>> > > message. This is useful for when you are exploring Swift behavior. >>> > > >>> > > >>> > > >>> > > On Apr 13, 2012, at 23:17, David Kelly < davidk at ci.uchicago.edu > >>> > > wrote: >>> > > >>> > > > Heather, >>> > > > >>> > > > You might want to check the path names in the tc.local file. Since >>> > > > about half the tasks fail, I'm guessing either colormodis or >>> > > > getlanduse is pointing to the wrong place. You can verify this by >>> > > > looking at the directory called modis-2012.d. In there >>> > > > is >>> > > > a list of files that end with -info. Look at the "Wrapper" section >>> > > > of these files and you should find more information about what is >>> > > > causing the failures. >>> > > > >>> > > > David >>> > > > >>> > > > ----- Original Message ----- >>> > > >> From: "Heather Stoller" < heather.stoller at gmail.com > >>> > > >> To: swift-user at ci.uchicago.edu >>> > > >> Sent: Thursday, April 5, 2012 9:37:33 AM >>> > > >> Subject: [Swift-user] MODIS demo >>> > > >> Hello, >>> > > >> >>> > > >> I'm a UC student working with Mike Wilde doing some Swift stuff - >>> > > >> at >>> > > >> present, I'm trying to run the demo to see what can be seen. 
I >>> > > >> get: >>> > > >> >>> > > >> ^Cheather at ubuntu:~/modis$ ./demo.local urban 10 >>> > > >> runid=modis-2012.0405.0704-urban-10-10 >>> > > >> Swift 0.93 swift-r5483 cog-r3339 >>> > > >> >>> > > >> RunID: 20120405-0704-drh1g0ob >>> > > >> (input): found 0 files >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:48 -0700 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:49 -0700 Stage in:19 >>> > > >> Submitting:1 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:50 -0700 Stage in:13 >>> > > >> Submitting:1 Active:6 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:53 -0700 Stage in:10 >>> > > >> Submitting:2 Submitted:2 Active:6 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:54 -0700 Stage in:6 >>> > > >> Submitting:1 Submitted:2 Active:9 Checking status:2 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:55 -0700 Stage in:2 >>> > > >> Submitting:2 Submitted:1 Active:9 Checking status:3 Stage out:3 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:57 -0700 Submitting:1 >>> > > >> Submitted:2 Active:9 Checking status:1 Stage out:7 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:58 -0700 Active:3 Checking >>> > > >> status:2 Stage out:7 Finished successfully:7 Failed but can >>> > > >> retry:1 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:04:59 -0700 Stage in:2 >>> > > >> Submitting:2 Submitted:1 Active:3 Stage out:1 Finished >>> > > >> successfully:10 >>> > > >> Failed but can retry:2 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:00 -0700 Active:10 Stage >>> > > >> out:1 >>> > > >> Finished successfully:10 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:01 -0700 Active:9 Checking >>> > > >> status:2 Finished successfully:11 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:02 -0700 Submitting:1 >>> > > >> Active:3 >>> > > >> Checking status:3 Stage out:2 Finished successfully:11 Failed but >>> > > >> can >>> > > >> retry:2 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:03 -0700 Stage in:2 >>> > > >> Active:6 >>> > > >> Stage out:1 Finished 
successfully:11 Failed but can retry:2 >>> > > >> Execution failed: >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:04 -0700 Active:7 Checking >>> > > >> status:2 Stage out:1 Failed:1 Finished successfully:11 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:05 -0700 Active:3 Checking >>> > > >> status:1 Stage out:3 Failed:4 Finished successfully:11 >>> > > >> Progress: time: Thu, 05 Apr 2012 07:05:06 -0700 Failed:11 >>> > > >> Finished >>> > > >> successfully:11 >>> > > >> >>> > > >> lots of "failed but can retry". Does this look right? >>> > > >> >>> > > >> -- >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Fri May 11 03:42:21 2012 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 11 May 2012 03:42:21 -0500 Subject: [Swift-user] vim syntax file update Message-ID: Hi, I moved the Swift-vim vim syntax file to my github repo so it can be easily installed w/ a vim plugin manager like Vundle or vim-update-bundles. https://github.com/aespinosa/Swift-vim Original post: http://lists.ci.uchicago.edu/pipermail/swift-user/2010-March/001436.html Allan From ketancmaheshwari at gmail.com Fri May 11 08:51:42 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Fri, 11 May 2012 09:51:42 -0400 Subject: [Swift-user] SwiftR Message-ID: Hi, I am trying to get started with SwiftR on my ubuntu box. I installed and tested SwiftR successfully : > library(Swift) > basicSwiftTest() Working in /tmp/ketan/SwiftR/swift.SsHn Running in /tmp/ketan/SwiftR/swift.SsHn (linked to /tmp/ketan/SwiftR/swift.local) project=NONE cores=2 nodes=1 queue=NONE server=local Started worker manager with pid 10837 *** Starting test 1.1 *** Test of local do.call(sumstuff) local result= [1] 4505 Test of swiftapply(sumstuff,arglist) swiftapply to 1 arg lists. 
Swift properties: server = callsperbatch = 1 runmode = service tmpdir = /tmp workerhosts = localhost initialexpr = 1 Swift request files written to: /tmp/ketan/SwiftR/requests.P10826/R0000000 Removing /tmp/ketan/SwiftR/requests.P10826/R0000000 Swift result: =============== However, when trying to use a SwiftR example shown on http://wiki.ci.uchicago.edu/SWFT/SwiftR, I am getting errors: > add = function(x,y) {return (x*y);} > swiftapply(add, list( list(2,3), list(3,4))) Information about current server of type not found swiftapply to 2 arg lists. Swift properties: server = callsperbatch = 1 runmode = service tmpdir = /tmp workerhosts = localhost initialexpr = 2 Swift request files written to: /tmp/ketan/SwiftR/requests.P10826/R0000002 Error in getWorkerDir(server) : No SwiftR servers launched within R and no server type specified, can't identify a likely location for a swiftR service =========== In learning about SwiftR, my goal is to test its 'infinite' loop and streams functionality. Do some examples demonstrating these features exist for me to try on my local machine? Regards, -- Ketan From wilde at mcs.anl.gov Fri May 11 09:48:35 2012 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 11 May 2012 09:48:35 -0500 (CDT) Subject: [Swift-user] SwiftR In-Reply-To: Message-ID: <1926643501.16387.1336747715070.JavaMail.root@zimbra.anl.gov> Ketan, Until we get a chance to look into SwiftR, you can probably find what you need simply by locating and adapting the SwiftR server loop. That's a .swift file embedded in the SwiftR code; you'll recognize it by the infinite iterate() loop which reads commands on an input FIFO (using readData(), I think). The loop exits when it gets a special message from the input FIFO. That's all you really need from SwiftR at this point, I think.
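As an illustration only (this is not code from SwiftR, and it is Python rather than a .swift script), the pattern described above, a loop that reads commands from an input stream and exits on a special message, can be sketched as follows. The sentinel value "QUIT" and the names serve and handle are assumptions made for the example; the actual SwiftR stop message is not given in this thread.

```python
import io

# Hypothetical sentinel: the real SwiftR stop message is not given in the thread.
STOP = "QUIT"


def serve(stream, handle, stop=STOP):
    """Read one command per line from `stream` until the stop message arrives.

    `stream` is any iterable of text lines (an opened FIFO, a file, StringIO).
    `handle` is called once per command; its return values are collected.
    """
    results = []
    for line in stream:
        command = line.strip()
        if not command:
            continue  # ignore blank lines
        if command == stop:
            break  # special message: leave the loop
        results.append(handle(command))
    return results


if __name__ == "__main__":
    # An in-memory stream stands in for the FIFO so the sketch runs anywhere.
    fake_fifo = io.StringIO("run a\nrun b\nQUIT\nrun c\n")
    print(serve(fake_fifo, str.upper))
```

With a real named pipe, the stream would come from opening the FIFO path for reading, which on POSIX systems normally blocks until a writer connects; anything sent after the stop message is never processed.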
- Mike ----- Original Message ----- > From: "Ketan Maheshwari" > To: "Swift User" > Sent: Friday, May 11, 2012 8:51:42 AM > Subject: [Swift-user] SwiftR > Hi, > > > I am trying to get started with SwiftR on my ubuntu box. > > > I installed and tested SwiftR successfully : > > > > > library(Swift) > > > > basicSwiftTest() > Working in /tmp/ketan/SwiftR/swift.SsHn > Running in /tmp/ketan/SwiftR/swift.SsHn (linked to > /tmp/ketan/SwiftR/swift.local) > project=NONE cores=2 nodes=1 queue=NONE server=local > Started worker manager with pid 10837 > > > *** Starting test 1.1 *** > > > Test of local do.call(sumstuff) > local result= > [1] 4505 > > > Test of swiftapply(sumstuff,arglist) > > > swiftapply to 1 arg lists. > > > Swift properties: > server = > callsperbatch = 1 > runmode = service > tmpdir = /tmp > workerhosts = localhost > initialexpr = > > > 1 Swift request files written to: > /tmp/ketan/SwiftR/requests.P10826/R0000000 > Removing /tmp/ketan/SwiftR/requests.P10826/R0000000 > Swift result: > > > =============== > > > However, when trying to use a SwiftR example shown on > http://wiki.ci.uchicago.edu/SWFT/SwiftR , I am getting errors: > > > > > add = function(x,y) {return (x*y);} > > swiftapply(add, list( list(2,3), list(3,4))) > Information about current server of type not found > swiftapply to 2 arg lists. > > > Swift properties: > server = > callsperbatch = 1 > runmode = service > tmpdir = /tmp > workerhosts = localhost > initialexpr = > > > 2 Swift request files written to: > /tmp/ketan/SwiftR/requests.P10826/R0000002 > Error in getWorkerDir(server) : > No SwiftR servers launched within R and no server type specified, > can't identify a likely location for a swiftR service > =========== > > > In learning about SwiftR, my goal is to test its 'infinite' loop and > streams functionality. Does some examples demonstrating these features > exist for me to try on my local machine? 
> > > > > Regards, -- > Ketan > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Wed May 16 14:41:58 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 16 May 2012 15:41:58 -0400 Subject: [Swift-user] swift hangs on local Message-ID: I am trying to run the GE Energy stuff through Swift but seems like Swift hangs without progress. Details below: Inputs: a control file Binary: marsMain expected outputs: mars.ot* files and a .bin file Needs : LIC and 3 .in files to be present in the pwd. The directory is on mcs workstations: ~ketan/ketan_mars. Standalone, it runs as follows: ./marsMain ctlfiles/mars.ctl The swift file is mars.swift log for a hanged run is in the directory: mars-20120516-1434-1xfq9zuc.log stdout looks like this: [steamroller:ketan_mars]$ swift -config cf -tc.file tc -sites.file sites.xml mars.swift Swift 0.93 swift-r5520 cog-r3338 (cog modified locally) RunID: 20120516-1440-u3srmzyd Progress: time: Wed, 16 May 2012 14:40:30 -0500 (input): found 1 files (input): found 3 files No events in 10s. Registered futures: file[] res Open, 0 elements, 1 listeners ---- Waiting threads: 0-7-0-3 ---- ======= swift.workdir is not created. Clues? -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonmon at mcs.anl.gov Wed May 16 14:46:53 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Wed, 16 May 2012 14:46:53 -0500 Subject: [Swift-user] swift hangs on local In-Reply-To: References: Message-ID: <4F681905-6E60-4D30-9C6D-9996BB1C92AD@mcs.anl.gov> Are the paths in your tc correct? 
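A quick way to answer that question mechanically is to check every path listed in the tc file. The following is a minimal Python sketch, assuming the whitespace-separated layout shown in the tc.local listing from the MODIS thread (site, transformation, path, then three obsolete fields); the function name check_tc and the sample entries are made up for illustration and are not part of Swift.

```python
import os


def check_tc(text, site=None):
    """Return (transformation, path) pairs whose path is not an executable file.

    Assumes each non-comment line looks like:
        site  transformation  path  null  null  null
    Lines starting with '#' and lines with fewer than three fields are skipped.
    """
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        entry_site, name, path = fields[:3]
        if site is not None and entry_site != site:
            continue  # only check the requested site
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append((name, path))
    return problems


if __name__ == "__main__":
    # Sample entries: the second path is deliberately bogus.
    sample = (
        "# site transformation path obsolete fields\n"
        "localhost cat /bin/cat null null null\n"
        "localhost mars /no/such/dir/marsMain null null null\n"
    )
    for name, path in check_tc(sample, site="localhost"):
        print("not executable or missing:", name, path)
```

An empty result only means the listed paths exist and are executable; it says nothing about whether the programs those scripts call (ImageMagick in the MODIS case) are installed.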
On May 16, 2012, at 14:41, Ketan Maheshwari wrote: > I am trying to run the GE Energy stuff through Swift but seems like Swift hangs without progress. Details below: > > Inputs: a control file > Binary: marsMain > expected outputs: mars.ot* files and a .bin file > > Needs : LIC and 3 .in files to be present in the pwd. > > The directory is on mcs workstations: ~ketan/ketan_mars. > > Standalone, it runs as follows: > > ./marsMain ctlfiles/mars.ctl > > The swift file is mars.swift > > log for a hanged run is in the directory: mars-20120516-1434-1xfq9zuc.log > > stdout looks like this: > > [steamroller:ketan_mars]$ swift -config cf -tc.file tc -sites.file sites.xml mars.swift > Swift 0.93 swift-r5520 cog-r3338 (cog modified locally) > > RunID: 20120516-1440-u3srmzyd > Progress: time: Wed, 16 May 2012 14:40:30 -0500 > (input): found 1 files > (input): found 3 files > No events in 10s. > > Registered futures: > file[] res Open, 0 elements, 1 listeners > ---- > > Waiting threads: > 0-7-0-3 > ---- > > ======= > > > swift.workdir is not created. > > Clues? > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Wed May 16 14:50:20 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 16 May 2012 15:50:20 -0400 Subject: [Swift-user] swift hangs on local In-Reply-To: <4F681905-6E60-4D30-9C6D-9996BB1C92AD@mcs.anl.gov> References: <4F681905-6E60-4D30-9C6D-9996BB1C92AD@mcs.anl.gov> Message-ID: On Wed, May 16, 2012 at 3:46 PM, Jonathan Monette wrote: > Are the paths in your tc correct? > yes. > > > On May 16, 2012, at 14:41, Ketan Maheshwari > wrote: > > I am trying to run the GE Energy stuff through Swift but seems like Swift > hangs without progress. 
Details below: > > Inputs: a control file > Binary: marsMain > expected outputs: mars.ot* files and a .bin file > > Needs : LIC and 3 .in files to be present in the pwd. > > The directory is on mcs workstations: ~ketan/ketan_mars. > > Standalone, it runs as follows: > > ./marsMain ctlfiles/mars.ctl > > The swift file is mars.swift > > log for a hanged run is in the directory: mars-20120516-1434-1xfq9zuc.log > > stdout looks like this: > > [steamroller:ketan_mars]$ swift -config cf -tc.file tc -sites.file > sites.xml mars.swift > Swift 0.93 swift-r5520 cog-r3338 (cog modified locally) > > RunID: 20120516-1440-u3srmzyd > Progress: time: Wed, 16 May 2012 14:40:30 -0500 > (input): found 1 files > (input): found 3 files > No events in 10s. > > Registered futures: > file[] res Open, 0 elements, 1 listeners > ---- > > Waiting threads: > 0-7-0-3 > ---- > > ======= > > > swift.workdir is not created. > > Clues? > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- Ketan From wilde at mcs.anl.gov Wed May 16 15:01:16 2012 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 16 May 2012 15:01:16 -0500 (CDT) Subject: [Swift-user] swift hangs on local In-Reply-To: Message-ID: <508210603.3907.1337198476073.JavaMail.root@zimbra.anl.gov> It's possibly hanging because Swift can't determine the size of the resulting res[] array: --- app (file o, file _res[]) mars (file _ctl, file lic, file _inp[]) { mars @_ctl stdout=@o; } ... foreach ctlfile, i in ctl { file res[]; (out[i], res) = mars(ctlfile, licence, inp); } --- You mapped it with a "dynamic" output mapper (vs. a static output mapper like the ext mapper). A dynamic mapper will map each element of an array on demand, as it's added to the array.
In your script, however, the array would get set by the app() function mars(), but Swift has no way of knowing how many elements mars() is placing in that array. Is that a number you can know up front and set in an ext mapper? There is no way I know of in Swift to have an app function return a dynamically-determined number of elements in an output array. I'm not even certain that returning an output array works. If it truly needs to be dynamic, can you return a single file of filenames, and then use readData or other mappers to map that to a new array of output files? In any case, as a first step, try just removing the res[] argument and array, and see if the rest of your data flow works. - Mike ----- Original Message ----- > From: "Ketan Maheshwari" > To: "Swift User" > Sent: Wednesday, May 16, 2012 2:41:58 PM > Subject: [Swift-user] swift hangs on local > I am trying to run the GE Energy stuff through Swift but seems like > Swift hangs without progress. Details below: > > > Inputs: a control file > Binary: marsMain > expected outputs: mars.ot* files and a .bin file > > > Needs : LIC and 3 .in files to be present in the pwd. > > > The directory is on mcs workstations: ~ketan/ketan_mars. > > > Standalone, it runs as follows: > > > ./marsMain ctlfiles/mars.ctl > > > The swift file is mars.swift > > > log for a hanged run is in the directory: > mars-20120516-1434-1xfq9zuc.log > > > stdout looks like this: > > > > [steamroller:ketan_mars]$ swift -config cf -tc.file tc -sites.file > sites.xml mars.swift > Swift 0.93 swift-r5520 cog-r3338 (cog modified locally) > > > RunID: 20120516-1440-u3srmzyd > Progress: time: Wed, 16 May 2012 14:40:30 -0500 > (input): found 1 files > (input): found 3 files > No events in 10s. > > > Registered futures: > file[] res Open, 0 elements, 1 listeners > ---- > > > Waiting threads: > 0-7-0-3 > ---- > > > > ======= > > > > > swift.workdir is not created. > > > Clues?
-- > Ketan > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Wed May 16 22:06:09 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 16 May 2012 23:06:09 -0400 Subject: [Swift-user] swift hangs on local In-Reply-To: <508210603.3907.1337198476073.JavaMail.root@zimbra.anl.gov> References: <508210603.3907.1337198476073.JavaMail.root@zimbra.anl.gov> Message-ID: Thanks Mike! Replaced the output mapper with ext mapper and it works now. On Wed, May 16, 2012 at 4:01 PM, Michael Wilde wrote: > Its possibly hanging because Swift cant determine the size of the > resulting res[] array: > --- > app (file o, file _res[]) mars (file _ctl, file lic, file _inp[]) > { > mars @_ctl stdout=@o; > } > > ... > foreach ctlfile, i in ctl { > file res[] suffix=".ot*">; > (out[i], res) = mars(ctlfile, licence, inp); > } > --- > > You mapped it with a "dynamic" output mapper (vs. a static output mapper > like the ext mapper). A dynamic mapper will map each element of an array on > demand, as its added to an array. In your script, however, the array would > get set by the app() function mares(), but Swift has no way of knowing how > many elements mars() is placing in that array. Is that a number you can > know up-front, and set it in an ext mapper? There is no way I know of in > Swift to have an app function return a dynamically-determined number of > elements in an output array. Im not even certain that returning an output > array works. If it truly needs to be dynamic, can you return a single file > of filenames, and then use readData or other mappers to map that to a new > array of output files? 
> > In any case, try for a first step just removing the res[] argument and > array, and see if the rest of your data flow works. > > - Mike > > > ----- Original Message ----- > > From: "Ketan Maheshwari" > > To: "Swift User" > > Sent: Wednesday, May 16, 2012 2:41:58 PM > > Subject: [Swift-user] swift hangs on local > > I am trying to run the GE Energy stuff through Swift but seems like > > Swift hangs without progress. Details below: > > > > > > Inputs: a control file > > Binary: marsMain > > expected outputs: mars.ot* files and a .bin file > > > > > > Needs : LIC and 3 .in files to be present in the pwd. > > > > > > The directory is on mcs workstations: ~ketan/ketan_mars. > > > > > > Standalone, it runs as follows: > > > > > > ./marsMain ctlfiles/mars.ctl > > > > > > The swift file is mars.swift > > > > > > log for a hanged run is in the directory: > > mars-20120516-1434-1xfq9zuc.log > > > > > > stdout looks like this: > > > > > > > > [steamroller:ketan_mars]$ swift -config cf -tc.file tc -sites.file > > sites.xml mars.swift > > Swift 0.93 swift-r5520 cog-r3338 (cog modified locally) > > > > > > RunID: 20120516-1440-u3srmzyd > > Progress: time: Wed, 16 May 2012 14:40:30 -0500 > > (input): found 1 files > > (input): found 3 files > > No events in 10s. > > > > > > Registered futures: > > file[] res Open, 0 elements, 1 listeners > > ---- > > > > > > Waiting threads: > > 0-7-0-3 > > ---- > > > > > > > > ======= > > > > > > > > > > swift.workdir is not created. > > > > > > Clues? -- > > Ketan > > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iraicu at cs.iit.edu Sat May 19 08:47:38 2012 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Sat, 19 May 2012 08:47:38 -0500 Subject: [Swift-user] Call for Participation: ACM HPDC 2012 -- Early registration deadline May 25th Message-ID: <4FB7A47A.7020109@cs.iit.edu> Call for Participation http://www.hpdc.org/2012/ The organizing committee is delighted to invite you to *HPDC'12*, the /21st International ACM Symposium on High-Performance Parallel and Distributed Computing/, to be held in *Delft, the Netherlands*, which is a historic, picturesque city that is less than one hour away from Amsterdam-Schiphol airport. HPDC is the premier annual conference on the design, the implementation, the evaluation, and the use of parallel and distributed systems for high-end computing. HPDC is sponsored by SIGARCH, the Special Interest Group on Computer Architecture of the Association for Computing Machinery . *HPDC'12* will be held at Delft University of Technology , with the main conference taking place on *June 20-22* (Wednesday to Friday 1 PM), and with affiliated workshops on *June 18-19* (Monday and Tuesday). Early registration closes on May 25th, so if you plan on attending, please register now at http://www.hpdc.org/2012/registration/. *Some highlights of the conference:* * *Awards:* o Achievement Award - Ian Foster of the University of Chicago and Argonne National Laboratory, USA * *Keynote Speakers:* o Mihai Budiu of Microsoft Research, Mountain View, USA. Title: Putting "Big-data" to Good Use: Building Kinect o Ricardo Bianchini of Rutgers University, USA. Title: "Leveraging Renewable Energy in Data Centers: Present and Future" * *Accepted Papers:* 1. vSlicer: Latency-aware Virtual Machine Scheduling via Differentiated-frequency CPU Slicing, Cong Xu (Purdue University), Sahan Gamage (Purdue University), Pawan N. Rao (Purdue University), Ardalan Kangarlou (NetApp), Ramana Kompella (Purdue University), Dongyan Xu (Purdue University) 2. 
Singleton: System-wide Page Deduplication in Virtual Environments, Prateek Sharma, Purushottam Kulkarni (IIT Bombay) 3. Locality-aware Dynamic VM Reconfiguration on MapReduce Clouds, Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng (KAIST) 4. Achieving Application-Centric Performance Targets via Consolidation on Multicores: Myth or Reality?, Lydia Y. Chen (IBM Research Zurich Lab), Danilo Ansaloni (University of Lugano), Evgenia Smirni (College of William and Mary), Akira Yokokawa (University of Lugano), Walter Binder (University of Lugano) 5. Enabling Event Tracing at Leadership-Class Scale through I/O Forwarding Middleware, Thomas Ilsche (Technische Universität Dresden), Joseph Schuchart (Technische Universität Dresden), Jason Cope (Argonne National Laboratory), Dries Kimpe (Argonne National Laboratory), Terry Jones (Oak Ridge National Laboratory), Andreas Knüpfer (Technische Universität Dresden), Kamil Iskra (Argonne National Laboratory), Robert Ross (Argonne National Laboratory), Wolfgang E. Nagel (Technische Universität Dresden), Stephen Poole (Oak Ridge National Laboratory) 6. ISOBAR Hybrid Compression-I/O Interleaving for Large-scale Parallel I/O Optimization, Eric R. Schendel (North Carolina State University), Saurabh V. Pendse (North Carolina State University), John Jenkins (North Carolina State University), David A. Boyuka (North Carolina State University), Zhenhuan Gong (North Carolina State University), Sriram Lakshminarasimhan (North Carolina State University), Qing Liu (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory), Robert Ross (Argonne National Laboratory), Nagiza F. Samatova (North Carolina State University) 7. QBox: Guaranteeing I/O Performance on Black Box Storage Systems, Dimitris Skourtis, Shinpei Kato, Scott Brandt (University of California, Santa Cruz) 8. 
Towards Efficient Live Migration of I/O Intensive Workloads: A Transparent Storage Transfer Propo, Bogdan Nicolae (INRIA), Franck Cappello (INRIA/UIUC) 9. A Virtual Memory Based Runtime to Support Multi-tenancy in Clusters with GPUs, Michela Becchi (University of Missouri), Kittisak Sajjapongse (University of Missouri), Ian Graves (University of Missouri), Adam Procter (University of Missouri), Vignesh Ravi (Ohio State University), Srimat Chakradhar (NEC Laboratories America) 10. Interference-driven Scheduling and Resource Management for GPU-based Heterogeneous Clusters, Rajat Phull, Cheng-Hong Li, Kunal Rao, Hari Cadambi, Srimat Chakradhar (NEC Laboratories America) 11. Work Stealing and Persistence-based Load Balancers for Iterative Overdecomposed Applications, Jonathan Lifflander (UIUC), Sriram Krishnamoorthy (PNNL), Laxmikant V. Kale (UIUC) 12. Highly Scalable Graph Search for the Graph500 Benchmark, Koji Ueno (Tokyo Institute of Technology/JST CREST), Toyotaro Suzumura (Tokyo Institute of Technology/IBM Research Tokyo/JST CREST) 13. PonD : Dynamic Creation of HTC Pool on Demand Using a Decentralized Resource Discovery System, Kyungyong Lee (University of Florida), David Wolinsky (Yale University), Renato Figueiredo (University of Florida) 14. SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures, Simon Delamare (INRIA), Gilles Fedak (INRIA), Derrick Kondo (INRIA), Oleg Lodygensky (IN2P3) 15. Understanding the Effects and Implications of Compute Node Related Failures in Hadoop, Florin Dinu, T. S. Eugene Ng (Rice University) 16. Optimizing MapReduce for GPUs with Effective Shared Memory Usage, Linchuan Chen, Gagan Agrawal (The Ohio State University) 17. 
CAM: A Topology Aware Minimum Cost Flow Based Resource Manager for MapReduce Applications in the Cloud, Min Li (Virginia Tech), Dinesh Subhraveti (IBM Almaden Research Center), Ali Butt (Virginia Tech), Aleksandr Khasymski (Virginia Tech), Prasenjit Sarkar (IBM Almaden Research Center) 18. Distributed Approximate Spectral Clustering for Large-Scale Datasets, Fei Gao (Simon Fraser University), Wael Abd-Almageed (University of Maryland) 19. Exploring Cross-layer Power Management for PGAS Applications on the SCC Platform, Marc Gamell (Rutgers University), Ivan Rodero (Rutgers University), Manish Parashar (Rutgers University), Rajeev Muralidhar (Intel India) 20. Dynamic Adaptive Virtual Core Mapping to Improve Power, Energy, and Performance in Multi-socket Multicores, Chang Bae (Northwestern University), Lei Xia (Northwestern University), Peter Dinda (Northwestern University), John Lange (University of Pittsburgh) 21. VNET/P: Bridging the Cloud and High Performance Computing Through Fast Overlay Networking, Lei Xia (Northwestern University), Zheng Cui (University of New Mexico), John Lange (University of Pittsburgh), Yuan Tang (UESTC, China), Peter Dinda (Northwestern University), Patrick Bridges (University of New Mexico) 22. Massively-Parallel Stream Processing under QoS Constraints with Nephele, Björn Lohrmann, Daniel Warneke, Odej Kao (Technische Universität Berlin) 23. 
A Resiliency Model for High Performance Infrastructure Based on Logical Encapsulation, James Moore (The University of Southern California/EMC Corporation), Carl Kesselman (The University of Southern California) * *Workshops:* o Astro-HPC: Workshop on High-Performance Computing for Astronomy, Ana Lucia Varbanescu, Rob van Nieuwpoort, and Simon Portegies Zwart o ECMLS: 3rd Int'l Emerging Computational Methods for the Life Sciences Workshop, Carole Goble, Judy Qiu, and Ian Foster o ScienceCloud: 3rd Workshop on Scientific Cloud Computing, Yogesh Simmhan, Gabriel Antoniu, and Carole Goble o DIDC: Fifth Int'l Workshop on Data-Intensive Distributed Computing, Tevfik Kosar and Douglas Thain o MapReduce: The Third Int'l Workshop on MapReduce and its Applications, Gilles Fedak and Geoffrey Fox o VTDC: 6th Int'l Workshop on Virtualization Technologies in Distributed Computing, Frédéric Desprez and Adrien Lèbre For more information on the full program, see http://www.hpdc.org/2012/program/conference-program/. Looking forward to seeing you in Delft! Regards, Ioan Raicu -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ ================================================================= ================================================================= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From clberger at uchicago.edu Sat May 19 16:17:03 2012 From: clberger at uchicago.edu (Carsen Berger) Date: Sat, 19 May 2012 16:17:03 -0500 Subject: [Swift-user] Running coasters on a bag of workstations Message-ID: Hi all, I'm trying to run coasters on a collection of commodity machines and I'm running into some issues. The machines all share a common /home directory on a NFS file system, so to give the coaster workers local storage, I decided to set the WORKER_LOCATION directory to the local /var/tmp on each machine. Running start-coaster-service installs worker.pl in /var/tmp just fine, but then when we try to run a swift script it spits out a slew of "No such file or directory" errors for all the files that the worker is supposed to create. It seems as if coasters cannot write to /var/tmp for whatever reason. Has anyone encountered a problem like this before? Note that setting WORKER_LOCATION to a directory in /home circumvents this problem, but that solution won't work because we need a directory that is local to each machine. Thank you, Carsen Berger -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Sat May 19 17:51:46 2012 From: ketancmaheshwari at gmail.com (Ketan) Date: Sat, 19 May 2012 18:51:46 -0400 Subject: [Swift-user] Running coasters on a bag of workstations In-Reply-To: References: Message-ID: Is it possible /tmp is mounted on /tmp and not /var/tmp .. just a quick guess. On May 19, 2012, at 5:17 PM, Carsen Berger wrote: > Hi all, > > I'm trying to run coasters on a collection of commodity machines and I'm running into some issues. The machines all share a common /home directory on a NFS file system, so to give the coaster workers local storage, I decided to set the WORKER_LOCATION directory to the local /var/tmp on each machine. 
> > Running start-coaster-service installs worker.pl in /var/tmp just fine, but then when we try to run a swift script it spits out a slew of "No such file or directory" errors for all the files that the worker is supposed to create. It seems as if coasters cannot write to /var/tmp for whatever reason. Has anyone encountered a problem like this before? > > Note that setting WORKER_LOCATION to a directory in /home circumvents this problem, but that solution won't work because we need a directory that is local to each machine. > > Thank you, > Carsen Berger > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidk at ci.uchicago.edu Sat May 19 18:00:00 2012 From: davidk at ci.uchicago.edu (David Kelly) Date: Sat, 19 May 2012 18:00:00 -0500 (CDT) Subject: [Swift-user] Running coasters on a bag of workstations In-Reply-To: Message-ID: <1345427125.24787.1337468400191.JavaMail.root@zimbra-mb2.anl.gov> Carsen, When you set WORKER_LOCATION to a non-shared directory, you need to enable coaster provider staging. To do this, add a line to your coaster-service.conf that says: export SHARED_FILESYSTEM=no When you run start-coaster-service the next time, a file called 'cf' will be created. Then run swift with something like this: swift -sites.file sites.xml -tc.file tc.data -config cf myscript.swift I think this should do the trick, but please let me know if you run into anything else. I'll try to update the documentation to make this more clear. Regards, David ----- Original Message ----- > From: "Carsen Berger" > To: swift-user at ci.uchicago.edu > Sent: Saturday, May 19, 2012 4:17:03 PM > Subject: [Swift-user] Running coasters on a bag of workstations > Hi all, > > I'm trying to run coasters on a collection of commodity machines and > I'm running into some issues. 
The machines all share a common /home > directory on a NFS file system, so to give the coaster workers local > storage, I decided to set the WORKER_LOCATION directory to the local > /var/tmp on each machine. > > Running start-coaster-service installs worker.pl in /var/tmp just > fine, but then when we try to run a swift script it spits out a slew > of "No such file or directory" errors for all the files that the > worker is supposed to create. It seems as if coasters cannot write to > /var/tmp for whatever reason. Has anyone encountered a problem like > this before? > > Note that setting WORKER_LOCATION to a directory in /home circumvents > this problem, but that solution won't work because we need a directory > that is local to each machine. > > Thank you, > Carsen Berger > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From jonmon at mcs.anl.gov Sat May 19 18:01:53 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Sat, 19 May 2012 18:01:53 -0500 Subject: [Swift-user] Running coasters on a bag of workstations In-Reply-To: References: Message-ID: I have not tried the using a bag of workstations very often but I think there is a SHARED_FILESYSTEM option. Did you try setting that to no? On May 19, 2012, at 17:51, Ketan wrote: > Is it possible /tmp is mounted on /tmp and not /var/tmp .. just a quick guess. > On May 19, 2012, at 5:17 PM, Carsen Berger wrote: > >> Hi all, >> >> I'm trying to run coasters on a collection of commodity machines and I'm running into some issues. The machines all share a common /home directory on a NFS file system, so to give the coaster workers local storage, I decided to set the WORKER_LOCATION directory to the local /var/tmp on each machine. 
>> >> Running start-coaster-service installs worker.pl in /var/tmp just fine, but then when we try to run a swift script it spits out a slew of "No such file or directory" errors for all the files that the worker is supposed to create. It seems as if coasters cannot write to /var/tmp for whatever reason. Has anyone encountered a problem like this before? >> >> Note that setting WORKER_LOCATION to a directory in /home circumvents this problem, but that solution won't work because we need a directory that is local to each machine. >> >> Thank you, >> Carsen Berger >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From clberger at uchicago.edu Sat May 19 20:24:01 2012 From: clberger at uchicago.edu (Carsen Berger) Date: Sat, 19 May 2012 20:24:01 -0500 Subject: [Swift-user] Running coasters on a bag of workstations In-Reply-To: <1345427125.24787.1337468400191.JavaMail.root@zimbra-mb2.anl.gov> References: <1345427125.24787.1337468400191.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: Thanks everyone for the replies! David, you were right: I just needed to include the -config cf option when running Swift. Best, Carsen On Sat, May 19, 2012 at 6:00 PM, David Kelly wrote: > Carsen, > > When you set WORKER_LOCATION to a non-shared directory, you need to enable > coaster provider staging. To do this, add a line to your > coaster-service.conf that says: > > export SHARED_FILESYSTEM=no > > When you run start-coaster-service the next time, a file called 'cf' will > be created. 
Then run swift with something like this: > swift -sites.file sites.xml -tc.file tc.data -config cf myscript.swift > > I think this should do the trick, but please let me know if you run into > anything else. I'll try to update the documentation to make this more clear. > > Regards, > David > > ----- Original Message ----- > > From: "Carsen Berger" > > To: swift-user at ci.uchicago.edu > > Sent: Saturday, May 19, 2012 4:17:03 PM > > Subject: [Swift-user] Running coasters on a bag of workstations > > Hi all, > > > > I'm trying to run coasters on a collection of commodity machines and > > I'm running into some issues. The machines all share a common /home > > directory on a NFS file system, so to give the coaster workers local > > storage, I decided to set the WORKER_LOCATION directory to the local > > /var/tmp on each machine. > > > > Running start-coaster-service installs worker.pl in /var/tmp just > > fine, but then when we try to run a swift script it spits out a slew > > of "No such file or directory" errors for all the files that the > > worker is supposed to create. It seems as if coasters cannot write to > > /var/tmp for whatever reason. Has anyone encountered a problem like > > this before? > > > > Note that setting WORKER_LOCATION to a directory in /home circumvents > > this problem, but that solution won't work because we need a directory > > that is local to each machine. > > > > Thank you, > > Carsen Berger > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From clberger at uchicago.edu Sun May 20 23:38:11 2012 From: clberger at uchicago.edu (Carsen Berger) Date: Sun, 20 May 2012 23:38:11 -0500 Subject: [Swift-user] Timing Swift runs Message-ID: Hello again, I need to get performance numbers to benchmark various Swift jobs on a generic bag of workstations. Is there some easy way to do this, e.g. perhaps a mechanism built into Swift that allows it to report how long an execution took? Or would I need to come up with a site-specific solution? Thank you, Carsen Berger -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrlee at uchicago.edu Mon May 21 08:33:12 2012 From: mrlee at uchicago.edu (Matt Lee) Date: Mon, 21 May 2012 08:33:12 -0500 Subject: [Swift-user] Running Python scripts on Swift Message-ID: Hello, I have a few python scripts that chain input/output, and I was wondering what the best way to call these scripts? Thus far, I've tried placing them into a shell script, but I'm not sure how to make the shell scripts take arguments. How would one do this? Here's what I have: type shellfile; type pythonfile; type inputfile; app (reducefile sout) runmapreduce(shellfile shellscript, inputfile f, pythonfile map, pythonfile combine, pythonfile reduce){ shell "source" @shellscript @f @map @combine @reduce stdout=@filename(sout); } shellfile mapreduce_shell_script; pythonfile map_fun; pythonfile combine_fun; pythonfile reduce_fun; The shell script doesn't take the arguments, which leads me to believe I have specified them incorrectly. Would it be better to make the python files themselves executable, and simply chain their operation within a swift app function? I also was wondering if there was a way within swift to specify the node upon which a specific job was run? More specifically, I'd like to run a job on the master node that aggregates the results of the computations on the worker nodes. 
Would such a thing be possible in swift, or would I need to configure coasters to accommodate this? Thanks, Matt Lee From wilde at mcs.anl.gov Mon May 21 09:37:18 2012 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 21 May 2012 09:37:18 -0500 (CDT) Subject: [Swift-user] Timing Swift runs In-Reply-To: Message-ID: <67659530.9285.1337611038549.JavaMail.root@zimbra.anl.gov> Hi Carsen, The "info" files that are produced by each application invocation include (I think) the run time stats. These are returned in a ".d" directory for successful app invocations when you specify this option in the swift.properties file (-config option): wrapperlog.always.transfer=true In addition you can glean a lot of info about run times from the Swift .log file. There are tools to plot information from the log, but I'm not sure how up to date these are in the 0.93 and trunk releases. Can anyone with log plotting/analysis expertise provide more info? Thanks, - Mike ----- Original Message ----- > From: "Carsen Berger" > To: swift-user at ci.uchicago.edu > Sent: Sunday, May 20, 2012 11:38:11 PM > Subject: [Swift-user] Timing Swift runs > Hello again, > > I need to get performance numbers to benchmark various Swift jobs on a > generic bag of workstations. Is there some easy way to do this, e.g. > perhaps a mechanism built into Swift that allows it to report how long > an execution took? Or would I need to come up with a site-specific > solution? 
> > Thank you, > Carsen Berger > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wozniak at mcs.anl.gov Mon May 21 11:16:44 2012 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 21 May 2012 11:16:44 -0500 Subject: [Swift-user] Timing Swift runs In-Reply-To: <67659530.9285.1337611038549.JavaMail.root@zimbra.anl.gov> References: <67659530.9285.1337611038549.JavaMail.root@zimbra.anl.gov> Message-ID: <4FBA6A6C.2060100@mcs.anl.gov> Hello, The log processing tools are in swift/libexec/log-processing. README.txt is an overview of the tools. It contains a section about a "job run time distribution plot" which might be what you're looking for. You can simply extract the data or use the plotter in: https://svn.ci.uchicago.edu/svn/vdl2/usertools/plotter Justin On 05/21/2012 09:37 AM, Michael Wilde wrote: > Hi Carsen, > > The "info" files that are produced by each application invocation include (I think) the run time stats. These are returned in a ".d" directory for successful app invocations when you specify this option in the swift.properties file (-config option): > > wrapperlog.always.transfer=true > > In addition you can glean a lot of info about run times from the Swift .log file. > There are tools to plot information from the log, but I'm not sure how up to date these are in the 0.93 and trunk releases. Can anyone with log plotting/analysis expertise provide more info? > > Thanks, > > - Mike > > > ----- Original Message ----- >> From: "Carsen Berger" >> To: swift-user at ci.uchicago.edu >> Sent: Sunday, May 20, 2012 11:38:11 PM >> Subject: [Swift-user] Timing Swift runs >> Hello again, >> >> I need to get performance numbers to benchmark various Swift jobs on a >> generic bag of workstations. 
Is there some easy way to do this, e.g. >> perhaps a mechanism built into Swift that allows it to report how long >> an execution took? Or would I need to come up with a site-specific >> solution? >> >> Thank you, >> Carsen Berger >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Justin M Wozniak From ketancmaheshwari at gmail.com Mon May 21 13:54:34 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 May 2012 14:54:34 -0400 Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 Message-ID: Hi, I am trying to run the GE mars script on a bag of workstations. I tested the script for a sufficient number of tasks and it seems to be working fine on localhost. However, it fails in this setup. I get the following error message after a seemingly correct invocation: Find: keepalive(120), reconnect - http://128.84.97.46:41287 Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7 Submitted:3 Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 Active:2 Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349. Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349. Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage out:7 Obviously the staging out of results fails, and it seems that the number of files in the stageout stage is causing the error. The application needs to stage out about 120 files. One solution I could quickly think of is to wrap the app in a shell script and zip the outputs, making it just one staged-out file. However, the current setup would still be useful since we are trying to compare the existing Hadoop solution with the Swift one. Is there any possible workaround, some env setting or so, that I could try to get the stageout going? 
The logs are: http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log and http://www.mcs.anl.gov/~ketan/workerlogs.tgz Regards, -- Ketan From wilde at mcs.anl.gov Mon May 21 14:35:26 2012 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 21 May 2012 14:35:26 -0500 (CDT) Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 In-Reply-To: Message-ID: <1273271123.10060.1337628926407.JavaMail.root@zimbra.anl.gov> Ketan, as far as I can tell, that message, coming from worker.pl, is just a warning. Programming Perl sec 33, Diagnostic Messages: "Deep recursion on subroutine "%s" (W recursion) This subroutine has called itself (directly or indirectly) 100 times more than it has returned. This probably indicates an infinite recursion, unless you're writing strange benchmark programs, in which case it indicates something else." The stageout code in worker.pl is indeed recursive, and the warning could be suppressed: "Try placing no warnings 'recursion'; within the same scope as that code ..." Can you try a simple mod to catsn, using your ext mapper, to see if it is indeed failing due to the deeply recursive stageout? If you could dig a bit deeper into this, and see whether it's really failing when staging back so many files or failing for some other, or related, reason, that would be great. Thanks, - Mike ----- Original Message ----- > From: "Ketan Maheshwari" > To: "Swift User" > Sent: Monday, May 21, 2012 1:54:34 PM > Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 > Hi, > > > I am trying to run the GE mars script on a bag of workstations. I > tested the script for a sufficient number of tasks and seems to be > working fine on localhost. > > > However, it fails in this setup. 
I get the error message as follows > after seemingly right invocation: > > > > > Find: keepalive(120), reconnect - http://128.84.97.46:41287 > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7 Submitted:3 > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 Active:2 > Deep recursion on subroutine "main::stageout" at /home/ketan/work/ > worker.pl line 1349. > Deep recursion on subroutine "main::stageout" at /home/ketan/work/ > worker.pl line 1349. > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage out:7 > > > Obviously the staging out of results fails and seems that the number > of files in the stageout stage is causing the error. The application > needs to stage out about 120 files. > > > One solution I could quickly think of is to wrap the app in a shell > and zip the outputs making it just one staged out file. > > > However, the current setup would still be useful since we are trying > to compare the existing Hadoop solution with the Swift one. > > > Is there any possible workaround, some env setting or so that I could > try and get the stageout going? 
> > > The logs are: > http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log > and http://www.mcs.anl.gov/~ketan/workerlogs.tgz > > > > > Regards, -- > Ketan > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Mon May 21 16:28:02 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 21 May 2012 17:28:02 -0400 Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 In-Reply-To: <1273271123.10060.1337628926407.JavaMail.root@zimbra.anl.gov> References: <1273271123.10060.1337628926407.JavaMail.root@zimbra.anl.gov> Message-ID: Thanks Mike. Indeed the recursion was a warning. I found the problem was that the binary could not find the licence in the cwd from where it was being called. This is an application requirement that the licence file must be present in the cwd from where the call is made. However, Swift makes a dirtree in the workdir, stages the files and calls the binary from *outside* of this tree. Is it possible to make swift stage the licence file and put it on the top level without writing a wrapper to do a cp. Again, the point of not wrapping the binary into a script is to mimic the Hadoop setup as close as possible. On Mon, May 21, 2012 at 3:35 PM, Michael Wilde wrote: > Ketan, as far as I can tell, that message, coming from worker.pl, is just > a warning. > > Programing Perl sec 33, Diagnostic Messages: "Deep recursion on subroutine > "%s" > > (W recursion) This subroutine has called itself (directly or indirectly) > 100 times more than it has returned. 
This probably indicates an infinite > recursion, unless you're writing strange benchmark programs, in which case > it indicates something else." > > The stageout code in worker.pl is indeed recursive, and the warning could > be suppressed: > > "Try placing > > no warnings 'recursion'; > > within the same scope as that code ..." > > Can you try a simple mod to catsn, using your ext mapper, to see if it is > indeed failing due to the deeply recursive stageout? > > If you could dig a bit deeper into this, and see whether its really > failing when staging back so many files or failing for some other, or > related, reason, that would be great. > > Thanks, > > - Mike > > ----- Original Message ----- > > From: "Ketan Maheshwari" > > To: "Swift User" > > Sent: Monday, May 21, 2012 1:54:34 PM > > Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at > /home/ketan/work/worker.pl line 1349 > > Hi, > > > > > > I am trying to run the GE mars script on a bag of workstations. I > > tested the script for a sufficient number of tasks and seems to be > > working fine on localhost. > > > > > > However, it fails in this setup. I get the error message as follows > > after seemingly right invocation: > > > > > > > > > > Find: keepalive(120), reconnect - http://128.84.97.46:41287 > > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7 Submitted:3 > > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 Active:2 > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/ > > worker.pl line 1349. > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/ > > worker.pl line 1349. > > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage out:7 > > > > > > Obviously the staging out of results fails and seems that the number > > of files in the stageout stage is causing the error. The application > > needs to stage out about 120 files. 
--
Ketan

From wilde at mcs.anl.gov Mon May 21 18:51:09 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 21 May 2012 18:51:09 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
Message-ID: <64063349.10413.1337644269535.JavaMail.root@zimbra.anl.gov>

I'm surprised that Swift isn't setting the current working dir (cwd) to be the job dir, but perhaps that's controlled by this property:

# Determines if Swift remote wrappers will be executed by specifying an
# absolute path, or a path relative to the job initial working directory
#
# valid values: absolute, relative
# wrapper.invocation.mode=absolute

Can you try your script with this property set to "relative"?

...but looking at this further: I see that if you're using coasters with provider staging, the logic for job launch is quite different. We need to study this and get back to you. For now, it is best to force the right cd's with a wrapper.
You might be able to remove the wrapper later, once we resolve how the job dir management should work in these various cases.

- Mike

From ketancmaheshwari at gmail.com Tue May 22 10:18:11 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 22 May 2012 11:18:11 -0400
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <64063349.10413.1337644269535.JavaMail.root@zimbra.anl.gov>
References: <64063349.10413.1337644269535.JavaMail.root@zimbra.anl.gov>
Message-ID:

Looking into this further, I now have a wrapper in place which copies the licence file into the cwd before running the executable. However, the executable still gets into error as if the licence file is not present.

When I cd into this dir (swift.workdir/mars-20120519-1203-3l....) and manually run the executable, it works.

So, the question is: does _swiftwrap.staging do some internal cd'ing before calling the executable? I will take a look inside, but it would be useful if someone already knows this.

The wrapper script is simply the following two lines:

"""
cp -v home/ketan/ketan_mars/MARS-LIC .
/home/ketan/ketan_mars/marsMain $1
"""

Regards,
Ketan

From wilde at mcs.anl.gov Tue May 22 10:27:17 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 May 2012 10:27:17 -0500 (CDT)
Subject: [Swift-user] Running Python scripts on Swift
In-Reply-To:
Message-ID: <1056948389.11149.1337700437069.JavaMail.root@zimbra.anl.gov>

> I have a few python scripts that chain input/output, and I was
> wondering what the best way to call these scripts?

Can you clarify? Do you want to have Swift chain them, or chain them within a single shell script? Either approach is possible and reasonable.

Chain them in Swift if:
- there's a modularity benefit to keeping them separate
- you might want to run them on different resources (sites)

Chain them in a shell (or similar) script if:
- they will never need to be run separately
- their run time is so short that it's better to pass the data within a shell script on local filesystems or env vars, etc.

> Thus far, I've tried placing them into a shell script, but I'm not
> sure how to make the shell scripts take arguments. How would one do
> this?

I think you're just missing something like the shell's "-c" argument.
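A minimal sketch of the calling conventions in question, runnable from a plain command line before involving Swift at all. The script name `demo.sh` and the arguments `a` and `b` are hypothetical stand-ins for the real shell script and its inputs, and `sh` is assumed to be a POSIX-style shell.

```shell
#!/bin/sh
# Hypothetical stand-in for the user's script: it only echoes its arguments.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
cat > demo.sh <<'EOF'
echo "args: $1 $2"
EOF

# Form 1: the script file is the first operand; the remaining words
# become $1, $2, ... inside the script.
out_file=$(sh demo.sh a b)

# Form 2: with -c the shell expects ONE command string. The word after
# that string becomes $0 and the remaining words become $1, $2, ...
out_c=$(sh -c 'echo "args: $1 $2"' demo a b)

# Form 3: "source" as the first operand is taken as a script file name,
# not as the builtin, so this fails unless a file called "source" exists.
if sh source demo.sh a b 2>/dev/null; then
    source_ok=yes
else
    source_ok=no
fi

echo "$out_file | $out_c | source as operand: $source_ok"
```

Form 1 is usually the simplest fit when the command line is assembled word by word, because no extra quoting of a single command string is needed.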
You should test your calling conventions from a shell command line first; then make sure that your Swift script is generating the correct command line from your app() function specification.

In your example below, where you are constructing the shell script in a file, just remove the word "source".

> app (reducefile sout) runmapreduce(shellfile shellscript, inputfile f,
> pythonfile map, pythonfile combine, pythonfile reduce){
> shell "source" @shellscript @f @map @combine @reduce
> stdout=@filename(sout);
> }
> shellfile
> mapreduce_shell_script;

Omit the argument "source" since your next argument is an actual shell script. Assuming "shell" is a tc.data entry pointing to /bin/sh etc., the shell won't take "source" as a command unless it's preceded by "-c". But then it wants a single string as an argument.

There are a few subtleties you need to work out between Swift, its quoting and command-line construction conventions, and the shell you're using, and there are a few different ways to do this. We need to document these approaches, but for now, with a bit of experimentation it's pretty easy to make such shell calling conventions work very reasonably.

- Mike

From wilde at mcs.anl.gov Tue May 22 10:34:29 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 May 2012 10:34:29 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
Message-ID: <1939140758.11164.1337700869993.JavaMail.root@zimbra.anl.gov>

Isn't this line problematic if you don't know where the wrapper script has you cd'ed to:

cp -v home/ketan/ketan_mars/MARS-LIC .
      ^^^

The relative path doesn't seem safe.
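A small sketch of why a relative path is only as reliable as the working directory it is resolved from. The directory tree is built in a temporary location and merely mimics the `home/ketan/ketan_mars/MARS-LIC` path from the wrapper; the `elsewhere` directory is hypothetical, and none of this reflects how Swift actually lays out its work directory.

```shell
#!/bin/sh
# Build a throwaway tree that mimics the relative path used in the wrapper.
top=$(mktemp -d)
mkdir -p "$top/home/ketan/ketan_mars" "$top/elsewhere"
echo "dummy licence" > "$top/home/ketan/ketan_mars/MARS-LIC"

# From the top of the tree the relative path resolves ...
if (cd "$top" && test -f home/ketan/ketan_mars/MARS-LIC); then
    from_top=found
else
    from_top=missing
fi

# ... but from any other directory the very same path does not.
if (cd "$top/elsewhere" && test -f home/ketan/ketan_mars/MARS-LIC); then
    from_elsewhere=found
else
    from_elsewhere=missing
fi

echo "from top: $from_top, from elsewhere: $from_elsewhere"
rm -rf "$top"
```

So the `cp` line is safe only if the wrapper is guaranteed to start in the directory that contains the staged `home/...` tree.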

- Mike

From ketancmaheshwari at gmail.com Tue May 22 12:01:49 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 22 May 2012 13:01:49 -0400
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <1939140758.11164.1337700869993.JavaMail.root@zimbra.anl.gov>
References: <1939140758.11164.1337700869993.JavaMail.root@zimbra.anl.gov>
Message-ID:

The line works fine because Swift creates the dir tree starting at /home but inside the swift.workdir. With -v, I could see that the file gets copied to the cwd and is present there.

So, I assume that the wrapper script is not cd'ing me anywhere. It is still a mystery why the app complains about the file not being present when run from the wrapper, yet works when run manually in the same dir.
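One way to narrow this down is to have the wrapper record what it actually sees just before the application starts, and to compare that with the manual run. The following is a minimal sketch under stated assumptions: the function name `report_env` is hypothetical, the scratch directory and dummy licence file exist only for the demonstration, and only standard shell tools are used rather than anything Swift provides.

```shell
#!/bin/sh
# report_env FILE: print the working directory and how FILE looks from there.
report_env() {
    echo "cwd: $(pwd)"
    if [ -L "$1" ]; then
        echo "$1: symbolic link"
    elif [ -f "$1" ]; then
        echo "$1: regular file"
    else
        echo "$1: not found"
    fi
}

# Demonstration in a scratch directory with a dummy licence file.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "dummy licence" > MARS-LIC
report=$(report_env MARS-LIC)
echo "$report"
```

In a real wrapper, a call such as `report_env MARS-LIC >&2` would go on the line just before the application is launched, so that the output ends up in the job's stderr.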

From wilde at mcs.anl.gov Tue May 22 12:24:55 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 May 2012 12:24:55 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
Message-ID: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov>

If that path home/ketan/ketan_mars/MARS-LIC is being correctly copied to the workdir (and I stand corrected: that's exactly what should happen), then another possibility is that the program doesn't like getting a symlink for the license file. Can you test that case externally (outside of Swift) before we go further?

You reported the problem as "...the executable still gets into error as if the licence file is not present."

The license file will appear to the MARS executable (and the wrapper script) as a symlink (from the jobdir to the workdir, to use the terminology of the Swift User Guide).

If that is indeed the problem, your wrapper script might be able to get around this with:

cp MARS-LIC tmplic
rm MARS-LIC
mv tmplic MARS-LIC

Exactly what error is MARS generating for this problem?

- Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Tue May 22 16:10:15 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 22 May 2012 17:10:15 -0400
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov>
References: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mike,

The jobdir and the workdir are the same, right? At least that is what the pwd in my marswrapper shows.

The following is the stdout section of swiftwrap:

_____________________________________________________________________________
stdout
_____________________________________________________________________________

# pwd
/amd/camel/b/ketan/ketan_mars/swift.workdir/mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork

# cp -v home/ketan/ketan_mars/MARS-LIC .
`home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC'

# The error message thrown by mars"
<**> ERROR: *** Unable to open License Date File MARS-LIC ***
===================

This is why I said Mars is running as if the licence file is not present even though it is present.

Also, I do not see any symlinks here in the workdir. They are all real files.

On Tue, May 22, 2012 at 1:24 PM, Michael Wilde wrote:

> If that path home/ketan/ketan_mars/MARS-LIC is being correctly copied to the workdir (and I stand corrected: that's exactly what should happen), then another possibility is that the program doesn't like getting a symlink for the license file. Can you test that case externally (outside of Swift) before we go further?
>
> You reported the problem as "...the executable still gets into error as if the licence file is not present."
>
> The license file will appear to the MARS executable (and the wrapper script) as a symlink (from the jobdir to the workdir, to use the terminology of the Swift User Guide).
>
> If that is indeed the problem, your wrapper script might be able to get around this with:
> cp MARS-LIC tmplic
> rm MARS-LIC
> mv tmplic MARS-LIC
>
> Exactly what error is MARS generating for this problem?
>
> - Mike

--
Ketan
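A minimal sketch of the symlink workaround suggested in this thread (copy, remove, move back), written as a reusable shell function. The function name `materialize` and the temporary-file suffix are assumptions made for illustration; only the cp/rm/mv sequence comes from the messages above. It has not been tried against MARS and only helps if the license file really is a symlink in the directory the wrapper runs in.

```shell
#!/bin/sh
# Replace a symlink with a regular-file copy of its target, in place.
# Does nothing if the path is not a symlink.
# NOTE: "materialize" is a hypothetical helper name, not part of Swift.
materialize() {
    f=$1
    if [ -L "$f" ]; then
        tmp="$f.tmp.$$"
        # cp reads through the symlink, so $tmp is a regular file
        cp "$f" "$tmp" && rm "$f" && mv "$tmp" "$f"
    fi
}

# Only act when a license file is visible in the current directory.
if [ -e MARS-LIC ]; then
    materialize MARS-LIC
fi
```

In a wrapper like the two-line one quoted in this thread, the `materialize MARS-LIC` call would go just before the line that invokes the executable.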
From jonmon at mcs.anl.gov Tue May 22 16:27:59 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Tue, 22 May 2012 16:27:59 -0500
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
References: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov>
Message-ID: <76A6ED59-1380-4BFA-AEA3-63F0E08181A7@mcs.anl.gov>

The work dir and job dir are two separate things. The work dir is where Swift sets up its work directory. The job dir is where the job is run from. The job dir is in the jobs directory under the work dir. The job dir has symlinks to the data in the shared dir.

On May 22, 2012, at 16:10, Ketan Maheshwari wrote:

> Mike,
>
> The jobdir and the workdir are the same, right? At least that is what the pwd in my marswrapper shows.
>
> The following is the stdout section of swiftwrap:
> _____________________________________________________________________________
> stdout
> _____________________________________________________________________________
>
> # pwd
> /amd/camel/b/ketan/ketan_mars/swift.workdir/mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork
>
> # cp -v home/ketan/ketan_mars/MARS-LIC .
> `home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC'
>
> # The error message thrown by mars"
> <**> ERROR: *** Unable to open License Date File MARS-LIC ***
> ===================
>
> This is why I said Mars is running as if the licence file is not present even though it is present.
>
> Also, I do not see any symlinks here in the workdir. They are all real files.
>
> On Tue, May 22, 2012 at 1:24 PM, Michael Wilde wrote:
> If that path home/ketan/ketan_mars/MARS-LIC is being correctly copied to the workdir (and I stand corrected: that's exactly what should happen), then another possibility is that the program doesn't like getting a symlink for the license file. Can you test that case externally (outside of Swift) before we go further?
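To check which of the two situations described above applies, that is, whether the directory the wrapper runs in holds symlinks or real files, a small shell helper along these lines could be called from the wrapper. The function name `list_symlinks` is hypothetical, and `readlink` is assumed to be available on the worker node (it is on typical Linux systems). This is only a diagnostic sketch and does not change how Swift lays out its directories.

```shell
#!/bin/sh
# Print every symlink found directly inside a directory, with its target.
# Regular files and subdirectories produce no output.
# NOTE: "list_symlinks" is a hypothetical helper name, not part of Swift.
list_symlinks() {
    dir=${1:-.}
    for p in "$dir"/*; do
        if [ -L "$p" ]; then
            printf '%s -> %s\n' "$p" "$(readlink "$p")"
        fi
    done
}
```

Calling `list_symlinks .` next to the existing `pwd` in the wrapper would show in the swiftwrap stdout section whether MARS-LIC and the other staged files are links or real files at the moment the executable starts.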
From ketancmaheshwari at gmail.com Tue May 22 17:25:48 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 22 May 2012 18:25:48 -0400
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <76A6ED59-1380-4BFA-AEA3-63F0E08181A7@mcs.anl.gov>
References: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov> <76A6ED59-1380-4BFA-AEA3-63F0E08181A7@mcs.anl.gov>
Message-ID:

I do not see any dir named 'jobs' in my workdir. The following is my workdir and its contents:

$ pwd
/home/ketan/ketan_mars/swift.workdir
$ ls
total 8.0K
drwxrwxr-x 5 ketan 4.0K May 22 17:00 mars-20120522-1700-a0a4l957-e-marswrap-e696rork
drwxrwxr-x 5 ketan 4.0K May 22 17:02 mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork

On Tue, May 22, 2012 at 5:27 PM, Jonathan Monette wrote:

> The work dir and job dir are two separate things. The work dir is where Swift sets up its work directory. The job dir is where the job is run from. The job dir is in the jobs directory under the work dir. The job dir has symlinks to the data in the shared dir.
>
> On May 22, 2012, at 16:10, Ketan Maheshwari wrote:
>
> Mike,
>
> The jobdir and the workdir are the same, right? At least that is what the pwd in my marswrapper shows.
>
> The following is the stdout section of swiftwrap:
>
> _____________________________________________________________________________
>
> stdout
>
> _____________________________________________________________________________
>
> # pwd
>
> /amd/camel/b/ketan/ketan_mars/swift.workdir/mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork
>
> # cp -v home/ketan/ketan_mars/MARS-LIC .
> `home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC'
>
> # The error message thrown by mars"
> <**> ERROR: *** Unable to open License Date File MARS-LIC ***
> ===================
>
> This is why I said Mars is running as if the licence file is not present even though it is present.
> > Also, I do not see any symlinks here in the workdir. They are all real > files. > > On Tue, May 22, 2012 at 1:24 PM, Michael Wilde wrote: > >> If that path home/ketan/ketan_mars/MARS-LIC is being correctly copied to >> the workdir (and I stand corrected: thats exactly what should happen) then >> another possibility is that the program doesnt like getting a symlink for >> the license file? Can you test that case externally (outside of Swift) >> before we go further? >> >> You reported the problem as "...the executable still gets into error as >> if the licence file is not present." >> >> The license file will appear to the MARS executable (and the wrapper >> script) as a symlink (from the jobdir to the workdir, to use the >> terminology f the Swift User Guide). >> >> If that is indeed the problem, your wrapper script might be able to get >> around this with: >> cp MARS-LIC tmplic >> rm MARS-LIC >> mv tmplic MARS-LIC >> >> Exactly what error is MARS generating for this problem? >> >> - Mike >> >> ----- Original Message ----- >> > From: "Ketan Maheshwari" >> > To: "Michael Wilde" >> > Cc: "Swift User" >> > Sent: Tuesday, May 22, 2012 12:01:49 PM >> > Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" >> at /home/ketan/work/worker.pl line 1349 >> > The line works fine because Swift creates the dir tree starting at >> > /home but in the swift.workdir. With -v, I could see the file gets >> > copied to the cwd and is present there. >> > >> > >> > So, I assume that the wrapper script is not cd'ing me anywhere. So, it >> > still is a mystery why the app complaint about the file not present >> > when run from wrapper and it works when run manually in the same dir. >> > >> > On Tue, May 22, 2012 at 11:34 AM, Michael Wilde < wilde at mcs.anl.gov > >> > wrote: >> > >> > >> > Isnt this line problematic if you dont know where the wrapper script >> > has you cd'ed to: >> > >> > cp -v home/ketan/ketan_mars/MARS-LIC . 
>> > ^^^ >> > >> > The relative path doesnt seem safe. >> > >> > >> > - Mike >> > >> > >> > ----- Original Message ----- >> > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > >> > > To: "Michael Wilde" < wilde at mcs.anl.gov > >> > > Cc: "Swift User" < swift-user at ci.uchicago.edu > >> > >> > >> > > Sent: Tuesday, May 22, 2012 10:18:11 AM >> > > Subject: Re: [Swift-user] Deep recursion on subroutine >> > > "main::stageout" at /home/ketan/work/ worker.pl line 1349 >> > > Looking this further, I now have a wrapper in place which copies the >> > > licence file in the cwd before running the executable. However, the >> > > executable still gets into error as if the licence file is not >> > > present. >> > > >> > > >> > > When I cd into this dir (swift.workdir/mars-20120519-1203-3l....) >> > > and >> > > manually run the executable, it works. >> > > >> > > >> > > So, the question is does the _swiftwrap.staging does some internal >> > > cd'ing before calling the executable? I will take a look inside, but >> > > would be useful if someone knows this. >> > > >> > > >> > > The wrapper script is simply the following two lines: >> > > >> > > >> > > """ >> > > cp -v home/ketan/ketan_mars/MARS-LIC . >> > > /home/ketan/ketan_mars/marsMain $1 >> > > """ >> > > >> > > >> > > Regards, >> > > Ketan >> > > >> > > >> > > On Mon, May 21, 2012 at 7:51 PM, Michael Wilde < wilde at mcs.anl.gov > >> > > wrote: >> > > >> > > >> > > Im surprised that Swift isn't setting the current working dir (cwd) >> > > to >> > > be the job dir, but perhaps that's controlled by this property: >> > > >> > > # Determines if Swift remote wrappers will be executed by specifying >> > > an >> > > # absolute path, or a path relative to the job initial working >> > > directory >> > > # >> > > # valid values: absolute, relative >> > > # wrapper.invocation.mode=absolute >> > > >> > > Can you try your script with this property set to "relative"? 
>> > > >> > > ...but looking at this further: I see that if youre using coasters >> > > with provider staging, the logic for job launch is quite different. >> > > We >> > > need to study this and get back to you. For now, best to force the >> > > right cd's with a wrapper. You might be able to remove the wrapper >> > > later, once we resolve how the job dir management should work in >> > > these >> > > various cases. >> > > >> > > >> > > - Mike >> > > >> > > >> > > ----- Original Message ----- >> > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > >> > > >> > > > To: "Michael Wilde" < wilde at mcs.anl.gov > >> > > > Cc: "Swift User" < swift-user at ci.uchicago.edu > >> > > > Sent: Monday, May 21, 2012 4:28:02 PM >> > > > Subject: Re: [Swift-user] Deep recursion on subroutine >> > > > "main::stageout" at /home/ketan/work/ worker.pl line 1349 >> > >> > >> > > > Thanks Mike. Indeed the recursion was a warning. >> > > > >> > > > >> > > > I found the problem was that the binary could not find the licence >> > > > in >> > > > the cwd from where it was being called. This is an application >> > > > requirement that the licence file must be present in the cwd from >> > > > where the call is made. >> > > > >> > > > >> > > > However, Swift makes a dirtree in the workdir, stages the files >> > > > and >> > > > calls the binary from *outside* of this tree. Is it possible to >> > > > make >> > > > swift stage the licence file and put it on the top level without >> > > > writing a wrapper to do a cp. Again, the point of not wrapping the >> > > > binary into a script is to mimic the Hadoop setup as close as >> > > > possible. >> > > > >> > > > >> > > > On Mon, May 21, 2012 at 3:35 PM, Michael Wilde < wilde at mcs.anl.gov >> > > > > >> > > > wrote: >> > > > >> > > > >> > > > Ketan, as far as I can tell, that message, coming from worker.pl , >> > > > is >> > > >> > > > just a warning. 
>> > > > >> > > > Programing Perl sec 33, Diagnostic Messages: "Deep recursion on >> > > > subroutine "%s" >> > > > >> > > > (W recursion) This subroutine has called itself (directly or >> > > > indirectly) 100 times more than it has returned. This probably >> > > > indicates an infinite recursion, unless you're writing strange >> > > > benchmark programs, in which case it indicates something else." >> > > > >> > > > The stageout code in worker.pl is indeed recursive, and the >> > > > warning >> > > > could be suppressed: >> > > > >> > > > "Try placing >> > > > >> > > > no warnings 'recursion'; >> > > > >> > > > within the same scope as that code ..." >> > > > >> > > > Can you try a simple mod to catsn, using your ext mapper, to see >> > > > if >> > > > it >> > > > is indeed failing due to the deeply recursive stageout? >> > > > >> > > > If you could dig a bit deeper into this, and see whether its >> > > > really >> > > > failing when staging back so many files or failing for some other, >> > > > or >> > > > related, reason, that would be great. >> > > > >> > > > Thanks, >> > > > >> > > > - Mike >> > > > >> > > > >> > > > >> > > > ----- Original Message ----- >> > > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > >> > > > > To: "Swift User" < swift-user at ci.uchicago.edu > >> > > > > Sent: Monday, May 21, 2012 1:54:34 PM >> > > > > Subject: [Swift-user] Deep recursion on subroutine >> > > > > "main::stageout" >> > > >> > > >> > > > > at /home/ketan/work/ worker.pl line 1349 >> > > > > Hi, >> > > > > >> > > > > >> > > > > I am trying to run the GE mars script on a bag of workstations. >> > > > > I >> > > > > tested the script for a sufficient number of tasks and seems to >> > > > > be >> > > > > working fine on localhost. >> > > > > >> > > > > >> > > > > However, it fails in this setup. 
I get the error message as >> > > > > follows >> > > > > after seemingly right invocation: >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > Find: keepalive(120), reconnect - http://128.84.97.46:41287 >> > > > > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7 >> > > > > Submitted:3 >> > > > > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 >> > > > > Active:2 >> > > > > Deep recursion on subroutine "main::stageout" at >> > > > > /home/ketan/work/ >> > > > > worker.pl line 1349. >> > > > > Deep recursion on subroutine "main::stageout" at >> > > > > /home/ketan/work/ >> > > > > worker.pl line 1349. >> > > > > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage >> > > > > out:7 >> > > > > >> > > > > >> > > > > Obviously the staging out of results fails and seems that the >> > > > > number >> > > > > of files in the stageout stage is causing the error. The >> > > > > application >> > > > > needs to stage out about 120 files. >> > > > > >> > > > > >> > > > > One solution I could quickly think of is to wrap the app in a >> > > > > shell >> > > > > and zip the outputs making it just one staged out file. >> > > > > >> > > > > >> > > > > However, the current setup would still be useful since we are >> > > > > trying >> > > > > to compare the existing Hadoop solution with the Swift one. >> > > > > >> > > > > >> > > > > Is there any possible workaround, some env setting or so that I >> > > > > could >> > > > > try and get the stageout going? 
>> > > > > >> > > > > >> > > > > The logs are: >> > > > > http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log >> > > > > and http://www.mcs.anl.gov/~ketan/workerlogs.tgz >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > Regards, -- >> > > > > Ketan >> > > > > >> > > > > >> > > > > >> > > > > _______________________________________________ >> > > > > Swift-user mailing list >> > > > > Swift-user at ci.uchicago.edu >> > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> > > > >> > > > -- >> > > > Michael Wilde >> > > > Computation Institute, University of Chicago >> > > > Mathematics and Computer Science Division >> > > > Argonne National Laboratory >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > Ketan >> > > >> > > -- >> > > Michael Wilde >> > > Computation Institute, University of Chicago >> > > Mathematics and Computer Science Division >> > > Argonne National Laboratory >> > > >> > > >> > > >> > > >> > > >> > > -- >> > > Ketan >> > >> > -- >> > Michael Wilde >> > Computation Institute, University of Chicago >> > Mathematics and Computer Science Division >> > Argonne National Laboratory >> > >> > >> > >> > >> > >> > -- >> > Ketan >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> > > > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From jonmon at mcs.anl.gov Tue May 22 17:33:13 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Tue, 22 May 2012 17:33:13 -0500
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: 
References: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov> <76A6ED59-1380-4BFA-AEA3-63F0E08181A7@mcs.anl.gov>
Message-ID: <33D5B64E-734F-49A7-B479-9B0808FFDCA4@mcs.anl.gov>

The work dir tells Swift where to put the work dir. There should be a jobs dir in one of those directories.

-------------- next part --------------
An HTML attachment was
scrubbed... URL: From hategan at mcs.anl.gov Tue May 22 17:42:16 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 22 May 2012 15:42:16 -0700 Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 In-Reply-To: References: <120576366.11470.1337707495967.JavaMail.root@zimbra.anl.gov> Message-ID: <1337726536.16795.4.camel@blabla> With provider staging the directory where stuff gets run in (i.e. job CWD) is set through the sumbmission protocol. In other words, _swiftwrap.stagiing gets run there. _swiftwrap.staging does not change directories. The environment might be different between a swift run and a manual login and run. Is there maybe something in the environment that your app uses to look up the license file? Mihael On Tue, 2012-05-22 at 17:10 -0400, Ketan Maheshwari wrote: > Mike, > > > The jobdir and the workdir are the same right? At least that is what > the pwd in my marswrapper shows. > > > The following is the stdout section of swiftwrap: > _____________________________________________________________________________ > > > stdout > _____________________________________________________________________________ > > > # pwd > /amd/camel/b/ketan/ketan_mars/swift.workdir/mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork > > > # cp -v home/ketan/ketan_mars/MARS-LIC . > `home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC' > > > # The error message thrown by mars" > <**> ERROR: *** Unable to open License Date File MARS-LIC *** > =================== > > > This is why I said Mars is running as if the licence file is not > present even though it is present. > > > Also, I do not see any symlinks here in the workdir. They are all real > files. 
> > On Tue, May 22, 2012 at 1:24 PM, Michael Wilde > wrote: > If that path home/ketan/ketan_mars/MARS-LIC is being correctly > copied to the workdir (and I stand corrected: thats exactly > what should happen) then another possibility is that the > program doesnt like getting a symlink for the license file? > Can you test that case externally (outside of Swift) before > we go further? > > You reported the problem as "...the executable still gets into > error as if the licence file is not present." > > The license file will appear to the MARS executable (and the > wrapper script) as a symlink (from the jobdir to the workdir, > to use the terminology f the Swift User Guide). > > If that is indeed the problem, your wrapper script might be > able to get around this with: > cp MARS-LIC tmplic > rm MARS-LIC > mv tmplic MARS-LIC > > Exactly what error is MARS generating for this problem? > > - Mike > > ----- Original Message ----- > > From: "Ketan Maheshwari" > > To: "Michael Wilde" > > Cc: "Swift User" > > > Sent: Tuesday, May 22, 2012 12:01:49 PM > > Subject: Re: [Swift-user] Deep recursion on subroutine > "main::stageout" at /home/ketan/work/worker.pl line 1349 > > > The line works fine because Swift creates the dir tree > starting at > > /home but in the swift.workdir. With -v, I could see the > file gets > > copied to the cwd and is present there. > > > > > > So, I assume that the wrapper script is not cd'ing me > anywhere. So, it > > still is a mystery why the app complaint about the file not > present > > when run from wrapper and it works when run manually in the > same dir. > > > > On Tue, May 22, 2012 at 11:34 AM, Michael Wilde < > wilde at mcs.anl.gov > > > wrote: > > > > > > Isnt this line problematic if you dont know where the > wrapper script > > has you cd'ed to: > > > > cp -v home/ketan/ketan_mars/MARS-LIC . > > ^^^ > > > > The relative path doesnt seem safe. 
> > > > > > - Mike > > > > > > ----- Original Message ----- > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > Cc: "Swift User" < swift-user at ci.uchicago.edu > > > > > > > > Sent: Tuesday, May 22, 2012 10:18:11 AM > > > Subject: Re: [Swift-user] Deep recursion on subroutine > > > > "main::stageout" at /home/ketan/work/ worker.pl line 1349 > > > Looking this further, I now have a wrapper in place which > copies the > > > licence file in the cwd before running the executable. > However, the > > > executable still gets into error as if the licence file is > not > > > present. > > > > > > > > > When I cd into this dir > (swift.workdir/mars-20120519-1203-3l....) > > > and > > > manually run the executable, it works. > > > > > > > > > So, the question is does the _swiftwrap.staging does some > internal > > > cd'ing before calling the executable? I will take a look > inside, but > > > would be useful if someone knows this. > > > > > > > > > The wrapper script is simply the following two lines: > > > > > > > > > """ > > > cp -v home/ketan/ketan_mars/MARS-LIC . > > > /home/ketan/ketan_mars/marsMain $1 > > > """ > > > > > > > > > Regards, > > > Ketan > > > > > > > > > On Mon, May 21, 2012 at 7:51 PM, Michael Wilde < > wilde at mcs.anl.gov > > > > wrote: > > > > > > > > > Im surprised that Swift isn't setting the current working > dir (cwd) > > > to > > > be the job dir, but perhaps that's controlled by this > property: > > > > > > # Determines if Swift remote wrappers will be executed by > specifying > > > an > > > # absolute path, or a path relative to the job initial > working > > > directory > > > # > > > # valid values: absolute, relative > > > # wrapper.invocation.mode=absolute > > > > > > Can you try your script with this property set to > "relative"? 
From wilde at mcs.anl.gov Tue May 22 18:07:19 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 May 2012 18:07:19 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <1337726536.16795.4.camel@blabla>
Message-ID: <1388466172.12176.1337728039136.JavaMail.root@zimbra.anl.gov>

And, Ketan: can you put an ls -l and pwd in your wrapper script, to get some more diagnostic info?
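A minimal sketch of what such a diagnostic wrapper might look like, assuming a POSIX shell. The function names `diagnose` and `run_with_diagnostics` are hypothetical; the license and binary paths in the usage comment are the ones quoted in this thread.

```shell
#!/bin/sh
# Print the working directory and its contents (the pwd / ls -l
# diagnostics requested above), with "#" headers to separate them.
diagnose() {
    echo "# pwd"
    pwd
    echo "# ls -l"
    ls -l
}

# Copy the license file into the cwd, dump diagnostics, then run the app.
#   $1 = path to the license file (may be relative to the job directory)
#   $2 = application binary
#   remaining arguments are passed to the application unchanged
run_with_diagnostics() {
    license=$1
    app=$2
    shift 2
    cp -v "$license" . || return 1
    diagnose
    "$app" "$@"
}

# Hypothetical use inside the wrapper script:
#   run_with_diagnostics home/ketan/ketan_mars/MARS-LIC \
#       /home/ketan/ketan_mars/marsMain "$1"
```

The diagnostics go to stdout here so that they would show up in the stdout section of the job's wrapper log; sending them to stderr instead is equally reasonable.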
----- Original Message -----
> From: "Mihael Hategan"
> To: "Ketan Maheshwari"
> Cc: "Michael Wilde" , "Swift User"
> Sent: Tuesday, May 22, 2012 5:42:16 PM
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
> With provider staging the directory where stuff gets run in (i.e. job
> CWD) is set through the sumbmission protocol. In other words,
> _swiftwrap.stagiing gets run there.
>
> _swiftwrap.staging does not change directories.
>
> The environment might be different between a swift run and a manual
> login and run. Is there maybe something in the environment that your
> app
> uses to look up the license file?
>
> Mihael

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Tue May 22 19:54:22 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 22 May 2012 20:54:22 -0400
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <1388466172.12176.1337728039136.JavaMail.root@zimbra.anl.gov>
References: <1337726536.16795.4.camel@blabla> <1388466172.12176.1337728039136.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mihael,

As far as I know there is no environment setup required before running mars.
I do see the lic file on putting ls -l in wrapper script and pwd seems to be showing the expected dir:

#pwd
/nfs2/ketan/ketan_mars/swift.workdir/mars-20120522-1933-44hycbr1-e-marswrap-et4o0prk

#cp -v
`home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC'

#ls -l
total 4
-rw-r--r-- 1 ketan collab 0 2012-05-22 19:33 3
drwxr-xr-x 3 ketan collab 3 2012-05-22 19:33 home
-rw-r--r-- 1 ketan collab 75 2012-05-22 19:33 MARS-LIC
drwxr-xr-x 2 ketan collab 3 2012-05-22 19:33 outs
drwxr-xr-x 2 ketan collab 2 2012-05-22 19:33 result0
-rw-r--r-- 1 ketan collab 0 2012-05-22 19:33 stderr.txt
-rw-r--r-- 1 ketan collab 6070 2012-05-22 19:33 _swiftwrap.staging
-rw-r--r-- 1 ketan collab 5729 2012-05-22 19:33 wrapper.log

#still the error message
<**> ERROR: *** Unable to open License Date File MARS-LIC ***

When I run Mars manually from the same dir it works:

[steamroller:mars-20120522-1933-44hycbr1-e-marswrap-et4o0prk]$ /home/ketan/ketan_mars/marsMain home/ketan/ketan_mars/ctlfiles/mars.ctl.0
# normal output

This time I tried the same setup on MCS cluster and the result is the same as with the Cornell one.

On Tue, May 22, 2012 at 7:07 PM, Michael Wilde wrote:

> And, Ketan: can you put an ls -l and pwd in your wrapper script, to get
> some more diagnostic info?
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Ketan Maheshwari"
> > Cc: "Michael Wilde" , "Swift User" <
> swift-user at ci.uchicago.edu>
> > Sent: Tuesday, May 22, 2012 5:42:16 PM
> > Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout"
> at /home/ketan/work/worker.pl line 1349
> > With provider staging the directory where stuff gets run in (i.e. job
> > CWD) is set through the sumbmission protocol. In other words,
> > _swiftwrap.stagiing gets run there.
> >
> > _swiftwrap.staging does not change directories.
> >
> > The environment might be different between a swift run and a manual
> > login and run.
> > Is there maybe something in the environment that your
> > app
> > uses to look up the license file?
> >
> > Mihael

--
Ketan
--------------
next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Tue May 22 20:37:56 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 22 May 2012 18:37:56 -0700
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
References: <1337726536.16795.4.camel@blabla> <1388466172.12176.1337728039136.JavaMail.root@zimbra.anl.gov>
Message-ID: <1337737076.19423.1.camel@blabla>

On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote:
> Mihael,
>
> As far as I know there is no environment setup required before running
> mars.

There's an easy way to check. Type env after you verified that you were
able to run the app and paste the output here.

Also, you can strace the app from the wrapper and then we can see where
it's looking for that file.

Mihael

From wilde at mcs.anl.gov Tue May 22 22:00:20 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 22 May 2012 22:00:20 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <1337737076.19423.1.camel@blabla>
Message-ID: <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov>

Thanks, Mihael - strace solved the mystery. Turns out that when Swift
runs an app, it closes stdin unless you specify stdin= in the app
command line body.

This confused the MARS app, likely around the logic where it was reading
the license file. (It seemed to be checking if stdin was a tty?) So it
was failing even before it tried to open the license file.

The remedy was to specify stdin="/dev/null" on the app cmd body.
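Whether a closed stdin and a stdin opened on /dev/null look different to
a program can be checked outside Swift. The sketch below is a minimal,
POSIX-only probe in Python, not the Swift worker's actual code: the names
CHILD, probe and _close_stdin are made up for illustration, and closing
descriptor 0 just before exec is only assumed to approximate what the
worker does. On a typical Linux system one would expect the tty check to
give the same answer in both modes, while a plain read sees end-of-file
on /dev/null but fails with an error on a closed descriptor.

```python
import os
import subprocess
import sys

# Child program: report whether fd 0 is a tty, then attempt a plain read.
CHILD = """
import os
tty = os.isatty(0)
try:
    outcome = "eof" if os.read(0, 1) == b"" else "data"
except OSError:
    outcome = "error"
print(tty, outcome)
"""


def _close_stdin():
    # Runs in the child just before exec (POSIX only).
    try:
        os.close(0)
    except OSError:
        pass  # fd 0 was already closed in the parent


def probe(mode):
    """Run CHILD with stdin 'closed' or on /dev/null; return [tty, outcome]."""
    if mode == "closed":
        # Assumption: this approximates the closed-stdin case from the thread.
        kwargs = {"preexec_fn": _close_stdin}
    else:
        kwargs = {"stdin": subprocess.DEVNULL}
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
        **kwargs
    )
    return result.stdout.split()


if __name__ == "__main__":
    for mode in ("closed", "devnull"):
        print(mode, probe(mode))
```

An application that treats a read error on stdin as fatal would fail
only in the closed case, which would be consistent with the MARS
behaviour described above.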
- Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Ketan Maheshwari" > Cc: "Michael Wilde" , "Swift User" > Sent: Tuesday, May 22, 2012 8:37:56 PM > Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 > On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote: > > Mihael, > > > > > > As far as I know there is no environment setup required before > > running > > mars. > > > > There's an easy way to check. Type env after you verified that you > were > able to run the app and paste the output here. > > Also, you can strace the app from the wrapper and then we can see > where > it's looking for that file. > > Mihael -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Tue May 22 22:03:40 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Tue, 22 May 2012 23:03:40 -0400 Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 In-Reply-To: <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov> References: <1337737076.19423.1.camel@blabla> <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov> Message-ID: Thanks a lot Mike and Mihael! On Tue, May 22, 2012 at 11:00 PM, Michael Wilde wrote: > Thanks, Mihael - strace solved the mystery. Turns out that when Swift runs > an app, it closes stdin unless you specify stdin= in the app command line > body. > > This confused the MARS app, likely around the logic where it was reading > the license file. (It seemed to be checking if stdin was a tty?) So it was > failing even before it tried to open the license file. > > The remedy was to specify stdin="/dev/null" on the app cmd body. 
> > - Mike > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Ketan Maheshwari" > > Cc: "Michael Wilde" , "Swift User" < > swift-user at ci.uchicago.edu> > > Sent: Tuesday, May 22, 2012 8:37:56 PM > > Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" > at /home/ketan/work/worker.pl line 1349 > > On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote: > > > Mihael, > > > > > > > > > As far as I know there is no environment setup required before > > > running > > > mars. > > > > > > > There's an easy way to check. Type env after you verified that you > > were > > able to run the app and paste the output here. > > > > Also, you can strace the app from the wrapper and then we can see > > where > > it's looking for that file. > > > > Mihael > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Wed May 23 08:59:32 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 23 May 2012 09:59:32 -0400 Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349 In-Reply-To: <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov> References: <1337737076.19423.1.camel@blabla> <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov> Message-ID: Just a note that this behavior (closed stdin) is not seen in the local execution provider. On Tue, May 22, 2012 at 11:00 PM, Michael Wilde wrote: > Thanks, Mihael - strace solved the mystery. Turns out that when Swift runs > an app, it closes stdin unless you specify stdin= in the app command line > body. > > This confused the MARS app, likely around the logic where it was reading > the license file. (It seemed to be checking if stdin was a tty?) So it was > failing even before it tried to open the license file. 
>
> The remedy was to specify stdin="/dev/null" on the app cmd body.
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Ketan Maheshwari"
> > Cc: "Michael Wilde" , "Swift User" <swift-user at ci.uchicago.edu>
> > Sent: Tuesday, May 22, 2012 8:37:56 PM
> > Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
> > On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote:
> > > Mihael,
> > >
> > > As far as I know there is no environment setup required before running
> > > mars.
> >
> > There's an easy way to check. Type env after you verified that you were
> > able to run the app and paste the output here.
> >
> > Also, you can strace the app from the wrapper and then we can see where
> > it's looking for that file.
> >
> > Mihael
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>

--
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Wed May 23 13:01:43 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 23 May 2012 11:01:43 -0700
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To:
References: <1337737076.19423.1.camel@blabla> <618425606.12312.1337742020310.JavaMail.root@zimbra.anl.gov>
Message-ID: <1337796103.24323.2.camel@blabla>

Good point. It probably should be there.

I first saw it in Globus somewhere. The rationale for it was that there
are applications that, given a stdin, may hang waiting for user input
(which is useless when the app is running on some remote cluster),
whereas a closed stdin might lead them to go ahead and do whatever it is
that they do without user input.

I'm assuming that Globus folks have actually seen this behavior before
putting the respective piece of code in there.
Mihael On Wed, 2012-05-23 at 09:59 -0400, Ketan Maheshwari wrote: > Just a note that this behavior (closed stdin) is not seen in the local > execution provider. > > On Tue, May 22, 2012 at 11:00 PM, Michael Wilde > wrote: > Thanks, Mihael - strace solved the mystery. Turns out that > when Swift runs an app, it closes stdin unless you specify > stdin= in the app command line body. > > This confused the MARS app, likely around the logic where it > was reading the license file. (It seemed to be checking if > stdin was a tty?) So it was failing even before it tried to > open the license file. > > The remedy was to specify stdin="/dev/null" on the app cmd > body. > > - Mike > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > To: "Ketan Maheshwari" > > Cc: "Michael Wilde" , "Swift User" > > > > Sent: Tuesday, May 22, 2012 8:37:56 PM > > Subject: Re: [Swift-user] Deep recursion on subroutine > "main::stageout" at /home/ketan/work/worker.pl line 1349 > > > On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote: > > > Mihael, > > > > > > > > > As far as I know there is no environment setup required > before > > > running > > > mars. > > > > > > > There's an easy way to check. Type env after you verified > that you > > were > > able to run the app and paste the output here. > > > > Also, you can strace the app from the wrapper and then we > can see > > where > > it's looking for that file. 
> >
> > Mihael
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> --
> Ketan
>

From wilde at mcs.anl.gov Wed May 23 13:10:14 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 23 May 2012 13:10:14 -0500 (CDT)
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <1337796103.24323.2.camel@blabla>
Message-ID: <989792367.13127.1337796614038.JavaMail.root@zimbra.anl.gov>

Do you have a sense as to what's better: a closed stdin, or a stdin
opened to /dev/null?

In this case, the app worked OK with stdin redirected from /dev/null,
but failed with the closed stdin.

I guess we could leave the closed stdin semantics as-is, but make them
consistent across all providers, and remedy apps that fail in that mode
with the same solution we used here, stdin="/dev/null" on the app cmd
line.

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Ketan Maheshwari"
> Cc: "Michael Wilde" , "Swift User"
> Sent: Wednesday, May 23, 2012 1:01:43 PM
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
> Good point. It probably should be there.
>
> I first saw it in Globus somewhere. The rationale for it was that there
> are applications that, given a stdin, may hang waiting for user input
> (which is useless when the app is running on some remote cluster),
> whereas a closed stdin might lead them to go ahead and do whatever
> it is that they do without user input.
>
> I'm assuming that Globus folks have actually seen this behavior before
> putting the respective piece of code in there.
>
> Mihael
>
> On Wed, 2012-05-23 at 09:59 -0400, Ketan Maheshwari wrote:
> > Just a note that this behavior (closed stdin) is not seen in the local
> > execution provider.
> > > > On Tue, May 22, 2012 at 11:00 PM, Michael Wilde > > wrote: > > Thanks, Mihael - strace solved the mystery. Turns out that > > when Swift runs an app, it closes stdin unless you specify > > stdin= in the app command line body. > > > > This confused the MARS app, likely around the logic where it > > was reading the license file. (It seemed to be checking if > > stdin was a tty?) So it was failing even before it tried to > > open the license file. > > > > The remedy was to specify stdin="/dev/null" on the app cmd > > body. > > > > - Mike > > > > > > ----- Original Message ----- > > > > > From: "Mihael Hategan" > > > To: "Ketan Maheshwari" > > > Cc: "Michael Wilde" , "Swift User" > > > > > > > Sent: Tuesday, May 22, 2012 8:37:56 PM > > > Subject: Re: [Swift-user] Deep recursion on subroutine > > "main::stageout" at /home/ketan/work/worker.pl line 1349 > > > > > On Tue, 2012-05-22 at 20:54 -0400, Ketan Maheshwari wrote: > > > > Mihael, > > > > > > > > > > > > As far as I know there is no environment setup required > > before > > > > running > > > > mars. > > > > > > > > > > There's an easy way to check. Type env after you verified > > that you > > > were > > > able to run the app and paste the output here. > > > > > > Also, you can strace the app from the wrapper and then we > > can see > > > where > > > it's looking for that file. 
> > >
> > > Mihael
> >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > --
> > Ketan
> >

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Wed May 23 13:36:00 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 23 May 2012 11:36:00 -0700
Subject: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
In-Reply-To: <989792367.13127.1337796614038.JavaMail.root@zimbra.anl.gov>
References: <989792367.13127.1337796614038.JavaMail.root@zimbra.anl.gov>
Message-ID: <1337798160.25244.4.camel@blabla>

On Wed, 2012-05-23 at 13:10 -0500, Michael Wilde wrote:
> Do you have a sense as to what's better: a closed stdin, or a stdin opened to /dev/null?

For programs that try to open a tty it probably won't make a difference
(since no tty will be there for < /dev/null). For programs that read
plainly from stdin, I don't know. Maybe we should test this.

From iraicu at cs.iit.edu Sun May 27 18:58:05 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sun, 27 May 2012 18:58:05 -0500
Subject: [Swift-user] CFP: 8th IEEE International Conference on eScience -- Chicago IL USA, October 8th-12 2012
Message-ID: <4FC2BF8D.4060400@cs.iit.edu>

CALL FOR PAPERS

8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools,
techniques and practices, often in communities and projects that span
disciplines, laboratories, organizations, and national boundaries. The
eScience 2012 conference is designed to bring together leading
international and interdisciplinary research communities, developers,
and users of eScience applications and enabling IT technologies.
The conference serves as a forum to present the results of the latest
applications research and product/tool developments and to highlight
related activities from around the world. Also, we are now entering the
second decade of eScience and the 2012 conference gives an opportunity
to take stock of what has been achieved so far and look forward to the
challenges and opportunities the next decade will bring.

A special emphasis of the 2012 conference is on advances in the
application of technology in a particular discipline. Accordingly,
significant advances in applications science and technology will be
considered as important as the development of new technologies
themselves. Further, we welcome contributions in educational activities
under any of these disciplines.

As a result, the conference will be structured around two eScience
tracks:

* eScience Algorithms and Applications
    * eScience application areas, including:
        * Physical sciences
        * Biomedical sciences
        * Social sciences and humanities
    * Data-oriented approaches and applications
    * Compute-oriented approaches and applications
    * Extreme scale approaches and applications
* Cyberinfrastructure to support eScience
    * Novel hardware
    * Novel uses of production infrastructure
    * Software and services
    * Tools

The conference proceedings will be published by the IEEE Computer
Society Press, USA and will be made available online through the IEEE
Digital Library. Authors of selected papers will be invited to submit
extended versions to a special issue of the Future Generation Computer
Systems (FGCS) journal.

SUBMISSION PROCESS

Authors are invited to submit papers with unpublished, original work of
not more than 8 pages of double column text using single spaced 10 point
size on 8.5 x 11 inch pages, as per IEEE 8.5 x 11 manuscript guidelines.
(Up to 2 additional pages may be purchased for US$150/page.) Templates
are available from
http://www.ieee.org/conferences_events/conferences/publishing/templates.html.

Authors should submit a PDF file that will print on a PostScript printer
to https://www.easychair.org/conferences/?conf=escience2012
(Note that paper submitters also must submit an abstract in advance of
the paper deadline. This should be done through the same site where
papers are submitted.)

It is a requirement that at least one author of each accepted paper
attend the conference.

IMPORTANT DATES

Abstract submission (required): 4 July 2012
Paper submission: 11 July 2012
Paper author notification: 22 August 2012
Camera-ready papers due: 10 September 2012
Conference: 8-12 October 2012

CONFERENCE ORGANIZATION

General Chair
Ian Foster, University of Chicago & Argonne National Laboratory, USA

Program Co-Chairs
Daniel S. Katz, University of Chicago & Argonne National Laboratory, USA
Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland

Program Vice Co-Chairs
eScience Algorithms and Applications Track
David Abramson, Monash University, Australia
Gabrielle Allen, Louisiana State University, USA
Cyberinfrastructure to support eScience Track
Rosa M. Badia, Barcelona Supercomputing Center / CSIC, Spain
Geoffrey Fox, Indiana University, USA

Sponsorship Chair
Charlie Catlett, Argonne National Laboratory, USA

Conference Manager and Finance Chair
Julie Wulf-Knoerzer, University of Chicago & Argonne National Laboratory, USA

Publicity Chairs
Kento Aida, National Institute of Informatics, Japan
Ioan Raicu, Illinois Institute of Technology, USA
David Wallom, Oxford e-Research Centre, UK

Local Organizing Committee
Ninfa Mayorga, University of Chicago, USA
Evelyn Rayburn, University of Chicago, USA
Lynn Valentini, Argonne National Laboratory, USA

Program Committee

eScience Algorithms and Applications Track
Srinivas Aluru, Iowa State University, USA
Ashiq Anjum, University of Derby, UK
David A. Bader, Georgia Institute of Technology, USA
Jon Blower, University of Reading, UK
Paul Bonnington, Monash University, Australia
Simon Cox, University of Southampton, UK
David De Roure, Oxford e-Research Centre, UK
George Djorgovski, California Institute of Technology, USA
Anshu Dubey, University of Chicago & Argonne National Laboratory, USA
Yuri Estrin, Monash University, Australia
Dan Fay, Microsoft, USA
Jeremy Frey, University of Southampton, UK
Wolfgang Gentzsch, HPC Consultant, Germany
Lutz Gross, The University of Queensland, Australia
Sverker Holmgren, Uppsala University, Sweden
Bill Howe, University of Washington, USA
Marina Jirotka, University of Oxford, UK
Timoleon Kipouros, University of Cambridge, UK
Kerstin Kleese van Dam, Pacific Northwest National Laboratory, USA
Arun S. Konagurthu, Monash University, Australia
Peter Kunszt, SystemsX.ch, Switzerland
Alexey Lastovetsky, University College Dublin, Ireland
Andrew Lewis, Griffith University, Australia
Sergio Maffioletti, University of Zurich, Switzerland
Amitava Majumdar, San Diego Supercomputer Center, University of California at San Diego, USA
Rui Mao, Shenzhen University, China
Madhav V. Marathe, Virginia Tech, USA
Maryann Martone, University of California at San Diego, USA
Louis Moresi, Monash University, Australia
Riccardo Murri, University of Zurich, Switzerland
Silvia D. Olabarriaga, Academic Medical Center of the University of Amsterdam, Netherlands
Enrique S. Quintana-Ortí, Universidad Jaume I, Spain
Abani Patra, University at Buffalo, USA
Rob Pennington, NSF, USA
Andrew Perry, Monash University, Australia
Beth Plale, Indiana University, USA
Michael Resch, University of Stuttgart, Germany
Adrian Sandu, Virginia Tech, USA
Mark Savill, Cranfield University, UK
Erik Schnetter, Perimeter Institute for Theoretical Physics, Canada
Edward Seidel, Louisiana State University, USA
Suzanne M. Shontz, The Pennsylvania State University, USA
David Skinner, Lawrence Berkeley National Laboratory, USA
Alan Sussman, University of Maryland, USA
Alex Szalay, Johns Hopkins University, USA
Domenico Talia, ICAR-CNR & University of Calabria, Italy
Jian Tao, Louisiana State University, USA
David Wallom, Oxford e-Research Centre, UK
Shaowen Wang, University of Illinois at Urbana-Champaign, USA
Michael Wilde, Argonne National Laboratory & University of Chicago, USA
Nancy Wilkins-Diehr, San Diego Supercomputer Center, University of California at San Diego, USA
Wu Zhang, Shanghai University, China
Yunquan Zhang, Chinese Academy of Sciences, China

Cyberinfrastructure to support eScience Track
Deb Agarwal, Lawrence Berkeley National Laboratory, USA
Ilkay Altintas, San Diego Supercomputer Center, University of California at San Diego, USA
Henri Bal, Vrije Universiteit, Netherlands
Roger Barga, Microsoft, USA
Martin Berzins, University of Utah, USA
John Brooke, University of Manchester, UK
Thomas Fahringer, University of Innsbruck, Austria
Gilles Fedak, INRIA, France
José A. B. Fortes, University of Florida, USA
Yolanda Gil, ISI/USC, USA
Madhusudhan Govindaraju, SUNY Binghamton, USA
Thomas Hacker, Purdue University, USA
Ken Hawick, Massey University, New Zealand
Marty Humphrey, University of Virginia, USA
Hai Jin, Huazhong University of Science and Technology, China
Thilo Kielmann, Vrije Universiteit, Netherlands
Scott Klasky, Oak Ridge National Laboratory, USA
Isao Kojima, AIST, Japan
Tevfik Kosar, University at Buffalo, USA
Dieter Kranzlmueller, LMU & LRZ Munich, Germany
Erwin Laure, KTH, Sweden
Jysoo Lee, KISTI, Korea
Li Xiaoming, Peking University, China
Bertram Ludäscher, University of California, Davis, USA
Andrew Lumsdaine, Indiana University, USA
Tanu Malik, University of Chicago, USA
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Reagan Moore, University of North Carolina at Chapel Hill, USA
Shirley Moore, University of Kentucky, USA
Steven Newhouse, EGI, Netherlands
Dhabaleswar K. (DK) Panda, The Ohio State University, USA
Manish Parashar, Rutgers University, USA
Ron Perrott, University of Oxford, UK
Depei Qian, Beihang University, China
Judy Qiu, Indiana University, USA
Ioan Raicu, Illinois Institute of Technology, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
Omer Rana, Cardiff University, UK
Paul Roe, Queensland University of Technology, Australia
Bruno Schulze, LNCC, Brazil
Marc Snir, Argonne National Laboratory & University of Illinois at Urbana-Champaign, USA
Xian-He Sun, Illinois Institute of Technology, USA
Yoshio Tanaka, AIST, Japan
Michela Taufer, University of Delaware, USA
Kerry Taylor, CSIRO, Australia
Douglas Thain, University of Notre Dame, USA
Paul Watson, Newcastle University, UK
Jun Zhao, University of Oxford, UK

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From clberger at cs.uchicago.edu Sun May 20 23:35:46 2012
From: clberger at cs.uchicago.edu (Carsen Berger)
Date: Mon, 21 May 2012 04:35:46 -0000
Subject: [Swift-user] Timing Swift runs
Message-ID:

Hello again,

I need to get performance numbers to benchmark various Swift jobs on a
generic bag of workstations. Is there some easy way to do this, e.g.
perhaps a mechanism built into Swift that allows it to report how long
an execution took? Or would I need to come up with a site-specific
solution?

Thank you,
Carsen Berger
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
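Independently of whatever Swift itself may report, one site-neutral way
to get a rough number is to time the whole run from the outside. The
sketch below is only an illustration in Python, not a Swift feature:
time_command is a made-up helper, the command it times is a trivial
placeholder, and it measures the wall-clock time of the entire
invocation (startup and queueing included), not per-app execution times.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Placeholder workload. For a real measurement, substitute the command
    # line normally used to launch the Swift script (an assumption here).
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print("exit=%d wall=%.3fs" % (code, seconds))
```

Repeating the measurement several times and comparing the spread gives
a better picture than a single run, since workstation load varies.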