From jon.monette at gmail.com Sun Oct 3 13:27:16 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 03 Oct 2010 13:27:16 -0500
Subject: [Swift-user] PADS
Message-ID: <4CA8CB04.2000708@gmail.com>

Hello,
Anyone having a problem using Swift on PADS? I updated Swift and
cog to the most recent from trunk and now I cannot compile Swift on
PADS. I have to use bridled or another ci machine that shares the
filesystem and compile there. I then come back to PADS to execute my
swift script and get all sorts of errors. Is anyone experiencing
similar problems when using PADS?

--
Jon

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein

From hategan at mcs.anl.gov Sun Oct 3 13:30:10 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 03 Oct 2010 11:30:10 -0700
Subject: [Swift-user] PADS
In-Reply-To: <4CA8CB04.2000708@gmail.com>
References: <4CA8CB04.2000708@gmail.com>
Message-ID: <1286130610.1951.0.camel@blabla2.none>

On Sun, 2010-10-03 at 13:27 -0500, Jonathan Monette wrote:
> Hello,
> Anyone having a problem using Swift on PADS? I updated Swift and
> cog to the most recent from trunk and now I cannot compile Swift on
> PADS.

I made some recent commits which might be the cause. But I need specific
errors.

Mihael

> I have to use bridled or another ci machine that shares the
> filesystem and compile there. I then come back to PADS to execute my
> swift script and get all sorts of errors. Is anyone experiencing
> similar problems when using PADS?
>

From jon.monette at gmail.com Sun Oct 3 13:44:04 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 03 Oct 2010 13:44:04 -0500
Subject: [Swift-user] PADS
In-Reply-To: <1286130610.1951.0.camel@blabla2.none>
References: <4CA8CB04.2000708@gmail.com> <1286130610.1951.0.camel@blabla2.none>
Message-ID: <4CA8CEF4.6080703@gmail.com>

Here is the compile error:
generateVersion:

antlr:
[java] ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
[java] resources/swiftscript.g:1028: warning:nondeterminism upon
[java] resources/swiftscript.g:1028: k==1:LBRACK
[java] resources/swiftscript.g:1028: k==2:ID,STRING_LITERAL,LBRACK,LPAREN,AT,PLUS,MINUS,STAR,NOT,INT_LITERAL,FLOAT_LITERAL,"true","false"
[java] resources/swiftscript.g:1028: between alt 1 and exit branch of block

compileSchema:
[java] IO Error java.io.FileNotFoundException: /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/src/swiftscript.xsd (No such file or directory)
[java] Time to build schema type system: 0.559 seconds
[java] Exception in thread "main" org.apache.xmlbeans.SchemaTypeLoaderException: /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/system/s4846B13C10E24B6C12C8DCBE3348DA75/procedure8537type.xsb (No such file or directory) (schemaorg_apache_xmlbeans.system.s4846B13C10E24B6C12C8DCBE3348DA75.procedure8537type) - code 9
[java] at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.getSaverStream(SchemaTypeSystemImpl.java:2214)
[java] at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.writeRealHeader(SchemaTypeSystemImpl.java:1589)
[java] at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveType(SchemaTypeSystemImpl.java:1440)
[java] at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveTypesRecursively(SchemaTypeSystemImpl.java:1316)
[java] at org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.save(SchemaTypeSystemImpl.java:1291)
[java] at org.apache.xmlbeans.impl.tool.SchemaCompiler.compile(SchemaCompiler.java:1098)
[java] at org.apache.xmlbeans.impl.tool.SchemaCompiler.main(SchemaCompiler.java:368)

BUILD FAILED
/autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build.xml:247: Java returned: 1

and here is the run error I receive once I compile on a different machine:
Failed to transfer wrapper log from unrectified-20101003-1339-voon0t62/info/l on pads
Execution failed:
Failed to transfer wrapper log from unrectified-20101003-1339-voon0t62/info/8 on pads
Exception in mProject:
Arguments: [-X, raw_dir/2mass-atlas-000713s-j0760245.fits, proj_dir/proj_2mass-atlas-000713s-j0760245.fits, header.hdr]
Host: pads
Directory: unrectified-20101003-1339-voon0t62/jobs/l/mProject-lvknimzj
stderr.txt:

stdout.txt:

----

Caused by:
Task failed:
org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Exitcode file (/home/jonmon/.globus/scripts/PBS6388672747278247642.submit.exitcode) not found 5 queue polls after the job was reported done
    at org.globus.cog.abstraction.impl.scheduler.common.Job.close(Job.java:66)
    at org.globus.cog.abstraction.impl.scheduler.common.Job.setState(Job.java:177)
    at org.globus.cog.abstraction.impl.scheduler.pbs.QueuePoller.processStdout(QueuePoller.java:126)
    at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:169)
    at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:82)
    at java.lang.Thread.run(Thread.java:619)

The run directory with the files needed to execute and log files is in
~jonmon/Workspace/Swift/Montage/katz_slides_test/run.0001

On 10/3/10 1:30 PM, Mihael Hategan wrote:
> On Sun, 2010-10-03 at 13:27 -0500, Jonathan Monette wrote:
>> Hello,
>> Anyone having a problem using Swift on PADS? I updated Swift and
>> cog to the most recent from trunk and now I cannot compile Swift on
>> PADS.
> I made some recent commits which might be the cause.
> But I need specific
> errors.
>
> Mihael
>
>> I have to use bridled or another ci machine that shares the
>> filesystem and compile there. I then come back to PADS to execute my
>> swift script and get all sorts of errors. Is anyone experiencing
>> similar problems when using PADS?
>>
>

--
Jon

From hategan at mcs.anl.gov Sun Oct 3 14:17:51 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 03 Oct 2010 12:17:51 -0700
Subject: [Swift-user] PADS
In-Reply-To: <4CA8CEF4.6080703@gmail.com>
References: <4CA8CB04.2000708@gmail.com> <1286130610.1951.0.camel@blabla2.none> <4CA8CEF4.6080703@gmail.com>
Message-ID: <1286133471.1951.1.camel@blabla2.none>

Ok. I don't think that's related to my commits.

On Sun, 2010-10-03 at 13:44 -0500, Jonathan Monette wrote:
> Here is the compile error:
> [...]

From jon.monette at gmail.com Sun Oct 3 19:26:47 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 03 Oct 2010 19:26:47 -0500
Subject: [Swift-user] PADS
In-Reply-To: <1286133471.1951.1.camel@blabla2.none>
References: <4CA8CB04.2000708@gmail.com> <1286130610.1951.0.camel@blabla2.none> <4CA8CEF4.6080703@gmail.com> <1286133471.1951.1.camel@blabla2.none>
Message-ID: <4CA91F47.9080805@gmail.com>

I am still not certain why I cannot compile Swift on the head node of PADS, but I ran across this error in my runs:

Worker task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Could not submit job (qsub reported an exit code of 1).
Error: /home/jonmon/.globus/scripts/PBS807550213750625026.submit
Unknown parameters, or invalid PBS script location
Please contact pads-support at ci.uchicago.edu
Qsub options:
usage: qsub [-a date_time] [-A account_string] [-b secs]
      [-c [ none | { enabled | periodic | shutdown | depth= | dir= | interval=}... ]
      [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I]
      [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list]
      [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
      [-r y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list]
      [-w] path [-W otherattributes=value...]
      [-v variable_list] [-V ] [-x] [-X] [-z] [script]
Additional site options:
      [-h | --help] display usage
Detailed information available at: http://www.ci.uchicago.edu/wiki/bin/view/PADS

    at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
    at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:56)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 1).
Error: /home/jonmon/.globus/scripts/PBS807550213750625026.submit
Unknown parameters, or invalid PBS script location
Please contact pads-support at ci.uchicago.edu
Qsub options:
usage: qsub [-a date_time] [-A account_string] [-b secs]
      [-c [ none | { enabled | periodic | shutdown | depth= | dir= | interval=}... ]
      [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I]
      [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list]
      [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
      [-r y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list]
      [-w] path [-W otherattributes=value...]
      [-v variable_list] [-V ] [-x] [-X] [-z] [script]
Additional site options:
      [-h | --help] display usage
Detailed information available at: http://www.ci.uchicago.edu/wiki/bin/view/PADS

    at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:102)
    at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
    ... 3 more

I got a bunch of "failed to shutdown block" messages and then got this error. Attached is the stdout from that run. I also noticed that my run directories contain several PBS*.submit* files.
The PBS*.submit.e* files seem to complain that a certain file can't be found. This is the error they report:

zsh: no such file or directory: /var/spool/torque/mom_priv/jobs/505703.svc.pads.ci.uchicago.edu.SC

Does this new information help pin down the problem with PADS? Is this a system problem, or has a new bug appeared in Swift?

On 10/03/2010 02:17 PM, Mihael Hategan wrote:
> Ok. I don't think that's related to my commits.
> [...]

--
Jon
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: swift.out
URL:

From hategan at mcs.anl.gov Sun Oct 3 19:44:43 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 03 Oct 2010 17:44:43 -0700
Subject: [Swift-user] PADS
In-Reply-To: <4CA91F47.9080805@gmail.com>
References: <4CA8CB04.2000708@gmail.com> <1286130610.1951.0.camel@blabla2.none> <4CA8CEF4.6080703@gmail.com> <1286133471.1951.1.camel@blabla2.none> <4CA91F47.9080805@gmail.com>
Message-ID: <1286153083.4191.4.camel@blabla2.none>

Can you set debug=true in etc/provider-pbs.properties and capture a submit script?
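The request to set debug=true and capture a submit script comes down to two steps: edit etc/provider-pbs.properties, then rerun and grab one of the PBS submit scripts the provider writes. The Python sketch below is a hypothetical helper for both steps; it is not part of Swift or CoG. It assumes that the properties file uses plain one-line key=value entries, that submit scripts are written to ~/.globus/scripts with names like PBS807550213750625026.submit (the pattern in the errors quoted in this thread), and that the newest such file belongs to the failing run. Whether the provider keeps these scripts only in debug mode is an assumption the thread does not confirm.

```python
import glob
import os
import shutil

# Hypothetical helpers for the debugging steps suggested in this thread;
# they are not part of Swift or CoG.


def set_property(path, key, value):
    """Set key=value in a Java-style .properties file, creating it if needed.

    Assumes simple one-line `key=value` entries (no ':' separators or
    backslash continuations). Every uncommented line that defines `key`
    is rewritten, because a later duplicate key overrides an earlier one
    when Java loads a properties file. Returns the new lines.
    """
    lines = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()

    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave blank lines, '#'/'!' comments and non key=value lines alone.
        if not stripped or stripped[0] in "#!" or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            found = True

    if not found:
        lines.append(f"{key}={value}")

    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


def newest_submit_script(scripts_dir):
    """Return the most recently modified PBS*.submit file, or None.

    Assumes the naming seen in the logs (e.g. PBS807550213750625026.submit).
    The glob matches whole names, so PBS*.submit.exitcode and similar
    companion files are not picked up.
    """
    candidates = glob.glob(os.path.join(scripts_dir, "PBS*.submit"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def capture_submit_script(scripts_dir, dest_dir):
    """Copy the newest submit script into dest_dir; return the copy's path or None."""
    src = newest_submit_script(scripts_dir)
    if src is None:
        return None
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(src, dest_dir)
```

A possible sequence: call set_property("etc/provider-pbs.properties", "debug", "true") with the path adjusted to the Swift installation in use, rerun the workflow, then call capture_submit_script(os.path.expanduser("~/.globus/scripts"), "pbs-debug") and attach the copied file to a reply. The pbs-debug directory name is arbitrary.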
Mihael

On Sun, 2010-10-03 at 19:26 -0500, Jonathan Monette wrote:
> I am still not certain why I cannot compile Swift on the head node of
> PADS but I ran across this error in my runs.
> [...]

From wilde at mcs.anl.gov Sun Oct 3 20:08:22 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 3 Oct 2010 19:08:22 -0600 (GMT-06:00)
Subject: [Swift-user] PADS
In-Reply-To: <1286153083.4191.4.camel@blabla2.none>
Message-ID: <897688021.631571286154502611.JavaMail.root@zimbra.anl.gov>

I was able to run a PBS job on PADS using Swift trunk just now.

Can you also post your sites.xml file, Jon? Mine was:

fast 00:05:00 /home/wilde/swiftwork

- Mike

----- "Mihael Hategan" wrote:

> Can you set debug=true in etc/provider-pbs.properties and capture a
> submit script?
>
> Mihael
>
> On Sun, 2010-10-03 at 19:26 -0500, Jonathan Monette wrote:
> > I am still not certain why I cannot compile Swift on the head node of
> > PADS but I ran across this error in my runs.Worker task failed: Error
> > submitting block task
> > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> > Cannot submit job: Could not submit job (qsub reported an exit code of 1).
> > [...]
> >> On 10/3/10 1:30 PM, Mihael Hategan wrote: > > >>> On Sun, 2010-10-03 at 13:27 -0500, Jonathan Monette wrote: > > >>>> Hello, > > >>>> Anyone having a problem using Swift on PADS? I updated > Swift and > > >>>> cog to the most recent from trunk and now I cannot compile > Swift on > > >>>> PADS. > > >>> I made some recent commits which might be the cause. But I need > specific > > >>> errors. > > >>> > > >>> Mihael > > >>> > > >>>> I have to use bridled or another ci machine that shares > the > > >>>> filesystem and compile there. I then come back to PADS to > execute my > > >>>> swift script and get all sorts of errors. Is anyone > experiencing > > >>>> similar problems when using PADS? > > >>>> > > >> -- > > >> Jon > > >> > > >> Computers are incredibly fast, accurate, and stupid. Human beings > are incredibly slow, inaccurate, and brilliant. Together they are > powerful beyond imagination. > > >> - Albert Einstein > > >> > > > > > > > -- > > Jon > > > > Computers are incredibly fast, accurate, and stupid. Human beings > are incredibly slow, inaccurate, and brilliant. Together they are > powerful beyond imagination. > > - Albert Einstein > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Sun Oct 3 20:45:57 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 03 Oct 2010 20:45:57 -0500 Subject: [Swift-user] PADS In-Reply-To: <897688021.631571286154502611.JavaMail.root@zimbra.anl.gov> References: <897688021.631571286154502611.JavaMail.root@zimbra.anl.gov> Message-ID: <4CA931D5.7020709@gmail.com> I am running jobs using coasters. Here is my sites file. 
.05 /gpfs/pads/swift/jonmon/Swift/work/localhost 3600 192.5.86.6 1 10 1 1 fast 1 10000 /gpfs/pads/swift/jonmon/Swift/work/pads

This is a PBS submit.o file:

----------------------------------------
Begin PBS Prologue Sun Oct 3 19:15:42 CDT 2010
Job ID: 505699.svc.pads.ci.uchicago.edu
Username: jonmon
Group: ci-users
Nodes: c06.pads.ci.uchicago.edu
End PBS Prologue Sun Oct 3 19:15:42 CDT 2010
----------------------------------------
----------------------------------------
Begin PBS Epilogue Sun Oct 3 19:15:43 CDT 2010
Job ID: 505699.svc.pads.ci.uchicago.edu
Username: jonmon
Group: ci-users
Job Name: PBS1275966776980450327.submit
Session: 30684
Limits: ncpus=1,neednodes=1,nodes=1,size=1,walltime=00:18:00
Resources: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:01
Nodes: c06.pads.ci.uchicago.edu
End PBS Epilogue Sun Oct 3 19:15:43 CDT 2010
----------------------------------------

This is using the most recent trunk of Swift and cog. I did not change any of the Swift code; I merely updated the library and recompiled.

On 10/03/2010 08:08 PM, Michael Wilde wrote: > I was able to run a pbs job on pads using swift trunk just now. > > Can you also post you sites.xml file, Jon? > > Mine was: > > > > > fast > 00:05:00 > > /home/wilde/swiftwork > > > > > - Mike > > > ----- "Mihael Hategan" wrote: > >> Can you set debug=true in etc/provider-pbs.properties and capture a >> submit script? >> >> Mihael >> >> On Sun, 2010-10-03 at 19:26 -0500, Jonathan Monette wrote: >>> I am still not certain why I cannot compile Swift on the head node >> of >>> PADS but I ran across this error in my runs.Worker task failed: >> Error >>> submitting block task >>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>> Cannot submit job: Could not submit job (qsub reported an exit code >> of 1).
>>> Error: >> /home/jonmon/.globus/scripts/PBS807550213750625026.submitUnknown >>> parameters, or invalid PBS script locationPlease contact >>> pads-support at ci.uchicago.eduQsub options: usage: qsub [-a date_time] >> [-A >>> account_string] [-b secs] [-c [ none | { enabled | periodic | >>> shutdown | depth= | dir= | interval=}... ] >> >>> [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I] >> [-j >>> oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list] >> [-N >>> jobname] [-o path] [-p priority] [-P proxy_user] [-q queue] >> [-r >>> y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] >>> path [-W otherattributes=value...] [-v variable_list] [-V ] >> [-x] >>> [-X] [-z] [script]Additional site options: [-h | --help] display >>> usageDetailed information available at: >>> http://www.ci.uchicago.edu/wiki/bin/view/PADS >>> >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) >>> at >>> >> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) >>> at >>> >> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:56) >>> at >>> >> org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66) >>> Caused by: >>> org.globus.cog.abstraction.impl.scheduler.common.ProcessException: >> Could >>> not submit job (qsub reported an exit code of 1). >>> Error: >> /home/jonmon/.globus/scripts/PBS807550213750625026.submitUnknown >>> parameters, or invalid PBS script locationPlease contact >>> pads-support at ci.uchicago.eduQsub options: usage: qsub [-a date_time] >> [-A >>> account_string] [-b secs] [-c [ none | { enabled | periodic | >>> shutdown | depth= | dir= | interval=}... 
] >> >>> [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I] >> [-j >>> oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list] >> [-N >>> jobname] [-o path] [-p priority] [-P proxy_user] [-q queue] >> [-r >>> y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] >>> path [-W otherattributes=value...] [-v variable_list] [-V ] >> [-x] >>> [-X] [-z] [script]Additional site options: [-h | --help] display >>> usageDetailed information available at: >>> http://www.ci.uchicago.edu/wiki/bin/view/PADS >>> >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:102) >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) >>> ... 3 more >>> >>> I got a bunch of failed to shutdown block and then got this error. >>> Attached is the stdout from that run. I also noticed that inside my >> run >>> directories several PBS*.submit* files are in there. In the >>> PBS*.submit.e* files they seem to be complaining that they can't >> find a >>> certain file. This is the error that they are reporting: >>> zsh: no such file or directory: >>> /var/spool/torque/mom_priv/jobs/505703.svc.pads.ci.uchicago.edu.SC >>> >>> Does this new information help deduce what the problem is with PADS? >> Is >>> this a system problem or has a new bug appeared in Swift? >>> >>> On 10/03/2010 02:17 PM, Mihael Hategan wrote: >>>> Ok. I don't think that's related to my commits. 
>>>> >>>> On Sun, 2010-10-03 at 13:44 -0500, Jonathan Monette wrote: >>>>> Here is the compile error: >>>>> generateVersion: >>>>> >>>>> antlr: >>>>> [java] ANTLR Parser Generator Version 2.7.5 (20050128) >>>>> 1989-2005 jGuru.com >>>>> [java] resources/swiftscript.g:1028: >> warning:nondeterminism upon >>>>> [java] resources/swiftscript.g:1028: k==1:LBRACK >>>>> [java] resources/swiftscript.g:1028: >>>>> >> k==2:ID,STRING_LITERAL,LBRACK,LPAREN,AT,PLUS,MINUS,STAR,NOT,INT_LITERAL,FLOAT_LITERAL,"true","false" >>>>> [java] resources/swiftscript.g:1028: between alt 1 and >> exit >>>>> branch of block >>>>> >>>>> compileSchema: >>>>> [java] IO Error java.io.FileNotFoundException: >>>>> >> /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/src/swiftscript.xsd >>>>> (No such file or directory) >>>>> [java] Time to build schema type system: 0.559 seconds >>>>> [java] Exception in thread "main" >>>>> org.apache.xmlbeans.SchemaTypeLoaderException: >>>>> >> /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/system/s4846B13C10E24B6C12C8DCBE3348DA75/procedure8537type.xsb >>>>> (No such file or directory) >>>>> >> (schemaorg_apache_xmlbeans.system.s4846B13C10E24B6C12C8DCBE3348DA75.procedure8537type) >>>>> - code 9 >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.getSaverStream(SchemaTypeSystemImpl.java:2214) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.writeRealHeader(SchemaTypeSystemImpl.java:1589) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveType(SchemaTypeSystemImpl.java:1440) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveTypesRecursively(SchemaTypeSystemImpl.java:1316) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.save(SchemaTypeSystemImpl.java:1291) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.tool.SchemaCompiler.compile(SchemaCompiler.java:1098) >>>>> 
[java] at >>>>> >> org.apache.xmlbeans.impl.tool.SchemaCompiler.main(SchemaCompiler.java:368) >>>>> BUILD FAILED >>>>> >> /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build.xml:247: >> Java >>>>> returned: 1 >>>>> >>>>> and here is the run error I receive once I compile on a different >> machine: >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101003-1339-voon0t62/info/l on pads >>>>> Execution failed: >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101003-1339-voon0t62/info/8 on pads >>>>> Exception in mProject: >>>>> Arguments: [-X, raw_dir/2mass-atlas-000713s-j0760245.fits, >>>>> proj_dir/proj_2mass-atlas-000713s-j0760245.fits, header.hdr] >>>>> Host: pads >>>>> Directory: >> unrectified-20101003-1339-voon0t62/jobs/l/mProject-lvknimzj >>>>> stderr.txt: >>>>> >>>>> stdout.txt: >>>>> >>>>> ---- >>>>> >>>>> Caused by: >>>>> Task failed: >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.ProcessException: >>>>> Exitcode file >>>>> >> (/home/jonmon/.globus/scripts/PBS6388672747278247642.submit.exitcode) >>>>> not found 5 queue polls after the job was reported done >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.Job.close(Job.java:66) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.Job.setState(Job.java:177) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.pbs.QueuePoller.processStdout(QueuePoller.java:126) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:169) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:82) >>>>> at java.lang.Thread.run(Thread.java:619) >>>>> >>>>> The run directory with the files needed to execute and log files >> is in >>>>> ~jonmon/Workspace/Swift/Montage/katz_slides_test/run.0001 >>>>> >>>>> On 10/3/10 1:30 PM, Mihael Hategan wrote: >>>>>> On Sun, 2010-10-03 at 13:27 -0500, Jonathan Monette wrote: >>>>>>> 
Hello, >>>>>>> Anyone having a problem using Swift on PADS? I updated >> Swift and >>>>>>> cog to the most recent from trunk and now I cannot compile >> Swift on >>>>>>> PADS. >>>>>> I made some recent commits which might be the cause. But I need >> specific >>>>>> errors. >>>>>> >>>>>> Mihael >>>>>> >>>>>>> I have to use bridled or another ci machine that shares >> the >>>>>>> filesystem and compile there. I then come back to PADS to >> execute my >>>>>>> swift script and get all sorts of errors. Is anyone >> experiencing >>>>>>> similar problems when using PADS? >>>>>>> >>>>> -- >>>>> Jon >>>>> >>>>> Computers are incredibly fast, accurate, and stupid. Human beings >> are incredibly slow, inaccurate, and brilliant. Together they are >> powerful beyond imagination. >>>>> - Albert Einstein >>>>> >>> -- >>> Jon >>> >>> Computers are incredibly fast, accurate, and stupid. Human beings >> are incredibly slow, inaccurate, and brilliant. Together they are >> powerful beyond imagination. >>> - Albert Einstein >>> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Jon From jon.monette at gmail.com Mon Oct 4 12:07:09 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 12:07:09 -0500 Subject: [Swift-user] PADS In-Reply-To: <897688021.631571286154502611.JavaMail.root@zimbra.anl.gov> References: <897688021.631571286154502611.JavaMail.root@zimbra.anl.gov> Message-ID: <4CAA09BD.4070804@gmail.com> Ok. I am not sure what happened over the weekend, but now my jobs are running and no longer erroring out. On 10/03/2010 08:08 PM, Michael Wilde wrote: > I was able to run a pbs job on pads using swift trunk just now.
> > Can you also post you sites.xml file, Jon? > > Mine was: > > > > > fast > 00:05:00 > > /home/wilde/swiftwork > > > > > - Mike > > > ----- "Mihael Hategan" wrote: > >> Can you set debug=true in etc/provider-pbs.properties and capture a >> submit script? >> >> Mihael >> >> On Sun, 2010-10-03 at 19:26 -0500, Jonathan Monette wrote: >>> I am still not certain why I cannot compile Swift on the head node >> of >>> PADS but I ran across this error in my runs.Worker task failed: >> Error >>> submitting block task >>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>> Cannot submit job: Could not submit job (qsub reported an exit code >> of 1). >>> Error: >> /home/jonmon/.globus/scripts/PBS807550213750625026.submitUnknown >>> parameters, or invalid PBS script locationPlease contact >>> pads-support at ci.uchicago.eduQsub options: usage: qsub [-a date_time] >> [-A >>> account_string] [-b secs] [-c [ none | { enabled | periodic | >>> shutdown | depth= | dir= | interval=}... ] >> >>> [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I] >> [-j >>> oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list] >> [-N >>> jobname] [-o path] [-p priority] [-P proxy_user] [-q queue] >> [-r >>> y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] >>> path [-W otherattributes=value...] 
[-v variable_list] [-V ] >> [-x] >>> [-X] [-z] [script]Additional site options: [-h | --help] display >>> usageDetailed information available at: >>> http://www.ci.uchicago.edu/wiki/bin/view/PADS >>> >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) >>> at >>> >> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) >>> at >>> >> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:56) >>> at >>> >> org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66) >>> Caused by: >>> org.globus.cog.abstraction.impl.scheduler.common.ProcessException: >> Could >>> not submit job (qsub reported an exit code of 1). >>> Error: >> /home/jonmon/.globus/scripts/PBS807550213750625026.submitUnknown >>> parameters, or invalid PBS script locationPlease contact >>> pads-support at ci.uchicago.eduQsub options: usage: qsub [-a date_time] >> [-A >>> account_string] [-b secs] [-c [ none | { enabled | periodic | >>> shutdown | depth= | dir= | interval=}... ] >> >>> [-C directive_prefix] [-d path] [-D path] [-e path] [-h] [-I] >> [-j >>> oe] [-k {oe}] [-l resource_list] [-m n|{abe}] [-M user_list] >> [-N >>> jobname] [-o path] [-p priority] [-P proxy_user] [-q queue] >> [-r >>> y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] >>> path [-W otherattributes=value...] [-v variable_list] [-V ] >> [-x] >>> [-X] [-z] [script]Additional site options: [-h | --help] display >>> usageDetailed information available at: >>> http://www.ci.uchicago.edu/wiki/bin/view/PADS >>> >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:102) >>> at >>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) >>> ... 
3 more >>> >>> I got a bunch of failed to shutdown block and then got this error. >>> Attached is the stdout from that run. I also noticed that inside my >> run >>> directories several PBS*.submit* files are in there. In the >>> PBS*.submit.e* files they seem to be complaining that they can't >> find a >>> certain file. This is the error that they are reporting: >>> zsh: no such file or directory: >>> /var/spool/torque/mom_priv/jobs/505703.svc.pads.ci.uchicago.edu.SC >>> >>> Does this new information help deduce what the problem is with PADS? >> Is >>> this a system problem or has a new bug appeared in Swift? >>> >>> On 10/03/2010 02:17 PM, Mihael Hategan wrote: >>>> Ok. I don't think that's related to my commits. >>>> >>>> On Sun, 2010-10-03 at 13:44 -0500, Jonathan Monette wrote: >>>>> Here is the compile error: >>>>> generateVersion: >>>>> >>>>> antlr: >>>>> [java] ANTLR Parser Generator Version 2.7.5 (20050128) >>>>> 1989-2005 jGuru.com >>>>> [java] resources/swiftscript.g:1028: >> warning:nondeterminism upon >>>>> [java] resources/swiftscript.g:1028: k==1:LBRACK >>>>> [java] resources/swiftscript.g:1028: >>>>> >> k==2:ID,STRING_LITERAL,LBRACK,LPAREN,AT,PLUS,MINUS,STAR,NOT,INT_LITERAL,FLOAT_LITERAL,"true","false" >>>>> [java] resources/swiftscript.g:1028: between alt 1 and >> exit >>>>> branch of block >>>>> >>>>> compileSchema: >>>>> [java] IO Error java.io.FileNotFoundException: >>>>> >> /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/src/swiftscript.xsd >>>>> (No such file or directory) >>>>> [java] Time to build schema type system: 0.559 seconds >>>>> [java] Exception in thread "main" >>>>> org.apache.xmlbeans.SchemaTypeLoaderException: >>>>> >> /tmp/xbean5901864127478979310.d/classes/schemaorg_apache_xmlbeans/system/s4846B13C10E24B6C12C8DCBE3348DA75/procedure8537type.xsb >>>>> (No such file or directory) >>>>> >> (schemaorg_apache_xmlbeans.system.s4846B13C10E24B6C12C8DCBE3348DA75.procedure8537type) >>>>> - code 9 >>>>> [java] at >>>>> >> 
org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.getSaverStream(SchemaTypeSystemImpl.java:2214) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl$XsbReader.writeRealHeader(SchemaTypeSystemImpl.java:1589) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveType(SchemaTypeSystemImpl.java:1440) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.saveTypesRecursively(SchemaTypeSystemImpl.java:1316) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl.save(SchemaTypeSystemImpl.java:1291) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.tool.SchemaCompiler.compile(SchemaCompiler.java:1098) >>>>> [java] at >>>>> >> org.apache.xmlbeans.impl.tool.SchemaCompiler.main(SchemaCompiler.java:368) >>>>> BUILD FAILED >>>>> >> /autonfs/home/jonmon/Library/Swift/trunk/cog/modules/swift/build.xml:247: >> Java >>>>> returned: 1 >>>>> >>>>> and here is the run error I receive once I compile on a different >> machine: >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101003-1339-voon0t62/info/l on pads >>>>> Execution failed: >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101003-1339-voon0t62/info/8 on pads >>>>> Exception in mProject: >>>>> Arguments: [-X, raw_dir/2mass-atlas-000713s-j0760245.fits, >>>>> proj_dir/proj_2mass-atlas-000713s-j0760245.fits, header.hdr] >>>>> Host: pads >>>>> Directory: >> unrectified-20101003-1339-voon0t62/jobs/l/mProject-lvknimzj >>>>> stderr.txt: >>>>> >>>>> stdout.txt: >>>>> >>>>> ---- >>>>> >>>>> Caused by: >>>>> Task failed: >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.ProcessException: >>>>> Exitcode file >>>>> >> (/home/jonmon/.globus/scripts/PBS6388672747278247642.submit.exitcode) >>>>> not found 5 queue polls after the job was reported done >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.Job.close(Job.java:66) >>>>> at >>>>> >> 
org.globus.cog.abstraction.impl.scheduler.common.Job.setState(Job.java:177) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.pbs.QueuePoller.processStdout(QueuePoller.java:126) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:169) >>>>> at >>>>> >> org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:82) >>>>> at java.lang.Thread.run(Thread.java:619) >>>>> >>>>> The run directory with the files needed to execute and log files >> is in >>>>> ~jonmon/Workspace/Swift/Montage/katz_slides_test/run.0001 >>>>> >>>>> On 10/3/10 1:30 PM, Mihael Hategan wrote: >>>>>> On Sun, 2010-10-03 at 13:27 -0500, Jonathan Monette wrote: >>>>>>> Hello, >>>>>>> Anyone having a problem using Swift on PADS? I updated >> Swift and >>>>>>> cog to the most recent from trunk and now I cannot compile >> Swift on >>>>>>> PADS. >>>>>> I made some recent commits which might be the cause. But I need >> specific >>>>>> errors. >>>>>> >>>>>> Mihael >>>>>> >>>>>>> I have to use bridled or another ci machine that shares >> the >>>>>>> filesystem and compile there. I then come back to PADS to >> execute my >>>>>>> swift script and get all sorts of errors. Is anyone >> experiencing >>>>>>> similar problems when using PADS? >>>>>>> >>>>> -- >>>>> Jon >>>>> >>>>> Computers are incredibly fast, accurate, and stupid. Human beings >> are incredibly slow, inaccurate, and brilliant. Together they are >> powerful beyond imagination. >>>>> - Albert Einstein >>>>> >>> -- >>> Jon >>> >>> Computers are incredibly fast, accurate, and stupid. Human beings >> are incredibly slow, inaccurate, and brilliant. Together they are >> powerful beyond imagination. 
>>> - Albert Einstein >>> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Jon From aespinosa at cs.uchicago.edu Mon Oct 4 12:42:34 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 12:42:34 -0500 Subject: [Swift-user] passive worker job assignments Message-ID: Hi, When coasters are in passive mode, how many jobs are assigned to the workers? Is it still decided by the *Overallocation parameters? Thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Mon Oct 4 12:47:05 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Oct 2010 11:47:05 -0600 (GMT-06:00) Subject: [Swift-user] passive worker job assignments In-Reply-To: Message-ID: <1832826780.660041286214425690.JavaMail.root@zimbra.anl.gov> Do you mean "how many jobs will each worker run concurrently"? I think that is workersPerNode, which is still in effect in passive mode as far as I know. So I would think you start one worker per compute node, and workersPerNode (which should really be renamed concurrentJobsPerWorker) determines how many jobs each worker will concurrently launch. That's my understanding; Mihael and Justin should correct or confirm this. - Mike ----- "Allan Espinosa" wrote: > Hi, > > When coasters are in passive mode, how many jobs are assigned to the > workers? Is it still decided by the *Overallocation parameters? > > Thanks, > -Allan > > -- > Allan M.
Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Mon Oct 4 12:58:37 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 12:58:37 -0500 Subject: [Swift-user] passive worker job assignments In-Reply-To: <1832826780.660041286214425690.JavaMail.root@zimbra.anl.gov> References: <1832826780.660041286214425690.JavaMail.root@zimbra.anl.gov> Message-ID: No, I meant how many will it queue to that specific worker. By default I know it is some factor of workersPerNode * Overallocation. 2010/10/4 Michael Wilde : > Do you mean "how many jobs will each worker run concurrently"? I think that is workersPerNode, which is still in effect in passive mode as far as I know. > > So I would think you start one worker per compute node, and workersPerNode (which should really be renamed concurrentJobsPerWorker) determines how many jobs each worker will concurrently launch. > > Thats my understanding; Mihael and Justin should correct or confirm this. > > - Mike > > > ----- "Allan Espinosa" wrote: > >> Hi, >> >> When coasters are in passive mode, how many jobs are assigned to the >> workers? Is it still decided by the *Overallocation parameters? >> >> Thanks, >> -Allan From aespinosa at cs.uchicago.edu Mon Oct 4 13:21:36 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 13:21:36 -0500 Subject: [Swift-user] multisite passive workers Message-ID: Hi, I noticed that there is only one callback URI when I have multiple sites with a passive worker configuration. How does the coaster service know which site the registering worker belong to?
My test jobs may be failing because the site Swift initializes for a particular worker may be the wrong one. thanks, -Allan From jon.monette at gmail.com Mon Oct 4 15:14:39 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 15:14:39 -0500 Subject: [Swift-user] slots in sites.xml Message-ID: <4CAA35AF.2000107@gmail.com> Hello, In the sites entry below I set slots to 100. I thought this would make coasters start 100 workers submitted to the fast queue. Is this the proper way to ask for 100 workers on PADS? When I check all jobs submitted to PADS, it tells me I have 51 (queued jobs + running jobs). Is my site entry wrong? pool handle="pads"> 3600 192.5.86.6 1 100 1 1 fast 1 10000 /gpfs/pads/swift/jonmon/Swift/work/pads -- Jon From hategan at mcs.anl.gov Mon Oct 4 15:17:02 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 13:17:02 -0700 Subject: [Swift-user] passive worker job assignments In-Reply-To: References: Message-ID: <1286223422.8333.0.camel@blabla2.none> On Mon, 2010-10-04 at 12:42 -0500, Allan Espinosa wrote: > Hi, > > When coasters are in passive mode, how many jobs are assigned to the > workers? Is it still decided by the *Overallocation parameters? No. But I think workersPerNode is considered. > > Thanks, > -Allan > From hategan at mcs.anl.gov Mon Oct 4 15:25:18 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 13:25:18 -0700 Subject: [Swift-user] passive worker job assignments In-Reply-To: References: <1832826780.660041286214425690.JavaMail.root@zimbra.anl.gov> Message-ID: <1286223918.8333.9.camel@blabla2.none> On Mon, 2010-10-04 at 12:58 -0500, Allan Espinosa wrote: > No, I meant how many will it queue to that specific worker. By > default I know it is some factor of workersPerNode * Overallocation.
That type of queuing is only for block allocation. It will do whatever normally happens when you have running blocks, which in this case will be to run the largest jobs first. From hategan at mcs.anl.gov Mon Oct 4 15:27:09 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 13:27:09 -0700 Subject: [Swift-user] slots in sites.xml In-Reply-To: <4CAA35AF.2000107@gmail.com> References: <4CAA35AF.2000107@gmail.com> Message-ID: <1286224029.8333.11.camel@blabla2.none> Slot only says what the maximum number of jobs can be. The allocationStepSize determines which fraction of those will be allocated in one step. But essentially you want to leave some slots for future jobs. On Mon, 2010-10-04 at 15:14 -0500, Jonathan Monette wrote: > Hello, > In the sites entry below I set slots to be 100. I thought this > would have coasters start 100 workers submitted to the fast queue. Is > this the proper way to ask for 100 workers on PADS. When I check all > jobs submitted to PADS it tell me I have 51. That is queue jobs + > running jobs. Is my site entry wrong? > pool handle="pads"> > > > 3600 > 192.5.86.6 > 1 > 100 > 1 > 1 > fast > 1 > 10000 > /gpfs/pads/swift/jonmon/Swift/work/pads > > From hategan at mcs.anl.gov Mon Oct 4 15:29:10 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 13:29:10 -0700 Subject: [Swift-user] multisite passive workers In-Reply-To: References: Message-ID: <1286224150.8612.1.camel@blabla2.none> On Mon, 2010-10-04 at 13:21 -0500, Allan Espinosa wrote: > Hi, > > I noticed that there is only one callback URI when I have multiple > sites with a passive worker configuration. How does the coaster > service know which site the registering worker belong to? You should get one URL for each service, unless there is some failure in starting a service. > > My test jobs maybe failing because the site swift initializes for a > particular worker maybe the wrong one. 
> > thanks, > -Allan > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From jon.monette at gmail.com Mon Oct 4 15:46:56 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 15:46:56 -0500 Subject: [Swift-user] Argument list to long Message-ID: <4CAA3D40.6040001@gmail.com> Hello, What does this error mean? Failed to transfer wrapper log from unrectified-20101004-1432-k89q8d3b/info/y on localhost Execution failed: Exception in mImgtbl: Arguments: [proj_dir, images.tbl] Host: localhost Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj stderr.txt: stdout.txt: ---- Caused by: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long When it says that it cannot run /bin/bash, is it talking about _swiftwrap? Also, what does Argument list too long indicate? It shows that the arguments to the app mImgtbl are proj_dir and images.tbl. That is the correct number of arguments to mImgtbl. proj_dir is a directory that contains about 4100 files. Is there wrapper script in swift to mImgtbl that cannot accept this many files? -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From aespinosa at cs.uchicago.edu Mon Oct 4 16:10:54 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 16:10:54 -0500 Subject: [Swift-user] multisite passive workers In-Reply-To: <1286224150.8612.1.camel@blabla2.none> References: <1286224150.8612.1.camel@blabla2.none> Message-ID: Ahh... I'm just getting one URL service . Is it because I have the same url for each site? 
With this, for each entry, should I specify a port manually like url="https://communicado.ci.uchicago.edu:1984" , url="https://communicado.ci.uchicago.edu:1985" ... ? swift-r3649 cog-r2890 RunID: osg Progress: Progress: Initializing:1 Selecting site:817 Progress: Selecting site:1273 Initializing site shared directory:27 Stage in:1 Passive queue processor initialized. Callback URI is http://128.135.125.17:50003 Progress: Selecting site:1100 Initializing site shared directory:24 Stage in:9 Submitted:168 Progress: Selecting site:1058 Initializing site shared directory:23 Stage in:4 Submitted:216 Progress: Selecting site:1058 Initializing site shared directory:23 Stage in:4 Submitted:216 Progress: Selecting site:1041 Initializing site shared directory:23 Stage in:21 Submitted:216 site entries; passive 20.0 2.7 36 /nfs/data/osg_store/engage-scec/swift_scratch passive 20.0 0.44 36 /osgremote/osg_data/engage-scec/swift_scratch passive 20.0 68.7 36 /uscms_grid/data/engage-scec/swift_scratch passive 20.0 9.46 36 /cluster/grid/data/engage-scec/swift_scratch passive 20.0 0.22 36 /osg/storage/data/engage-scec/swift_scratch 2010/10/4 Mihael Hategan : > On Mon, 2010-10-04 at 13:21 -0500, Allan Espinosa wrote: >> Hi, >> >> I noticed that there is only one callback URI when I have multiple >> sites with a passive worker configuration. How does the coaster >> service know which site the registering worker belong to? > > You should get one URL for each service, unless there is some failure in > starting a service. > >> >> My test jobs maybe failing because the site swift initializes for a >> particular worker maybe the wrong one.
>> >> thanks, >> -Allan From wilde at mcs.anl.gov Mon Oct 4 16:22:45 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 4 Oct 2010 15:22:45 -0600 (GMT-06:00) Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA3D40.6040001@gmail.com> Message-ID: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> My guess is that you are calling one of your reduction jobs passing in a lot of files on the command line. Is that the case for the failing app? If so, you may be able to use the writeData() built in function to create a file containing a list of files names for say the input array, and pass just that one file on the command line. You'll still need to make sure that all the data dependencies are specified correctly and that all the required input files are arguments to the swift app() function for the reduction application, so that they get made available. This technique was proposed by Ben, but has not to my knowledge been heavily tested yet. So we may find more gotchas eg related to passing the IF or OF input and output file lists. - Mike ----- "Jonathan Monette" wrote: > Hello, > What does this error mean? > > Failed to transfer wrapper log from > unrectified-20101004-1432-k89q8d3b/info/y on localhost > Execution failed: > Exception in mImgtbl: > Arguments: [proj_dir, images.tbl] > Host: localhost > Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > Cannot run program "/bin/bash" (in directory > "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): > > java.io.IOException: error=7, Argument list too long > > When it says that it cannot run /bin/bash, is it talking about > _swiftwrap? Also, what does Argument list too long indicate? It > shows > that the arguments to the app mImgtbl are proj_dir and images.tbl. > That > is the correct number of arguments to mImgtbl. proj_dir is a > directory > that contains about 4100 files. 
Is there wrapper script in swift to > mImgtbl that cannot accept this many files? > > -- > Jon > > Computers are incredibly fast, accurate, and stupid. Human beings are > incredibly slow, inaccurate, and brilliant. Together they are powerful > beyond imagination. > - Albert Einstein > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Mon Oct 4 16:23:04 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 14:23:04 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA3D40.6040001@gmail.com> References: <4CAA3D40.6040001@gmail.com> Message-ID: <1286227384.8957.0.camel@blabla2.none> Post log (i.e. full stack trace). On Mon, 2010-10-04 at 15:46 -0500, Jonathan Monette wrote: > Hello, > What does this error mean? > > Failed to transfer wrapper log from > unrectified-20101004-1432-k89q8d3b/info/y on localhost > Execution failed: > Exception in mImgtbl: > Arguments: [proj_dir, images.tbl] > Host: localhost > Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > Cannot run program "/bin/bash" (in directory > "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): > java.io.IOException: error=7, Argument list too long > > When it says that it cannot run /bin/bash, is it talking about > _swiftwrap? Also, what does Argument list too long indicate? It shows > that the arguments to the app mImgtbl are proj_dir and images.tbl. That > is the correct number of arguments to mImgtbl. proj_dir is a directory > that contains about 4100 files. Is there wrapper script in swift to > mImgtbl that cannot accept this many files? 
> From hategan at mcs.anl.gov Mon Oct 4 16:24:47 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 14:24:47 -0700 Subject: [Swift-user] multisite passive workers In-Reply-To: References: <1286224150.8612.1.camel@blabla2.none> Message-ID: <1286227487.8957.2.camel@blabla2.none> On Mon, 2010-10-04 at 16:10 -0500, Allan Espinosa wrote: > Ahh... > > I'm just getting one URL service . Is it because I have the same url > for each site? Pretty much. Why do you want to have different sites? Just start one coaster service on localhost and connect workers from wherever. > > With this, for each entry, should I specify a port manually like > url="https://communicado.ci.uchicago.edu:1984" , > url="https://communicado.ci.uchicago.edu:1985" ... ? > > swift-r3649 cog-r2890 > > RunID: osg > Progress: > Progress: Initializing:1 Selecting site:817 > Progress: Selecting site:1273 Initializing site shared directory:27 > Stage in:1 > Passive queue processor initialized. Callback URI is http://128.135.125.17:50003 > Progress: Selecting site:1100 Initializing site shared directory:24 > Stage in:9 Submitted:168 > Progress: Selecting site:1058 Initializing site shared directory:23 > Stage in:4 Submitted:216 > Progress: Selecting site:1058 Initializing site shared directory:23 > Stage in:4 Submitted:216 > Progress: Selecting site:1041 Initializing site shared directory:23 > Stage in:21 Submitted:216 > > > site entries; > > jobmanager="local:local" /> > > passive > > 20.0 > 2.7 > > 36 > > > /nfs/data/osg_store/engage-scec/swift_scratch > > > > jobmanager="local:local" /> > > passive > > 20.0 > 0.44 > > 36 > > > /osgremote/osg_data/engage-scec/swift_scratch > > > > jobmanager="local:local" /> > > passive > > 20.0 > 68.7 > > 36 > > > /uscms_grid/data/engage-scec/swift_scratch > > > > jobmanager="local:local" /> > > passive > > 20.0 > 9.46 > > 36 > > > /cluster/grid/data/engage-scec/swift_scratch > > > > jobmanager="local:local" /> > > passive > > 20.0 > 0.22 > > 36 > > > 
/osg/storage/data/engage-scec/swift_scratch > > > > 2010/10/4 Mihael Hategan : > > On Mon, 2010-10-04 at 13:21 -0500, Allan Espinosa wrote: > >> Hi, > >> > >> I noticed that there is only one callback URI when I have multiple > >> sites with a passive worker configuration. How does the coaster > >> service know which site the registering worker belong to? > > > > You should get one URL for each service, unless there is some failure in > > starting a service. > > > >> > >> My test jobs maybe failing because the site swift initializes for a > >> particular worker maybe the wrong one. > >> > >> thanks, > >> -Allan > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From aespinosa at cs.uchicago.edu Mon Oct 4 16:30:28 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 16:30:28 -0500 Subject: [Swift-user] multisite passive workers In-Reply-To: <1286227487.8957.2.camel@blabla2.none> References: <1286224150.8612.1.camel@blabla2.none> <1286227487.8957.2.camel@blabla2.none> Message-ID: I was using multiply pool entries because each site has a different gridftp url. But with provider staging I no longer need that? -Allan 2010/10/4 Mihael Hategan : > On Mon, 2010-10-04 at 16:10 -0500, Allan Espinosa wrote: >> Ahh... >> >> I'm just getting one URL service . Is it because I have the same url >> for each site? > > Pretty much. Why do you want to have different sites? Just start one > coaster service on localhost and connect workers from wherever. > >> >> With this, for each entry, should I specify a port manually like >> url="https://communicado.ci.uchicago.edu:1984" , >> url="https://communicado.ci.uchicago.edu:1985" ... ?
>> >> swift-r3649 cog-r2890 >> >> RunID: osg >> Progress: >> Progress: Initializing:1 Selecting site:817 >> Progress: Selecting site:1273 Initializing site shared directory:27 >> Stage in:1 >> Passive queue processor initialized. Callback URI is http://128.135.125.17:50003 >> Progress: Selecting site:1100 Initializing site shared directory:24 >> Stage in:9 Submitted:168 >> Progress: Selecting site:1058 Initializing site shared directory:23 >> Stage in:4 Submitted:216 >> Progress: Selecting site:1058 Initializing site shared directory:23 >> Stage in:4 Submitted:216 >> Progress: Selecting site:1041 Initializing site shared directory:23 >> Stage in:21 Submitted:216 >> >> >> site entries; >> >> > jobmanager="local:local" /> >> >> passive >> >> 20.0 >> 2.7 >> >> 36 >> >> >> /nfs/data/osg_store/engage-scec/swift_scratch >> >> >> >> > jobmanager="local:local" /> >> >> passive >> >> 20.0 >> 0.44 >> >> 36 >> >> >> /osgremote/osg_data/engage-scec/swift_scratch >> >> >> >> > jobmanager="local:local" /> >> >> passive >> >> 20.0 >> 68.7 >> >> 36 >> >> >> /uscms_grid/data/engage-scec/swift_scratch >> >> >> >> > jobmanager="local:local" /> >> >> passive >> >> 20.0 >> 9.46 >> >> 36 >> >> >> /cluster/grid/data/engage-scec/swift_scratch >> >> >> >> > jobmanager="local:local" /> >> >> passive >> >> 20.0 >> 0.22 >> >> 36 >> >> >> /osg/storage/data/engage-scec/swift_scratch >> >> >> >> 2010/10/4 Mihael Hategan : >> > On Mon, 2010-10-04 at 13:21 -0500, Allan Espinosa wrote: >> >> Hi, >> >> >> >> I noticed that there is only one callback URI when I have multiple >> >> sites with a passive worker configuration. How does the coaster >> >> service know which site the registering worker belong to?
>> > >> > You should get one URL for each service, unless there is some failure in >> > starting a service. >> > >> >> >> >> My test jobs maybe failing because the site swift initializes for a >> >> particular worker maybe the wrong one. >> >> >> >> thanks, >> >> -Allan From hategan at mcs.anl.gov Mon Oct 4 16:33:57 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 14:33:57 -0700 Subject: [Swift-user] multisite passive workers In-Reply-To: References: <1286224150.8612.1.camel@blabla2.none> <1286227487.8957.2.camel@blabla2.none> Message-ID: <1286228037.10162.1.camel@blabla2.none> On Mon, 2010-10-04 at 16:30 -0500, Allan Espinosa wrote: > I was using multiply pool entries because each site has a different > gridftp url. > > But with provider staging I no longer need that? With provider staging you no longer need that. It should work though, even with multiple pools. I think. Are you sure the worker nodes on the problem machines are allowed to see the world? From aespinosa at cs.uchicago.edu Mon Oct 4 16:39:25 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 16:39:25 -0500 Subject: [Swift-user] multisite passive workers In-Reply-To: <1286228037.10162.1.camel@blabla2.none> References: <1286224150.8612.1.camel@blabla2.none> <1286227487.8957.2.camel@blabla2.none> <1286228037.10162.1.camel@blabla2.none> Message-ID: The service do see the workers registering. So I think they see the world. OSG sites do support condor_glideins so they should have outbound connections. -Allan 2010/10/4 Mihael Hategan : > On Mon, 2010-10-04 at 16:30 -0500, Allan Espinosa wrote: >> I was using multiply pool entries because each site has a different >> gridftp url. >> >> But with provider staging I no longer need that? > > With provider staging you no longer need that. wohoho. :) > > It should work though, even with multiple pools. I think. Are you sure > the worker nodes on the problem machines are allowed to see the world? 
> From hategan at mcs.anl.gov Mon Oct 4 16:49:00 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 14:49:00 -0700 Subject: [Swift-user] multisite passive workers In-Reply-To: References: <1286224150.8612.1.camel@blabla2.none> <1286227487.8957.2.camel@blabla2.none> <1286228037.10162.1.camel@blabla2.none> Message-ID: <1286228940.10296.2.camel@blabla2.none> On Mon, 2010-10-04 at 16:39 -0500, Allan Espinosa wrote: > The service do see the workers registering. So I think they see the > world. OSG sites do support condor_glideins so they should have > outbound connections. So what goes wrong then? From aespinosa at cs.uchicago.edu Mon Oct 4 17:46:56 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 4 Oct 2010 17:46:56 -0500 Subject: [Swift-user] multisite passive workers In-Reply-To: <1286228940.10296.2.camel@blabla2.none> References: <1286224150.8612.1.camel@blabla2.none> <1286227487.8957.2.camel@blabla2.none> <1286228037.10162.1.camel@blabla2.none> <1286228940.10296.2.camel@blabla2.none> Message-ID: There was a mismatch in the file and job operations. The job thinks it is in site A but the worker that it was sent to was site B. 2010/10/4 Mihael Hategan : > On Mon, 2010-10-04 at 16:39 -0500, Allan Espinosa wrote: >> The service do see the workers registering. So I think they see the >> world. OSG sites do support condor_glideins so they should have >> outbound connections. > > So what goes wrong then? From jon.monette at gmail.com Mon Oct 4 18:39:40 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 18:39:40 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> Message-ID: <4CAA65BC.9090507@gmail.com> Yes. I have to make sure that all 4100 files are created before the failing app can execute.
The writeData() way seems like a hack to get around the problem is Swift but I will try it out and see if my script completes. Mihael: pasted below is the stack trace that was generated in the log file. 2010-10-04 15:32:29,988-0500 WARN vdl:transferwrapperlog Failed to transfer wrapper log from unrectified-20101004-1432-k89q8d3b/info/y on localhost 2010-10-04 15:32:29,991-0500 INFO vdl:execute END_FAILURE thread=0-3 tr=mImgtbl 2010-10-04 15:32:29,996-0500 DEBUG VDL2ExecutionContext sys:throw @ execute-default.k, line: 45: Exception in mImgtbl: Arguments: [proj_dir, images.tbl] Host: localhost Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj stderr.txt: stdout.txt: ---- Exception in mImgtbl: Arguments: [proj_dir, images.tbl] Host: localhost Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj stderr.txt: stdout.txt: ---- Caused by: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long Caused by: java.io.IOException: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at 
org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) Caused by: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long Caused by: java.io.IOException: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, 
Argument list too long Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.(FailureNotificationEvent.java:36) at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.(FailureNotificationEvent.java:42) at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:153) at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:373) at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276) at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:127) at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168) at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:675) at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:422) at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:233) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:223) at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54) at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:220) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... 
1 more Caused by: java.io.IOException: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at java.lang.Runtime.exec(Runtime.java:593) at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:161) ... 6 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) ... 8 more 2010-10-04 15:32:30,145-0500 INFO TaskNotifier Congestion queue size: 0 2010-10-04 15:32:30,193-0500 INFO ExecutionContext Detailed exception: Exception in mImgtbl: Arguments: [proj_dir, images.tbl] Host: localhost Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj stderr.txt: stdout.txt: ---- Caused by: Cannot run program "/bin/bash" (in directory "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): java.io.IOException: error=7, Argument list too long On 10/4/10 4:22 PM, Michael Wilde wrote: > My guess is that you are calling one of your reduction jobs passing in a lot of files on the command line. Is that the case for the failing app? > > If so, you may be able to use the writeData() built in function to create a file containing a list of files names for say the input array, and pass just that one file on the command line. You'll still need to make sure that all the data dependencies are specified correctly and that all the required input files are arguments to the swift app() function for the reduction application, so that they get made available. > > This technique was proposed by Ben, but has not to my knowledge been heavily tested yet. So we may find more gotchas eg related to passing the IF or OF input and output file lists. > > - Mike > > ----- "Jonathan Monette" wrote: > >> Hello, >> What does this error mean?
>> >> Failed to transfer wrapper log from >> unrectified-20101004-1432-k89q8d3b/info/y on localhost >> Execution failed: >> Exception in mImgtbl: >> Arguments: [proj_dir, images.tbl] >> Host: localhost >> Directory: unrectified-20101004-1432-k89q8d3b/jobs/y/mImgtbl-yuqc8ozj >> stderr.txt: >> >> stdout.txt: >> >> ---- >> >> Caused by: >> Cannot run program "/bin/bash" (in directory >> "/gpfs/pads/swift/jonmon/Swift/work/localhost/unrectified-20101004-1432-k89q8d3b"): >> >> java.io.IOException: error=7, Argument list too long >> >> When it says that it cannot run /bin/bash, is it talking about >> _swiftwrap? Also, what does Argument list too long indicate? It >> shows >> that the arguments to the app mImgtbl are proj_dir and images.tbl. >> That >> is the correct number of arguments to mImgtbl. proj_dir is a >> directory >> that contains about 4100 files. Is there wrapper script in swift to >> mImgtbl that cannot accept this many files? >> >> -- >> Jon >> >> Computers are incredibly fast, accurate, and stupid. Human beings are >> incredibly slow, inaccurate, and brilliant. Together they are powerful >> beyond imagination. >> - Albert Einstein >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Mon Oct 4 18:46:06 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 16:46:06 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA65BC.9090507@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> Message-ID: <1286235966.11966.4.camel@blabla2.none> On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: > Yes. 
I have to make sure that all 4100 files are created before the > failing app can execute. > > The writeData() way seems like a hack to get around the problem is Swift > but I will try it out and see if my script completes. > > Mihael: pasted below is the stack trace that was generated in the log file. Yeah. It's what Mike says. But it's not the app arguments. Instead, I'm guessing it's the input/output file lists. There was a scheme in non-provider-staging swift to pass these things in lists, but I'm guessing you are using provider staging. Perhaps some mode to automatically do this for large numbers of arguments is in order. Mihael From jon.monette at gmail.com Mon Oct 4 18:59:35 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 18:59:35 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286235966.11966.4.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> Message-ID: <4CAA6A67.5070607@gmail.com> well if I am understanding the problem the input list will be about 4100 files and the output list will be a single file (unless Swift adds more input and output files). I do not think I am using provider staging though. In my swift.properties i do not set the use.provider.staging option and in etc/swift.properties use.provider.staging is set to false. On 10/4/10 6:46 PM, Mihael Hategan wrote: > On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: >> Yes. I have to make sure that all 4100 files are created before the >> failing app can execute. >> >> The writeData() way seems like a hack to get around the problem is Swift >> but I will try it out and see if my script completes. >> >> Mihael: pasted below is the stack trace that was generated in the log file. > Yeah. It's what Mike says. > > But it's not the app arguments. Instead, I'm guessing it's the > input/output file lists. 
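On the error itself: `error=7` is the POSIX `E2BIG` error, which `execve()` returns when the argument strings plus the environment exceed the system limit (`ARG_MAX`). On Linux any single argument string is additionally capped at 128 KiB (`MAX_ARG_STRLEN`), so one long joined file list can fail even when the total is well under `ARG_MAX`. The sketch below is a hypothetical check, not Swift code: it estimates the bytes a command line needs and shows the list-file workaround Mike suggested via `writeData()`, writing the names to one file and passing only that path. The helpers `argv_bytes`, `would_overflow`, and `write_file_list`, the file names, and the assumption that a wrapper receives the list joined into one `|`-separated argument are all illustrative.

```python
import os
import tempfile

# Assumption: Linux caps any single argv/envp string at MAX_ARG_STRLEN,
# i.e. 32 pages (131072 bytes with 4 KiB pages).
MAX_ARG_STRLEN = 32 * 4096


def argv_bytes(args):
    """Approximate bytes execve() copies for argv: each string plus its NUL.

    Pointer overhead and the environment are ignored, so this is a
    lower bound on what is charged against ARG_MAX.
    """
    return sum(len(a.encode()) + 1 for a in args)


def would_overflow(args):
    """Rough check for the two limits that make execve() fail with E2BIG."""
    arg_max = os.sysconf("SC_ARG_MAX")  # limit on argv + envp combined
    too_long_total = argv_bytes(args) >= arg_max
    too_long_single = any(len(a.encode()) + 1 > MAX_ARG_STRLEN for a in args)
    return too_long_total or too_long_single


def write_file_list(paths, list_path):
    """Write one path per line, in the spirit of Swift's writeData()."""
    with open(list_path, "w") as fh:
        fh.write("\n".join(paths) + "\n")
    return list_path


if __name__ == "__main__":
    # ~4100 hypothetical file names joined into ONE argument.
    paths = [f"proj_dir/proj_{i:05d}_unrectified_image.fits" for i in range(4100)]
    joined = "|".join(paths)
    print(argv_bytes([joined]), would_overflow([joined]))

    with tempfile.TemporaryDirectory() as d:
        lst = write_file_list(paths, os.path.join(d, "images.lst"))
        # Only the short list-file path now goes on the command line.
        print(would_overflow(["mImgtbl", lst]))
```

With the 4100 hypothetical names above, the joined argument is about 176 KB, which trips the per-string cap on Linux, while the list-file variant passes only a short path.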
> > There was a scheme in non-provider-staging swift to pass these things in > lists, but I'm guessing you are using provider staging. Perhaps some > mode to automatically do this for large numbers of arguments is in > order. > > Mihael > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Mon Oct 4 20:25:55 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 18:25:55 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA6A67.5070607@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> Message-ID: <1286241955.12637.3.camel@blabla2.none> Groovy. Then set "wrapper.parameter.mode=files" in swift.properties. On Mon, 2010-10-04 at 18:59 -0500, Jonathan Monette wrote: > well if I am understanding the problem the input list will be about > 4100 files and the output list will be a single file (unless Swift adds > more input and output files). I do not think I am using provider > staging though. In my swift.properties i do not set the > use.provider.staging option and in etc/swift.properties > use.provider.staging is set to false. > > On 10/4/10 6:46 PM, Mihael Hategan wrote: > > On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: > >> Yes. I have to make sure that all 4100 files are created before the > >> failing app can execute. > >> > >> The writeData() way seems like a hack to get around the problem is Swift > >> but I will try it out and see if my script completes. > >> > >> Mihael: pasted below is the stack trace that was generated in the log file. > > Yeah. It's what Mike says. > > > > But it's not the app arguments. Instead, I'm guessing it's the > > input/output file lists. 
> > > > There was a scheme in non-provider-staging swift to pass these things in > > lists, but I'm guessing you are using provider staging. Perhaps some > > mode to automatically do this for large numbers of arguments is in > > order. > > > > Mihael > > > From jon.monette at gmail.com Mon Oct 4 21:39:18 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 21:39:18 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286241955.12637.3.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> Message-ID: <4CAA8FD6.3080906@gmail.com> By setting "wrapper.parameter.mode=files" I get "failed to transfer wrapper logs". Here is my swift.properties file. execution.retries=0 sitedir.keep=true status.mode=provider //wrapper.log.always.transfer=true foreach.maxthreads=1024 wrapper.parameter.mode=files I have tried this with "wrapper.log.always.transfer=true" both commented and uncommented still get the same error. On 10/04/2010 08:25 PM, Mihael Hategan wrote: > Groovy. Then set "wrapper.parameter.mode=files" in swift.properties. > > On Mon, 2010-10-04 at 18:59 -0500, Jonathan Monette wrote: >> well if I am understanding the problem the input list will be about >> 4100 files and the output list will be a single file (unless Swift adds >> more input and output files). I do not think I am using provider >> staging though. In my swift.properties i do not set the >> use.provider.staging option and in etc/swift.properties >> use.provider.staging is set to false. >> >> On 10/4/10 6:46 PM, Mihael Hategan wrote: >>> On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: >>>> Yes. I have to make sure that all 4100 files are created before the >>>> failing app can execute. 
>>>> >>>> The writeData() way seems like a hack to get around the problem is Swift >>>> but I will try it out and see if my script completes. >>>> >>>> Mihael: pasted below is the stack trace that was generated in the log file. >>> Yeah. It's what Mike says. >>> >>> But it's not the app arguments. Instead, I'm guessing it's the >>> input/output file lists. >>> >>> There was a scheme in non-provider-staging swift to pass these things in >>> lists, but I'm guessing you are using provider staging. Perhaps some >>> mode to automatically do this for large numbers of arguments is in >>> order. >>> >>> Mihael >>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Mon Oct 4 21:44:52 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Oct 2010 19:44:52 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA8FD6.3080906@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> Message-ID: <1286246692.13534.1.camel@blabla2.none> Ok. I believe that now you hit a bug in swift. Luckily there might be something we can do about that. May I have the log file? On Mon, 2010-10-04 at 21:39 -0500, Jonathan Monette wrote: > By setting "wrapper.parameter.mode=files" I get "failed to transfer > wrapper logs". Here is my swift.properties file. > > execution.retries=0 > sitedir.keep=true > status.mode=provider > //wrapper.log.always.transfer=true > foreach.maxthreads=1024 > wrapper.parameter.mode=files > > I have tried this with "wrapper.log.always.transfer=true" both commented > and uncommented still get the same error. > > On 10/04/2010 08:25 PM, Mihael Hategan wrote: > > Groovy. 
Then set "wrapper.parameter.mode=files" in swift.properties. > > > > On Mon, 2010-10-04 at 18:59 -0500, Jonathan Monette wrote: > >> well if I am understanding the problem the input list will be about > >> 4100 files and the output list will be a single file (unless Swift adds > >> more input and output files). I do not think I am using provider > >> staging though. In my swift.properties i do not set the > >> use.provider.staging option and in etc/swift.properties > >> use.provider.staging is set to false. > >> > >> On 10/4/10 6:46 PM, Mihael Hategan wrote: > >>> On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: > >>>> Yes. I have to make sure that all 4100 files are created before the > >>>> failing app can execute. > >>>> > >>>> The writeData() way seems like a hack to get around the problem is Swift > >>>> but I will try it out and see if my script completes. > >>>> > >>>> Mihael: pasted below is the stack trace that was generated in the log file. > >>> Yeah. It's what Mike says. > >>> > >>> But it's not the app arguments. Instead, I'm guessing it's the > >>> input/output file lists. > >>> > >>> There was a scheme in non-provider-staging swift to pass these things in > >>> lists, but I'm guessing you are using provider staging. Perhaps some > >>> mode to automatically do this for large numbers of arguments is in > >>> order. > >>> > >>> Mihael > >>> > > > From jon.monette at gmail.com Mon Oct 4 21:48:58 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Mon, 04 Oct 2010 21:48:58 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286246692.13534.1.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> Message-ID: <4CAA921A.4070907@gmail.com> Sure can. 
I have two log files in my home directory on the ci machines. ~jonmon One is rather large(20M) and that is the one with "wrapper.paramter.mode=files" not set. The other is about 4.7M and that one has "wrapper.parameter.mode=files" set. You should have permissions to read them. On 10/04/2010 09:44 PM, Mihael Hategan wrote: > Ok. I believe that now you hit a bug in swift. Luckily there might be > something we can do about that. > May I have the log file? > > On Mon, 2010-10-04 at 21:39 -0500, Jonathan Monette wrote: >> By setting "wrapper.parameter.mode=files" I get "failed to transfer >> wrapper logs". Here is my swift.properties file. >> >> execution.retries=0 >> sitedir.keep=true >> status.mode=provider >> //wrapper.log.always.transfer=true >> foreach.maxthreads=1024 >> wrapper.parameter.mode=files >> >> I have tried this with "wrapper.log.always.transfer=true" both commented >> and uncommented still get the same error. >> >> On 10/04/2010 08:25 PM, Mihael Hategan wrote: >>> Groovy. Then set "wrapper.parameter.mode=files" in swift.properties. >>> >>> On Mon, 2010-10-04 at 18:59 -0500, Jonathan Monette wrote: >>>> well if I am understanding the problem the input list will be about >>>> 4100 files and the output list will be a single file (unless Swift adds >>>> more input and output files). I do not think I am using provider >>>> staging though. In my swift.properties i do not set the >>>> use.provider.staging option and in etc/swift.properties >>>> use.provider.staging is set to false. >>>> >>>> On 10/4/10 6:46 PM, Mihael Hategan wrote: >>>>> On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote: >>>>>> Yes. I have to make sure that all 4100 files are created before the >>>>>> failing app can execute. >>>>>> >>>>>> The writeData() way seems like a hack to get around the problem is Swift >>>>>> but I will try it out and see if my script completes. >>>>>> >>>>>> Mihael: pasted below is the stack trace that was generated in the log file. >>>>> Yeah. 
It's what Mike says. >>>>> >>>>> But it's not the app arguments. Instead, I'm guessing it's the >>>>> input/output file lists. >>>>> >>>>> There was a scheme in non-provider-staging swift to pass these things in >>>>> lists, but I'm guessing you are using provider staging. Perhaps some >>>>> mode to automatically do this for large numbers of arguments is in >>>>> order. >>>>> >>>>> Mihael >>>>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From aespinosa at cs.uchicago.edu Tue Oct 5 14:38:40 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 5 Oct 2010 14:38:40 -0500 Subject: [Swift-user] propagating the properties channel to outside the scheduler. Message-ID: Hi, I'm writing this OSG site tester script that submits condor-g jobs. It seems that the property elements are not being used in my task:execute() call. Here's the script: import("task.k") import("sys.k") element(pool, [handle, ..., optional(workdir), channel(properties)] host(name = handle each(...) 
to(properties each(properties) ) ) ) element(servicelist, [type, provider, url] service(type, provider=provider, url=url) ) element(gridftp, [url, optional(storage), optional(major), optional(minor), optional(patch)] if( url == "local://localhost" servicelist("file", "local", "") servicelist("file", "gsiftp", url) ) ) element(execution, [provider, url] servicelist(type="execution", provider=provider, url=url) ) element(filesystem, [provider, url, optional(storage)] servicelist(type="file", provider=provider, url=url) ) element(profile, [namespace, key, value] if( namespace == "karajan" property("{key}", value) property("{namespace}:{key}", value) ) ) element(workdirectory, [dir] property("workdir", dir) ) sitesFile := "condor_osg.xml" sites := list(executeFile(sitesFile)) for(site, sites print(site) task:execute("/bin/hostname", stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}", provider="condor", host=site) ) sample generated condor submit file: $ cat *.submit universe = vanilla output = file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS error = /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr executable = /bin/hostname notification = Never leave_in_queue = TRUE queue a pool entry: grid gt2 gridgk02.racf.bnl.gov/jobmanager-condor 20.0 0.95 /usatlas/prodjob/share/engage-scec/swift_scratch -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Tue Oct 5 15:01:24 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 5 Oct 2010 14:01:24 -0600 (GMT-06:00) Subject: [Swift-user] propagating the properties channel to outside the scheduler. In-Reply-To: <620992191.725581286308368302.JavaMail.root@zimbra.anl.gov> Message-ID: <1307756183.726371286308884942.JavaMail.root@zimbra.anl.gov> Allan, while you are debugging this, it would also be good to do a full end-to-end site testing in Swift, using a simple cat script as we discussed. 
One ugly but effective way to do this is to run one cat job per site by defining say identical cat apps cat01 through catN (where N is the number of sites to test), and then dynamically create a tc.data file that maps each catNN app to a specific Grid site. So your script would, on OSG for example, need to run swift-osg-ress, and from that, create the tc.data file and a testosg.swift file. Then set swift.properties for the desired level (perhaps 0) of retries etc, eg: sitedir.keep=true execution.retries=0 lazy.errors=false Then let this run for as long as it takes for most of the jobs to either run or fail, and likely, a few to hang waiting in queues or Condor-G retry/hold states. The Karajan script is a lower level test that is likely useful as well for diagnostics, but which doesnt replace a full Swift end-to-end test. - Mike ----- "Allan Espinosa" wrote: > Hi, > > I'm writing this OSG site tester script that submits condor-g jobs. > It seems that the property elements are not being used in my > task:execute() call. > > Here's the script: > > import("task.k") > import("sys.k") > > element(pool, [handle, ..., optional(workdir), channel(properties)] > host(name = handle > each(...) 
> to(properties > each(properties) > ) > ) > ) > > element(servicelist, [type, provider, url] > service(type, provider=provider, url=url) > ) > > element(gridftp, [url, optional(storage), optional(major), > optional(minor), optional(patch)] > if( > url == "local://localhost" > servicelist("file", "local", "") > servicelist("file", "gsiftp", url) > ) > ) > > element(execution, [provider, url] > servicelist(type="execution", provider=provider, url=url) > ) > > element(filesystem, [provider, url, optional(storage)] > servicelist(type="file", provider=provider, url=url) > ) > > element(profile, [namespace, key, value] > if( > namespace == "karajan" > property("{key}", value) > property("{namespace}:{key}", value) > ) > ) > > element(workdirectory, [dir] > property("workdir", dir) > ) > > sitesFile := "condor_osg.xml" > sites := list(executeFile(sitesFile)) > > for(site, sites > print(site) > task:execute("/bin/hostname", > stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}", > provider="condor", host=site) > ) > > > sample generated condor submit file: > $ cat *.submit > universe = vanilla > output = > file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS > error = > /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr > > executable = /bin/hostname > > notification = Never > leave_in_queue = TRUE > queue > > > a pool entry: > > > > grid > gt2 > gridgk02.racf.bnl.gov/jobmanager-condor > > 20.0 > 0.95 > > > > /usatlas/prodjob/share/engage-scec/swift_scratch > > > > -- > Allan M. 
Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Tue Oct 5 15:10:41 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 5 Oct 2010 15:10:41 -0500 Subject: [Swift-user] propagating the properties channel to outside the scheduler. In-Reply-To: <1307756183.726371286308884942.JavaMail.root@zimbra.anl.gov> References: <620992191.725581286308368302.JavaMail.root@zimbra.anl.gov> <1307756183.726371286308884942.JavaMail.root@zimbra.anl.gov> Message-ID: Hi MIke, This makes sense as well. I'll go over this direction instead. -Allan 2010/10/5 : > Allan, while you are debugging this, it would also be good to do a full end-to-end site testing in Swift, using a simple cat script as we discussed. > > One ugly but effective way to do this is to run one cat job per site by defining say identical cat apps cat01 through catN (where N is the number of sites to test), and then dynamically create a tc.data file that maps each catNN app to a specific Grid site. > > So your script would, on OSG for example, need to run swift-osg-ress, and from that, create the tc.data file and a testosg.swift file. > > Then set swift.properties for the desired level (perhaps 0) of retries etc, eg: > sitedir.keep=true > execution.retries=0 > lazy.errors=false > > Then let this run for as long as it takes for most of the jobs to either run or fail, and likely, a few to hang waiting in queues or Condor-G retry/hold states. > > The Karajan script is a lower level test that is likely useful as well for diagnostics, but which doesnt replace a full Swift end-to-end test.
> > - Mike > > > ----- "Allan Espinosa" wrote: > >> Hi, >> >> I'm writing this OSG site tester script that submits condor-g jobs. >> It seems that the property elements are not being used in my >> task:execute() call. >> >> Here's the script: >> >> import("task.k") >> import("sys.k") >> >> element(pool, [handle, ..., optional(workdir), channel(properties)] >> host(name = handle >> each(...) >> to(properties >> each(properties) >> ) >> ) >> ) >> >> element(servicelist, [type, provider, url] >> service(type, provider=provider, url=url) >> ) >> >> element(gridftp, [url, optional(storage), optional(major), >> optional(minor), optional(patch)] >> if( >> url == "local://localhost" >> servicelist("file", "local", "") >> servicelist("file", "gsiftp", url) >> ) >> ) >> >> element(execution, [provider, url] >> servicelist(type="execution", provider=provider, url=url) >> ) >> >> element(filesystem, [provider, url, optional(storage)] >> servicelist(type="file", provider=provider, url=url) >> ) >> >> element(profile, [namespace, key, value] >> if( >> namespace == "karajan" >> property("{key}", value) >> property("{namespace}:{key}", value) >> ) >> ) >> >> element(workdirectory, [dir] >> property("workdir", dir) >> ) >> >> sitesFile := "condor_osg.xml" >> sites := list(executeFile(sitesFile)) >> >> for(site, sites >> print(site) >> task:execute("/bin/hostname", >> stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}", >> provider="condor", host=site) >> ) >> >> >> sample generated condor submit file: >> $ cat *.submit >> universe = vanilla >> output = >> file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS >> error = >> /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr >> >> executable = /bin/hostname >> >> notification = Never >> leave_in_queue = TRUE >> queue >> >> >> a pool entry: >> >> >> >> grid >>
gt2 >> gridgk02.racf.bnl.gov/jobmanager-condor >> >> 20.0 >> 0.95 >> >> >> >> /usatlas/prodjob/share/engage-scec/swift_scratch >> >> >> From wilde at mcs.anl.gov Wed Oct 6 09:43:23 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 6 Oct 2010 08:43:23 -0600 (GMT-06:00) Subject: [Swift-user] Proposal to deprecate clustering feature - Please tell us if you use it Message-ID: <1639696693.755281286376203407.JavaMail.root@zimbra.anl.gov> There has been a proposal to deprecate the Swift clustering feature some time in the next few months. The rationale is that it is largely superseded by the Coaster feature, and that it has received very little use, and that its presence complicates the code and increases the testing burden. No decision to go ahead with the removal of clustering has been made; I advocate keeping it if users use it and want to keep doing so. Please post a message here if you use it, and what your experience with it has been. Thanks, Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Wed Oct 6 10:52:11 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 6 Oct 2010 10:52:11 -0500 Subject: OSG site tester (was Re: [Swift-user] propagating the properties channel to outside the scheduler.) Message-ID: The cat script generator as suggested by Mike: http://gist.github.com/613551 2010/10/5 : > Allan, while you are debugging this, it would also be good to do a full end-to-end site testing in Swift, using a simple cat script as we discussed. > > One ugly but effective way to do this is to run one cat job per site by defining say identical cat apps cat01 through catN (where N is the number of sites to test), and then dynamically create a tc.data file that maps each catNN app to a specific Grid site.
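The per-site scheme described above can be sketched as a small generator. This is an independent illustration, not the contents of the linked gist. It assumes the Swift 0.9-era tc.data layout (tab-separated site, app name, executable path, `INSTALLED`, platform, profiles) and the `cat @i stdout=@o` app syntax, and it takes the site handles as a plain list rather than parsing the swift-osg-ress output. `make_site_test`, `data.txt`, and the site name `EXAMPLE-SITE` are hypothetical.

```python
def make_site_test(sites, data_file="data.txt"):
    """Return (tc_data, swift_script) texts for a one-cat-job-per-site test.

    Each site gets its own app name (cat01, cat02, ...) so that tc.data,
    which maps (site, app) pairs to executables, pins every job to one site.
    """
    tc_lines = []
    swift_lines = ["type file;", ""]
    for n, site in enumerate(sites, start=1):
        app = f"cat{n:02d}"
        # Assumed tc.data columns: site, app, path, status, platform, profiles
        tc_lines.append("\t".join(
            [site, app, "/bin/cat", "INSTALLED", "INTEL32::LINUX", "null"]))
        swift_lines.append(f"app (file o) {app} (file i) {{ cat @i stdout=@o; }}")
    swift_lines.append("")
    swift_lines.append(f'file data <"{data_file}">;')
    for n, site in enumerate(sites, start=1):
        # One output file per site, named after the site for easy triage.
        swift_lines.append(f'file out{n:02d} <"out/{site}.out">;')
        swift_lines.append(f"out{n:02d} = cat{n:02d}(data);")
    return "\n".join(tc_lines) + "\n", "\n".join(swift_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical site handles; in practice they would come from the
    # sites file produced by swift-osg-ress.
    tc, swift = make_site_test(["BNL-ATLAS", "EXAMPLE-SITE"])
    print(tc)
    print(swift)
```

Giving every site its own app name is what forces placement: Swift should only schedule `catNN` on a site where tc.data lists it, so each output file (or its absence) reports on exactly one site.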
> > So your script would, on OSG for example, need to run swift-osg-ress, and from that, create the tc.data file and a testosg.swift file. > > Then set swift.properties for the desired level (perhaps 0) of retries etc, eg: > sitedir.keep=true > execution.retries=0 > lazy.errors=false > > Then let this run for as long as it takes for most of the jobs to either run or fail, and likely, a few to hang waiting in queues or Condor-G retry/hold states. > > The Karajan script is a lower level test that is likely useful as well for diagnostics, but which doesnt replace a full Swift end-to-end test. > > - Mike > > > ----- "Allan Espinosa" wrote: > >> Hi, >> >> I'm writing this OSG site tester script that submits condor-g jobs. >> It seems that the property elements are not being used in my >> task:execute() call. >> >> Here's the script: >> >> import("task.k") >> import("sys.k") >> >> element(pool, [handle, ..., optional(workdir), channel(properties)] >> host(name = handle >> each(...) >> to(properties >> each(properties) >> ) >> ) >> ) >> >> element(servicelist, [type, provider, url] >> service(type, provider=provider, url=url) >> ) >> >> element(gridftp, [url, optional(storage), optional(major), >> optional(minor), optional(patch)] >> if( >> url == "local://localhost" >> servicelist("file", "local", "") >> servicelist("file", "gsiftp", url) >> ) >> ) >> >> element(execution, [provider, url] >> servicelist(type="execution", provider=provider, url=url) >> ) >> >> element(filesystem, [provider, url, optional(storage)] >> servicelist(type="file", provider=provider, url=url) >> ) >> >> element(profile, [namespace, key, value] >> if( >> namespace == "karajan" >> property("{key}", value) >> property("{namespace}:{key}", value) >> ) >> ) >> >> element(workdirectory, [dir] >> property("workdir", dir) >> ) >> >> sitesFile := "condor_osg.xml" >> sites := list(executeFile(sitesFile)) >> >> for(site, sites >>
print(site) >> task:execute("/bin/hostname", >> stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}", >> provider="condor", host=site) >> ) >> >> >> sample generated condor submit file: >> $ cat *.submit >> universe = vanilla >> output = >> file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS >> error = >> /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr >> >> executable = /bin/hostname >> >> notification = Never >> leave_in_queue = TRUE >> queue >> >> >> a pool entry: >> >> >> >> grid >> gt2 >> gridgk02.racf.bnl.gov/jobmanager-condor >> >> 20.0 >> 0.95 >> >> >> >> /usatlas/prodjob/share/engage-scec/swift_scratch >> From aespinosa at cs.uchicago.edu Fri Oct 8 16:10:26 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 8 Oct 2010 16:10:26 -0500 Subject: [Swift-user] coaster-service local service port Message-ID: <20101008211026.GC2802@origin> Hi, What's the commandline argument to specify the port of the local service? In addition, is there some page I can refer to for its commandline arguments? Thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Oct 8 16:30:39 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 8 Oct 2010 15:30:39 -0600 (GMT-06:00) Subject: [Swift-user] coaster-service local service port In-Reply-To: <20101008211026.GC2802@origin> Message-ID: <1637738404.897251286573439520.JavaMail.root@zimbra.anl.gov> Allan, the only help I know of for the command is: com$ coaster-service -help Usage: coaster-service where options are: [(-port | -p) ] Specifies which port to start the service on [-nosec] Disables GSI security and uses plain TCP sockets instead [-proxy ] Specifies the location of a proxy credential that will be used for authentication. If not specified, the default proxy will be used.
[-local] Binds the service to the loopback interface [(-help | -h)] Displays usage information Mike ----- "Allan Espinosa" wrote: > Hi, > > What's the commandline argument to specify the port of the local > service? > > In addition, is there some page I can refer to for its commandline > arguments? > > Thanks, > -Allan > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Oct 8 16:41:57 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 8 Oct 2010 16:41:57 -0500 Subject: [Swift-user] coaster-service local service port In-Reply-To: <1637738404.897251286573439520.JavaMail.root@zimbra.anl.gov> References: <20101008211026.GC2802@origin> <1637738404.897251286573439520.JavaMail.root@zimbra.anl.gov> Message-ID: <20101008214157.GD2802@origin> Thanks Mike. Poking at CoasterPersistentService.java, it looks like the port 50000+ for the local service is automatically set by the provider. I'm coding up a patch to add the option. Hopefully this won't break things :) -Allan On Fri, Oct 08, 2010 at 03:30:39PM -0600, Michael Wilde wrote: > Allan, the only help I know of for the command is: > > com$ coaster-service -help > Usage: > coaster-service > > where options are: > > [(-port | -p) ] > Specifies which port to start the service on > > [-nosec] > Disables GSI security and uses plain TCP sockets instead > > [-proxy ] > Specifies the location of a proxy credential that will be used > for authentication. If not specified, the default proxy will be > used. > > [-local] > Binds the service to the loopback interface > > [(-help | -h)] > Displays usage information > > > Mike -- Allan M. 
Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Fri Oct 8 23:07:45 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Oct 2010 21:07:45 -0700 Subject: [Swift-user] coaster-service local service port In-Reply-To: <20101008211026.GC2802@origin> References: <20101008211026.GC2802@origin> Message-ID: <1286597265.2681.6.camel@blabla2.none> On Fri, 2010-10-08 at 16:10 -0500, Allan Espinosa wrote: > Hi, > > What's the commandline argument to specify the port of the local service? > > In addition, is there some page I can refer to for its commandline arguments? No, but cmd -h should do the trick for now. From hategan at mcs.anl.gov Fri Oct 8 23:08:50 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Oct 2010 21:08:50 -0700 Subject: [Swift-user] coaster-service local service port In-Reply-To: <1637738404.897251286573439520.JavaMail.root@zimbra.anl.gov> References: <1637738404.897251286573439520.JavaMail.root@zimbra.anl.gov> Message-ID: <1286597330.2681.7.camel@blabla2.none> I should read the thread before answering to avoid being from the department of redundancy department. On Fri, 2010-10-08 at 15:30 -0600, Michael Wilde wrote: > Allan, the only help I know of for the command is: > > com$ coaster-service -help > Usage: > coaster-service > > where options are: > > [(-port | -p) ] > Specifies which port to start the service on > > [-nosec] > Disables GSI security and uses plain TCP sockets instead > > [-proxy ] > Specifies the location of a proxy credential that will be used > for authentication. If not specified, the default proxy will be > used. > > [-local] > Binds the service to the loopback interface > > [(-help | -h)] > Displays usage information > > > Mike > > > > ----- "Allan Espinosa" wrote: > > > Hi, > > > > What's the commandline argument to specify the port of the local > > service? > > > > In addition, is there some page I can refer to for its commandline > > arguments? 
> > > > Thanks, > > -Allan > > > > -- > > Allan M. Espinosa > > PhD student, Computer Science > > University of Chicago > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From hategan at mcs.anl.gov Sat Oct 9 13:41:08 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 09 Oct 2010 11:41:08 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CAA921A.4070907@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> Message-ID: <1286649668.6823.1.camel@blabla2.none> And these files would be where exactly? On Mon, 2010-10-04 at 21:48 -0500, Jonathan Monette wrote: > Sure can. I have two log files in my home directory on the ci > machines. ~jonmon > One is rather large(20M) and that is the one with > "wrapper.paramter.mode=files" not set. The other is about 4.7M and that > one has "wrapper.parameter.mode=files" set. > You should have permissions to read them. > > On 10/04/2010 09:44 PM, Mihael Hategan wrote: > > Ok. I believe that now you hit a bug in swift. Luckily there might be > > something we can do about that. > > May I have the log file? > > > > On Mon, 2010-10-04 at 21:39 -0500, Jonathan Monette wrote: > >> By setting "wrapper.parameter.mode=files" I get "failed to transfer > >> wrapper logs". Here is my swift.properties file. > >> > >> execution.retries=0 > >> sitedir.keep=true > >> status.mode=provider > >> //wrapper.log.always.transfer=true > >> foreach.maxthreads=1024 > >> wrapper.parameter.mode=files > >> > >> I have tried this with "wrapper.log.always.transfer=true" both commented > >> and uncommented still get the same error. 
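One detail in the swift.properties file quoted above: `java.util.Properties` treats only lines whose first non-blank character is `#` or `!` as comments, so `//wrapper.log.always.transfer=true` is not a comment but a property whose key is `//wrapper.log.always.transfer`. Assuming Swift reads this file with standard Java properties semantics and ignores unknown keys, the effect is presumably the same as commenting it out (the real key stays unset), but `#` is the reliable way to disable a line. The simplified parser below illustrates the rule. `parse_properties` is a hypothetical helper, not part of Swift, and it skips line continuations, escapes, and whitespace separators.

```python
def parse_properties(text):
    """Parse key=value lines using the java.util.Properties comment rule.

    Simplified: handles '#'/'!' comments and the first '=' or ':' as the
    separator, but not line continuations, escapes, or whitespace separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or a real comment
        # Split at the first '=' or ':' (whichever comes first).
        cut = min((i for i in (line.find("="), line.find(":")) if i >= 0),
                  default=len(line))
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


SWIFT_PROPERTIES = """\
execution.retries=0
sitedir.keep=true
status.mode=provider
//wrapper.log.always.transfer=true
foreach.maxthreads=1024
wrapper.parameter.mode=files
"""

if __name__ == "__main__":
    props = parse_properties(SWIFT_PROPERTIES)
    # The '//' line survives as an oddly named key, not as a comment.
    print(sorted(props))
    print("wrapper.log.always.transfer" in props)
```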
> >>
> >> On 10/04/2010 08:25 PM, Mihael Hategan wrote:
> >>> Groovy. Then set "wrapper.parameter.mode=files" in swift.properties.
> >>>
> >>> On Mon, 2010-10-04 at 18:59 -0500, Jonathan Monette wrote:
> >>>> Well, if I am understanding the problem, the input list will be about
> >>>> 4100 files and the output list will be a single file (unless Swift adds
> >>>> more input and output files). I do not think I am using provider
> >>>> staging though. In my swift.properties I do not set the
> >>>> use.provider.staging option, and in etc/swift.properties
> >>>> use.provider.staging is set to false.
> >>>>
> >>>> On 10/4/10 6:46 PM, Mihael Hategan wrote:
> >>>>> On Mon, 2010-10-04 at 18:39 -0500, Jonathan Monette wrote:
> >>>>>> Yes. I have to make sure that all 4100 files are created before the
> >>>>>> failing app can execute.
> >>>>>>
> >>>>>> The writeData() way seems like a hack to get around the problem in Swift,
> >>>>>> but I will try it out and see if my script completes.
> >>>>>>
> >>>>>> Mihael: pasted below is the stack trace that was generated in the log file.
> >>>>> Yeah. It's what Mike says.
> >>>>>
> >>>>> But it's not the app arguments. Instead, I'm guessing it's the
> >>>>> input/output file lists.
> >>>>>
> >>>>> There was a scheme in non-provider-staging swift to pass these things in
> >>>>> lists, but I'm guessing you are using provider staging. Perhaps some
> >>>>> mode to automatically do this for large numbers of arguments is in
> >>>>> order.
> >>>>>
> >>>>> Mihael

From jon.monette at gmail.com Sat Oct 9 13:44:38 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sat, 09 Oct 2010 13:44:38 -0500
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286649668.6823.1.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none>
Message-ID: <4CB0B816.8000700@gmail.com>

My bad. Forgot to put them in the home directory. They are there now.

--
Jon

Computers are incredibly fast, accurate, and stupid.
Human beings are incredibly slow, inaccurate, and brilliant.
Together they are powerful beyond imagination.
- Albert Einstein

From hategan at mcs.anl.gov Sat Oct 9 14:45:53 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Oct 2010 12:45:53 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <4CB0B816.8000700@gmail.com>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com>
Message-ID: <1286653553.7457.0.camel@blabla2.none>

Can you post the app procedure from your swift file that invokes
mImgtbl?

Mihael

From jon.monette at gmail.com Sat Oct 9 14:47:19 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sat, 09 Oct 2010 14:47:19 -0500
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286653553.7457.0.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none>
Message-ID: <4CB0C6C7.5000204@gmail.com>

app ( Table img_tbl ) mImgtbl( Image imgs[] )
{
    mImgtbl @dirname( imgs[0] ) @img_tbl;
}

--
Jon

Computers are incredibly fast, accurate, and stupid.
Human beings are incredibly slow, inaccurate, and brilliant.
Together they are powerful beyond imagination.
- Albert Einstein

From hategan at mcs.anl.gov Sat Oct 9 14:58:13 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Oct 2010 12:58:13 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <4CB0C6C7.5000204@gmail.com>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com>
Message-ID: <1286654293.7457.9.camel@blabla2.none>

So swift will still pass all elements in imgs[] to the wrapper as the
list of input files, and try to stage them in. If you have them locally
and you are managing them independently, you may want to try to define
them as "external".

As for the parameter files, I have a suspicion that the following line
(taken from one such parameter file) may be responsible for the
problems:

-scratch-e /home/jonmon/Library/Montage/bin/mProject

(each parameter is supposed to be on its own line).

I'll see if I can fix that.

From jon.monette at gmail.com Sat Oct 9 15:00:19 2010
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Sat, 9 Oct 2010 20:00:19 +0000
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286654293.7457.9.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none>
Message-ID: <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry>

What do you mean by "manage them independently"?

Sent on the Sprint® Now Network from my BlackBerry®

From hategan at mcs.anl.gov Sat Oct 9 15:01:07 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Oct 2010 13:01:07 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286654293.7457.9.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none>
Message-ID: <1286654467.7457.10.camel@blabla2.none>

On Sat, 2010-10-09 at 12:58 -0700, Mihael Hategan wrote:
> As for the parameter files, I have a suspicion that the following line
> (taken from one such parameter file) may be responsible for the
> problems:
>
> -scratch-e /home/jonmon/Library/Montage/bin/mProject
>
> (each parameter is supposed to be on its own line).
>
> I'll see if I can fix that.

Swift trunk/r3676. You should try that.

Mihael

From hategan at mcs.anl.gov Sat Oct 9 15:02:47 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Oct 2010 13:02:47 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry>
Message-ID: <1286654567.7457.12.camel@blabla2.none>

On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote:
> What do you mean by manage them independently?

I'm not sure. I guess it's whether swift can skip staging them in and
stop keeping track of them. But that's pretty much the same as saying
they are "external", so it seems circular.
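
The parameter-file problem Mihael diagnosed earlier in this thread (each parameter belongs on its own line, yet the file contained "-scratch-e", which looks like "-scratch" and "-e" run together) can be illustrated with a small Python sketch. This is not Swift's wrapper code: the functions serialize_params, serialize_params_buggy and parse_params are hypothetical, and the parameter list is assumed from the single line quoted in the thread.

```python
def serialize_params(params):
    """One parameter per line; every entry, even an empty one, ends with a newline."""
    return "".join(p + "\n" for p in params)


def serialize_params_buggy(params):
    """Hypothetical bug: the first parameter is written without its newline,
    so it runs into the second one."""
    if not params:
        return ""
    return params[0] + "".join(p + "\n" for p in params[1:])


def parse_params(text):
    """Line-oriented reader: each line is taken as exactly one parameter."""
    return text.splitlines()


if __name__ == "__main__":
    params = ["-scratch", "-e", "/home/jonmon/Library/Montage/bin/mProject"]
    print(parse_params(serialize_params(params)))
    # ['-scratch', '-e', '/home/jonmon/Library/Montage/bin/mProject']
    print(parse_params(serialize_params_buggy(params)))
    # ['-scratch-e', '/home/jonmon/Library/Montage/bin/mProject']
```

A line-oriented reader has no way to tell that "-scratch-e" was meant to be two parameters, which is why the separator has to follow every entry, including empty values.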
From jon.monette at gmail.com Sun Oct 10 15:07:12 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 10 Oct 2010 15:07:12 -0500
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286654567.7457.12.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none>
Message-ID: <4CB21CF0.4070907@gmail.com>

Ok. I have not tried declaring my list of images as external yet.
About to try that. But when I set wrapper.parameter.mode=files, the
coaster job fails with code 254 (not sure what that error code means).
When I do not set that in swift.properties, I get the argument list
too long error again. What does the status.mode=provider line do? I am
looking at my swift.properties and that is the only line whose purpose
I do not know. I think I got that from Mike.

--
Jon

Computers are incredibly fast, accurate, and stupid.
Human beings are incredibly slow, inaccurate, and brilliant.
Together they are powerful beyond imagination.
- Albert Einstein

From jon.monette at gmail.com Sun Oct 10 15:09:36 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 10 Oct 2010 15:09:36 -0500
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286654567.7457.12.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none>
Message-ID: <4CB21D80.4060801@gmail.com>

Also, how do I declare the image list as an external type?

--
Jon

Computers are incredibly fast, accurate, and stupid.
Human beings are incredibly slow, inaccurate, and brilliant.
Together they are powerful beyond imagination.
- Albert Einstein

From hategan at mcs.anl.gov Sun Oct 10 15:25:31 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Oct 2010 13:25:31 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <4CB21CF0.4070907@gmail.com>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21CF0.4070907@gmail.com>
Message-ID: <1286742331.27981.1.camel@blabla2.none>

On Sun, 2010-10-10 at 15:07 -0500, Jonathan Monette wrote:
> Ok. I have not tried declaring my list of images as external yet.
> About to try that. But when I set wrapper.parameter.mode=files, the
> coaster job fails with code 254 (not sure what that error code means).

I believe I fixed that in svn. You should try the other option first
(i.e. update and re-run with parameter.mode=files).

> When I do not set that in swift.properties, I get the argument list
> too long error again. What does the status.mode=provider line do? I am
> looking at my swift.properties and that is the only line whose purpose
> I do not know. I think I got that from Mike.

Don't use that for now. Let's troubleshoot one problem at a time.
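
The "argument list too long" message is the text of the E2BIG error that exec returns: POSIX caps the combined size of a program's arguments and environment (ARG_MAX), and Linux additionally caps the length of any single argument. Passing a long file list through a file instead, which is the idea behind wrapper.parameter.mode=files, keeps the command line small no matter how many files there are. The Python sketch below is illustrative only and is not how Swift's wrapper is implemented; the program name some_tool and the image file names are hypothetical.

```python
import os
import tempfile


def arg_bytes(argv):
    """Approximate bytes argv occupies at exec time: each string plus its
    terminating NUL. The environment, which also counts toward ARG_MAX,
    is ignored here."""
    return sum(len(a.encode()) + 1 for a in argv)


def fits_on_command_line(argv, limit=None):
    """True if argv stays under limit (defaults to the POSIX ARG_MAX)."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")
    return arg_bytes(argv) < limit


def argv_via_list_file(program, paths, directory):
    """Write one path per line to a new file in directory and return an
    argv that passes only that file's name."""
    fd, list_path = tempfile.mkstemp(suffix=".list", dir=directory, text=True)
    with os.fdopen(fd, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return [program, list_path]


if __name__ == "__main__":
    # Roughly the size of the input list in this thread (about 4100 images).
    paths = ["raw_dir/image_%04d.fits" % i for i in range(4100)]
    inline = ["some_tool"] + paths
    print("inline argv bytes:", arg_bytes(inline))
    with tempfile.TemporaryDirectory() as d:
        short = argv_via_list_file("some_tool", paths, d)
        print("list-file argv bytes:", arg_bytes(short))
```

The inline argv grows linearly with the number of files, while the list-file argv stays at a few dozen bytes; the cost is that the receiving program has to know how to read the list back.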

From hategan at mcs.anl.gov Sun Oct 10 15:26:06 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Oct 2010 13:26:06 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <4CB21D80.4060801@gmail.com>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com>
Message-ID: <1286742366.27981.2.camel@blabla2.none>

Can you re-try the current thing with parameter.mode=files and the
latest trunk first?

From jon.monette at gmail.com Sun Oct 10 15:51:57 2010
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sun, 10 Oct 2010 15:51:57 -0500
Subject: [Swift-user] Argument list to long
In-Reply-To: <1286742366.27981.2.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none>
Message-ID: <4CB2276D.3020504@gmail.com>

cog is at revision 2910 and swift is at revision 3676. I have added
"wrapper.parameter.mode=files" into my swift.properties.
Here is the output to the screen: RunID: 20101010-1548-ky63yf59 Progress: Progress:Progress:Progress: uninitialized:4 uninitialized:3 uninitialized:4 Progress: Selecting site:4118 Initializing site shared directory:1 Progress: Selecting site:4017 Stage in:101 Submitting:1 original callback URI is http://169.254.95.119:37936 callback URI has been overridden to http://192.5.86.6:37936 Progress: Selecting site:4017 Submitted:101 Active:1 Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/k on pads Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/7 on pads Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/6 on pads Execution failed: Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/g on pads Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/j on pads Failed to transfer wrapper log from unrectified-20101010-1548-ky63yf59/info/t on pads Exception in mProject: Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] Host: pads Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj stderr.txt: stdout.txt: ---- Caused by: Job failed with an exit code of 254 Cleaning up... After that coasters just cancels the other jobs. On 10/10/10 3:26 PM, Mihael Hategan wrote: > Can you re-try the current thing with parameter.mode=files and the > latest trunk first? > > On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >> Also, how do I declare the image list as an external type? >> >> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>> What do you mean by manage them independently? >>> I'm not sure. I guess it's whether swift can skip staging them in and >>> stop keeping track of them. But that's pretty much the same as saying >>> they are "external", so it seems circular. 
>>> >>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 17:05:54 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 15:05:54 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB2276D.3020504@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> Message-ID: <1286748354.17568.0.camel@blabla2.none> Ok. I can reproduce this. It means it wasn't the previous problem. On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: > cog is at revision 2910 and swift is at revision 3676. I have added > "wrapper.parameter.mode=files" into my swift.properties. 
Here is the > output to the screen: > RunID: 20101010-1548-ky63yf59 > Progress: > Progress:Progress:Progress: uninitialized:4 > uninitialized:3 uninitialized:4 > > Progress: Selecting site:4118 Initializing site shared directory:1 > Progress: Selecting site:4017 Stage in:101 Submitting:1 > original callback URI is http://169.254.95.119:37936 > callback URI has been overridden to http://192.5.86.6:37936 > Progress: Selecting site:4017 Submitted:101 Active:1 > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/k on pads > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/7 on pads > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/6 on pads > Execution failed: > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/g on pads > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/j on pads > Failed to transfer wrapper log from > unrectified-20101010-1548-ky63yf59/info/t on pads > Exception in mProject: > Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, > proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] > Host: pads > Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > Job failed with an exit code of 254 > Cleaning up... > > After that coasters just cancels the other jobs. > > On 10/10/10 3:26 PM, Mihael Hategan wrote: > > Can you re-try the current thing with parameter.mode=files and the > > latest trunk first? > > > > On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: > >> Also, how do I declare the image list as an external type? > >> > >> On 10/9/10 3:02 PM, Mihael Hategan wrote: > >>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: > >>>> What do you mean by manage them independently? > >>> I'm not sure. I guess it's whether swift can skip staging them in and > >>> stop keeping track of them. 
But that's pretty much the same as saying > >>> they are "external", so it seems circular. > >>> > >>> > > > From jon.monette at gmail.com Sun Oct 10 17:09:10 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 17:09:10 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286748354.17568.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> Message-ID: <4CB23986.1090809@gmail.com> Ok. If you need any more of my log files I can gather some up for you. On 10/10/10 5:05 PM, Mihael Hategan wrote: > Ok. I can reproduce this. > > It means it wasn't the previous problem. > > On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >> cog is at revision 2910 and swift is at revision 3676. I have added >> "wrapper.parameter.mode=files" into my swift.properties.
Here is the >> output to the screen: >> RunID: 20101010-1548-ky63yf59 >> Progress: >> Progress:Progress:Progress: uninitialized:4 >> uninitialized:3 uninitialized:4 >> >> Progress: Selecting site:4118 Initializing site shared directory:1 >> Progress: Selecting site:4017 Stage in:101 Submitting:1 >> original callback URI is http://169.254.95.119:37936 >> callback URI has been overridden to http://192.5.86.6:37936 >> Progress: Selecting site:4017 Submitted:101 Active:1 >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/k on pads >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/7 on pads >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/6 on pads >> Execution failed: >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/g on pads >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/j on pads >> Failed to transfer wrapper log from >> unrectified-20101010-1548-ky63yf59/info/t on pads >> Exception in mProject: >> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >> Host: pads >> Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >> stderr.txt: >> >> stdout.txt: >> >> ---- >> >> Caused by: >> Job failed with an exit code of 254 >> Cleaning up... >> >> After that coasters just cancels the other jobs. >> >> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>> Can you re-try the current thing with parameter.mode=files and the >>> latest trunk first? >>> >>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>> Also, how do I declare the image list as an external type? >>>> >>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>> What do you mean by manage them independently? >>>>> I'm not sure. 
I guess it's whether swift can skip staging them in and >>>>> stop keeping track of them. But that's pretty much the same as saying >>>>> they are "external", so it seems circular. >>>>> >>>>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 17:12:07 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 15:12:07 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB23986.1090809@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> Message-ID: <1286748727.19802.0.camel@blabla2.none> I'm good. Try now. On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: > Ok. If you need anymore of my log files I can gather some up for you. > > On 10/10/10 5:05 PM, Mihael Hategan wrote: > > Ok. I can reproduce this. > > > > It means it wasn't the previous problem. > > > > On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: > >> cog is at revision 2910 and swift is at revision 3676. I have added > >> "wrapper.parameter.mode=files" into my swift.properties. 
Here is the > >> output to the screen: > >> RunID: 20101010-1548-ky63yf59 > >> Progress: > >> Progress:Progress:Progress: uninitialized:4 > >> uninitialized:3 uninitialized:4 > >> > >> Progress: Selecting site:4118 Initializing site shared directory:1 > >> Progress: Selecting site:4017 Stage in:101 Submitting:1 > >> original callback URI is http://169.254.95.119:37936 > >> callback URI has been overridden to http://192.5.86.6:37936 > >> Progress: Selecting site:4017 Submitted:101 Active:1 > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/k on pads > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/7 on pads > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/6 on pads > >> Execution failed: > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/g on pads > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/j on pads > >> Failed to transfer wrapper log from > >> unrectified-20101010-1548-ky63yf59/info/t on pads > >> Exception in mProject: > >> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, > >> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] > >> Host: pads > >> Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj > >> stderr.txt: > >> > >> stdout.txt: > >> > >> ---- > >> > >> Caused by: > >> Job failed with an exit code of 254 > >> Cleaning up... > >> > >> After that coasters just cancels the other jobs. > >> > >> On 10/10/10 3:26 PM, Mihael Hategan wrote: > >>> Can you re-try the current thing with parameter.mode=files and the > >>> latest trunk first? > >>> > >>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: > >>>> Also, how do I declare the image list as an external type? 
> >>>> > >>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: > >>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: > >>>>>> What do you mean by manage them independently? > >>>>> I'm not sure. I guess it's whether swift can skip staging them in and > >>>>> stop keeping track of them. But that's pretty much the same as saying > >>>>> they are "external", so it seems circular. > >>>>> > >>>>> > > > From jon.monette at gmail.com Sun Oct 10 17:30:17 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 17:30:17 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286748727.19802.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> Message-ID: <4CB23E79.20004@gmail.com> Ok. This seems to have fixed the problem. Now I am getting an error but it has to do with the app invocation. Thanks. On 10/10/10 5:12 PM, Mihael Hategan wrote: > I'm good. Try now. > > On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: >> Ok. If you need anymore of my log files I can gather some up for you. >> >> On 10/10/10 5:05 PM, Mihael Hategan wrote: >>> Ok. I can reproduce this. >>> >>> It means it wasn't the previous problem. 
>>> >>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >>>> cog is at revision 2910 and swift is at revision 3676. I have added >>>> "wrapper.parameter.mode=files" into my swift.properties. Here is the >>>> output to the screen: >>>> RunID: 20101010-1548-ky63yf59 >>>> Progress: >>>> Progress:Progress:Progress: uninitialized:4 >>>> uninitialized:3 uninitialized:4 >>>> >>>> Progress: Selecting site:4118 Initializing site shared directory:1 >>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 >>>> original callback URI is http://169.254.95.119:37936 >>>> callback URI has been overridden to http://192.5.86.6:37936 >>>> Progress: Selecting site:4017 Submitted:101 Active:1 >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/k on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/7 on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/6 on pads >>>> Execution failed: >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/g on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/j on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/t on pads >>>> Exception in mProject: >>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >>>> Host: pads >>>> Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >>>> stderr.txt: >>>> >>>> stdout.txt: >>>> >>>> ---- >>>> >>>> Caused by: >>>> Job failed with an exit code of 254 >>>> Cleaning up... >>>> >>>> After that coasters just cancels the other jobs. >>>> >>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>>>> Can you re-try the current thing with parameter.mode=files and the >>>>> latest trunk first? 
>>>>> >>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>>>> Also, how do I declare the image list as an external type? >>>>>> >>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>>>> What do you mean by manage them independently? >>>>>>> I'm not sure. I guess it's whether swift can skip staging them in and >>>>>>> stop keeping track of them. But that's pretty much the same as saying >>>>>>> they are "external", so it seems circular. >>>>>>> >>>>>>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From jon.monette at gmail.com Sun Oct 10 18:20:38 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 18:20:38 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286748727.19802.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> Message-ID: <4CB24A46.3080400@gmail.com> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=urn:0-2-1-1513-3-1-1286751458281) is /bin/bash shared/_swiftwrap mProject-23r0czzj\ -p 
2 What does this line mean? I cannot find the portion of code in _swiftwrap that processes this. On 10/10/10 5:12 PM, Mihael Hategan wrote: > I'm good. Try now. > > On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: >> Ok. If you need anymore of my log files I can gather some up for you. >> >> On 10/10/10 5:05 PM, Mihael Hategan wrote: >>> Ok. I can reproduce this. >>> >>> It means it wasn't the previous problem. >>> >>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >>>> cog is at revision 2910 and swift is at revision 3676. I have added >>>> "wrapper.parameter.mode=files" into my swift.properties. Here is the >>>> output to the screen: >>>> RunID: 20101010-1548-ky63yf59 >>>> Progress: >>>> Progress:Progress:Progress: uninitialized:4 >>>> uninitialized:3 uninitialized:4 >>>> >>>> Progress: Selecting site:4118 Initializing site shared directory:1 >>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 >>>> original callback URI is http://169.254.95.119:37936 >>>> callback URI has been overridden to http://192.5.86.6:37936 >>>> Progress: Selecting site:4017 Submitted:101 Active:1 >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/k on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/7 on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/6 on pads >>>> Execution failed: >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/g on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/j on pads >>>> Failed to transfer wrapper log from >>>> unrectified-20101010-1548-ky63yf59/info/t on pads >>>> Exception in mProject: >>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >>>> Host: pads >>>> Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >>>> stderr.txt: >>>> >>>>
stdout.txt: >>>> >>>> ---- >>>> >>>> Caused by: >>>> Job failed with an exit code of 254 >>>> Cleaning up... >>>> >>>> After that coasters just cancels the other jobs. >>>> >>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>>>> Can you re-try the current thing with parameter.mode=files and the >>>>> latest trunk first? >>>>> >>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>>>> Also, how do I declare the image list as an external type? >>>>>> >>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>>>> What do you mean by manage them independently? >>>>>>> I'm not sure. I guess it's whether swift can skip staging them in and >>>>>>> stop keeping track of them. But that's pretty much the same as saying >>>>>>> they are "external", so it seems circular. >>>>>>> >>>>>>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. 
- Albert Einstein From wozniak at mcs.anl.gov Sun Oct 10 18:38:54 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Sun, 10 Oct 2010 18:38:54 -0500 (Central Daylight Time) Subject: [Swift-user] Argument list to long In-Reply-To: <4CB24A46.3080400@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> Message-ID: I put that in there in an attempt to connect the Task id with the command line. It's just a logging message, right? On Sun, 10 Oct 2010, Jonathan Monette wrote: > GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, > identity=urn:0-2-1-1513-3-1-1286751458281) is /bin/bash shared/_swiftwrap > mProject-23r0czzj\ > -p 2 > > What does this line mean? I cannot find the portion of code in _swiftwrap > that process this. > > On 10/10/10 5:12 PM, Mihael Hategan wrote: >> I'm good. Try now. >> >> On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: >>> Ok. If you need anymore of my log files I can gather some up for you. >>> >>> On 10/10/10 5:05 PM, Mihael Hategan wrote: >>>> Ok. I can reproduce this. >>>> >>>> It means it wasn't the previous problem. 
>>>> >>>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >>>>> cog is at revision 2910 and swift is at revision 3676. I have added >>>>> "wrapper.parameter.mode=files" into my swift.properties. Here is the >>>>> output to the screen: >>>>> RunID: 20101010-1548-ky63yf59 >>>>> Progress: >>>>> Progress:Progress:Progress: uninitialized:4 >>>>> uninitialized:3 uninitialized:4 >>>>> >>>>> Progress: Selecting site:4118 Initializing site shared directory:1 >>>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 >>>>> original callback URI is http://169.254.95.119:37936 >>>>> callback URI has been overridden to http://192.5.86.6:37936 >>>>> Progress: Selecting site:4017 Submitted:101 Active:1 >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/k on pads >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/7 on pads >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/6 on pads >>>>> Execution failed: >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/g on pads >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/j on pads >>>>> Failed to transfer wrapper log from >>>>> unrectified-20101010-1548-ky63yf59/info/t on pads >>>>> Exception in mProject: >>>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >>>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >>>>> Host: pads >>>>> Directory: unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >>>>> stderr.txt: >>>>> >>>>> stdout.txt: >>>>> >>>>> ---- >>>>> >>>>> Caused by: >>>>> Job failed with an exit code of 254 >>>>> Cleaning up... >>>>> >>>>> After that coasters just cancels the other jobs. >>>>> >>>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>>>>> Can you re-try the current thing with parameter.mode=files and the >>>>>> latest trunk first? 
>>>>>> >>>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>>>>> Also, how do I declare the image list as an external type? >>>>>>> >>>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>>>>> What do you mean by manage them independently? >>>>>>>> I'm not sure. I guess it's whether swift can skip staging them in and >>>>>>>> stop keeping track of them. But that's pretty much the same as saying >>>>>>>> they are "external", so it seems circular. >>>>>>>> >>>>>>>> >> > > -- Justin M Wozniak From jon.monette at gmail.com Sun Oct 10 18:42:42 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 18:42:42 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> Message-ID: <4CB24F72.20702@gmail.com> Correct. I am getting an error coming from my app. The arguments that are "supposedly" passed to the app are correct, but I still get an error. I wanted to see what the command line to my app actually looks like.
It should be mProject -X "in image" "out image" "header file" but I can't find if Swift is actually calling it like this. On 10/10/10 6:38 PM, Justin M Wozniak wrote: > > I put that in there in an attempt to connect the Task id with the > command line. It's just a logging message, right? > > On Sun, 10 Oct 2010, Jonathan Monette wrote: > >> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, >> identity=urn:0-2-1-1513-3-1-1286751458281) is /bin/bash >> shared/_swiftwrap mProject-23r0czzj\ >> -p 2 >> >> What does this line mean? I cannot find the portion of code in >> _swiftwrap that process this. >> >> On 10/10/10 5:12 PM, Mihael Hategan wrote: >>> I'm good. Try now. >>> >>> On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: >>>> Ok. If you need anymore of my log files I can gather some up for you. >>>> >>>> On 10/10/10 5:05 PM, Mihael Hategan wrote: >>>>> Ok. I can reproduce this. >>>>> >>>>> It means it wasn't the previous problem. >>>>> >>>>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >>>>>> cog is at revision 2910 and swift is at revision 3676. I have added >>>>>> "wrapper.parameter.mode=files" into my swift.properties. 
Here is >>>>>> the >>>>>> output to the screen: >>>>>> RunID: 20101010-1548-ky63yf59 >>>>>> Progress: >>>>>> Progress:Progress:Progress: uninitialized:4 >>>>>> uninitialized:3 uninitialized:4 >>>>>> >>>>>> Progress: Selecting site:4118 Initializing site shared directory:1 >>>>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 >>>>>> original callback URI is http://169.254.95.119:37936 >>>>>> callback URI has been overridden to http://192.5.86.6:37936 >>>>>> Progress: Selecting site:4017 Submitted:101 Active:1 >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/k on pads >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/7 on pads >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/6 on pads >>>>>> Execution failed: >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/g on pads >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/j on pads >>>>>> Failed to transfer wrapper log from >>>>>> unrectified-20101010-1548-ky63yf59/info/t on pads >>>>>> Exception in mProject: >>>>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >>>>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >>>>>> Host: pads >>>>>> Directory: >>>>>> unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >>>>>> stderr.txt: >>>>>> >>>>>> stdout.txt: >>>>>> >>>>>> ---- >>>>>> >>>>>> Caused by: >>>>>> Job failed with an exit code of 254 >>>>>> Cleaning up... >>>>>> >>>>>> After that coasters just cancels the other jobs. >>>>>> >>>>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>>>>>> Can you re-try the current thing with parameter.mode=files and the >>>>>>> latest trunk first? >>>>>>> >>>>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>>>>>> Also, how do I declare the image list as an external type? 
>>>>>>>> >>>>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>>>>>> What do you mean by manage them independently? >>>>>>>>> I'm not sure. I guess it's whether swift can skip staging them >>>>>>>>> in and >>>>>>>>> stop keeping track of them. But that's pretty much the same as >>>>>>>>> saying >>>>>>>>> they are "external", so it seems circular. >>>>>>>>> >>>>>>>>> >>> >> >> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 19:26:18 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 17:26:18 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB24F72.20702@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> Message-ID: <1286756778.20744.1.camel@blabla2.none> The arguments are in a parameter file in /parameters//param- The contents look something like this:

-scratch
-e /bin/cat
-out 0653-out0937.txt
-err stderr.txt
-i
-d
-if 0653-in.txt
-of 0653-out0937.txt
-k
-cdmfile
-status files
-a 0653-in.txt

On Sun, 2010-10-10 at 18:42 -0500, Jonathan Monette wrote: > correct. I am getting an error coming from my app. The arguments > that are "supposedly" passed to the app are correct but still get an > error. I wanted to see what the command line to my app actually looks > like. It should be mProject -X "in image" "out image" "header file" but > I can't find if Swift is actually calling it like this. > > On 10/10/10 6:38 PM, Justin M Wozniak wrote: > > > > I put that in there in an attempt to connect the Task id with the > > command line. It's just a logging message, right? > > > > On Sun, 10 Oct 2010, Jonathan Monette wrote: > > > >> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, > >> identity=urn:0-2-1-1513-3-1-1286751458281) is /bin/bash > >> shared/_swiftwrap mProject-23r0czzj\ > >> -p 2 > >> > >> What does this line mean? I cannot find the portion of code in > >> _swiftwrap that process this. > >> > >> On 10/10/10 5:12 PM, Mihael Hategan wrote: > >>> I'm good. Try now. > >>> > >>> On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: > >>>> Ok. If you need anymore of my log files I can gather some up for you. > >>>> > >>>> On 10/10/10 5:05 PM, Mihael Hategan wrote: > >>>>> Ok. I can reproduce this. > >>>>> > >>>>> It means it wasn't the previous problem. > >>>>> > >>>>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: > >>>>>> cog is at revision 2910 and swift is at revision 3676. I have added > >>>>>> "wrapper.parameter.mode=files" into my swift.properties.
Here is > >>>>>> the > >>>>>> output to the screen: > >>>>>> RunID: 20101010-1548-ky63yf59 > >>>>>> Progress: > >>>>>> Progress:Progress:Progress: uninitialized:4 > >>>>>> uninitialized:3 uninitialized:4 > >>>>>> > >>>>>> Progress: Selecting site:4118 Initializing site shared directory:1 > >>>>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 > >>>>>> original callback URI is http://169.254.95.119:37936 > >>>>>> callback URI has been overridden to http://192.5.86.6:37936 > >>>>>> Progress: Selecting site:4017 Submitted:101 Active:1 > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/k on pads > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/7 on pads > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/6 on pads > >>>>>> Execution failed: > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/g on pads > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/j on pads > >>>>>> Failed to transfer wrapper log from > >>>>>> unrectified-20101010-1548-ky63yf59/info/t on pads > >>>>>> Exception in mProject: > >>>>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, > >>>>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] > >>>>>> Host: pads > >>>>>> Directory: > >>>>>> unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj > >>>>>> stderr.txt: > >>>>>> > >>>>>> stdout.txt: > >>>>>> > >>>>>> ---- > >>>>>> > >>>>>> Caused by: > >>>>>> Job failed with an exit code of 254 > >>>>>> Cleaning up... > >>>>>> > >>>>>> After that coasters just cancels the other jobs. > >>>>>> > >>>>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: > >>>>>>> Can you re-try the current thing with parameter.mode=files and the > >>>>>>> latest trunk first? 
> >>>>>>> > >>>>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: > >>>>>>>> Also, how do I declare the image list as an external type? > >>>>>>>> > >>>>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: > >>>>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: > >>>>>>>>>> What do you mean by manage them independently? > >>>>>>>>> I'm not sure. I guess it's whether swift can skip staging them > >>>>>>>>> in and > >>>>>>>>> stop keeping track of them. But that's pretty much the same as > >>>>>>>>> saying > >>>>>>>>> they are "external", so it seems circular. > >>>>>>>>> > >>>>>>>>> > >>> > >> > >> > > > From jon.monette at gmail.com Sun Oct 10 19:42:37 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 19:42:37 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286756778.20744.1.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2.none> Message-ID: <4CB25D7D.6090209@gmail.com> Is _swiftseq the wrapper that actually executes the app? The app in Swift eventually has to be executed in the shell correct? 
What I am trying to see is what is that command line that is submitted to the shell. On 10/10/10 7:26 PM, Mihael Hategan wrote: > The arguments are in a parameter file in > /parameters//param- > > The contents looks something like this: > -scratch > -e /bin/cat > -out 0653-out0937.txt > -err stderr.txt > -i > -d > -if 0653-in.txt > -of 0653-out0937.txt > -k > -cdmfile > -status files > -a 0653-in.txt > > > On Sun, 2010-10-10 at 18:42 -0500, Jonathan Monette wrote: >> correct. I am getting an error coming from my app. The arguments >> that are "supposedly" passed to the app are correct but still get an >> error. I wanted to see what the command line to my app actually looks >> like. It should be mProject -X "in image" "out image" "header file" but >> I can't find if Swift is actually calling it like this. >> >> On 10/10/10 6:38 PM, Justin M Wozniak wrote: >>> I put that in there in an attempt to connect the Task id with the >>> command line. It's just a logging message, right? >>> >>> On Sun, 10 Oct 2010, Jonathan Monette wrote: >>> >>>> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, >>>> identity=urn:0-2-1-1513-3-1-1286751458281) is /bin/bash >>>> shared/_swiftwrap mProject-23r0czzj\ >>>> -p 2 >>>> >>>> What does this line mean? I cannot find the portion of code in >>>> _swiftwrap that process this. >>>> >>>> On 10/10/10 5:12 PM, Mihael Hategan wrote: >>>>> I'm good. Try now. >>>>> >>>>> On Sun, 2010-10-10 at 17:09 -0500, Jonathan Monette wrote: >>>>>> Ok. If you need anymore of my log files I can gather some up for you. >>>>>> >>>>>> On 10/10/10 5:05 PM, Mihael Hategan wrote: >>>>>>> Ok. I can reproduce this. >>>>>>> >>>>>>> It means it wasn't the previous problem. >>>>>>> >>>>>>> On Sun, 2010-10-10 at 15:51 -0500, Jonathan Monette wrote: >>>>>>>> cog is at revision 2910 and swift is at revision 3676. I have added >>>>>>>> "wrapper.parameter.mode=files" into my swift.properties. 
Here is >>>>>>>> the >>>>>>>> output to the screen: >>>>>>>> RunID: 20101010-1548-ky63yf59 >>>>>>>> Progress: >>>>>>>> Progress:Progress:Progress: uninitialized:4 >>>>>>>> uninitialized:3 uninitialized:4 >>>>>>>> >>>>>>>> Progress: Selecting site:4118 Initializing site shared directory:1 >>>>>>>> Progress: Selecting site:4017 Stage in:101 Submitting:1 >>>>>>>> original callback URI is http://169.254.95.119:37936 >>>>>>>> callback URI has been overridden to http://192.5.86.6:37936 >>>>>>>> Progress: Selecting site:4017 Submitted:101 Active:1 >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/k on pads >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/7 on pads >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/6 on pads >>>>>>>> Execution failed: >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/g on pads >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/j on pads >>>>>>>> Failed to transfer wrapper log from >>>>>>>> unrectified-20101010-1548-ky63yf59/info/t on pads >>>>>>>> Exception in mProject: >>>>>>>> Arguments: [-X, raw_dir/2mass-atlas-000218s-j0150091.fits, >>>>>>>> proj_dir/proj_2mass-atlas-000218s-j0150091.fits, header.hdr] >>>>>>>> Host: pads >>>>>>>> Directory: >>>>>>>> unrectified-20101010-1548-ky63yf59/jobs/6/mProject-6g4u6zzj >>>>>>>> stderr.txt: >>>>>>>> >>>>>>>> stdout.txt: >>>>>>>> >>>>>>>> ---- >>>>>>>> >>>>>>>> Caused by: >>>>>>>> Job failed with an exit code of 254 >>>>>>>> Cleaning up... >>>>>>>> >>>>>>>> After that coasters just cancels the other jobs. >>>>>>>> >>>>>>>> On 10/10/10 3:26 PM, Mihael Hategan wrote: >>>>>>>>> Can you re-try the current thing with parameter.mode=files and the >>>>>>>>> latest trunk first? 
>>>>>>>>> >>>>>>>>> On Sun, 2010-10-10 at 15:09 -0500, Jonathan Monette wrote: >>>>>>>>>> Also, how do I declare the image list as an external type? >>>>>>>>>> >>>>>>>>>> On 10/9/10 3:02 PM, Mihael Hategan wrote: >>>>>>>>>>> On Sat, 2010-10-09 at 20:00 +0000, jon.monette at gmail.com wrote: >>>>>>>>>>>> What do you mean by manage them independently? >>>>>>>>>>> I'm not sure. I guess it's whether swift can skip staging them >>>>>>>>>>> in and >>>>>>>>>>> stop keeping track of them. But that's pretty much the same as >>>>>>>>>>> saying >>>>>>>>>>> they are "external", so it seems circular. >>>>>>>>>>> >>>>>>>>>>> >>>> > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 19:49:08 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 17:49:08 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB25D7D.6090209@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA65BC.9090507@gmail.com> <1286235966.11966.4.camel@blabla2.none> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. 
none> <4CB25D7D.6090209@gmail.com> Message-ID: <1286758148.20885.2.camel@blabla2.none> On Sun, 2010-10-10 at 19:42 -0500, Jonathan Monette wrote: > Is _swiftseq the wrapper that actually executes the app? That would be _swiftwrap not _swiftseq. > The app in > Swift eventually has to be executed in the shell correct? What I am > trying to see is what is that command line that is submitted to the shell. You could add an environment variable called "SWIFT_GEN_SCRIPTS" and set it to some non-null value. That will generate run.sh files in the relevant job directories which would contain all that information (and can be used to re-run individual apps). From jon.monette at gmail.com Sun Oct 10 19:51:21 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 19:51:21 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286758148.20885.2.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2.none> Message-ID: <4CB25F89.2080109@gmail.com> This environment variable can be set to anything? As long as it is not null? 
For instance in my login shell I can execute the command: export SWIFT_GEN_SCRIPTS=1 and it will generate these run.sh? On 10/10/10 7:49 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 19:42 -0500, Jonathan Monette wrote: >> Is _swiftseq the wrapper that actually executes the app? > That would be _swiftwrap not _swiftseq. > >> The app in >> Swift eventually has to be executed in the shell correct? What I am >> trying to see is what is that command line that is submitted to the shell. > You could add an environment variable called "SWIFT_GEN_SCRIPTS" and set > it to some non-null value. That will generate run.sh files in the > relevant job directories which would contain all that information (and > can be used to re-run individual apps). > > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 19:55:59 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 17:55:59 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB25F89.2080109@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA6A67.5070607@gmail.com> <1286241955.12637.3.camel@blabla2.none> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> 
<1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> Message-ID: <1286758559.20983.0.camel@blabla2.none> On Sun, 2010-10-10 at 19:51 -0500, Jonathan Monette wrote: > This environment variable can be set to anything? As long as it is not > null? For instance in my login shell I can execute the command: > export SWIFT_GEN_SCRIPTS=1 > and it will generate these run.sh? Not unless the jobs magically inherit your login environment. I.e., set it in sites.xml for each site you want this to work on. From jon.monette at gmail.com Sun Oct 10 19:56:58 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 19:56:58 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286758559.20983.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla2.none> Message-ID: <4CB260DA.6010102@gmail.com> alright. On 10/10/10 7:55 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 19:51 -0500, Jonathan Monette wrote: >> This environment variable can be set to anything? As long as it is not >> null? 
For instance in my login shell I can execute the command: >> export SWIFT_GEN_SCRIPTS=1 >> and it will generate these run.sh? > Not unless the jobs magically inherit your login environment. > > I.e., set it in sites.xml for each site you want this to work on. > > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From jon.monette at gmail.com Sun Oct 10 20:08:06 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 20:08:06 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286758559.20983.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla2.none> Message-ID: <4CB26376.3040703@gmail.com> Ok. Thanks. That got it for me. #!/bin/bash "/home/jonmon/Library/Montage/bin/mProject" "-X raw_dir/2mass-atlas-000315s-j0330245.fits proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr" 1>"stdout.txt" 2>"stderr.txt" Why is there at "|" right before the parameter header? 
If I were to run this by hand in a shell, the command would be /home/jonmon/Library/Montage/bin/mProject -X raw_dir/2mass-atlas-000315s-j0330245.fits proj_dir/proj_2mass-atlas-000315s-j0330245.fits header.hdr Is this how it is supposed to be? I thought | in bash meant pipe output from left into whats on the right? On 10/10/10 7:55 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 19:51 -0500, Jonathan Monette wrote: >> This environment variable can be set to anything? As long as it is not >> null? For instance in my login shell I can execute the command: >> export SWIFT_GEN_SCRIPTS=1 >> and it will generate these run.sh? > Not unless the jobs magically inherit your login environment. > > I.e., set it in sites.xml for each site you want this to work on. > > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 20:13:55 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 18:13:55 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB26376.3040703@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. 
none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> Message-ID: <1286759635.21129.1.camel@blabla2.none> On Sun, 2010-10-10 at 20:08 -0500, Jonathan Monette wrote: > Ok. Thanks. That got it for me. > > #!/bin/bash > "/home/jonmon/Library/Montage/bin/mProject" "-X > raw_dir/2mass-atlas-000315s-j0330245.fits > proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr" > 1>"stdout.txt" 2>"stderr.txt" > > Why is there at "|" right before the parameter header? If I were to run > this by hand in a shell, the command would be > /home/jonmon/Library/Montage/bin/mProject -X > raw_dir/2mass-atlas-000315s-j0330245.fits > proj_dir/proj_2mass-atlas-000315s-j0330245.fits header.hdr > > Is this how it is supposed to be? I thought | in bash meant pipe output > from left into whats on the right? Right. It's a bug. The pipe is used as file name separator by swift when passing a list of file names around. It should either be filtered by _swiftwrap or it should not be used with parameter.mode=files. So it's a bug. 
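Mihael's explanation can be illustrated with a small bash sketch. In the generated run.sh above, the whole argument string is quoted as one word, and the last two file names are joined by '|'. As a result, mProject receives a single argument instead of four. The helper below is hypothetical and not taken from _swiftwrap. It only shows the difference between passing a '|'-joined list as one word and splitting it into separate arguments.

```shell
#!/bin/bash
# Hypothetical helper (not Swift's _swiftwrap code): turn a '|'-separated
# file list, as Swift uses internally, into one file name per line.
pipe_list_to_args() {
    local IFS='|'                 # split only on the pipe character
    local -a parts
    read -r -a parts <<< "$1"
    printf '%s\n' "${parts[@]}"   # one file name per line
}

# Report how many arguments a command actually receives.
count_args() {
    echo "$#"
}

# Passing the joined string as one quoted word yields a single argument:
count_args "proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr"   # prints 1

# Splitting on '|' first yields two separate arguments:
mapfile -t files < <(pipe_list_to_args "proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr")
count_args "${files[@]}"                                                 # prints 2
```

Mihael leaves the actual fix open in this thread. It could mean filtering the separator in _swiftwrap, or not using '|' when wrapper.parameter.mode=files is set.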
From jon.monette at gmail.com Sun Oct 10 20:15:25 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 20:15:25 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286759635.21129.1.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> <1286759635.21129.1.camel@blabla2.none> Message-ID: <4CB2652D.2010701@gmail.com> Ok. Is this an easily correctable bug? On 10/10/10 8:13 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 20:08 -0500, Jonathan Monette wrote: >> Ok. Thanks. That got it for me. >> >> #!/bin/bash >> "/home/jonmon/Library/Montage/bin/mProject" "-X >> raw_dir/2mass-atlas-000315s-j0330245.fits >> proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr" >> 1>"stdout.txt" 2>"stderr.txt" >> >> Why is there at "|" right before the parameter header? If I were to run >> this by hand in a shell, the command would be >> /home/jonmon/Library/Montage/bin/mProject -X >> raw_dir/2mass-atlas-000315s-j0330245.fits >> proj_dir/proj_2mass-atlas-000315s-j0330245.fits header.hdr >> >> Is this how it is supposed to be? I thought | in bash meant pipe output >> from left into whats on the right? > Right. It's a bug. 
The pipe is used as file name separator by swift when > passing a list of file names around. It should either be filtered by > _swiftwrap or it should not be used with parameter.mode=files. So it's a > bug. > > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 20:20:29 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 18:20:29 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB2652D.2010701@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> <1286759635.21129.1.camel@blabla2.none> <4CB2652D.2010701@gmail.com> Message-ID: <1286760029.21216.0.camel@blabla2.none> On Sun, 2010-10-10 at 20:15 -0500, Jonathan Monette wrote: > Ok. Is this an easily correctable bug? I don't know yet. I'll investigate once I'm done with QM homework. 
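The SWIFT_GEN_SCRIPTS advice given earlier in this thread rests on two shell facts. First, the variable only needs to be non-empty. Second, it must exist in the job's own environment, not just in the login shell. The sketch below assumes a plain non-empty test. `gen_scripts_enabled` is a hypothetical name, not the actual check inside _swiftwrap. The sketch also simulates a job started with a fresh environment using `env -i`, which shows why an `export` in the login shell is not enough.

```shell
#!/bin/bash
# Hypothetical check mirroring "set it to some non-null value";
# the real test inside _swiftwrap may differ.
gen_scripts_enabled() {
    [ -n "${SWIFT_GEN_SCRIPTS:-}" ]
}

export SWIFT_GEN_SCRIPTS=1
gen_scripts_enabled && echo "login shell: enabled"

# A job launched with a clean environment (simulated with env -i) does not
# inherit the exported variable, so the same check fails there.
env -i "$BASH" -c '[ -n "${SWIFT_GEN_SCRIPTS:-}" ] && echo enabled || echo "clean job env: disabled"'
```

This is why Mihael suggests setting the variable per site in sites.xml. That presumably means an env-namespace profile entry for each site, so the value reaches the job's environment on the execution host.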
From jon.monette at gmail.com Sun Oct 10 20:22:21 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 20:22:21 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286760029.21216.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> <1286759635.21129.1.camel@blabla2.none> <4CB2652D.2010701@gmail.com> <1286760029.21216.0.camel@blabla2.none> Message-ID: <4CB266CD.3070309@gmail.com> aight. On 10/10/10 8:20 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 20:15 -0500, Jonathan Monette wrote: >> Ok. Is this an easily correctable bug? > I don't know yet. I'll investigate once I'm done with QM homework. > > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. 
- Albert Einstein From hategan at mcs.anl.gov Sun Oct 10 22:24:44 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 20:24:44 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <4CB26376.3040703@gmail.com> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> Message-ID: <1286767484.24726.0.camel@blabla2.none> I can't reproduce this. Can you paste the parameter file and info files (if you have those) from this invocation? Also, has the swift script changed since you last mentioned it? On Sun, 2010-10-10 at 20:08 -0500, Jonathan Monette wrote: > Ok. Thanks. That got it for me. > > #!/bin/bash > "/home/jonmon/Library/Montage/bin/mProject" "-X > raw_dir/2mass-atlas-000315s-j0330245.fits > proj_dir/proj_2mass-atlas-000315s-j0330245.fits|header.hdr" > 1>"stdout.txt" 2>"stderr.txt" > > Why is there at "|" right before the parameter header? 
If I were to run > this by hand in a shell, the command would be > /home/jonmon/Library/Montage/bin/mProject -X > raw_dir/2mass-atlas-000315s-j0330245.fits > proj_dir/proj_2mass-atlas-000315s-j0330245.fits header.hdr > > Is this how it is supposed to be? I thought | in bash meant pipe output > from left into whats on the right? > > On 10/10/10 7:55 PM, Mihael Hategan wrote: > > On Sun, 2010-10-10 at 19:51 -0500, Jonathan Monette wrote: > >> This environment variable can be set to anything? As long as it is not > >> null? For instance in my login shell I can execute the command: > >> export SWIFT_GEN_SCRIPTS=1 > >> and it will generate these run.sh? > > Not unless the jobs magically inherit your login environment. > > > > I.e., set it in sites.xml for each site you want this to work on. > > > > > From hategan at mcs.anl.gov Sun Oct 10 22:26:29 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Oct 2010 20:26:29 -0700 Subject: [Swift-user] Argument list to long In-Reply-To: <1286767484.24726.0.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CAA8FD6.3080906@gmail.com> <1286246692.13534.1.camel@blabla2.none> <4CAA921A.4070907@gmail.com> <1286649668.6823.1.camel@blabla2.none> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. 
none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> <1286767484.24726.0.camel@blabla2.none> Message-ID: <1286767589.24726.1.camel@blabla2.none> On Sun, 2010-10-10 at 20:24 -0700, Mihael Hategan wrote: > I can't reproduce this. > > Can you paste the parameter file and info files (if you have those) from > this invocation? > > Also, has the swift script changed since you last mentioned it? Actually that's not what you last sent. So can you paste the swift code for the app procedure for mProject? From jon.monette at gmail.com Sun Oct 10 22:27:47 2010 From: jon.monette at gmail.com (Jonathan Monette) Date: Sun, 10 Oct 2010 22:27:47 -0500 Subject: [Swift-user] Argument list to long In-Reply-To: <1286767589.24726.1.camel@blabla2.none> References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com><1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2. none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2 .none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla 2.none> <4CB26376.3040703@gmail.com> <1286767484.24726.0.camel@blabla2.none> <1286767589.24726.1.camel@blabla2.none> Message-ID: <4CB28433.4060305@gmail.com> Attached is the info and parameter file for one of the invocations. Here is the app procedure. 
app ( Image proj_img ) mProject( Image raw_img, MosaicData hdr ) { mProject "-X" @raw_img @proj_img @hdr; } On 10/10/2010 10:26 PM, Mihael Hategan wrote: > On Sun, 2010-10-10 at 20:24 -0700, Mihael Hategan wrote: >> I can't reproduce this. >> >> Can you paste the parameter file and info files (if you have those) from >> this invocation? >> >> Also, has the swift script changed since you last mentioned it? > Actually that's not what you last sent. So can you paste the swift code > for the app procedure for mProject? > -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mProject-oqt0hzzj-info URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: param-mProject-oqt0hzzj URL: From ketancmaheshwari at gmail.com Mon Oct 11 07:09:52 2010 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 11 Oct 2010 14:09:52 +0200 Subject: [Swift-user] Difficulty forming commandline Message-ID: Hello, I have difficulty in forming a commandline that can be executed from within Swift. I have a commandline in the following form: Image_Crop -dir "Repository to process" -xmin someint -ymin someint -xmax someint -ymax someint I place it in Swiftscript as follows: type messagefile; (messagefile mf) image_crop (string somedir, int xmin, int ymin, int xmax, int ymax) { app { Image_Crop " -dir " somedir " -xmin " xmin " -ymin " ymin " -xmax " xmax " -ymax " ymax stdout=@filename(mf); } } I call it as follows: messagefile icoutfile <"icout.txt">; icoutfile = image_crop ("/home/ketan/MpiReg/dataold/sorted/g_8b1e46671d5733/",20,71,142,169); However, the call does not go through as expected. 
I suppose, Swift creates a wrapper around this call and adds more parameters and switches as shown in the log file inside its directory (*.d) as follows : Image_Crop-txp9800k -jobdir t -e /home/ketan/MpiReg/bin/Image_Crop -out icout.txt -err stderr.txt -i -d -if -of icout.txt -k -status files -a -dir /home/ketan/MpiReg/dataold/sorted/g_8b1e46671d5733/ -xmin 20 -ymin 71 -xmax 142 -ymax 169 It seems the binary Image_Crop does not go along well with the additional switches and parameters generated by the wrapper. Any suggestions to approach this? Regards, Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon Oct 11 10:06:22 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Mon, 11 Oct 2010 09:06:22 -0600 (GMT-06:00) Subject: [Swift-user] Difficulty forming commandline In-Reply-To: <1147695096.938281286808195378.JavaMail.root@zimbra.anl.gov> Message-ID: <436490366.940121286809582979.JavaMail.root@zimbra.anl.gov> Ketan, at least from a very brief review of your example, it looks like Swift is forming the command line pretty close to what it should be doing. The extra command line options you are observing are flags to the Swift application launching script "_swiftwrap", which runs on the execution host with its current working directory being the temporary "job directory" as described in the Users Guide. If you put debugging statements in Image_Crop, you'll see that the command line is coming in largely as you would expect. If Image_Crop is a binary, you can put a debugging wrapper script around it just while you are experimenting with how Swift works. The one issue you *might* be encountering is the passing of a directory as an argument. One common pitfall is that if you pass a "relative" directory as a string, your application will in fact be running with a different current working directory than you expect (i.e., the job directory) and will thus not be able to open the files.
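For the debugging wrapper suggested here, a minimal bash sketch might look like the following. It is a hypothetical helper, not part of Swift: the names `log_invocation`, `run_logged` and `DEBUG_LOG` are made up for illustration. It appends the working directory and every argument to a log file, printing each argument in brackets so that stray leading or trailing spaces show up, and then runs the real program with the arguments unchanged.

```shell
#!/bin/bash
# Hypothetical debugging helpers for a wrapper script that Swift runs in
# place of the real application. Not part of Swift; the names
# log_invocation, run_logged and DEBUG_LOG are illustrative only.

# Append the working directory and every argument to the log file named
# in $1. Each argument is printed in brackets so that leading or
# trailing spaces inside an argument are visible.
log_invocation() {
  local log_file="$1"
  shift
  local i=1 arg
  {
    echo "== $(date '+%Y-%m-%d %H:%M:%S') pid $$"
    echo "cwd: $(pwd)"
    echo "argc: $#"
    for arg in "$@"; do
      printf 'argv[%d]: [%s]\n' "$i" "$arg"
      i=$((i + 1))
    done
  } >> "$log_file"
}

# Log the invocation, then run the real application (path in $1) with
# the remaining arguments unchanged and return its exit status.
# The default log goes to $TMPDIR (or /tmp) rather than the current
# directory, because the wrapper runs inside the temporary job directory.
run_logged() {
  local real_app="$1"
  shift
  log_invocation "${DEBUG_LOG:-${TMPDIR:-/tmp}/app-wrapper-debug.log}" "$@"
  "$real_app" "$@"
}
```

A wrapper installed in place of the real binary would define these functions and finish with a line such as `run_logged /home/ketan/MpiReg/bin/Image_Crop "$@"`. The brackets are useful in this case because the Swift script above passes literals such as `" -dir "` that contain spaces; if those spaces reach the program intact (worth confirming in the log), a program that matches its flags exactly would not recognize them.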
In fact, once you start running on distributed resources, the application may be running on a system that can't even access that directory. An approach to consider is to map all the files in that directory into an array (using, for example, simple_mapper, array_mapper, or your own external mapper), then pass the array as an argument to your app() function, and on the command line put something like @filename(myFileArray[0]). Then make the actual executable a wrapper script that does a "dirname" of that argument so that your application gets invoked with a directory, as it requires. It's possible we should add an @dirname() primitive that works like @filename() to handle this common case without requiring a user-written wrapper shell script. - Mike ----- "Ketan Maheshwari" wrote: > Hello, > I have difficulty in forming a commandline that can be executed from within Swift. > I have a commandline in the following form: > Image_Crop -dir "Repository to process" -xmin someint -ymin someint -xmax someint -ymax someint > > I place it in Swiftscript as follows: > type messagefile; > (messagefile mf) image_crop (string somedir, int xmin, int ymin, int xmax, int ymax) { app { Image_Crop " -dir " somedir " -xmin " xmin " -ymin " ymin " -xmax " xmax " -ymax " ymax stdout=@filename(mf); } } > > I call it as follows: > messagefile icoutfile <"icout.txt">; > icoutfile = image_crop ("/home/ketan/MpiReg/dataold/sorted/g_8b1e46671d5733/",20,71,142,169); > > However, the call does not go through as expected.
I suppose, Swift creates a wrapper around this call and adds more parameters and switches as shown in the log file inside its directory (*.d) as follows : > Image_Crop-txp9800k -jobdir t -e /home/ketan/MpiReg/bin/Image_Crop -out icout.txt -err stderr.txt -i -d -if -of icout.txt -k -status files -a -dir /home/ketan/MpiReg/dataold/sorted/g_8b1e46671d5733/ -xmin 20 -ymin 71 -xmax 142 -ymax 169 > > It seems the binary Image_Crop does not go along well with the additional switches and parameters generated by the wrapper. > Any suggestions to approach this? > > Regards, > Ketan > > _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 12:08:24 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 12:08:24 -0500 Subject: [Swift-user] tutorial/understanding issue 7 Message-ID: <11375E5F-A3F8-4ED3-8832-D55ED60E054F@ci.uchicago.edu> Hi, Continuing my journey through the Swift tutorial (http://www.ci.uchicago.edu/swift/guides/tutorial.php), though the previous 6 messages are waiting for approval, as I was not a member of the swift-users list when I sent them... In section 3.5, why can't I do this: type messagefile; app (messagefile t) greeting (string s[]) { echo s[0] s[1] s[2] stdout=@filename(t); } messagefile outfile <"q5out.txt">; #string words[] = ["how","are","you"]; string words[]; words[0] = "how; words[1] = "are"; words[2] = "you"; outfile = greeting(words); is the issue that swift doesn't know how large to make words[]? I also tried: string words[3]; but this also didn't work. Do strings need to be assigned when they are declared? Is this a general rule for Swift variables? 
I guess part of the reason this is confusing me is I see the following in the Swift userguide, which seems to declare an array before assigning any of its elements. file a[]; file b[]; foreach v,i in a { b[i] = p(v); } a[0] = r(); a[1] = s(); Thanks, Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 12:14:25 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 12:14:25 -0500 Subject: [Swift-user] more tutorial issues Message-ID: <989A0119-04BD-4055-A053-A831A241E498@ci.uchicago.edu> In Section 3.6, I see the following: This is useful when you want to explicitly name input and output files for your program. For example, 'outfile' in exercise HELLOWORLD. anonymous mapping - no name is specified in the source code. A name is automatically generated for the file. This is useful for intermediate files that are only referenced through SwiftScript, such as 'outfile' in exercise ANONYMOUSFILE. A variable declaration is mapped anonymously by ommitting any mapper definition, like this: The problem here is that there are no exercises called HELLOWORLD or ANONYMOUSFILE in this document. The former could be called "the exercise in section 2", and the latter, "the exercise in section 3.3". Also, a few lines later in section 3.6, I see TODO: introduce @v syntax. It seems to me that this shouldn't still be in the tutorial, but should be replaced with text that does this. Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aespinosa at cs.uchicago.edu Thu Oct 14 12:15:14 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 14 Oct 2010 12:15:14 -0500 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: References: Message-ID: This was a change in the API around a year or two ago. Clearly the documentation needs an update. -Allan 2010/10/14 Daniel S. Katz : > One more Swift thing I don't understand... > Dan > > > Begin forwarded message: > > From: "Daniel S. Katz" > Date: October 14, 2010 10:45:24 AM CDT > To: swift-user at ci.uchicago.edu > Subject: a third tutorial question > > In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the > procedure is defined as: > > app (messagefile t) greeting () { > echo "Hello, world!" stdout=@filename(t); > } > > > in parameter.swift, the new procedure is defined as: > > (messagefile t) greeting (string s) { > app { > echo s stdout=@filename(t); > } > } > > I don't understand why the style of defining the procedure has changed, or > what this change implies. > I would have just started with the first.swift procedure, and changed it to: > > app (messagefile t) greeting (string s) { > echo s stdout=@filename(t); > } > > > In fact, I did try this, and the code works fine, so I fail to understand > the reason for the larger change that is in the tutorial. > Dan > > -- > Daniel S. Katz > University of Chicago > (773) 834-7186 (voice) > (773) 834-3700 (fax) > d.katz at ieee.org or dsk at ci.uchicago.edu > http://www.ci.uchicago.edu/~dsk/ > > > > > -- > Daniel S. Katz > University of Chicago > (773) 834-7186 (voice) > (773) 834-3700 (fax) > d.katz at ieee.org or dsk at ci.uchicago.edu > http://www.ci.uchicago.edu/~dsk/ > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From dsk at ci.uchicago.edu Thu Oct 14 12:18:18 2010 From: dsk at ci.uchicago.edu (Daniel S.
Katz) Date: Thu, 14 Oct 2010 12:18:18 -0500 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: References: Message-ID: But both seem to work... Which is the favored choice, and why? The example in first.swift seems shorter and cleaner... Dan On Oct 14, 2010, at 12:15 PM, Allan Espinosa wrote: > This was a change in the API around a year or two ago. Clearly the > documentation needs an update. > > -Allan > > 2010/10/14 Daniel S. Katz : >> One more Swift thing I don't understand... >> Dan >> >> >> Begin forwarded message: >> >> From: "Daniel S. Katz" >> Date: October 14, 2010 10:45:24 AM CDT >> To: swift-user at ci.uchicago.edu >> Subject: a third tutorial question >> >> In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the >> procedure is defined as: >> >> app (messagefile t) greeting () { >> echo "Hello, world!" stdout=@filename(t); >> } >> >> >> in parameter.swift, the new procedure is defined as: >> >> (messagefile t) greeting (string s) { >> app { >> echo s stdout=@filename(t); >> } >> } >> >> I don't understand why the style of defining the procedure has changed, or >> what this change implies. >> I would have just started with the first.swift procedure, and changed it to: >> >> app (messagefile t) greeting (string s) { >> echo s stdout=@filename(t); >> } >> >> >> In fact, I did try this, and the code works fine, so I fail to understand >> the reason for the larger change that is in the tutorial. >> Dan >> >> -- >> Daniel S. Katz >> University of Chicago >> (773) 834-7186 (voice) >> (773) 834-3700 (fax) >> d.katz at ieee.org or dsk at ci.uchicago.edu >> http://www.ci.uchicago.edu/~dsk/ >> >> >> >> >> -- >> Daniel S. Katz >> University of Chicago >> (773) 834-7186 (voice) >> (773) 834-3700 (fax) >> d.katz at ieee.org or dsk at ci.uchicago.edu >> http://www.ci.uchicago.edu/~dsk/ >> >> >> >> > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago -- Daniel S. 
Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From dsk at ci.uchicago.edu Thu Oct 14 12:42:35 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 12:42:35 -0500 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: References: Message-ID: <6EB7977F-1086-4333-B0F1-27708F07E609@ci.uchicago.edu> Similarly, I notice that the type declaration of messagefile as a marker type (I think) has two forms. In first.swift, I see: type messagefile; In parameter.swift (the version in the swift repository), I see: type messagefile {} In the Swift user guide when this concept is brought up, the first style is used. Also, parameter.swift in the tutorial html page has the first format, even though the on-line code has the second format. Does this matter? Which is preferred and why? Dan On Oct 14, 2010, at 12:18 PM, Daniel S. Katz wrote: > But both seem to work... > > Which is the favored choice, and why? The example in first.swift seems shorter and cleaner... > > Dan > > > On Oct 14, 2010, at 12:15 PM, Allan Espinosa wrote: > >> This was a change in the API around a year or two ago. Clearly the >> documentation needs an update. >> >> -Allan >> >> 2010/10/14 Daniel S. Katz : >>> One more Swift thing I don't understand... >>> Dan >>> >>> >>> Begin forwarded message: >>> >>> From: "Daniel S. Katz" >>> Date: October 14, 2010 10:45:24 AM CDT >>> To: swift-user at ci.uchicago.edu >>> Subject: a third tutorial question >>> >>> In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the >>> procedure is defined as: >>> >>> app (messagefile t) greeting () { >>> echo "Hello, world!" 
stdout=@filename(t); >>> } >>> >>> >>> in parameter.swift, the new procedure is defined as: >>> >>> (messagefile t) greeting (string s) { >>> app { >>> echo s stdout=@filename(t); >>> } >>> } >>> >>> I don't understand why the style of defining the procedure has changed, or >>> what this change implies. >>> I would have just started with the first.swift procedure, and changed it to: >>> >>> app (messagefile t) greeting (string s) { >>> echo s stdout=@filename(t); >>> } >>> >>> >>> In fact, I did try this, and the code works fine, so I fail to understand >>> the reason for the larger change that is in the tutorial. >>> Dan >>> >>> -- >>> Daniel S. Katz >>> University of Chicago >>> (773) 834-7186 (voice) >>> (773) 834-3700 (fax) >>> d.katz at ieee.org or dsk at ci.uchicago.edu >>> http://www.ci.uchicago.edu/~dsk/ >>> >>> >>> >>> >>> -- >>> Daniel S. Katz >>> University of Chicago >>> (773) 834-7186 (voice) >>> (773) 834-3700 (fax) >>> d.katz at ieee.org or dsk at ci.uchicago.edu >>> http://www.ci.uchicago.edu/~dsk/ >>> >>> >>> >>> >> >> >> >> -- >> Allan M. Espinosa >> PhD student, Computer Science >> University of Chicago > > -- > Daniel S. Katz > University of Chicago > (773) 834-7186 (voice) > (773) 834-3700 (fax) > d.katz at ieee.org or dsk at ci.uchicago.edu > http://www.ci.uchicago.edu/~dsk/ > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From dsk at ci.uchicago.edu Thu Oct 14 13:14:44 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 13:14:44 -0500 Subject: [Swift-user] tutorial issue 9 Message-ID: Hi, The code in example 3.9 on http://www.ci.uchicago.edu/swift/guides/tutorial.php doesn't seem to work. 
When I try to run it, I get: tmp:swift dsk$ swift iterate.swift Could not start execution. variable a has multiple writers. There are some other issues with this example, too: 1. simple_mapper is not introduced or explained; it just shows up. 2. @extractint is not introduced or explained; it just shows up. 3. trace() is not introduced or explained; it just shows up. 4. Based on what I learned from example 3.3, it seems I should be able to use anonymous files here - is this correct? If so, would I just change counterfile a[] ; to counterfile a[]; Can I use an array of anonymous files like this? Thanks, Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Thu Oct 14 14:35:34 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Oct 2010 12:35:34 -0700 Subject: [Swift-user] tutorial/understanding issue 7 In-Reply-To: <11375E5F-A3F8-4ED3-8832-D55ED60E054F@ci.uchicago.edu> References: <11375E5F-A3F8-4ED3-8832-D55ED60E054F@ci.uchicago.edu> Message-ID: <1287084934.4230.7.camel@blabla2.none> Inline On Thu, 2010-10-14 at 12:08 -0500, Daniel S. Katz wrote: > Continuing my journey through the Swift tutorial > (http://www.ci.uchicago.edu/swift/guides/tutorial.php), though the > previous 6 messages are waiting for approval, as I was not a member of > the swift-users list when I sent them... > > > In section 3.5, why can't I do this: > > > > > type messagefile; > > > app (messagefile t) greeting (string s[]) { > echo s[0] s[1] s[2] stdout=@filename(t); > } > > > messagefile outfile <"q5out.txt">; > > > #string words[] = ["how","are","you"]; > string words[]; > words[0] = "how; > words[1] = "are"; > words[2] = "you"; > > > outfile = greeting(words); > There is no theoretical reason why you shouldn't be able to do so.
It follows that we are talking about a bug. It was likely not addressed (or reported) because there is a way to deal with the situation. > > > is the issue that swift doesn't know how large to make words[]? > Arrays are dynamic, so not quite. > > I also tried: > > > string words[3]; > > > but this also didn't work. > A good point. There is, when iterating over an array, a distinction between an array whose size you know and one whose size is not known statically. The above type of declaration could be used to provide that information, and I think it should be added to swift (if missing). > > Do strings need to be assigned when they are declared? Is this a > general rule for Swift variables? > No in theory. Variables need to eventually be assigned, otherwise they are considered "input" variables. But that does not apply to primitives. So in the case of strings your code should work. Mihael From hategan at mcs.anl.gov Thu Oct 14 14:40:33 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Oct 2010 12:40:33 -0700 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: References: Message-ID: <1287085233.4230.11.camel@blabla2.none> On Thu, 2010-10-14 at 12:18 -0500, Daniel S. Katz wrote: > But both seem to work... > > Which is the favored choice, and why? The example in first.swift seems shorter and cleaner... Originally it was: (returns) name(params) { app {...}} That made little sense. So it was changed to: app (returns) name(params) {...} which more closely matches the modifier idea in C/Java and the likes. At the same time, the former was deprecated (except there probably is no mention of this anywhere). It should, IMO, be removed in future versions of swift. So what you consider cleaner is what I would stick with. Mihael > > Dan > > > On Oct 14, 2010, at 12:15 PM, Allan Espinosa wrote: > > > This was a change in the API around a year or two ago. Clearly the > > documentation needs an update. > > > > -Allan > > > > 2010/10/14 Daniel S. 
Katz : > >> One more Swift thing I don't understand... > >> Dan > >> > >> > >> Begin forwarded message: > >> > >> From: "Daniel S. Katz" > >> Date: October 14, 2010 10:45:24 AM CDT > >> To: swift-user at ci.uchicago.edu > >> Subject: a third tutorial question > >> > >> In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the > >> procedure is defined as: > >> > >> app (messagefile t) greeting () { > >> echo "Hello, world!" stdout=@filename(t); > >> } > >> > >> > >> in parameter.swift, the new procedure is defined as: > >> > >> (messagefile t) greeting (string s) { > >> app { > >> echo s stdout=@filename(t); > >> } > >> } > >> > >> I don't understand why the style of defining the procedure has changed, or > >> what this change implies. > >> I would have just started with the first.swift procedure, and changed it to: > >> > >> app (messagefile t) greeting (string s) { > >> echo s stdout=@filename(t); > >> } > >> > >> > >> In fact, I did try this, and the code works fine, so I fail to understand > >> the reason for the larger change that is in the tutorial. > >> Dan > >> > >> -- > >> Daniel S. Katz > >> University of Chicago > >> (773) 834-7186 (voice) > >> (773) 834-3700 (fax) > >> d.katz at ieee.org or dsk at ci.uchicago.edu > >> http://www.ci.uchicago.edu/~dsk/ > >> > >> > >> > >> > >> -- > >> Daniel S. Katz > >> University of Chicago > >> (773) 834-7186 (voice) > >> (773) 834-3700 (fax) > >> d.katz at ieee.org or dsk at ci.uchicago.edu > >> http://www.ci.uchicago.edu/~dsk/ > >> > >> > >> > >> > > > > > > > > -- > > Allan M. 
Espinosa > > PhD student, Computer Science > > University of Chicago > From hategan at mcs.anl.gov Thu Oct 14 14:46:10 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 14 Oct 2010 12:46:10 -0700 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: <6EB7977F-1086-4333-B0F1-27708F07E609@ci.uchicago.edu> References: <6EB7977F-1086-4333-B0F1-27708F07E609@ci.uchicago.edu> Message-ID: <1287085570.4230.17.camel@blabla2.none> On Thu, 2010-10-14 at 12:42 -0500, Daniel S. Katz wrote: > Similarly, I notice that the type declaration of messagefile as a marker type (I think) has two forms. > > In first.swift, I see: > > type messagefile; > > In parameter.swift (the version in the swift repository), I see: > > type messagefile {} > > In the Swift user guide when this concept is brought up, the first style is used. Also, parameter.swift in the tutorial html page has the first format, even though the on-line code has the second format. > > Does this matter? No. > Which is preferred and why? while (true) ; while (true) {} i.e. it is the programmer's preference. Mihael From dsk at ci.uchicago.edu Thu Oct 14 16:23:57 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 17:23:57 -0400 Subject: [Swift-user] tutorial/understanding issue 7 In-Reply-To: <1287084934.4230.7.camel@blabla2.none> References: <11375E5F-A3F8-4ED3-8832-D55ED60E054F@ci.uchicago.edu> <1287084934.4230.7.camel@blabla2.none> Message-ID: <6751312A-6B76-4E4B-B251-B8FD10998A4A@ci.uchicago.edu> Thanks. Can this be added to the Swift list of bugs as a low priority issue? Dan On Oct 14, 2010, at 3:35 PM, Mihael Hategan wrote: > Inline > > On Thu, 2010-10-14 at 12:08 -0500, Daniel S. Katz wrote: > >> Continuing my journey through the Swift tutorial >> (http://www.ci.uchicago.edu/swift/guides/tutorial.php), though the >> previous 6 messages are waiting for approval, as I was not a member of >> the swift-users list when I sent them... 
>> >> >> In section 3.5, why can't I do this: >> >> >> >> >> type messagefile; >> >> >> app (messagefile t) greeting (string s[]) { >> echo s[0] s[1] s[2] stdout=@filename(t); >> } >> >> >> messagefile outfile <"q5out.txt">; >> >> >> #string words[] = ["how","are","you"]; >> string words[]; >> words[0] = "how; >> words[1] = "are"; >> words[2] = "you"; >> >> >> outfile = greeting(words); >> > There is no theoretical reason why you shouldn't be able to do so. It > follows that we are talking about a bug. It was likely not addressed (or > reported) because there is a way to deal with the situation. >> >> >> is the issue that swift doesn't know how large to make words[]? >> > Arrays are dynamic, so not quite. >> >> I also tried: >> >> >> string words[3]; >> >> >> but this also didn't work. >> > A good point. There is, when iterating over an array, a distinction > between an array whose size you know and one whose size is not known > statically. The above type of declaration could be used to provide that > information, and I think it should be added to swift (if missing). >> >> Do strings need to be assigned when they are declared? Is this a >> general rule for Swift variables? >> > No in theory. Variables need to eventually be assigned, otherwise they > are considered "input" variables. But that does not apply to primitives. > So in the case of strings your code should work. > > Mihael > > -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From dsk at ci.uchicago.edu Thu Oct 14 16:25:24 2010 From: dsk at ci.uchicago.edu (Daniel S. 
Katz) Date: Thu, 14 Oct 2010 17:25:24 -0400 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: <1287085570.4230.17.camel@blabla2.none> References: <6EB7977F-1086-4333-B0F1-27708F07E609@ci.uchicago.edu> <1287085570.4230.17.camel@blabla2.none> Message-ID: <6E4F1BAD-4F00-49AC-9A6A-E5BA6677F4E9@ci.uchicago.edu> Ok, thanks - it would probably be good to be consistent in the examples and the tutorial web page that lists the examples. Dan On Oct 14, 2010, at 3:46 PM, Mihael Hategan wrote: > On Thu, 2010-10-14 at 12:42 -0500, Daniel S. Katz wrote: >> Similarly, I notice that the type declaration of messagefile as a marker type (I think) has two forms. >> >> In first.swift, I see: >> >> type messagefile; >> >> In parameter.swift (the version in the swift repository), I see: >> >> type messagefile {} >> >> In the Swift user guide when this concept is brought up, the first style is used. Also, parameter.swift in the tutorial html page has the first format, even though the on-line code has the second format. >> >> Does this matter? > > No. > >> Which is preferred and why? > > while (true) ; > while (true) {} > > i.e. it is the programmer's preference. > > Mihael > -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From dsk at ci.uchicago.edu Thu Oct 14 16:26:45 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 17:26:45 -0400 Subject: [Swift-user] Re: a third tutorial question In-Reply-To: <1287085233.4230.11.camel@blabla2.none> References: <1287085233.4230.11.camel@blabla2.none> Message-ID: Ok, thanks. I guess this is again a documentation issue. It would be good if the tutorial and examples matched the "cleaner" version, so that the other version isn't around to confuse new users. I'm not sure who would do this... Dan On Oct 14, 2010, at 3:40 PM, Mihael Hategan wrote: > On Thu, 2010-10-14 at 12:18 -0500, Daniel S. 
Katz wrote: >> But both seem to work... >> >> Which is the favored choice, and why? The example in first.swift seems shorter and cleaner... > > Originally it was: > > (returns) name(params) { app {...}} > > That made little sense. > > So it was changed to: > > app (returns) name(params) {...} > > which more closely matches the modifier idea in C/Java and the likes. > > At the same time, the former was deprecated (except there probably is no > mention of this anywhere). It should, IMO, be removed in future versions > of swift. > > So what you consider cleaner is what I would stick with. > > Mihael > >> >> Dan >> >> >> On Oct 14, 2010, at 12:15 PM, Allan Espinosa wrote: >> >>> This was a change in the API around a year or two ago. Clearly the >>> documentation needs an update. >>> >>> -Allan >>> >>> 2010/10/14 Daniel S. Katz : >>>> One more Swift thing I don't understand... >>>> Dan >>>> >>>> >>>> Begin forwarded message: >>>> >>>> From: "Daniel S. Katz" >>>> Date: October 14, 2010 10:45:24 AM CDT >>>> To: swift-user at ci.uchicago.edu >>>> Subject: a third tutorial question >>>> >>>> In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the >>>> procedure is defined as: >>>> >>>> app (messagefile t) greeting () { >>>> echo "Hello, world!" stdout=@filename(t); >>>> } >>>> >>>> >>>> in parameter.swift, the new procedure is defined as: >>>> >>>> (messagefile t) greeting (string s) { >>>> app { >>>> echo s stdout=@filename(t); >>>> } >>>> } >>>> >>>> I don't understand why the style of defining the procedure has changed, or >>>> what this change implies. >>>> I would have just started with the first.swift procedure, and changed it to: >>>> >>>> app (messagefile t) greeting (string s) { >>>> echo s stdout=@filename(t); >>>> } >>>> >>>> >>>> In fact, I did try this, and the code works fine, so I fail to understand >>>> the reason for the larger change that is in the tutorial. >>>> Dan >>>> >>>> -- >>>> Daniel S. 
Katz >>>> University of Chicago >>>> (773) 834-7186 (voice) >>>> (773) 834-3700 (fax) >>>> d.katz at ieee.org or dsk at ci.uchicago.edu >>>> http://www.ci.uchicago.edu/~dsk/ >>>> >>>> >>>> >>>> >>>> -- >>>> Daniel S. Katz >>>> University of Chicago >>>> (773) 834-7186 (voice) >>>> (773) 834-3700 (fax) >>>> d.katz at ieee.org or dsk at ci.uchicago.edu >>>> http://www.ci.uchicago.edu/~dsk/ >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Allan M. Espinosa >>> PhD student, Computer Science >>> University of Chicago >> > > -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ From aespinosa at cs.uchicago.edu Fri Oct 15 11:01:08 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 15 Oct 2010 11:01:08 -0500 Subject: [Swift-user] deadlock on workflow Message-ID: <20101015160108.GA17814@origin> When I changed my provider from coaster-persistent to plain coaster, I encountered this deadlock in my workflow: 2010-10-15 10:57:40 Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode): "Attach Listener" daemon prio=10 tid=0x0000000050d19000 nid=0x2a0d waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Queued Task Replication Sweeper" daemon prio=10 tid=0x00002aab385f1000 nid=0x2356 waiting on condition [0x00000000436e9000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.griphyn.vdl.karajan.lib.replication.Sweeper.run(Sweeper.java:44) "NBS7" daemon prio=10 tid=0x0000000050d13800 nid=0x2355 waiting on condition [0x00000000435e8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS6" daemon prio=10 tid=0x0000000050d24000 nid=0x2354 waiting on condition [0x00000000434e7000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS5" daemon prio=10 tid=0x0000000050d23000 nid=0x2352 waiting on condition [0x00000000433e6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS4" daemon prio=10 tid=0x00000000509ed800 nid=0x2350 waiting on 
condition [0x00000000432e5000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS3" daemon prio=10 tid=0x00000000509ea800 nid=0x234f waiting on condition [0x00000000431e4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS2" daemon prio=10 tid=0x0000000050ce8800 nid=0x234e waiting on condition [0x00000000430e3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "NBS1" daemon prio=10 tid=0x00000000509e3800 nid=0x234d waiting on condition [0x0000000042fe2000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "Scheduler" prio=10 tid=0x00002aab385e9800 nid=0x234c in Object.wait() [0x0000000042ee1000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab0cc95200> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler) at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:305) at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:258) - locked <0x00002aab0cc95200> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler) "Progress ticker" daemon prio=10 tid=0x00000000509f4000 nid=0x234b waiting on condition [0x0000000042de0000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:137) "Restart Log Sync" daemon prio=10 tid=0x00002aab385e8800 nid=0x234a in Object.wait() [0x0000000042cdf000] java.lang.Thread.State: 
WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab0cc7f350> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:45) - locked <0x00002aab0cc7f350> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread) "Swift console debugger" daemon prio=10 tid=0x0000000050d2c000 nid=0x2349 runnable [0x0000000042bde000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked <0x00002aaab43ce4a8> (a java.io.BufferedInputStream) at org.griphyn.vdl.karajan.InHook.run(InHook.java:46) at java.lang.Thread.run(Thread.java:619) "network debugger" prio=10 tid=0x0000000050ce4800 nid=0x2348 runnable [0x0000000042add000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390) - locked <0x00002aab0cd99c60> (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.griphyn.vdl.karajan.Monitor$Service.run(Monitor.java:428) at java.lang.Thread.run(Thread.java:619) "Overloaded Host Monitor" daemon prio=10 tid=0x0000000050b8a000 nid=0x2347 waiting on condition [0x00000000429dc000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47) "Timer-0" daemon prio=10 tid=0x0000000050cac800 nid=0x2346 in Object.wait() [0x00000000428db000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab0cc2bf30> (a java.util.TaskQueue) 
at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00002aab0cc2bf30> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "NBS0" daemon prio=10 tid=0x00002aab38392800 nid=0x2345 waiting on condition [0x00000000427da000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aab0cc945f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-4" prio=10 tid=0x0000000050b2a800 nid=0x2344 waiting for monitor entry [0x00000000426d8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.griphyn.vdl.mapping.AbstractDataNode.addListener(AbstractDataNode.java:563) - waiting to lock <0x00002aab08c9f3c8> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.mapping.RootDataNode.innerInit(RootDataNode.java:50) - locked <0x00002aab098f02c0> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.mapping.RootDataNode.handleClosed(RootDataNode.java:90) at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583) - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396) - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:346) at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218) at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:83) at 
org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:46) - locked <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode) - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at 
org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-3" prio=10 tid=0x0000000050c7e000 nid=0x2343 waiting for monitor entry [0x00000000425d7000] java.lang.Thread.State: BLOCKED (on object monitor) at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:43) - waiting to lock <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode) - 
locked <0x00002aab08ca6c98> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-2" prio=10 tid=0x00000000508d6800 nid=0x2342 waiting for monitor entry [0x00000000424d6000] java.lang.Thread.State: BLOCKED (on object monitor) at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:43) - waiting to lock <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode) - locked <0x00002aab08994c90> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) at 
org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at 
org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) "pool-1-thread-1" prio=10 tid=0x0000000050955800 nid=0x2341 waiting for monitor entry [0x0000000041161000] java.lang.Thread.State: BLOCKED (on object monitor) at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:43) - waiting to lock <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode) - locked <0x00002aab08c9f3c8> (a org.griphyn.vdl.mapping.RootDataNode) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68) at 
org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at 
org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) "Low Memory Detector" daemon prio=10 tid=0x00000000506e2000 nid=0x233f runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x00000000506e0000 nid=0x233e waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x00000000506da800 nid=0x233d waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x00000000506d8000 nid=0x233c runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon 
prio=10 tid=0x00000000506b3800 nid=0x233b in Object.wait() [0x0000000041eb4000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab0cc2c2f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00002aab0cc2c2f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x00000000506b1800 nid=0x233a in Object.wait() [0x0000000040eb5000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab43ce540> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00002aaab43ce540> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x0000000050650000 nid=0x2334 in Object.wait() [0x000000004051e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aab0ca5e798> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at java.lang.Object.wait(Object.java:485) at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:261) - locked <0x00002aab0ca5e798> (a org.griphyn.vdl.karajan.VDL2ExecutionContext) at org.griphyn.vdl.karajan.Loader.main(Loader.java:192) "VM Thread" prio=10 tid=0x00000000506ad000 nid=0x2339 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000050663000 nid=0x2335 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000050665000 nid=0x2336 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000050666800 nid=0x2337 runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000050668800 nid=0x2338 runnable "VM Periodic Task Thread" prio=10 tid=0x00000000506ed000 nid=0x2340 waiting on condition

JNI global references: 1311


Found one Java-level deadlock:
=============================
"pool-1-thread-4":
  waiting to lock monitor 0x0000000050ceb2e0 (object 0x00002aab08c9f3c8, a org.griphyn.vdl.mapping.RootDataNode),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x00002aab387b99d8 (object 0x00002aab0cc8b0c0, a org.griphyn.vdl.mapping.RootArrayDataNode),
  which is held by "pool-1-thread-4"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-4":
        at org.griphyn.vdl.mapping.AbstractDataNode.addListener(AbstractDataNode.java:563)
        - waiting to lock <0x00002aab08c9f3c8> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.mapping.RootDataNode.innerInit(RootDataNode.java:50)
        - locked <0x00002aab098f02c0> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.mapping.RootDataNode.handleClosed(RootDataNode.java:90)
        at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:583)
        - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:396)
        - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:346)
        at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218)
        at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:83)
        at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:46)
        - locked <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        - locked <0x00002aab0898bac8> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at
org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at 
org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343)
        at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173)
        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
        at java.lang.Thread.run(Thread.java:619)
"pool-1-thread-1":
        at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:43)
        - waiting to lock <0x00002aab0cc8b0c0> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        - locked <0x00002aab08c9f3c8> (a org.griphyn.vdl.mapping.RootDataNode)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:45)
        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50)
        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343)
        at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173)
        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
        at java.lang.Thread.run(Thread.java:619)

Found 1 deadlock.

This happened after 2 jobs (out of 400k). Here's the main part of the workflow:

int run_id = 664;
string datadir = "/gpfs/pads/swift/aespinosa/science/cybershake/Results";

Station site = get_site(run_id);
Sgt sgt_var ;
Rupture rups[] = get_ruptures(run_id);

foreach rup in rups {
  string loc_sub = @strcat(datadir, "/", site.name, "/", rup.source, "/", rup.index);
  Sgt sub ;
  string var_str[] = get_variations( site, rup, "/gpfs/pads/swift/aespinosa/science/cybershaku/RuptureVariations" );
  Variation vars[] ;
  sub = extract(sgt_var, site, vars[rup.size-1]);

  Seismogram seis[] ;
  PeakValue peak[] ;
  foreach var,i in vars {
    seis[i] = seismogram(sub, var, site);
    peak[i] = peak_calc(seis[i], var);
  }
}

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From hategan at mcs.anl.gov  Sun Oct 17 14:45:14 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 17 Oct 2010 12:45:14 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <4CB28433.4060305@gmail.com>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2.none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2.none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla2.none> <4CB26376.3040703@gmail.com> <1286767484.24726.0.camel@blabla2.none> <1286767589.24726.1.camel@blabla2.none> <4CB28433.4060305@gmail.com>
Message-ID: <1287344714.17311.0.camel@blabla2.none>

Swift r3684.

Essentially it was this:
- CMDARGS=("${CMDARGS[*]}" "$line")
+ CMDARGS=("${CMDARGS[@]}" "$line")

Mihael

On Sun, 2010-10-10 at 22:27 -0500, Jonathan Monette wrote:
> Attached is the info and parameter file for one of the invocations.
> Here is the app procedure.
>
>
> app ( Image proj_img ) mProject( Image raw_img, MosaicData hdr )
> {
>     mProject "-X" @raw_img @proj_img @hdr;
> }
>
>
> On 10/10/2010 10:26 PM, Mihael Hategan wrote:
> > On Sun, 2010-10-10 at 20:24 -0700, Mihael Hategan wrote:
> >> I can't reproduce this.
> >>
> >> Can you paste the parameter file and info files (if you have those) from
> >> this invocation?
> >>
> >> Also, has the swift script changed since you last mentioned it?
> > Actually that's not what you last sent. So can you paste the swift code
> > for the app procedure for mProject?
> >
>


From jon.monette at gmail.com  Sun Oct 17 14:53:54 2010
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Sun, 17 Oct 2010 19:53:54 +0000
Subject: [Swift-user] Argument list to long
In-Reply-To: <1287344714.17311.0.camel@blabla2.none>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2.none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2.none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla2.none> <4CB26376.3040703@gmail.com> <1286767484.24726.0.camel@blabla2.none> <1286767589.24726.1.camel@blabla2.none> <4CB28433.4060305@gmail.com> <1287344714.17311.0.camel@blabla2.none>
Message-ID: <1939660039-1287345233-cardhu_decombobulator_blackberry.rim.net-498068482-@bda090.bisx.prod.on.blackberry>

What is the difference between ${CMDARGS[*]} and ${CMDARGS[@]}?
Sent on the Sprint® Now Network from my BlackBerry®

-----Original Message-----
From: Mihael Hategan
Date: Sun, 17 Oct 2010 12:45:14
To: Jonathan Monette
Cc: Justin M Wozniak;
Subject: Re: [Swift-user] Argument list to long

Swift r3684.

Essentially it was this:
- CMDARGS=("${CMDARGS[*]}" "$line")
+ CMDARGS=("${CMDARGS[@]}" "$line")

Mihael

On Sun, 2010-10-10 at 22:27 -0500, Jonathan Monette wrote:
> Attached is the info and parameter file for one of the invocations.
> Here is the app procedure.
>
>
> app ( Image proj_img ) mProject( Image raw_img, MosaicData hdr )
> {
>     mProject "-X" @raw_img @proj_img @hdr;
> }
>
>
> On 10/10/2010 10:26 PM, Mihael Hategan wrote:
> > On Sun, 2010-10-10 at 20:24 -0700, Mihael Hategan wrote:
> >> I can't reproduce this.
> >>
> >> Can you paste the parameter file and info files (if you have those) from
> >> this invocation?
> >>
> >> Also, has the swift script changed since you last mentioned it?
> > Actually that's not what you last sent. So can you paste the swift code
> > for the app procedure for mProject?
> >
>


From hategan at mcs.anl.gov  Sun Oct 17 15:15:10 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 17 Oct 2010 13:15:10 -0700
Subject: [Swift-user] Argument list to long
In-Reply-To: <1939660039-1287345233-cardhu_decombobulator_blackberry.rim.net-498068482-@bda090.bisx.prod.on.blackberry>
References: <2093121837.675981286227365765.JavaMail.root@zimbra.anl.gov> <4CB0B816.8000700@gmail.com> <1286653553.7457.0.camel@blabla2.none> <4CB0C6C7.5000204@gmail.com> <1286654293.7457.9.camel@blabla2.none> <629618251-1286654421-cardhu_decombobulator_blackberry.rim.net-560100004-@bda085.bisx.prod.on.blackberry> <1286654567.7457.12.camel@blabla2.none> <4CB21D80.4060801@gmail.com> <1286742366.27981.2.camel@blabla2.none> <4CB2276D.3020504@gmail.com> <1286748354.17568.0.camel@blabla2.none> <4CB23986.1090809@gmail.com> <1286748727.19802.0.camel@blabla2.none> <4CB24A46.3080400@gmail.com> <4CB24F72.20702@gmail.com> <1286756778.20744.1.camel@blabla2.none> <4CB25D7D.6090209@gmail.com> <1286758148.20885.2.camel@blabla2.none> <4CB25F89.2080109@gmail.com> <1286758559.20983.0.camel@blabla2.none> <4CB26376.3040703@gmail.com> <1286767484.24726.0.camel@blabla2.none> <1286767589.24726.1.camel@blabla2.none> <4CB28433.4060305@gmail.com> <1287344714.17311.0.camel@blabla2.none> <1939660039-1287345233-cardhu_decombobulator_blackberry.rim.net-498068482-@bda090.bisx.prod.on.blackberry>
Message-ID: <1287346510.19585.1.camel@blabla2.none>

Quoting from the bash man page:
"If the word is double-quoted, ${name[*]} expands to a single word with
the value of each array member separated by the first character of the
IFS special variable, and ${name[@]} expands each element of name to a
separate word."

On Sun, 2010-10-17 at 19:53 +0000, jon.monette at gmail.com wrote:
> What is the difference between ${CMDARGS[*]} and ${CMDARGS[@]}?
> Sent on the Sprint® Now Network from my BlackBerry®
>
> -----Original Message-----
> From: Mihael Hategan
> Date: Sun, 17 Oct 2010 12:45:14
> To: Jonathan Monette
> Cc: Justin M Wozniak;
> Subject: Re: [Swift-user] Argument list to long
>
> Swift r3684.
>
> Essentially it was this:
> - CMDARGS=("${CMDARGS[*]}" "$line")
> + CMDARGS=("${CMDARGS[@]}" "$line")
>
> Mihael
>
> On Sun, 2010-10-10 at 22:27 -0500, Jonathan Monette wrote:
> > Attached is the info and parameter file for one of the invocations.
> > Here is the app procedure.
> >
> >
> > app ( Image proj_img ) mProject( Image raw_img, MosaicData hdr )
> > {
> >     mProject "-X" @raw_img @proj_img @hdr;
> > }
> >
> >
> > On 10/10/2010 10:26 PM, Mihael Hategan wrote:
> > > On Sun, 2010-10-10 at 20:24 -0700, Mihael Hategan wrote:
> > >> I can't reproduce this.
> > >>
> > >> Can you paste the parameter file and info files (if you have those) from
> > >> this invocation?
> > >>
> > >> Also, has the swift script changed since you last mentioned it?
> > > Actually that's not what you last sent.
> > > So can you paste the swift code
> > > for the app procedure for mProject?
> > >
> >
>
>


From hategan at mcs.anl.gov  Sun Oct 17 21:47:13 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 17 Oct 2010 19:47:13 -0700
Subject: [Swift-user] deadlock on workflow
In-Reply-To: <20101015160108.GA17814@origin>
References: <20101015160108.GA17814@origin>
Message-ID: <1287370033.21170.1.camel@blabla2.none>

Try now (swift r3685).

On Fri, 2010-10-15 at 11:01 -0500, Allan Espinosa wrote:
> When I changed my provider from coaster-persistent to plain coaster, I
> encountered this deadlock in my workflow:
>

That is coincidental. The deadlock happens exclusively in swift code.

Mihael


From aespinosa at cs.uchicago.edu  Tue Oct 19 12:40:49 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 19 Oct 2010 12:40:49 -0500
Subject: [Swift-user] understanding coaster service logs
Message-ID: 

Hi,

My coaster services are registering 1.2k CPUs at a time.  Yet my jobs
do not seem to get sent to some workers:

Progress:  Selecting site:2762  Submitted:580  Finished successfully:424  Failed but can retry:138
Progress:  Selecting site:2762  Submitted:580  Finished successfully:424  Failed but can retry:138

It may be the case that my workers only have outbound connections and no inbound?

Which coaster-service log entries should I look out for to know if (1)
a job is received and (2) a job is dispatched to a worker (3) blockID
the job was sent to?

Does : pull refer to a worker receiving a job?
SC-null: Disabling heartbeats (config is null)
(0) Scheduling SC-null for addition
nullChannel started
Received registration: blockid = Prairiefire, url = 
MetaChannel: 835803672[661780277: {}] -> null: Disabling heartbeats (config is null)
MetaChannel: 835803672[661780277: {}] -> null.bind -> SC-null
Started CPU 397:1287509763s
Started worker Prairiefire:000397
Prairiefire:397 pull
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1
Sender 799611093 queue size: 0
Sender 1237174744 queue size: 0
runTime: 54, sleepTime: 9993
Avg stream buf: 0
runTime: 6, sleepTime: 10003
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1
Avg stream buf: 0
Sender 1191940729 queue size: 0
Sender 1231426791 queue size: 0
No streams
Sender 1191940729 queue size: 0
Sender 2128911821 queue size: 0
Avg stream buf: 0
No streams
Avg stream buf: 0
Plan time: 1

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From aespinosa at cs.uchicago.edu  Tue Oct 19 12:44:25 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 19 Oct 2010 12:44:25 -0500
Subject: [Swift-user] Re: understanding coaster service logs
In-Reply-To: 
References: 
Message-ID: 

Btw, on the client/submit host side, I get reply timeout exceptions
from the service:

2010-10-19 12:41:15,173-0500 WARN Command Command(581490, HEARTBEAT)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
2010-10-19 12:41:15,173-0500 INFO Command Sending Command(581490, HEARTBEAT) on MetaChannel: 715323878[834961640: {}] -> GSSCChannel-https://communicado.ci.uchicago.edu:61999(1)[834961640: {}]

2010/10/19 Allan Espinosa :
> Hi,
>
> My coaster services are registering 1.2k CPUs at a time.  Yet my jobs
> do not seem to get sent to some workers:
>
> Progress:  Selecting site:2762  Submitted:580  Finished
> successfully:424 Failed but can retry:138
> Progress:  Selecting site:2762  Submitted:580  Finished
> successfully:424 Failed but can retry:138
>
> It may be the case that my workers only have outbound connections and no inbound?
>
> Which coaster-service log entries should I look out for to know if (1)
> a job is received and (2) a job is dispatched to a worker (3) blockID
> the job was sent to?
>
> Does : pull refer to a worker receiving a job?

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From hategan at mcs.anl.gov  Tue Oct 19 12:50:19 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 19 Oct 2010 10:50:19 -0700
Subject: [Swift-user] understanding coaster service logs
In-Reply-To: 
References: 
Message-ID: <1287510619.8070.2.camel@blabla2.none>

On Tue, 2010-10-19 at 12:40 -0500, Allan Espinosa wrote:
> Hi,
>
> My coaster services are registering 1.2k CPUs at a time.  Yet my jobs
> do not seem to get sent to some workers:
>
> Progress:  Selecting site:2762  Submitted:580  Finished
> successfully:424 Failed but can retry:138
> Progress:  Selecting site:2762  Submitted:580  Finished
> successfully:424 Failed but can retry:138
>
> It may be the case that my workers only have outbound connections and no inbound?

No. A single connection is used for two-way communication.

>
> Which coaster-service log entries should I look out for to know if (1)
> a job is received and (2) a job is dispatched to a worker (3) blockID
> the job was sent to?

Cpu.java has the following line:
    logger.info(block.getId() + ":" + getId() + " submitting " + task.getIdentity());

This tells you when a job is sent to a worker. If you want to know if a
job is received by a worker, you need to enable worker logging.

>
> Does : pull refer to a worker receiving a job?

No. It refers to a block selecting a job for submission to a worker.
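Building on the Cpu.java log statement quoted above, a minimal shell sketch for question (3), which block each job was handed to, might look like the following. It assumes every dispatch shows up in the coaster-service log as "<blockId>:<cpuId> submitting <taskIdentity>" (inferred from that logger.info call and from the "Prairiefire:397 pull" line in Allan's excerpt), possibly after a timestamp and level prefix. The function name dispatches_per_block and the log file name in the usage comment are hypothetical; point it at whatever file your coaster service actually logs to.

```shell
#!/bin/sh
# Sketch only: count coaster job dispatches per block.
# Assumption: dispatch lines contain "<blockId>:<cpuId> submitting <taskIdentity>",
# as produced by the logger.info() call in Cpu.java quoted above.
# dispatches_per_block is a hypothetical helper name, not part of Swift.

dispatches_per_block() {
    # $1: path to the coaster-service log
    grep ' submitting ' "$1" |
    awk '{
        # Find the "<blockId>:<cpuId>" token immediately before "submitting".
        for (i = 1; i < NF; i++) {
            if ($(i + 1) == "submitting") {
                block = $i
                sub(/:[^:]*$/, "", block)   # drop the trailing ":<cpuId>"
                count[block]++
                break
            }
        }
    }
    END { for (b in count) print b, count[b] }' |
    LC_ALL=C sort
}

# Example (hypothetical file name):
#   dispatches_per_block coaster-service.log
```

Under these assumptions, a block that registered CPUs but never appears in the output was never given a job by the service. This only covers the service side; confirming that a worker actually received a job still needs worker logging, as Mihael notes.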
Mihael


From iraicu at cs.uchicago.edu  Tue Oct 19 21:51:16 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 19 Oct 2010 21:51:16 -0500
Subject: [Swift-user] Emerging Research and Development session at CloudCom 2010 November 30-December 3, 2010
Message-ID: <4CBE5924.4030606@cs.uchicago.edu>

The main track (207 papers submitted, 48 accepted), workshop, and tutorial schedules are already posted at http://salsahpc.indiana.edu/CloudCom2010/schedule.html, but the Emerging Research and Development session is still open. Of course, you could just attend!

*IEEE 2nd International Conference on Cloud Computing Technology and Science*
*November 30-December 3, 2010*
*Indianapolis, Indiana*
http://2010.cloudcom.org/

*Earlybird registration through October 31.*

The IEEE CloudCom 2010 conference welcomes proposals for an *Emerging Research and Development session* that includes student projects, new cloud commercial products, hot topics, and wild and crazy ideas. Each accepted project will be displayed or demonstrated during the poster/exhibit/demo session on the evening of Wednesday, December 1, 2010.

All audiences are welcome to submit their research material, including students at all levels, researchers, and those from corporations dealing with cloud technologies. All three types of material (posters, exhibits, and demos) will be displayed in a single space during the poster reception. (Posters accepted to the IEEE CloudCom 2010 technical program will be displayed in a separate section.)

This is a perfect opportunity to get your emerging research seen by the international Cloud Community!
*Emerging Research Submission Guidelines*

Proposals for the Emerging Research and Development Session should be submitted for review in the form of an extended abstract with the following submission requirements:

* Up to 2 pages in length
* Single-spaced, double-column text in 10-point font
* 8.5 x 11 inch pages
* PDF format

The abstracts should clearly present the work described, including the problem statement, previous work, and the contribution of the work. The work should be clearly relevant to clouds.

Each abstract must contain a section that describes the overall structure of the poster or display; the authors are strongly encouraged to present examples of graphs and illustrations that will appear on the poster. In addition, each abstract proposing an exhibition or a demo should contain a description of the facilities that the authors expect to be provided. Each abstract should state the category in which the work falls: *student projects*, *commercial products*, *hot topics*, or *wild and crazy ideas*.

Accepted abstracts will be posted on the conference Web site. Additionally, a video crew will record selected posters, demos, and exhibits. Authors of accepted extended abstracts will be required to sign a release for the recording of these videos.

Accepted presenters will be required to register for the full conference (no one-day registrations accepted for presenters). Accepted presenters will be granted the earlybird conference rate even if they are notified of acceptance after the earlybird cutoff date.

Visit http://2010.cloudcom.org/ for complete submission guidelines.

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu at cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================


From aespinosa at cs.uchicago.edu  Wed Oct 20 12:58:01 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 20 Oct 2010 12:58:01 -0500
Subject: [Swift-user] deadlock on workflow
In-Reply-To: <1287370033.21170.1.camel@blabla2.none>
References: <20101015160108.GA17814@origin> <1287370033.21170.1.camel@blabla2.none>
Message-ID: 

My workflow's now proceeding/working. Thanks.

2010/10/17 Mihael Hategan :
> Try now (swift r3685).
>
> On Fri, 2010-10-15 at 11:01 -0500, Allan Espinosa wrote:
>> When I changed my provider from coaster-persistent to plain coaster, I
>> encountered this deadlock in my workflow:
>>
>
> That is coincidental. The deadlock happens exclusively in swift code.
>
> Mihael
>
>
>


From aespinosa at cs.uchicago.edu  Wed Oct 20 13:08:16 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 20 Oct 2010 13:08:16 -0500
Subject: [Swift-user] log4j settings of vdl:* elements
Message-ID: 

Hi,

I think I have asked this before, but can't find the previous posts about it.

I would like to set the vdl:execute2 log level to DEBUG. Which
package/class path should I adjust in log4j.properties?

Thanks,
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From wozniak at mcs.anl.gov  Wed Oct 20 13:23:41 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 20 Oct 2010 13:23:41 -0500 (CDT)
Subject: [Swift-user] log4j settings of vdl:* elements
In-Reply-To: 
References: 
Message-ID: 

Try

log4j.logger.swift=DEBUG

See org.griphyn.vdl.karajan.lib.Log for more info.

On Wed, 20 Oct 2010, Allan Espinosa wrote:

> Hi,
>
> I think I have asked this before, but can't find the previous posts about it.
>
> I would like to set the vdl:execute2 log level to DEBUG. Which
> package/class path should I adjust in log4j.properties?
>
> Thanks,
> -Allan

--
Justin M Wozniak


From dsk at ci.uchicago.edu  Thu Oct 21 09:55:41 2010
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 21 Oct 2010 09:55:41 -0500
Subject: [Swift-user] another small tutorial issue
Message-ID: <8167ADAC-554A-4170-96A4-6C7CE19378F6@ci.uchicago.edu>

In http://www.ci.uchicago.edu/swift/guides/tutorial.php Section 3.1, I read "The code changes from first.swift are highlighted below."

However, this isn't true. The things that look like highlights are the use of color, which is the same as in all of the examples (and which is not explained anywhere in the tutorial page).

It would be useful if the differences between this example and the first example actually were highlighted in this example. If nothing else, the first example could be placed next to this example in an adjacent box.

Dan

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/


From dsk at ci.uchicago.edu  Thu Oct 21 09:56:28 2010
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 21 Oct 2010 09:56:28 -0500
Subject: [Swift-user] a tutorial problem
Message-ID: <0425D913-2ABA-4D53-9927-C2487E8A6784@ci.uchicago.edu>

Hi,

This is the first real error in the tutorial. Section 3.3 is incomplete, and doesn't make any sense.

It refers to greeting.txt from the previous section, but there is no greeting.txt in the previous section. There's a greeting procedure, but no greeting file. It seems that this might refer to hello.txt. Also, the line in the box refers to outfile, which wasn't defined.

So, the text in section 3.3 should read:

In the previous section, the file hello.txt is used only to store an intermediate result. We don't really care about which name is used for the file, and we can let Swift choose the name. To do that, omit the mapping entirely when declaring outfile:

And the line in the box should read:

messagefile hellofile;

For completeness, the code from the previous example could be restated with this change:

type messagefile {}

(messagefile t) greeting (string s) {
    app {
        echo s stdout=@filename(t);
    }
}

(messagefile o) capitalise(messagefile i) {
    app {
        tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o);
    }
}

messagefile hellofile;
messagefile final <"capitals.txt">;

hellofile = greeting("hello from Swift");
final = capitalise(hellofile);

Dan

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/


From wilde at mcs.anl.gov  Thu Oct 21 11:12:28 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 21 Oct 2010 10:12:28 -0600 (GMT-06:00)
Subject: [Swift-user] a tutorial problem
In-Reply-To: <0425D913-2ABA-4D53-9927-C2487E8A6784@ci.uchicago.edu>
Message-ID: <244831244.196741287677548567.JavaMail.root@zimbra.anl.gov>

Thanks, Dan.

David, can you comment on the problems Dan has reported (today and a few days back) on the tutorial, as I recall that you worked on it this summer?

Thanks,

Mike

----- "Daniel S. Katz" wrote:
> Hi,
> This is the first real error in the tutorial. Section 3.3 is incomplete, and doesn't make any sense.
> It refers to greeting.txt from the previous section, but there is no greeting.txt in the previous section. There's a greeting procedure, but no greeting file. It seems that this might refer to hello.txt. Also, the line in the box refers to outfile, which wasn't defined.
>
> So, the text in section 3.3 should read:
>
> In the previous section, the file hello.txt is used only to store an intermediate result. We don't really care about which name is used for the file, and we can let Swift choose the name. To do that, omit the mapping entirely when declaring outfile:
> And the line in the box should read:
> messagefile hellofile;
>
> For completeness, the code from the previous example could be restated with this change:
>
> type messagefile {}
>
> (messagefile t) greeting (string s) {
>     app {
>         echo s stdout=@filename(t);
>     }
> }
>
> (messagefile o) capitalise(messagefile i) {
>     app {
>         tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o);
>     }
> }
>
> messagefile hellofile;
> messagefile final <"capitals.txt">;
>
> hellofile = greeting("hello from Swift");
> final = capitalise(hellofile);
>
> Dan
>
> --
> Daniel S. Katz
> University of Chicago
> (773) 834-7186 (voice)
> (773) 834-3700 (fax)
> d.katz at ieee.org or dsk at ci.uchicago.edu
> http://www.ci.uchicago.edu/~dsk/
>
>
_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory


From aespinosa at cs.uchicago.edu  Thu Oct 21 11:11:49 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Thu, 21 Oct 2010 11:11:49 -0500
Subject: [Swift-user] coaster service log4j.properties
Message-ID: 

Hi,

I set my

log4j.logger.org.globus.cog.abstraction.coaster.rlog=DEBUG

but the persistent coaster-service still seems to be in INFO mode.

Does bin/coaster-service still look at etc/log4j.properties? Or do I need to specify the log4j.properties file in the bin/coaster-service script itself as a java flag?

Thanks.
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From wilde at mcs.anl.gov  Thu Oct 21 12:10:56 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 21 Oct 2010 11:10:56 -0600 (GMT-06:00)
Subject: [Swift-user] Status of Swift on Ranger?
Message-ID: <2005454051.200111287681056387.JavaMail.root@zimbra.anl.gov>

Hi Sarah,

Glen Hocky would like to start using Ranger for Swift science runs.

Can you send us a brief update on what you see as the status of Swift on Ranger? I.e., which approach works best, and what problems, if any, to avoid?

Do you depend on Swift fixes at a specific revision level of trunk?

Do you submit to Ranger using coasters from a HNL submit host? From communicado?

What sites file and coaster parameters? (I assume these are on the HNL wiki - can you send a pointer to Glen?)

What job-sizing recommendations would you make?
Have you tried getting coasters to create job-size-spreads on Ranger?

Thanks,

Mike


From aespinosa at cs.uchicago.edu  Thu Oct 21 17:51:37 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Thu, 21 Oct 2010 17:51:37 -0500
Subject: [Swift-user] coaster reply timeout heartbeats
Message-ID: 

Hi,

Do the HEARTBEAT timeouts occur because the swift client is expecting
them by default but the coaster service has it disabled by default?

Below are the logs that gave me the idea.

Swift client:
2010-10-21 10:49:56,404-0500 WARN Command Command(6761, HEARTBEAT): handling reply timeout; sendReqTime=101021-104756.388, sendTime=101021-104756.388, now=101021-104956.404
2010-10-21 10:49:56,404-0500 INFO Command Command(6761, HEARTBEAT): re-sending
2010-10-21 10:49:56,404-0500 WARN Command Command(6761, HEARTBEAT)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
2010-10-21 10:49:56,588-0500 INFO GSSChannel Connected to https://communicado.ci.uchicago.edu:61999
2010-10-21 10:49:56,588-0500 INFO AbstractStreamKarajanChannel$Multiplexer (1) Scheduling GSSCChannel-https://communicado.ci.uchicago.edu:61999(6)[1543987498: {}] for addition

Persistent service (passive workers):
Local contacts: [http://128.135.125.17:60999]
Started local service: http://128.135.125.17:60999
Started coaster service: https://128.135.125.17:61999
Started coaster service: https://128.135.125.17:61999
GSSSChannel-null(0)[1205215856: {}]: Disabling heartbeats (config is null)
Multiplexer 0 started
(1) Scheduling GSSSChannel-null(1)[1205215856: {}] for addition
nullChannel started
Multiplexer 1 started
Channel id: u-5410312-12bd0f72d5e--8000-u1d283670-12bd0f72d69--8000
MetaChannel: 409971196[1205215856: {}] -> null: Disabling heartbeats (disabled in config)
MetaChannel: 409971196[1205215856: {}] -> null.bind -> GSSSChannel-null(1)[1205215856: {}]
Sending Command(1, RLOG) on GSSSChannel-null(1)[1205215856: {}]
Plan time: 1
Plan time: 1
Plan time: 1
Plan time: 1

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From aespinosa at cs.uchicago.edu  Thu Oct 21 17:53:57 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Thu, 21 Oct 2010 17:53:57 -0500
Subject: [Swift-user] passive worker connection to service.
Message-ID: 

Can you guys confirm this for me?

Does the passive worker client only initiate one (1) register request
to the service?  If so, then that worker is as good as gone when the
service shuts down and gets restarted?  Or will it register again to
that service?

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From aespinosa at cs.uchicago.edu  Thu Oct 21 18:20:45 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Thu, 21 Oct 2010 18:20:45 -0500
Subject: [Swift-user] resume log on output files in a struct
Message-ID: 

I get these entries in my resume log:

11-2940:sub..x!null
11-1812:sub..y!null
11-1812:sub..x!null
11-5367:sub..y!null
11-5367:sub..x!null
11-3291:sub..x!null
11-5378:sub..y!null
11-5378:sub..x!null
11-3292:sub..y!null
11-3292:sub..x!null
11-3298:sub..y!null
11-3298:sub..x!null
11-6:sub..y!null
11-6:sub..x!null
11-5371:sub..y!null
11-5371:sub..x!null
11-2941:sub..y!null
11-2941:sub..x!null
11-539:sub..y!null
11-539:sub..x!null

Here's that object in the workflow:

  Sgt sub ;
  string var_str[] = get_variations( site, rup, "/gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" );
  Variation vars[] ;
  sub = extract(sgt_var, site, vars[rup.size-1]);

Here are the datatype definitions:

type SgtDim;
type Variation;

type Sgt {
  SgtDim x;
  SgtDim y;
}

type Rupture {
  int source;
  int index;
  int size;
}

app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var)
{
  extract @strcat("stat=", _stat.name) "extract_sgt=1"
    @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat)
    @strcat("rupmodfile=", @filename(_var))
    @strcat("sgt_xfile=", @filename(_sgt.x))
    @strcat("sgt_yfile=", @filename(_sgt.y))
    @strcat("extract_sgt_xfile=", @filename(_ext.x))
    @strcat("extract_sgt_yfile=", @filename(_ext.y));
}

So just to confirm, the resume log is made from unique jobids of each task that persist across swift sessions, and it does not look for the output files / filenames themselves derived in the workflow?

Thanks,
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago


From hategan at mcs.anl.gov  Thu Oct 21 19:09:34 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 21 Oct 2010 17:09:34 -0700
Subject: [Swift-user] coaster reply timeout heartbeats
In-Reply-To: 
References: 
Message-ID: <1287706174.17758.1.camel@blabla2.none>

Heartbeats are a way to establish (within a certain interval of time)
that the connection is still alive.

Only one side needs to send heartbeats, and they should be replied to
without question.

A client detects a bad connection when no reply is received to a
heartbeat. The "passive" side detects a bad connection when no heartbeat
has been received in 2*the interval.

On Thu, 2010-10-21 at 17:51 -0500, Allan Espinosa wrote:
> Hi,
>
> Do the HEARTBEAT timeouts occur because the swift client is expecting
> them by default but the coaster service has it disabled by default?
>
> Below are the logs that gave me the idea.
> > Swift client: > 2010-10-21 10:49:56,404-0500 WARN Command Command(6761, HEARTBEAT): > handling reply timeout; sendReqTime=101021-104756.388, > sendTime=101021-104756.388, now=101021-104956.404 > 2010-10-21 10:49:56,404-0500 INFO Command Command(6761, HEARTBEAT): re-sending > 2010-10-21 10:49:56,404-0500 WARN Command Command(6761, > HEARTBEAT)fault was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280) > at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285) > at java.util.TimerThread.mainLoop(Timer.java:512) > at java.util.TimerThread.run(Timer.java:462) > 2010-10-21 10:49:56,588-0500 INFO GSSChannel Connected to > https://communicado.ci.uchicago.edu:61999 > 2010-10-21 10:49:56,588-0500 INFO > AbstractStreamKarajanChannel$Multiplexer (1) Scheduling > GSSCChannel-https://communicado.ci.uchicago.edu:61999(6)[1543987498: > {}] for addition > > Persistent service (passive workers): > Local contacts: [http://128.135.125.17:60999] > Started local service: http://128.135.125.17:60999 > Started coaster service: https://128.135.125.17:61999 > Started coaster service: https://128.135.125.17:61999 > GSSSChannel-null(0)[1205215856: {}]: Disabling heartbeats (config is null) > Multiplexer 0 started > (1) Scheduling GSSSChannel-null(1)[1205215856: {}] for addition > nullChannel started > Multiplexer 1 started > Channel id: u-5410312-12bd0f72d5e--8000-u1d283670-12bd0f72d69--8000 > MetaChannel: 409971196[1205215856: {}] -> null: Disabling heartbeats > (disabled in config) > MetaChannel: 409971196[1205215856: {}] -> null.bind -> > GSSSChannel-null(1)[1205215856: {}] > Sending Command(1, RLOG) on GSSSChannel-null(1)[1205215856: {}] > Plan time: 1 > Plan time: 1 > Plan time: 1 > Plan time: 1 > > From hategan at mcs.anl.gov Thu Oct 21 19:11:35 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 21 Oct 2010 
17:11:35 -0700 Subject: [Swift-user] passive worker connection to service. In-Reply-To: References: Message-ID: <1287706295.17758.3.camel@blabla2.none> On Thu, 2010-10-21 at 17:53 -0500, Allan Espinosa wrote: > Can you guys confirm this for me? > > Does the passive worker client only initiate one (1) register request > to the service? If so, then that worker is as good as gone when the > service shuts down and gets restarted? Or will it register again to > that service? > The worker is as good as gone when the service shuts down. That's why you use a persistent service when you want to persist workers between different runs. The passive worker is not meant to address that. Instead it is meant to override the automatic worker submission in order to allow that to work in environments for which there is no provider. From aespinosa at cs.uchicago.edu Thu Oct 21 19:13:04 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 21 Oct 2010 19:13:04 -0500 Subject: [Swift-user] coaster reply timeout heartbeats In-Reply-To: <1287706174.17758.1.camel@blabla2.none> References: <1287706174.17758.1.camel@blabla2.none> Message-ID: 2010/10/21 Mihael Hategan : > Heartbeats are a way to establish (within a certain interval of time) > that the connection is still alive. So at some point, the client's connection with the coaster service got cut here? > > Only one side needs to send heartbeats, and they should be replied to > without question. > > A client detects a bad connection when no reply is received to a > heartbeat. The "passive" side detects a bad connection when no heartbeat > has been received in 2*the interval. By "passive" side you mean the coaster service? > > On Thu, 2010-10-21 at 17:51 -0500, Allan Espinosa wrote: >> Hi, >> >> Do the HEARTBEAT timeouts occur because the swift client is expecting >> them by default but the coaster service has it disabled by default? >> >> Below are the logs that gave me the idea.
>> >> Swift client: >> 2010-10-21 10:49:56,404-0500 WARN Command Command(6761, HEARTBEAT): >> handling reply timeout; sendReqTime=101021-104756.388, >> sendTime=101021-104756.388, now=101021-104956.404 >> 2010-10-21 10:49:56,404-0500 INFO Command Command(6761, HEARTBEAT): re-sending >> 2010-10-21 10:49:56,404-0500 WARN Command Command(6761, >> HEARTBEAT)fault was: Reply timeout >> org.globus.cog.karajan.workflow.service.ReplyTimeoutException >> at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280) >> at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285) >> at java.util.TimerThread.mainLoop(Timer.java:512) >> at java.util.TimerThread.run(Timer.java:462) >> 2010-10-21 10:49:56,588-0500 INFO GSSChannel Connected to >> https://communicado.ci.uchicago.edu:61999 >> 2010-10-21 10:49:56,588-0500 INFO >> AbstractStreamKarajanChannel$Multiplexer (1) Scheduling >> GSSCChannel-https://communicado.ci.uchicago.edu:61999(6)[1543987498: >> {}] for addition >> >> Persistent service (passive workers): >> Local contacts: [http://128.135.125.17:60999] >> Started local service: http://128.135.125.17:60999 >> Started coaster service: https://128.135.125.17:61999 >> Started coaster service: https://128.135.125.17:61999 >> GSSSChannel-null(0)[1205215856: {}]: Disabling heartbeats (config is null) >> Multiplexer 0 started >> (1) Scheduling GSSSChannel-null(1)[1205215856: {}] for addition >> nullChannel started >> Multiplexer 1 started >> Channel id: u-5410312-12bd0f72d5e--8000-u1d283670-12bd0f72d69--8000 >> MetaChannel: 409971196[1205215856: {}] -> null: Disabling heartbeats >> (disabled in config) >> MetaChannel: 409971196[1205215856: {}] -> null.bind -> >> GSSSChannel-null(1)[1205215856: {}] >> Sending Command(1, RLOG) on GSSSChannel-null(1)[1205215856: {}] >> Plan time: 1 >> Plan time: 1 >> Plan time: 1 >> Plan time: 1 >> >> > > > > -- Allan M.
Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Thu Oct 21 19:17:34 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 21 Oct 2010 19:17:34 -0500 Subject: [Swift-user] passive worker connection to service. In-Reply-To: <1287706295.17758.3.camel@blabla2.none> References: <1287706295.17758.3.camel@blabla2.none> Message-ID: I see. Now this reduces my concerns to just the persistence of the coaster service across runs (other post). Thanks! -Allan 2010/10/21 Mihael Hategan : > On Thu, 2010-10-21 at 17:53 -0500, Allan Espinosa wrote: >> Can you guys confirm this for me? >> >> Does the passive worker client only initiate one (1) register request >> to the service? If so, then that worker is as good as gone when the >> service shuts down and gets restarted? Or will it register again to >> that service? >> > > The worker is as good as gone when the service shuts down. > > That's why you use a persistent service when you want to persist workers > between different runs. The passive worker is not meant to address that. > Instead it is meant to override the automatic worker submission in order > to allow that to work in environments for which there is no provider. > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Thu Oct 21 19:18:41 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 21 Oct 2010 17:18:41 -0700 Subject: [Swift-user] coaster reply timeout heartbeats In-Reply-To: References: <1287706174.17758.1.camel@blabla2.none> Message-ID: <1287706721.17758.10.camel@blabla2.none> On Thu, 2010-10-21 at 19:13 -0500, Allan Espinosa wrote: > 2010/10/21 Mihael Hategan : > > Heartbeats are a way to establish (within a certain interval of time) > > that the connection is still alive. > > So at some point, the client's connection with the coaster service got cut here?
It only really establishes the state of the connection to the extent that the logic surrounding this mechanism is ok. It would seem, from your email, that the connection is fine, which implies that some of the logic is bad. > > > > > Only one side needs to send heartbeats, and they should be replied to > > without question. > > > > A client detects a bad connection when no reply is received to a > > heartbeat. The "passive" side detects a bad connection when no heartbeat > > has been received in 2*the interval. > > By "passive" side you mean the coaster service? I believe "active" is whoever initiates the connection. So for automatic service startup, active will be the service. For a persistent service, active would be the client. Mihael From skenny at uchicago.edu Fri Oct 22 12:56:27 2010 From: skenny at uchicago.edu (Sarah Kenny) Date: Fri, 22 Oct 2010 12:56:27 -0500 Subject: [Swift-user] Re: Status of Swift on Ranger? In-Reply-To: <2005454051.200111287681056387.JavaMail.root@zimbra.anl.gov> References: <2005454051.200111287681056387.JavaMail.root@zimbra.anl.gov> Message-ID: hi glen, we run on ranger using the latest stable branch of swift...users generate their sites file here: http://www.ci.uchicago.edu/~skenny/sitesgen/sitegen.php which will give you pretty good default settings (you'll have to change the account number and work directory to your own) and then you can tweak it based on your workflow...as you increase the number of cores you request, the queue time will usually increase... we submit from wherever :) communicado, any of our hnl machines...bridled, pads logins...feel free to use the hnl install of swift: /ci/projects/cnari/apps/swift/bin/swift ~sk On Thu, Oct 21, 2010 at 12:10 PM, Michael Wilde wrote: > Hi Sarah, > > Glen Hocky would like to start using Ranger for Swift science runs. > > Can you send us a brief update on what you see is the status of Swift on > Ranger? I.e., what approach works best, and what problems if any to avoid?
> > Do you depend on Swift fixes at a specific revision level of trunk? > > Do you submit to Ranger using coasters from a HNL submit host? From > communicado? > > What sites file and coaster parameters? (I assume these are on the HNL wiki > - can you send a pointer to Glen?) > > What job-sizing recommendations would you make? > > Have you tried getting coasters to create job-size-spreads on Ranger? > > Thanks, > > Mike > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Fri Oct 22 16:59:19 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 22 Oct 2010 16:59:19 -0500 Subject: [Swift-user] foreach dilemma (was Re: Heap space being exhausted) Message-ID: I looked again at my main loop to figure out why not enough jobs are being called in the inner loop despite the huge number of available workers. Answer: not enough ready jobs! :( I set my foreach.max.threads=100 so there can only be 100 outer-loop jobs (6k total) at a time. These outer-loop jobs are a dependency for the inner-loop jobs, so the number of ready jobs ramps up slowly. Another bottleneck is my data mapper wrapped as an app. Since it's only a local job, the default throttle is set to 4 at a time. I made this workaround because of synchronization issues with mappers a few months ago. I think it would be good to look at the previous version of my workflow again. -Allan 2010/6/2 Mihael Hategan : > On Wed, 2010-06-02 at 19:03 -0500, Allan Espinosa wrote: >> My Cybershake workflow. It's basically a 2-level for loop with varying >> inner loop sizes. >> >> foreach i in (~4k elements) { >> x = f(); >> foreach (20-2k elements) { >> ... >> } >> } > > Yep. You have a winner. Max threads = 1024 * 1024. > > You should adjust that parameter accordingly. I.e. foreach.max.threads = > sqrt(maxTotalThreads).
> >> >> 2010/6/2 Mihael Hategan : >> > On Wed, 2010-06-02 at 18:40 -0500, Allan Espinosa wrote: >> >> I tried a HEAPMAX of 4GB. >> >> >> >> No memory problems so far :) >> > >> > Still odd. What's the swift script? >> > >> > I'm asking because foreach.max.threads should work, but it applies to >> > each individual foreach rather than globally. >> > >> >> >> >> 2010/6/2 Mihael Hategan : >> >> > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: >> >> >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: >> >> >> > btw >> >> >> > >> >> >> > foreach.maxthreads=1024 >> >> >> >> >> >> or more heapmax. >> >> > >> >> > Ehm, I thought you found a solution :) >> >> > >> >> > What's the swift script? >> > > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From dk0966 at cs.ship.edu Sun Oct 24 22:26:53 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Sun, 24 Oct 2010 23:26:53 -0400 Subject: [Swift-user] a tutorial problem In-Reply-To: <244831244.196741287677548567.JavaMail.root@zimbra.anl.gov> References: <0425D913-2ABA-4D53-9927-C2487E8A6784@ci.uchicago.edu> <244831244.196741287677548567.JavaMail.root@zimbra.anl.gov> Message-ID: Hello all, I have made the following changes to the tutorial based on Daniel's feedback: Section 3.6 reference to HELLOWORLD changed to "exercise in section 2" Section 3.6 reference to ANONYMOUSFILE changed to "exercise in section 3.3" TODO reminder removed until new content is added Corrected filename in section 3.3 from greeting.txt to hello.txt Added the full code from the prior example to section 3.3 with appropriate changes Changed the style of defining an app to the cleaner version, which should give a more consistent presentation. Modified examples/q5.swift, examples/manyparam.swift, examples/second_procedure.swift, examples/restart.swift, examples/iterate.swift, examples/types.swift, and examples/parameter.swift to match this style.
These are in revision I do not have permissions to run a manual update on the documentation, but I did modify the docbook file with the correct information. It should hopefully update on the website within 24 hours. In the meantime, here is a PDF file I generated which contains the updates. Regards, David On Thu, Oct 21, 2010 at 12:12 PM, Michael Wilde wrote: > Thanks, Dan. > > David, can you comment on the problems Dan has reported (today and a few > days back) on the tutorial, as I recall that you worked on it this summer? > > Thanks, > > Mike > > ----- "Daniel S. Katz" wrote: >> Hi, >> > This is the first real error in the tutorial. Section 3.3 is incomplete, > and doesn't make any sense. >> > It refers to greeting.txt from the previous section, but there is no > greeting.txt in the previous section. There's a greeting procedure, but no > greeting file. It seems that this might refer to hello.txt. Also, the line > in the box refers to outfile, which wasn't defined. >> >> > So, the text in section 3.3 should read: >> >> > > In the previous section, the file hello.txt is used only to store an > intermediate result. We don't really care about which name is used for the > file, and we can let Swift choose the name. > > To do that, omit the mapping entirely when declaring outfile: > >> > And the line in the box should read: >> > messagefile hellofile; >> >> > For completeness, the code from the previous example could be restated with > this change: >> >> > type messagefile {} > > (messagefile t) greeting (string s) { > app { > echo s stdout=@filename(t); > } > } > > (messagefile o) capitalise(messagefile i) { > app { > tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o); > } > } > > messagefile hellofile; > messagefile final <"capitals.txt">; > > hellofile = greeting("hello from Swift"); > final = capitalise(hellofile); >> >> > Dan >> >> >> >> >> >> -- >> Daniel S.
Katz >> University of Chicago >> (773) 834-7186 (voice) >> (773) 834-3700 (fax) >> d.katz at ieee.org or dsk at ci.uchicago.edu >> http://www.ci.uchicago.edu/~dsk/ >> > > >> >> _______________________________________________ Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- A non-text attachment was scrubbed... Name: tutorial.pdf Type: application/pdf Size: 42515 bytes Desc: not available URL: From deljaick at terra.com.br Mon Oct 25 11:22:31 2010 From: deljaick at terra.com.br (Daniele El-Jaick) Date: Mon, 25 Oct 2010 14:22:31 -0200 Subject: [Swift-user] About installation Message-ID: <004e01cb7460$d41c0680$7c541380$@com.br> Hi, Which operating systems does Swift work with? Do you have a version for Windows? Thanks, Daniele -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 10:19:01 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 10:19:01 -0500 Subject: [Swift-user] swift understanding/tutorial Message-ID: <785B29E7-72D0-4294-9992-97929755CB48@ci.uchicago.edu> Hi, I'm just working through the swift examples in the tutorial (http://www.ci.uchicago.edu/swift/guides/tutorial.php). Some thoughts/comments follow... This is confusing to me. If the "file to use" is t, it feels to me that t should be an input parameter, as it tells greeting where to write its output. It seems that the following should work: type messagefile; app () greeting(messagefile t) { echo "Hello, world!" stdout=@filename(t); } messagefile outfile <"hello.txt">; greeting(outfile ); but it doesn't, as it appears procedures require a return value.
So I next tried the following, which seemed like it should work: type messagefile; int useless; app (int x) greeting(messagefile t) { echo "Hello, world!" stdout=@filename(t); } messagefile outfile <"hello.txt">; useless = greeting(outfile ); but it also didn't, at least not when hello.txt doesn't already exist. The command and output are: tmp:swift dsk$ swift check.swift Swift 0.9 swift-r2860 cog-r2388 RunID: 20101014-1003-tq1hn851 Progress: Failed to transfer wrapper log from check-20101014-1003-tq1hn851/info/4 on localhost Progress: Stage in:1 Failed to transfer wrapper log from check-20101014-1003-tq1hn851/info/6 on localhost Progress: Stage in:1 Failed to transfer wrapper log from check-20101014-1003-tq1hn851/info/8 on localhost Execution failed: Exception in echo: Arguments: [Hello, world!] Host: localhost Directory: check-20101014-1003-tq1hn851/jobs/8/echo-8dycd50k stderr.txt: stdout.txt: ---- Caused by: File not found: /Users/dsk/Desktop/swift-0.9/examples/swift/./hello.txt So, clearly there is something about Swift that I'm missing, and that I'm not gathering from the tutorial web page. I'm writing this with two goals - first, to figure out what basic concept I'm missing, and second, to perhaps expand the tutorial web page so that others don't have this issue. Thanks. Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen shot 2010-10-14 at 9.51.27 AM.png Type: image/png Size: 29588 bytes Desc: not available URL: From dsk at ci.uchicago.edu Thu Oct 14 10:37:38 2010 From: dsk at ci.uchicago.edu (Daniel S. 
Katz) Date: Thu, 14 Oct 2010 10:37:38 -0500 Subject: [Swift-user] another small tutorial issue Message-ID: <5CE63425-9331-49BE-B6D8-3DC4B16682E7@ci.uchicago.edu> In http://www.ci.uchicago.edu/swift/guides/tutorial.php Section 3.1, I read "The code changes from first.swift are highlighted below." However, this isn't true. The things that look like highlights are just the use of color, which is the same as in all of the examples (and which is not explained anywhere on the tutorial page). It would be useful if the differences between this example and the first example actually were highlighted in this example. If nothing else, the first example could be placed next to this example in an adjacent box. Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 10:45:24 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 10:45:24 -0500 Subject: [Swift-user] a third tutorial question Message-ID: In http://www.ci.uchicago.edu/swift/guides/tutorial.php in first.swift, the procedure is defined as: app (messagefile t) greeting () { echo "Hello, world!" stdout=@filename(t); } in parameter.swift, the new procedure is defined as: (messagefile t) greeting (string s) { app { echo s stdout=@filename(t); } } I don't understand why the style of defining the procedure has changed, or what this change implies. I would have just started with the first.swift procedure, and changed it to: app (messagefile t) greeting (string s) { echo s stdout=@filename(t); } In fact, I did try this, and the code works fine, so I fail to understand the reason for the larger change that is in the tutorial. Dan -- Daniel S.
Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 10:53:47 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 10:53:47 -0500 Subject: [Swift-user] fourth tutorial question Message-ID: Looking at manyparam.swift, the following seems painful: messagefile english <"english.txt">; messagefile french <"francais.txt">; english = greeting("hello"); french = greeting("bonjour"); messagefile japanese <"nihongo.txt">; japanese = greeting("konnichiwa"); It seems like it would be much nicer to be able to write: <"english.txt"> = greeting("hello"); <"francais.txt"> = greeting("bonjour"); <"nihongo.txt"> = greeting("konnichiwa"); But it appears this doesn't work. Is there some reason this can't be done, or is this just a limitation in Swift? Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 11:07:38 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 11:07:38 -0500 Subject: [Swift-user] fifth tutorial problem Message-ID: <0D68CE9E-B724-40E7-8D05-F7A6D1FBC84F@ci.uchicago.edu> Hi, This is the first real error in the tutorial. Section 3.3 is incomplete, and doesn't make any sense. It refers to greeting.txt from the previous section, but there is no greeting.txt in the previous section. There's a greeting procedure, but no greeting file. It seems that this might refer to hello.txt. Also, the line in the box refers to outfile, which wasn't defined. So, the text in section 3.3 should read: In the previous section, the file hello.txt is used only to store an intermediate result. 
We don't really care about which name is used for the file, and we can let Swift choose the name. To do that, omit the mapping entirely when declaring outfile: And the line in the box should read: messagefile hellofile; For completeness, the code from the previous example could be restated with this change: type messagefile {} (messagefile t) greeting (string s) { app { echo s stdout=@filename(t); } } (messagefile o) capitalise(messagefile i) { app { tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o); } } messagefile hellofile; messagefile final <"capitals.txt">; hellofile = greeting("hello from Swift"); final = capitalise(hellofile); Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsk at ci.uchicago.edu Thu Oct 14 11:11:55 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Thu, 14 Oct 2010 11:11:55 -0500 Subject: [Swift-user] another small issue in the tutorial Message-ID: At the start of section 3.4, I read: SwiftScript has the additional built-in types: boolean, integer and float that function much like their counterparts in other programming languages. If these are really type names, the second should be "int". If these are descriptions, at least float should be changed to "floating point", and perhaps boolean should also be changed. The way this is now, it's a mix of types and descriptions. Dan -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From wilde at mcs.anl.gov Mon Oct 25 14:46:24 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 25 Oct 2010 13:46:24 -0600 (GMT-06:00) Subject: [Swift-user] About installation In-Reply-To: <004e01cb7460$d41c0680$7c541380$@com.br> Message-ID: <1912692066.325811288035984725.JavaMail.root@zimbra.anl.gov> Daniele, The swift command itself runs on Sun Java on Linux and Mac OS. It can submit work to the local host, PBS, SGE, Condor and Cobalt schedulers, and to most Globus GRAM2 and GRAM5-supported schedulers. A year ago Swift was enhanced to run on native Windows - see this entry in the Users Guide: http://www.ci.uchicago.edu/swift/guides/userguide.php#tips.windows I think people have also successfully run Swift under Cygwin on Windows. - Mike ----- "Daniele El-Jaick" wrote: > > Hi, Which operating systems does Swift work with? Do you have a version for Windows? Thanks, Daniele > _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed...
URL: From aespinosa at cs.uchicago.edu Tue Oct 26 11:22:49 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 26 Oct 2010 11:22:49 -0500 Subject: [Swift-user] Re: resume log on output files in a struct In-Reply-To: References: Message-ID: In response to this, I think these are the corresponding log entries for the behavior: 2010-10-25 15:29:11,047-0500 INFO AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20101025-1529-eyis7k05:720000000064 type Sgt with no value at dataset=sgt_var (not closed)..y 2010-10-25 15:29:11,047-0500 INFO AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20101025-1529-eyis7k05:720000000064 type Sgt with no value at dataset=sgt_var (not closed)..x But the workflow does stage in and stage out the mapped files properly. 2010/10/21 Allan Espinosa : > I get these entries in my resume log: > > 11-2940:sub..x!null > 11-1812:sub..y!null > 11-1812:sub..x!null > 11-5367:sub..y!null > 11-5367:sub..x!null > 11-3291:sub..x!null > 11-5378:sub..y!null > 11-5378:sub..x!null > 11-3292:sub..y!null > 11-3292:sub..x!null > 11-3298:sub..y!null > 11-3298:sub..x!null > 11-6:sub..y!null > 11-6:sub..x!null > 11-5371:sub..y!null > 11-5371:sub..x!null > 11-2941:sub..y!null > 11-2941:sub..x!null > 11-539:sub..y!null > 11-539:sub..x!null > > Here's that object in the workflow: > Sgt sub r=rup.index>; > string var_str[] = get_variations( site, rup, >
"/gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations" ); > Variation vars[] ; > > sub = extract(sgt_var, site, vars[rup.size-1]); > > here's the definition in the datatype: > type SgtDim; > type Variation; > > type Sgt { > SgtDim x; > SgtDim y; > } > > type Rupture { > int source; > int index; > int size; > } > > > app (Sgt _ext) extract(Sgt _sgt, Station _stat, Variation _var) { > extract @strcat("stat=", _stat.name) "extract_sgt=1" > @strcat("slon=", _stat.lon) @strcat("slat=", _stat.lat) > > @strcat("rupmodfile=", @filename(_var)) > @strcat("sgt_xfile=", @filename(_sgt.x)) > @strcat("sgt_yfile=", @filename(_sgt.y)) > @strcat("extract_sgt_xfile=", @filename(_ext.x)) > @strcat("extract_sgt_yfile=", @filename(_ext.y)); > } > > > So just to confirm, the resume log is made from unique job IDs of each > task that persist across Swift sessions, and it does not look at the > output files/filenames derived in the workflow themselves? > > Thanks, > -Allan > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Tue Oct 26 11:38:21 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 26 Oct 2010 11:38:21 -0500 Subject: [Swift-user] log4j settings of vdl:* elements In-Reply-To: References: Message-ID: Thanks Justin. I got the log entries that I need. But it also overrode some of the classes that I set to NONE, like SetFieldValue. -Allan 2010/10/20 Justin M Wozniak : > > Try log4j.logger.swift=DEBUG > > See org.griphyn.vdl.karajan.lib.Log for more info. > > On Wed, 20 Oct 2010, Allan Espinosa wrote: > >> Hi, >> >> I think I have asked this before, but can't find the previous posts about >> it. >> >> I would like to set the vdl:execute2 log level to DEBUG. Which >> package/class path should I adjust in log4j.properties? >> >> Thanks, >> -Allan > > -- > Justin M Wozniak > > -- Allan M.
Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Tue Oct 26 17:04:21 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 26 Oct 2010 16:04:21 -0600 (GMT-06:00) Subject: [Swift-user] Mapping question: how to map an optional file? In-Reply-To: Message-ID: <863674084.383171288130661336.JavaMail.root@zimbra.anl.gov> Was: Re: mapping question (relaying to swift-user to solicit additional thoughts) Glen, Offhand I don't see any alternative to the empty file. I've had the same need in the past and don't see how to do it other than with an empty file. It's almost like we need some way to represent a null file without the overhead of actually passing one, and some way to indicate when a null file is acceptable. In your application, would an empty file be problematic in performance or semantics? - Mike ----- "Glen Hocky" wrote: > I'm trying to modify my swift scripts slightly so that one output file is optional. I know previously we had done something like map to an empty file, but I'm having a hard time doing that without having my application create the empty file. Any suggestions? -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Oct 26 18:09:51 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 26 Oct 2010 16:09:51 -0700 Subject: [Swift-user] Mapping question: how to map an optional file? In-Reply-To: <863674084.383171288130661336.JavaMail.root@zimbra.anl.gov> References: <863674084.383171288130661336.JavaMail.root@zimbra.anl.gov> Message-ID: <1288134591.18201.2.camel@blabla2.none> Right. I don't think we support optional data. I think that ties back into the purely functional nature of Swift. I remember there being some discussion about this in the past, but I don't think we reached any reasonable conclusion.
Mihael On Tue, 2010-10-26 at 16:04 -0600, Michael Wilde wrote: > Was: Re: mapping question > > (relaying to swift-user to solicit additional thoughts) > > Glen, > > Offhand I don't see any alternative to the empty file. I've had the same > need in the past and don't see how to do it other than with an empty file. > It's almost like we need some way to represent a null file without the > overhead of actually passing one, and some way to indicate when a null > file is acceptable. > > In your application, would an empty file be problematic in performance > or semantics? > > - Mike > > ----- "Glen Hocky" wrote: > > I'm trying to modify my swift scripts slightly so that one output > file is optional. I know previously we had done something like map to > an empty file, but I'm having a hard time doing that without having my > application create the empty file. Any suggestions? > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
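One way to apply the empty-file workaround discussed in this thread, without changing the application itself, is to launch the application through a small wrapper. After the command exits successfully, the wrapper creates each missing optional output as a zero-length file. The Python sketch below is hypothetical and not part of Swift. The function name `run_with_optional_outputs`, its arguments, and the convention that an empty file means "no data" are all assumptions, and any downstream app would still need to check for empty inputs itself.

```python
import subprocess
from pathlib import Path


def run_with_optional_outputs(cmd, optional_outputs):
    """Run cmd, then create an empty placeholder for every optional output
    file that the command did not write.

    Hypothetical helper illustrating the empty-file workaround: Swift has no
    notion of optional outputs, so every mapped output has to exist, and the
    consumers of these files must treat a zero-length file as "no data".
    Returns the command's exit status.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Keep real failures visible instead of masking them with placeholders.
        return result.returncode
    for name in optional_outputs:
        path = Path(name)
        if not path.exists():
            path.touch()  # zero-length placeholder; existing files are left alone
    return 0
```

Such a wrapper could, for example, be listed in the transformation catalog in place of the real executable, with the optional output's mapped filename passed in as an extra argument. That wiring is likewise an assumption, not something confirmed in this thread.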