From hategan at mcs.anl.gov Wed Aug 3 00:22:21 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 02 Aug 2011 22:22:21 -0700
Subject: [Swift-devel] iterate behaviour round II
Message-ID: <1312348941.21371.12.camel@blabla>

So I think we decided that:

iterate v {
    trace(v);
} until (v >= 10);

would do the test after v was incremented and would always execute at
least once (so 0 to 9 would be printed).

But then the tutorial has the following (adapted a bit):

int a[];
a[0] = 10;
iterate v {
    a[v + 1] = a[v] - 1;
} until(a[v+1] < 1);

It's all peachy in concept, except that if v is incremented before the
check, an access to a[v+1] will hang. a[v] is now the correct expression
in the test, but then it's not quite intuitive.

Proposal 1: change documentation and tests to "until(a[v] < 1)" (this
does not solve the problem in general, since a[v+1] would still lead to
a hang, not unlike bug 481).
Proposal 2: Proposal 1 + deprecate iterate and suggest foreach instead.

Opinions? Other ideas?

Mihael

From hategan at mcs.anl.gov Wed Aug 3 00:23:59 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 02 Aug 2011 22:23:59 -0700
Subject: [Swift-devel] trunk coasters
Message-ID: <1312349039.21371.13.camel@blabla>

Has anybody run a job on trunk coasters recently?

Mihael

From hategan at mcs.anl.gov Wed Aug 3 00:27:43 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 02 Aug 2011 22:27:43 -0700
Subject: [Swift-devel] extractint
Message-ID: <1312349263.21792.0.camel@blabla>

Why is extractint returning a float?

From benc at hawaga.org.uk Wed Aug 3 06:38:36 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 3 Aug 2011 13:38:36 +0200
Subject: [Swift-devel] iterate behaviour round II
In-Reply-To: <1312348941.21371.12.camel@blabla>
References: <1312348941.21371.12.camel@blabla>
Message-ID: 

I dislike using array indices for each step.
I wanted to figure out something that looked more like a fold/unfold,
where the body of the iterate only has access to "previous" and "next",
so that you write something like this:

file a[];
file seed <"foo">;
a = iterate from seed {
    next = f(previous)
} until(g(next)=false);

but I never figured out a syntax that I liked (contrast with the
Haskell unfold syntax).

From benc at hawaga.org.uk Wed Aug 3 06:39:11 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 3 Aug 2011 13:39:11 +0200
Subject: [Swift-devel] extractint
In-Reply-To: <1312349263.21792.0.camel@blabla>
References: <1312349263.21792.0.camel@blabla>
Message-ID: 

On Aug 3, 2011, at 7:27 AM, Mihael Hategan wrote:

> Why is extractint returning a float?

Because Swift numerical types are poorly defined?

From jonmon at mcs.anl.gov Wed Aug 3 08:51:58 2011
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 3 Aug 2011 08:51:58 -0500
Subject: [Swift-devel] trunk coasters
In-Reply-To: <1312349039.21371.13.camel@blabla>
References: <1312349039.21371.13.camel@blabla>
Message-ID: <264C2AE9-C8DC-47C9-9EFA-A31E382FDA9D@mcs.anl.gov>

I have. A small 45-task run.

On Aug 3, 2011, at 12:23 AM, Mihael Hategan wrote:

> Has anybody run a job on trunk coasters recently?
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Wed Aug 3 08:53:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 08:53:51 -0500
Subject: [Swift-devel] trunk coasters
In-Reply-To: <1312349039.21371.13.camel@blabla>
References: <1312349039.21371.13.camel@blabla>
Message-ID: 

They were failing for me using persistent coasters to OSG sites; I will
be testing further today and will file bugs as needed.

On 8/3/11, Mihael Hategan wrote:
> Has anybody run a job on trunk coasters recently?
>
> Mihael

--
Sent from my mobile device

From jonmon at mcs.anl.gov Wed Aug 3 08:54:48 2011
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 3 Aug 2011 08:54:48 -0500
Subject: [Swift-devel] trunk coasters
In-Reply-To: 
References: <1312349039.21371.13.camel@blabla>
Message-ID: <48D1727A-0C16-4D0D-AC3B-F1F9DFE5894B@mcs.anl.gov>

I have been using automatic coasters and submitting to PADS. I haven't
tried any large-scale runs recently, though.

On Aug 3, 2011, at 8:53 AM, Michael Wilde wrote:

> They were failing for me using persistent coasters to OSG sites; I will
> be testing further today and will file bugs as needed.

From wilde at mcs.anl.gov Wed Aug 3 09:08:54 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 09:08:54 -0500 (CDT)
Subject: [Swift-devel] trunk coasters
In-Reply-To: <48D1727A-0C16-4D0D-AC3B-F1F9DFE5894B@mcs.anl.gov>
Message-ID: <1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov>

Last night (on current trunk) I was getting this:

2011-08-02 23:24:49,863-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-4q4lmvdk - Application exception: null
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.karajan.workflow.service.ProtocolException: Failed to set configuration: For input string: ""
Caused by: org.globus.cog.karajan.workflow.service.RemoteException: Failed to set configuration: For input string: ""
2011-08-02 23:24:49,866-0500 INFO vdl:execute END_FAILURE thread=0-3-0-1 tr=cat
2011-08-02 23:24:49,868-0500 DEBUG VDL2ExecutionContext Exception in cat:
Arguments: [data.txt]
Host: localhost
Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
- - -

Exception in cat:
Arguments: [data.txt]
Host: localhost
Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk
- - -

Caused by: null
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.karajan.workflow.service.ProtocolException: Failed to set configuration: For input string: ""
Caused by: org.globus.cog.karajan.workflow.service.RemoteException: Failed to set configuration: For input string: ""
	at org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)

---

But that was a rather new configuration, so I need to do more
diagnosis.

Why did you ask, Mihael? Are you seeing problems too?

I hope to work with Alberto this week to resume site config testing
with a focus on coaster configs.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Aug 3 09:23:29 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 09:23:29 -0500 (CDT)
Subject: [Swift-devel] iterate behaviour round II
In-Reply-To: <1312348941.21371.12.camel@blabla>
Message-ID: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov>

I propose that we do everything possible to ensure that the semantics
of iterate do not change from 0.92.1, to avoid breaking existing code.
NCAR and the DOE ParVis project, in particular, have a very large Swift
script that they are testing for production use, and we really don't
want that to break. We should not allow 0.93 to break current user
code, if at all possible.

I propose instead that we experiment with new iterate semantics using
one or more new statements (fold, do, while, for). Ben, Mihael, and
others are interested in functional-style statements; I am in favor of
C-like statements. I would favor having both in the language as long
as we keep it simple (which I understand is complex ;)

To start a parallel thread here: how feasible is it (within the
write-once variable model and scope-creation semantics that iterate
uses) to provide the three C iteration statements (for, while, and
do/while) with syntax and semantics as close to C as possible?
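As Mihael describes it, iterate tests its condition after v is incremented and always runs the body at least once. That makes it the analogue of a C do/while loop. The Python sketch that follows is a toy model for comparing the styles discussed in this thread. It is not Swift's implementation, and the names `iterate_until`, `WriteOnce` and `unfold` are hypothetical. Where Swift would hang because the test reads an element that is never assigned, the model raises `LookupError` instead.

```python
# Toy model of the iterate semantics discussed in this thread (not Swift's
# implementation). Assumptions: v starts at 0, each step writes into a
# write-once map, and the until-test runs after v is incremented, so the
# body always runs at least once (like a C do/while loop).


class WriteOnce(dict):
    """Single-assignment map, loosely modelling a Swift array."""

    def __setitem__(self, key, value):
        if key in self:
            raise ValueError(f"element {key} assigned twice")
        super().__setitem__(key, value)

    def __getitem__(self, key):
        # Swift would block forever on an element nobody assigns;
        # the model fails fast instead so the problem is visible.
        if key not in self:
            raise LookupError(f"element {key} is never assigned (Swift would hang)")
        return super().__getitem__(key)


def iterate_until(body, until):
    """Same control flow as C: v = 0; do { body(v); v++; } while (!until(v));
    Returns the value of v at which the test first succeeded."""
    v = 0
    while True:
        body(v)
        v += 1
        if until(v):
            return v


def unfold(seed, step, keep_going):
    """Functional alternative in the spirit of Ben's previous/next idea
    (compare Haskell's unfoldr): each new element is computed only from
    the previous one. Here the first element failing keep_going is dropped."""
    out = [seed]
    while True:
        nxt = step(out[-1])
        if not keep_going(nxt):
            return out
        out.append(nxt)


if __name__ == "__main__":
    printed = []
    iterate_until(printed.append, lambda v: v >= 10)
    print(printed)  # 0 .. 9, as described for trace(v)

    # Tutorial example: after the increment, a[v] is the newest element,
    # so the test must read a[v]; reading a[v + 1] would "hang".
    a = WriteOnce({0: 10})

    def countdown(v):
        a[v + 1] = a[v] - 1

    iterate_until(countdown, lambda v: a[v] < 1)
    print(a[10])  # 0

    print(unfold(10, lambda prev: prev - 1, lambda nxt: nxt >= 0))
```

In this model, the unfold form avoids the indexing question entirely, because the step function never sees an index. That is roughly the property Ben's proposed syntax was after.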

- Mike

From wilde at mcs.anl.gov Wed Aug 3 09:27:03 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 09:27:03 -0500 (CDT)
Subject: [Swift-devel] extractint
In-Reply-To: <1312349263.21792.0.camel@blabla>
Message-ID: <2111872963.184160.1312381623643.JavaMail.root@zimbra.anl.gov>

Do we have a test for this in the test suite?
Is that a change from 0.92.1?

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Wednesday, August 3, 2011 12:27:43 AM
> Subject: [Swift-devel] extractint
> Why is extractint returning a float?

From jonmon at mcs.anl.gov Wed Aug 3 09:27:54 2011
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 3 Aug 2011 09:27:54 -0500
Subject: [Swift-devel] extractint
In-Reply-To: <2111872963.184160.1312381623643.JavaMail.root@zimbra.anl.gov>
References: <2111872963.184160.1312381623643.JavaMail.root@zimbra.anl.gov>
Message-ID: 

I think it was like that because, until recently, ints were really Java
doubles underneath. At least that is what it looked like when going
through the code.

On Aug 3, 2011, at 9:27 AM, Michael Wilde wrote:

> Do we have a test for this in the test suite?
> Is that a change from 0.92.1?

From ketancmaheshwari at gmail.com Wed Aug 3 09:28:14 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 3 Aug 2011 09:28:14 -0500
Subject: [Swift-devel] trunk coasters
In-Reply-To: <1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov>
References: <48D1727A-0C16-4D0D-AC3B-F1F9DFE5894B@mcs.anl.gov>
	<1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov>
Message-ID: 

I have been using trunk persistent coasters on mcs resources and did not
see any issues.

--
Ketan

From wilde at mcs.anl.gov Wed Aug 3 10:12:41 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 10:12:41 -0500 (CDT)
Subject: [Swift-devel] trunk coasters
In-Reply-To: <1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov>
Message-ID: <570047635.184499.1312384361864.JavaMail.root@zimbra.anl.gov>

I'm testing the persistent coaster setup that was failing as below, but
instead of starting the workers on remote OSG sites, I'm starting a
single worker locally on communicado, where the service is running.

This seems to fail in a different manner than the test to OSG sites.

I see this in the service log (swift.log file):

2011-08-03 10:01:32,000-0500 INFO Settings Local contacts: [http://128.135.125.17:35852]
2011-08-03 10:01:32,014-0500 INFO CoasterService Started local service: http://128.135.125.17:35852
2011-08-03 10:01:32,014-0500 INFO CoasterService Started coaster service: http://128.135.125.17:41176
2011-08-03 10:05:50,884-0500 INFO AbstractStreamKarajanChannel$Multiplexer Multiplexer 0 started
2011-08-03 10:05:50,884-0500 INFO AbstractStreamKarajanChannel$Multiplexer (0) Scheduling SC-null for addition
2011-08-03 10:05:50,885-0500 INFO AbstractStreamKarajanChannel nullChannel started
2011-08-03 10:05:50,885-0500 INFO AbstractStreamKarajanChannel$Multiplexer Multiplexer 1 started
2011-08-03 10:05:50,909-0500 INFO LocalTCPService Received registration: blockid = twork, url = communicado.ci.uchicago.edu
2011-08-03 10:05:50,919-0500 INFO AbstractKarajanChannel MetaChannel: 700804192[1615734796: {}] -> null: Disabling heartbeats (config is null)
2011-08-03 10:05:50,920-0500 INFO MetaChannel MetaChannel: 700804192[1615734796: {}] -> null.bind -> SC-null
2011-08-03 10:05:50,922-0500 DEBUG Cpu workerStarted: twork:communicado.ci.uchicago.edu:0
2011-08-03 10:05:50,922-0500 DEBUG Cpu twork:0 pullLater
2011-08-03 10:05:50,924-0500 INFO Block Started CPU 0:1312383950s
2011-08-03 10:05:50,924-0500 INFO Block Started worker twork:000000
2011-08-03 10:05:50,924-0500 INFO Cpu twork:0 pull
2011-08-03 10:05:50,926-0500 WARN BlockQueueProcessor Failed to send worker status update to client
java.lang.NullPointerException
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
	at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
	at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
	at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
	at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
	at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
	at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
2011-08-03 10:06:00,893-0500 INFO AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:00,971-0500 INFO PullThread runTime: 4, sleepTime: 10043
2011-08-03 10:06:10,904-0500 INFO AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:11,012-0500 INFO PullThread runTime: 1, sleepTime: 10040
2011-08-03 10:06:20,911-0500 INFO AbstractStreamKarajanChannel$Multiplexer Avg stream buf: 0
2011-08-03 10:06:21,050-0500 INFO PullThread runTime: 2, sleepTime: 10036
etc...

=== and this in the service's std out/err log:

Local contacts: [http://128.135.125.17:35852]
Started local service: http://128.135.125.17:35852
Started coaster service: http://128.135.125.17:41176
Started coaster service: http://128.135.125.17:41176
Multiplexer 0 started
(0) Scheduling SC-null for addition
nullChannel started
Multiplexer 1 started
Received registration: blockid = twork, url = communicado.ci.uchicago.edu
MetaChannel: 700804192[1615734796: {}] -> null: Disabling heartbeats (config is null)
MetaChannel: 700804192[1615734796: {}] -> null.bind -> SC-null
Started CPU 0:1312383950s
Started worker twork:000000
twork:0 pull
Failed to send worker status update to client
java.lang.NullPointerException
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
	at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72)
	at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143)
	at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64)
	at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416)
	at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157)
	at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375)
Avg stream buf: 0
runTime: 4, sleepTime: 10043
Avg stream buf: 0
runTime: 1, sleepTime: 10040
Avg stream buf: 0
runTime: 2, sleepTime: 10036
Sender 742510685 queue size: 1
Avg stream buf: 0
runTime: 1, sleepTime: 10042
etc...

=== and this in the worker log (log level DEBUG):

1312383950.848 INFO - twork Logging started: Wed Aug 3 10:05:50 2011
1312383950.848 INFO - Running on node communicado.ci.uchicago.edu
1312383950.848 DEBUG - uri=http://communicado.ci.uchicago.edu:35852
1312383950.848 DEBUG - scheme=http
1312383950.848 DEBUG - host=communicado.ci.uchicago.edu
1312383950.848 DEBUG - port=35852
1312383950.848 DEBUG - blockid=twork
1312383950.848 INFO - Connecting (0)...
1312383950.848 DEBUG - Trying communicado.ci.uchicago.edu:35852...
1312383950.862 INFO - Connected
1312383950.862 DEBUG - Replies: {}
1312383950.862 DEBUG - OUT: len=8, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=5, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=27, tag=0, flags=0
1312383950.863 DEBUG - OUT: len=16, tag=0, flags=2
1312383950.863 DEBUG - done sending frags for 0
1312383950.931 DEBUG - Fin flag set
1312383950.931 INFO 000000 Registration successful. ID=000000
1312383980.863 DEBUG 000000 Replies: {}
1312383980.863 DEBUG 000000 OUT: len=9, tag=1, flags=2
1312383980.864 DEBUG 000000 done sending frags for 1
1312383980.868 DEBUG 000000 Fin flag set
1312383980.868 DEBUG 000000 Heartbeat acknowledged
1312383986.739 DEBUG 000000 New request (1)
1312383986.739 DEBUG 000000 Fin flag set
1312383986.739 DEBUG 000000 Processing request
1312383986.739 DEBUG 000000 Cmd is HEARTBEAT
1312383986.739 DEBUG 000000 OUT: len=2, tag=1, flags=3
1312383986.739 DEBUG 000000 done sending frags for 1
etc...

----- Original Message -----
> From: "Michael Wilde"
> To: "Jonathan Monette"
> Cc: "Swift Devel"
> Sent: Wednesday, August 3, 2011 9:08:54 AM
> Subject: Re: [Swift-devel] trunk coasters
> Last night (on current trunk) I was getting this:
>
> 2011-08-02 23:24:49,863-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=cat-4q4lmvdk - Application exception: null
> Caused by:
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Could not submit job
> Caused by: org.globus.cog.karajan.workflow.service.ProtocolException:
> Failed to set configuration: For input string: ""
> [...]

From wilde at mcs.anl.gov Wed Aug 3 10:16:34 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 3 Aug 2011 10:16:34 -0500 (CDT)
Subject: [Swift-devel] trunk coasters
In-Reply-To: <570047635.184499.1312384361864.JavaMail.root@zimbra.anl.gov>
Message-ID: <746307103.184537.1312384594305.JavaMail.root@zimbra.anl.gov>

Correction: the simple local worker test below *is* failing in the same
manner as the test to OSG sites. A swift run against the service with a
single local worker returns the same error as I reported earlier in
this thread:

com$ swift -config cf.ps -tc.file tc -sites.file sites.grid-ps.xml catsn.swift -n=1
Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally)

RunID: 20110803-1013-2v63ui0g
Progress: time: Wed, 03 Aug 2011 10:13:49 -0500
Find: http://localhost:41176
Find: keepalive(120), reconnect - http://localhost:41176
Execution failed: Failed to set configuration: For input string: ""
com$

- Mike

----- Original Message -----
> From: "Michael Wilde"
> To: "Jonathan Monette"
> Cc: "Swift Devel"
> Sent: Wednesday, August 3, 2011 10:12:41 AM
> Subject: Re: [Swift-devel] trunk coasters
> I'm testing the persistent coaster setup that was failing as below, but
> instead of starting the workers on remote OSG sites, I'm starting a
> single worker locally on communicado, where the service is running.
>
> This seems to fail in a different manner than the test to OSG sites.
> > > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Jonathan Monette" > > Cc: "Swift Devel" > > Sent: Wednesday, August 3, 2011 9:08:54 AM > > Subject: Re: [Swift-devel] trunk coasters > > Last night (on current trunk) I was getting this: > > > > 2011-08-02 23:24:49,863-0500 DEBUG vdl:execute2 > > APPLICATION_EXCEPTION > > jobid=cat-4q4lmvdk - Application exception: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > Could not submit job > > Caused by: > > org.globus.cog.karajan.workflow.service.ProtocolException: > > Failed to set configuration: For input string: "" > > Caused by: org.globus.cog.karajan.workflow.service.RemoteException: > > Failed to set configuration: For input string: "" > > 2011-08-02 23:24:49,866-0500 INFO vdl:execute END_FAILURE > > thread=0-3-0-1 tr=cat > > 2011-08-02 23:24:49,868-0500 DEBUG VDL2ExecutionContext Exception in > > cat: > > Arguments: [data.txt] > > Host: localhost > > Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk > > - - - > > > > Exception in cat: > > Arguments: [data.txt] > > Host: localhost > > Directory: catsn-20110802-2324-ze1lfx8f/jobs/4/cat-4q4lmvdk > > - - - > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > Could not submit job > > Caused by: > > org.globus.cog.karajan.workflow.service.ProtocolException: > > Failed to set configuration: For input string: "" > > Caused by: org.globus.cog.karajan.workflow.service.RemoteException: > > Failed to set configuration: For input string: "" > > at > > org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29) > > > > --- But that was a rather new configuration, so I need to do more > > diagnosis. > > > > Why did you ask, Mihael - are you seeing problems too? > > > > I hope to work with Alberto this week to resume site config testing > > with a focus on coaster configs. 
> > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > From: "Jonathan Monette" > > > To: "Michael Wilde" > > > Cc: "Mihael Hategan" , "Swift Devel" > > > > > > Sent: Wednesday, August 3, 2011 8:54:48 AM > > > Subject: Re: [Swift-devel] trunk coasters > > > I have been using automatic coasters and submitting to PADS. I > > > haven't > > > tried any large scale runs recently though. > > > On Aug 3, 2011, at 8:53 AM, Michael Wilde wrote: > > > > > > > They were failing for me using persistent coasters to osg sites; > > > > will > > > > be testing further today and file bugs as needed. > > > > > > > > On 8/3/11, Mihael Hategan wrote: > > > >> Has anybody ran a job trunk coasters recently? > > > >> > > > >> Mihael > > > >> > > > >> _______________________________________________ > > > >> Swift-devel mailing list > > > >> Swift-devel at ci.uchicago.edu > > > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > >> > > > > > > > > -- > > > > Sent from my mobile device > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division 
Argonne National Laboratory From benc at hawaga.org.uk Wed Aug 3 10:20:46 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Aug 2011 17:20:46 +0200 Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> References: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> Message-ID: On Aug 3, 2011, at 4:23 PM, Michael Wilde wrote: > > To start a parallel thread here: how feasible is it (within the write-once variable model and scope-creation semantics that iterate uses) to provide the 3 C iteration statements with syntax and semantics as close to C as possible? the independent-iterations use cases are pretty well covered by foreach, I think. (where each iteration is independent of other iterations). the non-independent-iterations use cases are fairly poorly defined, and it's hard to throw syntax suggestions around without those. swift had 'while' in the past, 'iterate' which replaced it but which is fairly ugly, and the suggestion i made earlier in this thread which I like better than iterate but is still ugly. More suggestions would be interesting, even if just for scoping out what people want from such a construct. -- From hategan at mcs.anl.gov Wed Aug 3 12:05:54 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:05:54 -0700 Subject: [Swift-devel] extractint In-Reply-To: References: <1312349263.21792.0.camel@blabla> Message-ID: <1312391154.24112.0.camel@blabla> On Wed, 2011-08-03 at 13:39 +0200, Ben Clifford wrote: > On Aug 3, 2011, at 7:27 AM, Mihael Hategan wrote: > > > Why is extractint returning a float? > > because swift numerical types are poorly defined? The internal representation used to be. I changed that. But that still doesn't explain why extractint returns a Swift float.
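The iterate semantics settled on at the start of this thread (the body runs with the current v, v is then incremented, and only afterwards is until() tested, so the body always runs at least once) can be sketched as a small sequential Java model. This is a hypothetical illustration, not Swift's implementation; the class and method names are made up. Under this model until(v >= 10) traces 0 through 9, and until(i > 3) traces 0 through 3, which matches the trunk run of iterategt.swift reported later in this thread. It also shows why the tutorial's until(a[v+1] < 1) hangs: after the increment, a[v+1] names an element the body has not written yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sequential model of the iterate semantics discussed in this
 * thread: body with current v, increment v, then test until() on the new v.
 * Not Swift's actual implementation.
 */
class IterateModel {
    // Returns the values the body would trace() before the loop stops.
    static List<Integer> traced(IntPredicate until) {
        List<Integer> out = new ArrayList<>();
        int v = 0;
        while (true) {
            out.add(v);          // body: trace(v)
            v++;                 // v is incremented before the test
            if (until.test(v)) { // until(...) sees the incremented value
                return out;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(traced(v -> v >= 10)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(traced(i -> i > 3));   // [0, 1, 2, 3]
    }
}
```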
From wilde at mcs.anl.gov Wed Aug 3 12:07:37 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 12:07:37 -0500 (CDT) Subject: [Swift-devel] extractint In-Reply-To: <1312391154.24112.0.camel@blabla> Message-ID: <1007137.185235.1312391257382.JavaMail.root@zimbra.anl.gov> Can you double check? In looking at the iterate issues this morning I noticed that trace() is printing ints as if they are floats. Is that perhaps what you are seeing? - Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Ben Clifford" > Cc: "Swift Devel" > Sent: Wednesday, August 3, 2011 12:05:54 PM > Subject: Re: [Swift-devel] extractint > On Wed, 2011-08-03 at 13:39 +0200, Ben Clifford wrote: > > On Aug 3, 2011, at 7:27 AM, Mihael Hategan wrote: > > > > > Why is extractint returning a float? > > > > because swift numerical types are poorly defined? > > The internal representation used to be. I changed that. > > But that still doesn't explain why extractint returns a Swift float. > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Wed Aug 3 12:08:10 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:08:10 -0700 Subject: [Swift-devel] trunk coasters In-Reply-To: <1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov> References: <1723865632.184066.1312380534696.JavaMail.root@zimbra.anl.gov> Message-ID: <1312391290.24112.2.camel@blabla> On Wed, 2011-08-03 at 09:08 -0500, Michael Wilde wrote: > Why did you ask, Mihael - are you seeing problems too? I asked because when I tried to use my local copy I was getting problems with the service not being able to connect back to the client due to channels not being found and other weirdness. 
I fixed it in my local copy, but if it actually works in a clean trunk checkout, I'd rather not fix that. From hategan at mcs.anl.gov Wed Aug 3 12:11:54 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:11:54 -0700 Subject: [Swift-devel] trunk coasters In-Reply-To: <570047635.184499.1312384361864.JavaMail.root@zimbra.anl.gov> References: <570047635.184499.1312384361864.JavaMail.root@zimbra.anl.gov> Message-ID: <1312391514.24112.5.camel@blabla> On Wed, 2011-08-03 at 10:12 -0500, Michael Wilde wrote: > Im testing the persistent coaster setup that was failing as below, but > instead of starting the workers on remote OSG sites Im starting a > single worker locally on communicado, where the service is running. > > This seems to fail in a different manner than the test to OSG sites. > > I see this in the service log (swift.log file): > [...] > 2011-08-03 10:05:50,926-0500 WARN BlockQueueProcessor Failed to send worker status update to client > java.lang.NullPointerException > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:434) Great. That's what I was looking for. It seems it needs fixing. From hategan at mcs.anl.gov Wed Aug 3 12:13:49 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:13:49 -0700 Subject: [Swift-devel] extractint In-Reply-To: <1007137.185235.1312391257382.JavaMail.root@zimbra.anl.gov> References: <1007137.185235.1312391257382.JavaMail.root@zimbra.anl.gov> Message-ID: <1312391629.24112.6.camel@blabla> On Wed, 2011-08-03 at 12:07 -0500, Michael Wilde wrote: > Can you double check? In looking at the iterate issues this morning I > noticed that trace() is printing ints as if they are floats. Is that > perhaps what you are seeing? 
The code is pretty unambiguous: DSHandle result = new RootDataNode(Types.FLOAT, i); From hategan at mcs.anl.gov Wed Aug 3 12:19:28 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:19:28 -0700 Subject: [Swift-devel] iterate behaviour round II In-Reply-To: References: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> Message-ID: <1312391968.24112.11.camel@blabla> On Wed, 2011-08-03 at 17:20 +0200, Ben Clifford wrote: > On Aug 3, 2011, at 4:23 PM, Michael Wilde wrote: > > > > > To start a parallel thread here: how feasible is it (within the > write-once variable model and scope-creation semantics that iterate > uses) to provide the 3 C iteration statements with syntax and > semantics as close to C as possible? > > the independent-iterations use cases are pretty well covered by > foreach, I think. (where each iteration is independent of other > iterations) The dependent iterations are also covered by foreach, and you can't deadlock as easily as you can in the iterate case: int a[]; a[0] = 10; foreach v, k in a { if (v > 1) { a[k + 1] = v - 1; } } Though now you could equally do: int a[auto]; a << 10; foreach v in a { if (v > 1) { a << v - 1; } } From hategan at mcs.anl.gov Wed Aug 3 12:24:12 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:24:12 -0700 Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> References: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> Message-ID: <1312392252.24112.16.camel@blabla> On Wed, 2011-08-03 at 09:23 -0500, Michael Wilde wrote: > I propose that we do everything possible to ensure that the semantics > of iterate does not change from 0.92.1, to avoid breaking code. NCAR > and the DOE ParVis project, in particular, has a very large Swift > script that they are testing for production use, and we really dont > want that to break. 
> > e should not allow 0.93 to break current user code -- if at all possible. On one hand, I agree with you. On the other hand, I do not think that backwards compatibility, in the long run, is a good justification for keeping something that is really poorly done. From wilde at mcs.anl.gov Wed Aug 3 12:33:41 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 12:33:41 -0500 (CDT) Subject: [Swift-devel] Meeting 4PM to talk about release In-Reply-To: <1328230364.185173.1312390766463.JavaMail.root@zimbra.anl.gov> Message-ID: <1692079976.185348.1312392821704.JavaMail.root@zimbra.anl.gov> Let's meet at 4PM Central to discuss 0.93. Dial in: (218) 862-6420 access code 815549 Justin is on vacation. Mihael, Alberto, David, Ketan, Jon - can you join? Topics: - testing 0.93 branch(es) vs trunk - 0.93 was cut Jul 5 - do we want to keep that, or get trunk in shape and make another branch? Can we do "0.NNrcX" branches? How much has been committed to 0.93 so far? - identifying blocker bugs for 0.93 - especially for Mihael to focus on -- resolution of the iterate statement for 0.93 - do bugs remain? Did it change from 0.92.1? Are any issues in this statement related to semantic definitions, or due to subtleties in synchronization? I have a test I'd like to hand over: works as expected in 0.92, fails in trunk. -- jobsPerNode not working in trunk (due to Justin's attr mods?) Ketan taking this. -- PBS trunk issues related to attributes? (similar to prior? getting cray attr by mistake) -- fix SGE provider for limited set of machines? (Ranger, ibicluster, ?) - coordination of site testing and approach for that - docs and web cleanup; esp site config guide and gensites, and tutorial route - set target date for 0.93 - is Aug 30 possible?
Thanks, - Mike From benc at hawaga.org.uk Wed Aug 3 12:50:59 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 3 Aug 2011 17:50:59 +0000 (GMT) Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1312391968.24112.11.camel@blabla> References: <1654887811.184149.1312381409266.JavaMail.root@zimbra.anl.gov> <1312391968.24112.11.camel@blabla> Message-ID: > The dependent iterations are also covered by foreach, and you can't > deadlock as easily as you can in the iterate case: I think I disliked that approach before because my thinking was that Swift should require a to be fully defined outside of the foreach. But there's no really strong reason for requiring that (after all, it was the whole point of iterate...) and I think it looks nice with the << syntax. > Though now you could equally do: > int a[auto]; a << 10; > foreach v in a { > if (v > 1) { a << v - 1; } > } -- From wilde at mcs.anl.gov Wed Aug 3 12:53:37 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 12:53:37 -0500 (CDT) Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1312392252.24112.16.camel@blabla> Message-ID: <1050281184.185428.1312394017994.JavaMail.root@zimbra.anl.gov> Right. So let's keep iterate semantically unchanged for now, then deprecate it when we have an approach that's clearly better. For 0.93 let's focus on making it work as currently described. Let's consider deprecating it in some not-too-distant release and if possible remove it much later with sufficient notice to the user community. - Mike (Also note that where iterate is first mentioned in the user guide it has no until() clause.)
----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Wednesday, August 3, 2011 12:24:12 PM > Subject: Re: [Swift-devel] iterate behaviour round II > On Wed, 2011-08-03 at 09:23 -0500, Michael Wilde wrote: > > I propose that we do everything possible to ensure that the > > semantics > > of iterate does not change from 0.92.1, to avoid breaking code. NCAR > > and the DOE ParVis project, in particular, has a very large Swift > > script that they are testing for production use, and we really dont > > want that to break. > > > > e should not allow 0.93 to break current user code -- if at all > > possible. > > On one hand, I agree with you. > > On the other hand, I do not think that backwards compatibility, in the > long run, is a good justification for keeping something that is really > poorly done. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Wed Aug 3 12:56:25 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:56:25 -0700 Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1050281184.185428.1312394017994.JavaMail.root@zimbra.anl.gov> References: <1050281184.185428.1312394017994.JavaMail.root@zimbra.anl.gov> Message-ID: <1312394185.24839.0.camel@blabla> On Wed, 2011-08-03 at 12:53 -0500, Michael Wilde wrote: > Right. So lets keep iterate semantically unchanged for now, then > deprecate it when we have an approach thats clearly better. For 0.93 > lets focus on making it work as currently described. I agree. I will revert the change I did before. 
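The foreach-over-a-growing-array pattern Mihael and Ben discuss above (a << 10; each iteration with v > 1 appends v - 1) can be modeled in Java as a worklist: every appended element schedules one more iteration, and the loop ends when an iteration appends nothing. This is a hypothetical, single-threaded sketch with made-up names. Swift would run ready iterations concurrently, but here each element is derived from exactly one earlier element, so there is a single chain of iterations and the visiting order does not change which values end up in the array.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sequential model of:
 *   int a[auto]; a << seed;
 *   foreach v in a { if (v > 1) { a << v - 1; } }
 * Not Swift's runtime; iterations are visited one at a time.
 */
class ForeachGrowModel {
    static List<Integer> run(int seed) {
        List<Integer> a = new ArrayList<>();          // the array being built
        Deque<Integer> pending = new ArrayDeque<>();  // elements not yet visited
        a.add(seed);                                  // a << seed
        pending.add(seed);
        while (!pending.isEmpty()) {
            int v = pending.poll();                   // one foreach iteration
            if (v > 1) {
                int next = v - 1;
                a.add(next);                          // a << v - 1
                pending.add(next);                    // new element => new iteration
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    }
}
```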
From hategan at mcs.anl.gov Wed Aug 3 12:56:35 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 10:56:35 -0700 Subject: [Swift-devel] Meeting 4PM to talk about release In-Reply-To: <1692079976.185348.1312392821704.JavaMail.root@zimbra.anl.gov> References: <1692079976.185348.1312392821704.JavaMail.root@zimbra.anl.gov> Message-ID: <1312394195.24839.1.camel@blabla> On Wed, 2011-08-03 at 12:33 -0500, Michael Wilde wrote: > Lets meet at 4PM Central to discuss 0.93. Dial in: (218) 862-6420 access code 815549 > > Justin is on vacation. Mihael, Alberto, David, Ketan, Jon - can you join? I'm in. From wilde at mcs.anl.gov Wed Aug 3 13:22:43 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 13:22:43 -0500 (CDT) Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1312394185.24839.0.camel@blabla> Message-ID: <400213490.185542.1312395763634.JavaMail.root@zimbra.anl.gov> Cool. Related to current semantics, I'm seeing a case where iterate seems to not terminate correctly with an == test but does with a > test. Is there some float funkiness going on in there too?
">" termination condition works OK: com$ cat iterategt.swift iterate i { trace(i); } until(i > 3); com$ swift iterategt.swift | head -15 no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) RunID: 20110803-1315-qn3cr6s8 Progress: time: Wed, 03 Aug 2011 13:15:38 -0500 SwiftScript trace: 0 SwiftScript trace: 1 SwiftScript trace: 2 SwiftScript trace: 3 Final status: time: Wed, 03 Aug 2011 13:15:38 -0500 com$ "==" termination condition never terminates: com$ com$ cat iterateeq.swift iterate i { trace(i); } until(i == 3); com$ com$ swift iterateeq.swift | head -15 no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) RunID: 20110803-1316-f9qhsxig SwiftScript trace: 0 Progress: time: Wed, 03 Aug 2011 13:16:06 -0500 SwiftScript trace: 1 SwiftScript trace: 2 SwiftScript trace: 3 SwiftScript trace: 4 SwiftScript trace: 5 SwiftScript trace: 6 SwiftScript trace: 7 SwiftScript trace: 8 SwiftScript trace: 9 SwiftScript trace: 10 ^C com$ swift -version no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) com$ which swift /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/bin/swift com$ ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Wednesday, August 3, 2011 12:56:25 PM > Subject: Re: [Swift-devel] iterate behaviour round II > On Wed, 2011-08-03 at 12:53 -0500, Michael Wilde wrote: > > Right. So lets keep iterate semantically unchanged for now, then > > deprecate it when we have an approach thats clearly better. 
For 0.93 > > lets focus on making it work as currently described. > > I agree. I will revert the change I did before. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Wed Aug 3 13:41:27 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 13:41:27 -0500 (CDT) Subject: [Swift-devel] Meeting 4PM to talk about release In-Reply-To: Message-ID: <1199301879.185637.1312396887107.JavaMail.root@zimbra.anl.gov> This page describes what was done for site testing on prior releases: https://sites.google.com/site/swiftdevel/site-specific-testing Alberto, you should update that page with a plan for site testing for 0.93. I'll help you with that this week and you can continue with Justin when he returns next week. - Mike ----- Original Message ----- From: "Alberto Chavez" To: "Mike Wilde" Sent: Wednesday, August 3, 2011 12:54:48 PM Subject: RE: [Swift-devel] Meeting 4PM to talk about release 4PM sounds good > Date: Wed, 3 Aug 2011 12:33:41 -0500 > From: wilde at mcs.anl.gov > To: swift-devel at ci.uchicago.edu > Subject: [Swift-devel] Meeting 4PM to talk about release > > Lets meet at 4PM Central to discuss 0.93. Dial in: (218) 862-6420 access code 815549 > > Justin is on vacation. Mihael, Alberto, David, Ketan, Jon - can you join? > > Topics: > > - testing 0.93 branch(es) vs trunk > - 0.93 was cut Jul 5 - do we want to keep that, or get trunk in shape > and make another branch? Can we do "0.NNrcX" branches? > How much has been committed to 0.93 so far? > > - identifying blocker bugs for 0.93 - especially for Mihael to focus on > > -- resolution of the iterate statement for 0.93 - do bugs remain? > did it change from 0.92.1? Are any issues in this statement > related to semantic definitions, or due to subtleties in synchronization? > I have a test I'd like to hand over: works as expected in 0.92, fails in trunk. 
> > -- jobsPerNode not working in trunk (due to Justin's attr mods?) Ketan taking this. > > -- PBS trunk issues related to attributes? (similar to prior? getting cray attr by mistake) > > -- fix SGE provider for limited set of machines? (Ranger, ibicluster, ?) > > - coordination of site testing and approach for that > > - docs and web cleanup; esp site config guide and gensites, and tutorial route > > - set target date for 0.93 - is Aug 30 possible? > > Thanks, > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Wed Aug 3 15:36:55 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 03 Aug 2011 13:36:55 -0700 Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <400213490.185542.1312395763634.JavaMail.root@zimbra.anl.gov> References: <400213490.185542.1312395763634.JavaMail.root@zimbra.anl.gov> Message-ID: <1312403815.26065.0.camel@blabla> Can you do an svn up and recheck? On Wed, 2011-08-03 at 13:22 -0500, Michael Wilde wrote: > Cool. Related to current semantics, Im seeing a case where iterate seems to not terminate correctly with an == test but does with a > test. Is there some float funkiness going on in there too?
> > ">" termination condition works OK: > > com$ cat iterategt.swift > iterate i > { > trace(i); > } until(i > 3); > com$ swift iterategt.swift | head -15 > no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) > > RunID: 20110803-1315-qn3cr6s8 > Progress: time: Wed, 03 Aug 2011 13:15:38 -0500 > SwiftScript trace: 0 > SwiftScript trace: 1 > SwiftScript trace: 2 > SwiftScript trace: 3 > Final status: time: Wed, 03 Aug 2011 13:15:38 -0500 > com$ > > "==" termination condition never terminates: > > com$ > com$ cat iterateeq.swift > iterate i > { > trace(i); > } until(i == 3); > com$ > > com$ swift iterateeq.swift | head -15 > no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) > > RunID: 20110803-1316-f9qhsxig > SwiftScript trace: 0 > Progress: time: Wed, 03 Aug 2011 13:16:06 -0500 > SwiftScript trace: 1 > SwiftScript trace: 2 > SwiftScript trace: 3 > SwiftScript trace: 4 > SwiftScript trace: 5 > SwiftScript trace: 6 > SwiftScript trace: 7 > SwiftScript trace: 8 > SwiftScript trace: 9 > SwiftScript trace: 10 > > ^C > com$ swift -version > no sites file specified, setting to default: /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog modified locally) > > com$ which swift > /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/bin/swift > com$ > > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Cc: "Swift Devel" > > Sent: Wednesday, August 3, 2011 12:56:25 PM > > Subject: Re: [Swift-devel] iterate behaviour round II > > On Wed, 2011-08-03 at 12:53 -0500, Michael Wilde wrote: > > > Right. 
So lets keep iterate semantically unchanged for now, then > > > deprecate it when we have an approach thats clearly better. For 0.93 > > > lets focus on making it work as currently described. > > > > I agree. I will revert the change I did before. > From wilde at mcs.anl.gov Wed Aug 3 15:42:41 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 3 Aug 2011 15:42:41 -0500 (CDT) Subject: [Swift-devel] iterate behaviour round II In-Reply-To: <1312403815.26065.0.camel@blabla> Message-ID: <929176831.186312.1312404161065.JavaMail.root@zimbra.anl.gov> Great, that fixed the failing eq case. - Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Wednesday, August 3, 2011 3:36:55 PM > Subject: Re: [Swift-devel] iterate behaviour round II > Can you do an svn up an recheck? > > On Wed, 2011-08-03 at 13:22 -0500, Michael Wilde wrote: > > Cool. Related to current semantics, Im seeing a case where iterate > > seems to not terminate correctly with an == test but does with a > > > test. Is there some float funkiness going on in there too? 
> > > > ">" termination condition works OK: > > > > com$ cat iterategt.swift > > iterate i > > { > > trace(i); > > } until(i > 3); > > com$ swift iterategt.swift | head -15 > > no sites file specified, setting to default: > > /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog > > modified locally) > > > > RunID: 20110803-1315-qn3cr6s8 > > Progress: time: Wed, 03 Aug 2011 13:15:38 -0500 > > SwiftScript trace: 0 > > SwiftScript trace: 1 > > SwiftScript trace: 2 > > SwiftScript trace: 3 > > Final status: time: Wed, 03 Aug 2011 13:15:38 -0500 > > com$ > > > > "==" termination condition never terminates: > > > > com$ > > com$ cat iterateeq.swift > > iterate i > > { > > trace(i); > > } until(i == 3); > > com$ > > > > com$ swift iterateeq.swift | head -15 > > no sites file specified, setting to default: > > /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog > > modified locally) > > > > RunID: 20110803-1316-f9qhsxig > > SwiftScript trace: 0 > > Progress: time: Wed, 03 Aug 2011 13:16:06 -0500 > > SwiftScript trace: 1 > > SwiftScript trace: 2 > > SwiftScript trace: 3 > > SwiftScript trace: 4 > > SwiftScript trace: 5 > > SwiftScript trace: 6 > > SwiftScript trace: 7 > > SwiftScript trace: 8 > > SwiftScript trace: 9 > > SwiftScript trace: 10 > > > > ^C > > com$ swift -version > > no sites file specified, setting to default: > > /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/etc/sites.xml > > Swift svn swift-r4934 (swift modified locally) cog-r3184 (cog > > modified locally) > > > > com$ which swift > > /scratch/local/wilde/swift/src/devtrunk/cog/modules/swift/dist/swift-svn/bin/swift > > com$ > > > > > > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > > To: "Michael Wilde" > > > Cc: "Swift Devel" > > > Sent: Wednesday, August 3, 
2011 12:56:25 PM > > > Subject: Re: [Swift-devel] iterate behaviour round II > > > On Wed, 2011-08-03 at 12:53 -0500, Michael Wilde wrote: > > > > Right. So lets keep iterate semantically unchanged for now, then > > > > deprecate it when we have an approach thats clearly better. For > > > > 0.93 > > > > lets focus on making it work as currently described. > > > > > > I agree. I will revert the change I did before. > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From davidk at ci.uchicago.edu Wed Aug 3 20:57:07 2011 From: davidk at ci.uchicago.edu (David Kelly) Date: Wed, 3 Aug 2011 20:57:07 -0500 (CDT) Subject: [Swift-devel] 0.93 site testing In-Reply-To: <1009000252.54108.1312422324637.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: <519491437.54124.1312423027539.JavaMail.root@zimbra-mb2.anl.gov> Hello, I updated the swift devel website tonight with plans for 0.93 site testing. I am starting with the same site tests that we performed with the last release. The page is at: https://sites.google.com/site/swiftdevel/site-specific-testing Feel free to edit that page as the tests get run. The tests are located in swift/tests/providers, but they likely need to be tweaked. I'm pretty sure the PADS template is incorrect, and others may be as well, so it is probably worthwhile to double check everything. David From wilde at mcs.anl.gov Thu Aug 4 04:44:29 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 4 Aug 2011 04:44:29 -0500 (CDT) Subject: [Swift-devel] 0.93 site testing In-Reply-To: <519491437.54124.1312423027539.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: <488459950.187367.1312451069961.JavaMail.root@zimbra.anl.gov> David, Alberto, The test list looks good. We can maybe shuffle the names assigned: Alberto on PADS and Fusion; Justin can maybe help on the BG/P's and Franklin; we can add the Cray crow test system.
It might be good to add tests of the plain ssh provider, and then the ssh:local coaster configuration. (Jon is using this, e.g., to run on PADS and Beagle from Globus Online.) (Aside: I see very excessive logging from the ssh provider - let's investigate; I'll file a ticket.) I noticed that in several past incidents our site tests were fooled into thinking they passed, when in fact the actual application invocations took place in an environment different from the one intended. Some cases that come to mind:
- thinking that we were running on a cluster via coasters when in fact the apps ran on localhost. This was the incorrect PADS sites entry you mention below.
- thinking that we were running on Cray compute nodes when in fact the apps ran on the Cray PBS service node (on Beagle, again this was a login node)
- asking to run N apps per compute node (1 per core) when in fact we ran 1 app per node
- asking to run N apps per compute node when in fact we ran N^2 apps per node
In this next round of testing, can we enhance the tests (or add new ones) so that:
1) part of the app execution records the node(s) it executes on and ensures that we are running on a compute node (we can do this in a site-independent fashion by adding a "compute node hostname pattern" to the siteTester script: https://trac.ci.uchicago.edu/swift/browser/trunk/tests/sitetester?desc=1 and passing the name pattern to the test);
2) the expected number of apps are running on the compute node (sleep; do ps; count the number of app shells running, and ensure that there are >1 and <= N).
- Mike ----- Original Message ----- > From: "David Kelly" > To: swift-devel at ci.uchicago.edu > Sent: Wednesday, August 3, 2011 8:57:07 PM > Subject: [Swift-devel] 0.93 site testing > Hello, > > I updated the swift devel website tonight with plans for 0.93 site > testing. I am starting with the same site tests that we performed with > the last release.
The page is at: > > https://sites.google.com/site/swiftdevel/site-specific-testing > > Feel free to edit that page as the tests get run. The tests are > located in swift/tests/providers, but they likely need tweaked. I'm > pretty sure the PADS template is incorrect, others may be as well so > it is probably worthwhile to double check everything. > > David > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Aug 4 04:52:41 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 4 Aug 2011 04:52:41 -0500 (CDT) Subject: [Swift-devel] 0.93 site testing In-Reply-To: <488459950.187367.1312451069961.JavaMail.root@zimbra.anl.gov> Message-ID: <1043952745.187374.1312451561434.JavaMail.root@zimbra.anl.gov> A question about 0.93 testing in general: We agreed yesterday to stay with the current 0.93 branch. Should we be testing 0.93 with the tests in 0.93 itself, with the tests in trunk, or with both? I think if we do this rigorously, the answer is that we test 0.93 with the 0.93 test suite, but that we integrate relevant test corrections and enhancements in both directions. The same of course applies to code, but in the case of the tests we may find that more cross-branch integration is needed, especially if we are going to do a lot of improvement on site tests in the next 2 weeks. I'm wondering whether we can afford the two-way integration process, or whether we should take the shortcut of testing 0.93 with trunk tests. I'm in favor of the two-way approach, but we should keep an eye on the process and reconsider if it becomes too costly.
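The two site-test checks proposed in the previous message (a compute-node hostname pattern supplied by the site tester, and a bound on how many apps run concurrently on each node) could be applied to whatever each test app reports about itself. What follows is a minimal Java sketch under stated assumptions, not the existing sitetester code: it assumes each invocation reports its hostname and the number of app processes it counted on that host, and the class `SiteCheck`, the `Observation` type, and the pattern `node\d+` are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical post-run checks for a site test (not part of the existing
// sitetester script). Each Observation is what one app invocation reported:
// the host it ran on and how many app processes it counted there (e.g. via ps).
class SiteCheck {

    static final class Observation {
        final String hostname;
        final int appsSeenOnNode;

        Observation(String hostname, int appsSeenOnNode) {
            this.hostname = hostname;
            this.appsSeenOnNode = appsSeenOnNode;
        }
    }

    // Check 1: every invocation must have run on a host matching the site's
    // compute-node pattern (catches apps silently running on localhost or on
    // a login/service node).
    static void checkComputeNodes(List<Observation> obs, Pattern computeNode) {
        for (Observation o : obs) {
            if (!computeNode.matcher(o.hostname).matches()) {
                throw new IllegalStateException("app ran on non-compute host " + o.hostname);
            }
        }
    }

    // Check 2: no node may exceed the requested apps-per-node limit (catches the
    // N^2 case), and when N > 1 was requested at least one node must have run
    // more than one app at once (catches the 1-app-per-node case).
    static void checkConcurrency(List<Observation> obs, int appsPerNode) {
        Map<String, Integer> maxPerNode = new HashMap<>();
        for (Observation o : obs) {
            maxPerNode.merge(o.hostname, o.appsSeenOnNode, Integer::max);
        }
        boolean sawParallelism = false;
        for (Map.Entry<String, Integer> e : maxPerNode.entrySet()) {
            int peak = e.getValue();
            if (peak > appsPerNode) {
                throw new IllegalStateException(
                        e.getKey() + " ran " + peak + " apps at once; limit is " + appsPerNode);
            }
            if (peak > 1) {
                sawParallelism = true;
            }
        }
        if (appsPerNode > 1 && !sawParallelism) {
            throw new IllegalStateException("no node ran more than one app at a time");
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two 8-core nodes, 8 apps per node requested.
        List<Observation> run = List.of(
                new Observation("node016", 8),
                new Observation("node016", 7),
                new Observation("node017", 8));
        checkComputeNodes(run, Pattern.compile("node\\d+"));
        checkConcurrency(run, 8);
        System.out.println("site checks passed");
    }
}
```

Taking the peak per node, rather than requiring every sample to exceed 1, is meant to avoid false failures near the end of a run, when fewer tasks remain than there are cores.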
- Mike ----- Original Message ----- > From: "Michael Wilde" > To: "David Kelly" , "Alberto Chavez" > Cc: swift-devel at ci.uchicago.edu > Sent: Thursday, August 4, 2011 4:44:29 AM > Subject: Re: [Swift-devel] 0.93 site testing > David, Alberto, > > The test list looks good. We can maybe shuffle the names assigned: > Alberto on PADS and Fusion; Justin can maybe help on the BG/P's and > Frankin; can add the Cray crow test system. > > It might be good to add tests of the plain ssh provider, and then the > ssh:local coaster configuration. (Jon is using this eg to run on PADS > and Beagle from Globus Online).(Aside: I see very excessive logging > from the ssh provider - lets investigate, I'll file a ticket) > > I noticed that in several past incidents our site tests were fooled > into thinking they passed, when in fact the actual application > invocations took place in an environment different than intended. Some > cases that come to mind: > > - thinking that we were running on a cluster via coasters when in fact > the apps ran on localhost. This was the incorrect PADS sites entry you > mention below. > > - thinking that we were running on Cray compute nodes when in fact the > apps ran on the Cray PBS service node (on Beagle, again this was a > login node) > > - asking to run N apps per compute node (1 per core) when in fact we > ran 1 app per node > > - asking to run N apps per compute node when in fact we ran N^2 apps > per node > > In this next round of testing, can we enhance the tests (or add new > ones) so that: > > 1) part of the app execution records the node(s) it executes on and > ensures that we are running on a compute node (We can do this in a > site-independent fashion by adding a "compute node hostname pattern" > to the siteTester script: > https://trac.ci.uchicago.edu/swift/browser/trunk/tests/sitetester?desc=1 > and passing the name pattern to the test. 
> > 2) the expected number of apps are running on the compute node (sleep; > do ps; count the number of app shells running, and ensure that there > are >1 and <= N) > > - Mike > > ----- Original Message ----- > > From: "David Kelly" > > To: swift-devel at ci.uchicago.edu > > Sent: Wednesday, August 3, 2011 8:57:07 PM > > Subject: [Swift-devel] 0.93 site testing > > Hello, > > > > I updated the swift devel website tonight with plans for 0.93 site > > testing. I am starting with the same site tests that we performed > > with > > the last release. The page is at: > > > > https://sites.google.com/site/swiftdevel/site-specific-testing > > > > Feel free to edit that page as the tests get run. The tests are > > located in swift/tests/providers, but they likely need tweaked. I'm > > pretty sure the PADS template is incorrect, others may be as well so > > it is probably worthwhile to double check everything. > > > > David > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Thu Aug 4 14:40:32 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Aug 2011 12:40:32 -0700 Subject: [Swift-devel] 0.93 site testing In-Reply-To: <1043952745.187374.1312451561434.JavaMail.root@zimbra.anl.gov> References: <1043952745.187374.1312451561434.JavaMail.root@zimbra.anl.gov> Message-ID: <1312486832.30524.0.camel@blabla> On Thu, 2011-08-04 at 04:52 -0500, Michael Wilde 
wrote: > A question about 0.93 testing in general: We agreed yesterday to stay with the current 0.93 branch. Should we be testing 0.93 with the tests in 0.93 itself, or the tests in trunk, or both? > > I think if we do this rigorously, the answer is that we test 0.93 with the 0.93 test suite, but that we integrate relevant test corrections and enhancements in both directions. The same of course applies to code, but in the case of the tests we may find that more cross-branch integration is needed, especially if we are going to do a lot of improvement on site tests in the next 2 weeks. > > I'm wondering if we can afford the 2-way integration process, or should > take the shortcut of testing 0.93 with trunk tests? I'm in favor of > the two-way approach, but to keep an eye on the process and reconsider > if it becomes too costly. There were fixes to the tests that I committed to trunk. I think most of them should be backported to the branch by virtue of their being fixes. From hategan at mcs.anl.gov Thu Aug 4 18:09:56 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Aug 2011 16:09:56 -0700 Subject: [Swift-devel] trunk coasters In-Reply-To: <2098979040.190576.1312489752827.JavaMail.root@zimbra.anl.gov> References: <2098979040.190576.1312489752827.JavaMail.root@zimbra.anl.gov> Message-ID: <1312499396.2896.2.camel@blabla> On Thu, 2011-08-04 at 15:29 -0500, Michael Wilde wrote: > So the other error - the failing service - is not happening on local tests on 0.93; next I'll try the remote cases. Ok. I committed a number of things to trunk, one of which is a fix for the messed-up channel lookup problem. I used it previously for auto-deployed services on ranger and pads, but haven't tried it with the stand-alone service. So please test that if you can and let me know. I'll now move to dealing with 0.93 issues.
From wilde at mcs.anl.gov Thu Aug 4 23:22:31 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 4 Aug 2011 23:22:31 -0500 (CDT) Subject: [Swift-devel] trunk coasters In-Reply-To: <1312499396.2896.2.camel@blabla> Message-ID: <404855649.191251.1312518151043.JavaMail.root@zimbra.anl.gov> Im still getting the errors below (which I think are what I reported prior to this fix). I'll double check that I got the latest fix in, but I think I do. - Mike 2011-08-04 23:17:47,757-0500 DEBUG Cpu workerStarted: swork:node016:0 2011-08-04 23:17:47,757-0500 DEBUG Cpu swork:0 pullLater 2011-08-04 23:17:47,758-0500 INFO Block Started CPU 0:1312517867s 2011-08-04 23:17:47,758-0500 INFO Block Started worker swork:000000 2011-08-04 23:17:47,758-0500 INFO Cpu swork:0 pull 2011-08-04 23:17:47,761-0500 WARN BlockQueueProcessor Failed to send worker status update to client java.lang.NullPointerException at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) at 
org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) 2011-08-04 23:17:47,764-0500 INFO LocalTCPService Received registration: blockid = swork, url = node016 2011-08-04 23:17:47,765-0500 INFO AbstractKarajanChannel MetaChannel: 467772424[15735326: {}] -> null: Disabling heartbeats (config is null) 2011-08-04 23:17:47,765-0500 INFO MetaChannel MetaChannel: 467772424[15735326: {}] -> null.bind -> SC-null 2011-08-04 23:17:47,765-0500 DEBUG Cpu workerStarted: swork:node016:1 2011-08-04 23:17:47,765-0500 DEBUG Cpu swork:1 pullLater 2011-08-04 23:17:47,765-0500 INFO Block Started CPU 1:1312517867s 2011-08-04 23:17:47,765-0500 INFO Cpu swork:1 pull 2011-08-04 23:17:47,765-0500 INFO Block Started worker swork:000001 2011-08-04 23:17:47,766-0500 WARN BlockQueueProcessor Failed to send worker status update to client java.lang.NullPointerException at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) at 
org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) 2011-08-04 23:17:48,568-0500 INFO TCPBufferManager Adjusting buffer size to 524288 ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Justin M Wozniak" , "Swift Devel" > Sent: Thursday, August 4, 2011 6:09:56 PM > Subject: Re: [Swift-devel] trunk coasters > On Thu, 2011-08-04 at 15:29 -0500, Michael Wilde wrote: > > > So the other error - the failing service - is not happing on local > > tests on 0.93; next I'll try the remote cases. > > Ok. I committed a number of things to trunk, one of which is a fix for > the messed up channel lookup problem. > > I used it previously for auto-deployed services on ranger and pads, > but > haven't tried it with the stand-alone service. So please test that if > you can and let me know. > > I'll now move to dealing with 0.93 issues. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Aug 5 01:46:51 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 04 Aug 2011 23:46:51 -0700 Subject: [Swift-devel] testing and bugs Message-ID: <1312526811.16208.2.camel@blabla> I was thinking it would be useful, if - when reporting a bug that, for a release, is meant to be fixed - you have access to SVN, to commit a test for it along with the bug report. And that sentence could have been simpler. From wilde at mcs.anl.gov Fri Aug 5 09:25:46 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 5 Aug 2011 09:25:46 -0500 (CDT) Subject: [Swift-devel] Try Ranger with nodeGranularity=16 Message-ID: <1943578453.191714.1312554346440.JavaMail.root@zimbra.anl.gov> I learned from Mihael yesterday that the SGE provider should in fact work on Ranger in 0.93 and trunk if you set nodeGranularity to 16. 
This is a confusion between nodes and cores that should get fixed in 0.94 unless testing indicates that the above setting doesn't work in 0.93. - Mike From jonmon at mcs.anl.gov Fri Aug 5 09:42:03 2011 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Fri, 5 Aug 2011 09:42:03 -0500 Subject: [Swift-devel] Try Ranger with nodeGranularity=16 In-Reply-To: <1943578453.191714.1312554346440.JavaMail.root@zimbra.anl.gov> References: <1943578453.191714.1312554346440.JavaMail.root@zimbra.anl.gov> Message-ID: <04BE1E8B-F006-4D30-80DA-6AF031346C91@mcs.anl.gov> On Aug 5, 2011, at 9:25 AM, Michael Wilde wrote: > I learned from Mihael yesterday that the SGE provider should in fact work on Ranger in 0.93 and trunk if you set nodeGranularity to 16. > > This is a confusion between nodes and cores that should get fixed in 0.94 unless testing indicates that the above setting doesn't work in 0.93. > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From jonmon at mcs.anl.gov Fri Aug 5 09:42:13 2011 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Fri, 5 Aug 2011 09:42:13 -0500 Subject: [Swift-devel] Try Ranger with nodeGranularity=16 In-Reply-To: <1943578453.191714.1312554346440.JavaMail.root@zimbra.anl.gov> References: <1943578453.191714.1312554346440.JavaMail.root@zimbra.anl.gov> Message-ID: I will certainly give this a try. I think we mentioned this in the con call on Wednesday, but could someone post the svn checkout procedure for 0.93, perhaps to the swift-devel Google site? The Swift 0.93 branch is easy to figure out, but cog not so much. On Aug 5, 2011, at 9:25 AM, Michael Wilde wrote: > I learned from Mihael yesterday that the SGE provider should in fact work on Ranger in 0.93 and trunk if you set nodeGranularity to 16.
> > This is a confusion between nodes and cores that should get fixed in 0.94 unless testing indicates that the above setting doesn't work in 0.93. > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Fri Aug 5 09:55:36 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 5 Aug 2011 09:55:36 -0500 (CDT) Subject: [Swift-devel] Try Ranger with nodeGranularity=16 In-Reply-To: <37E2636A-7D90-44DD-90D2-CE9F96570881@gmail.com> Message-ID: <147752740.191850.1312556136873.JavaMail.root@zimbra.anl.gov> > From: "Jonathan Monette" > > I think we mentioned this in the con call on Wednesday but could > someone post the svn co procedure for 0.93 perhaps to the swift-devel > google site? The swift 0.93 branch is easy to figure out but cog not > so much. >
svn co https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.9/src/cog
cd cog/modules
svn co https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.93 swift
From wilde at mcs.anl.gov Fri Aug 5 10:00:58 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 5 Aug 2011 10:00:58 -0500 (CDT) Subject: [Swift-devel] trunk coasters In-Reply-To: <404855649.191251.1312518151043.JavaMail.root@zimbra.anl.gov> Message-ID: <2050741312.191916.1312556458623.JavaMail.root@zimbra.anl.gov> Mihael, Persistent coasters work well so far in 0.93; the problem below seems to be in trunk. I'm able to run on many remote OSG sites now, with good performance, using provider staging, with one coaster service. I've seen one script of 100 jobs hang after 97 completed (once), but all other tests up to 1000 jobs have succeeded. I'll try to recreate that hang and capture logs etc.
- Mike ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Justin M Wozniak" , "Swift Devel" > Sent: Thursday, August 4, 2011 11:22:31 PM > Subject: Re: [Swift-devel] trunk coasters > Im still getting the errors below (which I think are what I reported > prior to this fix). I'll double check that I got the latest fix in, > but I think I do. > > - Mike > > 2011-08-04 23:17:47,757-0500 DEBUG Cpu workerStarted: swork:node016:0 > 2011-08-04 23:17:47,757-0500 DEBUG Cpu swork:0 pullLater > 2011-08-04 23:17:47,758-0500 INFO Block Started CPU 0:1312517867s > 2011-08-04 23:17:47,758-0500 INFO Block Started worker swork:000000 > 2011-08-04 23:17:47,758-0500 INFO Cpu swork:0 pull > 2011-08-04 23:17:47,761-0500 WARN BlockQueueProcessor Failed to send > worker status update to client > java.lang.NullPointerException > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) > at > org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) > at > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) > at > org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) > at > org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) > at > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) > at > 
org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) > 2011-08-04 23:17:47,764-0500 INFO LocalTCPService Received > registration: blockid = swork, url = node016 > 2011-08-04 23:17:47,765-0500 INFO AbstractKarajanChannel MetaChannel: > 467772424[15735326: {}] -> null: Disabling heartbeats (config is null) > 2011-08-04 23:17:47,765-0500 INFO MetaChannel MetaChannel: > 467772424[15735326: {}] -> null.bind -> SC-null > 2011-08-04 23:17:47,765-0500 DEBUG Cpu workerStarted: swork:node016:1 > 2011-08-04 23:17:47,765-0500 DEBUG Cpu swork:1 pullLater > 2011-08-04 23:17:47,765-0500 INFO Block Started CPU 1:1312517867s > 2011-08-04 23:17:47,765-0500 INFO Cpu swork:1 pull > 2011-08-04 23:17:47,765-0500 INFO Block Started worker swork:000001 > 2011-08-04 23:17:47,766-0500 WARN BlockQueueProcessor Failed to send > worker status update to client > java.lang.NullPointerException > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) > at > org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) > at > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) > at > org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) > at > org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) > at > 
org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) > at > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) > 2011-08-04 23:17:48,568-0500 INFO TCPBufferManager Adjusting buffer > size to 524288 > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Cc: "Justin M Wozniak" , "Swift Devel" > > > > Sent: Thursday, August 4, 2011 6:09:56 PM > > Subject: Re: [Swift-devel] trunk coasters > > On Thu, 2011-08-04 at 15:29 -0500, Michael Wilde wrote: > > > > > So the other error - the failing service - is not happing on local > > > tests on 0.93; next I'll try the remote cases. > > > > Ok. I committed a number of things to trunk, one of which is a fix > > for > > the messed up channel lookup problem. > > > > I used it previously for auto-deployed services on ranger and pads, > > but > > haven't tried it with the stand-alone service. So please test that > > if > > you can and let me know. > > > > I'll now move to dealing with 0.93 issues. > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Aug 5 13:37:10 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Aug 2011 11:37:10 -0700 Subject: [Swift-devel] trunk coasters In-Reply-To: <404855649.191251.1312518151043.JavaMail.root@zimbra.anl.gov> References: <404855649.191251.1312518151043.JavaMail.root@zimbra.anl.gov> Message-ID: <1312569430.4824.3.camel@blabla> Can you post the sites file? On Thu, 2011-08-04 at 23:22 -0500, Michael Wilde wrote: > Im still getting the errors below (which I think are what I reported prior to this fix). 
I'll double check that I got the latest fix in, but I think I do. > > - Mike > > 2011-08-04 23:17:47,757-0500 DEBUG Cpu workerStarted: swork:node016:0 > 2011-08-04 23:17:47,757-0500 DEBUG Cpu swork:0 pullLater > 2011-08-04 23:17:47,758-0500 INFO Block Started CPU 0:1312517867s > 2011-08-04 23:17:47,758-0500 INFO Block Started worker swork:000000 > 2011-08-04 23:17:47,758-0500 INFO Cpu swork:0 pull > 2011-08-04 23:17:47,761-0500 WARN BlockQueueProcessor Failed to send worker status update to client > java.lang.NullPointerException > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) > at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) > at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) > at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) > at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) > 2011-08-04 23:17:47,764-0500 INFO LocalTCPService Received registration: blockid = swork, url = node016 > 2011-08-04 23:17:47,765-0500 INFO AbstractKarajanChannel MetaChannel: 467772424[15735326: {}] -> null: Disabling heartbeats (config is null) > 2011-08-04 
23:17:47,765-0500 INFO MetaChannel MetaChannel: 467772424[15735326: {}] -> null.bind -> SC-null > 2011-08-04 23:17:47,765-0500 DEBUG Cpu workerStarted: swork:node016:1 > 2011-08-04 23:17:47,765-0500 DEBUG Cpu swork:1 pullLater > 2011-08-04 23:17:47,765-0500 INFO Block Started CPU 1:1312517867s > 2011-08-04 23:17:47,765-0500 INFO Cpu swork:1 pull > 2011-08-04 23:17:47,765-0500 INFO Block Started worker swork:000001 > 2011-08-04 23:17:47,766-0500 WARN BlockQueueProcessor Failed to send worker status update to client > java.lang.NullPointerException > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:433) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:226) > at org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.registrationReceived(PassiveQueueProcessor.java:72) > at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.registrationReceived(JobQueue.java:143) > at org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:64) > at org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:57) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:416) > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:157) > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:375) > 2011-08-04 23:17:48,568-0500 INFO TCPBufferManager Adjusting buffer size to 524288 > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Cc: "Justin M Wozniak" , "Swift Devel" > > Sent: Thursday, August 4, 2011 6:09:56 
PM > > Subject: Re: [Swift-devel] trunk coasters > > On Thu, 2011-08-04 at 15:29 -0500, Michael Wilde wrote: > > > > > So the other error - the failing service - is not happening on local > > > tests on 0.93; next I'll try the remote cases. > > > > Ok. I committed a number of things to trunk, one of which is a fix for > > the messed up channel lookup problem. > > > > I used it previously for auto-deployed services on ranger and pads, > > but > > haven't tried it with the stand-alone service. So please test that if > > you can and let me know. > > > > I'll now move to dealing with 0.93 issues. > From ketancmaheshwari at gmail.com Fri Aug 5 14:47:30 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Fri, 5 Aug 2011 14:47:30 -0500 Subject: [Swift-devel] How to get Engage VO membership Message-ID: Hello, Per discussion with Mike, here is how one can get membership in the Engage VO:
Step 1. Apply for a certificate at https://pki1.doegrids.org/ca/ ; use ANL as the affiliation (registration authority) in the form.
Step 2. When you receive your certificate via a link by mail, download and install it in your browser. I have only tested this with Firefox on Linux and Mac; Jon says it works with Chrome on Mac, and I know that it does not work with Chrome on Linux. In Firefox, when you click the link that you received in the mail, Firefox will prompt you to install the certificate; set a passphrase and click Install. Next, back up this certificate as a .p12 file. This is under Preferences > Advanced > Encryption > View Certificates > Your Certificates.
Step 3. Install the DOE CA and ESnet root CA certificates into the browser by clicking the links at the top left of this page: http://www.doegrids.org/ . I do not know whether the ESnet CA certificate is necessary, but I install both anyway; I know that the DOE CA is necessary.
Step 4. Go to the Engage VO registration point at https://osg-engage.renci.org:8443/vomrs/Engage/vomrs from the same browser that has the above certificates installed.
Also see https://twiki.grid.iu.edu/bin/view/Engagement/EngageNewUserGuide for more details.
Step 5. Once you have VO membership, the certificate that is in the browser must be placed in your .globus directory on the machine from which you want to access OSG resources. It has to be in the form of .pem files, with separate files for the key and the certificate. To produce them, use the .p12 file backed up above as follows:
$ openssl pkcs12 -in your.p12 -out usercert.pem -nodes -clcerts -nokeys
$ openssl pkcs12 -in your.p12 -out userkey.pem -nodes -nocerts
The above commands are taken from: http://security.ncsa.illinois.edu/research/grid-howtos/usefulopenssl.html For more on openssl: http://www.openssl.org/docs/apps/openssl.html
Step 6. Test it:
$ voms-proxy-init --voms Engage -hours 48
Regards, -- Ketan From wilde at mcs.anl.gov Fri Aug 5 23:27:01 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 5 Aug 2011 23:27:01 -0500 (CDT) Subject: [Swift-devel] Coaster test failed at 86K of 100K jobs In-Reply-To: <578229958.194613.1312604520459.JavaMail.root@zimbra.anl.gov> Message-ID: <1077362111.194622.1312604821209.JavaMail.root@zimbra.anl.gov> Mihael, I was running catsn.swift with 100K jobs (-n=100000) against a single-server persistent coaster pool of about 50 OSG worker nodes, using 0.93.
It failed after about 86K jobs with this error: Submitted:82 Active:2 Finished successfully:86521 Progress: time: Fri, 05 Aug 2011 22:15:50 -0500 Selecting site:921 Submitting:16 Submitted:83 Active:2 Finished successfully:86531 Progress: time: Fri, 05 Aug 2011 22:15:51 -0500 Selecting site:922 Submitting:12 Submitted:76 Active:13 Finished successfully:86534 Progress: time: Fri, 05 Aug 2011 22:15:54 -0500 Selecting site:918 Submitting:16 Submitted:83 Active:1 Finished successfully:86548 Execution failed: java.util.ConcurrentModificationException The first exception in the logs shows: 2011-08-05 22:15:54,845-0500 DEBUG vdl:mains FOREACH_IT_END line=9 thread=0-3-87187 2011-08-05 22:15:54,845-0500 DEBUG VDL2ExecutionContext java.util.ConcurrentModificationException java.util.ConcurrentModificationException Caused by: java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.getSummary(RuntimeStats.java:177) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.printStates(RuntimeStats.java:194) at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.dumpState(RuntimeStats.java:159) at org.griphyn.vdl.karajan.lib.RuntimeStats.setProgress(RuntimeStats.java:88) at org.griphyn.vdl.karajan.lib.RuntimeStats.vdl_setprogress(RuntimeStats.java:82) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) I've moved the logs to: /home/wilde/swiftgrid/test.swift-workers/logs.05 - Mike From hategan at mcs.anl.gov Sat Aug 6 00:02:16 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Aug 2011 22:02:16 -0700 Subject: [Swift-devel] Coaster test failed at 86K of 100K jobs In-Reply-To: <1077362111.194622.1312604821209.JavaMail.root@zimbra.anl.gov>
References: <1077362111.194622.1312604821209.JavaMail.root@zimbra.anl.gov> Message-ID: <1312606936.18332.1.camel@blabla> Amazing how that bug in what would otherwise be a relatively simple class (CopyOnWriteArrayList) has managed to survive so long. Concurrency ain't easy! I'll have a fix committed after I do a bit of testing. On Fri, 2011-08-05 at 23:27 -0500, Michael Wilde wrote: > Mihael, > > I was running catsn.swift with 100K jobs (-n=100000) to a single-server persistent coaster pool to about 50 OSG worker nodes. Using 0.93. > > It failed after about 86K jobs with this error: > > Submitted:82 Active:2 Finished successfully:86521 > Progress: time: Fri, 05 Aug 2011 22:15:50 -0500 Selecting site:921 Submitting:16 Submitted:83 Active:2 Finished successfully:86531 > Progress: time: Fri, 05 Aug 2011 22:15:51 -0500 Selecting site:922 Submitting:12 Submitted:76 Active:13 Finished successfully:86534 > Progress: time: Fri, 05 Aug 2011 22:15:54 -0500 Selecting site:918 Submitting:16 Submitted:83 Active:1 Finished successfully:86548 > Execution failed: > java.util.ConcurrentModificationException > > > The first exception in the logs shows: > > 2011-08-05 22:15:54,845-0500 DEBUG vdl:mains FOREACH_IT_END line=9 thread=0-3-87187 > 2011-08-05 22:15:54,845-0500 DEBUG VDL2ExecutionContext java.util.ConcurrentModificationException > java.util.ConcurrentModificationException > Caused by: java.util.ConcurrentModificationException > at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) > at java.util.AbstractList$Itr.next(AbstractList.java:343) > at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.getSummary(RuntimeStats.java:177) > at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.printStates(RuntimeStats.java:194) > at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.dumpState(RuntimeStats.java:159) > at org.griphyn.vdl.karajan.lib.RuntimeStats.setProgress(RuntimeStats.java:88) > at 
org.griphyn.vdl.karajan.lib.RuntimeStats.vdl_setprogress(RuntimeStats.java:82) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > > > Ive moved the logs to: /home/wilde/swiftgrid/test.swift-workers/logs.05 > > - Mike > > > From hategan at mcs.anl.gov Sat Aug 6 01:02:48 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Aug 2011 23:02:48 -0700 Subject: [Swift-devel] Coaster test failed at 86K of 100K jobs In-Reply-To: <1312606936.18332.1.camel@blabla> References: <1077362111.194622.1312604821209.JavaMail.root@zimbra.anl.gov> <1312606936.18332.1.camel@blabla> Message-ID: <1312610568.18332.12.camel@blabla> Potential fix is in the 0.93 branch. I'm not entirely sure that this was the problem, but it's the only one I can see right now. The issue is as follows. There is a "special" implementation of a CopyOnWriteArrayList in the util module. The standard java one does a copy of the underlying array for EVERY operation that changes the list. This guarantees that ongoing iterations will not be messed up by concurrent modifications to the list, but is very bad if you have many operations that change the list. The version in util only does a copy if there is an ongoing iteration on a particular underlying array. If no concurrent changes and iterations occur, this works at the speed of a normal synchronized list. If concurrent changes and iterations occur, there is a copy penalty for each iteration (but only once for each iteration). This requires the user code to notify the implementation when an iteration is done (release). The problem was with the way that the lock was implemented. It would be increased for every iteration, set to 0 for each mutation operation and decreased if > 0 for a release. 
That was broken; the following could have occurred: iteration1start: lock = 1, with array1; add: lock > 0, so copy to array2 and set lock = 0; iteration2start: lock = 1, with array2; iteration1end: lock = 0; add: lock == 0, so add directly to array2 -> ConcurrentModificationException on iteration2. Though I don't see how the usage stats code ended up iterating through the list twice at the same time. Mihael On Fri, 2011-08-05 at 22:02 -0700, Mihael Hategan wrote: > Amazing how that bug in what would otherwise be a relatively simple > class (CopyOnWriteArrayList) has managed to survive so long. Concurrency > ain't easy! > > I'll have a fix committed after I do a bit of testing. From wilde at mcs.anl.gov Sat Aug 6 07:38:45 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 6 Aug 2011 07:38:45 -0500 (CDT) Subject: Re: [Swift-devel] Coaster test failed at 86K of 100K jobs In-Reply-To: <1312610568.18332.12.camel@blabla> Message-ID: <1251862264.194810.1312634325846.JavaMail.root@zimbra.anl.gov> Mihael, I rebuilt with that fix. Now I'm getting this error on runs as small as 1,000 jobs. Logs are in: /home/wilde/swiftgrid/test.swift-workers/logs.06 The failing run was *8za.log. I copied the sites etc. files there as well.
com$ ls -lt logs.06 total 1920 -rw-r--r-- 1 wilde ci-users 918172 Aug 6 07:34 swift.log -rw-r--r-- 1 wilde ci-users 526 Aug 6 07:34 start-grid-service.out -rw-r--r-- 1 wilde ci-users 11279 Aug 6 07:34 swift-workers.out -rw-r--r-- 1 wilde ci-users 69555 Aug 6 07:33 condor.log -rw-r--r-- 1 wilde ci-users 488616 Aug 6 07:28 catsn-20110806-0728-tpo2b8za.log drwxr-xr-x 2 wilde ci-users 9 Aug 6 07:28 catsn-20110806-0728-tpo2b8za.d/ -rw-r--r-- 1 wilde ci-users 136 Aug 6 07:28 catsn-20110806-0728-tpo2b8za.0.rlog -rw-r--r-- 1 wilde ci-users 200148 Aug 6 07:28 catsn-20110806-0728-8lecscl7.log drwxr-xr-x 2 wilde ci-users 102 Aug 6 07:28 catsn-20110806-0728-8lecscl7.d/ -rw-r--r-- 1 wilde ci-users 23388 Aug 6 07:28 catsn-20110806-0728-jvvxoqdg.log drwxr-xr-x 2 wilde ci-users 12 Aug 6 07:28 catsn-20110806-0728-jvvxoqdg.d/ -rw-r--r-- 1 wilde ci-users 5940 Aug 6 07:28 catsn-20110806-0728-lge9pvy3.log drwxr-xr-x 2 wilde ci-users 3 Aug 6 07:28 catsn-20110806-0728-lge9pvy3.d/ com$ 2011-08-06 07:28:46,432-0500 DEBUG vdl:execute2 JOB_START jobid=cat-j2tn42ek tr=cat arguments=[data.txt] tmpdir=catsn-20110806-0728-tpo2b8za/jobs/j/cat-j2tn42ek host=localhost 2011-08-06 07:28:46,432-0500 DEBUG VDL2ExecutionContext org.globus.cog.karajan.workflow.KarajanRuntimeException: Could not convert value to boolean: null org.globus.cog.karajan.workflow.KarajanRuntimeException: Could not convert value to boolean: null Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Could not convert value to boolean: null at org.globus.cog.karajan.util.TypeUtil.toBoolean(TypeUtil.java:131) at org.griphyn.vdl.karajan.lib.Mark.function(Mark.java:30) at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at
org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Could not convert value to boolean: null at org.globus.cog.karajan.util.TypeUtil.toBoolean(TypeUtil.java:127) ... 20 more ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Saturday, August 6, 2011 1:02:48 AM > Subject: Re: [Swift-devel] Coaster test failed at 86K of 100K jobs > Potential fix is in the 0.93 branch. > > I'm not entirely sure that this was the problem, but it's the only one > I > can see right now. > > The issue is as follows. There is a "special" implementation of a > CopyOnWriteArrayList in the util module. 
The standard java one does a > copy of the underlying array for EVERY operation that changes the > list. > This guarantees that ongoing iterations will not be messed up by > concurrent modifications to the list, but is very bad if you have many > operations that change the list. > > The version in util only does a copy if there is an ongoing iteration > on > a particular underlying array. If no concurrent changes and iterations > occur, this works at the speed of a normal synchronized list. If > concurrent changes and iterations occur, there is a copy penalty for > each iteration (but only once for each iteration). This requires the > user code to notify the implementation when an iteration is done > (release). > > The problem was with the way that the lock was implemented. It would > be > increased for every iteration, set to 0 for each mutation operation > and > decreased if > 0 for a release. That was broken, the following could > have occurred: > > iteration1start - lock = 1, with array1 > add - lock > 0, copy to array2, lock = 0 > iteration2start - lock = 1, with array2 > iteration1end - lock = 0 > add - lock == 0, add to array2 -> ConcurrentModificationException on > iteration2. > > Though I don't see how the usage stats got to iterate twice at the > same > time through stuff. > > Mihael > > > On Fri, 2011-08-05 at 22:02 -0700, Mihael Hategan wrote: > > Amazing how that bug in what would otherwise be a relatively simple > > class (CopyOnWriteArrayList) has managed to survive so long. > > Concurrency > > ain't easy! > > > > I'll have a fix committed after I do a bit of testing. 
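The interleaving quoted above can be made concrete with a small Java sketch. It is hypothetical and is not the cog util class (whose code does not appear in this thread): the names LazyCopyOnWriteList, Snapshot, Iteration and iterate() are invented for illustration, it assumes Java 8 or later, and it shows just one way of avoiding the race, namely keeping the open-iteration count on each backing array instead of in a single list-wide lock counter. With a per-array count, ending iteration1 only releases array1, so the final add still sees that array2 is being iterated and copies it instead of writing into it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not the cog util class): copy the backing array only
 * when a mutation happens while an iteration is open on that array. The count
 * of open iterations is stored per backing array, so releasing an iteration
 * over an old array can never "unlock" the array that is current now.
 */
public class LazyCopyOnWriteList<T> {

    /** One backing array plus the number of iterations still open over it. */
    private static final class Snapshot {
        Object[] items;
        int size;
        int openIterations;
        Snapshot(Object[] items, int size) {
            this.items = items;
            this.size = size;
        }
    }

    private Snapshot current = new Snapshot(new Object[4], 0);

    public synchronized void add(T item) {
        if (current.openIterations > 0) {
            // Someone is reading this array: switch to a copy and leave it untouched.
            // The old snapshot keeps its own count, so its iterators stay valid.
            current = new Snapshot(
                    Arrays.copyOf(current.items, Math.max(4, current.size * 2)), current.size);
        } else if (current.size == current.items.length) {
            // No open iterations on this array, so growing it in place is safe.
            current.items = Arrays.copyOf(current.items, current.size * 2);
        }
        current.items[current.size++] = item;
    }

    public synchronized int size() {
        return current.size;
    }

    /** Starts an iteration; the caller must close() it when done (the "release"). */
    public synchronized Iteration<T> iterate() {
        current.openIterations++;
        return new Iteration<>(this, current);
    }

    public static final class Iteration<E> implements Iterator<E>, AutoCloseable {
        private final Object lock;
        private final Snapshot snapshot;
        private int index;
        private boolean closed;

        private Iteration(Object lock, Snapshot snapshot) {
            this.lock = lock;
            this.snapshot = snapshot;
        }

        @Override
        public boolean hasNext() {
            return index < snapshot.size;
        }

        @Override
        @SuppressWarnings("unchecked")
        public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            return (E) snapshot.items[index++];
        }

        /** Decrements the count of the array this iteration used, not the current one. */
        @Override
        public void close() {
            synchronized (lock) {
                if (!closed) {
                    closed = true;
                    snapshot.openIterations--;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Replays the interleaving from the message on a single thread.
        LazyCopyOnWriteList<String> list = new LazyCopyOnWriteList<>();
        list.add("a");
        Iteration<String> it1 = list.iterate(); // iteration1start, on array1
        list.add("b");                          // array1 in use: copy to array2
        Iteration<String> it2 = list.iterate(); // iteration2start, on array2
        it1.close();                            // iteration1end: only array1 is released
        list.add("c");                          // array2 still in use: copy to array3
        StringBuilder seen = new StringBuilder();
        while (it2.hasNext()) seen.append(it2.next());
        it2.close();
        System.out.println(seen + " / size " + list.size()); // prints "ab / size 3"
    }
}
```

In this sketch close() plays the role of the explicit release the message describes, and an Iteration must not be used after it is closed; a real list would also need get/remove and multithreaded tests.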
-- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat Aug 6 13:34:45 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 6 Aug 2011 13:34:45 -0500 (CDT) Subject: [Swift-devel] 100K job script hangs at 30K jobs Message-ID: <1296444624.194996.1312655685765.JavaMail.root@zimbra.anl.gov> Mihael, A later catsn test, started this morning, hung at 30K of 100K catsn jobs. Swift was still printing progress but not progressing beyond: Progress: time: Sat, 06 Aug 2011 13:29:08 -0500 Selecting site:1014 Submitted:10 Finished successfully:30329 I had stopped it earlier in the morning, then resumed it to get a jstack. Logs and stack traces of both the Swift and coaster service JVMs are in: /home/wilde/swiftgrid/test.swift-workers/logs.07 - Mike From hategan at mcs.anl.gov Sat Aug 6 21:29:48 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 06 Aug 2011 19:29:48 -0700 Subject: Re: [Swift-devel] 100K job script hangs at 30K jobs In-Reply-To: <1296444624.194996.1312655685765.JavaMail.root@zimbra.anl.gov> References: <1296444624.194996.1312655685765.JavaMail.root@zimbra.anl.gov> Message-ID: <1312684188.29942.2.camel@blabla> So this problem was caused by dying workers combined with the system not noticing them, so zombie jobs would slowly fill the throttle (which was set to 10 in this case). I backported the dead worker detection code from trunk. Combined with retries, this should take care of the problem, but it may be worth looking into why the workers were dying. On Sat, 2011-08-06 at 13:34 -0500, Michael Wilde wrote: > Mihael, > > A later catsn test, started this morning, hung at 30K of 100K catsn jobs.
> > Swift was still printing progress but not progressing beyond: > > Progress: time: Sat, 06 Aug 2011 13:29:08 -0500 Selecting site:1014 Submitted:10 Finished successfully:30329 > > I had stopped it earlier in the morning, then resumed it to get a jstack. > > Logs and stack traces of both the swift and coaster service JVMs are in: > /home/wilde/swiftgrid/test.swift-workers/logs.07 > > - Mike From davidk at ci.uchicago.edu Sun Aug 7 00:54:59 2011 From: davidk at ci.uchicago.edu (David Kelly) Date: Sun, 7 Aug 2011 00:54:59 -0500 (CDT) Subject: [Swift-devel] Can't build latest 0.93 Message-ID: <1601513232.57066.1312696499687.JavaMail.root@zimbra-mb2.anl.gov> I am getting this error while running ant dist on 0.93: package.list: [echo] [provider-coaster]: PACKAGE LIST [java] Missing package: backport-util-concurrent.jar BUILD FAILED /home/david/cog.093/modules/swift/build.xml:73: The following error occurred while executing this line: /home/david/cog.093/mbuild.xml:445: The following error occurred while executing this line: /home/david/cog.093/mbuild.xml:79: The following error occurred while executing this line: /home/david/cog.093/mbuild.xml:52: The following error occurred while executing this line: /home/david/cog.093/modules/swift/dependencies.xml:13: The following error occurred while executing this line: /home/david/cog.093/mbuild.xml:163: The following error occurred while executing this line: /home/david/cog.093/mbuild.xml:168: The following error occurred while executing this line: /home/david/cog.093/modules/provider-coaster/build.xml:60: The following error occurred while executing this line: /home/david/cog.093/modules/provider-coaster/build.xml:168: Java returned: 1 From hategan at mcs.anl.gov Sun Aug 7 01:57:40 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 06 Aug 2011 23:57:40 -0700 Subject: [Swift-devel] Can't build latest 0.93 In-Reply-To: <1601513232.57066.1312696499687.JavaMail.root@zimbra-mb2.anl.gov> References: 
<1601513232.57066.1312696499687.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: <1312700260.12834.0.camel@blabla> Try now (cog r3215) On Sun, 2011-08-07 at 00:54 -0500, David Kelly wrote: > I am getting this error while running ant dist on 0.93: > > package.list: > [echo] [provider-coaster]: PACKAGE LIST > [java] Missing package: backport-util-concurrent.jar > > BUILD FAILED > /home/david/cog.093/modules/swift/build.xml:73: The following error occurred while executing this line: > /home/david/cog.093/mbuild.xml:445: The following error occurred while executing this line: > /home/david/cog.093/mbuild.xml:79: The following error occurred while executing this line: > /home/david/cog.093/mbuild.xml:52: The following error occurred while executing this line: > /home/david/cog.093/modules/swift/dependencies.xml:13: The following error occurred while executing this line: > /home/david/cog.093/mbuild.xml:163: The following error occurred while executing this line: > /home/david/cog.093/mbuild.xml:168: The following error occurred while executing this line: > /home/david/cog.093/modules/provider-coaster/build.xml:60: The following error occurred while executing this line: > /home/david/cog.093/modules/provider-coaster/build.xml:168: Java returned: 1 From wilde at mcs.anl.gov Sun Aug 7 22:59:27 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 7 Aug 2011 22:59:27 -0500 (CDT) Subject: [Swift-devel] [Bug 359] Add ability to set ENV vars, maxwalltime, and RAM requirements on app invocation In-Reply-To: <20110808034743.1D613563AA@wind.mcs.anl.gov> Message-ID: <410964285.195935.1312775967623.JavaMail.root@zimbra.anl.gov> I'd rather see features that were slotted for a release get moved to the next release if they don't fit, rather than put back into a "floating" state.
This feature was slotted for 0.93 before 0.93 was sealed, so it should move to 0.94 for consideration. - Mike ----- Original Message ----- > From: bugzilla-daemon at mcs.anl.gov > To: wilde at mcs.anl.gov > Sent: Sunday, August 7, 2011 10:47:43 PM > Subject: [Bug 359] Add ability to set ENV vars, maxwalltime, and RAM requirements on app invocation > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=359 > > > Mihael Hategan changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |hategan at mcs.anl.gov > Target Milestone|v0.93 |UNDEFINED > > > > > --- Comment #2 from Mihael Hategan 2011-08-07 > 22:47:42 --- > No new features in 0.93 at this point. Removing milestone. > > -- > Configure bugmail: > https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You reported the bug. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Sun Aug 7 23:05:43 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 07 Aug 2011 21:05:43 -0700 Subject: [Swift-devel] [Bug 359] Add ability to set ENV vars, maxwalltime, and RAM requirements on app invocation In-Reply-To: <410964285.195935.1312775967623.JavaMail.root@zimbra.anl.gov> References: <410964285.195935.1312775967623.JavaMail.root@zimbra.anl.gov> Message-ID: <1312776343.7851.2.camel@blabla> My instinct was not to assign things that weren't debated on the mailing list to any particular release. But please re-target as needed. Mihael On Sun, 2011-08-07 at 22:59 -0500, Michael Wilde wrote: > I'd rather see features that were slotted for a release get moved to the next release if they dont fit, rather than put back into a "floating" state. > > This feature was slotted for 0.93 before 0.93 was sealed, so it should move to 0.94 for consideration. 
> > - Mike > > > ----- Original Message ----- > > From: bugzilla-daemon at mcs.anl.gov > > To: wilde at mcs.anl.gov > > Sent: Sunday, August 7, 2011 10:47:43 PM > > Subject: [Bug 359] Add ability to set ENV vars, maxwalltime, and RAM requirements on app invocation > > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=359 > > > > > > Mihael Hategan changed: > > > > What |Removed |Added > > ---------------------------------------------------------------------------- > > CC| |hategan at mcs.anl.gov > > Target Milestone|v0.93 |UNDEFINED > > > > > > > > > > --- Comment #2 from Mihael Hategan 2011-08-07 > > 22:47:42 --- > > No new features in 0.93 at this point. Removing milestone. > > > > -- > > Configure bugmail: > > https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email > > ------- You are receiving this mail because: ------- > > You reported the bug. > From wilde at mcs.anl.gov Mon Aug 8 16:29:02 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 8 Aug 2011 16:29:02 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1312747967.22082.1.camel@blabla> Message-ID: <388085554.202204.1312838942710.JavaMail.root@zimbra.anl.gov> Mihael, I ran one test to 100K jobs - ran fine. Second test failed after ~15K jobs with the following error: catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error (No such file or directory) (partial traceback below). Is this related to your change on handling of the status file? I was seeing the same error on sporadic, shorter tests last night but did not yet have a chance to investigate. 
The full log for this error is catsn-20110808-1558-6tm450a1.log in /home/wilde/swiftgrid/test.swift-workers/logs.10 - Mike 2011-08-08 16:01:27,952-0500 INFO GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=urn:0-3-14624-1-1-1312837151244) is /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.14625.out -err stderr.txt -i -d outdir -if data.txt -of outdir/f.14625.out -k -cdmfile -status provider -a data.txt 2011-08-08 16:01:27,960-0500 INFO ExecutionContext Detailed exception: Exception in cat: Arguments: [data.txt] Host: localhost Directory: catsn-20110808-1558-6tm450a1/jobs/z/cat-ze1806ek - - - Caused by: /autonfs/home/wilde/swiftgrid/test.swift-workers/./catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error (No such file or directory) at org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Sent: Sunday, August 7, 2011 3:12:47 PM > Subject: Re: 100K job script hangs at 30K jobs > Ok. I ran 65k jobs with a script that randomly killed and added > workers. > It finished fine, but it needs testing on more environments. > > On Sun, 2011-08-07 at 09:39 -0500, Michael Wilde wrote: > > I'll try to trap that next chance I get, and try to ship back worker > > logs.
> > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > > To: "Michael Wilde" > > > Cc: "Swift Devel" > > > Sent: Saturday, August 6, 2011 9:29:48 PM > > > Subject: Re: 100K job script hangs at 30K jobs > > > So this problem was the problem of dying workers combined with the > > > system not noticing it and so zombie jobs would slowly fill the > > > throttle > > > (which was set to 10 in this case). I backported the dead worker > > > detection code from trunk. Combined with retries, this should take > > > care > > > of the problem, but it may be worth looking into why the workers > > > were > > > dying. > > > > > > On Sat, 2011-08-06 at 13:34 -0500, Michael Wilde wrote: > > > > Mihael, > > > > > > > > A later catsn test, started this morning, hung at 30K or 100K > > > > catsn > > > > jobs. > > > > > > > > Swift was still printing progress but not progressing beyond: > > > > > > > > Progress: time: Sat, 06 Aug 2011 13:29:08 -0500 Selecting > > > > site:1014 > > > > Submitted:10 Finished successfully:30329 > > > > > > > > I had stopped it earlier in the morning, then resumed it to get > > > > a > > > > jstack. > > > > > > > > Logs and stack traces of both the swift and coaster service JVMs > > > > are > > > > in: > > > > /home/wilde/swiftgrid/test.swift-workers/logs.07 > > > > > > > > - Mike > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From alberto_chavez at live.com Mon Aug 8 17:14:16 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Mon, 8 Aug 2011 17:14:16 -0500 Subject: [Swift-devel] ssh-pbs-coasters test case on PADS. 
Message-ID: Hello, I am going through the test cases for different providers in the test suite directory. I am manually running the ssh-pbs-coasters test case with the following command: swift 001-catsn-ssh-pbs-coasters.swift -tc.file tc.template.data -sites.file sites.template.xml I am getting the following output: Swift svn swift-r4861 (swift modified locally) cog-r3183 RunID: 20110808-1703-mz2tcfha Progress: time: Mon, 08 Aug 2011 17:03:49 -0500 Progress: time: Mon, 08 Aug 2011 17:03:55 -0500 Selecting site:8 Initializing site shared directory:1 Stage in:1 Progress: time: Mon, 08 Aug 2011 17:03:59 -0500 Submitted:1 Failed but can retry:9 Failed to transfer wrapper log for job cat-2jvs26ek Progress: time: Mon, 08 Aug 2011 17:04:02 -0500 Stage in:1 Failed but can retry:9 Failed to transfer wrapper log for job cat-uivs26ek Failed to transfer wrapper log for job cat-xivs26ek Failed to transfer wrapper log for job cat-zivs26ek Failed to transfer wrapper log for job cat-4jvs26ek Failed to transfer wrapper log for job cat-0jvs26ek Failed to transfer wrapper log for job cat-yivs26ek Progress: time: Mon, 08 Aug 2011 17:04:03 -0500 Stage in:1 Submitting:1 Failed but can retry:8 Failed to transfer wrapper log for job cat-vivs26ek Failed to transfer wrapper log for job cat-1jvs26ek Failed to transfer wrapper log for job cat-3jvs26ek Progress: time: Mon, 08 Aug 2011 17:04:04 -0500 Stage in:1 Submitting:1 Failed but can retry:8 Progress: time: Mon, 08 Aug 2011 17:04:07 -0500 Submitting:1 Submitted:1 Failed but can retry:8 Failed to transfer wrapper log for job cat-6jvs26ek Progress: time: Mon, 08 Aug 2011 17:04:09 -0500 Stage in:1 Failed but can retry:9 Failed to transfer wrapper log for job cat-8jvs26ek Failed to transfer wrapper log for job cat-ajvs26ek Failed to transfer wrapper log for job cat-cjvs26ek Failed to transfer wrapper log for job cat-ejvs26ek Failed to transfer wrapper log for job cat-gjvs26ek Failed to transfer wrapper log for job cat-ijvs26ek Progress: time: Mon, 08 Aug 2011 17:04:10 -0500 Stage in:1 Submitting:1 Failed but can retry:8 Failed to transfer wrapper log for job cat-kjvs26ek Failed to transfer wrapper log for job cat-mjvs26ek Progress: time: Mon, 08 Aug 2011 17:04:11 -0500 Stage in:1 Failed but can retry:9 Failed to transfer wrapper log for job cat-ojvs26ek Progress: time: Mon, 08 Aug 2011 17:04:16 -0500 Submitting:1 Submitted:1 Failed but can retry:8 Failed to transfer wrapper log for job cat-qjvs26ek Progress: time: Mon, 08 Aug 2011 17:04:17 -0500 Failed:1 Failed but can retry:9 These are the contents of the sites.template.xml file: 3000 8 1 1 10 short 0.5 10000 /home/achavez/swiftwork and this is the SwiftScript that I am trying to run: type file; app (file o) cat (file i) { cat @i stdout=@o; } string t = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; string char[] = @strsplit(t, ""); file out[]; foreach j in [1:@toint(@arg("n","10"))] { file data<"data.txt">; out[j] = cat(data); } I am pretty sure the test is failing, and I guess something is wrong on my side; I just don't know what it is, so any help figuring out what I'm doing wrong would be greatly appreciated. Every time I run the test, a dialog box pops up and asks for my username to log in to PADS, then it asks for my password, then it shows the messages "Failed to transfer wrapper log for job XXXXX", and then it asks three more times for my username and password. Thank you, Alberto.
From hategan at mcs.anl.gov Mon Aug 8 17:23:15 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 15:23:15 -0700 Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <388085554.202204.1312838942710.JavaMail.root@zimbra.anl.gov> References: <388085554.202204.1312838942710.JavaMail.root@zimbra.anl.gov> Message-ID: <1312842195.13185.1.camel@blabla> On Mon, 2011-08-08 at 16:29 -0500, Michael Wilde wrote: > catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error (No such file or directory) > (partial traceback below). > > Is this related to your change on handling of the status file? Yes, but I thought I fixed it. Make sure you have at least swift r4963. > > I was seeing the same error on sporadic, shorter tests last night but did not yet have a chance to investigate. > > The full log for this error is catsn-20110808-1558-6tm450a1.log in > /home/wilde/swiftgrid/test.swift-workers/logs.10 > > - Mike > > > 2011-08-08 16:01:27,952-0500 INFO GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=urn:0-3-14624-1-1-1312837151244) is > /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.14625.out -err stderr.txt -i -d outdir -if data.txt -of outdir/f.14625.out -k > -cdmfile -status provider -a data.txt > 2011-08-08 16:01:27,960-0500 INFO ExecutionContext Detailed exception: > Exception in cat: > Arguments: [data.txt] > Host: localhost > Directory: catsn-20110808-1558-6tm450a1/jobs/z/cat-ze1806ek > - - - > > Caused by: /autonfs/home/wilde/swiftgrid/test.swift-workers/./catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error (No such file or directory) > > at org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Sent: Sunday, August 7, 2011 3:12:47 PM > > Subject: Re: 100K job script hangs at 30K jobs > > Ok. I ran 65k jobs with a script that randomly killed and added > > workers. > > It finished fine, but it needs testing on more environments. > > > > On Sun, 2011-08-07 at 09:39 -0500, Michael Wilde wrote: > > > I'll try to trap that next chance I get, and try to ship back worker > > > logs. > > > > > > ----- Original Message ----- > > > > From: "Mihael Hategan" > > > > To: "Michael Wilde" > > > > Cc: "Swift Devel" > > > > Sent: Saturday, August 6, 2011 9:29:48 PM > > > > Subject: Re: 100K job script hangs at 30K jobs > > > > So this problem was the problem of dying workers combined with the > > > > system not noticing it and so zombie jobs would slowly fill the > > > > throttle > > > > (which was set to 10 in this case). I backported the dead worker > > > > detection code from trunk. Combined with retries, this should take > > > > care > > > > of the problem, but it may be worth looking into why the workers > > > > were > > > > dying. > > > > > > > > On Sat, 2011-08-06 at 13:34 -0500, Michael Wilde wrote: > > > > > Mihael, > > > > > > > > > > A later catsn test, started this morning, hung at 30K or 100K > > > > > catsn > > > > > jobs. > > > > > > > > > > Swift was still printing progress but not progressing beyond: > > > > > > > > > > Progress: time: Sat, 06 Aug 2011 13:29:08 -0500 Selecting > > > > > site:1014 > > > > > Submitted:10 Finished successfully:30329 > > > > > > > > > > I had stopped it earlier in the morning, then resumed it to get > > > > > a > > > > > jstack. 
> > > > > > > > > > Logs and stack traces of both the swift and coaster service JVMs > > > > > are > > > > > in: > > > > > /home/wilde/swiftgrid/test.swift-workers/logs.07 > > > > > > > > > > - Mike > > > > From ketancmaheshwari at gmail.com Mon Aug 8 19:08:53 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 8 Aug 2011 19:08:53 -0500 Subject: [Swift-devel] ssh-pbs-coasters test case on PADS. In-Reply-To: References: Message-ID: Alberto, Create an auth.defaults file in your ~/.ssh directory. Add contents of the following form: bridled.ci.uchicago.edu.type=key bridled.ci.uchicago.edu.username=urusername bridled.ci.uchicago.edu.key=/path/to/your/id_rsa bridled.ci.uchicago.edu.passphrase=yourpassphrase The permissions on this file should be 600. The above example is for bridled; you will need to add similar entries for each of the machines you are connecting from and to. About the wrapper log transfer issue: do you have provider staging turned on in your config? -- Ketan On Mon, Aug 8, 2011 at 5:14 PM, Alberto Chavez wrote: > Hello, > > I am going through the test cases for different providers in the test suite > directory, > I am manually running ssh-pbs-coasters test case with the following > command: > > swift 001-catsn-ssh-pbs-coasters.swift -tc.file tc.template.data > -sites.file sites.template.xml > > I am getting the following output: > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > RunID: 20110808-1703-mz2tcfha > Progress: time: Mon, 08 Aug 2011 17:03:49 -0500 > Progress: time: Mon, 08 Aug 2011 17:03:55 -0500 Selecting site:8 > Initializing site shared directory:1 Stage in:1 > Progress: time: Mon, 08 Aug 2011 17:03:59 -0500 Submitted:1 Failed but > can retry:9 > Failed to transfer wrapper log for job cat-2jvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:02 -0500 Stage in:1 Failed but can > retry:9 > Failed to transfer wrapper log for job cat-uivs26ek > Failed to transfer wrapper log for job cat-xivs26ek > Failed to transfer wrapper log for job cat-zivs26ek >
Failed to transfer wrapper log for job cat-4jvs26ek > Failed to transfer wrapper log for job cat-0jvs26ek > Failed to transfer wrapper log for job cat-yivs26ek > Progress: time: Mon, 08 Aug 2011 17:04:03 -0500 Stage in:1 Submitting:1 > Failed but can retry:8 > Failed to transfer wrapper log for job cat-vivs26ek > Failed to transfer wrapper log for job cat-1jvs26ek > Failed to transfer wrapper log for job cat-3jvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:04 -0500 Stage in:1 Submitting:1 > Failed but can retry:8 > Progress: time: Mon, 08 Aug 2011 17:04:07 -0500 Submitting:1 Submitted:1 > Failed but can retry:8 > Failed to transfer wrapper log for job cat-6jvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:09 -0500 Stage in:1 Failed but can > retry:9 > Failed to transfer wrapper log for job cat-8jvs26ek > Failed to transfer wrapper log for job cat-ajvs26ek > Failed to transfer wrapper log for job cat-cjvs26ek > Failed to transfer wrapper log for job cat-ejvs26ek > Failed to transfer wrapper log for job cat-gjvs26ek > Failed to transfer wrapper log for job cat-ijvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:10 -0500 Stage in:1 Submitting:1 > Failed but can retry:8 > Failed to transfer wrapper log for job cat-kjvs26ek > Failed to transfer wrapper log for job cat-mjvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:11 -0500 Stage in:1 Failed but can > retry:9 > Failed to transfer wrapper log for job cat-ojvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:16 -0500 Submitting:1 Submitted:1 > Failed but can retry:8 > Failed to transfer wrapper log for job cat-qjvs26ek > Progress: time: Mon, 08 Aug 2011 17:04:17 -0500 Failed:1 Failed but can > retry:9 > > these are the contents of sites.template.xml file: > > > > > > 3000 > 8 > 1 > 1 > 10 > short > 0.5 > 10000 > /home/achavez/swiftwork > > > > and this is the swiftscript that I am trying to run: > > type file; > > app (file o) cat (file i) > { > cat @i stdout=@o; > } > > string t = > 
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; > string char[] = @strsplit(t, ""); > > file out[]; > foreach j in [1:@toint(@arg("n","10"))] { > file data<"data.txt">; > out[j] = cat(data); > } > > > I am pretty sure the test is failing, and I guess that it's something wrong > on my side, I just don't know what that is, so any help figuring out what > I'm doing wrong will be strongly appreciated. > > Everytime I run the test, a dialog box pops up and asks me for my username > to login on pads, and then it asks for my password, > then it shows the messages: > Failed to transfer wrapper log for job XXXXX > and then asks three more times for my username and password. > > Thank you, > > Alberto. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Aug 8 19:31:52 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 17:31:52 -0700 Subject: [Swift-devel] ssh-pbs-coasters test case on PADS. In-Reply-To: References:

Message-ID: <1312849912.14688.1.camel@blabla> On Mon, 2011-08-08 at 19:08 -0500, Ketan Maheshwari wrote: > Alberto, > > > Create an auth.defaults file in your ~/.ssh directory. > > > Add contents of the following form: > > > bridled.ci.uchicago.edu.type=key > bridled.ci.uchicago.edu.username=urusername > bridled.ci.uchicago.edu.key=/path/to/your/id_rsa > bridled.ci.uchicago.edu.passphrase=yourpassphrase Right. If you feel that having your passphrase there is not OK, you can omit it, but you will get a prompt for it. I think you should only get the prompt once, but I've seen it pop up twice in the same run, so I'm going to check that. From wilde at mcs.anl.gov Mon Aug 8 20:39:34 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 8 Aug 2011 20:39:34 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1312842195.13185.1.camel@blabla> Message-ID: <298081939.202648.1312853974463.JavaMail.root@zimbra.anl.gov> I'm now running Swift svn swift-r4965 cog-r3225. A 100K-catsn script ran to completion. Then a 500K-catsn script terminated at ~15K jobs with the error below.
Logs are in /home/wilde/swiftgrid/test.swift-workers Failing run was *pe.log - Mike 2011-08-08 18:37:59,452-0500 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=cat-1fkb66ek thread=0-3-29294-1-1 host=localhost replicati\ onGroup=8shb66ek 2011-08-08 18:37:59,452-0500 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=cat-2fkb66ek thread=0-3-29296-1-1 host=localhost replicati\ onGroup=9shb66ek 2011-08-08 18:37:59,452-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-eakb66ek - Application exception: Task failed: Conn\ ection to worker lost java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:124) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.send(AbstractStreamKarajanChannel.j\ ava:305) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.ja\ va:251) 2011-08-08 18:37:59,452-0500 INFO GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=urn:0-3-29290-1-1-1312846318323) is\ /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.29291.out -err stderr.txt -i -d outdir -if data.txt -of outdir/f.29291.out \ -k -cdmfile -status provider -a data.txt 2011-08-08 18:37:59,452-0500 INFO vdl:execute START thread=0-3-30899-1 tr=cat 2011-08-08 18:37:59,455-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-oakb66ek - Application exception: Task failed: Conn\ ection to worker lost java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:124) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.send(AbstractStreamKarajanChannel.j\ ava:305) at 
org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.ja\ va:251) ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Monday, August 8, 2011 5:23:15 PM > Subject: Re: New 0.93 problem: .error No such file or directory > On Mon, 2011-08-08 at 16:29 -0500, Michael Wilde wrote: > > catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error (No such file or > > directory) > > (partial traceback below). > > > > Is this related to your change on handling of the status file? > > Yes, but I thought I fixed it. Make sure you have at least swift > r4963. > > > > > I was seeing the same error on sporadic, shorter tests last night > > but did not yet have a chance to investigate. > > > > The full log for this error is catsn-20110808-1558-6tm450a1.log in > > /home/wilde/swiftgrid/test.swift-workers/logs.10 > > > > - Mike > > > > > > 2011-08-08 16:01:27,952-0500 INFO GridExec TASK_DEFINITION: > > Task(type=JOB_SUBMISSION, identity=urn:0-3-14624-1-1-1312837151244) > > is \ > > /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.14625.out > > -err stderr.txt -i -d outdir -if data.txt -of outdir/f.14625.out -k\ > > -cdmfile -status provider -a data.txt > > 2011-08-08 16:01:27,960-0500 INFO ExecutionContext Detailed > > exception: > > Exception in cat: > > Arguments: [data.txt] > > Host: localhost > > Directory: catsn-20110808-1558-6tm450a1/jobs/z/cat-ze1806ek > > - - - > > > > Caused by: > > /autonfs/home/wilde/swiftgrid/test.swift-workers/./catsn-20110808-1558-6tm450a1.d/cat-ze1806ek.error > > (No such file or dir\ > > ectory) > > > > at > > org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > 
> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > > To: "Michael Wilde" > > > Sent: Sunday, August 7, 2011 3:12:47 PM > > > Subject: Re: 100K job script hangs at 30K jobs > > > Ok. I ran 65k jobs with a script that randomly killed and added > > > workers. > > > It finished fine, but it needs testing on more environments. > > > > > > On Sun, 2011-08-07 at 09:39 -0500, Michael Wilde wrote: > > > > I'll try to trap that next chance I get, and try to ship back > > > > worker > > > > logs. > > > > > > > > ----- Original Message ----- > > > > > From: "Mihael Hategan" > > > > > To: "Michael Wilde" > > > > > Cc: "Swift Devel" > > > > > Sent: Saturday, August 6, 2011 9:29:48 PM > > > > > Subject: Re: 100K job script hangs at 30K jobs > > > > > So this problem was the problem of dying workers combined with > > > > > the > > > > > system not noticing it and so zombie jobs would slowly fill > > > > > the > > > > > throttle > > > > > (which was set to 10 in this case). I backported the dead > > > > > worker > > > > > detection code from trunk. Combined with retries, this should > > > > > take > > > > > care > > > > > of the problem, but it may be worth looking into why the > > > > > workers > > > > > were > > > > > dying. > > > > > > > > > > On Sat, 2011-08-06 at 13:34 -0500, Michael Wilde wrote: > > > > > > Mihael, > > > > > > > > > > > > A later catsn test, started this morning, hung at 30K or > > > > > > 100K > > > > > > catsn > > > > > > jobs. 
> > > > > > > > > > > > Swift was still printing progress but not progressing > > > > > > beyond: > > > > > > > > > > > > Progress: time: Sat, 06 Aug 2011 13:29:08 -0500 Selecting > > > > > > site:1014 > > > > > > Submitted:10 Finished successfully:30329 > > > > > > > > > > > > I had stopped it earlier in the morning, then resumed it to > > > > > > get > > > > > > a > > > > > > jstack. > > > > > > > > > > > > Logs and stack traces of both the swift and coaster service > > > > > > JVMs > > > > > > are > > > > > > in: > > > > > > /home/wilde/swiftgrid/test.swift-workers/logs.07 > > > > > > > > > > > > - Mike > > > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Mon Aug 8 20:58:24 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 18:58:24 -0700 Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <298081939.202648.1312853974463.JavaMail.root@zimbra.anl.gov> References: <298081939.202648.1312853974463.JavaMail.root@zimbra.anl.gov> Message-ID: <1312855112.15215.0.camel@blabla> On Mon, 2011-08-08 at 20:39 -0500, Michael Wilde wrote: > Im now running Swift svn swift-r4965 cog-r3225 > > A 100K-catsn script ran to completion. > > Then a 500K-catsn script terminated at ~ 15K jobs with the error below. > > Logs are in /home/wilde/swiftgrid/test.swift-workers Judging from the error message, your workers are dying for unknown reasons. I see only two applications that failed (and they have distinct arguments), so I'm guessing you turned off retries. At 2/15K failure probability, if you set retries to at least 1, you would get a dramatic decrease in the odds that the failure will happen twice for the same app. Do you know where swork:14 and swork:29 ran? (it may be useful to name workers based on their site). Also, if you want to troubleshoot the workers, worker logging may help. 
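Mihael's retry argument can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope model, not anything Swift itself computes: it assumes every attempt fails independently with the same probability p, estimated from the 2 failures seen in roughly 15K jobs, and the function names are made up for illustration.

```python
# Back-of-the-envelope model of the retry argument (illustrative only).
# Assumption: every attempt fails independently with the same probability p.

def app_failure_probability(p: float, retries: int) -> float:
    """Probability that an app fails on its first attempt and on every retry."""
    return p ** (retries + 1)


def expected_failed_apps(p: float, retries: int, num_jobs: int) -> float:
    """Expected number of apps that exhaust all of their attempts."""
    return num_jobs * app_failure_probability(p, retries)


if __name__ == "__main__":
    p = 2 / 15_000  # per-attempt failure rate estimated from the 2/15K figure
    for retries in (0, 1, 5):
        print(f"retries={retries}: "
              f"P(app fails)={app_failure_probability(p, retries):.3g}, "
              f"expected failed apps in 500K jobs="
              f"{expected_failed_apps(p, retries, 500_000):.3g}")
```

Under that independence assumption, a 500K-job run with no retries would expect about 67 apps to fail outright, while a single retry drops that to about 0.009. If failures are correlated (for example, one site whose workers keep dying), the real improvement will be smaller.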
Mihael From ketancmaheshwari at gmail.com Mon Aug 8 21:44:18 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 8 Aug 2011 21:44:18 -0500 Subject: [Swift-devel] int to string Message-ID: Hello, I was wondering if we can convert an int to string in Swift. I think @tostr method doesn't exist. Any clues? -- Ketan From hategan at mcs.anl.gov Mon Aug 8 21:52:54 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 19:52:54 -0700 Subject: [Swift-devel] int to string In-Reply-To: References: Message-ID: <1312858374.15649.1.camel@blabla> On Mon, 2011-08-08 at 21:44 -0500, Ketan Maheshwari wrote: > Hello, > > > I was wondering if we can convert an int to string in Swift. I think > @tostr method doesn't exist. Strcat will do implicit conversion of its arguments to string. So @strcat(2) should work. Though we should have @tostr. From wilde at mcs.anl.gov Mon Aug 8 21:54:35 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 8 Aug 2011 21:54:35 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1312855112.15215.0.camel@blabla> Message-ID: <1602901548.202696.1312858475726.JavaMail.root@zimbra.anl.gov> > Judging from the error message, your workers are dying for unknown > reasons. I see only two applications that failed (and they have > distinct > arguments), so I'm guessing you turned off retries. At 2/15K failure > probability, if you set retries to at least 1, you would get a > dramatic > decrease in the odds that the failure will happen twice for the same > app. Good idea, will do. So I just realized what's happening here. Workers can fail (i.e., you tested killing them, you said) and Swift will keep running, *but* the apps that were running on failed workers receive failures and need to get retried through normal retry, as if the apps themselves had failed, correct? That just dawned on me.
> Do you know where swork:14 and swork:29 ran? (it may be useful to name > workers based on their site). Good idea, will do. > Also, if you want to troubleshoot the workers, worker logging may > help. I have worker logging on; I'm not sure why I'm not (yet) getting the logs back. My Condor jobs are coded to transfer the worker log back after workers exit. I'll try to get these logs. I saw two apps fail because the site didn't set OSG_WN_TMP (where I place the logs). I thought that in those two cases the worker never started, but perhaps those two failures are related to these two app failures. More digging. - Mike > > Mihael -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Mon Aug 8 21:58:59 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 19:58:59 -0700 Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1602901548.202696.1312858475726.JavaMail.root@zimbra.anl.gov> References: <1602901548.202696.1312858475726.JavaMail.root@zimbra.anl.gov> Message-ID: <1312858739.15790.3.camel@blabla> On Mon, 2011-08-08 at 21:54 -0500, Michael Wilde wrote: > > Judging from the error message, your workers are dying for unknown > > reasons. I see only two applications that failed (and they have > > distinct > > arguments), so I'm guessing you turned off retries. At 2/15K failure > > probability, if you set retries to at least 1, you would get a > > dramatic > > decrease in the odds that the failure will happen twice for the same > > app. > > Good idea, will do. > > So I just realized what's happening here. Workers can fail (i.e., you > tested killing them, you said) and Swift will keep running, *but* the > apps that were running on failed workers receive failures and need to > get retried through normal retry, as if the apps themselves had > failed, correct? That just dawned on me. Yep. [...]
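The mechanism discussed in this thread (dead workers going unnoticed, so "zombie" jobs slowly fill a throttle of 10, versus lost-worker failures being noticed and sent through normal retry) can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not how Swift or the coaster service actually schedules jobs: it runs attempts one at a time, kills the worker on every `dies_every`-th attempt, and treats the throttle only as a cap on how many slots zombies can hold, since live jobs always give their slot back. The function `simulate` and all of its parameters are invented for illustration.

```python
from collections import deque


def simulate(num_jobs: int, throttle: int, dies_every: int,
             detect_dead_workers: bool, retries: int) -> tuple:
    """Toy model of worker deaths versus the job throttle (hypothetical).

    Every `dies_every`-th attempt lands on a worker that dies mid-job.
    Without dead-worker detection that attempt never reports back, so its
    throttle slot is held forever (a zombie). With detection, the attempt
    fails ("connection to worker lost") and goes through normal retry.

    Returns (finished, zombies, failed).
    """
    queue = deque((job, 0) for job in range(num_jobs))  # (job id, retries used)
    attempt = finished = zombies = failed = 0
    while queue:
        if zombies >= throttle:
            break  # every slot is held by a zombie: the run hangs
        job, used = queue.popleft()
        attempt += 1
        if attempt % dies_every != 0:
            finished += 1                  # normal completion frees the slot
        elif not detect_dead_workers:
            zombies += 1                   # slot is never released
        elif used < retries:
            queue.append((job, used + 1))  # failure noticed; retry later
        else:
            failed += 1                    # retries exhausted
    return finished, zombies, failed


if __name__ == "__main__":
    print("no detection:", simulate(100, 10, 7, False, 0))
    for r in (0, 1, 2):
        print(f"detection, retries={r}:", simulate(100, 10, 7, True, r))
```

With these made-up numbers the undetected case stalls after 60 finished jobs with all 10 slots held by zombies (compare the stuck "Submitted:10" progress line quoted earlier), detection without retries finishes 86 jobs and fails 14, and detection with one or two retries finishes 98 and 100 jobs respectively.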
> > I saw two apps fail because the site didn't set OSG_WN_TMP (where I > place the logs). I thought that in those two cases the worker never > started, but perhaps those two failures are related to these two app > failures. In your case there is an actual TCP connection, so the workers must have started. From ketancmaheshwari at gmail.com Mon Aug 8 22:04:25 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 8 Aug 2011 22:04:25 -0500 Subject: [Swift-devel] Exception whilst logging dataset Message-ID: Hi, Testing 0.93 on Beagle, I am seeing this exception for the modftdock script: Exception whilst logging dataset content for ?:string = 100 - Closed java.lang.NullPointerException at org.griphyn.vdl.mapping.RootDataNode.getMapper(RootDataNode.java:213) at org.griphyn.vdl.mapping.AbstractDataNode.logContent(AbstractDataNode.java:460) at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:422) at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:361) at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:221) at org.griphyn.vdl.mapping.RootDataNode.newNode(RootDataNode.java:27) at org.griphyn.vdl.karajan.lib.swiftscript.FnArg.function(FnArg.java:71) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at
org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:309) at java.util.concurrent.FutureTask.run(FutureTask.java:149) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) at java.lang.Thread.run(Thread.java:736) I see this for all the variables in the script. The script runs to completion, however. ftdock.swift attached. I tried commenting out the trace and converting the int variable "mod_index" to a string, but the exception persisted. -- Ketan -------------- next part -------------- A non-text attachment was scrubbed... Name: ftdock.swift Type: application/octet-stream Size: 1459 bytes Desc: not available URL: From wilde at mcs.anl.gov Mon Aug 8 23:14:22 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 8 Aug 2011 23:14:22 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1312858739.15790.3.camel@blabla> Message-ID: <1913671608.202766.1312863262707.JavaMail.root@zimbra.anl.gov> OK, with retry on, the same run has now passed 250K jobs, and retried 2 failures successfully. It's running at about 100 jobs/sec on about 38 workers over 22 sites. Once this tests out I'll increase the number of workers.
- Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Monday, August 8, 2011 9:58:59 PM > Subject: Re: New 0.93 problem: .error No such file or directory > On Mon, 2011-08-08 at 21:54 -0500, Michael Wilde wrote: > > > Judging from the error message, your workers are dying for unknown > > > reasons. I see only two applications that failed (and they have > > > distinct > > > arguments), so I'm guessing you turned off retries. At 2/15K > > > failure > > > probability, if you set retries to at least 1, you would get a > > > dramatic > > > decrease in the odds that the failure will happen twice for the > > > same > > > app. > > > > Good idea, will do. > > > > So I just realized what's happening here. Workers can fail (i.e., you > > tested killing them, you said) and Swift will keep running, *but* > > the > > apps that were running on failed workers receive failures and need > > to > > get retried through normal retry, as if the apps themselves had > > failed, correct? That just dawned on me. > > Yep. > > [...] > > > > I saw two apps fail because the site didn't set OSG_WN_TMP (where I > > place the logs). I thought that in those two cases the worker never > > started, but perhaps those two failures are related to these two app > > failures. > > In your case there is an actual TCP connection, so the workers must > have started. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jonmon at mcs.anl.gov Mon Aug 8 23:18:41 2011 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Mon, 08 Aug 2011 23:18:41 -0500 Subject: [Swift-devel] int to string Message-ID: <20110809041823.9B0CF124A1@zimbra.anl.gov> I have been wanting this function in Swift. I had a need for it a while back but came up with a workaround. I can't remember exactly what I needed it for, though.
----- Reply message ----- From: "Mihael Hategan" Date: Mon, Aug 8, 2011 9:52 pm Subject: [Swift-devel] int to string To: "Ketan Maheshwari" Cc: "Swift Devel" On Mon, 2011-08-08 at 21:44 -0500, Ketan Maheshwari wrote: > Hello, > > > I was wondering if we can convert an int to string in Swift. I think > @tostr method doesn't exist. Strcat will do implicit conversion of its arguments to string. So @strcat(2) should work. Though we should have @tostr. From hategan at mcs.anl.gov Tue Aug 9 01:23:35 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 08 Aug 2011 23:23:35 -0700 Subject: [Swift-devel] Exception whilst logging dataset In-Reply-To: References: Message-ID: <1312871015.17110.0.camel@blabla> Yep. I can reproduce this. In the meantime, if you need to run stuff, disable provenance logging in swift.properties.
On Mon, 2011-08-08 at 22:04 -0500, Ketan Maheshwari wrote: > Hi, > > > Testing 0.93 on Beagle, I am seeing this exception for modftdock > script: > > > Exception whilst logging dataset content for ?:string = 100 - Closed > java.lang.NullPointerException > at > org.griphyn.vdl.mapping.RootDataNode.getMapper(RootDataNode.java:213) > at > org.griphyn.vdl.mapping.AbstractDataNode.logContent(AbstractDataNode.java:460) > at > org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:422) > at > org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:361) > at > org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:221) > at org.griphyn.vdl.mapping.RootDataNode.newNode(RootDataNode.java:27) > at > org.griphyn.vdl.karajan.lib.swiftscript.FnArg.function(FnArg.java:71) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > 
org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:452) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:309) > at java.util.concurrent.FutureTask.run(FutureTask.java:149) > at java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:897) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:919) > at java.lang.Thread.run(Thread.java:736) > > > I see this for all the variables in the script. > > > The script runs to completion, however. > > > ftdock.swift attached. > > > I tried commenting out the trace and converting the int variable > "mod_index" to a string, but the exception persisted. > > -- > Ketan From hategan at mcs.anl.gov Tue Aug 9 04:34:13 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 02:34:13 -0700 Subject: [Swift-devel] Exception whilst logging dataset In-Reply-To: <1312871015.17110.0.camel@blabla> References: <1312871015.17110.0.camel@blabla> Message-ID: <1312882453.1194.0.camel@blabla> Fixed in swift r4966. On Mon, 2011-08-08 at 23:23 -0700, Mihael Hategan wrote: > Yep. I can reproduce this. > > In the meantime, if you need to run stuff, disable provenance logging > in swift.properties.
> > On Mon, 2011-08-08 at 22:04 -0500, Ketan Maheshwari wrote: > > Hi, > > > > > > Testing 0.93 on Beagle, I am seeing this exception for modftdock > > script: > > > > > > Exception whilst logging dataset content for ?:string = 100 - Closed > > java.lang.NullPointerException > > at > > org.griphyn.vdl.mapping.RootDataNode.getMapper(RootDataNode.java:213) > > at > > org.griphyn.vdl.mapping.AbstractDataNode.logContent(AbstractDataNode.java:460) > > at > > org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:422) > > at > > org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:361) > > at > > org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:221) > > at org.griphyn.vdl.mapping.RootDataNode.newNode(RootDataNode.java:27) > > at > > org.griphyn.vdl.karajan.lib.swiftscript.FnArg.function(FnArg.java:71) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > 
org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:452) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:309) > > at java.util.concurrent.FutureTask.run(FutureTask.java:149) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.runTask(ThreadPoolExecutor.java:897) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:919) > > at java.lang.Thread.run(Thread.java:736) > > > > > > I see this for all the variables in the script. > > > > > > The script runs to completion, however. > > > > > > ftdock.swift attached. > > > > > > I tried commenting out the trace and converting the int variable > > "mod_index" to a string, but the exception persisted. > > > > -- > > Ketan From wilde at mcs.anl.gov Tue Aug 9 07:16:39 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 07:16:39 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1913671608.202766.1312863262707.JavaMail.root@zimbra.anl.gov> Message-ID: <1979181272.203035.1312892199200.JavaMail.root@zimbra.anl.gov> I stopped this run and started a larger one: 5M catsn jobs to a pool of 300-400 workers (varies over time). It finished 2.2M and was still running, albeit slowly, when I ended it. The job rate ramped up quickly as the external QueueN script obtained workers.
After about 15 mins had obtained 80 workers and seemed to be running at several hundred tasks per second. I had moved all the test clients, IO, and logging to local hard disk on communicado for speed. I set a retry count of 5, and turned on lazy failure mode. After about 6 hours, the test had passed 2.2M jobs and was still progressing, but seemed to have drastically slowed down from its earlier rate. Seemed to have dropped below a few jobs per second. Possibly it ate through its throttle due to failed/hung workers. The throttle was 300 jobs, and it seemed have about 400 running workers (the QueueN algorithm was grabbing more workers than the artificial "demand" I had set of 250). I then killed the run and captured all the logs, including jstacks and a trace of top output every minute. Mainly because I wanted to free up the workers and study the run before continuing. I see about 3 worker failure scenarios in the Condor logs: 1) _swiftwrap.staging: line 331: warning: here-document at line 303 delimited by end-of-file (wanted `$STDERR') 2) com$ cat 2.err Send failed: Transport endpoint is not connected at ./worker.pl line 384. com$ cat 2.out OSG_WN_TMP=/state/partition1/tmp === contact: http://communicado.ci.uchicago.edu:56323 === name: Firefly Running in dir /grid_home/engage/gram_scratch_7Xkg2fpMUc === cwd: /grid_home/engage/gram_scratch_7Xkg2fpMUc === logdir: /state/partition1/tmp/Firefly.workerdir.Q18464 =============================================== === exit: worker.pl exited with code=107 === worker log - last 1000 lines: ==> /state/partition1/tmp/Firefly.workerdir.Q18464/worker-Firefly.log <== 1312882398.535 INFO - Firefly Logging started: Tue Aug 9 04:33:18 2011 1312882398.535 INFO - Running on node c1511.local 1312882398.535 INFO - Connecting (0)... 1312882398.566 INFO - Connected 1312882398.604 INFO 000101 Registration successful. 
ID=000101 1312890065.197 WARN 000101 Send failed: Transport endpoint is not connected com$ 3) occurred only once or twice, and I still need to hunt it down. ---- I see 1234 messages containing "worker lost", like: 2011-08-09 01:50:03,438-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-fld0l6ek - Application exception: Task failed: Connection to worker lost 1234 is >> the throttle of 300, so it seems to be running past that problem. I'll investigate more, but since it's working so well I first need to get the application users who are waiting on this up and running. I wonder if these issues will show up in the local stress testing on the MCS hosts that Alberto and Ketan are working on. - Mike ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Monday, August 8, 2011 11:14:22 PM > Subject: Re: New 0.93 problem: .error No such file or directory > OK, with retry on, the same run has now passed 250K jobs, and retried > 2 failures successfully. It's running at about 100 jobs/sec to about 38 > workers over 22 sites. > > Once this tests out I'll increase the number of workers. > > - Mike > > > ----- Original Message ----- > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Cc: "Swift Devel" > > Sent: Monday, August 8, 2011 9:58:59 PM > > Subject: Re: New 0.93 problem: .error No such file or > > directory > > On Mon, 2011-08-08 at 21:54 -0500, Michael Wilde wrote: > > > > Judging from the error message, your workers are dying for > > > > unknown > > > > reasons. I see only two applications that failed (and they have > > > > distinct > > > > arguments), so I'm guessing you turned off retries. At 2/15K > > > > failure > > > > probability, if you set retries to at least 1, you would get a > > > > dramatic > > > > decrease in the odds that the failure will happen twice for the > > > > same > > > > app. > > > > > > Good idea, will do. > > > > > > So I just realized what's happening here.
Workers can fail (i.e., you > > > tested killing them, you said) and Swift will keep running, *but* > > > the > > > apps that were running on failed workers receive failures and need > > > to > > > get retried through normal retry, as if the apps themselves had > > > failed, correct? That just dawned on me. > > > > Yep. > > > > [...] > > > > > > I saw two apps fail because the site didn't set OSG_WN_TMP (where I > > > place the logs). I thought that in those two cases the worker > > > never > > > started, but perhaps those two failures are related to these two > > > app > > > failures. > > > > In your case there are actual TCP connections, so the workers must > > have started. > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Tue Aug 9 07:25:41 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 07:25:41 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1979181272.203035.1312892199200.JavaMail.root@zimbra.anl.gov> Message-ID: <1774137108.203043.1312892741224.JavaMail.root@zimbra.anl.gov> Forgot to mention two things: - the logs are on communicado in local dir /scratch/local/wilde/swift/test.swift-workers/logs.14 - this is a really cool milestone: 2.2M jobs and counting from one Swift script to OSG; about 20 mins into the run it was pushing 138 jobs/sec in one arbitrary 10 min period that I looked at. Nice work, Mihael! - Mike ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Tuesday, August 9, 2011 7:16:39 AM > Subject: Re: [Swift-devel] New 0.93 problem: .error No such file or directory > I stopped this run and started a larger one: 5M catsn jobs to a pool > of 300-400 workers (varies over time).
It finished 2.2M and was still > running, albeit slowly, when I ended it. > > The job rate ramped up quickly as the external QueueN script obtained > workers. After about 15 mins it had obtained 80 workers and seemed to be > running at several hundred tasks per second. I had moved all the test > clients, IO, and logging to local hard disk on communicado for speed. > I set a retry count of 5 and turned on lazy failure mode. > > After about 6 hours, the test had passed 2.2M jobs and was still > progressing, but seemed to have slowed down drastically from its > earlier rate; it seemed to have dropped below a few jobs per second. > Possibly it ate through its throttle due to failed/hung workers. > > The throttle was 300 jobs, and it seemed to have about 400 running > workers (the QueueN algorithm was grabbing more workers than the > artificial "demand" of 250 I had set). > > I then killed the run and captured all the logs, including jstacks and > a trace of top output every minute, mainly because I wanted to free up > the workers and study the run before continuing. > > I see about 3 worker failure scenarios in the Condor logs: > > 1) _swiftwrap.staging: line 331: warning: here-document at line 303 > delimited by end-of-file (wanted `$STDERR') > > 2) com$ cat 2.err > Send failed: Transport endpoint is not connected at ./worker.pl line > 384.
> com$ cat 2.out > OSG_WN_TMP=/state/partition1/tmp > === contact: http://communicado.ci.uchicago.edu:56323 > === name: Firefly Running in dir > /grid_home/engage/gram_scratch_7Xkg2fpMUc > === cwd: /grid_home/engage/gram_scratch_7Xkg2fpMUc > === logdir: /state/partition1/tmp/Firefly.workerdir.Q18464 > =============================================== > === exit: worker.pl exited with code=107 > === worker log - last 1000 lines: > > ==> /state/partition1/tmp/Firefly.workerdir.Q18464/worker-Firefly.log > <== > 1312882398.535 INFO - Firefly Logging started: Tue Aug 9 04:33:18 2011 > 1312882398.535 INFO - Running on node c1511.local > 1312882398.535 INFO - Connecting (0)... > 1312882398.566 INFO - Connected > 1312882398.604 INFO 000101 Registration successful. ID=000101 > 1312890065.197 WARN 000101 Send failed: Transport endpoint is not > connected > com$ > > 3) occurred only once or twice, and I still need to hunt it down. > > ---- > > I see 1234 messages containing "worker lost", like: > 2011-08-09 01:50:03,438-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION > jobid=cat-fld0l6ek - Application exception: Task failed: Connection > to worker lost > > 1234 is >> the throttle of 300, so it seems to be running past that > problem. > > I'll investigate more, but since it's working so well I first need to > get the application users who are waiting on this up and running. I wonder if > these issues will show up in the local stress testing on the MCS hosts > that Alberto and Ketan are working on. > > - Mike > > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Mihael Hategan" > > Cc: "Swift Devel" > > Sent: Monday, August 8, 2011 11:14:22 PM > > Subject: Re: New 0.93 problem: .error No such file or > > directory > > OK, with retry on, the same run has now passed 250K jobs, and > > retried > > 2 failures successfully. It's running at about 100 jobs/sec to about > > 38 > > workers over 22 sites. > > > > Once this tests out I'll increase the number of workers.
> > > > - Mike > > > > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > > To: "Michael Wilde" > > > Cc: "Swift Devel" > > > Sent: Monday, August 8, 2011 9:58:59 PM > > > Subject: Re: New 0.93 problem: .error No such file or > > > directory > > > On Mon, 2011-08-08 at 21:54 -0500, Michael Wilde wrote: > > > > > Judging from the error message, your workers are dying for > > > > > unknown > > > > > reasons. I see only two applications that failed (and they > > > > > have > > > > > distinct > > > > > arguments), so I'm guessing you turned off retries. At 2/15K > > > > > failure > > > > > probability, if you set retries to at least 1, you would get a > > > > > dramatic > > > > > decrease in the odds that the failure will happen twice for > > > > > the > > > > > same > > > > > app. > > > > > > > > Good idea, will do. > > > > > > > > So I just realized what's happening here. Workers can fail (i.e., > > > > you > > > > tested killing them, you said) and Swift will keep running, > > > > *but* > > > > the > > > > apps that were running on failed workers receive failures and > > > > need > > > > to > > > > get retried through normal retry, as if the apps themselves had > > > > failed, correct? That just dawned on me. > > > > > > Yep. > > > > > > [...] > > > > > > > > I saw two apps fail because the site didn't set OSG_WN_TMP (where > > > > I > > > > place the logs). I thought that in those two cases the worker > > > > never > > > > started, but perhaps those two failures are related to these two > > > > app > > > > failures. > > > > > > In your case there are actual TCP connections, so the workers > > > must > > > have started.
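Mihael's retry argument can be checked with a back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes every attempt fails independently with the observed rate of 2 failures in about 15K jobs, which is optimistic if worker losses are correlated (for example, a whole site dropping out at once). The function names are hypothetical.

```python
from fractions import Fraction


def prob_app_fails(p_attempt, retries):
    """Probability that an app fails on its first attempt and on every retry.

    Assumes each attempt fails independently with probability p_attempt.
    """
    return p_attempt ** (retries + 1)


def expected_failed_apps(n_apps, p_attempt, retries):
    """Expected number of apps that exhaust all of their retries."""
    return n_apps * prob_app_fails(p_attempt, retries)


if __name__ == "__main__":
    p = Fraction(2, 15000)  # observed: 2 failures out of ~15K jobs
    for retries in (0, 1, 5):
        odds = prob_app_fails(p, retries)
        lost = expected_failed_apps(15000, p, retries)
        print(f"retries={retries}: P(app fails)={float(odds):.3g}, "
              f"expected failed apps per 15K={float(lost):.3g}")
```

Under the independence assumption, a single retry takes the per-app failure probability from about 1.3e-4 to about 1.8e-8, which is the "dramatic decrease" described above.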
> > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From yadudoc1729 at gmail.com Tue Aug 9 08:51:00 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 9 Aug 2011 19:21:00 +0530 Subject: [Swift-devel] Overwriting procedures in swift. Message-ID: Hi, I've been working on implementing a feature that allows calling a function by its string identifier. During the discussion with Mihael, we found that Swift allows us to redefine a function without any complaint. We think this is a bug and a check should be put in to prevent this. Input on this is welcome. E.g.: (int o) f (int i){ o=i; } (int z) f (int x){ z= x*5; } trace ( f(8) ); This outputs 40, whereas Swift should throw an error. -- Thanks and Regards, Yadu Nand B From benc at hawaga.org.uk Tue Aug 9 09:21:15 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 9 Aug 2011 16:21:15 +0200 Subject: [Swift-devel] Overwriting procedures in swift. In-Reply-To: References: Message-ID: <93D83E81-B514-4B8A-86BF-F7E1C306184E@hawaga.org.uk> On Aug 9, 2011, at 3:51 PM, Yadu Nand wrote: > I've been working on implementing a feature > that allows calling a function by its string identifier. During > the discussion with Mihael, we found that Swift allows us to > redefine a function without any complaint. We think this is a bug > and a check should be put in to prevent this. Input on this is > welcome.
Yes, I think this is a bug and a check should be put in to prevent this. Ben From yadudoc1729 at gmail.com Tue Aug 9 09:30:18 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 9 Aug 2011 20:00:18 +0530 Subject: [Swift-devel] Overwriting procedures in swift. In-Reply-To: <93D83E81-B514-4B8A-86BF-F7E1C306184E@hawaga.org.uk> References: <93D83E81-B514-4B8A-86BF-F7E1C306184E@hawaga.org.uk> Message-ID: > Yes, I think this is a bug and a check should be put in to prevent this. Great :) Can someone review the patch attached, please? -- Thanks and Regards, Yadu Nand B -------------- next part -------------- A non-text attachment was scrubbed... Name: overwrite.patch Type: text/x-patch Size: 1181 bytes Desc: not available URL: From benc at hawaga.org.uk Tue Aug 9 09:33:06 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 9 Aug 2011 14:33:06 +0000 (GMT) Subject: [Swift-devel] Overwriting procedures in swift. In-Reply-To: References: <93D83E81-B514-4B8A-86BF-F7E1C306184E@hawaga.org.uk> Message-ID: what happens with case? (and, what *should* happen with case?) I think karajan identifiers are case insensitive (?) but this patch looks like it is case-sensitive. -- http://www.hawaga.org.uk/ben/ From yadudoc1729 at gmail.com Tue Aug 9 10:01:22 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 9 Aug 2011 20:31:22 +0530 Subject: [Swift-devel] Overwriting procedures in swift. In-Reply-To: References: <93D83E81-B514-4B8A-86BF-F7E1C306184E@hawaga.org.uk> Message-ID: > what happens with case? (and, what *should* happen with case?) (int o) f ( int i) { o = i; } (int z) F (int a){ z = a * 5 ; } trace ( f (5) , F(5) ); for the above snippet, trace returns 25, 25. So F is overwriting f anyway. I don't think this is right. > I think karajan identifiers are case insensitive (?) but this patch looks > like it is case-sensitive. Fixed it. Please check the new patch attached. 
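The check discussed here amounts to rejecting a second declaration of the same procedure name, with Ben's question about case sensitivity deciding how names are compared. The actual patches (overwrite.patch, overwrite_case_sensitive.patch) are not reproduced in the archive, so the Python below is only a hypothetical illustration of the idea. ProcedureTable and DuplicateProcedureError are invented names, and the case-insensitive comparison follows Ben's unconfirmed remark that Karajan identifiers may be case-insensitive.

```python
class DuplicateProcedureError(Exception):
    """Raised when a procedure name is declared a second time."""


class ProcedureTable:
    """Hypothetical procedure registry that refuses redefinitions.

    Names are compared case-insensitively, on the assumption (raised in the
    thread, not confirmed) that Karajan identifiers ignore case.
    """

    def __init__(self):
        self._procs = {}  # normalized name -> (name as written, body)

    def declare(self, name, body):
        key = name.casefold()
        if key in self._procs:
            earlier, _ = self._procs[key]
            raise DuplicateProcedureError(
                f"procedure '{name}' redefines '{earlier}'")
        self._procs[key] = (name, body)

    def lookup(self, name):
        return self._procs[name.casefold()][1]


if __name__ == "__main__":
    table = ProcedureTable()
    table.declare("f", "o = i")
    try:
        table.declare("F", "z = a * 5")  # mirrors the f/F snippet in the thread
    except DuplicateProcedureError as err:
        print("rejected:", err)
```

With a check like this, the conflict is reported when the second declaration is seen, rather than silently producing 40 (or 25, 25) at run time.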
-- Thanks and Regards, Yadu Nand B -------------- next part -------------- A non-text attachment was scrubbed... Name: overwrite_case_sensitive.patch Type: text/x-patch Size: 1209 bytes Desc: not available URL: From alberto_chavez at live.com Tue Aug 9 13:43:59 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Tue, 9 Aug 2011 13:43:59 -0500 Subject: [Swift-devel] ssh test case on pads/beagle Message-ID: Hello, I am trying to run a simpler case than the ssh-pbs-coaster test case, and I'm still having the same error. Now I am running only the ssh test case (/tests/providers/ssh/001-catsn-ssh.swift). The command line is: swift -config cf -tc.file tc.template.data -sites.file sites.template.xml 001-catsn-ssh.swift The output: Swift svn swift-r4861 (swift modified locally) cog-r3183 RunID: 20110809-1336-ohte788a Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 Submitting:1 Failed:1 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 Submitting:1 Failed:2 Exception in cat: Arguments: [data.txt] Host:
ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC My auth.defaults reads: login1.beagle.ci.uchicago.edu.type=key login1.beagle.ci.uchicago.edu.username=achavez login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity login1.pads.ci.uchicago.edu.type=key login1.pads.ci.uchicago.edu.username=achavez login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity and it has been set to 600. I omitted the passphrase line here, but it is there, and the passphrase is right because I just verified it in two ways: 1) by logging in to pads and beagle without providing a password, and 2) by "changing" the password: the "new" password is the same as the "old" one.
sites.template.xml: 0 /home/achavez/swiftwork config file: wrapperlog.always.transfer=true sitedir.keep=true execution.retries=0 lazy.errors=true status.mode=provider use.provider.staging=true provider.staging.pin.swiftfiles=false foreach.max.threads=10 provenance.log=true I also tried a simpler SwiftScript: type filemsg; app (filemsg output) hello(string s) { echo s stdout=@filename(output); } filemsg myfile<"dogcatdinosaur.out">; myfile = hello("dog,cat,dinosaur"); and I get the following output: Swift svn swift-r4861 (swift modified locally) cog-r3183 RunID: 20110809-1343-2es2hel2 Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 Exception in echo: Arguments: [dog,cat,dinosaur] Host: ssh Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 The following errors have occurred: 1. Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Any thoughts on this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Tue Aug 9 13:47:22 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Tue, 9 Aug 2011 13:47:22 -0500 Subject: [Swift-devel] Persistent coasters running one job per worker Message-ID: Mihael, I was discussing this with Justin and we thought you could help: I am observing that persistent coasters are running one job per worker, as opposed to the number specified by jobspernode (I also tried nodegranularity) in sites.xml. Attached are the log and the sites.xml for the run. 
Swift is 0.93 (Swift svn swift-r4968 cog-r3225).
The script is Mike's catsnsleep, which sleeps for 20s, run with n=10. -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sites.pecos.xml Type: text/xml Size: 605 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: catsnsleep-20110809-1324-ouf3x44c.log Type: application/octet-stream Size: 27481 bytes Desc: not available URL: From hategan at mcs.anl.gov Tue Aug 9 13:57:06 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 11:57:06 -0700 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: References: Message-ID: <1312916226.2671.2.camel@blabla> Hmm: Unsupported passphrase algorithm: AES-128-CBC I'll try to see how that can be fixed. In the meantime, can you generate a new key pair with 3DES encryption instead and use that? On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > Hello, > > > I am trying to run a simpler case than ssh-pbs-coaster test case, and > I'm still having the same error.
> Now I am running only ssh test case > (/tests/providers/ssh/001-catsn-ssh.swift) > > > The command line is: > swift -config cf -tc.file tc.template.data -sites.file > sites.template.xml 001-catsn-ssh.swift > > > The output: > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > RunID: 20110809-1336-ohte788a > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > Exception in cat: > Arguments: [data.txt] > Host: ssh > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > - - - > > > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > Caused by: > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > read key due to cryptography problems: > java.security.NoSuchAlgorithmException: Unsupported passphrase > algorithm: AES-128-CBC > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 > Submitting:1 Failed:1 > Exception in cat: > Arguments: [data.txt] > Host: ssh > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > - - - > > > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > Caused by: > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > read key due to cryptography problems: > java.security.NoSuchAlgorithmException: Unsupported passphrase > algorithm: AES-128-CBC > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 > Submitting:1 Failed:2 > Exception in cat: > Arguments: [data.txt] > Host: ssh > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > - - - > > > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > Caused by: > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > read key due to cryptography problems: > java.security.NoSuchAlgorithmException:
Unsupported passphrase > algorithm: AES-128-CBC > > > My auth.defaults reads: > > > login1.beagle.ci.uchicago.edu.type=key > login1.beagle.ci.uchicago.edu.username=achavez > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > login1.pads.ci.uchicago.edu.type=key > login1.pads.ci.uchicago.edu.username=achavez > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > and it has been set to 600. I omitted the passphrase line, but it is > there, and the passphrase is right because I just verified it in two > ways: > 1) by logging in to pads and beagle without providing a password > 2) by "changing" the password: the "new" password is the same as the > "old" one. > > sites.template.xml: > > > > jobmanager="ssh"/> > > 0 > /home/achavez/swiftwork > > > > > config file: > > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=0 > lazy.errors=true > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > foreach.max.threads=10 > provenance.log=true > > > > > > I also tried a simpler SwiftScript: > > > type filemsg; > > > app (filemsg output) hello(string s) > { > echo s stdout=@filename(output); > } > > > filemsg myfile<"dogcatdinosaur.out">; > myfile = hello("dog,cat,dinosaur"); > > > and I get the following output: > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > RunID: 20110809-1343-2es2hel2 > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > Exception in echo: > Arguments: [dog,cat,dinosaur] > Host: ssh > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > - - - > > > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > Caused by: > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > read key due to cryptography problems: > java.security.NoSuchAlgorithmException: Unsupported passphrase > algorithm: AES-128-CBC > Final
status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 > The following errors have occurred: > 1. Can't read key due to cryptography problems: > java.security.NoSuchAlgorithmException: Unsupported passphrase > algorithm: AES-128-CBC > > > > > any thoughts on this? > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Tue Aug 9 13:58:20 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 13:58:20 -0500 (CDT) Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: Message-ID: <1181722696.205031.1312916300484.JavaMail.root@zimbra.anl.gov> Alberto, I suspect that the problem is that your SSH key is of a form that's not compatible with the Java SSH library that Swift is using, based on this message: Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Can you try again with a new key, generated on Linux, using say RSA encryption? Try using one of the recipes for generating ssh keys that are posted on the MCS or CI web sites. - Mike ----- Original Message ----- From: "Alberto Chavez" To: "Swift Devel" Sent: Tuesday, August 9, 2011 1:43:59 PM Subject: [Swift-devel] ssh test case on pads/beagle Hello, I am trying to run a simpler case than ssh-pbs-coaster test case, and I'm still having the same error. 
Now I am running only ssh test case (/tests/providers/ssh/001-catsn-ssh.swift) The command line is: swift -config cf -tc.file tc.template.data -sites.file sites.template.xml 001-catsn-ssh.swift The output: Swift svn swift-r4861 (swift modified locally) cog-r3183 RunID: 20110809-1336-ohte788a Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 Submitting:1 Failed:1 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 Submitting:1 Failed:2 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC My auth.defaults reads:
login1.beagle.ci.uchicago.edu.type=key login1.beagle.ci.uchicago.edu.username=achavez login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity login1.pads.ci.uchicago.edu.type=key login1.pads.ci.uchicago.edu.username=achavez login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity and it has been set to 600. I omitted the passphrase line, but it is there, and the passphrase is right because I just verified it in two ways: 1) by logging in to pads and beagle without providing a password 2) by "changing" the password: the "new" password is the same as the "old" one. sites.template.xml: 0 /home/achavez/swiftwork config file: wrapperlog.always.transfer=true sitedir.keep=true execution.retries=0 lazy.errors=true status.mode=provider use.provider.staging=true provider.staging.pin.swiftfiles=false foreach.max.threads=10 provenance.log=true I also tried a simpler SwiftScript: type filemsg; app (filemsg output) hello(string s) { echo s stdout=@filename(output); } filemsg myfile<"dogcatdinosaur.out">; myfile = hello("dog,cat,dinosaur"); and I get the following output: Swift svn swift-r4861 (swift modified locally) cog-r3183 RunID: 20110809-1343-2es2hel2 Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 Exception in echo: Arguments: [dog,cat,dinosaur] Host: ssh Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase Caused by: com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 The following errors have occurred: 1. Can't read key due to cryptography problems: java.security.NoSuchAlgorithmException: Unsupported passphrase algorithm: AES-128-CBC Any thoughts on this?
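Both replies point to the key's passphrase encryption rather than the passphrase itself. A private key in the traditional PEM format records its cipher in a DEK-Info header, so it is easy to check which algorithm a key uses before handing it to Swift. The Python sketch below assumes the key is in that traditional PEM format; keys in other container formats have no DEK-Info line, and the function returns None for them. The set of ciphers flagged as unsupported comes only from the error message in this thread, not from the j2ssh documentation, and the function names are hypothetical.

```python
# Cipher reported as unsupported in the error above; an assumption drawn from
# this thread, not an authoritative list of what j2ssh can read.
UNSUPPORTED_BY_J2SSH = {"AES-128-CBC"}


def pem_cipher(pem_text):
    """Return the cipher named in a PEM key's DEK-Info header, or None."""
    for line in pem_text.splitlines():
        if line.startswith("DEK-Info:"):
            # Header form: "DEK-Info: <cipher>,<hex IV>"
            value = line.split(":", 1)[1].strip()
            return value.split(",", 1)[0]
    return None


def check_key_file(path):
    """Report the cipher protecting the key at `path` and flag known problems."""
    with open(path, encoding="ascii", errors="replace") as f:
        cipher = pem_cipher(f.read())
    if cipher is None:
        print(f"{path}: no DEK-Info header (unencrypted or non-PEM key)")
    elif cipher in UNSUPPORTED_BY_J2SSH:
        print(f"{path}: encrypted with {cipher}; try a 3DES (DES-EDE3-CBC) key")
    else:
        print(f"{path}: encrypted with {cipher}")
    return cipher


if __name__ == "__main__":
    sample = "\n".join([
        "-----BEGIN RSA PRIVATE KEY-----",
        "Proc-Type: 4,ENCRYPTED",
        "DEK-Info: AES-128-CBC,00112233445566778899AABBCCDDEEFF",
        "",
        "(base64 key material omitted)",
        "-----END RSA PRIVATE KEY-----",
    ])
    print(pem_cipher(sample))
```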
_______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Aug 9 13:59:27 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 11:59:27 -0700 Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: References: Message-ID: <1312916367.2671.4.camel@blabla> but but but I checked this, and it worked fine... Can you also post the coasters log (on the machine the coaster service is on, in ~/.globus/coasters)? On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote: > Mihael, > > > I was discussing this with Justin and we thought you could help: > > > I am observing that persistent coasters are running one job per worker > as opposed to the number specified in jobspernode (I also tried > nodegranularity) on sites.xml. > > > Attaching the log, and the sites.xml for the run. Swift is 0.93 (Swift > svn swift-r4968 cog-r3225). > > > The script is Mike's catsnsleep that sleeps for 20s with n=10. > > -- > Ketan > > > From hategan at mcs.anl.gov Tue Aug 9 14:05:36 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 12:05:36 -0700 Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1979181272.203035.1312892199200.JavaMail.root@zimbra.anl.gov> References: <1979181272.203035.1312892199200.JavaMail.root@zimbra.anl.gov> Message-ID: <1312916736.2671.7.camel@blabla> On Tue, 2011-08-09 at 07:16 -0500, Michael Wilde wrote: > I stopped this run and started a larger one: 5M catsn jobs to a pool > of 300-400 workers (varies over time). It finished 2.2M and was still > running, albeit slowly, when I ended it. 
> > The job rate ramped up quickly as the external QueueN script obtained > workers. After about 15 mins had obtained 80 workers and seemed to be > running at several hundred tasks per second. I had moved all the test > clients, IO, and logging to local hard disk on communicado for speed. > I set a retry count of 5, and turned on lazy failure mode. > > After about 6 hours, the test had passed 2.2M jobs and was still > progressing, but seemed to have drastically slowed down from its > earlier rate. Seemed to have dropped below a few jobs per second. > Possibly it ate through its throttle due to failed/hung workers. Shouldn't be the case any more. My first suspicion would be that swift is running out of memory. But then it could also be some leak in the coaster staging buffers. I'll look at the logs later today. From wilde at mcs.anl.gov Tue Aug 9 14:08:01 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 14:08:01 -0500 (CDT) Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1312916736.2671.7.camel@blabla> Message-ID: <1578546213.205094.1312916881011.JavaMail.root@zimbra.anl.gov> > Shouldn't be the case any more. My first suspicion would be that swift > is running out of memory. But then it could also be some leak in the > coaster staging buffers. I'll look at the logs later today. Cool, thanks. It would be great if the latest log plotting tools could run on this log to plot the activity rate over the test period. From ketancmaheshwari at gmail.com Tue Aug 9 14:09:07 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Tue, 9 Aug 2011 14:09:07 -0500 Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: <1312916367.2671.4.camel@blabla> References: <1312916367.2671.4.camel@blabla> Message-ID: I do not see any recent log in ~/.globus/coasters. The stdout/err of the coaster service run is in the attached service.log and the coaster.log is in the attached swift.log. 
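Mike asks above for the activity rate over the test period. Until the project's log-plotting tools have been run on this log, a rough rate can be pulled from the log timestamps. This minimal Python sketch counts lines that contain a chosen marker and buckets them by minute. The timestamp layout is copied from the Swift log excerpt Alberto posts later in this thread ("2011-08-10 18:19:43,280-0500 INFO ..."); which marker best identifies a completed job is left to the caller and is an assumption to check against a real log.

```python
import re
from collections import Counter

# Leading timestamp as it appears in Swift log lines quoted in this thread,
# e.g. "2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE ...".
# Group 1 keeps the date plus hour:minute, which is the bucket key.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}")


def events_per_minute(lines, marker):
    """Count lines containing `marker`, bucketed by minute (sorted by time)."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = STAMP.match(line)
        if match:  # skip continuation lines such as stack-trace frames
            counts[match.group(1)] += 1
    return dict(sorted(counts.items()))
```

Because it accepts any iterable of lines, an open log file can be passed directly, and the resulting minute-to-count mapping can be fed to whatever plotting tool is at hand.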
On Tue, Aug 9, 2011 at 1:59 PM, Mihael Hategan wrote: > but but but I checked this, and it worked fine... > > Can you also post the coasters log (on the machine the coaster service > is on, in ~/.globus/coasters)? > > On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote: > > Mihael, > > > > > > I was discussing this with Justin and we thought you could help: > > > > > > I am observing that persistent coasters are running one job per worker > > as opposed to the number specified in jobspernode (I also tried > > nodegranularity) on sites.xml. > > > > > > Attaching the log, and the sites.xml for the run. Swift is 0.93 (Swift > > svn swift-r4968 cog-r3225). > > > > > > The script is Mike's catsnsleep that sleeps for 20s with n=10. > > > > -- > > Ketan > > > > > > > > > -- Ketan -------------- next part -------------- A non-text attachment was scrubbed... Name: service.log Type: application/octet-stream Size: 24692 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swift.log Type: application/octet-stream Size: 74296 bytes Desc: not available URL: From ketancmaheshwari at gmail.com Tue Aug 9 14:10:34 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Tue, 9 Aug 2011 14:10:34 -0500 Subject: [Swift-devel] New 0.93 problem: .error No such file or directory In-Reply-To: <1578546213.205094.1312916881011.JavaMail.root@zimbra.anl.gov> References: <1312916736.2671.7.camel@blabla> <1578546213.205094.1312916881011.JavaMail.root@zimbra.anl.gov> Message-ID: On Tue, Aug 9, 2011 at 2:08 PM, Michael Wilde wrote: > > Shouldn't be the case any more. My first suspicion would be that swift > > is running out of memory. But then it could also be some leak in the > > coaster staging buffers. I'll look at the logs later today. > > Cool, thanks.
It would be great if the latest log plotting tools could run > on this log to plot the activity rate over the test period. > I will try this. -- Ketan From hategan at mcs.anl.gov Tue Aug 9 14:16:50 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 12:16:50 -0700 Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: References: <1312916367.2671.4.camel@blabla> Message-ID: <1312917410.3416.2.camel@blabla> Ah! If the workers connect before the client does, then jobsPerNode does not make it to the coaster service. I'll think about this. In the mean time, you could have the workers started after the client sends its first job to the service. I'm thinking that maybe jobsPerNode should be a setting that the workers themselves could be started with. On Tue, 2011-08-09 at 14:09 -0500, Ketan Maheshwari wrote: > I do not see any recent log in ~/.globus/coasters. The stdout/err of > the coaster service run is in the attached service.log and the > coaster.log is in the attached swift.log. > > > > > On Tue, Aug 9, 2011 at 1:59 PM, Mihael Hategan > wrote: > but but but I checked this, and it worked fine... > > Can you also post the coasters log (on the machine the coaster > service > is on, in ~/.globus/coasters)? > > > On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote: > > Mihael, > > > > > > I was discussing this with Justin and we thought you could > help: > > > > > > I am observing that persistent coasters are running one job > per worker > > as opposed to the number specified in jobspernode (I also > tried > > nodegranularity) on sites.xml. > > > > > > Attaching the log, and the sites.xml for the run. Swift is > 0.93 (Swift > > svn swift-r4968 cog-r3225).
> > > > > > The script is Mike's catsnsleep that sleeps for 20s with > n=10. > > > > -- > > Ketan > > > > > > > > > > > > > -- > Ketan > > > From wilde at mcs.anl.gov Tue Aug 9 14:28:39 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 14:28:39 -0500 (CDT) Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: <1312917410.3416.2.camel@blabla> Message-ID: <1875366538.205206.1312918119390.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > From: "Mihael Hategan" > To: "Ketan Maheshwari" > Cc: "Swift Devel" > Sent: Tuesday, August 9, 2011 2:16:50 PM > Subject: Re: [Swift-devel] Persistent coasters running one job per worker > Ah! > > If the workers connect before the client does, then jobsPerNode does > not > make it to the coaster service. > > I'll think about this. In the mean time, you could have the workers > started after the client sends its first job to the service. > > I'm thinking that maybe jobsPerNode should be a setting that the > workers > themselves could be started with. That sounds OK for now, for persistent coasters I assume you mean. - Mike > > On Tue, 2011-08-09 at 14:09 -0500, Ketan Maheshwari wrote: > > I do not see any recent log in ~/.globus/coasters. The stdout/err of > > the coaster service run is in the attached service.log and the > > coaster.log is in the attached swift.log. > > > > > > > > > > On Tue, Aug 9, 2011 at 1:59 PM, Mihael Hategan > > wrote: > > but but but I checked this, and it worked fine... > > > > Can you also post the coasters log (on the machine the > > coaster > > service > > is on, in ~/.globus/coasters)? 
> > > > > > On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote: > > > Mihael, > > > > > > > > > I was discussing this with Justin and we thought you could > > help: > > > > > > > > > I am observing that persistent coasters are running one > > > job > > per worker > > > as opposed to the number specified in jobspernode (I also > > tried > > > nodegranularity) on sites.xml. > > > > > > > > > Attaching the log, and the sites.xml for the run. Swift is > > 0.93 (Swift > > > svn swift-r4968 cog-r3225). > > > > > > > > > The script is Mike's catsnsleep that sleeps for 20s with > > n=10. > > > > > > -- > > > Ketan > > > > > > > > > > > > > > > > > > > > > > > -- > > Ketan > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Tue Aug 9 14:31:54 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 9 Aug 2011 14:31:54 -0500 (CDT) Subject: [Swift-devel] Bringing back the coaster worker timeout feature? In-Reply-To: <1312917410.3416.2.camel@blabla> Message-ID: <705833790.205231.1312918314735.JavaMail.root@zimbra.anl.gov> Related to the idea of adding a worker option for jobsPerNode: I'd like to propose/discuss adding back the option for workers to time out when they have been idle for some settable period. This would be useful in configurations like we're running for OSG and TeraGrid, where we may at some points have more workers running than the Swift script has demand for, because of the fairly loose coupling between the script and the worker factory, along with queuing delays, etc. 
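The idle-timeout proposal above, together with Mihael's idea of giving workers their jobsPerNode setting when they start, can be sketched as a small piece of worker-side bookkeeping. This is hypothetical Python, not the coaster worker's code: the class and method names are invented for illustration, and the clock is injectable so the logic can be exercised without actually waiting.

```python
import time


class WorkerSlots:
    """Hypothetical worker-side state: a fixed number of job slots
    (jobs_per_node, supplied when the worker starts) plus an idle timeout."""

    def __init__(self, jobs_per_node, idle_limit_s, clock=time.monotonic):
        self.jobs_per_node = jobs_per_node
        self.idle_limit_s = idle_limit_s
        self.clock = clock
        self.running = 0
        self.last_busy = clock()  # idle time is measured from this instant

    def try_start(self):
        """Claim a slot for a new job; False if all slots are busy."""
        if self.running >= self.jobs_per_node:
            return False
        self.running += 1
        self.last_busy = self.clock()
        return True

    def finish(self):
        """Release a slot when a job completes."""
        if self.running > 0:
            self.running -= 1
        self.last_busy = self.clock()

    def should_exit(self):
        """True once no job has run for at least idle_limit_s seconds."""
        if self.running > 0:
            return False
        return self.clock() - self.last_busy >= self.idle_limit_s
```

A worker's main loop would poll should_exit() between job requests and shut down cleanly once it returns True. Here idle_limit_s plays the role of the "settable period" in the proposal, and because jobs_per_node is fixed at construction, it does not depend on when the client connects.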
- Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Ketan Maheshwari" > Cc: "Swift Devel" > Sent: Tuesday, August 9, 2011 2:16:50 PM > Subject: Re: [Swift-devel] Persistent coasters running one job per worker > Ah! > > If the workers connect before the client does, then jobsPerNode does > not > make it to the coaster service. > > I'll think about this. In the mean time, you could have the workers > started after the client sends its first job to the service. > > I'm thinking that maybe jobsPerNode should be a setting that the > workers > themselves could be started with. > > On Tue, 2011-08-09 at 14:09 -0500, Ketan Maheshwari wrote: > > I do not see any recent log in ~/.globus/coasters. The stdout/err of > > the coaster service run is in the attached service.log and the > > coaster.log is in the attached swift.log. > > > > > > > > > > On Tue, Aug 9, 2011 at 1:59 PM, Mihael Hategan > > wrote: > > but but but I checked this, and it worked fine... > > > > Can you also post the coasters log (on the machine the > > coaster > > service > > is on, in ~/.globus/coasters)? > > > > > > On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote: > > > Mihael, > > > > > > > > > I was discussing this with Justin and we thought you could > > help: > > > > > > > > > I am observing that persistent coasters are running one > > > job > > per worker > > > as opposed to the number specified in jobspernode (I also > > tried > > > nodegranularity) on sites.xml. > > > > > > > > > Attaching the log, and the sites.xml for the run. Swift is > > 0.93 (Swift > > > svn swift-r4968 cog-r3225). > > > > > > > > > The script is Mike's catsnsleep that sleeps for 20s with > > n=10. 
> > > > > > -- > > > Ketan > > > > > > > > > > > > > > > > > > > > > > > -- > > Ketan > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Tue Aug 9 14:33:17 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 12:33:17 -0700 Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: <1875366538.205206.1312918119390.JavaMail.root@zimbra.anl.gov> References: <1875366538.205206.1312918119390.JavaMail.root@zimbra.anl.gov> Message-ID: <1312918397.3539.2.camel@blabla> On Tue, 2011-08-09 at 14:28 -0500, Michael Wilde wrote: > > > > > I'm thinking that maybe jobsPerNode should be a setting that the > > workers > > themselves could be started with. > > That sounds OK for now, for persistent coasters I assume you mean. Yes. In the same spirit, one could also pass a walltime that way. Trunk has some code allowing a worker to pass a bunch of key-value pairs when registering, but it's not being used. From yadudoc1729 at gmail.com Tue Aug 9 14:47:58 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Wed, 10 Aug 2011 01:17:58 +0530 Subject: [Swift-devel] Call function Map-Reduce Message-ID: Hi, I've been working on implementing a call function which would allow function calls in swift using string identifiers for procedures. In order to do this we planned to use karajan's executeElement which I think needs a slightly different definition for user defined elements. int x=5; (int out) old_func (int inp) { } old_func(x); used to translate into the following format : x In order to have the calls using executeElement we need the following style ... new_func inp But how do we handle the output variable ? 
I don't see any documentation on this ? -- Thanks and Regards, Yadu Nand B From hategan at mcs.anl.gov Tue Aug 9 16:28:45 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 14:28:45 -0700 Subject: [Swift-devel] Call function Map-Reduce In-Reply-To: References: Message-ID: <1312925325.4795.1.camel@blabla> On Wed, 2011-08-10 at 01:17 +0530, Yadu Nand wrote: > Hi, > > I've been working on implementing a call function which > would allow function calls in swift using string identifiers > for procedures. > > In order to do this we planned to use karajan's > executeElement which I think needs a slightly different > definition for user defined elements. > > int x=5; > (int out) old_func (int inp) { > > } > old_func(x); > > used to translate into the following format : > > > > > > > > x > > > > In order to have the calls using executeElement we need > the following style > > > > ... > > > > new_func > inp > We'll do it like this: ... > > > But how do we handle the output variable ? I don't see any > documentation on this ? > Return values are passed by reference. So y = f(x) would be y x From yadudoc1729 at gmail.com Wed Aug 10 00:56:35 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Wed, 10 Aug 2011 11:26:35 +0530 Subject: [Swift-devel] Procedure re-defintion, feature or bug ? Message-ID: Hi, Following what Justin said about function redefinition being a feature in the shell I tried a test to check if that really works in swift. (int o) f (int i){ o = i ; } trace ( "first" , f(5) ); (int z) f (int a){ z = a * 10; } trace ( "second", f(5) ); In swift this would give : second, 50 first , 50 What I think is, the procedures are overwritten around compile time which allows this behavior, in which only the last definition is valid by execution time. 
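Both threads here revolve around a table that maps procedure names to definitions. A call() that takes the procedure name as a string has to look the name up at run time, and the redefinition question is about what happens when two definitions share a name. The following is a hypothetical Python sketch, not Swift's compiler or Karajan's executeElement. It rejects a second definition (the position taken in the replies that follow) instead of letting the last one win. It also passes the output as a reference cell that the callee fills in, mirroring Mihael's note that return values are passed by reference.

```python
class ProcedureError(Exception):
    """Raised for duplicate definitions or calls to unknown procedures."""


class Ref:
    """Single-assignment output cell standing in for a by-reference result."""

    def __init__(self):
        self.value = None
        self.bound = False

    def set(self, value):
        if self.bound:
            raise ProcedureError("output already assigned")
        self.value, self.bound = value, True


class ProcedureTable:
    def __init__(self):
        self._procs = {}

    def define(self, name, fn):
        # Reject a redefinition instead of silently keeping the last one.
        if name in self._procs:
            raise ProcedureError(f"procedure {name!r} is already defined")
        self._procs[name] = fn

    def call(self, name, out, *args):
        # Resolve the procedure by its string name at call time; the
        # callee writes its result into `out` rather than returning it.
        if name not in self._procs:
            raise ProcedureError(f"no procedure named {name!r}")
        self._procs[name](out, *args)
        return out
```

With this table, Yadu's first f becomes define("f", lambda out, i: out.set(i)), and call("f", Ref(), 5).value gives 5. A second define("f", ...) then raises ProcedureError instead of replacing the first definition.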
-- Thanks and Regards, Yadu Nand B From hategan at mcs.anl.gov Wed Aug 10 01:28:44 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 09 Aug 2011 23:28:44 -0700 Subject: [Swift-devel] Procedure re-defintion, feature or bug ? In-Reply-To: References: Message-ID: <1312957724.7413.1.camel@blabla> On Wed, 2011-08-10 at 11:26 +0530, Yadu Nand wrote: > What I think is, the procedures are overwritten around compile > time Not quite. Just that they are both defined before any of the invocations. The swift compiler re-orders instructions quite a bit, so the actual execution order is quite unrelated to the lexical order. That's exactly why I think that this shouldn't happen. There is no way to invoke the first definition, so why allow it? > which allows this behavior, in which only the last definition > is valid by execution time. > > From benc at hawaga.org.uk Wed Aug 10 02:03:12 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 10 Aug 2011 09:03:12 +0200 Subject: [Swift-devel] Procedure re-defintion, feature or bug ? In-Reply-To: References: Message-ID: All the definitions should be happening "simultaneously". Mostly this is how I think definitions should work: Imagine instead of defining f in Swift, you are defining x in the following simultaneous equation: x=3 x=4 What is the value of x? (remember it's a simultaneous equation, not a program...) -- From hategan at mcs.anl.gov Wed Aug 10 02:55:12 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 10 Aug 2011 00:55:12 -0700 Subject: [Swift-devel] Call function Map-Reduce In-Reply-To: <1312925325.4795.1.camel@blabla> References: <1312925325.4795.1.camel@blabla> Message-ID: <1312962912.8757.3.camel@blabla> On Tue, 2011-08-09 at 14:28 -0700, Mihael Hategan wrote: > We'll do it like this: > > ... > This should now work in trunk: import("sys.k") element(bla, [...] echo(each(...)) ) executeElement("bla", "test", 1, 2, 3) in xml it's ...
It will only work for user defined functions, so executeElement("echo", "test") won't work. But then we won't need that. From wozniak at mcs.anl.gov Wed Aug 10 10:00:11 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 10 Aug 2011 10:00:11 -0500 (Central Daylight Time) Subject: [Swift-devel] Procedure re-defintion, feature or bug ? In-Reply-To: References:

Message-ID: On Wed, 10 Aug 2011, Ben Clifford wrote: > All the definitions should be happening "simultaneously". > > Mostly this is how I think definitions should work: > > Imagine instead of defining f in Swift, you are defining x in the following simultaneous equation: > > x=3 > x=4 > > What is the value of x? (remember its a simultaneous equation, not a program...) Yes, the definitions are simultaneous and inconsistent and Swift should report an error. Justin -- Justin M Wozniak From yadudoc1729 at gmail.com Wed Aug 10 12:47:09 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Wed, 10 Aug 2011 23:17:09 +0530 Subject: [Swift-devel] Procedure re-defintion, feature or bug ? In-Reply-To: References:

Message-ID: On Wed, Aug 10, 2011 at 8:30 PM, Justin M Wozniak wrote: > Yes, the definitions are simultaneous and inconsistent and Swift should > report an error. Okay, I think I understand better now. I'm attaching a patch with a one-line change to the earlier patch. Please let me know if this needs additional fixing. -- Thanks and Regards, Yadu Nand B -------------- next part -------------- A non-text attachment was scrubbed... Name: check_proc_redefintion.patch Type: text/x-patch Size: 1313 bytes Desc: not available URL: From yadudoc1729 at gmail.com Wed Aug 10 15:44:34 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 11 Aug 2011 02:14:34 +0530 Subject: [Swift-devel] Call function. Message-ID: Hi, I'm working on implementing a call function which takes procedure identifiers as strings. This will allow us to do some cool stuff like: int assoc_array [ string ] [ int ]; assoc_array["do_x"][1] = 1000; assoc_array["do_x"][2] = 1001; assoc_array["do_y"][1] = 5000; assoc_array["do_y"][2] = 5002; foreach i in assoc_array { foreach vi in assoc_array[ i ] { call ( i , vi ); }} I have considered two alternative structures for "call": 1. = call ( , < args > ) ; 2. call ( , , < args > ) ; Any ideas on this are welcome. -- Thanks and Regards, Yadu Nand B From wilde at mcs.anl.gov Wed Aug 10 17:12:58 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 10 Aug 2011 17:12:58 -0500 (CDT) Subject: [Swift-devel] swiftdel Message-ID: <172562579.209954.1313014378362.JavaMail.root@zimbra.anl.gov> I updated the swiftdevel release-plans page with the results from today's 0.93 meeting. David, please move everything needed to Bugzilla tickets and close this list down except for non-ticket procedural notes.
Thanks, - Mike From alberto_chavez at live.com Wed Aug 10 18:41:25 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Wed, 10 Aug 2011 18:41:25 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <1312916226.2671.2.camel@blabla> References: , <1312916226.2671.2.camel@blabla> Message-ID: I changed my ssh-key, and they worked on the MCS machines because the authorized_keys file has not been updated yet on the CI Machines. I created a new ssh-key using: ssh-keygen -t rsa -b 2048 exactly as the MCS site suggested. On the other hand, I still have a problem; I am getting the following error: Swift svn swift-r4978 cog-r3226 RunID: 20110810-1819-1cdo2o62 Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 Failed:10 The following errors have occurred: 1.
Job failed with an exit code of 127 (10 times) These are the contents of the log: Execution completed with errors 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application exception: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE thread=0-5-3-1 tr=cat 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in cat: at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed exception: Execution completed with errors at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) I believe that the problem lies in the TC file, because when I run a much simpler SwiftScript like: int i = 9; trace(i); I get the following output: swift traceme.swift -tc.file tc.template.data -sites.file sites.template.xml -config cf Swift svn swift-r4978 cog-r3226 RunID: 20110810-1832-buktjj3d Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 SwiftScript trace: 9.0 Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 but as soon as I start using the commands listed in the TC file, I get the "exit code 127". My tc file reads: ssh echo /bin/echo null null ssh cat /bin/cat null null ssh ls /bin/ls null null ssh grep /bin/grep null null ssh sort /bin/sort null null ssh paste /bin/paste null null ssh wc /usr/bin/wc null null I am working on the login node of the MCS machine, trying to ssh via Swift to steamroller. > Subject: Re: [Swift-devel] ssh test case on pads/beagle > From: hategan at mcs.anl.gov > To: alberto_chavez at live.com > CC: swift-devel at ci.uchicago.edu > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > I'll try to see how that can be fixed. In the mean time, can you > generate a new key pair with 3DES encryption instead and use that? > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > Hello, > > > > > > I am trying to run a simpler case than ssh-pbs-coaster test case, and > > I'm still having the same error.
> > Now I am running only ssh test case > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > The command line is: > > swift -config cf -tc.file tc.template.data -sites.file > > sites.template.xml 001-catsn-ssh.swift > > > > > > The output: > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1336-ohte788a > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 > > Submitting:1 Failed:1 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 > > Submitting:1 Failed:2 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > 
> > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > "error_log.log" 105L, 5770C > > > > > > My auth.defaults reads: > > > > > > login1.beagle.ci.uchicago.edu.type=key > > login1.beagle.ci.uchicago.edu.username=achavez > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > login1.pads.ci.uchicago.edu.type=key > > login1.pads.ci.uchicago.edu.username=achavez > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > and it has been set to 600. I omitted the passphrase line, but it is > > there, and the passphrase is right because I just verified it in two > > ways: > > 1) by logging in to pads and beagle without providing a password > > 2) by "changing" the password: the "new" password is the same as the > > "old" one. > > > > sites.templates.xml: > > > > > > > > > jobmanager="ssh"/> > > > > 0 > > /home/achavez/swiftwork > > > > > > > > > > config file: > > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=true > > status.mode=provider > > use.provider.staging=true > > provider.staging.pin.swiftfiles=false > > foreach.max.threads=10 > > provenance.log=true > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > type filemsg; > > > > > > app (filemsg output) hello(string s) > > { > > echo s stdout=@filename(output); > > } > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > myfile = hello("dog,cat,dinosaur"); > > > > > > and I get the following output: > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1343-2es2hel2 > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > Exception in echo: > > Arguments: [dog,cat,dinosaur] > > Host: ssh > > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > - - - > > > > > > Caused by: null > > Caused by: > >
org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 > > The following errors have occurred: > > 1. Can't read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > > > > > > > > > any thoughts on this? From jonmon at mcs.anl.gov Wed Aug 10 19:09:03 2011 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Wed, 10 Aug 2011 19:09:03 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: References: , <1312916226.2671.2.camel@blabla> Message-ID: <47D22625-F683-4FCF-A1B5-2DE0D789911E@mcs.anl.gov> Exit code "127" normally means that a particular function doesn't exist. Are you sure that all those paths to apps exist? Also, I am not sure if this is a problem, but shouldn't there be a third null column in the app file? Like "ssh echo /bin/echo null null null". On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: > I changed my ssh-key, and they worked on the MCS machines because the authorized_keys file has not been updated yet on the CI Machines.
> I created a new ssh-key using: > ssh-keygen -t rsa -b 2048 > exactly as the MCS site suggested, > On the other hand, I still have a problem, I am getting the following error: > > Swift svn swift-r4978 cog-r3226 > > RunID: 20110810-1819-1cdo2o62 > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 > Exception in cat: > Arguments: [data.txt] > Host: ssh > Directory: 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek > - - - > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 Failed:10 > The following errors have occurred: > 1. Job failed with an exit code of 127 (10 times) > > > These are the contents of the log: > > Execution completed with errors > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application exception: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE thread=0-5-3-1 tr=cat > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in cat: > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at 
org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed exception: > > Execution completed with errors > > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at 
org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > > I believe that the problem resides on the TC file because when I run a much simpler SwiftScript like: > > int i = 9; > trace(i); > > I get the following output: > > swift traceme.swift -tc.file tc.template.data -sites.file sites.template.xml -config cf > Swift svn swift-r4978 cog-r3226 > > RunID: 20110810-1832-buktjj3d > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 > SwiftScript trace: 9.0 > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 > > but as soon as I start using the commands stated the TC file, I get the "exit code 127" > > My tc file reads: > > ssh echo /bin/echo null null > ssh cat /bin/cat null null > ssh ls /bin/ls null null > ssh grep /bin/grep null null > ssh sort /bin/sort null null > ssh paste /bin/paste null null > ssh wc /usr/bin/wc null null > > I am working on the login node of the MCS machine trying to ssh via Swift to steamroller. 
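The tc file quoted above has five whitespace-separated fields per entry, while the transformation catalog description quoted in this thread lists six (site, transformation name, executable path, installation status, platform, profiles). A minimal Python sketch for checking that shape, assuming exactly that six-field layout, no whitespace inside the profiles field, and `#` starting a comment line; `parse_tc` is a hypothetical helper, not part of Swift. The thread later reports the same exit code after switching to six fields, so a clean result here does not by itself explain the 127.

```python
# Assumption: the six-field tc.data layout described in this thread.
TC_FIELDS = ("site", "transformation", "path", "installation_status", "platform", "profiles")


def parse_tc(text):
    """Split tc.data text into entries and a list of (line number, problem) pairs."""
    entries, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) != len(TC_FIELDS):
            problems.append((lineno, f"expected {len(TC_FIELDS)} fields, found {len(fields)}"))
            continue
        entries.append(dict(zip(TC_FIELDS, fields)))
    return entries, problems


if __name__ == "__main__":
    sample = """\
ssh echo /bin/echo null null
ssh cat /bin/cat INSTALLED INTEL32::LINUX null
"""
    entries, problems = parse_tc(sample)
    print(problems)            # [(1, 'expected 6 fields, found 5')]
    print(entries[0]["path"])  # /bin/cat
```

Feeding the whole tc.template.data file to `parse_tc` lists every entry that does not match the documented layout in one pass.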
> > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > From: hategan at mcs.anl.gov > > To: alberto_chavez at live.com > > CC: swift-devel at ci.uchicago.edu > > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > > > I'll try to see how that can be fixed. In the mean time, can you > > generate a new key pair with 3DES encryption instead and use that? > > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > > Hello, > > > > > > > > > I am trying to run a simpler case than ssh-pbs-coaster test case, and > > > I'm still having the same error. > > > Now I am running only ssh test case > > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > > > > The command line is: > > > swift -config cf -tc.file tc.template.data -sites.file > > > sites.template.xml 001-catsn-ssh.swift > > > > > > > > > The output: > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > RunID: 20110809-1336-ohte788a > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > > algorithm: AES-128-CBC > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 > > > Submitting:1 Failed:1 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: 
Invalid private key or passphrase > > > Caused by: > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > > algorithm: AES-128-CBC > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 > > > Submitting:1 Failed:2 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > > algorithm: AES-128-CBC > > > "error_log.log" 105L, 5770C > > > > > > > > > My auth.defaults reads: > > > > > > > > > login1.beagle.ci.uchicago.edu.type=key > > > login1.beagle.ci.uchicago.edu.username=achavez > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > login1.pads.ci.uchicago.edu.type=key > > > login1.pads.ci.uchicago.edu.username=achavez > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, but it is > > > there, and the passphrase is right because I just verified it in two > > > ways: > > > 1) by logging to pads and beagle without providing a password > > > 2) "changed" the password. I the "new" password is the same as the > > > "old" one. 
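The auth.defaults entries above follow a `host.property=value` pattern, and because the host names themselves contain dots, each key has to be split at its last dot. A minimal Python sketch that groups the entries per host and reports hosts lacking `type`, `username` or `key`; the required set is an assumption taken only from the entries quoted here (the omitted passphrase property is deliberately not checked), and `parse_auth_defaults`, `missing_properties` and `key_mode_is_600` are hypothetical helpers, not CoG or Swift APIs.

```python
import os
import stat

# Assumption: the properties every host entry needs, based on the entries quoted in this thread.
REQUIRED = {"type", "username", "key"}


def parse_auth_defaults(text):
    """Group 'host.property=value' lines by host; host names may contain dots."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        host, _, prop = name.strip().rpartition(".")  # split at the last dot only
        hosts.setdefault(host, {})[prop] = value.strip()
    return hosts


def missing_properties(hosts):
    """Map each host to the required properties it lacks (complete hosts are omitted)."""
    report = {}
    for host, props in hosts.items():
        missing = sorted(REQUIRED - set(props))
        if missing:
            report[host] = missing
    return report


def key_mode_is_600(path):
    """True if the file's permission bits are exactly rw------- (0600)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


if __name__ == "__main__":
    sample = """
    login1.pads.ci.uchicago.edu.type=key
    login1.pads.ci.uchicago.edu.username=achavez
    login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity
    login1.beagle.ci.uchicago.edu.type=key
    """
    print(missing_properties(parse_auth_defaults(sample)))
```

`key_mode_is_600` can then be pointed at the path from each `key=` property to confirm the permissions mentioned in the message.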
> > > > > > sites.templates.xml: > > > > > > > > > > > > > > jobmanager="ssh"/> > > > > > > 0 > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > config file: > > > > > > wrapperlog.always.transfer=true > > > sitedir.keep=true > > > execution.retries=0 > > > lazy.errors=true > > > status.mode=provider > > > use.provider.staging=true > > > provider.staging.pin.swiftfiles=false > > > foreach.max.threads=10 > > > provenance.log=true > > > > > > > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > > > > type filemsg; > > > > > > > > > app (filemsg output) hello(string s) > > > { > > > echo s stdout=@filename(output); > > > } > > > > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > > myfile = hello("dog,cat,dinosaur"); > > > > > > > > > and I get the following output: > > > > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > RunID: 20110809-1343-2es2hel2 > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > > Exception in echo: > > > Arguments: [dog,cat,dinosaur] > > > Host: ssh > > > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > > algorithm: AES-128-CBC > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 > > > The following errors have occurred: > > > 1. Can't read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > > algorithm: AES-128-CBC > > > > > > > > > > > > > > > any thoughts on this? 
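The failure above comes from j2ssh refusing the cipher that protects the private key. In the traditional PEM key format, an encrypted key carries a `Proc-Type: 4,ENCRYPTED` header and a `DEK-Info:` header naming the cipher, for example `AES-128-CBC` as in the error, or `DES-EDE3-CBC` for the 3DES encryption Mihael suggests. A minimal Python sketch for inspecting that header before pointing auth.defaults at a key; `pem_key_cipher` and `likely_readable_by_j2ssh` are illustrative names, and the idea that only unencrypted or DES-EDE3-CBC keys are readable is an assumption drawn from this thread, not from j2ssh documentation.

```python
def pem_key_cipher(pem_text):
    """Return the cipher named on a PEM 'DEK-Info:' line, or None if there is none.

    None means the key is either unencrypted or not in the traditional PEM format.
    """
    for line in pem_text.splitlines():
        if line.startswith("DEK-Info:"):
            # Header format: "DEK-Info: <CIPHER>,<hex IV>"
            value = line.split(":", 1)[1].strip()
            return value.split(",", 1)[0]
    return None


def likely_readable_by_j2ssh(pem_text):
    """Hypothetical heuristic: treat unencrypted and 3DES (DES-EDE3-CBC) keys as acceptable."""
    return pem_key_cipher(pem_text) in (None, "DES-EDE3-CBC")


if __name__ == "__main__":
    sample = (
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "Proc-Type: 4,ENCRYPTED\n"
        "DEK-Info: AES-128-CBC,00112233445566778899AABBCCDDEEFF\n"
        "\n"
        "QUJD\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    print(pem_key_cipher(sample), likely_readable_by_j2ssh(sample))
```

To inspect a real key, read the file named in the `key=` property and pass its contents to `pem_key_cipher`.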
> > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From alberto_chavez at live.com Wed Aug 10 19:16:32 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Wed, 10 Aug 2011 19:16:32 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <47D22625-F683-4FCF-A1B5-2DE0D789911E@mcs.anl.gov> References: , <1312916226.2671.2.camel@blabla> , <47D22625-F683-4FCF-A1B5-2DE0D789911E@mcs.anl.gov> Message-ID: Exit code "127" normally means that a particular function doesn't exist. Are you sure that all those paths to apps exist? > Yes, I double-checked that, and those are the right paths to the apps. Also, I am not sure if this is a problem but shouldn't there be a third column in the app file? Like "ssh echo /bin/echo null null null" Looking at the documentation for the transformation catalog, the structure should be: site, transformation name, executable path, installation status, platform, and profile entries. The installation status and platform fields are not used. Set them to INSTALLED and INTEL32::LINUX respectively. The profiles field should be set to null if no profile entries are to be specified, or should contain the profile entries separated by semicolons. But even when I set those columns to INSTALLED and INTEL32::LINUX and keep the profiles field set to null, I'm still getting the same exit code. On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: I changed my ssh-key, and they worked on the MCS machines because the authorized_keys file has not been updated yet on the CI Machines.
I created a new ssh-key using: ssh-keygen -t rsa -b 2048 exactly as the MCS site suggested. On the other hand, I still have a problem, I am getting the following error: Swift svn swift-r4978 cog-r3226 RunID: 20110810-1819-1cdo2o62 Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 Failed:10 The following errors have occurred: 1. Job failed with an exit code of 127 (10 times) These are the contents of the log: Execution completed with errors 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application exception: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE thread=0-5-3-1 tr=cat 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in cat: at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at
org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed exception: Execution completed with errors at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at
org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) I believe that the problem resides on the TC file because when I run a much simpler SwiftScript like: int i = 9; trace(i); I get the following output: swift traceme.swift -tc.file tc.template.data -sites.file sites.template.xml -config cf Swift svn swift-r4978 cog-r3226 RunID: 20110810-1832-buktjj3d Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 SwiftScript trace: 9.0 Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 but as soon as I start using the commands stated the TC file, I get the "exit code 127" My tc file reads: ssh echo /bin/echo null null ssh cat /bin/cat null null ssh ls /bin/ls null null ssh grep /bin/grep null null ssh sort /bin/sort null null ssh paste /bin/paste null null ssh wc /usr/bin/wc null null I am working on the login node of the MCS machine trying to ssh via Swift to steamroller. > Subject: Re: [Swift-devel] ssh test case on pads/beagle > From: hategan at mcs.anl.gov > To: alberto_chavez at live.com > CC: swift-devel at ci.uchicago.edu > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > I'll try to see how that can be fixed.
In the mean time, can you > generate a new key pair with 3DES encryption instead and use that? > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > Hello, > > > > > > I am trying to run a simpler case than ssh-pbs-coaster test case, and > > I'm still having the same error. > > Now I am running only ssh test case > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > The command line is: > > swift -config cf -tc.file tc.template.data -sites.file > > sites.template.xml 001-catsn-ssh.swift > > > > > > The output: > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1336-ohte788a > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 > > Submitting:1 Failed:1 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 > > Submitting:1 Failed:2 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > 
Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > "error_log.log" 105L, 5770C > > > > > > My auth.defaults reads: > > > > > > login1.beagle.ci.uchicago.edu.type=key > > login1.beagle.ci.uchicago.edu.username=achavez > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > login1.pads.ci.uchicago.edu.type=key > > login1.pads.ci.uchicago.edu.username=achavez > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, but it is > > there, and the passphrase is right because I just verified it in two > > ways: > > 1) by logging to pads and beagle without providing a password > > 2) "changed" the password. I the "new" password is the same as the > > "old" one. 
> > > > sites.templates.xml: > > > > > > > > > jobmanager="ssh"/> > > > > 0 > > /home/achavez/swiftwork > > > > > > > > > > config file: > > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=true > > status.mode=provider > > use.provider.staging=true > > provider.staging.pin.swiftfiles=false > > foreach.max.threads=10 > > provenance.log=true > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > type filemsg; > > > > > > app (filemsg output) hello(string s) > > { > > echo s stdout=@filename(output); > > } > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > myfile = hello("dog,cat,dinosaur"); > > > > > > and I get the following output: > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1343-2es2hel2 > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > Exception in echo: > > Arguments: [dog,cat,dinosaur] > > Host: ssh > > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 > > The following errors have occurred: > > 1. Can't read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > > > > > > > > > any thoughts on this? 
> > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From jonmon at mcs.anl.gov Wed Aug 10 23:54:24 2011 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Wed, 10 Aug 2011 23:54:24 -0500 Subject: [Swift-devel] ssh test case on pads/beagle Message-ID: <20110811045409.99852121A8@zimbra.anl.gov> Could you post the sites file? ----- Reply message ----- From: "Alberto Chavez" Date: Wed, Aug 10, 2011 7:16 pm Subject: [Swift-devel] ssh test case on pads/beagle To: Cc: "Mihael Hategan" , "Swift Devel" From alberto_chavez at live.com Thu Aug 11 01:17:40 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Thu, 11 Aug 2011 01:17:40 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <20110811045409.99852121A8@zimbra.anl.gov> References: <20110811045409.99852121A8@zimbra.anl.gov> Message-ID: Sure: 0 /home/achavez/swiftwork To: alberto_chavez at live.com From: jonmon at mcs.anl.gov CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] ssh test case on pads/beagle Date: Wed, 10 Aug 2011 23:54:24 -0500 Could you post the sites file? ----- Reply message ----- From: "Alberto Chavez" Date: Wed, Aug 10, 2011 7:16 pm Subject: [Swift-devel] ssh test case on pads/beagle To: Cc: "Mihael Hategan" , "Swift Devel" Exit code "127" normally means that a particular function doesn't exist. Are you sure that all those paths to apps exist? > Yes, I double-checked that, and those are the right paths to the apps.
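On the question of whether the app paths exist: a POSIX shell returns exit status 127 when it cannot find the command it was asked to run (and 126 when the file is found but cannot be executed), so a 127 often points at a missing or mistyped executable, or a PATH problem, on the host where the job actually runs. A minimal Python sketch of that behaviour and of a path check; `run_via_sh` and `missing_executables` are hypothetical helpers, not part of Swift, and the path check is only meaningful when run on the execution host (here, the ssh target), not on the submitting machine.

```python
import shutil
import subprocess


def run_via_sh(command):
    """Run a command line through /bin/sh and return its exit status."""
    # POSIX sh reports 127 when the command name cannot be found.
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def missing_executables(paths):
    """Return the executable paths that do not exist or are not executable on this host."""
    # For a name containing a directory, shutil.which checks only that exact path.
    return [p for p in paths if shutil.which(p) is None]


if __name__ == "__main__":
    print(run_via_sh("no_such_command_for_swift_test"))
    print(missing_executables(["/bin/echo", "/bin/cat", "/usr/bin/wc"]))
```

Running `missing_executables` with the paths from the tc file on the ssh target would show directly whether, for example, `/usr/bin/wc` is present there.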
Also, I am not sure if this is a problem but shouldn't there be a third column in the app file? Like "ssh echo /bin/echo null null null" Looking at the documentation for the transformation catalog, the structure should be: site, transformation name, executable path, installation status, platform, and profile entries. The installation status and platform fields are not used. Set them to INSTALLED and INTEL32::LINUX respectively. The profiles field should be set to null if no profile entries are to be specified, or should contain the profile entries separated by semicolons. But even when I set those columns to INSTALLED and INTEL32::LINUX and keep the profiles field set to null, I'm still getting the same exit code. On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: I changed my ssh-key, and they worked on the MCS machines because the authorized_keys file has not been updated yet on the CI Machines. I created a new ssh-key using: ssh-keygen -t rsa -b 2048 exactly as the MCS site suggested. On the other hand, I still have a problem, I am getting the following error: Swift svn swift-r4978 cog-r3226 RunID: 20110810-1819-1cdo2o62 Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 Exception in cat: Arguments: [data.txt] Host: ssh Directory: 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 Failed:10 The following errors have occurred: 1.
Job failed with an exit code of 127 (10 times) These are the contents of the log: Execution completed with errors 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing channel 0 [Unnamed Channel] 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application exception: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 127 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE thread=0-5-3-1 tr=cat 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in cat: at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed exception: Execution completed with errors at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) I believe that the problem resides on the TC file because when I run a much simpler SwiftScript like: int i = 9; trace(i); I get the following output: swift traceme.swift -tc.file tc.template.data -sites.file sites.template.xml -config cf Swift svn swift-r4978 cog-r3226 RunID: 20110810-1832-buktjj3d Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 SwiftScript trace: 9.0 Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 but as soon as I start using the commands stated the TC file, I get the "exit code 127" My tc file reads: ssh echo /bin/echo null null ssh cat /bin/cat null null ssh ls /bin/ls null null ssh grep /bin/grep null null ssh sort /bin/sort null null ssh paste /bin/paste null null ssh wc /usr/bin/wc null null I am working on the login node of the MCS machine trying to ssh via Swift to steamroller. > Subject: Re: [Swift-devel] ssh test case on pads/beagle > From: hategan at mcs.anl.gov > To: alberto_chavez at live.com > CC: swift-devel at ci.uchicago.edu > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > I'll try to see how that can be fixed. In the mean time, can you > generate a new key pair with 3DES encryption instead and use that? > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > Hello, > > > > > > I am trying to run a simpler case than ssh-pbs-coaster test case, and > > I'm still having the same error.
> > Now I am running only ssh test case > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > The command line is: > > swift -config cf -tc.file tc.template.data -sites.file > > sites.template.xml 001-catsn-ssh.swift > > > > > > The output: > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1336-ohte788a > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting site:8 > > Submitting:1 Failed:1 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting site:7 > > Submitting:1 Failed:2 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > - - - > > > > > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > 
com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > "error_log.log" 105L, 5770C > > > > > > My auth.defaults reads: > > > > > > login1.beagle.ci.uchicago.edu.type=key > > login1.beagle.ci.uchicago.edu.username=achavez > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > login1.pads.ci.uchicago.edu.type=key > > login1.pads.ci.uchicago.edu.username=achavez > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, but it is > > there, and the passphrase is right because I just verified it in two > > ways: > > 1) by logging to pads and beagle without providing a password > > 2) "changed" the password. I the "new" password is the same as the > > "old" one. > > > > sites.templates.xml: > > > > > > > > > jobmanager="ssh"/> > > > > 0 > > /home/achavez/swiftwork > > > > > > > > > > config file: > > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=true > > status.mode=provider > > use.provider.staging=true > > provider.staging.pin.swiftfiles=false > > foreach.max.threads=10 > > provenance.log=true > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > type filemsg; > > > > > > app (filemsg output) hello(string s) > > { > > echo s stdout=@filename(output); > > } > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > myfile = hello("dog,cat,dinosaur"); > > > > > > and I get the following output: > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > RunID: 20110809-1343-2es2hel2 > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > Exception in echo: > > Arguments: [dog,cat,dinosaur] > > Host: ssh > > Directory: hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > - - - > > > > > > Caused by: null > > Caused by: > > 
org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > Caused by: > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: Can't > > read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 Failed:1 > > The following errors have occurred: > > 1. Can't read key due to cryptography problems: > > java.security.NoSuchAlgorithmException: Unsupported passphrase > > algorithm: AES-128-CBC > > > > > > > > > > any thoughts on this? > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > >
From hategan at mcs.anl.gov Thu Aug 11 02:18:07 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 11 Aug 2011 00:18:07 -0700 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: References: <20110811045409.99852121A8@zimbra.anl.gov> Message-ID: <1313047087.3215.6.camel@blabla> Can you post (a link to) the entire log file? Since it contains both the tc.data and sites.xml and the error, it's probably always better to post than individual snippets. On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote: > Sure: > > > > > > 0 > /home/achavez/swiftwork > > > > > ______________________________________________________________________ > To: alberto_chavez at live.com > From: jonmon at mcs.anl.gov > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] ssh test case on pads/beagle > Date: Wed, 10 Aug 2011 23:54:24 -0500 > > Could you post the sites file?
> > ----- Reply message ----- > From: "Alberto Chavez" > Date: Wed, Aug 10, 2011 7:16 pm > Subject: [Swift-devel] ssh test case on pads/beagle > To: > Cc: "Mihael Hategan" , "Swift Devel" > > > > > Exit code "127" normally means that a particular function doesn't > exist. Are you sure that all those paths to apps exist? > > Yes, I doubled check that and those are the right paths to the apps. > > > Also, I am not sure if this is a problem but shouldn't there be a > third column in the app file? LIke > "ssh echo /bin/echo null null null" > > > > > Looking at the documentation for the transformation catalog, the > structure should be: > > site, transformation name, executable path, installation status, > platform, and profile entries. > > > > > > The installation status and platform fields are not used. Set them > to INSTALLED and INTEL32::LINUX respectively. > > The profiles field should be set to null if no profile entries are to > be specified, or should contain the profile entries separated by > semicolons. > > > but even when I switch the columns to INSTALLED and INTEL32::LINUX and > keep the profiles field set to null, I'm still getting the same exit > code. > > > On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: > > I changed my ssh-key, and they worked on the MCS machines > because the authorized_keys file has not been updated yet on > the CI Machines. 
> I created a new ssh-key using: > ssh-keygen -t rsa -b 2048 > exactly as the MCS site suggested, > On the other hand, I still have a problem, I am getting the > following error: > > > Swift svn swift-r4978 cog-r3226 > > > RunID: 20110810-1819-1cdo2o62 > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 > Exception in cat: > Arguments: [data.txt] > Host: ssh > Directory: > 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek > - - - > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 127 > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 > Failed:10 > The following errors have occurred: > 1. Job failed with an exit code of 127 (10 times) > > > > > These are the contents of the log: > > > Execution completed with errors > > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing > channel 0 [Unnamed Channel] > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing > channel 0 [Unnamed Channel] > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 > APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application > exception: null > Caused by: > org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 127 > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE > thread=0-5-3-1 tr=cat > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in > cat: > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > at > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) 
> at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:334) > at > java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed > exception: > > > Execution completed with errors > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > at > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > 
org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:334) > at > java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > > I believe that the problem resides on the TC file because when > I run a much simpler SwiftScript like: > > > int i = 9; > trace(i); > > > I get the following output: > > > swift traceme.swift -tc.file tc.template.data > -sites.file sites.template.xml -config cf > Swift svn swift-r4978 cog-r3226 > > > RunID: 20110810-1832-buktjj3d > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 > SwiftScript trace: 9.0 > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 > > > but as soon as I start using the commands stated the TC file, > I get the "exit code 127" > > > My tc file reads: > > > ssh echo /bin/echo null null > ssh cat /bin/cat null null > ssh ls /bin/ls null null > ssh grep /bin/grep null null > ssh sort /bin/sort null null > ssh paste /bin/paste null null > 
ssh wc /usr/bin/wc null null > > > I am working on the login node of the MCS machine trying to > ssh via Swift to steamroller. > > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > From: hategan at mcs.anl.gov > > To: alberto_chavez at live.com > > CC: swift-devel at ci.uchicago.edu > > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > > > I'll try to see how that can be fixed. In the mean time, can > you > > generate a new key pair with 3DES encryption instead and use > that? > > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > > Hello, > > > > > > > > > I am trying to run a simpler case than ssh-pbs-coaster > test case, and > > > I'm still having the same error. > > > Now I am running only ssh test case > > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > > > > The command line is: > > > swift -config cf -tc.file tc.template.data -sites.file > > > sites.template.xml 001-catsn-ssh.swift > > > > > > > > > The output: > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > RunID: 20110809-1336-ohte788a > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: > 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported > passphrase > > > algorithm: AES-128-CBC > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting > site:8 > > > Submitting:1 Failed:1 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: > 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > 
> - - - > > > > > > > > > Caused by: null > > > Caused by: > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported > passphrase > > > algorithm: AES-128-CBC > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting > site:7 > > > Submitting:1 Failed:2 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: > 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported > passphrase > > > algorithm: AES-128-CBC > > > "error_log.log" 105L, 5770C > > > > > > > > > My auth.defaults reads: > > > > > > > > > login1.beagle.ci.uchicago.edu.type=key > > > login1.beagle.ci.uchicago.edu.username=achavez > > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > login1.pads.ci.uchicago.edu.type=key > > > login1.pads.ci.uchicago.edu.username=achavez > > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, > but it is > > > there, and the passphrase is right because I just verified > it in two > > > ways: > > > 1) by logging to pads and beagle without providing a > password > > > 2) "changed" the password. I the "new" password is the > same as the > > > "old" one. 
> > > > > > sites.templates.xml: > > > > > > > > > > > > url="login1.pads.ci.uchicago.edu" > > > jobmanager="ssh"/> > > > url="login1.pads.ci.uchicago.edu" /> > > > 0 > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > config file: > > > > > > wrapperlog.always.transfer=true > > > sitedir.keep=true > > > execution.retries=0 > > > lazy.errors=true > > > status.mode=provider > > > use.provider.staging=true > > > provider.staging.pin.swiftfiles=false > > > foreach.max.threads=10 > > > provenance.log=true > > > > > > > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > > > > type filemsg; > > > > > > > > > app (filemsg output) hello(string s) > > > { > > > echo s stdout=@filename(output); > > > } > > > > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > > myfile = hello("dog,cat,dinosaur"); > > > > > > > > > and I get the following output: > > > > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > RunID: 20110809-1343-2es2hel2 > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > > Exception in echo: > > > Arguments: [dog,cat,dinosaur] > > > Host: ssh > > > Directory: > hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > > - - - > > > > > > > > > Caused by: null > > > Caused by: > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > Caused by: > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > Can't > > > read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported > passphrase > > > algorithm: AES-128-CBC > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 > Failed:1 > > > The following errors have occurred: > > > 1. Can't read key due to cryptography problems: > > > java.security.NoSuchAlgorithmException: Unsupported > passphrase > > > algorithm: AES-128-CBC > > > > > > > > > > > > > > > any thoughts on this? 
> > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > >

From benc at hawaga.org.uk Thu Aug 11 04:57:15 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 11 Aug 2011 11:57:15 +0200
Subject: [Swift-devel] Call function.
Message-ID: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>

That's moving a big jump away from compile-time type checking: you can't check the return types if you don't know anything about the function to call.

Does that matter for Swift? It's nice to find errors before you embark on a long run. But the strongly-typed-ness of swift doesn't otherwise seem too useful.

Do you need general string-based invocation? Where are you getting these strings from?

If you want to preserve type checking, then in the example in your message you could use first order function references (e.g. functions in Haskell, function pointers in C), which carry their type with them.

Your example might then look like:

int assoc_array [ string ] [ int ];
assoc_array[do_x][1] = 1000;
assoc_array[do_x][2] = 1001;
assoc_array[do_y][1] = 5000;
assoc_array[do_y][2] = 5002;
foreach i in assoc_array {
    foreach vi in assoc_array[ i ] {
        call ( i , vi );
    }
}

All I did there was remove the quotes and make each function usable as a variable name (which happens in other languages too, both C and Haskell).

The return type of call( f, ...) is then the return type of f, and the types of the other parameters of the call are the types of the other parameters of f.

That's making the type system more fancy, though, in a way that might not actually make this more useful for users doing actual things (but I don't know what your real application use case is).
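The function-reference version of the example can be sketched in Python with type hints, as a rough analogue of the C function pointers mentioned above. This is not SwiftScript: `do_x`, `do_y`, `call` and the dictionary layout are hypothetical stand-ins, and the static check comes from a type checker such as mypy rather than from Python itself.

```python
from typing import Callable, Dict, List, Tuple

# A function reference whose type travels with it: int -> int.
IntFn = Callable[[int], int]


# Hypothetical stand-ins for do_x / do_y in the example above.
def do_x(v: int) -> int:
    return v + 1


def do_y(v: int) -> int:
    return v * 2


def call(f: IntFn, x: int) -> int:
    """Apply a function reference. A static checker such as mypy can
    confirm before anything runs that x matches f's parameter type and
    that the result is used as an int (f's return type)."""
    return f(x)


# Keys are the functions themselves, not quoted names.
assoc_array: Dict[IntFn, List[int]] = {
    do_x: [1000, 1001],
    do_y: [5000, 5002],
}

results: List[Tuple[int, int]] = []
for i in assoc_array:              # foreach i in assoc_array
    for vi in assoc_array[i]:      # foreach vi in assoc_array[ i ]
        results.append((vi, call(i, vi)))

print(results)  # [(1000, 1001), (1001, 1002), (5000, 10000), (5002, 10004)]
```

Writing `call(do_x, "4")`, or passing a `str -> str` function to `call`, is reported by mypy without running the script. That is the start-of-run check under discussion. Plain Python does not enforce the hints on its own.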
It also excludes the use case of the function names really being dynamic, for example something like:

s = read(file_containing_a_function_name);
call(s,4);

Is that a use case you expect?

From yadudoc1729 at gmail.com Thu Aug 11 07:45:38 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Thu, 11 Aug 2011 18:15:38 +0530
Subject: [Swift-devel] Call function.
In-Reply-To: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
Message-ID:

> That's moving a big jump away from compile time type checking: you can't check the return types if you don't know anything about the function to call.
> Does that matter for Swift? Its nice to find errors before you embark on a long run. But the strongly-typed-ness of swift doesn't otherwise seem too useful.

Well, what I plan on doing is this: the first string passed to a "call" function will need to be a function identifier, and as we translate to Karajan we look up the type of the function and ensure that the return and input args match. I haven't gotten there yet; I'm still arm-twisting the parser to accept the new syntax.

> Do you need general string based invocation? Where are you getting these strings from?

Well, I don't understand if it makes a difference. It's easier with strings, because we then just need to pass them on to executeElement, which now accepts the string identifier of a procedure.

> If you want to preserve type checking, then in your example in your message, you could use first order function references (eg: functions in Haskell, function pointers in C) which can carry their type with them.
>
> Your example might then look like:
>
> int assoc_array [ string ] [ int ];
> assoc_array[do_x][1] = 1000; assoc_array[do_x][2] = 1001;
> assoc_array[do_y][1] = 5000;assoc_array[do_y][2] = 5002;
> foreach i in assoc_array {
>     foreach vi in assoc_array[ i ] {
>         call ( i , vi );
> }}
>
> All I did there was remove the quotes, and make each function usable as a variable name (which happens in other languages too - both C and Haskell).
>
> The return type of call( f, ...) is then the return type of f, and the type of other parameters of the call are the type the other parameters of f.

I don't know enough to actually comment on that, I think.

> That's making the type system more fancy, though, in a way that might not actually make this more useful for users doing actual things. (but I don't know what your real application use case is).
> It also excludes the use case of the function names really being dynamic - for example, something like:
> s = read(file_containing_a_function_name);
> call(s,4);
> Is that a use case you expect?

I don't understand this example. What I need is a way to pass functions to other functions. In map, reduce and fold style functions we need to pass a function and the list of items to operate on. I'm trying to make that possible here. The end result I'm looking for is the map-reduce style.

-- 
Thanks and Regards,
Yadu Nand B

From benc at hawaga.org.uk Thu Aug 11 08:03:35 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 11 Aug 2011 13:03:35 +0000 (GMT)
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
Message-ID:

> > That's moving a big jump away from compile time type checking: you
> > can't check the return types if you don't know anything about the
> > function to call. Does that matter for Swift? Its nice to find errors
> > before you embark on a long run. But the strongly-typed-ness of swift
> > doesn't otherwise seem too useful.
> Well, What I plan on doing is the first string passed to a "call" function
> will need to be a function identifier and as we translate to karajan lookup
> the type of the function and ensure that the return and input args match.
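The plan quoted just above resolves a string to a procedure and then checks its return and argument types. Ben's question is when that check can happen. The Python sketch below shows that string-based dispatch under stated assumptions. `REGISTRY`, `call_by_name`, its `expect` parameter and `shout` are hypothetical illustrations, not Swift or Karajan APIs. The real translator's procedure table may look quite different.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature table: name -> (function, parameter types, return type).
# It stands in for whatever procedure table the translator would consult.
Signature = Tuple[Callable[..., Any], Tuple[type, ...], type]


def do_x(v: int) -> int:
    return v + 1


def shout(s: str) -> str:
    return s.upper()


REGISTRY: Dict[str, Signature] = {
    "do_x": (do_x, (int,), int),
    "shout": (shout, (str,), str),
}


def call_by_name(name: str, *args: Any, expect: type = object) -> Any:
    """Resolve a procedure from a string and check it before calling.

    With a literal name these checks could run before execution starts;
    with a name read from data they can only run here, at call time.
    """
    if name not in REGISTRY:
        raise LookupError(f"no procedure named {name!r}")
    fn, param_types, ret_type = REGISTRY[name]
    if not issubclass(ret_type, expect):
        raise TypeError(f"{name} returns {ret_type.__name__}, caller expects {expect.__name__}")
    if len(args) != len(param_types):
        raise TypeError(f"{name} expects {len(param_types)} argument(s), got {len(args)}")
    for pos, (arg, wanted) in enumerate(zip(args, param_types)):
        if not isinstance(arg, wanted):
            raise TypeError(f"{name}: argument {pos} should be {wanted.__name__}, got {type(arg).__name__}")
    return fn(*args)


print(call_by_name("do_x", 4, expect=int))  # 5

name_from_data = "shout"  # imagine this string was read from a file
try:
    call_by_name(name_from_data, 4)
except TypeError as err:
    print("runtime error:", err)  # runtime error: shout: argument 0 should be str, got int
```

The mismatch on the last call is detected, but only when that call is reached. This matches the `read(file_containing_a_function_name)` case, where nothing is known about the name before the run.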
If you have an arbitrary string, you can't know what is in that string until runtime - potentially after a lot of other stuff has run. So you can only do that check at runtime - potentially after a lot of other stuff has run. You will eventually be able to check - but I was mostly highlighting the fact that this can't happen at compile time, in general; and then asking (swift people in general) if compile time (i.e. start of the run) type checking matters here. -- From alberto_chavez at live.com Thu Aug 11 08:31:53 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Thu, 11 Aug 2011 08:31:53 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <1313047087.3215.6.camel@blabla> References: <20110811045409.99852121A8@zimbra.anl.gov>, , <1313047087.3215.6.camel@blabla> Message-ID: Sure, attached are the output of stdout and stderror, and the log generated by swift. > Subject: RE: [Swift-devel] ssh test case on pads/beagle > From: hategan at mcs.anl.gov > To: alberto_chavez at live.com > CC: jonmon at mcs.anl.gov; swift-devel at ci.uchicago.edu > Date: Thu, 11 Aug 2011 00:18:07 -0700 > > Can you post (a link to) the entire log file? Since it contains both the > tc.data and sites.xml and the error, it's probably always better to post > than individual snippets. > > On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote: > > Sure: > > > > > > > > > > > > 0 > > /home/achavez/swiftwork > > > > > > > > > > ______________________________________________________________________ > > To: alberto_chavez at live.com > > From: jonmon at mcs.anl.gov > > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > Date: Wed, 10 Aug 2011 23:54:24 -0500 > > > > Could you post the sites file? 
> > > > ----- Reply message ----- > > From: "Alberto Chavez" > > Date: Wed, Aug 10, 2011 7:16 pm > > Subject: [Swift-devel] ssh test case on pads/beagle > > To: > > Cc: "Mihael Hategan" , "Swift Devel" > > > > > > > > > > Exit code "127" normally means that a particular function doesn't > > exist. Are you sure that all those paths to apps exist? > > > Yes, I doubled check that and those are the right paths to the apps. > > > > > > Also, I am not sure if this is a problem but shouldn't there be a > > third column in the app file? LIke > > "ssh echo /bin/echo null null null" > > > > > > > > > > Looking at the documentation for the transformation catalog, the > > structure should be: > > > > site, transformation name, executable path, installation status, > > platform, and profile entries. > > > > > > > > > > > > The installation status and platform fields are not used. Set them > > to INSTALLED and INTEL32::LINUX respectively. > > > > The profiles field should be set to null if no profile entries are to > > be specified, or should contain the profile entries separated by > > semicolons. > > > > > > but even when I switch the columns to INSTALLED and INTEL32::LINUX and > > keep the profiles field set to null, I'm still getting the same exit > > code. > > > > > > On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: > > > > I changed my ssh-key, and they worked on the MCS machines > > because the authorized_keys file has not been updated yet on > > the CI Machines. 
> > I created a new ssh-key using: > > ssh-keygen -t rsa -b 2048 > > exactly as the MCS site suggested, > > On the other hand, I still have a problem, I am getting the > > following error: > > > > > > Swift svn swift-r4978 cog-r3226 > > > > > > RunID: 20110810-1819-1cdo2o62 > > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: > > 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek > > - - - > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.execution.JobException: > > Job failed with an exit code of 127 > > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 > > Failed:10 > > The following errors have occurred: > > 1. Job failed with an exit code of 127 (10 times) > > > > > > > > > > These are the contents of the log: > > > > > > Execution completed with errors > > > > > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing > > channel 0 [Unnamed Channel] > > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 > > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing > > channel 0 [Unnamed Channel] > > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 > > APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application > > exception: null > > Caused by: > > org.globus.cog.abstraction.impl.common.execution.JobException: > > Job failed with an exit code of 127 > > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE > > thread=0-5-3-1 tr=cat > > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in > > cat: > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > at > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > > 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:471) > > at java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:334) > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:636) > > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed > > exception: > > > > > > Execution completed with errors > > > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > at > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > > 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:471) > > at java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:334) > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:636) > > > > I believe that the problem resides on the TC file because when > > I run a much simpler SwiftScript like: > > > > > > int i = 9; > > trace(i); > > > > > > I get the following output: > > > > > > swift traceme.swift -tc.file tc.template.data > > -sites.file sites.template.xml -config cf > > Swift svn swift-r4978 cog-r3226 > > > > > > RunID: 20110810-1832-buktjj3d > > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 > > SwiftScript trace: 9.0 > > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 > > > > > > but as soon as I start 
using the commands stated the TC file, > > I get the "exit code 127" > > > > > > My tc file reads: > > > > > > ssh echo /bin/echo null null > > ssh cat /bin/cat null null > > ssh ls /bin/ls null null > > ssh grep /bin/grep null null > > ssh sort /bin/sort null null > > ssh paste /bin/paste null null > > ssh wc /usr/bin/wc null null > > > > > > I am working on the login node of the MCS machine trying to > > ssh via Swift to steamroller. > > > > > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > From: hategan at mcs.anl.gov > > > To: alberto_chavez at live.com > > > CC: swift-devel at ci.uchicago.edu > > > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > > > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > > > > > I'll try to see how that can be fixed. In the mean time, can > > you > > > generate a new key pair with 3DES encryption instead and use > > that? > > > > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > > > Hello, > > > > > > > > > > > > I am trying to run a simpler case than ssh-pbs-coaster > > test case, and > > > > I'm still having the same error. 
> > > > Now I am running only ssh test case > > > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > > > > > > > The command line is: > > > > swift -config cf -tc.file tc.template.data -sites.file > > > > sites.template.xml 001-catsn-ssh.swift > > > > > > > > > > > > The output: > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > RunID: 20110809-1336-ohte788a > > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting > > site:8 > > > > Submitting:1 Failed:1 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting > > site:7 > > > > Submitting:1 Failed:2 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 
001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > "error_log.log" 105L, 5770C > > > > > > > > > > > > My auth.defaults reads: > > > > > > > > > > > > login1.beagle.ci.uchicago.edu.type=key > > > > login1.beagle.ci.uchicago.edu.username=achavez > > > > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > login1.pads.ci.uchicago.edu.type=key > > > > login1.pads.ci.uchicago.edu.username=achavez > > > > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, > > but it is > > > > there, and the passphrase is right because I just verified > > it in two > > > > ways: > > > > 1) by logging to pads and beagle without providing a > > password > > > > 2) "changed" the password. I the "new" password is the > > same as the > > > > "old" one. 
> > > > > > > > sites.templates.xml: > > > > > > > > > > > > > > > > > url="login1.pads.ci.uchicago.edu" > > > > jobmanager="ssh"/> > > > > > url="login1.pads.ci.uchicago.edu" /> > > > > 0 > > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > > > > > > config file: > > > > > > > > wrapperlog.always.transfer=true > > > > sitedir.keep=true > > > > execution.retries=0 > > > > lazy.errors=true > > > > status.mode=provider > > > > use.provider.staging=true > > > > provider.staging.pin.swiftfiles=false > > > > foreach.max.threads=10 > > > > provenance.log=true > > > > > > > > > > > > > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > > > > > > > type filemsg; > > > > > > > > > > > > app (filemsg output) hello(string s) > > > > { > > > > echo s stdout=@filename(output); > > > > } > > > > > > > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > > > myfile = hello("dog,cat,dinosaur"); > > > > > > > > > > > > and I get the following output: > > > > > > > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > RunID: 20110809-1343-2es2hel2 > > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > > > Exception in echo: > > > > Arguments: [dog,cat,dinosaur] > > > > Host: ssh > > > > Directory: > > hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 > > Failed:1 > > > > The following errors have occurred: > > > > 1. 
Can't read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > any thoughts on this? -------------- next part -------------- A non-text attachment was scrubbed... Name: 001-catsn-ssh-20110811-0828-s51oubu6.log Type: text/x-log Size: 167486 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ssh-test-output.log Type: text/x-log Size: 3559 bytes Desc: not available URL: From wilde at mcs.anl.gov Thu Aug 11 08:57:36 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 11 Aug 2011 08:57:36 -0500 (CDT) Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: Message-ID: <2078636548.210814.1313071056159.JavaMail.root@zimbra.anl.gov> Mihael, I've never seen sites.xml entries showing up in the log; are they supposed to be there now? They are not in the log Alberto attached, nor have I seen them in any other log yet. Can we log all the files mentioned in the command-line report (the first line of the log) right at the front, along with the source text? I.e., script, tc, sites, and config? Ideally, values for all of the swift.properties? Ideally, auth.defaults with suitable masking? 0.94 feature?
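A minimal Python sketch of the startup dump proposed above. It assumes the command-line layout shown in the Loader log line quoted below, and it assumes that sensitive auth.defaults entries are keys ending in `.passphrase` or `.password`. The helper names (`files_from_args`, `mask_properties`, `config_preamble`) are hypothetical. This is an illustration of the idea only and does not describe how Swift's Java Loader actually works.

```python
import re
from pathlib import Path

# Assumption: keys such as "host.passphrase" or "host.password" hold secrets.
SENSITIVE_KEY = re.compile(r"\.(passphrase|password)$")

# Flags whose following argument names a file (taken from the Loader line).
FILE_FLAGS = {"-tc.file", "-sites.file", "-config"}


def files_from_args(args):
    """Return the script name plus every file passed via -tc.file, -sites.file or -config."""
    files = []
    it = iter(args)
    for arg in it:
        if arg in FILE_FLAGS:
            value = next(it, None)  # the file name follows the flag
            if value is not None:
                files.append(value)
        elif not arg.startswith("-"):
            files.append(arg)  # a bare argument is taken to be the SwiftScript source
    return files


def mask_properties(text):
    """Replace the value of any sensitive key=value line with ****."""
    masked = []
    for line in text.splitlines():
        key, sep, _value = line.partition("=")
        if sep and SENSITIVE_KEY.search(key.strip()):
            line = key + sep + "****"
        masked.append(line)
    return "\n".join(masked)


def config_preamble(args):
    """Build a log preamble that contains each referenced file, with secrets masked."""
    sections = []
    for name in files_from_args(args):
        path = Path(name)
        body = path.read_text() if path.is_file() else "<file not found>"
        sections.append(f"--- {name} ---\n{mask_properties(body)}")
    return "\n".join(sections)


if __name__ == "__main__":
    loader_args = ["001-catsn-ssh.swift", "-tc.file", "tc.template.data",
                   "-sites.file", "sites.template.xml", "-config", "cf"]
    print(files_from_args(loader_args))
    print(mask_properties("login1.pads.ci.uchicago.edu.type=key\n"
                          "login1.pads.ci.uchicago.edu.passphrase=secret"))
```

Arguments such as `-n=1` are skipped. An unknown flag that takes a separate value would be misread as a file name, so a real version would need Swift's actual option list.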
>> 2011-08-11 08:28:03,762-0500 DEBUG Loader arguments: [001-catsn-ssh.swift, -tc.file, tc.template.data, -sites.file, sites.template.xml, -config, cf] Alberto, stop by and we can try to debug this in person, as ssh requires a fair bit of correct configuration to work. We need to look at the cf, sites.template.xml, and cf file. - Mike ----- Original Message ----- From: "Alberto Chavez" To: "Mihael Hategan" Cc: "Swift Devel" Sent: Thursday, August 11, 2011 8:31:53 AM Subject: Re: [Swift-devel] ssh test case on pads/beagle Sure, attached are the output of stdout and stderror, and the log generated by swift. > Subject: RE: [Swift-devel] ssh test case on pads/beagle > From: hategan at mcs.anl.gov > To: alberto_chavez at live.com > CC: jonmon at mcs.anl.gov; swift-devel at ci.uchicago.edu > Date: Thu, 11 Aug 2011 00:18:07 -0700 > > Can you post (a link to) the entire log file? Since it contains both the > tc.data and sites.xml and the error, it's probably always better to post > than individual snippets. > > On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote: > > Sure: > > > > > > > > > > > > 0 > > /home/achavez/swiftwork > > > > > > > > > > ______________________________________________________________________ > > To: alberto_chavez at live.com > > From: jonmon at mcs.anl.gov > > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > Date: Wed, 10 Aug 2011 23:54:24 -0500 > > > > Could you post the sites file? > > > > ----- Reply message ----- > > From: "Alberto Chavez" > > Date: Wed, Aug 10, 2011 7:16 pm > > Subject: [Swift-devel] ssh test case on pads/beagle > > To: > > Cc: "Mihael Hategan" , "Swift Devel" > > > > > > > > > > Exit code "127" normally means that a particular function doesn't > > exist. Are you sure that all those paths to apps exist? > > > Yes, I doubled check that and those are the right paths to the apps. 
> > > > > > Also, I am not sure if this is a problem but shouldn't there be a > > third column in the app file? LIke > > "ssh echo /bin/echo null null null" > > > > > > > > > > Looking at the documentation for the transformation catalog, the > > structure should be: > > > > site, transformation name, executable path, installation status, > > platform, and profile entries. > > > > > > > > > > > > The installation status and platform fields are not used. Set them > > to INSTALLED and INTEL32::LINUX respectively. > > > > The profiles field should be set to null if no profile entries are to > > be specified, or should contain the profile entries separated by > > semicolons. > > > > > > but even when I switch the columns to INSTALLED and INTEL32::LINUX and > > keep the profiles field set to null, I'm still getting the same exit > > code. > > > > > > On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: > > > > I changed my ssh-key, and they worked on the MCS machines > > because the authorized_keys file has not been updated yet on > > the CI Machines. > > I created a new ssh-key using: > > ssh-keygen -t rsa -b 2048 > > exactly as the MCS site suggested, > > On the other hand, I still have a problem, I am getting the > > following error: > > > > > > Swift svn swift-r4978 cog-r3226 > > > > > > RunID: 20110810-1819-1cdo2o62 > > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 > > Exception in cat: > > Arguments: [data.txt] > > Host: ssh > > Directory: > > 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek > > - - - > > Caused by: null > > Caused by: > > org.globus.cog.abstraction.impl.common.execution.JobException: > > Job failed with an exit code of 127 > > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 > > Failed:10 > > The following errors have occurred: > > 1. 
Job failed with an exit code of 127 (10 times) > > > > > > > > > > These are the contents of the log: > > > > > > Execution completed with errors > > > > > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing > > channel 0 [Unnamed Channel] > > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 > > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing > > channel 0 [Unnamed Channel] > > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 > > APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application > > exception: null > > Caused by: > > org.globus.cog.abstraction.impl.common.execution.JobException: > > Job failed with an exit code of 127 > > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE > > thread=0-5-3-1 tr=cat > > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in > > cat: > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > at > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > 
org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:471) > > at java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:334) > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:636) > > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed > > exception: > > > > > > Execution completed with errors > > > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > at > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > 
org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:471) > > at java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:334) > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:636) > > > > I believe that the problem resides on the TC file because when > > I run a much simpler SwiftScript like: > > > > > > int i = 9; > > trace(i); > > > > > > I get the following output: > > > > > > swift traceme.swift -tc.file tc.template.data > > -sites.file sites.template.xml -config cf > > Swift svn swift-r4978 cog-r3226 > > > > > > RunID: 20110810-1832-buktjj3d > > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 > > SwiftScript trace: 9.0 > > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 > > > > > > but as soon as I start using the commands stated the TC file, > > I get the "exit code 127" > > > > > > My tc file reads: > > > > > > ssh echo /bin/echo null null > > ssh cat /bin/cat null null > > ssh ls /bin/ls null null > > ssh grep /bin/grep null null > > ssh sort /bin/sort null null > > ssh paste /bin/paste null null > > ssh wc /usr/bin/wc null null > > > > > > I am working on the login node of the MCS machine trying to > > ssh via Swift to steamroller. 
> > > > > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > From: hategan at mcs.anl.gov > > > To: alberto_chavez at live.com > > > CC: swift-devel at ci.uchicago.edu > > > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > > > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > > > > > I'll try to see how that can be fixed. In the mean time, can > > you > > > generate a new key pair with 3DES encryption instead and use > > that? > > > > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > > > Hello, > > > > > > > > > > > > I am trying to run a simpler case than ssh-pbs-coaster > > test case, and > > > > I'm still having the same error. > > > > Now I am running only ssh test case > > > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > > > > > > > The command line is: > > > > swift -config cf -tc.file tc.template.data -sites.file > > > > sites.template.xml 001-catsn-ssh.swift > > > > > > > > > > > > The output: > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > RunID: 20110809-1336-ohte788a > > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting > > site:8 > > > > Submitting:1 Failed:1 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek 
> > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting > > site:7 > > > > Submitting:1 Failed:2 > > > > Exception in cat: > > > > Arguments: [data.txt] > > > > Host: ssh > > > > Directory: > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > "error_log.log" 105L, 5770C > > > > > > > > > > > > My auth.defaults reads: > > > > > > > > > > > > login1.beagle.ci.uchicago.edu.type=key > > > > login1.beagle.ci.uchicago.edu.username=achavez > > > > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > login1.pads.ci.uchicago.edu.type=key > > > > login1.pads.ci.uchicago.edu.username=achavez > > > > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > > > > > > and it has been set to 600, I ommited the passphrase line, > > but it is > > > > there, and the passphrase is right because I just verified > > it in two > > > > ways: > > > > 1) by logging to pads and beagle without providing a > > password > > > > 2) "changed" the password. 
I the "new" password is the > > same as the > > > > "old" one. > > > > > > > > sites.templates.xml: > > > > > > > > > > > > > > > > > url="login1.pads.ci.uchicago.edu" > > > > jobmanager="ssh"/> > > > > > url="login1.pads.ci.uchicago.edu" /> > > > > 0 > > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > > > > > > config file: > > > > > > > > wrapperlog.always.transfer=true > > > > sitedir.keep=true > > > > execution.retries=0 > > > > lazy.errors=true > > > > status.mode=provider > > > > use.provider.staging=true > > > > provider.staging.pin.swiftfiles=false > > > > foreach.max.threads=10 > > > > provenance.log=true > > > > > > > > > > > > > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > > > > > > > type filemsg; > > > > > > > > > > > > app (filemsg output) hello(string s) > > > > { > > > > echo s stdout=@filename(output); > > > > } > > > > > > > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > > > myfile = hello("dog,cat,dinosaur"); > > > > > > > > > > > > and I get the following output: > > > > > > > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > RunID: 20110809-1343-2es2hel2 > > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > > > Exception in echo: > > > > Arguments: [dog,cat,dinosaur] > > > > Host: ssh > > > > Directory: > > hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > > > - - - > > > > > > > > > > > > Caused by: null > > > > Caused by: > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > Caused by: > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > Can't > > > > read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 > > Failed:1 > > > > The following errors have occurred: > > > > 1. 
Can't read key due to cryptography problems: > > > > java.security.NoSuchAlgorithmException: Unsupported > > passphrase > > > > algorithm: AES-128-CBC > > > > any thoughts on this? -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From alberto_chavez at live.com Thu Aug 11 10:04:02 2011 From: alberto_chavez at live.com (Alberto Chavez) Date: Thu, 11 Aug 2011 10:04:02 -0500 Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <2078636548.210814.1313071056159.JavaMail.root@zimbra.anl.gov> References: , <2078636548.210814.1313071056159.JavaMail.root@zimbra.anl.gov> Message-ID: Mike helped me to track down the problem to the configuration file.
Since I am not using coasters, the line use.provider.staging should be set to false:

$ cat cf
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=0
lazy.errors=true
status.mode=provider
use.provider.staging=false
provider.staging.pin.swiftfiles=false
foreach.max.threads=10
provenance.log=true

$ swift -tc.file tc.template.data -sites.file sites.template.xml -config cf 001-catsn-ssh.swift -n=1
Swift svn swift-r4978 cog-r3226

RunID: 20110811-1002-pylik8vg
Progress: time: Thu, 11 Aug 2011 10:02:39 -0500
Progress: time: Thu, 11 Aug 2011 10:02:40 -0500 Submitted:1
Final status: time: Thu, 11 Aug 2011 10:02:40 -0500 Finished successfully:1

> Date: Thu, 11 Aug 2011 08:57:36 -0500 > From: wilde at mcs.anl.gov > To: alberto_chavez at live.com > CC: swift-devel at ci.uchicago.edu; hategan at mcs.anl.gov > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > Mihael, Ive never seen sites.xml entries showing up in the log - are they supposed to be now? They are not in the log Alberto attached, nor have I seen them in any other log yet. > > Can we log all the files mentioned in the command line report (the first line of the log) right at the front, along with the source text? Ie, script, tc, sites, and config? Ideally values for all of the swift.properties? Ideally auth.defaults with suitable masking? 0.94 feature? > > >> 2011-08-11 08:28:03,762-0500 DEBUG Loader arguments: [001-catsn-ssh.swift, -tc.file, tc.template.data, -sites.file, sites.template.xml, -config, cf] > > Alberto, stop by and we can try to debug this in person, as ssh requires a fair bit of correct configuration to work. > > We need to look at the cf, sites.template.xml, and cf file. > > - Mike > > > ----- Original Message ----- > > > From: "Alberto Chavez" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Thursday, August 11, 2011 8:31:53 AM > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > Sure, attached are the output of stdout and stderror, and the log generated by swift.
> > > > > Subject: RE: [Swift-devel] ssh test case on pads/beagle > > From: hategan at mcs.anl.gov > > To: alberto_chavez at live.com > > CC: jonmon at mcs.anl.gov; swift-devel at ci.uchicago.edu > > Date: Thu, 11 Aug 2011 00:18:07 -0700 > > > > Can you post (a link to) the entire log file? Since it contains both the > > tc.data and sites.xml and the error, it's probably always better to post > > than individual snippets. > > > > On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote: > > > Sure: > > > > > > > > > > > > > > > > > > 0 > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > ______________________________________________________________________ > > > To: alberto_chavez at live.com > > > From: jonmon at mcs.anl.gov > > > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > Date: Wed, 10 Aug 2011 23:54:24 -0500 > > > > > > Could you post the sites file? > > > > > > ----- Reply message ----- > > > From: "Alberto Chavez" > > > Date: Wed, Aug 10, 2011 7:16 pm > > > Subject: [Swift-devel] ssh test case on pads/beagle > > > To: > > > Cc: "Mihael Hategan" , "Swift Devel" > > > > > > > > > > > > > > > Exit code "127" normally means that a particular function doesn't > > > exist. Are you sure that all those paths to apps exist? > > > > Yes, I doubled check that and those are the right paths to the apps. > > > > > > > > > Also, I am not sure if this is a problem but shouldn't there be a > > > third column in the app file? LIke > > > "ssh echo /bin/echo null null null" > > > > > > > > > > > > > > > Looking at the documentation for the transformation catalog, the > > > structure should be: > > > > > > site, transformation name, executable path, installation status, > > > platform, and profile entries. > > > > > > > > > > > > > > > > > > The installation status and platform fields are not used. Set them > > > to INSTALLED and INTEL32::LINUX respectively. 
> > > > > > The profiles field should be set to null if no profile entries are to > > > be specified, or should contain the profile entries separated by > > > semicolons. > > > > > > > > > but even when I switch the columns to INSTALLED and INTEL32::LINUX and > > > keep the profiles field set to null, I'm still getting the same exit > > > code. > > > > > > > > > On Aug 10, 2011, at 6:41 PM, Alberto Chavez wrote: > > > > > > I changed my ssh-key, and they worked on the MCS machines > > > because the authorized_keys file has not been updated yet on > > > the CI Machines. > > > I created a new ssh-key using: > > > ssh-keygen -t rsa -b 2048 > > > exactly as the MCS site suggested, > > > On the other hand, I still have a problem, I am getting the > > > following error: > > > > > > > > > Swift svn swift-r4978 cog-r3226 > > > > > > > > > RunID: 20110810-1819-1cdo2o62 > > > Progress: time: Wed, 10 Aug 2011 18:19:42 -0500 > > > Exception in cat: > > > Arguments: [data.txt] > > > Host: ssh > > > Directory: > > > 001-catsn-ssh-20110810-1819-1cdo2o62/jobs/9/cat-9jd0g9ek > > > - - - > > > Caused by: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.execution.JobException: > > > Job failed with an exit code of 127 > > > Final status: time: Wed, 10 Aug 2011 18:20:00 -0500 > > > Failed:10 > > > The following errors have occurred: > > > 1. 
Job failed with an exit code of 127 (10 times) > > > > > > > > > > > > > > > These are the contents of the log: > > > > > > > > > Execution completed with errors > > > > > > > > > 2011-08-10 18:19:43,251-0500 INFO ConnectionProtocol Freeing > > > channel 0 [Unnamed Channel] > > > 2011-08-10 18:19:43,263-0500 INFO Exec Exit code 127 > > > 2011-08-10 18:19:43,269-0500 INFO ConnectionProtocol Freeing > > > channel 0 [Unnamed Channel] > > > 2011-08-10 18:19:43,277-0500 DEBUG vdl:execute2 > > > APPLICATION_EXCEPTION jobid=cat-9jd0g9ek - Application > > > exception: null > > > Caused by: > > > org.globus.cog.abstraction.impl.common.execution.JobException: > > > Job failed with an exit code of 127 > > > 2011-08-10 18:19:43,280-0500 INFO vdl:execute END_FAILURE > > > thread=0-5-3-1 tr=cat > > > 2011-08-10 18:19:43,281-0500 INFO vdl:execute Exception in > > > cat: > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > > at > > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > at > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > at > > > 
org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > at > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > at > > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > at > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > at java.util.concurrent.Executors > > > $RunnableAdapter.call(Executors.java:471) > > > at java.util.concurrent.FutureTask > > > $Sync.innerRun(FutureTask.java:334) > > > at > > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > > at java.util.concurrent.ThreadPoolExecutor > > > $Worker.run(ThreadPoolExecutor.java:603) > > > at java.lang.Thread.run(Thread.java:636) > > > 2011-08-10 18:20:00,332-0500 INFO ExecutionContext Detailed > > > exception: > > > > > > > > > Execution completed with errors > > > > > > > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:250) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:254) > > > at > > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:27) > > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > at > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > at > > 
> org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > at > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > at > > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > at > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > at java.util.concurrent.Executors > > > $RunnableAdapter.call(Executors.java:471) > > > at java.util.concurrent.FutureTask > > > $Sync.innerRun(FutureTask.java:334) > > > at > > > java.util.concurrent.FutureTask.run(FutureTask.java:166) > > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > > at java.util.concurrent.ThreadPoolExecutor > > > $Worker.run(ThreadPoolExecutor.java:603) > > > at java.lang.Thread.run(Thread.java:636) > > > > > > I believe that the problem resides in the TC file because when > > > I run a much simpler SwiftScript like: > > > > > > > > > int i = 9; > > > trace(i); > > > > > > > > > I get the following output: > > > > > > > > > swift traceme.swift -tc.file tc.template.data > > > -sites.file sites.template.xml -config cf > > > Swift svn swift-r4978 cog-r3226 > > > > > > > > > RunID: 20110810-1832-buktjj3d > > > Progress: time: Wed, 10 Aug 2011 18:32:30 -0500 > > > SwiftScript trace: 9.0 > > > Final status: time: Wed, 10 Aug 2011 18:32:30 -0500 > > > > > > > > > but as soon as I start using the commands listed in the TC file, > > > I get the "exit code 127" > > > > > > > > > My tc file reads: > > > > > > > > > ssh echo /bin/echo null null > > > ssh cat /bin/cat null null > > > ssh ls /bin/ls null null > > > ssh grep /bin/grep null null > > > ssh sort /bin/sort null null > > > ssh paste /bin/paste null null > > > ssh wc /usr/bin/wc null null > > > > > > > > > I am working on the login node of the MCS machine trying to > > > ssh via Swift to steamroller.
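Exit status 127 is the conventional POSIX shell status for "command not found" (126 is used when a command is found but cannot be executed), which is why the thread first suspects the executable paths in the tc file. A minimal Python sketch, assuming a POSIX /bin/sh on the local machine; the name no-such-command-xyz is a made-up placeholder that should not exist anywhere on the PATH:

```python
import subprocess


def shell_exit_status(command: str) -> int:
    """Run `command` through /bin/sh and return its exit status."""
    # shell=True hands the string to /bin/sh -c on POSIX systems.
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


if __name__ == "__main__":
    # The shell cannot find this placeholder command, so it reports 127.
    print(shell_exit_status("no-such-command-xyz"))
    # An existing command that succeeds reports 0.
    print(shell_exit_status("true"))
```

In this thread the 127 was eventually traced to use.provider.staging=true with the ssh provider rather than to a missing binary, so a 127 from a remote site is a hint about where to look, not a diagnosis.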
> > > > > > > > > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > > From: hategan at mcs.anl.gov > > > > To: alberto_chavez at live.com > > > > CC: swift-devel at ci.uchicago.edu > > > > Date: Tue, 9 Aug 2011 11:57:06 -0700 > > > > > > > > Hmm: Unsupported passphrase algorithm: AES-128-CBC > > > > > > > > I'll try to see how that can be fixed. In the mean time, can > > > you > > > > generate a new key pair with 3DES encryption instead and use > > > that? > > > > > > > > On Tue, 2011-08-09 at 13:43 -0500, Alberto Chavez wrote: > > > > > Hello, > > > > > > > > > > > > > > > I am trying to run a simpler case than the ssh-pbs-coaster > > > test case, and > > > > > I'm still having the same error. > > > > > Now I am running only the ssh test case > > > > > (/tests/providers/ssh/001-catsn-ssn.swift) > > > > > > > > > > > > > > > The command line is: > > > > > swift -config cf -tc.file tc.template.data -sites.file > > > > > sites.template.xml 001-catsn-ssh.swift > > > > > > > > > > > > > > > The output: > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > > > > RunID: 20110809-1336-ohte788a > > > > > Progress: time: Tue, 09 Aug 2011 13:36:42 -0500 > > > > > Exception in cat: > > > > > Arguments: [data.txt] > > > > > Host: ssh > > > > > Directory: > > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/m/cat-mq74h7ek > > > > > - - - > > > > > > > > > > > > > > > Caused by: null > > > > > Caused by: > > > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > > Caused by: > > > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > > Can't > > > > > read key due to cryptography problems: > > > > > java.security.NoSuchAlgorithmException: Unsupported > > > passphrase > > > > > algorithm: AES-128-CBC > > > > > Progress: time: Tue, 09 Aug 2011 13:36:43 -0500 Selecting > > > site:8 > > > > > Submitting:1 Failed:1 > > > > > Exception in cat: > > >
> > Arguments: [data.txt] > > > > > Host: ssh > > > > > Directory: > > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/n/cat-nq74h7ek > > > > > - - - > > > > > > > > > > > > > > > Caused by: null > > > > > Caused by: > > > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > > Caused by: > > > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > > Can't > > > > > read key due to cryptography problems: > > > > > java.security.NoSuchAlgorithmException: Unsupported > > > passphrase > > > > > algorithm: AES-128-CBC > > > > > Progress: time: Tue, 09 Aug 2011 13:36:44 -0500 Selecting > > > site:7 > > > > > Submitting:1 Failed:2 > > > > > Exception in cat: > > > > > Arguments: [data.txt] > > > > > Host: ssh > > > > > Directory: > > > 001-catsn-ssh-20110809-1336-ohte788a/jobs/o/cat-oq74h7ek > > > > > - - - > > > > > > > > > > > > > > > Caused by: null > > > > > Caused by: > > > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > > Caused by: > > > > > > > > com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > > Can't > > > > > read key due to cryptography problems: > > > > > java.security.NoSuchAlgorithmException: Unsupported > > > passphrase > > > > > algorithm: AES-128-CBC > > > > > > > > > > > > > > > My auth.defaults reads: > > > > > > > > > > > > > > > login1.beagle.ci.uchicago.edu.type=key > > > > > login1.beagle.ci.uchicago.edu.username=achavez > > > > > > > > login1.beagle.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > login1.pads.ci.uchicago.edu.type=key > > > > > login1.pads.ci.uchicago.edu.username=achavez > > > > > > > > login1.pads.ci.uchicago.edu.key=/home/Alberto/.ssh/identity > > > > > > > > > > > > > > > > > > > > > > > > > and it has been set to 600, I omitted the passphrase line, > > > but it is > > > > 
> there, and the passphrase is right because I just verified > > > it in two > > > > > ways: > > > > > 1) by logging in to pads and beagle without providing a > > > password > > > > > 2) "changed" the password. The "new" password is the > > > same as the > > > > > "old" one. > > > > > > > > > > sites.templates.xml: > > > > > > > > > > > > > > > > > > > > > > url="login1.pads.ci.uchicago.edu" > > > > > jobmanager="ssh"/> > > > > > > > url="login1.pads.ci.uchicago.edu" /> > > > > > 0 > > > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > > > > > > > > > > > config file: > > > > > > > > > > wrapperlog.always.transfer=true > > > > > sitedir.keep=true > > > > > execution.retries=0 > > > > > lazy.errors=true > > > > > status.mode=provider > > > > > use.provider.staging=true > > > > > provider.staging.pin.swiftfiles=false > > > > > foreach.max.threads=10 > > > > > provenance.log=true > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I also tried a simpler SwiftScript: > > > > > > > > > > > > > > > type filemsg; > > > > > > > > > > > > > > > app (filemsg output) hello(string s) > > > > > { > > > > > echo s stdout=@filename(output); > > > > > } > > > > > > > > > > > > > > > filemsg myfile<"dogcatdinosaur.out">; > > > > > myfile = hello("dog,cat,dinosaur"); > > > > > > > > > > > > > > > and I get the following output: > > > > > > > > > > > > > > > Swift svn swift-r4861 (swift modified locally) cog-r3183 > > > > > > > > > > > > > > > RunID: 20110809-1343-2es2hel2 > > > > > Progress: time: Tue, 09 Aug 2011 13:43:25 -0500 > > > > > Exception in echo: > > > > > Arguments: [dog,cat,dinosaur] > > > > > Host: ssh > > > > > Directory: > > > hello_swift-20110809-1343-2es2hel2/jobs/0/echo-0oldh7ek > > > > > - - - > > > > > > > > > > > > > > > Caused by: null > > > > > Caused by: > > > > > > > > org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Invalid private key or passphrase > > > > > Caused by: > > > > > > > > 
com.sshtools.j2ssh.transport.publickey.InvalidSshKeyException: > > > Can't > > > > > read key due to cryptography problems: > > > > > java.security.NoSuchAlgorithmException: Unsupported > > > passphrase > > > > > algorithm: AES-128-CBC > > > > > Final status: time: Tue, 09 Aug 2011 13:43:26 -0500 > > > Failed:1 > > > > > The following errors have occurred: > > > > > 1. Can't read key due to cryptography problems: > > > > > java.security.NoSuchAlgorithmException: Unsupported > > > passphrase > > > > > algorithm: AES-128-CBC > > > > > > > > > > > > > > > > > > > > > > > > > any thoughts on this? > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Aug 11 10:08:47 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 11 Aug 2011 10:08:47 -0500 (CDT) Subject: [Swift-devel] ssh test case on pads/beagle In-Reply-To: <2078636548.210814.1313071056159.JavaMail.root@zimbra.anl.gov> Message-ID: <1715921449.211141.1313075327075.JavaMail.root@zimbra.anl.gov> Alberto's ssh test now runs. It was failing because provider staging was specified in the -config file; that seemed to cause the error code 127. 
I did not go back and search for a message to that effect in the prior log Alberto sent, but we should, to see if it was reported in some reasonable fashion which could be presented more clearly to the user. We might want to check to ensure that provider staging is not specified for providers that can't support it. Is such a check feasible and sensible? Also, this case illustrates the benefit of having the properties settings (and -config overrides) echoed in the .log file. - Mike ----- Original Message ----- > From: "Michael Wilde" > To: "Alberto Chavez" > Cc: "Swift Devel" > Sent: Thursday, August 11, 2011 8:57:36 AM > Subject: Re: [Swift-devel] ssh test case on pads/beagle > Mihael, I've never seen sites.xml entries showing up in the log - are > they supposed to be now? They are not in the log Alberto attached, nor > have I seen them in any other log yet. > > Can we log all the files mentioned in the command line report (the > first line of the log) right at the front, along with the source text? > I.e., script, tc, sites, and config? Ideally values for all of the > swift.properties? Ideally auth.defaults with suitable masking? 0.94 > feature? > > >> 2011-08-11 08:28:03,762-0500 DEBUG Loader arguments: > >> [001-catsn-ssh.swift, -tc.file, tc.template.data, -sites.file, > >> sites.template.xml, -config, cf] > > Alberto, stop by and we can try to debug this in person, as ssh > requires a fair bit of correct configuration to work. > > We need to look at the cf, sites.template.xml, and cf file. > > - Mike > > > ----- Original Message ----- > > > From: "Alberto Chavez" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Thursday, August 11, 2011 8:31:53 AM > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > Sure, attached are the output of stdout and stderr, and the log > generated by swift.
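Mike's request to echo the properties (and a masked auth.defaults) at the front of the log could look roughly like the following. This is a hypothetical Python sketch, not Swift's implementation: it handles only simple key=value lines (no line continuations or escapes from the full Java .properties format), and the rule that keys ending in .passphrase or .password hold secrets is an assumption made for illustration.

```python
import logging

# Assumption: values under keys with these suffixes must never reach a log.
SENSITIVE_SUFFIXES = (".passphrase", ".password")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines (a subset of the Java .properties format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def masked(props: dict) -> dict:
    """Return a copy in which sensitive values are replaced by asterisks."""
    return {
        key: ("****" if key.endswith(SENSITIVE_SUFFIXES) else value)
        for key, value in props.items()
    }


def log_properties(name: str, text: str, logger: logging.Logger) -> None:
    """Write every (masked) setting from one file to the log, sorted by key."""
    for key, value in sorted(masked(parse_properties(text)).items()):
        logger.info("%s: %s=%s", name, key, value)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    sample = (
        "login1.pads.ci.uchicago.edu.type=key\n"
        "login1.pads.ci.uchicago.edu.username=achavez\n"
        "login1.pads.ci.uchicago.edu.passphrase=not-a-real-secret\n"
    )
    log_properties("auth.defaults", sample, logging.getLogger("swift.config"))
```

Logging one line per setting keeps the output grep-friendly, so a setting such as use.provider.staging=true would be visible next to the error it causes.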
> > > > > Subject: RE: [Swift-devel] ssh test case on pads/beagle > > From: hategan at mcs.anl.gov > > To: alberto_chavez at live.com > > CC: jonmon at mcs.anl.gov; swift-devel at ci.uchicago.edu > > Date: Thu, 11 Aug 2011 00:18:07 -0700 > > > > Can you post (a link to) the entire log file? Since it contains both > > the > > tc.data and sites.xml and the error, it's probably always better to > > post > > than individual snippets. > > > > On Thu, 2011-08-11 at 01:17 -0500, Alberto Chavez wrote: > > > Sure: > > > > > > > > > > > > > > > > > > 0 > > > /home/achavez/swiftwork > > > > > > > > > > > > > > > ______________________________________________________________________ > > > To: alberto_chavez at live.com > > > From: jonmon at mcs.anl.gov > > > CC: hategan at mcs.anl.gov; swift-devel at ci.uchicago.edu > > > Subject: Re: [Swift-devel] ssh test case on pads/beagle > > > Date: Wed, 10 Aug 2011 23:54:24 -0500 > > > > > > Could you post the sites file? > > > > > > ----- Reply message ----- > > > From: "Alberto Chavez" > > > Date: Wed, Aug 10, 2011 7:16 pm > > > Subject: [Swift-devel] ssh test case on pads/beagle > > > To: > > > Cc: "Mihael Hategan" , "Swift Devel" > > > > > > > > > > > > > > > Exit code "127" normally means that a particular function doesn't > > > exist. Are you sure that all those paths to apps exist? > > > > Yes, I double-checked that and those are the right paths to the > > > > apps. > > > > > > > > > Also, I am not sure if this is a problem but shouldn't there be a > > > third column in the app file? Like > > > "ssh echo /bin/echo null null null" > > > > > > > > > > > > > > > Looking at the documentation for the transformation catalog, the > > > structure should be: > > > > > > site, transformation name, executable path, installation status, > > > platform, and profile entries. > > > > > > > > > > > > > > > > > > The installation status and platform fields are not used. Set them > > > to INSTALLED and INTEL32::LINUX respectively.
-- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Thu Aug 11 10:17:16 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 11 Aug 2011 10:17:16 -0500 Subject: [Swift-devel] Persistent coasters running one job per worker In-Reply-To: <1312917410.3416.2.camel@blabla> References:
	<1312916367.2671.4.camel@blabla> <1312917410.3416.2.camel@blabla>
Message-ID:

On Tue, Aug 9, 2011 at 2:16 PM, Mihael Hategan wrote:

> Ah!
>
> If the workers connect before the client does, then jobsPerNode does not
> make it to the coaster service.
>
> I'll think about this. In the mean time, you could have the workers
> started after the client sends its first job to the service.
>

I did this and it worked. Thanks Mihael.

>
> I'm thinking that maybe jobsPerNode should be a setting that the workers
> themselves could be started with.
>

--
Ketan

From davidk at ci.uchicago.edu Thu Aug 11 12:45:28 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 11 Aug 2011 12:45:28 -0500 (CDT)
Subject: [Swift-devel] Cogkit SVN access
Message-ID: <1287204456.62270.1313084728219.JavaMail.root@zimbra-mb2.anl.gov>

Hello,

How can I request access to the cogkit SVN repo? I have a patch I'd like
to apply that allows 0.93 to compile under Java 1.5. My sourceforge
username is davidkelly999.

Thanks,
David

From hategan at mcs.anl.gov Thu Aug 11 13:23:18 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Aug 2011 11:23:18 -0700
Subject: [Swift-devel] ssh test case on pads/beagle
In-Reply-To:
References: <20110811045409.99852121A8@zimbra.anl.gov>
	<1313047087.3215.6.camel@blabla>
Message-ID: <1313086998.8503.0.camel@blabla>

You have provider staging enabled, and ssh doesn't support that. I'll make
sure it actually throws an exception instead of trying to run jobs with
staging directives.

On Thu, 2011-08-11 at 08:31 -0500, Alberto Chavez wrote:
> Sure, attached are the output of stdout and stderror, and the log
> generated by swift.

From hategan at mcs.anl.gov Thu Aug 11 13:50:43 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Aug 2011 11:50:43 -0700
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
Message-ID: <1313088643.8503.21.camel@blabla>

On Thu, 2011-08-11 at 18:15 +0530, Yadu Nand wrote:
> > That's moving a big jump away from compile time type checking: you
> > can't check the return types if you don't know anything about the
> > function to call.
> >
> > Does that matter for Swift? It's nice to find errors before you embark
> > on a long run. But the strongly-typed-ness of swift doesn't otherwise
> > seem too useful.
>
> Well, what I plan on doing is: the first string passed to a "call"
> function will need to be a function identifier, and as we translate to
> karajan, look up the type of the function and ensure that the return and
> input args match.
> I haven't gotten there yet, I'm still arm-twisting the parser to accept
> the new syntax.

Right. Well, Ben nails it here.

> > Do you need general string based invocation? Where are you getting
> > these strings from?
> Well, I don't understand if it makes a difference. It's easier with
> strings, because we then just need to pass them on to executeElement,
> which now accepts the string identifier of a procedure.

A string is not a function. It's as simple as that. One important
quality of strong typing is that values of a type don't magically
transform into values of another type. So a string shouldn't mean a
bunch of characters in one context and a function in others.

So that means that we need function types. These will have to look like
signatures without actual bodies:

(file b) proc(file a) mycat;

That's somewhat clear. What is unclear is how (and what) we assign to
mycat. Given a standard cat (matching the above signature), we could
have:

mycat = cat;

But the issue there is that now variables and procedures appear to live
in the same namespace. So the semantics of the following code are
unclear:

int f = 2;
(int r) f(int i) {...};
x = f;

What is assigned to x?

There are three resolutions I can think of:
1. Keep them in the same namespace, treat all procs as if they were
equivalent to name = signature {body}. Disallow variables and procedures
with the same name.
2. Do not keep them in the same namespace and consider the namespace
implicit in the type. So if I assign to a non-procedure type then it's a
normal variable, and if I assign to (or use in the context of) a
procedure type, then it's a procedure.
This could be confusing to a user and it requires one to look at the
context of an expression to determine its type, which complicates the
compiler code.
3. Have some special keyword that indicates the procedure namespace:
myfn = proc f (or myfn = proc(f) or myfn = proc:f).

From hategan at mcs.anl.gov Thu Aug 11 13:54:12 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Aug 2011 11:54:12 -0700
Subject: [Swift-devel] Cogkit SVN access
In-Reply-To: <1287204456.62270.1313084728219.JavaMail.root@zimbra-mb2.anl.gov>
References: <1287204456.62270.1313084728219.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <1313088852.8503.24.camel@blabla>

I gave you access to the cog svn, but the question is whether we still
care about java 1.5. Are there any machines that haven't moved to 1.6?

On the other hand, it could be argued that the cost of keeping 1.5
compatibility is relatively low, so we might as well just do it.

From benc at hawaga.org.uk Thu Aug 11 14:09:01 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 11 Aug 2011 19:09:01 +0000 (GMT)
Subject: [Swift-devel] Call function.
In-Reply-To: <1313088643.8503.21.camel@blabla>
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
	<1313088643.8503.21.camel@blabla>
Message-ID:

> 1. Keep them in the same namespace, treat all procs as if they were
> equivalent to name = signature {body}. Disallow variables and procedures
> with the same name.

This is what both C and Haskell do, very roughly.
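
The first resolution above (one namespace shared by variables and procedures, with procedures carrying signature types) can be illustrated as a toy compiler symbol table. This is a hypothetical Python sketch, not the Swift compiler's actual code; `ProcType`, `SymbolTable`, `declare` and `check_assign` are invented names. It shows the two checks that option implies: the `int f` / procedure `f` clash is rejected when the second declaration is seen, and `mycat = cat;` type-checks only when the signatures match.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ProcType:
    """Hypothetical procedure type: (file b) proc(file a) -> outputs=("file",), inputs=("file",)."""

    outputs: Tuple[str, ...]
    inputs: Tuple[str, ...]


# A symbol's type is either a plain type name such as "int" or a procedure signature.
SymType = Union[str, ProcType]


class SymbolTable:
    """Option 1: variables and procedures share one namespace, so each name is bound once."""

    def __init__(self) -> None:
        self.symbols: Dict[str, SymType] = {}

    def declare(self, name: str, typ: SymType) -> None:
        # A variable and a procedure with the same name is a compile-time error.
        if name in self.symbols:
            raise NameError(f"'{name}' is already declared with type {self.symbols[name]}")
        self.symbols[name] = typ

    def check_assign(self, target: str, source: str) -> None:
        # Compile-time check for 'target = source;', e.g. 'mycat = cat;'.
        for name in (target, source):
            if name not in self.symbols:
                raise NameError(f"'{name}' is not declared")
        if self.symbols[target] != self.symbols[source]:
            raise TypeError(
                f"cannot assign '{source}' ({self.symbols[source]}) "
                f"to '{target}' ({self.symbols[target]})"
            )


if __name__ == "__main__":
    table = SymbolTable()
    cat_sig = ProcType(outputs=("file",), inputs=("file",))
    table.declare("cat", cat_sig)       # app (file b) cat (file a) { ... }
    table.declare("mycat", cat_sig)     # (file b) proc(file a) mycat;
    table.check_assign("mycat", "cat")  # accepted: identical signatures

    table.declare("f", "int")           # int f = 2;
    try:
        table.declare("f", ProcType(outputs=("int",), inputs=("int",)))  # (int r) f(int i) {...};
    except NameError as err:
        print("rejected:", err)
```

Under the second resolution, the lookup would instead depend on whether the surrounding context expects a procedure type. That is the context-dependent resolution described above as harder on the compiler. The third resolution could keep this single table but make the procedure reference explicit at the use site.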

C has a distinction between functions directly defined, and functions
being referenced (by a pointer) - the name of the function invoked as
f(x); in the first case, and (*f)(x) in the second case. Haskell treats
them the same: f x

The other two options you give look ugly to me.

But this project is not about abstract PL research/development - what
makes this stuff easier to use?

I'm inclined to think some function type (rather than strings) does, (but
I'm very into compile time safety so that's unsurprising) but I'm not sure
if it's worth the effort for what I think is a fairly restricted use case.

--

From davidk at ci.uchicago.edu Thu Aug 11 14:25:55 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 11 Aug 2011 14:25:55 -0500 (CDT)
Subject: [Swift-devel] Cogkit SVN access
In-Reply-To: <1313088852.8503.24.camel@blabla>
Message-ID: <633281416.62517.1313090755025.JavaMail.root@zimbra-mb2.anl.gov>

Thanks Mihael.

I think there are a few systems still out there running 1.5, but not
many. Intrepid runs 1.5 by default, but I think you can modify it with
softenv. There was another machine called sisboombah that only had 1.5.
I don't have any strong feelings about it one way or another, but it's
probably a good idea to come up with a list of what is supported
(version, IBM java, openjdk?) and add it to the list of things we test
before a new release.

David

From hategan at mcs.anl.gov Thu Aug 11 14:40:02 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Aug 2011 12:40:02 -0700
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
	<1313088643.8503.21.camel@blabla>
Message-ID: <1313091602.9185.5.camel@blabla>

On Thu, 2011-08-11 at 19:09 +0000, Ben Clifford wrote:
> > 1. Keep them in the same namespace, treat all procs as if they were
> > equivalent to name = signature {body}. Disallow variables and procedures
> > with the same name.
>
> This is what both C and Haskell do, very roughly.

I'd argue that C does a bit of both the above and x = proc f (x = &f).
And let's not take the credit from ML because it did what Haskell does
before there was a Haskell.

>
> C has a distinction between functions directly defined, and functions
> being referenced (by a pointer) - the name of the function invoked as
> f(x); in the first case, and (*f)(x) in the second case. Haskell treats
> them the same: f x
>
> The other two options you give look ugly to me.
>
> But this project is not about abstract PL research/development - what
> makes this stuff easier to use?

x = proc f. The name clash I think would be annoying. Inference (i.e.
context dependent meanings) is not intuitive.

>
> I'm inclined to think some function type (rather than strings) does, (but
> I'm very into compile time safety so that's unsurprising) but I'm not sure
> if it's worth the effort for what I think is a fairly restricted use case.
>

It's restricted because it's not implemented. But we have no (easy) way
of knowing how much it will actually be used should it be there.

From wozniak at mcs.anl.gov Thu Aug 11 16:06:37 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 11 Aug 2011 16:06:37 -0500 (CDT)
Subject: [Swift-devel] Cogkit SVN access
In-Reply-To: <633281416.62517.1313090755025.JavaMail.root@zimbra-mb2.anl.gov>
References: <633281416.62517.1313090755025.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

Putting this in the release notes is a good idea. The BG/P and Cray both
have 1.6.

	Justin

--
Justin M Wozniak

From jonmon at mcs.anl.gov Thu Aug 11 17:13:37 2011
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Thu, 11 Aug 2011 17:13:37 -0500
Subject: [Swift-devel] Cogkit SVN access
Message-ID: <20110811221323.C0F05126F4@zimbra.anl.gov>

I don't have access to BG/P but I know on Beagle there was an issue with
IBM's Java. It was throwing an EOFException and you needed to install
Sun's JRE. Ketan reported this a while back and I experienced the issue
as well. I do not know if this problem has been resolved though.

From yadudoc1729 at gmail.com Fri Aug 12 07:25:51 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Fri, 12 Aug 2011 17:55:51 +0530
Subject: [Swift-devel] Call function.
In-Reply-To: <1313091602.9185.5.camel@blabla>
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk>
	<1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla>
Message-ID:

Hi,

Do all procedures have the same default namespace in swift ?
(not considering imports) If that is the case, the reason for having
functions passed to other functions almost goes away.
I can't imagine any scenario in which we might need that. What remains is the functional style iterators like map and reduce, which by convention use functions passed to it. Is this the style we need ? int sum1 = call ( (int)func1(int[ ]) , [1,2,3] ); (I can't help but say that if we gave a bit of freedom for call to ignore the strong typed'ness of swift, It could try some really cool things, just saying, thats all) >> I'm inclined to think some function type (rather than strings) does, (but >> I'm very into compile time safety so that's unsurprising) but I'm not sure >> if its worth the effort for what I think is a fairly restricted use case. Well, how else would we do the map - reduce style functionality. Sure, we can probably do map by writing a separate procedure for applying a function over every item in a list, but I think map is easier (and cooler!) > It's restricted because it's not implemented. But we have no (easy) way > of knowing how much it will actually be used should it be there. :) In erlang you can write code to do map, reduce and fold, but users like these functions which help them save probably a couple more lines of code. -- Thanks and Regards, Yadu Nand B From benc at hawaga.org.uk Fri Aug 12 07:46:43 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 12 Aug 2011 14:46:43 +0200 Subject: [Swift-devel] Call function. In-Reply-To: References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk> <1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla> Message-ID: On Aug 12, 2011, at 2:25 PM, Yadu Nand wrote: > Do all procedures have the same default namespace in swift ? > (not considering imports) I don't really understand what you mean by that question. There's only one function namespace at the moment. VDS/VDL, the predecessor to swift, had more namespace structure, but nothing has really driven that to happen in swift. 
(Something that might drive that, for example, could be people wanting to write libraries in swift (rather than libraries to use *with* swift, but written in a different language) - so maybe they'll appear in swift too, one day.)

> Sure, we
> can probably do map by writing a separate procedure for applying a function
> over every item in a list, but I think map is easier (and cooler!)

You can do a map now using foreach, without writing a separate procedure. That was one of the original "interesting things" that swift did beyond VDL.

Hategan posted recently about using foreach to iterate "sequentially" over data (meaning a value in the output array can depend on everything to the left of it), which looks like it could do a lot of fold-like stuff too (in a thread about getting rid of iterate).

(That needed the ability to access "the previous" element - when you're numbering your output array with an integer, that's easy: here-1. But when you're using 'auto', that doesn't work (and the meaning of "the predecessor" is not immediately apparent in the case of 'auto' - there are a few different things it could mean).)

What you can't do at the moment is use some function identity (be it a string or be it some richer function reference) - and given that, I find your comment:

> If that is the case, the reason for having
> functions passed to other functions almost goes away. I can't
> imagine any scenario in which we might need that.

a bit perplexing, because I thought that throwing functions around as values was exactly what you wanted to do?

--

From yadudoc1729 at gmail.com Fri Aug 12 08:34:11 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Fri, 12 Aug 2011 19:04:11 +0530
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk> <1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla>

Message-ID:

> There's only one function namespace at the moment.

Yes, that answers the question.

> You can do a map now using foreach, without writing a separate procedure. That was one of the original "interesting things" that swift did beyond VDL.
> Hategan posted recently about using foreach to iterate "sequentially" over data (meaning a value in the output array can depend on everything to the left of it), which looks like it could do a lot of fold-like stuff too (in a thread about getting rid of iterate).
> (That needed the ability to access "the previous" element - when you're numbering your output array with an integer, that's easy: here-1. But when you're using 'auto', that doesn't work (and the meaning of "the predecessor" is not immediately apparent in the case of 'auto' - there are a few different things it could mean).)

So we don't need functional iterators in swift? Why was this on the gsoc-ideas list? I was under the impression that functional iterators would be useful in some way :(

> What you can't do at the moment is use some function identity (be it a string or be it some richer function reference) - and given that, I find your comment:
>
>> If that is the case, the reason for having
>> functions passed to other functions almost goes away. I can't
>> imagine any scenario in which we might need that.
>
> a bit perplexing, because I thought that throwing functions around as values was exactly what you wanted to do?

If all the functions existed in the same namespace (or if there is no concept of namespace), we don't need to pass functions to other functions, do we? What I understand is that we pass function-a to function-b so that function-a becomes available under function-b.

--
Thanks and Regards,
Yadu Nand B

From hategan at mcs.anl.gov Fri Aug 12 10:15:19 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 12 Aug 2011 08:15:19 -0700
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk> <1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla>

Message-ID: <1313162119.15746.7.camel@blabla>

On Fri, 2011-08-12 at 19:04 +0530, Yadu Nand wrote:
> If all the functions existed in the same namespace (or if there is no
> concept of namespace),
> we don't need to pass functions to other functions, do we? What I
> understand is that we
> pass function-a to function-b so that function-a becomes available
> under function-b.
>

No. You use first-class functions of type T to be able to write a generic function G that can use functions of type T without being tied to a specific function of type T (say F) at the time you write G. In other words, you can produce a G that works with a class of functions (T) rather than a specific function.

In a sense you are correct. You don't really need it. You can always copy-and-paste the body of G and manually replace calls to F. But that seems to be a crappy way to do things.

From benc at hawaga.org.uk Fri Aug 12 11:39:57 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 12 Aug 2011 16:39:57 +0000 (GMT)
Subject: [Swift-devel] Call function.
In-Reply-To: <1313162119.15746.7.camel@blabla>
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk> <1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla>

<1313162119.15746.7.camel@blabla>
Message-ID:

What would be needed to write 'map' in swift(script) for use as a library function by other swift code (rather than writing it in Karajan or Java)?

It might look something like this (stand by for bleeding eyes on the first line):

(X out[]) map( (X)f(Y), Y inp[]) {
  foreach v,i in inp {
    out[i] = f(v);
    // or equivalently out[i] = f( inp[i] );
  }
}

The above adds syntax for passing a function f which takes a value of type Y and returns a value of type X. f is invoked by juxtaposition rather than by an explicit call, though I think that is irrelevant for this message.

But for map to work for arbitrary 1-d arrays, X and Y need to be type variables of some kind, not actual concrete types. We haven't discussed that at all in this thread, but I think it would be needed to do the above kind of thing. That's nothing particularly fancy, though it's another shift away from Fortran-era types - C++ templates and Java generics can both express this in some form, as can Haskell.

Comments?

--
http://www.hawaga.org.uk/ben/

From hategan at mcs.anl.gov Fri Aug 12 11:48:50 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 12 Aug 2011 09:48:50 -0700
Subject: [Swift-devel] Call function.
In-Reply-To:
References: <84DC8B4A-4324-46AD-929C-2ADA0EFBDA62@hawaga.org.uk> <1313088643.8503.21.camel@blabla> <1313091602.9185.5.camel@blabla>
Message-ID: <1313167730.15746.13.camel@blabla>

On Fri, 2011-08-12 at 17:55 +0530, Yadu Nand wrote:
> >> I'm inclined to think some function type (rather than strings) does, (but
> >> I'm very into compile time safety so that's unsurprising) but I'm not sure
> >> if it's worth the effort for what I think is a fairly restricted use case.
> Well, how else would we do map/reduce-style functionality? Sure, we
> can probably do map by writing a separate procedure for applying a function
> over every item in a list, but I think map is easier (and cooler!)
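The 'map' that Ben sketches in his message, with X and Y as type variables, corresponds to an ordinary generic higher-order function in a language that already has both first-class functions and type variables. The following is a minimal Python sketch for illustration only; the names `map_array`, `X` and `Y` are hypothetical stand-ins for the proposed swift syntax, not an existing swift API.

```python
from typing import Callable, List, TypeVar

# Hypothetical stand-ins for the type variables in Ben's proposed signature
#   (X out[]) map( (X)f(Y), Y inp[])
X = TypeVar("X")
Y = TypeVar("Y")


def map_array(f: Callable[[Y], X], inp: List[Y]) -> List[X]:
    """Return a new list with out[i] = f(inp[i]) for every index i."""
    out: List[X] = []
    for v in inp:  # plays the role of: foreach v,i in inp
        out.append(f(v))  # plays the role of: out[i] = f(v);
    return out


# The same generic definition works for any concrete X and Y:
lengths = map_array(len, ["a", "bb", "ccc"])  # Y = str, X = int
labels = map_array(str, [1, 2])               # Y = int, X = str
```

Because X and Y are bound separately at each call, one definition serves every pair of element types; without type variables, a separate map would be needed for each concrete pair. Unlike swift's foreach, whose iterations can run in parallel, this Python loop is sequential.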
We have to distinguish between "map" as it appears in standard functional languages (i.e. map(S -> T, [S]) -> [T]) and "map" as in the Google map/reduce function signature (i.e. map(K1, T1) -> [(K2, T2)]). The semantics of the functional map can be achieved in swift using foreach.

From hategan at mcs.anl.gov Fri Aug 12 12:11:38 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 12 Aug 2011 10:11:38 -0700
Subject: [Swift-devel] Call function.
In-Reply-To: